Using several bioinformatics tools, we have mapped publicly available legume nodule protein sequences (nodulins) onto the Arabidopsis genome. Some 760 nodulins [Table of the nodulins, large file] were searched against the predicted Arabidopsis ORFs with the FASTA program. The resulting legume protein sequences and their homologous Arabidopsis protein sequences were then grouped into clusters with the PhyloGrapher program.
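The clustering step can be pictured as single-linkage grouping: any two sequences joined by a hit that passes the similarity thresholds end up in the same cluster. PhyloGrapher's own clustering rules are not described here. The following Python sketch therefore simply assumes that clusters are the connected components of the hit graph. The `cluster_hits` function, its `(query, subject, evalue, identity)` tuple layout and the sequence IDs used below are hypothetical. Its default thresholds mirror the E-value and identity cut-offs quoted for the nodulin clusters.

```python
from collections import defaultdict


def cluster_hits(hits, max_evalue=1e-4, min_identity=20.0):
    """Group sequence IDs into connected components of a similarity graph.

    hits: iterable of (query_id, subject_id, evalue, percent_identity) tuples,
    e.g. parsed from FASTA search output (parsing is not shown here).
    A pair becomes an edge only if it passes both thresholds; sequences with
    no passing hit are left out of the result.
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, evalue, identity in hits:
        # Skip self-hits and pairs that fail either threshold.
        if query != subject and evalue < max_evalue and identity > min_identity:
            union(query, subject)

    components = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(members) for members in components.values())
```

For example, hits `("nod_A", "nod_B", 1e-30, 45.0)` and `("nod_B", "At1g01", 1e-10, 28.0)` place all three sequences in one cluster, even though `nod_A` and `At1g01` were never compared directly. This transitive merging is the usual trade-off of single linkage: it is simple and near-linear with union-find, but a single weak bridging hit can join two otherwise separate families.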
We identified 134 clusters of nodulins with a significant level of homology within each cluster (E-value < 0.0001; identity > 20%). For 29 of these clusters we found no Arabidopsis homologues. Interestingly, only 4 of these 29 clusters contained nodulins from more than one legume genus. The proteins in clusters without significant homology to Arabidopsis proteins are excellent candidates for investigating the molecular basis of nodulation as it arose in leguminous genera [Group H graph]. Every gene is web-linked to its GenBank entry.
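The counts in this paragraph reduce to simple set checks on each cluster. These checks separate clusters with Arabidopsis members from legume-only clusters, and then find the legume-only clusters that span several genera. Below is a minimal Python sketch. It assumes a hypothetical `genus_of` lookup that maps every sequence ID to its genus, with Arabidopsis ORFs labelled `"Arabidopsis"`. The function name and result keys are illustrative only.

```python
def summarize_clusters(clusters, genus_of, outgroup="Arabidopsis"):
    """Count clusters with and without outgroup (Arabidopsis) members.

    clusters: list of collections of sequence IDs.
    genus_of: dict mapping each sequence ID to its genus name.
    """
    # Clusters in which no member belongs to the outgroup.
    legume_only = [c for c in clusters
                   if all(genus_of[s] != outgroup for s in c)]
    # Legume-only clusters whose members come from more than one genus.
    multi_genus = [c for c in legume_only
                   if len({genus_of[s] for s in c}) > 1]
    return {
        "total": len(clusters),
        "with_arabidopsis": len(clusters) - len(legume_only),
        "legume_only": len(legume_only),
        "legume_only_multi_genus": len(multi_genus),
    }
```

Applied to the full cluster set, `legume_only` would correspond to the 29 clusters without Arabidopsis homologues, and `legume_only_multi_genus` to the subset drawn from more than one legume genus.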
Arabidopsis homologues for the remaining 105 clusters of legume nodulins were drawn on the Arabidopsis chromosomes using the GenomePixelizer program. To keep the chromosome pictures visually readable, the full set of about 1250 Arabidopsis genes [Table of the Arabidopsis genes, large file] was divided into 7 separate groups [Groups A, B, C, D, E, F, G]. Each group contains the Arabidopsis homologues of 13 to 15 nodulin clusters. The Arabidopsis genes are web-linked to their legume homologues in GenBank and to their protein sequences in the MIPS database.
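How the 105 clusters were assigned to the seven groups is not stated. One plausible way to keep each picture equally readable is to balance the number of Arabidopsis genes per group. The Python sketch below does this with the classic largest-first greedy heuristic, which assigns each cluster, biggest first, to the group that currently holds the fewest genes. The balancing criterion, the `balance_groups` function and the use of letters A, B, C, ... as group labels are assumptions made for illustration.

```python
import heapq
from string import ascii_uppercase


def balance_groups(cluster_sizes, n_groups=7):
    """Assign clusters to n_groups so each group holds a similar gene count.

    cluster_sizes: dict mapping cluster ID -> number of Arabidopsis genes.
    Returns a dict mapping group label ("A", "B", ...) -> list of cluster IDs.
    """
    if not 1 <= n_groups <= len(ascii_uppercase):
        raise ValueError("n_groups must be between 1 and 26")
    # Min-heap of (genes assigned so far, group index); all loads start at 0.
    heap = [(0, g) for g in range(n_groups)]
    groups = [[] for _ in range(n_groups)]
    # Largest clusters first; ties broken by cluster ID for reproducibility.
    for cid, size in sorted(cluster_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)  # currently lightest group
        groups[g].append(cid)
        heapq.heappush(heap, (load + size, g))
    return {ascii_uppercase[g]: members for g, members in enumerate(groups)}
```

Because this heuristic balances genes rather than clusters, groups can end up with different numbers of clusters. A fixed split of 105 clusters into 7 equal chunks would instead give exactly 15 per group.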