
Identification and analysis of gene families from the duplicated genome of soybean using EST sequences


Bibliographic Details
Main authors:	Nelson, Rex T, Shoemaker, Randy
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557498/ https://www.ncbi.nlm.nih.gov/pubmed/16899135 http://dx.doi.org/10.1186/1471-2164-7-204

collection	PubMed
description	BACKGROUND: Large scale gene analysis of most organisms is hampered by incomplete genomic sequences. In many organisms, such as soybean, the best source of sequence information is the existence of expressed sequence tag (EST) libraries. Soybean has a large (1115 Mbp) genome that has yet to be fully sequenced. However, it does have the 6th largest EST collection, comprising ESTs from a variety of soybean genotypes. Many EST libraries were constructed from RNA extracted from various genetic backgrounds; thus, gene identification from these sources is complicated by the existence of both gene and allele sequence differences. We used the ESTminer suite of programs to identify potential soybean gene transcripts from a single genetic background, allowing us to observe functional classifications between gene families as well as structural differences between genes and gene paralogs within families. The identification of potential gene sequences (pHaps) from soybean allows us to begin to get a picture of the genomic history of the organism as well as begin to observe the evolutionary fates of gene copies in this highly duplicated genome.
RESULTS: We identified approximately 45,000 potential gene sequences (pHaps) from EST sequences of Williams/Williams82, an inbred genotype of soybean (Glycine max L. Merr.), using a redundancy criterion to identify reproducible sequence differences between related genes within gene families. Analysis of these sequences revealed that single base substitutions and single base indels are the most frequently observed forms of sequence variation between genes within families in the dataset. Genomic sequencing of selected loci indicates that intron-like intervening sequences are numerous and are approximately 220 bp in length. Functional annotation of gene sequences indicates that functional classifications are not randomly distributed among gene families containing few or many genes.
CONCLUSION: The predominance of single nucleotide insertion/deletions and substitution events between genes within families (individual genes and gene paralogs) is consistent with a model of gene amplification followed by single base random mutational events expected under the classical model of duplicated gene evolution. Molecular functions of small and large gene families appear to be non-randomly distributed, possibly indicating a difference in retention of duplicates or local expansion.
id	pubmed-1557498
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-1557498 2006-08-30 BMC Genomics Research Article BioMed Central 2006-08-09 /pmc/articles/PMC1557498/ /pubmed/16899135 http://dx.doi.org/10.1186/1471-2164-7-204 Text en Copyright © 2006 Nelson and Shoemaker; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
