Efficient Plant Gene Identification Based on Interspecies Mapping of Full-Length cDNAs
We present an annotation pipeline that accurately predicts exon–intron structures and protein-coding sequences (CDSs) on the basis of full-length cDNAs (FLcDNAs). This annotation pipeline was used to identify genes in 10 plant genomes. In particular, we show that interspecies mapping of FLcDNAs to g...
Main authors: | Amano, Naoki; Tanaka, Tsuyoshi; Numa, Hisataka; Sakai, Hiroaki; Itoh, Takeshi |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955710/ https://www.ncbi.nlm.nih.gov/pubmed/20668003 http://dx.doi.org/10.1093/dnares/dsq017 |
_version_ | 1782188075110105088 |
---|---|
author | Amano, Naoki; Tanaka, Tsuyoshi; Numa, Hisataka; Sakai, Hiroaki; Itoh, Takeshi |
author_sort | Amano, Naoki |
collection | PubMed |
description | We present an annotation pipeline that accurately predicts exon–intron structures and protein-coding sequences (CDSs) on the basis of full-length cDNAs (FLcDNAs). This annotation pipeline was used to identify genes in 10 plant genomes. In particular, we show that interspecies mapping of FLcDNAs to genomes is of great value in fully utilizing FLcDNA resources whose availability is limited to several species. Because low sequence conservation at 5′- and 3′-ends of FLcDNAs between different species tends to result in truncated CDSs, we developed an improved algorithm to identify complete CDSs by the extension of both ends of truncated CDSs. Interspecies mapping of 71 801 monocot FLcDNAs to the Oryza sativa genome led to the detection of 22 142 protein-coding regions. Moreover, in comparing two mapping programs and three ab initio prediction programs, we found that our pipeline was more capable of identifying complete CDSs. As demonstrated by monocot interspecies mapping, in which nucleotide identity between FLcDNAs and the genome was ∼80%, the resultant inferred CDSs were sufficiently accurate. Finally, we applied both inter- and intraspecies mapping to 10 monocot and dicot genomes and identified genes in 210 551 loci. Interspecies mapping of FLcDNAs is expected to effectively predict genes and CDSs in newly sequenced genomes. |
format | Text |
id | pubmed-2955710 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2955710 2010-10-18 Efficient Plant Gene Identification Based on Interspecies Mapping of Full-Length cDNAs Amano, Naoki; Tanaka, Tsuyoshi; Numa, Hisataka; Sakai, Hiroaki; Itoh, Takeshi DNA Res Full Papers Oxford University Press 2010-10 2010-07-28 /pmc/articles/PMC2955710/ /pubmed/20668003 http://dx.doi.org/10.1093/dnares/dsq017 Text en © The Author 2010. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
http://creativecommons.org/licenses/by-nc/2.5/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Efficient Plant Gene Identification Based on Interspecies Mapping of Full-Length cDNAs |
title_sort | efficient plant gene identification based on interspecies mapping of full-length cdnas |
topic | Full Papers |
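The abstract states that truncated CDSs are completed by extending both ends, but this record does not give the details of that algorithm. The following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a forward-strand, intron-free sequence, the standard genetic code, and a truncated CDS that is already in frame; the function name `extend_cds` and the rule of taking the most upstream in-frame ATG are hypothetical choices made for illustration.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
START_CODON = "ATG"


def extend_cds(seq: str, start: int, end: int) -> Tuple[int, int, bool]:
    """Extend an in-frame truncated CDS [start, end) on the forward strand.

    Returns (new_start, new_end, complete), where `complete` is True only
    if the result begins with ATG and ends with a stop codon.
    Hypothetical helper for illustration; coordinates are 0-based, half-open.
    """
    if end - start < 3 or (end - start) % 3 != 0:
        raise ValueError("truncated CDS must be a whole number of codons")
    seq = seq.upper()

    # 3' extension: walk downstream codon by codon to the first in-frame stop.
    new_end = end
    has_stop = seq[end - 3:end] in STOP_CODONS
    pos = end
    while not has_stop and pos + 3 <= len(seq):
        codon = seq[pos:pos + 3]
        pos += 3
        new_end = pos
        has_stop = codon in STOP_CODONS
    if not has_stop:
        new_end = end  # no stop found: leave the 3' end unchanged

    # 5' extension: walk upstream until an in-frame stop, remembering the
    # most upstream ATG seen (an assumed rule, favouring the longest ORF).
    new_start = start
    has_start = seq[start:start + 3] == START_CODON
    if not has_start:
        pos = start - 3
        while pos >= 0:
            codon = seq[pos:pos + 3]
            if codon in STOP_CODONS:
                break
            if codon == START_CODON:
                new_start = pos
                has_start = True
            pos -= 3

    return new_start, new_end, has_start and has_stop


if __name__ == "__main__":
    # Toy sequence: TAA GCC ATG AAA CCC GGG TTT TAG CC
    toy = "TAAGCCATGAAACCCGGGTTTTAGCC"
    # The truncated CDS covers only "CCCGGG" (positions 12..18).
    print(extend_cds(toy, 12, 18))  # (6, 24, True)
```

A real pipeline would also have to handle spliced alignments, reverse-strand genes, and truncated CDSs that contain internal stop codons, none of which this sketch attempts.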