Efficient Plant Gene Identification Based on Interspecies Mapping of Full-Length cDNAs

Bibliographic Details
Main Authors: Amano, Naoki; Tanaka, Tsuyoshi; Numa, Hisataka; Sakai, Hiroaki; Itoh, Takeshi
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Full Papers
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955710/
https://www.ncbi.nlm.nih.gov/pubmed/20668003
http://dx.doi.org/10.1093/dnares/dsq017
author Amano, Naoki
Tanaka, Tsuyoshi
Numa, Hisataka
Sakai, Hiroaki
Itoh, Takeshi
collection PubMed
description We present an annotation pipeline that accurately predicts exon–intron structures and protein-coding sequences (CDSs) on the basis of full-length cDNAs (FLcDNAs). This annotation pipeline was used to identify genes in 10 plant genomes. In particular, we show that interspecies mapping of FLcDNAs to genomes is of great value in fully utilizing FLcDNA resources whose availability is limited to several species. Because low sequence conservation at 5′- and 3′-ends of FLcDNAs between different species tends to result in truncated CDSs, we developed an improved algorithm to identify complete CDSs by the extension of both ends of truncated CDSs. Interspecies mapping of 71 801 monocot FLcDNAs to the Oryza sativa genome led to the detection of 22 142 protein-coding regions. Moreover, in comparing two mapping programs and three ab initio prediction programs, we found that our pipeline was more capable of identifying complete CDSs. As demonstrated by monocot interspecies mapping, in which nucleotide identity between FLcDNAs and the genome was ∼80%, the resultant inferred CDSs were sufficiently accurate. Finally, we applied both inter- and intraspecies mapping to 10 monocot and dicot genomes and identified genes in 210 551 loci. Interspecies mapping of FLcDNAs is expected to effectively predict genes and CDSs in newly sequenced genomes.
format Text
id pubmed-2955710
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2955710 2010-10-18 Efficient Plant Gene Identification Based on Interspecies Mapping of Full-Length cDNAs. Amano, Naoki; Tanaka, Tsuyoshi; Numa, Hisataka; Sakai, Hiroaki; Itoh, Takeshi. DNA Res. Full Papers. Oxford University Press 2010-10 2010-07-28 /pmc/articles/PMC2955710/ /pubmed/20668003 http://dx.doi.org/10.1093/dnares/dsq017 Text en © The Author 2010. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
http://creativecommons.org/licenses/by-nc/2.5/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Efficient Plant Gene Identification Based on Interspecies Mapping of Full-Length cDNAs
topic Full Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955710/
https://www.ncbi.nlm.nih.gov/pubmed/20668003
http://dx.doi.org/10.1093/dnares/dsq017
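
The description in this record mentions identifying complete CDSs "by the extension of both ends of truncated CDSs" but gives no algorithmic detail. The sketch below is only an illustration of that general idea, not the authors' implementation. It assumes a spliced transcript sequence on the forward strand, an in-frame truncated CDS given as a half-open interval, the standard stop codons, and a preference for the most upstream in-frame ATG; the function name `extend_cds` and all of these choices are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def extend_cds(seq, start, end):
    """Extend a truncated in-frame CDS seq[start:end] to a complete ORF.

    Hypothetical helper: returns (new_start, new_end) as a half-open
    interval, or None if no start codon or no stop codon can be reached.
    """
    seq = seq.upper()
    if (end - start) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")

    # 5' extension: walk upstream one codon at a time. Remember the most
    # upstream ATG seen before an in-frame stop codon (or the sequence
    # start) ends the walk.
    new_start = start if seq[start:start + 3] == START_CODON else None
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == START_CODON:
            new_start = pos
        pos -= 3

    # 3' extension: if the CDS does not already end in a stop codon,
    # walk downstream until the first in-frame stop codon.
    if end - start >= 3 and seq[end - 3:end] in STOP_CODONS:
        new_end = end
    else:
        new_end = None
        pos = end
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                new_end = pos + 3
                break
            pos += 3

    if new_start is None or new_end is None:
        return None
    return new_start, new_end


# Toy transcript: GG ATG AAA CCC GGG TTT TAG CC; truncated CDS covers CCC GGG.
toy = "GGATGAAACCCGGGTTTTAGCC"
extended = extend_cds(toy, 8, 14)
```

Choosing the most upstream ATG yields the longest open reading frame; a real pipeline might instead weigh homology to the mapped cDNA's protein when picking the start codon, which this sketch does not attempt.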