
Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags

BACKGROUND: With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing enabling more and more...


Bibliographic Details
Main authors: Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3726750/
https://www.ncbi.nlm.nih.gov/pubmed/23922843
http://dx.doi.org/10.1371/journal.pone.0069890
author Shangguan, Lingfei
Han, Jian
Kayesh, Emrul
Sun, Xin
Zhang, Changqing
Pervaiz, Tariq
Wen, Xicheng
Fang, Jinggui
collection PubMed
description BACKGROUND: With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing, enabling more and more plants to be subjected to genome sequencing. Despite this, the genome sequence quality of multiple plants has not been evaluated. METHODOLOGY/PRINCIPAL FINDING: Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is represented by the ratio of chromosome size to genome size (or of scaffold size to genome size), which ranged from 55.31% to nearly 100%. The accuracy of a genome sequence is represented by the ratio of matched ESTs to selected ESTs: 52.93% to 98.28% and 89.02% to 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analyses of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality; followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max; then Sorghum bicolor, Solanum lycopersicum and Fragaria vesca; and finally Lotus japonicus, Medicago truncatula and Malus × domestica, in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influence genome sequence assembly. CONCLUSION: The quality of plant genome sequences was found to be lower than envisaged; the rapid development of genome sequencing projects, together with research on bioinformatics tools and genome sequence assembly algorithms, should therefore provide increased processing and correction of genome sequences that have already been published.
format Online
Article
Text
id pubmed-3726750
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3726750 2013-08-06 Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags Shangguan, Lingfei Han, Jian Kayesh, Emrul Sun, Xin Zhang, Changqing Pervaiz, Tariq Wen, Xicheng Fang, Jinggui PLoS One Research Article Public Library of Science 2013-07-29 /pmc/articles/PMC3726750/ /pubmed/23922843 http://dx.doi.org/10.1371/journal.pone.0069890 Text en © 2013 Shangguan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3726750/
https://www.ncbi.nlm.nih.gov/pubmed/23922843
http://dx.doi.org/10.1371/journal.pone.0069890
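The two quality measures named in the abstract are simple ratios, so they can be illustrated with a minimal Python sketch. The function names and the input numbers below are hypothetical and chosen only for illustration; they are not taken from the article, and the authors' actual pipeline (EST cleaning, random selection, mapping criteria) is not reproduced here.

```python
def ratio_percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def integrity(assembled_bp, genome_size_bp):
    """Integrity: assembled chromosome (or scaffold) size relative to genome size."""
    return ratio_percent(assembled_bp, genome_size_bp)


def accuracy(matched_ests, selected_ests):
    """Accuracy: ESTs mapped to the assembly relative to ESTs selected."""
    return ratio_percent(matched_ests, selected_ests)


# Hypothetical inputs, not values reported in the article.
example_integrity = integrity(450_000_000, 500_000_000)
example_accuracy = accuracy(9_500, 10_000)

print(f"integrity: {example_integrity:.2f}%")
print(f"accuracy: {example_accuracy:.2f}%")
```

Under these assumed inputs, an assembly covering 450 Mb of a 500 Mb genome has 90% integrity, and 9,500 mapped ESTs out of 10,000 selected gives 95% accuracy.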