Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data
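Main authors: | Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay |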
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2013 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3625192/ https://www.ncbi.nlm.nih.gov/pubmed/23593174 http://dx.doi.org/10.1371/journal.pone.0060204 |
author | Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay |
---|---|
collection | PubMed |
description | Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly from these short reads is computationally very intensive. Owing to the lower cost of sequencing and higher throughput, NGS systems now make it possible to sequence genomes at high depth. However, no report is currently available on the impact of high sequencing depth on genome assembly using real datasets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of simulated datasets is that variables known to impact genome assembly, such as error rate, read length and coverage, are carefully controlled. Hence, this study was undertaken to identify the minimum sequencing depth required for de novo assembly of genomes of different sizes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes with all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB of RAM, depending on the genome size and the assembly algorithm used. We believe that this information can be extremely valuable to researchers in designing experiments and multiplexing, enabling optimum use of both sequencing and analysis resources. |
format | Online Article Text |
id | pubmed-3625192 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One (published 2013-04-12) |
rights | © 2013 Desai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3625192/ https://www.ncbi.nlm.nih.gov/pubmed/23593174 http://dx.doi.org/10.1371/journal.pone.0060204 |
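
The abstract treats read depth as a planning quantity for experiment design and multiplexing. As a rough illustration (a minimal sketch, not the authors' method), mean depth follows the standard relation C = N × L / G, where N is the number of reads, L the read length and G the genome size; for paired-end data N counts individual reads, so the number of pairs is N / 2. The Python below applies this relation to the three genome sizes quoted in the description. The 100 bp read length, the 10 Gb usable run yield, and the helper names `depth_from_reads`, `reads_for_depth` and `samples_per_run` are hypothetical choices made for illustration. The sketch ignores duplicate reads, QC losses and uneven coverage, all of which raise real requirements.

```python
"""Back-of-the-envelope sequencing-depth arithmetic (illustrative sketch only)."""

# Genome sizes quoted in the abstract, converted from megabases to base pairs.
GENOME_SIZES_BP = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}


def depth_from_reads(n_reads: int, read_len_bp: int, genome_bp: int) -> float:
    """Mean depth C = N * L / G for N reads of length L over a genome of size G."""
    return n_reads * read_len_bp / genome_bp


def reads_for_depth(target_depth: int, read_len_bp: int, genome_bp: int) -> int:
    """Smallest N with N * L / G >= target depth (integer ceiling division)."""
    needed_bases = target_depth * genome_bp
    return -(-needed_bases // read_len_bp)


def samples_per_run(run_yield_bp: int, target_depth: int, genome_bp: int) -> int:
    """Number of equally sized libraries that fit in one run at the target depth.

    run_yield_bp is a hypothetical usable yield; real yields after QC vary.
    """
    return run_yield_bp // (target_depth * genome_bp)


if __name__ == "__main__":
    READ_LEN_BP = 100               # hypothetical read length
    RUN_YIELD_BP = 10_000_000_000   # hypothetical 10 Gb of usable output
    for name, size in GENOME_SIZES_BP.items():
        for depth in (50, 100):     # the two depths discussed in the abstract
            n = reads_for_depth(depth, READ_LEN_BP, size)
            fit = samples_per_run(RUN_YIELD_BP, depth, size)
            print(f"{name}: {depth}X needs {n:,} reads of {READ_LEN_BP} bp; "
                  f"{fit} sample(s) per assumed run")
```

Under these assumptions, 50X for E. coli works out to 2.3 million 100 bp reads (230 Mb of sequence), so about 43 such libraries would fit in the assumed 10 Gb run, while 50X for C. elegans needs 5 Gb and leaves room for only two.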