What’s in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual

BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain un...

Bibliographic Details
Main authors: Whitacre, Lynsey K., Tizioto, Polyana C., Kim, JaeWoo, Sonstegard, Tad S., Schroeder, Steven G., Alexander, Leeson J., Medrano, Juan F., Schnabel, Robert D., Taylor, Jeremy F., Decker, Jared E.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4696311/
https://www.ncbi.nlm.nih.gov/pubmed/26714747
http://dx.doi.org/10.1186/s12864-015-2313-7
author Whitacre, Lynsey K.
Tizioto, Polyana C.
Kim, JaeWoo
Sonstegard, Tad S.
Schroeder, Steven G.
Alexander, Leeson J.
Medrano, Juan F.
Schnabel, Robert D.
Taylor, Jeremy F.
Decker, Jared E.
collection PubMed
description BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped. RESULTS: We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are either not present in the US or are not known to infect taurine cattle and the reference animal appears to have been host to unsequenced sister species. CONCLUSIONS: We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and for detecting sequences indicative of parasitic or commensal organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2313-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4696311
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4696311 2015-12-31 BMC Genomics Research Article BioMed Central 2015-12-29 /pmc/articles/PMC4696311/ /pubmed/26714747 http://dx.doi.org/10.1186/s12864-015-2313-7 Text en © Whitacre et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title What’s in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4696311/
https://www.ncbi.nlm.nih.gov/pubmed/26714747
http://dx.doi.org/10.1186/s12864-015-2313-7