What’s in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual
BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain un...
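The starting point described in the abstract, reads that remain unmapped after alignment to a reference, can be illustrated with a minimal sketch. It assumes the alignments are available as SAM text, where bit 0x4 of the FLAG field marks an unmapped read. The function name and the sample records are hypothetical and are not taken from the article, which does not state its tooling in this record.

```python
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield the names of reads whose SAM FLAG has the unmapped bit set."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            yield name


# Hypothetical records, for illustration only.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCAT\tIIIII",
]
unmapped = list(unmapped_read_names(example))
```

Reads selected this way would then be assembled de novo and the resulting contigs searched against a nucleotide database, as the abstract outlines.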
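Main Authors: | Whitacre, Lynsey K.; Tizioto, Polyana C.; Kim, JaeWoo; Sonstegard, Tad S.; Schroeder, Steven G.; Alexander, Leeson J.; Medrano, Juan F.; Schnabel, Robert D.; Taylor, Jeremy F.; Decker, Jared E. |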
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4696311/ https://www.ncbi.nlm.nih.gov/pubmed/26714747 http://dx.doi.org/10.1186/s12864-015-2313-7 |
_version_ | 1782407774364237824 |
---|---|
author | Whitacre, Lynsey K.; Tizioto, Polyana C.; Kim, JaeWoo; Sonstegard, Tad S.; Schroeder, Steven G.; Alexander, Leeson J.; Medrano, Juan F.; Schnabel, Robert D.; Taylor, Jeremy F.; Decker, Jared E. |
author_sort | Whitacre, Lynsey K. |
collection | PubMed |
description | BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped. RESULTS: We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are either not present in the US or are not known to infect taurine cattle and the reference animal appears to have been host to unsequenced sister species. CONCLUSIONS: We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and for detecting sequences indicative of parasitic or commensal organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2313-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4696311 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4696311 2015-12-31 BMC Genomics Research Article BioMed Central 2015-12-29 /pmc/articles/PMC4696311/ /pubmed/26714747 http://dx.doi.org/10.1186/s12864-015-2313-7 Text en © Whitacre et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | What’s in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4696311/ https://www.ncbi.nlm.nih.gov/pubmed/26714747 http://dx.doi.org/10.1186/s12864-015-2313-7 |