
Exploring the unmapped DNA and RNA reads in a songbird genome

BACKGROUND: A widely used approach in next-generation sequencing projects is the alignment of reads to a reference genome. Despite methodological and hardware improvements which have enhanced the efficiency and accuracy of alignments, a significant percentage of reads frequently remain unmapped. Usu...


Bibliographic Details
Main Authors: Laine, Veronika N.; Gossmann, Toni I.; van Oers, Kees; Visser, Marcel E.; Groenen, Martien A. M.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323668/
https://www.ncbi.nlm.nih.gov/pubmed/30621573
http://dx.doi.org/10.1186/s12864-018-5378-2
author Laine, Veronika N.
Gossmann, Toni I.
van Oers, Kees
Visser, Marcel E.
Groenen, Martien A. M.
collection PubMed
description BACKGROUND: A widely used approach in next-generation sequencing projects is the alignment of reads to a reference genome. Despite methodological and hardware improvements which have enhanced the efficiency and accuracy of alignments, a significant percentage of reads frequently remain unmapped. Usually, unmapped reads are discarded from the analysis process, but significant biological information and insights can be uncovered from these data. We explored the unmapped DNA (normal and bisulfite treated) and RNA sequence reads of the great tit (Parus major) reference genome individual. From the unmapped reads we generated de novo assemblies, after which the generated sequence contigs were aligned to the NCBI non-redundant nucleotide database using BLAST, identifying the closest known matching sequence. RESULTS: Many of the aligned contigs showed sequence similarity to different bird species and genes that were absent in the great tit reference assembly. Furthermore, there were also contigs that represented known P. major pathogenic species. Most interesting were several species of blood parasites such as Plasmodium and Trypanosoma. CONCLUSIONS: Our analyses revealed that meaningful biological information can be found when further exploring unmapped reads. For instance, it is possible to discover sequences that are either absent or misassembled in the reference genome, and sequences that indicate infection or sample contamination. In this study we also propose strategies to aid the capture and interpretation of this information from unmapped reads. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-5378-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6323668
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6323668 2019-01-10 BMC Genomics Research Article
BioMed Central 2019-01-08 /pmc/articles/PMC6323668/ /pubmed/30621573 http://dx.doi.org/10.1186/s12864-018-5378-2 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Exploring the unmapped DNA and RNA reads in a songbird genome
topic Research Article