Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper
Across the life sciences, processing next generation sequencing data commonly relies upon a computationally expensive process where reads are mapped onto a reference sequence. Prior to such processing, however, there is a vast amount of information that can be ascertained from the reads, potentially...
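Main authors: | Richmond, Phillip Andrew; Kaye, Alice Mary; Kounkou, Godfrain Jacques; Av-Shalom, Tamar Vered; Wasserman, Wyeth W. |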
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016220/ https://www.ncbi.nlm.nih.gov/pubmed/33750951 http://dx.doi.org/10.1371/journal.pcbi.1008815 |
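
The abstract of this record describes a "reverse mapping" approach: query k-mers are counted in an index of the reads, and the counts are then used for tasks such as SNP genotyping. The following is only a minimal sketch of that idea, not FlexTyper's implementation. It assumes a plain hash table of k-mer counts in place of an FM-index, and every function name, the `min_count` threshold, and the toy sequences are hypothetical.

```python
from collections import Counter

# Translation table used to complement DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def index_reads(reads, k):
    """Count every k-mer in the reads (a hash-table stand-in for an FM-index)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def query_count(index, kmer):
    """Count occurrences of a query k-mer on either strand."""
    rc = reverse_complement(kmer)
    if rc == kmer:
        # A k-mer equal to its own reverse complement is counted once.
        return index[kmer]
    return index[kmer] + index[rc]


def genotype(index, ref_kmer, alt_kmer, min_count=1):
    """Call a SNP genotype from reference and alternate k-mer counts.

    min_count is an assumed support threshold for this illustration.
    """
    ref = query_count(index, ref_kmer)
    alt = query_count(index, alt_kmer)
    if ref >= min_count and alt >= min_count:
        return "0/1"
    if alt >= min_count:
        return "1/1"
    if ref >= min_count:
        return "0/0"
    return "./."


# Toy data: one read with the reference allele, one with the alternate
# allele, and one reference read sequenced from the opposite strand.
reads = ["TTGACCATGCA", "TTGACGATGCA", "TGCATGGTCAA"]
index = index_reads(reads, k=5)
call = genotype(index, ref_kmer="GACCA", alt_kmer="GACGA")
```

In this toy case the reference k-mer is supported by two reads (one per strand) and the alternate k-mer by one, so the sketch reports a heterozygous call. A real index over short-read data would need a compressed structure such as the FM-index named in the abstract, since a hash table of all k-mers does not scale to whole-genome datasets.
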
_version_ | 1783673812628275200 |
---|---|
author | Richmond, Phillip Andrew; Kaye, Alice Mary; Kounkou, Godfrain Jacques; Av-Shalom, Tamar Vered; Wasserman, Wyeth W. |
author_sort | Richmond, Phillip Andrew |
collection | PubMed |
description | Across the life sciences, processing next generation sequencing data commonly relies upon a computationally expensive process where reads are mapped onto a reference sequence. Prior to such processing, however, there is a vast amount of information that can be ascertained from the reads, potentially obviating the need for processing, or allowing optimized mapping approaches to be deployed. Here, we present a method termed FlexTyper which facilitates a “reverse mapping” approach in which high throughput sequence queries, in the form of k-mer searches, are run against indexed short-read datasets in order to extract useful information. This reverse mapping approach enables the rapid counting of target sequences of interest. We demonstrate FlexTyper’s utility for recovering depth of coverage, and accurate genotyping of SNP sites across the human genome. We show that genotyping unmapped reads can correctly inform a sample’s population, sex, and relatedness in a family setting. Detection of pathogen sequences within RNA-seq data was sensitive and accurate, performing comparably to existing methods, but with increased flexibility. We present two examples of ways in which this flexibility allows the analysis of genome features not well-represented in a linear reference. First, we analyze contigs from African genome sequencing studies, showing how they distribute across families from three distinct populations. Second, we show how gene-marking k-mers for the killer immune receptor locus allow allele detection in a region that is challenging for standard read mapping pipelines. The future adoption of the reverse mapping approach represented by FlexTyper will be enabled by more efficient methods for FM-index generation and biology-informed collections of reference queries. In the long-term, selection of population-specific references or weighting of edges in pan-population reference genome graphs will be possible using the FlexTyper approach. FlexTyper is available at https://github.com/wassermanlab/OpenFlexTyper. |
format | Online Article Text |
id | pubmed-8016220 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8016220; 2021-04-08; Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper; Richmond, Phillip Andrew; Kaye, Alice Mary; Kounkou, Godfrain Jacques; Av-Shalom, Tamar Vered; Wasserman, Wyeth W.; PLoS Comput Biol; Research Article; Public Library of Science; 2021-03-22; /pmc/articles/PMC8016220/; /pubmed/33750951; http://dx.doi.org/10.1371/journal.pcbi.1008815; Text; en; © 2021 Richmond et al; http://creativecommons.org/licenses/by/4.0/; This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016220/ https://www.ncbi.nlm.nih.gov/pubmed/33750951 http://dx.doi.org/10.1371/journal.pcbi.1008815 |