
Visualization and probability-based scoring of structural variants within repetitive sequences

Bibliographic Details
Main authors: Halper-Stromberg, Eitan; Steranka, Jared; Burns, Kathleen H.; Sabunciyan, Sarven; Irizarry, Rafael A.
Format: Online article, text
Language: English
Published: Oxford University Press, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029030/
https://www.ncbi.nlm.nih.gov/pubmed/24501098
http://dx.doi.org/10.1093/bioinformatics/btu054

Description
Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next-generation sequencers is challenging and requires analytical techniques different from those used for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the immunoglobulin (Ig) and T-cell receptor (TCR) loci.

Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the method's usefulness by showing that it separates real structural variants from false positives generated by existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads covering the Ig and TCR loci were generated from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblastoid cell line. Whole-genome sequencing reads were from a lymphoblastoid cell line.

Availability: We implement our method as an R package available at https://github.com/Eitan177/targetSeqView. Code to reproduce the figures and results is also available.

Contact: ehalper2@jhmi.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Record Details
Journal: Bioinformatics (section: Original Papers)
Dates in record: 2014-05-21, 2014-06-01, 2014-02-04
Collection: PubMed
Record format: MEDLINE/PubMed
Record ID: pubmed-4029030
Institution: National Center for Biotechnology Information
Copyright: © The Author 2014. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.