Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data

BACKGROUND: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chrom...

Bibliographic Details
Main authors: Webster, Timothy H, Couse, Madeline, Grande, Bruno M, Karlins, Eric, Phung, Tanya N, Richmond, Phillip A, Whitford, Whitney, Wilson, Melissa A
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6615978/
https://www.ncbi.nlm.nih.gov/pubmed/31289836
http://dx.doi.org/10.1093/gigascience/giz074
_version_ 1783433427435913216
author Webster, Timothy H
Couse, Madeline
Grande, Bruno M
Karlins, Eric
Phung, Tanya N
Richmond, Phillip A
Whitford, Whitney
Wilson, Melissa A
author_facet Webster, Timothy H
Couse, Madeline
Grande, Bruno M
Karlins, Eric
Phung, Tanya N
Richmond, Phillip A
Whitford, Whitney
Wilson, Melissa A
author_sort Webster, Timothy H
collection PubMed
description BACKGROUND: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference. RESULTS: Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We find that sequence homology affects read mapping on the sex chromosomes and this has downstream effects on variant calling. However, we show that XYalign can correct mismapping, resulting in more accurate variant calling. We also show how metrics output by XYalign can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing, and exome sequencing. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other uses including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3). CONCLUSIONS: Sex chromosome sequence homology causes the mismapping of short reads, which in turn affects downstream analyses. XYalign provides a reproducible framework to correct mismapping and improve variant calling on the sex chromosomes.
format Online
Article
Text
id pubmed-6615978
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-66159782019-07-15 Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data Webster, Timothy H Couse, Madeline Grande, Bruno M Karlins, Eric Phung, Tanya N Richmond, Phillip A Whitford, Whitney Wilson, Melissa A Gigascience Technical Note BACKGROUND: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference. RESULTS: Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We find that sequence homology affects read mapping on the sex chromosomes and this has downstream effects on variant calling. However, we show that XYalign can correct mismapping, resulting in more accurate variant calling. We also show how metrics output by XYalign can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing, and exome sequencing. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other uses including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3). 
CONCLUSIONS: Sex chromosome sequence homology causes the mismapping of short reads, which in turn affects downstream analyses. XYalign provides a reproducible framework to correct mismapping and improve variant calling on the sex chromosomes. Oxford University Press 2019-07-09 /pmc/articles/PMC6615978/ /pubmed/31289836 http://dx.doi.org/10.1093/gigascience/giz074 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Webster, Timothy H
Couse, Madeline
Grande, Bruno M
Karlins, Eric
Phung, Tanya N
Richmond, Phillip A
Whitford, Whitney
Wilson, Melissa A
Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data
title Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data
title_full Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data
title_fullStr Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data
title_full_unstemmed Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data
title_short Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data
title_sort identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6615978/
https://www.ncbi.nlm.nih.gov/pubmed/31289836
http://dx.doi.org/10.1093/gigascience/giz074
work_keys_str_mv AT webstertimothyh identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata
AT cousemadeline identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata
AT grandebrunom identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata
AT karlinseric identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata
AT phungtanyan identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata
AT richmondphillipa identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata
AT whitfordwhitney identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata
AT wilsonmelissaa identifyingunderstandingandcorrectingtechnicalartifactsonthesexchromosomesinnextgenerationsequencingdata