DiscoverY: a classifier for identifying Y chromosome sequences in male assemblies

Bibliographic Details
Main authors: Rangavittal, Samarth; Stopa, Natasha; Tomaszkiewicz, Marta; Sahlin, Kristoffer; Makova, Kateryna D.; Medvedev, Paul
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6688218/
https://www.ncbi.nlm.nih.gov/pubmed/31399045
http://dx.doi.org/10.1186/s12864-019-5996-3
author Rangavittal, Samarth
Stopa, Natasha
Tomaszkiewicz, Marta
Sahlin, Kristoffer
Makova, Kateryna D.
Medvedev, Paul
collection PubMed
description BACKGROUND: Although the Y chromosome plays an important role in male sex determination and fertility, it is currently understudied due to its haploid and repetitive nature. Methods to isolate Y-specific contigs from a whole-genome assembly broadly fall into two categories. The first involves retrieving Y-contigs using proportion sharing with a female, but such a strategy is prone to false positives in the absence of a high-quality, complete female reference. A second strategy uses the ratio of depth of coverage from male and female reads to select Y-contigs, but such a method requires high-depth sequencing of a female and cannot utilize existing female references. RESULTS: We develop a k-mer-based method called DiscoverY, which combines proportion sharing with a female with depth of coverage from male reads to classify contigs as Y-chromosomal. We evaluate the performance of DiscoverY on human and gorilla genomes, across different sequencing platforms including Illumina, 10X, and PacBio. In the cases where the male and female data are of high quality, DiscoverY has high precision and recall and outperforms existing methods. For cases where a high-quality female reference is not available, we quantify the effect of using a draft reference or even just raw sequencing reads from a female. CONCLUSION: DiscoverY is an effective method to isolate Y-specific contigs from a whole-genome assembly. However, regions homologous to the X chromosome remain difficult to detect. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5996-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6688218
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6688218 2019-08-14 BMC Genomics Methodology Article
BioMed Central 2019-08-09 /pmc/articles/PMC6688218/ /pubmed/31399045 http://dx.doi.org/10.1186/s12864-019-5996-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DiscoverY: a classifier for identifying Y chromosome sequences in male assemblies
topic Methodology Article
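
Illustrative note: the description field says the method combines two k-mer signals per contig, namely the proportion of k-mers shared with a female and the depth of coverage from male reads. The Python sketch below shows one way such features could be computed on toy data. It is not the DiscoverY implementation: the function names, the k-mer size, the use of the median as the depth summary, and the thresholds are all assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(seqs, k):
    """Count k-mer occurrences across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(kmers(s, k))
    return counts


def contig_features(contig, male_counts, female_kmers, k):
    """Return (proportion shared with female, median male k-mer depth)."""
    ks = list(kmers(contig, k))
    if not ks:
        return 0.0, 0.0
    shared = sum(1 for km in ks if km in female_kmers) / len(ks)
    depth = median(male_counts.get(km, 0) for km in ks)
    return shared, depth


def is_y_candidate(shared, depth, max_shared=0.2, depth_range=(1, 3)):
    """Hypothetical rule: little sharing with female, depth in a chosen band."""
    low, high = depth_range
    return shared <= max_shared and low <= depth <= high


# Toy data (hypothetical): k = 3, one female sequence, six male reads.
K = 3
female_kmers = set(kmers("ACGTACGTAC", K))
male_reads = ["TTTGGG"] * 2 + ["ACGTAC"] * 4
male_counts = count_kmers(male_reads, K)

y_like = contig_features("TTTGGG", male_counts, female_kmers, K)
autosomal_like = contig_features("ACGTAC", male_counts, female_kmers, K)

print("Y-like contig:", y_like, is_y_candidate(*y_like))
print("autosomal-like contig:", autosomal_like, is_y_candidate(*autosomal_like))
```

In this toy setup the first contig shares no k-mers with the female sequence and sits at half the male depth of the second contig, which is the pattern expected for a haploid, male-specific sequence. Real data would need thresholds tuned to the sequencing depth and platform.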