Regional sequence expansion or collapse in heterozygous genome assemblies

Bibliographic details
Main authors: Asalone, Kathryn C.; Ryan, Kara M.; Yamadi, Maryam; Cohen, Annastelle L.; Farmer, William G.; George, Deborah J.; Joppert, Claudia; Kim, Kaitlyn; Mughal, Madeeha Froze; Said, Rana; Toksoz-Exley, Metin; Bisk, Evgeny; Bracht, John R.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7423139/
https://www.ncbi.nlm.nih.gov/pubmed/32735589
http://dx.doi.org/10.1371/journal.pcbi.1008104
collection PubMed
description High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with varied parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by the assembly methodologies tested, the assemblies generated varied widely in fragmentation level, and we show that local assembly collapse or expansion underlies the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameters, and the heterozygosity of the biological sequence data, we highlight the need for better measures of assembly quality than N50, including methods for assessing local expansion and collapse.
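The abstract contrasts total assembly length, fragmentation, and N50. As a minimal sketch (not the authors' analysis), the following Python computes total length and N50 from a list of contig lengths, using the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that together cover at least 50% of the assembly. The function names and contig lengths are hypothetical toy values chosen for illustration.

```python
def total_length(contig_lengths):
    """Sum of all contig lengths (bp)."""
    return sum(contig_lengths)


def n50(contig_lengths):
    """Length of the shortest contig in the smallest set of longest
    contigs that together cover at least half of the total length."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    half = total_length(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Hypothetical toy assemblies (lengths in bp), not data from the article.
assembly_a = [500, 300, 200]
assembly_b = [100] * 10            # same total length, far more fragmented
assembly_c = [500, 300, 100, 100]  # same total and N50 as a, different contigs

for name, contigs in [("a", assembly_a), ("b", assembly_b), ("c", assembly_c)]:
    print(name, total_length(contigs), n50(contigs))
# a 1000 500
# b 1000 100
# c 1000 500
```

Assemblies a and b share a total length but differ in N50, mirroring the fragmentation differences described in the abstract; assemblies a and c are indistinguishable by both metrics even though their contigs differ, which is the kind of local change a single summary statistic such as N50 cannot reveal.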
id pubmed-7423139
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS Comput Biol
published Public Library of Science, 2020-07-31
rights © 2020 Asalone et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article