Regional sequence expansion or collapse in heterozygous genome assemblies
High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet they are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algo...
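Main Authors: | Asalone, Kathryn C.; Ryan, Kara M.; Yamadi, Maryam; Cohen, Annastelle L.; Farmer, William G.; George, Deborah J.; Joppert, Claudia; Kim, Kaitlyn; Mughal, Madeeha Froze; Said, Rana; Toksoz-Exley, Metin; Bisk, Evgeny; Bracht, John R. |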
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7423139/ https://www.ncbi.nlm.nih.gov/pubmed/32735589 http://dx.doi.org/10.1371/journal.pcbi.1008104 |
_version_ | 1783570124383453184 |
---|---|
author | Asalone, Kathryn C.; Ryan, Kara M.; Yamadi, Maryam; Cohen, Annastelle L.; Farmer, William G.; George, Deborah J.; Joppert, Claudia; Kim, Kaitlyn; Mughal, Madeeha Froze; Said, Rana; Toksoz-Exley, Metin; Bisk, Evgeny; Bracht, John R. |
author_facet | Asalone, Kathryn C.; Ryan, Kara M.; Yamadi, Maryam; Cohen, Annastelle L.; Farmer, William G.; George, Deborah J.; Joppert, Claudia; Kim, Kaitlyn; Mughal, Madeeha Froze; Said, Rana; Toksoz-Exley, Metin; Bisk, Evgeny; Bracht, John R. |
author_sort | Asalone, Kathryn C. |
collection | PubMed |
description | High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet they are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by the assembly methodologies tested, the assemblies generated varied widely in fragmentation level, and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse. |
format | Online Article Text |
id | pubmed-7423139 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-74231392020-08-20 Regional sequence expansion or collapse in heterozygous genome assemblies Asalone, Kathryn C. Ryan, Kara M. Yamadi, Maryam Cohen, Annastelle L. Farmer, William G. George, Deborah J. Joppert, Claudia Kim, Kaitlyn Mughal, Madeeha Froze Said, Rana Toksoz-Exley, Metin Bisk, Evgeny Bracht, John R. PLoS Comput Biol Research Article High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet they are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by the assembly methodologies tested, the assemblies generated varied widely in fragmentation level, and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse. 
Public Library of Science 2020-07-31 /pmc/articles/PMC7423139/ /pubmed/32735589 http://dx.doi.org/10.1371/journal.pcbi.1008104 Text en © 2020 Asalone et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Asalone, Kathryn C. Ryan, Kara M. Yamadi, Maryam Cohen, Annastelle L. Farmer, William G. George, Deborah J. Joppert, Claudia Kim, Kaitlyn Mughal, Madeeha Froze Said, Rana Toksoz-Exley, Metin Bisk, Evgeny Bracht, John R. Regional sequence expansion or collapse in heterozygous genome assemblies |
title | Regional sequence expansion or collapse in heterozygous genome assemblies |
title_full | Regional sequence expansion or collapse in heterozygous genome assemblies |
title_fullStr | Regional sequence expansion or collapse in heterozygous genome assemblies |
title_full_unstemmed | Regional sequence expansion or collapse in heterozygous genome assemblies |
title_short | Regional sequence expansion or collapse in heterozygous genome assemblies |
title_sort | regional sequence expansion or collapse in heterozygous genome assemblies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7423139/ https://www.ncbi.nlm.nih.gov/pubmed/32735589 http://dx.doi.org/10.1371/journal.pcbi.1008104 |
work_keys_str_mv | AT asalonekathrync regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT ryankaram regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT yamadimaryam regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT cohenannastellel regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT farmerwilliamg regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT georgedeborahj regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT joppertclaudia regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT kimkaitlyn regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT mughalmadeehafroze regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT saidrana regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT toksozexleymetin regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT biskevgeny regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT brachtjohnr regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies |
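
The description in this record names the N50 value as an assembly quality measure that does not, on its own, reveal local expansion or collapse. As a rough illustration only, the sketch below computes N50 in its usual sense (the length of the contig at which the running sum of lengths, sorted longest first, reaches half the total assembly length). The function name `n50` and the contig lengths are hypothetical and are not taken from the article.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is taken here as the length of the contig at which the cumulative
    sum of lengths (sorted longest first) reaches half the total length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, invented for illustration (not data from the article).
# Both lists sum to 100, but the second is split into more, smaller pieces.
assembly_a = [40, 30, 20, 10]
assembly_b = [40, 30, 10, 10, 5, 5]

print(n50(assembly_a), sum(assembly_a))
print(n50(assembly_b), sum(assembly_b))
```

In this toy case the two lists share the same total length and the same N50 even though their smaller contigs differ, which is one way to see why a single summary value may not expose differences confined to part of an assembly.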