Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast...
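Main authors: | Zhu, Yafeng; Engström, Pär G.; Tellgren-Roth, Christian; Baudo, Charles D.; Kennell, John C.; Sun, Sheng; Billmyre, R. Blake; Schröder, Markus S.; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L.; Heitman, Joseph; Scheynius, Annika; Lehtiö, Janne |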
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389616/ https://www.ncbi.nlm.nih.gov/pubmed/28100699 http://dx.doi.org/10.1093/nar/gkx006 |
_version_ | 1782521305665372160 |
---|---|
author | Zhu, Yafeng Engström, Pär G. Tellgren-Roth, Christian Baudo, Charles D. Kennell, John C. Sun, Sheng Billmyre, R. Blake Schröder, Markus S. Andersson, Anna Holm, Tina Sigurgeirsson, Benjamin Wu, Guangxi Sankaranarayanan, Sundar Ram Siddharthan, Rahul Sanyal, Kaustuv Lundeberg, Joakim Nystedt, Björn Boekhout, Teun Dawson, Thomas L. Heitman, Joseph Scheynius, Annika Lehtiö, Janne |
author_facet | Zhu, Yafeng Engström, Pär G. Tellgren-Roth, Christian Baudo, Charles D. Kennell, John C. Sun, Sheng Billmyre, R. Blake Schröder, Markus S. Andersson, Anna Holm, Tina Sigurgeirsson, Benjamin Wu, Guangxi Sankaranarayanan, Sundar Ram Siddharthan, Rahul Sanyal, Kaustuv Lundeberg, Joakim Nystedt, Björn Boekhout, Teun Dawson, Thomas L. Heitman, Joseph Scheynius, Annika Lehtiö, Janne |
author_sort | Zhu, Yafeng |
collection | PubMed |
description | Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. |
format | Online Article Text |
id | pubmed-5389616 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5389616 2017-04-24 Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis Zhu, Yafeng Engström, Pär G. Tellgren-Roth, Christian Baudo, Charles D. Kennell, John C. Sun, Sheng Billmyre, R. Blake Schröder, Markus S. Andersson, Anna Holm, Tina Sigurgeirsson, Benjamin Wu, Guangxi Sankaranarayanan, Sundar Ram Siddharthan, Rahul Sanyal, Kaustuv Lundeberg, Joakim Nystedt, Björn Boekhout, Teun Dawson, Thomas L. Heitman, Joseph Scheynius, Annika Lehtiö, Janne Nucleic Acids Res Genomics Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. Oxford University Press 2017-03-17 2017-01-18 /pmc/articles/PMC5389616/ /pubmed/28100699 http://dx.doi.org/10.1093/nar/gkx006 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Genomics Zhu, Yafeng Engström, Pär G. Tellgren-Roth, Christian Baudo, Charles D. Kennell, John C. Sun, Sheng Billmyre, R. Blake Schröder, Markus S. Andersson, Anna Holm, Tina Sigurgeirsson, Benjamin Wu, Guangxi Sankaranarayanan, Sundar Ram Siddharthan, Rahul Sanyal, Kaustuv Lundeberg, Joakim Nystedt, Björn Boekhout, Teun Dawson, Thomas L. Heitman, Joseph Scheynius, Annika Lehtiö, Janne Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis |
title | Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis |
title_full | Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis |
title_fullStr | Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis |
title_full_unstemmed | Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis |
title_short | Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis |
title_sort | proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of malassezia sympodialis |
topic | Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389616/ https://www.ncbi.nlm.nih.gov/pubmed/28100699 http://dx.doi.org/10.1093/nar/gkx006 |