Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis

Bibliographic Details
Main authors: Zhu, Yafeng, Engström, Pär G., Tellgren-Roth, Christian, Baudo, Charles D., Kennell, John C., Sun, Sheng, Billmyre, R. Blake, Schröder, Markus S., Andersson, Anna, Holm, Tina, Sigurgeirsson, Benjamin, Wu, Guangxi, Sankaranarayanan, Sundar Ram, Siddharthan, Rahul, Sanyal, Kaustuv, Lundeberg, Joakim, Nystedt, Björn, Boekhout, Teun, Dawson, Thomas L., Heitman, Joseph, Scheynius, Annika, Lehtiö, Janne
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389616/
https://www.ncbi.nlm.nih.gov/pubmed/28100699
http://dx.doi.org/10.1093/nar/gkx006
collection PubMed
description Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear chromosomes and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is of a quality so far achieved for only a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.
format Online
Article
Text
id pubmed-5389616
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5389616 2017-04-24 Nucleic Acids Res Genomics Oxford University Press 2017-03-17 2017-01-18 /pmc/articles/PMC5389616/ /pubmed/28100699 http://dx.doi.org/10.1093/nar/gkx006 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Genomics