Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

Bibliographic Details
Main Authors: Pujar, Shashikant; O’Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2018
Subjects: Database Issue
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753299/
https://www.ncbi.nlm.nih.gov/pubmed/29126148
http://dx.doi.org/10.1093/nar/gkx1031
Collection: PubMed
Record ID: pubmed-5753299
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed

Abstract: The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.

Journal: Nucleic Acids Res, Database Issue
Dates: published online 2017-11-06; issue date 2018-01-04
Rights: Published by Oxford University Press on behalf of Nucleic Acids Research 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.