Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the Internationa...
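Main authors: | O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M.; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S.; Kodali, Vamsi K.; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M.; Murphy, Michael R.; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H.; Rausch, Daniel; Riddick, Lillian D.; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S.; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E.; Vatsan, Anjana R.; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J.; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D.; Pruitt, Kim D. |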
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702849/ https://www.ncbi.nlm.nih.gov/pubmed/26553804 http://dx.doi.org/10.1093/nar/gkv1189 |
_version_ | 1782408664389255168 |
---|---|
author | O'Leary, Nuala A. Wright, Mathew W. Brister, J. Rodney Ciufo, Stacy Haddad, Diana McVeigh, Rich Rajput, Bhanu Robbertse, Barbara Smith-White, Brian Ako-Adjei, Danso Astashyn, Alexander Badretdin, Azat Bao, Yiming Blinkova, Olga Brover, Vyacheslav Chetvernin, Vyacheslav Choi, Jinna Cox, Eric Ermolaeva, Olga Farrell, Catherine M. Goldfarb, Tamara Gupta, Tripti Haft, Daniel Hatcher, Eneida Hlavina, Wratko Joardar, Vinita S. Kodali, Vamsi K. Li, Wenjun Maglott, Donna Masterson, Patrick McGarvey, Kelly M. Murphy, Michael R. O'Neill, Kathleen Pujar, Shashikant Rangwala, Sanjida H. Rausch, Daniel Riddick, Lillian D. Schoch, Conrad Shkeda, Andrei Storz, Susan S. Sun, Hanzhen Thibaud-Nissen, Francoise Tolstoy, Igor Tully, Raymond E. Vatsan, Anjana R. Wallin, Craig Webb, David Wu, Wendy Landrum, Melissa J. Kimchi, Avi Tatusova, Tatiana DiCuccio, Michael Kitts, Paul Murphy, Terence D. Pruitt, Kim D. |
author_sort | O'Leary, Nuala A. |
collection | PubMed |
description | The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. |
format | Online Article Text |
id | pubmed-4702849 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-47028492016-01-07 Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation O'Leary, Nuala A. Wright, Mathew W. Brister, J. Rodney Ciufo, Stacy Haddad, Diana McVeigh, Rich Rajput, Bhanu Robbertse, Barbara Smith-White, Brian Ako-Adjei, Danso Astashyn, Alexander Badretdin, Azat Bao, Yiming Blinkova, Olga Brover, Vyacheslav Chetvernin, Vyacheslav Choi, Jinna Cox, Eric Ermolaeva, Olga Farrell, Catherine M. Goldfarb, Tamara Gupta, Tripti Haft, Daniel Hatcher, Eneida Hlavina, Wratko Joardar, Vinita S. Kodali, Vamsi K. Li, Wenjun Maglott, Donna Masterson, Patrick McGarvey, Kelly M. Murphy, Michael R. O'Neill, Kathleen Pujar, Shashikant Rangwala, Sanjida H. Rausch, Daniel Riddick, Lillian D. Schoch, Conrad Shkeda, Andrei Storz, Susan S. Sun, Hanzhen Thibaud-Nissen, Francoise Tolstoy, Igor Tully, Raymond E. Vatsan, Anjana R. Wallin, Craig Webb, David Wu, Wendy Landrum, Melissa J. Kimchi, Avi Tatusova, Tatiana DiCuccio, Michael Kitts, Paul Murphy, Terence D. Pruitt, Kim D. Nucleic Acids Res Database Issue The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. 
This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. Oxford University Press 2016-01-04 2015-11-08 /pmc/articles/PMC4702849/ /pubmed/26553804 http://dx.doi.org/10.1093/nar/gkv1189 Text en Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US. |
spellingShingle | Database Issue O'Leary, Nuala A. Wright, Mathew W. Brister, J. Rodney Ciufo, Stacy Haddad, Diana McVeigh, Rich Rajput, Bhanu Robbertse, Barbara Smith-White, Brian Ako-Adjei, Danso Astashyn, Alexander Badretdin, Azat Bao, Yiming Blinkova, Olga Brover, Vyacheslav Chetvernin, Vyacheslav Choi, Jinna Cox, Eric Ermolaeva, Olga Farrell, Catherine M. Goldfarb, Tamara Gupta, Tripti Haft, Daniel Hatcher, Eneida Hlavina, Wratko Joardar, Vinita S. Kodali, Vamsi K. Li, Wenjun Maglott, Donna Masterson, Patrick McGarvey, Kelly M. Murphy, Michael R. O'Neill, Kathleen Pujar, Shashikant Rangwala, Sanjida H. Rausch, Daniel Riddick, Lillian D. Schoch, Conrad Shkeda, Andrei Storz, Susan S. Sun, Hanzhen Thibaud-Nissen, Francoise Tolstoy, Igor Tully, Raymond E. Vatsan, Anjana R. Wallin, Craig Webb, David Wu, Wendy Landrum, Melissa J. Kimchi, Avi Tatusova, Tatiana DiCuccio, Michael Kitts, Paul Murphy, Terence D. Pruitt, Kim D. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation |
title | Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation |
title_sort | reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation |
topic | Database Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702849/ https://www.ncbi.nlm.nih.gov/pubmed/26553804 http://dx.doi.org/10.1093/nar/gkv1189 |