Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the Internationa...

Bibliographic Details
Main authors: O'Leary, Nuala A., Wright, Mathew W., Brister, J. Rodney, Ciufo, Stacy, Haddad, Diana, McVeigh, Rich, Rajput, Bhanu, Robbertse, Barbara, Smith-White, Brian, Ako-Adjei, Danso, Astashyn, Alexander, Badretdin, Azat, Bao, Yiming, Blinkova, Olga, Brover, Vyacheslav, Chetvernin, Vyacheslav, Choi, Jinna, Cox, Eric, Ermolaeva, Olga, Farrell, Catherine M., Goldfarb, Tamara, Gupta, Tripti, Haft, Daniel, Hatcher, Eneida, Hlavina, Wratko, Joardar, Vinita S., Kodali, Vamsi K., Li, Wenjun, Maglott, Donna, Masterson, Patrick, McGarvey, Kelly M., Murphy, Michael R., O'Neill, Kathleen, Pujar, Shashikant, Rangwala, Sanjida H., Rausch, Daniel, Riddick, Lillian D., Schoch, Conrad, Shkeda, Andrei, Storz, Susan S., Sun, Hanzhen, Thibaud-Nissen, Francoise, Tolstoy, Igor, Tully, Raymond E., Vatsan, Anjana R., Wallin, Craig, Webb, David, Wu, Wendy, Landrum, Melissa J., Kimchi, Avi, Tatusova, Tatiana, DiCuccio, Michael, Kitts, Paul, Murphy, Terence D., Pruitt, Kim D.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702849/
https://www.ncbi.nlm.nih.gov/pubmed/26553804
http://dx.doi.org/10.1093/nar/gkv1189
_version_ 1782408664389255168
author O'Leary, Nuala A.
Wright, Mathew W.
Brister, J. Rodney
Ciufo, Stacy
Haddad, Diana
McVeigh, Rich
Rajput, Bhanu
Robbertse, Barbara
Smith-White, Brian
Ako-Adjei, Danso
Astashyn, Alexander
Badretdin, Azat
Bao, Yiming
Blinkova, Olga
Brover, Vyacheslav
Chetvernin, Vyacheslav
Choi, Jinna
Cox, Eric
Ermolaeva, Olga
Farrell, Catherine M.
Goldfarb, Tamara
Gupta, Tripti
Haft, Daniel
Hatcher, Eneida
Hlavina, Wratko
Joardar, Vinita S.
Kodali, Vamsi K.
Li, Wenjun
Maglott, Donna
Masterson, Patrick
McGarvey, Kelly M.
Murphy, Michael R.
O'Neill, Kathleen
Pujar, Shashikant
Rangwala, Sanjida H.
Rausch, Daniel
Riddick, Lillian D.
Schoch, Conrad
Shkeda, Andrei
Storz, Susan S.
Sun, Hanzhen
Thibaud-Nissen, Francoise
Tolstoy, Igor
Tully, Raymond E.
Vatsan, Anjana R.
Wallin, Craig
Webb, David
Wu, Wendy
Landrum, Melissa J.
Kimchi, Avi
Tatusova, Tatiana
DiCuccio, Michael
Kitts, Paul
Murphy, Terence D.
Pruitt, Kim D.
author_sort O'Leary, Nuala A.
collection PubMed
description The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
format Online
Article
Text
id pubmed-4702849
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4702849 2016-01-07 Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation Nucleic Acids Res Database Issue Oxford University Press 2016-01-04 2015-11-08 /pmc/articles/PMC4702849/ /pubmed/26553804 http://dx.doi.org/10.1093/nar/gkv1189 Text en Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
title Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
topic Database Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702849/
https://www.ncbi.nlm.nih.gov/pubmed/26553804
http://dx.doi.org/10.1093/nar/gkv1189