
Improving the consistency of domain annotation within the Conserved Domain Database

Bibliographic Details
Main authors: Derbyshire, Myra K., Gonzales, Noreen R., Lu, Shennan, He, Jane, Marchler, Gabriele H., Wang, Zhouxi, Marchler-Bauer, Aron
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Original Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4356950/
https://www.ncbi.nlm.nih.gov/pubmed/25767294
http://dx.doi.org/10.1093/database/bav012
collection PubMed
description When annotating protein sequences with the footprints of evolutionarily conserved domains, conservative score or E-value thresholds must be applied to RPS-BLAST hits to avoid many false positives. We notice that manual inspection and classification of hits gathered at a less stringent threshold can add a significant amount of valuable domain annotation. We report an automated algorithm that ‘rescues’ valuable borderline-scoring domain hits that are well supported by domain architecture (DA, the sequential order of conserved domains in a protein query), including tandem repeats of domain hits reported at a more conservative threshold. This algorithm is now available as a selectable option on the public conserved domain search (CD-Search) pages. We also report on the possibility of ‘suppressing’ domain hits close to the threshold that lack a well-supported DA, and of implementing this conservatively as an option in live conserved domain searches and for pre-computed results. Improving domain annotation consistency will in turn reduce the fraction of NR sequences with incomplete DAs. URL: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
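The ‘rescue’ idea summarized in the description can be illustrated with a toy sketch. This is not the CD-Search/CDD implementation: the `Hit` record, the two E-value thresholds (`STRICT`, `RELAXED`), the set of "well-supported" architectures passed in as `known_das`, and the two-pass order (tandem repeats first, then architecture matches) are all hypothetical choices made for illustration, and overlapping hits are not resolved.

```python
from dataclasses import dataclass

# Hypothetical E-value thresholds (not CDD's actual values):
# hits with evalue <= STRICT are reported as-is;
# hits with STRICT < evalue <= RELAXED are "borderline" candidates for rescue.
STRICT = 1e-2
RELAXED = 1.0


@dataclass(frozen=True)
class Hit:
    domain: str    # conserved-domain label (illustrative names only)
    start: int     # start position of the hit on the protein query
    evalue: float  # RPS-BLAST-style E-value for the hit


def architecture(hits):
    """Domain architecture: domain labels in order along the query."""
    return tuple(h.domain for h in sorted(hits, key=lambda h: h.start))


def rescue(hits, known_das, strict=STRICT, relaxed=RELAXED):
    """Return confident hits plus borderline hits supported by DA context."""
    confident = [h for h in hits if h.evalue <= strict]
    borderline = [h for h in hits if strict < h.evalue <= relaxed]
    confident_domains = {h.domain for h in confident}
    kept = list(confident)

    # Pass 1: rescue tandem repeats of domains already reported confidently.
    remaining = []
    for h in borderline:
        if h.domain in confident_domains:
            kept.append(h)
        else:
            remaining.append(h)

    # Pass 2: rescue a borderline hit (best E-value first) only if adding it
    # turns the current architecture into a known, well-supported DA.
    for h in sorted(remaining, key=lambda h: h.evalue):
        if architecture(kept + [h]) in known_das:
            kept.append(h)

    # Overlapping hits are not resolved here; a real pipeline would need to.
    return sorted(kept, key=lambda h: h.start)


if __name__ == "__main__":
    hits = [
        Hit("A", 10, 1e-5),   # confident
        Hit("B", 120, 0.05),  # borderline
        Hit("A", 200, 0.5),   # borderline, tandem repeat of A
        Hit("C", 300, 0.8),   # borderline, no DA support
        Hit("D", 400, 5.0),   # above the relaxed threshold
    ]
    known_das = {("A", "B", "A")}
    print(architecture(rescue(hits, known_das)))  # ('A', 'B', 'A')
```

In this example the second copy of A is rescued as a tandem repeat, B is rescued because it completes the known A-B-A architecture, C stays unreported for lack of DA support, and D is never considered because it falls above the relaxed threshold.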
id pubmed-4356950
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Database (Oxford), Original Article, 2015-03-11
Rights: Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
topic Original Article