Improving the consistency of domain annotation within the Conserved Domain Database
Main authors: | Derbyshire, Myra K.; Gonzales, Noreen R.; Lu, Shennan; He, Jane; Marchler, Gabriele H.; Wang, Zhouxi; Marchler-Bauer, Aron |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4356950/ https://www.ncbi.nlm.nih.gov/pubmed/25767294 http://dx.doi.org/10.1093/database/bav012 |
author | Derbyshire, Myra K.; Gonzales, Noreen R.; Lu, Shennan; He, Jane; Marchler, Gabriele H.; Wang, Zhouxi; Marchler-Bauer, Aron |
---|---|
collection | PubMed |
description | When annotating protein sequences with the footprints of evolutionarily conserved domains, conservative score or E-value thresholds need to be applied for RPS-BLAST hits, to avoid many false positives. We notice that manual inspection and classification of hits gathered at a higher threshold can add a significant amount of valuable domain annotation. We report an automated algorithm that ‘rescues’ valuable borderline-scoring domain hits that are well-supported by domain architecture (DA, the sequential order of conserved domains in a protein query), including tandem repeats of domain hits reported at a more conservative threshold. This algorithm is now available as a selectable option on the public conserved domain search (CD-Search) pages. We also report on the possibility to ‘suppress’ domain hits close to the threshold based on a lack of well-supported DA and to implement this conservatively as an option in live conserved domain searches and for pre-computed results. Improving domain annotation consistency will in turn reduce the fraction of NR sequences with incomplete DAs. URL: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi |
format | Online Article Text |
id | pubmed-4356950 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Database (Oxford) |
online publication date | 2015-03-11 |
rights | Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US. |
title | Improving the consistency of domain annotation within the Conserved Domain Database |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4356950/ https://www.ncbi.nlm.nih.gov/pubmed/25767294 http://dx.doi.org/10.1093/database/bav012 |
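The abstract describes the rescue and suppression steps only at a high level. The Python sketch below shows one way such domain-architecture (DA) checks could be written; it is not the CD-Search implementation. Every concrete choice here is a hypothetical assumption: the E-value cut-offs (`STRICT_EVALUE`, `RELAXED_EVALUE`, `NEAR_FACTOR`), the residue overlap tolerance, the `Hit` record, and treating a DA as "well-supported" when it appears in a set of previously observed architectures (`known_das`).

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; not the values used by CD-Search.
STRICT_EVALUE = 0.01   # conservative threshold: hits at or below are reported
RELAXED_EVALUE = 1.0   # borderline hits up to this value are rescue candidates
NEAR_FACTOR = 100.0    # hits in (STRICT/NEAR_FACTOR, STRICT] count as "close to threshold"


@dataclass(frozen=True)
class Hit:
    domain: str   # conserved-domain accession (hypothetical labels in the tests)
    start: int    # first query residue covered (1-based, inclusive)
    end: int      # last query residue covered (inclusive)
    evalue: float


def architecture(hits):
    """Domain architecture: domain accessions in N- to C-terminal order."""
    return tuple(h.domain for h in sorted(hits, key=lambda h: h.start))


def overlaps(a, b, max_shared=10):
    """True if two hits share more than max_shared query residues."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    return shared > max_shared


def rescue(hits, known_das):
    """Add borderline hits that are tandem repeats or complete a known DA."""
    accepted = [h for h in hits if h.evalue <= STRICT_EVALUE]
    # Consider the strongest borderline candidates first.
    borderline = sorted(
        (h for h in hits if STRICT_EVALUE < h.evalue <= RELAXED_EVALUE),
        key=lambda h: h.evalue,
    )
    for h in borderline:
        if any(overlaps(h, a) for a in accepted):
            continue  # never stack a rescued hit on top of an accepted one
        is_tandem_repeat = any(a.domain == h.domain for a in accepted)
        completes_known_da = architecture(accepted + [h]) in known_das
        if is_tandem_repeat or completes_known_da:
            accepted.append(h)
    return sorted(accepted, key=lambda h: h.start)


def suppress(hits, known_das):
    """Conservatively drop near-threshold hits that break an otherwise known DA."""
    kept = list(hits)
    # Examine the weakest hits first.
    for h in sorted(hits, key=lambda h: h.evalue, reverse=True):
        if not (STRICT_EVALUE / NEAR_FACTOR < h.evalue <= STRICT_EVALUE):
            continue  # only hits close to the threshold are eligible
        others = [k for k in kept if k is not h]
        if any(k.domain == h.domain for k in others):
            continue  # repeats of an already present domain count as supported
        if architecture(kept) not in known_das and architecture(others) in known_das:
            kept = others
    return sorted(kept, key=lambda h: h.start)
```

For example, with `known_das = {("A", "B"), ("A", "B", "C")}`, a confident hit to `A` plus borderline hits to `B` and `C` yields the rescued architecture `("A", "B", "C")`, while an unsupported borderline hit to an unrelated domain is left out. The suppression step only removes a hit when doing so turns an unseen architecture into a known one, which mirrors the abstract's point that suppression should be applied conservatively.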