
MisPred: a resource for identification of erroneous protein sequences in public databases

Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that...


Bibliographic Details
Main authors: Nagy, Alinda; Patthy, László
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3713709/
https://www.ncbi.nlm.nih.gov/pubmed/23864220
http://dx.doi.org/10.1093/database/bat053
_version_ 1782277231661285376
author Nagy, Alinda
Patthy, László
author_facet Nagy, Alinda
Patthy, László
author_sort Nagy, Alinda
collection PubMed
description Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. Database URL: http://www.mispred.com
format Online
Article
Text
id pubmed-3713709
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3713709 2013-07-17 MisPred: a resource for identification of erroneous protein sequences in public databases Nagy, Alinda Patthy, László Database (Oxford) Database Tool Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. Database URL: http://www.mispred.com Oxford University Press 2013-07-17 /pmc/articles/PMC3713709/ /pubmed/23864220 http://dx.doi.org/10.1093/database/bat053 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Database Tool
Nagy, Alinda
Patthy, László
MisPred: a resource for identification of erroneous protein sequences in public databases
title MisPred: a resource for identification of erroneous protein sequences in public databases
topic Database Tool
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3713709/
https://www.ncbi.nlm.nih.gov/pubmed/23864220
http://dx.doi.org/10.1093/database/bat053
work_keys_str_mv AT nagyalinda mispredaresourceforidentificationoferroneousproteinsequencesinpublicdatabases
AT patthylaszlo mispredaresourceforidentificationoferroneousproteinsequencesinpublicdatabases