
Non-redundant patent sequence databases with value-added annotations at two levels


Bibliographic details
Main authors: Li, Weizhong, McWilliam, Hamish, de la Torre, Ana Richart, Grodowski, Adam, Benediktovich, Irina, Goujon, Mickael, Nauche, Stephane, Lopez, Rodrigo
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808894/
https://www.ncbi.nlm.nih.gov/pubmed/19884134
http://dx.doi.org/10.1093/nar/gkp960
Collection: PubMed
Description: The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available from: http://www.ebi.ac.uk/patentdata/nr/.
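The two-level clustering described in the abstract can be illustrated with a minimal Python sketch. It is not the EMBL-EBI pipeline: the record type `PatentSequence`, its fields, the sequence normalisation (removing whitespace and upper-casing) and the patent-family identifiers are all hypothetical assumptions made for illustration. Level-1 clusters group records whose whole-length sequences share an MD5 checksum; each level-1 cluster is then sub-grouped into level-2 clusters by patent family.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentSequence:
    """Hypothetical record: one sequence as it appears in one patent document."""
    accession: str
    sequence: str
    patent_family: str  # hypothetical patent-family identifier


def md5_checksum(sequence: str) -> str:
    """MD5 hex digest of the sequence after an assumed normalisation
    (all whitespace removed, letters upper-cased)."""
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def build_clusters(records):
    """Return a mapping {md5: {family: [accessions]}}.

    Level 1: records with identical whole-length sequences share an MD5 key.
    Level 2: within each level-1 cluster, records are sub-grouped by family.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for rec in records:
        clusters[md5_checksum(rec.sequence)][rec.patent_family].append(rec.accession)
    return clusters


if __name__ == "__main__":
    records = [
        PatentSequence("A1", "ACGT acgt", "FAM1"),
        PatentSequence("A2", "ACGTACGT", "FAM1"),
        PatentSequence("A3", "ACGTACGT", "FAM2"),
        PatentSequence("A4", "MKV", "FAM3"),
    ]
    clusters = build_clusters(records)
    print(len(clusters))  # 2 level-1 clusters
    print(sum(len(fams) for fams in clusters.values()))  # 3 level-2 clusters
    print(dict(clusters[md5_checksum("ACGTACGT")]))  # {'FAM1': ['A1', 'A2'], 'FAM2': ['A3']}
```

Keying on the MD5 digest keeps each lookup to a short fixed-length string; because equal digests imply identical sequences only in the absence of hash collisions, a production pipeline might also compare the sequences themselves before merging.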
Record ID: pubmed-2808894
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res (section: Articles), 2010-01; published online 2009-11-01
Record date: 2010-01-20
© The Author(s) 2009. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Articles