Non-redundant patent sequence databases with value-added annotations at two levels
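Main authors: | Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo |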
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808894/ https://www.ncbi.nlm.nih.gov/pubmed/19884134 http://dx.doi.org/10.1093/nar/gkp960 |
_version_ | 1782176547567828992 |
---|---|
author | Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo |
collection | PubMed |
description | The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotide patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. |
format | Text |
id | pubmed-2808894 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2808894 2010-01-20 Non-redundant patent sequence databases with value-added annotations at two levels Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo Nucleic Acids Res Articles The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotide patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. Oxford University Press 2010-01 2009-11-01 /pmc/articles/PMC2808894/ /pubmed/19884134 http://dx.doi.org/10.1093/nar/gkp960 Text en © The Author(s) 2009. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Non-redundant patent sequence databases with value-added annotations at two levels |
topic | Articles |
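The two-level clustering summarised in the description can be illustrated with a short Python sketch. It is not the EMBL-EBI pipeline: the `PatentSequence` record type, its field names, the whitespace-stripping and uppercasing applied before hashing, and the `build_clusters` and `earliest_publication` helpers are all hypothetical. It also assumes that equal MD5 checksums are equivalent to 100% identity over the whole length (collisions are ignored), then sub-groups each level-1 cluster by patent family and collates one value-added annotation, the earliest publication date.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatentSequence:
    """Hypothetical patent sequence record; field names are illustrative only."""
    accession: str
    sequence: str
    patent_family: str  # assumed patent family identifier
    publication_date: date


def checksum(sequence: str) -> str:
    """MD5 hex digest of a sequence after removing whitespace and uppercasing.

    The normalisation rule is an assumption made for this sketch.
    """
    residues = "".join(sequence.split()).upper()
    return hashlib.md5(residues.encode("ascii")).hexdigest()


def build_clusters(records):
    """Return (level1, level2) cluster maps keyed by MD5 checksum.

    level1: checksum -> records whose normalised sequences hash identically,
            treated here as 100% identical over their whole length
            (MD5 collisions are ignored).
    level2: checksum -> {patent family -> records}, i.e. each level-1
            cluster sub-grouped by patent family.
    """
    level1 = defaultdict(list)
    for rec in records:
        level1[checksum(rec.sequence)].append(rec)

    level2 = {}
    for key, members in level1.items():
        families = defaultdict(list)
        for rec in members:
            families[rec.patent_family].append(rec)
        level2[key] = dict(families)
    return dict(level1), level2


def earliest_publication(members):
    """Earliest publication date within a cluster (one collated annotation)."""
    return min(rec.publication_date for rec in members)


if __name__ == "__main__":
    # Made-up demo records, not data from the patent databases.
    demo = [
        PatentSequence("A1", "ACGTACGT", "FAM1", date(2005, 3, 1)),
        PatentSequence("A2", "acgt acgt", "FAM1", date(2004, 6, 15)),
        PatentSequence("A3", "ACGTACGT", "FAM2", date(2007, 1, 9)),
        PatentSequence("B1", "TTGACCA", "FAM3", date(2006, 2, 2)),
    ]
    level1, level2 = build_clusters(demo)
    for key, members in level1.items():
        print(key, [r.accession for r in members],
              sorted(level2[key]), earliest_publication(members))
```

Keeping level 2 as a nested dictionary under each level-1 checksum reflects the description's framing of level-2 clusters as subdivisions of level-1 clusters rather than an independent grouping.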