A curated dataset of peste des petits ruminants virus sequences for molecular epidemiological analyses
Peste des petits ruminants (PPR) is a highly contagious and devastating viral disease infecting predominantly sheep and goats. Tracking outbreaks of disease and analysing the movement of the virus often involves sequencing part or all of the genome and comparing the sequence obtained with sequences...
Main authors: | Baron, Michael D.; Bataille, Arnaud |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830648/ https://www.ncbi.nlm.nih.gov/pubmed/35143560 http://dx.doi.org/10.1371/journal.pone.0263616 |
_version_ | 1784648317783769088 |
---|---|
author | Baron, Michael D.; Bataille, Arnaud |
author_facet | Baron, Michael D. Bataille, Arnaud |
author_sort | Baron, Michael D. |
collection | PubMed |
description | Peste des petits ruminants (PPR) is a highly contagious and devastating viral disease infecting predominantly sheep and goats. Tracking outbreaks of disease and analysing the movement of the virus often involves sequencing part or all of the genome and comparing the sequence obtained with sequences from other outbreaks, obtained from the public databases. However, there are a very large number (>1800) of PPRV sequences in the databases, a large majority of them relatively short, and not always well-documented. There is also a strong bias in the composition of the dataset, with countries with good sequencing capabilities (e.g. China, India, Turkey) being overrepresented, and most sequences coming from isolates in the last 20 years. In order to facilitate future analyses, we have prepared sets of PPRV sequences, sets which have been filtered for sequencing errors and unnecessary duplicates, and for which date and location information has been obtained, either from the database entry or from other published sources. These sequence datasets are freely available for download, and include smaller datasets which maximise phylogenetic information from the minimum number of sequences, and which will be useful for simple lineage identification. Their utility is illustrated by uploading the data to the MicroReact platform to allow simultaneous viewing of lineage, date and geographic information on all the viruses for which we have information. While preparing these datasets, we identified a significant number of public database entries which contain clear errors, and propose guidelines on checking new sequences and completing metadata before submission. |
format | Online Article Text |
id | pubmed-8830648 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8830648 2022-02-11 A curated dataset of peste des petits ruminants virus sequences for molecular epidemiological analyses Baron, Michael D. Bataille, Arnaud PLoS One Research Article Public Library of Science 2022-02-10 /pmc/articles/PMC8830648/ /pubmed/35143560 http://dx.doi.org/10.1371/journal.pone.0263616 Text en © 2022 Baron, Bataille https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Baron, Michael D. Bataille, Arnaud A curated dataset of peste des petits ruminants virus sequences for molecular epidemiological analyses |
title | A curated dataset of peste des petits ruminants virus sequences for molecular epidemiological analyses |
title_sort | curated dataset of peste des petits ruminants virus sequences for molecular epidemiological analyses |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830648/ https://www.ncbi.nlm.nih.gov/pubmed/35143560 http://dx.doi.org/10.1371/journal.pone.0263616 |