A curated dataset of peste des petits ruminants virus sequences for molecular epidemiological analyses

Bibliographic details
Main authors: Baron, Michael D., Bataille, Arnaud
Format: Online article, text
Language: English
Published: Public Library of Science, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830648/
https://www.ncbi.nlm.nih.gov/pubmed/35143560
http://dx.doi.org/10.1371/journal.pone.0263616
author Baron, Michael D.
Bataille, Arnaud
collection PubMed
description Peste des petits ruminants (PPR) is a highly contagious and devastating viral disease infecting predominantly sheep and goats. Tracking outbreaks of disease and analysing the movement of the virus often involves sequencing part or all of the genome and comparing the sequence obtained with sequences from other outbreaks, obtained from the public databases. However, there are a very large number (>1800) of PPR virus (PPRV) sequences in the databases, a large majority of them relatively short, and not always well-documented. There is also a strong bias in the composition of the dataset, with countries with good sequencing capabilities (e.g. China, India, Turkey) being overrepresented, and most sequences coming from isolates in the last 20 years. In order to facilitate future analyses, we have prepared sets of PPRV sequences, sets which have been filtered for sequencing errors and unnecessary duplicates, and for which date and location information has been obtained, either from the database entry or from other published sources. These sequence datasets are freely available for download, and include smaller datasets which maximise phylogenetic information from the minimum number of sequences, and which will be useful for simple lineage identification. Their utility is illustrated by uploading the data to the MicroReact platform to allow simultaneous viewing of lineage, date and geographic information on all the viruses for which we have information. While preparing these datasets, we identified a significant number of public database entries which contain clear errors, and propose guidelines on checking new sequences and completing metadata before submission.
format Online
Article
Text
id pubmed-8830648
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
Published in PLoS One (Research Article) by Public Library of Science, 2022-02-10.
© 2022 Baron, Bataille. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A curated dataset of peste des petits ruminants virus sequences for molecular epidemiological analyses
topic Research Article