
RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data

BACKGROUND: Short-sequence repeats (SSRs) occur in both prokaryotic and eukaryotic DNA, inter- and intragenically, and may be exact or inexact copies. When heterogeneous SSRs are present in a given locus, we can take advantage of the pattern of different repeats to genotype strains based on the SSRs...


Bibliographic Details
Main Authors:	Catanese, Helen N., Brayton, Kelly A., Gebremedhin, Assefaw H.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4891823/ https://www.ncbi.nlm.nih.gov/pubmed/27260942 http://dx.doi.org/10.1186/s12864-016-2686-2

_version_	1782435332132700160
author	Catanese, Helen N. Brayton, Kelly A. Gebremedhin, Assefaw H.
author_facet	Catanese, Helen N. Brayton, Kelly A. Gebremedhin, Assefaw H.
author_sort	Catanese, Helen N.
collection	PubMed
description	BACKGROUND: Short-sequence repeats (SSRs) occur in both prokaryotic and eukaryotic DNA, inter- and intragenically, and may be exact or inexact copies. When heterogeneous SSRs are present in a given locus, we can take advantage of the pattern of different repeats to genotype strains based on the SSRs. Cataloguing and tracking these repeats can be difficult as diverse groups of researchers are involved in the identification of the repeats. Additionally, the task is error-prone when done manually. RESULTS: We developed RepeatAnalyzer, a new software tool capable of tracking, managing, analysing and cataloguing SSRs and genotypes using Anaplasma marginale as a model species. RepeatAnalyzer’s analysis capability includes novel metrics for measuring regional genetic diversity (corresponding to variety and regularity of SSR occurrence). As a part of its visualization capabilities, RepeatAnalyzer produces high quality maps of the geographic distribution of genotypes or SSRs over a region of interest. RepeatAnalyzer’s repeat identification functionality was validated for all SSRs and genotypes reported in 21 publications, using 380 A. marginale isolates gathered from the five publications within that list that provided access to their isolates. The tool produced accurate genotyping results in every case. In addition, it uncovered a number of errors in the published literature: 11 cases where SSRs were misreported, 5 cases where two different SSRs had been given the same name, and 16 cases where two or more names had been given to a single SSR. The analysis and visualization functionalities of the tool are demonstrated using several examples. CONCLUSIONS: RepeatAnalyzer is a robust software tool that can be used for storing, managing, and analysing short-sequence repeats for the purpose of strain identification. The tool can be used for any set of SSRs regardless of species. When applied to A. marginale, our test case, we show that genotype lengths for a given region follow a normal distribution, while SSR frequencies follow a power-law-like distribution. Further, we find that over 90 % of repeats are 28 to 29 amino acids long, which is in agreement with conventional wisdom. Lastly, our analysis reveals that the most common edit distance is five or six, which is counter-intuitive since we expected that result to be closer to one, resulting from the simplest change from one repeat to another. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2686-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4891823
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4891823 2016-06-04 RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data Catanese, Helen N. Brayton, Kelly A. Gebremedhin, Assefaw H. BMC Genomics Software BioMed Central 2016-06-03 /pmc/articles/PMC4891823/ /pubmed/27260942 http://dx.doi.org/10.1186/s12864-016-2686-2 Text en © Catanese et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Software Catanese, Helen N. Brayton, Kelly A. Gebremedhin, Assefaw H. RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
title	RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
title_full	RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
title_fullStr	RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
title_full_unstemmed	RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
title_short	RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
title_sort	repeatanalyzer: a tool for analysing and managing short-sequence repeat data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4891823/ https://www.ncbi.nlm.nih.gov/pubmed/27260942 http://dx.doi.org/10.1186/s12864-016-2686-2
work_keys_str_mv	AT catanesehelenn repeatanalyzeratoolforanalysingandmanagingshortsequencerepeatdata AT braytonkellya repeatanalyzeratoolforanalysingandmanagingshortsequencerepeatdata AT gebremedhinassefawh repeatanalyzeratoolforanalysingandmanagingshortsequencerepeatdata
