SA-SSR: a suffix array-based algorithm for exhaustive and efficient SSR discovery in large genetic sequences

Summary: Simple Sequence Repeats (SSRs) are used to address a variety of research questions in a variety of fields (e.g. population genetics, phylogenetics, forensics, etc.), due to their high mutability within and between species. Here, we present an innovative algorithm, SA-SSR, based on suffix an...

Bibliographic Details
Main Authors: Pickett, B. D., Karlinsey, S. M., Penrod, C. E., Cormier, M. J., Ebbert, M. T. W., Shiozawa, D. K., Whipple, C. J., Ridge, P. G.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5013907/
https://www.ncbi.nlm.nih.gov/pubmed/27170037
http://dx.doi.org/10.1093/bioinformatics/btw298
author Pickett, B. D.
Karlinsey, S. M.
Penrod, C. E.
Cormier, M. J.
Ebbert, M. T. W.
Shiozawa, D. K.
Whipple, C. J.
Ridge, P. G.
collection PubMed
description Summary: Simple Sequence Repeats (SSRs) are used to address a variety of research questions in a variety of fields (e.g. population genetics, phylogenetics, forensics, etc.), due to their high mutability within and between species. Here, we present an innovative algorithm, SA-SSR, based on suffix and longest common prefix arrays for efficiently detecting SSRs in large sets of sequences. Existing SSR detection applications are hampered by one or more limitations (i.e. speed, accuracy, ease-of-use, etc.). Our algorithm addresses these challenges while being the most comprehensive and correct SSR detection software available. SA-SSR is 100% accurate and detected >1000 more SSRs than the second best algorithm, while offering greater control to the user than any existing software. Availability and implementation: SA-SSR is freely available at http://github.com/ridgelab/SA-SSR Contact: perry.ridge@byu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5013907
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5013907 2016-09-12 Bioinformatics Applications Notes Oxford University Press 2016-09-01 2016-05-11 /pmc/articles/PMC5013907/ /pubmed/27170037 http://dx.doi.org/10.1093/bioinformatics/btw298 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title SA-SSR: a suffix array-based algorithm for exhaustive and efficient SSR discovery in large genetic sequences
topic Applications Notes
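
The description in this record says SA-SSR is based on suffix and longest common prefix (LCP) arrays. The Python sketch below is not the authors' implementation; it only illustrates, under stated assumptions, how those two arrays can expose tandem repeats. The function names (`suffix_array`, `lcp_array`, `find_ssrs`) and the parameters `min_copies` and `max_period` are hypothetical, the suffix array is built by naive sorting rather than a linear-time method, and only suffixes that are adjacent in the suffix array are compared, so this simplified version may miss repeats that a full tool would report.

```python
def suffix_array(s):
    """Naive construction: sort suffix start positions by suffix text."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[k] = common-prefix length of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def is_primitive(unit):
    """True if unit is not itself a repetition of a shorter string."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(s, min_copies=2, max_period=6):
    """Return sorted (start, unit, copies) tuples for tandem repeats in s.

    Assumption of this sketch: two suffixes adjacent in the suffix array
    whose start positions differ by p and whose LCP is at least p mark a
    stretch of s with period p.
    """
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    hits = set()
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        period = abs(a - b)
        if period > max_period or lcp[k] < period:
            continue
        start = min(a, b)
        # Extend left so every hit inside one run maps to the same start.
        while start > 0 and s[start - 1] == s[start - 1 + period]:
            start -= 1
        # Extend right to the end of the periodic run.
        end = start + period
        while end < len(s) and s[end] == s[end - period]:
            end += 1
        unit = s[start:start + period]
        copies = (end - start) // period
        if copies >= min_copies and is_primitive(unit):
            hits.add((start, unit, copies))
    return sorted(hits)


if __name__ == "__main__":
    for start, unit, copies in find_ssrs("GGACACACTTTTG", min_copies=3):
        print(start, unit, copies)
```

Run on the toy sequence `GGACACACTTTTG` with `min_copies=3`, the sketch reports an `AC` repeat with three copies starting at position 2 and a `T` repeat with four copies starting at position 8 (0-based positions).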