
STRsearch: a new pipeline for targeted profiling of short tandem repeats in massively parallel sequencing data

BACKGROUND: Short tandem repeats (STRs) are important polymorphic markers for human identification and kinship analyses in forensic science. With the continuous development of massively parallel sequencing (MPS), more laboratories have adopted this technology for forensic applications. Existing STR...


Bibliographic Details
Main authors: Wang, Dong; Tao, Ruiyang; Li, Zhiqiang; Pan, Dun; Wang, Zhuo; Li, Chengtao; Shi, Yongyong
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7075041/
https://www.ncbi.nlm.nih.gov/pubmed/32172688
http://dx.doi.org/10.1186/s41065-020-00120-6
author Wang, Dong
Tao, Ruiyang
Li, Zhiqiang
Pan, Dun
Wang, Zhuo
Li, Chengtao
Shi, Yongyong
collection PubMed
description BACKGROUND: Short tandem repeats (STRs) are important polymorphic markers for human identification and kinship analyses in forensic science. With the continuous development of massively parallel sequencing (MPS), more laboratories have adopted this technology for forensic applications. Existing STR genotyping tools, mostly developed for whole-genome sequencing data, are not effective for MPS data. More importantly, their backward compatibility with conventional capillary electrophoresis (CE) technology has been neither evaluated nor guaranteed. RESULTS: In this study, we developed a new end-to-end pipeline called STRsearch for STR-MPS data analysis. STRsearch not only determines the allele by counting the repeat patterns and INDELs that actually lie within the STR region, but also translates MPS results into standard STR nomenclature (numbers and letters). We evaluated the performance of STRsearch on two forensic sequencing datasets; the concordance with CE genotypes was 75.73 and 75.75%, an increase of 12.32 and 9.05%, respectively, over the existing tool STRScan. Additionally, we trained a base classifier on sequence properties and used it to predict the probability of correct genotyping at a given locus, reaching a highest accuracy of 96.13%. CONCLUSIONS: These results demonstrate that STRsearch better preserves backward compatibility with CE for targeted STR profiling in MPS data. STRsearch is available as open-source software at https://github.com/AnJingwd/STRsearch.
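The idea of turning a sequenced STR region into a CE-style allele label can be illustrated with a minimal Python sketch. This is not the STRsearch implementation; the function names `ce_allele` and `count_motif_runs` and the toy sequence are assumptions made for illustration only. The sketch assumes the STR region has already been extracted from the read, and it applies the usual length-based convention: the number of complete repeat units, followed by a decimal and the number of leftover bases when a partial repeat is present.

```python
import re


def ce_allele(region: str, motif: str) -> str:
    """Length-based allele label for an already-extracted STR region.

    Hypothetical helper: complete repeat units, plus ".n" when n bases
    of a partial repeat remain.
    """
    full, partial = divmod(len(region), len(motif))
    return f"{full}.{partial}" if partial else str(full)


def count_motif_runs(region: str, motif: str) -> list[int]:
    """Size, in repeat units, of each uninterrupted run of the motif."""
    pattern = f"(?:{re.escape(motif)})+"
    return [len(m.group(0)) // len(motif) for m in re.finditer(pattern, region)]


# Toy region: six AATG units, a 3-base partial unit, then three more units.
region = "AATG" * 6 + "ATG" + "AATG" * 3
label = ce_allele(region, "AATG")
runs = count_motif_runs(region, "AATG")
```

Here `label` carries the CE-compatible designation, while `runs` keeps the sequence-level structure that a length-only measurement cannot distinguish.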
format Online
Article
Text
id pubmed-7075041
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7075041 2020-03-18 STRsearch: a new pipeline for targeted profiling of short tandem repeats in massively parallel sequencing data Wang, Dong Tao, Ruiyang Li, Zhiqiang Pan, Dun Wang, Zhuo Li, Chengtao Shi, Yongyong Hereditas Research
BioMed Central 2020-03-16 /pmc/articles/PMC7075041/ /pubmed/32172688 http://dx.doi.org/10.1186/s41065-020-00120-6 Text en © The Author(s) 2020 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title STRsearch: a new pipeline for targeted profiling of short tandem repeats in massively parallel sequencing data
topic Research