Accuracy of short tandem repeats genotyping tools in whole exome sequencing data
Background: Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to...
Main Authors: | Halman, Andreas; Oshlack, Alicia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327730/ https://www.ncbi.nlm.nih.gov/pubmed/32665844 http://dx.doi.org/10.12688/f1000research.22639.1 |
_version_ | 1783552602742456320 |
---|---|
author | Halman, Andreas Oshlack, Alicia |
author_facet | Halman, Andreas Oshlack, Alicia |
author_sort | Halman, Andreas |
collection | PubMed |
description | Background: Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeats genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. Methods: The analysis was performed on the Simons Simplex Collection dataset, where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total, we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. Results: All tools performed relatively well when genotyping repeats of 3-6 bp in length, and performance could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools, and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. Conclusions: All tools have different strengths and weaknesses, and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls. |
format | Online Article Text |
id | pubmed-7327730 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-7327730 2020-07-13 Accuracy of short tandem repeats genotyping tools in whole exome sequencing data Halman, Andreas Oshlack, Alicia F1000Res Research Article Background: Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeats genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. Methods: The analysis was performed on the Simons Simplex Collection dataset, where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total, we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. Results: All tools performed relatively well when genotyping repeats of 3-6 bp in length, and performance could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools, and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. Conclusions: All tools have different strengths and weaknesses, and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls. F1000 Research Limited 2020-03-23 /pmc/articles/PMC7327730/ /pubmed/32665844 http://dx.doi.org/10.12688/f1000research.22639.1 Text en Copyright: © 2020 Halman A and Oshlack A http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Halman, Andreas Oshlack, Alicia Accuracy of short tandem repeats genotyping tools in whole exome sequencing data |
title | Accuracy of short tandem repeats genotyping tools in whole exome sequencing data |
title_full | Accuracy of short tandem repeats genotyping tools in whole exome sequencing data |
title_fullStr | Accuracy of short tandem repeats genotyping tools in whole exome sequencing data |
title_full_unstemmed | Accuracy of short tandem repeats genotyping tools in whole exome sequencing data |
title_short | Accuracy of short tandem repeats genotyping tools in whole exome sequencing data |
title_sort | accuracy of short tandem repeats genotyping tools in whole exome sequencing data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327730/ https://www.ncbi.nlm.nih.gov/pubmed/32665844 http://dx.doi.org/10.12688/f1000research.22639.1 |
work_keys_str_mv | AT halmanandreas accuracyofshorttandemrepeatsgenotypingtoolsinwholeexomesequencingdata AT oshlackalicia accuracyofshorttandemrepeatsgenotypingtoolsinwholeexomesequencingdata |
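
The description field above states that accuracy was determined by the rate of homozygous calls on the X chromosome of male samples, and that results could be improved with coverage and quality score filtering. The sketch below illustrates one way such a metric could be computed; it is not the authors' code. The `StrCall` record layout, the function name, and the threshold parameters are all hypothetical, and the input is assumed to already exclude pseudoautosomal regions, where males can legitimately be heterozygous.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StrCall:
    """One STR genotype call on chrX of a male sample (hypothetical layout)."""
    allele_a: int    # length of the first called allele, in bp
    allele_b: int    # length of the second called allele, in bp
    coverage: int    # number of reads supporting the call
    quality: float   # tool-reported quality score for the call


def heterozygous_error_rate(
    calls: Iterable[StrCall],
    min_coverage: int = 0,
    min_quality: float = 0.0,
) -> Optional[float]:
    """Fraction of retained calls that are heterozygous.

    Males carry a single X chromosome, so any heterozygous call at a
    non-pseudoautosomal chrX locus is counted as a genotyping error.
    Returns None when no call passes the filters.
    """
    kept = [
        c for c in calls
        if c.coverage >= min_coverage and c.quality >= min_quality
    ]
    if not kept:
        return None
    errors = sum(1 for c in kept if c.allele_a != c.allele_b)
    return errors / len(kept)


# Made-up example calls, for illustration only (not data from the study).
example_calls = [
    StrCall(allele_a=12, allele_b=12, coverage=10, quality=0.90),
    StrCall(allele_a=12, allele_b=15, coverage=3, quality=0.50),
    StrCall(allele_a=20, allele_b=20, coverage=8, quality=0.95),
    StrCall(allele_a=9, allele_b=12, coverage=12, quality=0.99),
]

unfiltered = heterozygous_error_rate(example_calls)
filtered = heterozygous_error_rate(example_calls, min_coverage=5)
```

Raising `min_coverage` or `min_quality` trades the number of retained calls against the error rate, which is the trade-off the description refers to; the number of calls that survive filtering would need to be reported alongside the rate.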