Fast all versus all genotype comparison using DNA/RNA sequencing data: method and workflow
Main authors: | Eschrich, Steven A.; Yu, Xiaoqing; Teer, Jamie K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10124007/ https://www.ncbi.nlm.nih.gov/pubmed/37095442 http://dx.doi.org/10.1186/s12859-023-05288-y |
author | Eschrich, Steven A.; Yu, Xiaoqing; Teer, Jamie K. |
---|---|
collection | PubMed |
description | BACKGROUND: Massively parallel sequencing includes many liquid handling steps, which introduce the possibility of sample swaps, mixing, and duplication. The unique profile of inherited variants in human genomes allows sample identity to be compared using sequence data. A comparison of all samples vs. each other (all vs. all) provides both identification of mismatched samples and the possibility of resolving swapped samples. However, the cost of an all vs. all comparison grows as the square of the number of samples, so efficiency becomes essential. RESULTS: We have developed a tool for fast all vs. all genotype comparison using low-level bitwise operations built into the Perl programming language. Importantly, we have also developed a complete workflow allowing users to start with either raw FASTQ sequence files, aligned BAM files, or genotype VCF files and automatically generate comparison metrics and summary plots. The tool is freely available at https://github.com/teerjk/TimeAttackGenComp/. CONCLUSIONS: A fast, easy-to-use method for genotype comparison such as the one described here is an important tool for ensuring high-quality, robust results in sequencing studies. |
format | Online Article Text |
id | pubmed-10124007 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics, Software article, published 2023-04-24 |
license | © The Author(s) 2023. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Fast all versus all genotype comparison using DNA/RNA sequencing data: method and workflow |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10124007/ https://www.ncbi.nlm.nih.gov/pubmed/37095442 http://dx.doi.org/10.1186/s12859-023-05288-y |
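
The description says the comparison relies on low-level bitwise operations but does not spell out the encoding. The block below is a minimal sketch of the general idea only. It is written in Python rather than the authors' Perl, and it is not the TimeAttackGenComp implementation. It assumes a hypothetical encoding with one bit vector per genotype class (hom-ref, het, hom-alt), where bit i stands for variant site i. It also assumes a simple concordance metric: matching genotypes divided by sites called in both samples. The names `BitGenotypes`, `encode_genotypes`, `concordance` and `all_vs_all` are illustrative, not part of the tool.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Optional, Tuple


class BitGenotypes(NamedTuple):
    """Hypothetical packing of one sample's genotypes: bit i = variant site i."""
    hom_ref: int
    het: int
    hom_alt: int

    @property
    def called(self) -> int:
        # A site is called if it falls into any of the three genotype classes.
        return self.hom_ref | self.het | self.hom_alt


def encode_genotypes(calls: List[Optional[int]]) -> BitGenotypes:
    """Pack alt-allele counts (0, 1, 2, or None for no call) into three bit vectors."""
    masks = [0, 0, 0]
    for site, gt in enumerate(calls):
        if gt is not None:
            masks[gt] |= 1 << site  # set bit `site` in the mask for this genotype class
    return BitGenotypes(*masks)


def popcount(x: int) -> int:
    """Count set bits (portable; int.bit_count() also works on Python 3.10+)."""
    return bin(x).count("1")


def concordance(a: BitGenotypes, b: BitGenotypes) -> Tuple[int, int]:
    """Return (sites with identical genotypes, sites called in both samples)."""
    both_called = a.called & b.called
    # One AND per genotype class compares every site at once.
    matches = (a.hom_ref & b.hom_ref) | (a.het & b.het) | (a.hom_alt & b.hom_alt)
    return popcount(matches), popcount(both_called)


def all_vs_all(samples: Dict[str, BitGenotypes]) -> Dict[Tuple[str, str], float]:
    """Compare every unordered pair once: n * (n - 1) / 2 comparisons for n samples."""
    results = {}
    for (name_a, a), (name_b, b) in combinations(samples.items(), 2):
        match, total = concordance(a, b)
        results[(name_a, name_b)] = match / total if total else float("nan")
    return results


if __name__ == "__main__":
    samples = {
        "S1": encode_genotypes([0, 1, 2, 1, None]),
        "S2": encode_genotypes([0, 1, 2, 1, 0]),  # same genotypes as S1 where both are called
        "S3": encode_genotypes([2, 0, 1, 1, 0]),
    }
    for pair, frac in all_vs_all(samples).items():
        print(pair, frac)
    # ('S1', 'S2') 1.0
    # ('S1', 'S3') 0.25
    # ('S2', 'S3') 0.4
```

Packing sites into integers means that a single AND compares many sites at once. This lowers the cost of each pairwise comparison. The number of pairs still grows quadratically with the number of samples, which is why the per-pair cost matters. A pair with concordance near 1.0, like S1 and S2 here, would flag a possible duplicate or same-individual sample under this assumed metric.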