
Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data

While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are harder to carry out. Over the last 10 years, the availability of long-read data of human genomes...


Bibliographic Details
Main authors:	Lee, Yuna; Park, Kiejung; Koh, Insong
Format:	Online Article Text
Language:	English
Published:	Korea Genome Organization 2019
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6944045/ https://www.ncbi.nlm.nih.gov/pubmed/31896240 http://dx.doi.org/10.5808/GI.2019.17.4.e40

author	Lee, Yuna; Park, Kiejung; Koh, Insong
collection	PubMed
description	While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are harder to carry out. Over the last 10 years, the availability of long-read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long-read data remain relatively costly, many next-generation sequencing datasets are produced mainly by short-read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, regions of the reference genome to which no reads are mapped), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Of these, 30,698 UMRs overlapped with the 74,045 break points provided by the 1KGP. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased. The overlap eventually peaked at 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass the 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short-read data.
format	Online Article Text
id	pubmed-6944045
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Korea Genome Organization
record_format	MEDLINE/PubMed
spelling	pubmed-6944045 2020-01-09 Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data Lee, Yuna; Park, Kiejung; Koh, Insong Genomics Inform Original Article
Korea Genome Organization 2019-12-20 /pmc/articles/PMC6944045/ /pubmed/31896240 http://dx.doi.org/10.5808/GI.2019.17.4.e40 Text en (c) 2019, Korea Genome Organization (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6944045/ https://www.ncbi.nlm.nih.gov/pubmed/31896240 http://dx.doi.org/10.5808/GI.2019.17.4.e40
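
The description field above outlines two steps: finding unmapped regions (stretches of the reference with no mapped reads) and checking them against known deletion break points. The record does not describe the authors' programs, thresholds, or overlap criteria, so the following is only a minimal Python sketch under stated assumptions: per-base read depth is already available as a sequence of integers for one chromosome, break points are single 0-based positions, and the names `find_unmapped_regions`, `overlap_fraction`, and the `min_length` parameter are hypothetical.

```python
import bisect
from typing import Iterable, List, Sequence, Tuple

# Half-open interval [start, end) in 0-based reference coordinates.
Interval = Tuple[int, int]


def find_unmapped_regions(depth: Iterable[int], min_length: int = 1) -> List[Interval]:
    """Return maximal runs of zero depth that are at least min_length long.

    min_length is an illustrative filter, not a value taken from the article.
    """
    regions: List[Interval] = []
    start = None   # start of the zero-depth run currently being scanned
    pos = -1       # stays -1 if depth is empty
    for pos, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and (pos + 1) - start >= min_length:
        regions.append((start, pos + 1))
    return regions


def overlaps_breakpoint(region: Interval, sorted_breakpoints: Sequence[int]) -> bool:
    """True if any break point position lies inside the half-open region."""
    i = bisect.bisect_left(sorted_breakpoints, region[0])
    return i < len(sorted_breakpoints) and sorted_breakpoints[i] < region[1]


def overlap_fraction(regions: Sequence[Interval], breakpoints: Iterable[int]) -> float:
    """Fraction of regions that contain at least one break point."""
    if not regions:
        return 0.0
    bps = sorted(breakpoints)
    hits = sum(overlaps_breakpoint(r, bps) for r in regions)
    return hits / len(regions)


if __name__ == "__main__":
    toy_depth = [3, 0, 0, 5, 0, 2, 0, 0, 0]   # made-up depths, not real data
    umrs = find_unmapped_regions(toy_depth, min_length=2)
    print(umrs)
    print(overlap_fraction(umrs, [2, 20]))
```

A real pipeline would derive the depth values from aligned reads per chromosome and would likely allow some tolerance around each break point; both choices are left open here because the record does not specify them.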
