Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology
Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencin...
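Main authors: | Zhou, Weichen; Emery, Sarah B; Flasch, Diane A; Wang, Yifan; Kwan, Kenneth Y; Kidd, Jeffrey M; Moran, John V; Mills, Ryan E |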
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7026601/ https://www.ncbi.nlm.nih.gov/pubmed/31853540 http://dx.doi.org/10.1093/nar/gkz1173 |
_version_ | 1783498714623508480 |
---|---|
author | Zhou, Weichen Emery, Sarah B Flasch, Diane A Wang, Yifan Kwan, Kenneth Y Kidd, Jeffrey M Moran, John V Mills, Ryan E |
author_sort | Zhou, Weichen |
collection | PubMed |
description | Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent in previous short-read studies (90/203). Approximately 81% (73/90) of the L1Hs insertions reside within endogenous LINE-1 sequences in the reference assembly and the analysis of unique breakpoint junction sequences revealed 63% (57/90) of these L1Hs insertions could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to a wide variation in L1Hs insertion detection rates between four individual NA12878 cells; under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false positive calls. In sum, these data indicate that L1Hs insertions are often missed using standard short-read sequencing approaches and long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes. |
format | Online Article Text |
id | pubmed-7026601 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7026601 2020-02-25 Nucleic Acids Res Computational Biology Oxford University Press 2020-02-20 2019-12-19 /pmc/articles/PMC7026601/ /pubmed/31853540 http://dx.doi.org/10.1093/nar/gkz1173 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7026601/ https://www.ncbi.nlm.nih.gov/pubmed/31853540 http://dx.doi.org/10.1093/nar/gkz1173 |