
RISCI - Repeat Induced Sequence Changes Identifier: a comprehensive, comparative genomics-based, in silico subtractive hybridization pipeline to identify repeat induced sequence changes in closely related genomes



Bibliographic Details
Main authors: Singh, Vipin; Mishra, Rakesh K
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3024322/
https://www.ncbi.nlm.nih.gov/pubmed/21184688
http://dx.doi.org/10.1186/1471-2105-11-609
collection PubMed
description BACKGROUND: The availability of multiple whole genome sequences has facilitated in silico identification of fixed and polymorphic transposable elements (TE). Whereas polymorphic loci serve as markers for phylogenetic and forensic analysis, fixed species-specific transposon insertions, when compared to orthologous loci in other closely related species, may give insights into their evolutionary significance. Moreover, TE insertions are not isolated events and are frequently associated with subtle sequence changes that occur either concurrently with insertion or after it. These include target site duplication, 3' and 5' flank transduction, deletion of the target locus, 5' truncation or partial deletion and inversion of the transposon, and post-insertion changes such as inter- or intra-element recombination and disruption. Although such changes have been studied independently, no automated platform to identify differential transposon insertions and the associated array of sequence changes in genomes of the same or closely related species has been available to date. To this end, we have designed RISCI ('Repeat Induced Sequence Changes Identifier'), a comprehensive, comparative genomics-based, in silico subtractive hybridization pipeline that identifies differential transposon insertions and associated sequence changes using specific alignment signatures; these may then be examined for their downstream effects. RESULTS: We showcase the utility of RISCI by comparing full-length and truncated L1HS and AluYa5 retrotransposons in the reference human genome with the chimpanzee genome and the alternate human assemblies (Celera and HuRef). Comparison of the reference human genome with the alternate human assemblies using RISCI predicts 14 novel polymorphisms in full-length L1HS, 24 in truncated L1HS and 140 in AluYa5 insertions, besides several insertion and post-insertion changes. We present a comparison with two previous studies to show that RISCI predictions are broadly in agreement with earlier reports. We also demonstrate its versatility by comparing various strains of Mycobacterium tuberculosis for IS6100 insertion polymorphism. CONCLUSIONS: RISCI combines comparative genomics with subtractive hybridization, inferring changes only when they are exclusive to one of the two genomes being compared. The pipeline is generic and may be applied to most transposons and to any two or more genomes sharing high sequence similarity. Such comparisons, when performed on a larger scale, may pull out a few critical events that may have seeded the divergence between the two species under comparison.
id pubmed-3024322
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Bioinformatics, Research Article. BioMed Central, 2010-12-26. http://dx.doi.org/10.1186/1471-2105-11-609
Copyright ©2010 Singh and Mishra; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.