Frequency Analysis Techniques for Identification of Viral Genetic Data
Environmental metagenomic samples and samples obtained as an attempt to identify a pathogen associated with the emergence of a novel infectious disease are important sources of novel microorganisms. The low costs and high throughput of sequencing technologies are expected to allow for the genetic ma...
Main authors: | Trifonov, Vladimir; Rabadan, Raul |
---|---|
Format: | Text |
Language: | English |
Published: | American Society of Microbiology 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2932508/ https://www.ncbi.nlm.nih.gov/pubmed/20824103 http://dx.doi.org/10.1128/mBio.00156-10 |
_version_ | 1782186084675878912 |
---|---|
author | Trifonov, Vladimir Rabadan, Raul |
author_facet | Trifonov, Vladimir Rabadan, Raul |
author_sort | Trifonov, Vladimir |
collection | PubMed |
description | Environmental metagenomic samples and samples obtained as an attempt to identify a pathogen associated with the emergence of a novel infectious disease are important sources of novel microorganisms. The low costs and high throughput of sequencing technologies are expected to allow for the genetic material in those samples to be sequenced and the genomes of the novel microorganisms to be identified by alignment to those in a database of known genomes. Yet, for various biological and technical reasons, such alignment might not always be possible. We investigate a frequency analysis technique which on one hand allows for the identification of genetic material without relying on alignment and on the other hand makes possible the discovery of nonoverlapping contigs from the same organism. The technique is based on obtaining signatures of the genetic data and defining a distance/similarity measure between signatures. More precisely, the signatures of the genetic data are the frequencies of k-mers occurring in them, with k being a natural number. We considered an entropy-based distance between signatures, similar to the Kullback-Leibler distance in information theory, and investigated its ability to categorize negative-sense single-stranded RNA (ssRNA) viral genetic data. Our conclusion is that in this viral context, the technique provides a viable way of discovering genetic relationships without relying on alignment. We envision that our approach will be applicable to other microbial genetic contexts, e.g., other types of viruses, and will be an important tool in the discovery of novel microorganisms. |
format | Text |
id | pubmed-2932508 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | American Society of Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-29325082010-09-03 Frequency Analysis Techniques for Identification of Viral Genetic Data Trifonov, Vladimir Rabadan, Raul mBio Research Article Environmental metagenomic samples and samples obtained as an attempt to identify a pathogen associated with the emergence of a novel infectious disease are important sources of novel microorganisms. The low costs and high throughput of sequencing technologies are expected to allow for the genetic material in those samples to be sequenced and the genomes of the novel microorganisms to be identified by alignment to those in a database of known genomes. Yet, for various biological and technical reasons, such alignment might not always be possible. We investigate a frequency analysis technique which on one hand allows for the identification of genetic material without relying on alignment and on the other hand makes possible the discovery of nonoverlapping contigs from the same organism. The technique is based on obtaining signatures of the genetic data and defining a distance/similarity measure between signatures. More precisely, the signatures of the genetic data are the frequencies of k-mers occurring in them, with k being a natural number. We considered an entropy-based distance between signatures, similar to the Kullback-Leibler distance in information theory, and investigated its ability to categorize negative-sense single-stranded RNA (ssRNA) viral genetic data. Our conclusion is that in this viral context, the technique provides a viable way of discovering genetic relationships without relying on alignment. We envision that our approach will be applicable to other microbial genetic contexts, e.g., other types of viruses, and will be an important tool in the discovery of novel microorganisms. American Society of Microbiology 2010-08-24 /pmc/articles/PMC2932508/ /pubmed/20824103 http://dx.doi.org/10.1128/mBio.00156-10 Text en Copyright © 2010 Trifonov and Rabadan. 
http://creativecommons.org/licenses/by-nc-sa/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/) , which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Trifonov, Vladimir Rabadan, Raul Frequency Analysis Techniques for Identification of Viral Genetic Data |
title | Frequency Analysis Techniques for Identification of Viral Genetic Data |
title_full | Frequency Analysis Techniques for Identification of Viral Genetic Data |
title_fullStr | Frequency Analysis Techniques for Identification of Viral Genetic Data |
title_full_unstemmed | Frequency Analysis Techniques for Identification of Viral Genetic Data |
title_short | Frequency Analysis Techniques for Identification of Viral Genetic Data |
title_sort | frequency analysis techniques for identification of viral genetic data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2932508/ https://www.ncbi.nlm.nih.gov/pubmed/20824103 http://dx.doi.org/10.1128/mBio.00156-10 |
work_keys_str_mv | AT trifonovvladimir frequencyanalysistechniquesforidentificationofviralgeneticdata AT rabadanraul frequencyanalysistechniquesforidentificationofviralgeneticdata |
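
The description field above outlines the technique in two steps: compute a signature for each sequence as its k-mer frequencies, then compare signatures with an entropy-based distance similar to the Kullback-Leibler distance. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the pseudocount smoothing, the handling of ambiguous bases, and the choice of a symmetrized Kullback-Leibler divergence are all assumptions made for illustration; the record does not specify the exact distance used in the article.

```python
from collections import Counter
from itertools import product
from math import log

ALPHABET = "ACGT"  # assumption: RNA input is mapped to the DNA alphabet (U -> T)


def kmer_signature(seq, k, pseudocount=1.0):
    """Return smoothed relative frequencies of all 4**k k-mers in seq.

    The pseudocount (an assumption of this sketch) keeps every frequency
    positive so that the logarithms in the divergence are always defined.
    """
    seq = seq.upper().replace("U", "T")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing other symbols (e.g. N) are ignored because only
    # k-mers over ACGT are looked up here.
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two signatures."""
    return sum(p[m] * log(p[m] / q[m]) for m in p)


def symmetric_kl(p, q):
    """Symmetrized divergence, used here as a stand-in distance."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    # Toy sequences, purely illustrative (not real viral data).
    a = kmer_signature("ACGTACGTACGT", k=2)
    b = kmer_signature("AAAAAAAATTTT", k=2)
    print(symmetric_kl(a, a))  # identical signatures give 0.0
    print(symmetric_kl(a, b))  # differing composition gives a positive value
```

Under these assumptions, a contig could be assigned to the reference group whose signature lies at the smallest distance, with no alignment step involved. The value of k and the smoothing constant would need to be tuned to the sequence lengths at hand.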