A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition
BACKGROUND: Human Papillomavirus (HPV) genotyping is an important approach to fight cervical cancer due to the relevant information regarding risk stratification for diagnosis and the better understanding of the relationship of HPV with carcinogenesis. This paper proposed two new feature extraction...
Main authors: | Tanchotsrinon, Watcharaporn; Lursinsap, Chidchanok; Poovorawan, Yong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4375884/ https://www.ncbi.nlm.nih.gov/pubmed/25880169 http://dx.doi.org/10.1186/s12859-015-0493-4 |
_version_ | 1782363647432982528 |
---|---|
author | Tanchotsrinon, Watcharaporn Lursinsap, Chidchanok Poovorawan, Yong |
author_facet | Tanchotsrinon, Watcharaporn Lursinsap, Chidchanok Poovorawan, Yong |
author_sort | Tanchotsrinon, Watcharaporn |
collection | PubMed |
description | BACKGROUND: Human Papillomavirus (HPV) genotyping is an important approach to fight cervical cancer due to the relevant information regarding risk stratification for diagnosis and the better understanding of the relationship of HPV with carcinogenesis. This paper proposed two new feature extraction techniques, i.e. ChaosCentroid and ChaosFrequency, for predicting HPV genotypes associated with the cancer. The additional diversified 12 HPV genotypes, i.e. types 6, 11, 16, 18, 31, 33, 35, 45, 52, 53, 58, and 66, were studied in this paper. In our proposed techniques, a partitioned Chaos Game Representation (CGR) is deployed to represent HPV genomes. ChaosCentroid captures the structure of sequences in terms of the centroid of each sub-region, with the Euclidean distances among the centroids and the center of the CGR as the relations among all sub-regions. ChaosFrequency extracts the statistical distribution of mono-, di-, or higher order nucleotides along HPV genomes and forms a matrix of the frequencies of dots in each sub-region. For performance evaluation, four different types of classifiers, i.e. Multi-layer Perceptron, Radial Basis Function, K-Nearest Neighbor, and Fuzzy K-Nearest Neighbor techniques, were deployed, and our best results from each classifier were compared with the NCBI genotyping tool. RESULTS: The experimental results obtained by the four classifiers follow the same trend. ChaosCentroid gave considerably higher performance than ChaosFrequency when the input length was one, but moderately lower performance than ChaosFrequency when the input length was two. Both proposed techniques yielded almost or exactly the best performance when the input length was more than three. However, there was no significant difference between our proposed techniques and the comparative alignment method. CONCLUSIONS: Our proposed alignment-free and scale-independent method can successfully transform HPV genomes with 7,000-10,000 base pairs into features of 1-11 dimensions. This signifies that our ChaosCentroid and ChaosFrequency can serve as effective feature extraction techniques for predicting the HPV genotypes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0493-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4375884 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43758842015-03-28 A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition Tanchotsrinon, Watcharaporn Lursinsap, Chidchanok Poovorawan, Yong BMC Bioinformatics Methodology Article BACKGROUND: Human Papillomavirus (HPV) genotyping is an important approach to fight cervical cancer due to the relevant information regarding risk stratification for diagnosis and the better understanding of the relationship of HPV with carcinogenesis. This paper proposed two new feature extraction techniques, i.e. ChaosCentroid and ChaosFrequency, for predicting HPV genotypes associated with the cancer. The additional diversified 12 HPV genotypes, i.e. types 6, 11, 16, 18, 31, 33, 35, 45, 52, 53, 58, and 66, were studied in this paper. In our proposed techniques, a partitioned Chaos Game Representation (CGR) is deployed to represent HPV genomes. ChaosCentroid captures the structure of sequences in terms of centroid of each sub-region with Euclidean distances among the centroids and the center of CGR as the relations of all sub-regions. ChaosFrequency extracts the statistical distribution of mono-, di-, or higher order nucleotides along HPV genomes and forms a matrix of frequency of dots in each sub-region. For performance evaluation, four different types of classifiers, i.e. Multi-layer Perceptron, Radial Basis Function, K-Nearest Neighbor, and Fuzzy K-Nearest Neighbor Techniques were deployed, and our best results from each classifier were compared with the NCBI genotyping tool. RESULTS: The experimental results obtained by four different classifiers are in the same trend. ChaosCentroid gave considerably higher performance than ChaosFrequency when the input length is one but it was moderately lower than ChaosFrequency when the input length is two. Both proposed techniques yielded almost or exactly the best performance when the input length is more than three. 
But there is no significance between our proposed techniques and the comparative alignment method. CONCLUSIONS: Our proposed alignment-free and scale-independent method can successfully transform HPV genomes with 7,000 - 10,000 base pairs into features of 1 - 11 dimensions. This signifies that our ChaosCentroid and ChaosFrequency can be served as the effective feature extraction techniques for predicting the HPV genotypes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0493-4) contains supplementary material, which is available to authorized users. BioMed Central 2015-03-05 /pmc/articles/PMC4375884/ /pubmed/25880169 http://dx.doi.org/10.1186/s12859-015-0493-4 Text en © Tanchotsrinon et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Tanchotsrinon, Watcharaporn Lursinsap, Chidchanok Poovorawan, Yong A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition |
title | A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition |
title_full | A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition |
title_fullStr | A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition |
title_full_unstemmed | A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition |
title_short | A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition |
title_sort | high performance prediction of hpv genotypes by chaos game representation and singular value decomposition |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4375884/ https://www.ncbi.nlm.nih.gov/pubmed/25880169 http://dx.doi.org/10.1186/s12859-015-0493-4 |
work_keys_str_mv | AT tanchotsrinonwatcharaporn ahighperformancepredictionofhpvgenotypesbychaosgamerepresentationandsingularvaluedecomposition AT lursinsapchidchanok ahighperformancepredictionofhpvgenotypesbychaosgamerepresentationandsingularvaluedecomposition AT poovorawanyong ahighperformancepredictionofhpvgenotypesbychaosgamerepresentationandsingularvaluedecomposition AT tanchotsrinonwatcharaporn highperformancepredictionofhpvgenotypesbychaosgamerepresentationandsingularvaluedecomposition AT lursinsapchidchanok highperformancepredictionofhpvgenotypesbychaosgamerepresentationandsingularvaluedecomposition AT poovorawanyong highperformancepredictionofhpvgenotypesbychaosgamerepresentationandsingularvaluedecomposition |
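
The description field above mentions a partitioned Chaos Game Representation with per-sub-region dot frequencies and centroids. The following is a minimal Python sketch of that general idea, not the authors' implementation. The corner assignment (A, C, G, T), the starting point at the center of the unit square, the 2^k x 2^k grid, and all function names are assumptions made for illustration; the singular value decomposition step named in the title is not covered.

```python
import math

# Assumed corner layout (a common CGR convention, not taken from the record).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
CENTER = (0.5, 0.5)


def cgr_points(seq):
    """Return one CGR point per nucleotide; symbols other than A/C/G/T are skipped."""
    x, y = CENTER
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        # Each new point is the midpoint between the previous point and the corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def partition(points, k):
    """Group points by cell of a 2^k x 2^k grid; returns {(row, col): [points]}."""
    n = 2 ** k
    cells = {}
    for x, y in points:
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        cells.setdefault((row, col), []).append((x, y))
    return cells


def frequency_matrix(points, k):
    """Count of dots in each sub-region (the frequency-style feature)."""
    n = 2 ** k
    matrix = [[0] * n for _ in range(n)]
    for (row, col), pts in partition(points, k).items():
        matrix[row][col] = len(pts)
    return matrix


def centroid_distances(points, k):
    """Distance from each non-empty sub-region's centroid to the CGR center."""
    distances = {}
    for cell, pts in partition(points, k).items():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        distances[cell] = math.hypot(mx - CENTER[0], my - CENTER[1])
    return distances


if __name__ == "__main__":
    pts = cgr_points("AAAC")
    print(frequency_matrix(pts, 1))
    print(centroid_distances(pts, 1))
```

With k = 1 each quadrant collects the points whose most recent nucleotide matches that quadrant's corner, so the grid counts reduce to mono-nucleotide frequencies; larger k corresponds to di- or higher order nucleotides, which matches the "input length" wording in the description.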