
TreeRipper web application: towards a fully automated optical tree recognition software

Bibliographic details
Main author: Hughes, Joseph
Format: Online article, text
Language: English
Published: BioMed Central, 2011
Subject: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111373/
https://www.ncbi.nlm.nih.gov/pubmed/21599881
http://dx.doi.org/10.1186/1471-2105-12-178
Collection: PubMed
Description:
BACKGROUND: Relationships between species, genes and genomes have been printed as trees for over a century. Whilst this may have been the best format for exchanging and sharing phylogenetic hypotheses during the 20th century, the World Wide Web now provides faster and automated ways of transferring and sharing phylogenetic knowledge. However, novel software is needed to defrost these published phylogenies for the 21st century.

RESULTS: TreeRipper is a simple website for the fully automated recognition of multifurcating phylogenetic trees (http://linnaeus.zoology.gla.ac.uk/~jhughes/treeripper/). The program accepts a range of input image formats (PNG, JPG/JPEG or GIF). The underlying command-line C++ program follows a number of cleaning steps to detect lines, remove node labels, patch up broken lines and corners, and detect line edges. The edge contour is then determined to detect the branch lengths, tip label positions and the topology of the tree. Optical Character Recognition (OCR) is used to convert the tip labels into text with the freely available tesseract-ocr software. Of the images meeting the prerequisites for TreeRipper, 32% were successfully recognised; the largest tree had 115 leaves.

CONCLUSIONS: Although the diversity of ways in which phylogenies have been illustrated makes the design of fully automated tree recognition software difficult, TreeRipper is a step towards automating the digitization of past phylogenies. We also provide a dataset of 100 tree images and associated tree files for training and/or benchmarking future software. TreeRipper is an open source project licensed under the GNU General Public License v3.
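The RESULTS paragraph describes recovering branch lengths, tip positions and topology from the traced lines of a cleaned image. The Python sketch below illustrates that idea under strong assumptions and is not TreeRipper's C++ edge-contour algorithm: the image is already cleaned to a binary grid of 1-pixel lines ('#' = ink), the tree is a rectangular phylogram with the root on the left, horizontal branches are at least two rows apart, and tip labels arrive in a dict keyed by row as a stand-in for tesseract-ocr output. All names (image_to_newick, trace_branch, parse_node, ink, EXAMPLE) are hypothetical, and Newick is used only as an illustrative output format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: a cleaned, binarised tree image as rows of text,
# where '#' is an ink pixel. Rectangular phylogram, root on the left.
EXAMPLE = [
    "     #####",
    "     #",
    "  ####",
    "  #  #",
    "  #  #####",
    "###",
    "  #",
    "  #",
    "  ########",
]


def ink(img: List[str], y: int, x: int) -> bool:
    """True if (y, x) lies inside the (possibly ragged) image and is ink."""
    return 0 <= y < len(img) and 0 <= x < len(img[y]) and img[y][x] == "#"


def trace_branch(img: List[str], y: int, x: int,
                 labels: Dict[int, str]) -> Tuple[str, int]:
    """Walk right along the horizontal branch on row y, starting at column x.

    Returns (subtree, x_end): x_end is where the branch meets a vertical
    connector (an internal node) or simply stops (a tip).
    """
    while True:
        if ink(img, y - 1, x) or ink(img, y + 1, x):
            return parse_node(img, y, x, labels), x
        if not ink(img, y, x + 1):
            # Tip: look up its label (stand-in for OCR), else name it by row.
            return labels.get(y, f"t{y}"), x
        x += 1


def parse_node(img: List[str], y: int, x: int,
               labels: Dict[int, str]) -> str:
    """Parse the node whose vertical connector at column x contains row y."""
    top, bottom = y, y
    while ink(img, top - 1, x):
        top -= 1
    while ink(img, bottom + 1, x):
        bottom += 1
    children = []
    for row in range(top, bottom + 1):
        if ink(img, row, x + 1):  # a child branch leaves the connector here
            subtree, x_end = trace_branch(img, row, x + 1, labels)
            # Branch length in pixels, from the connector to the child's end.
            children.append(f"{subtree}:{x_end - x}")
    return "(" + ",".join(children) + ")"


def image_to_newick(img: List[str],
                    labels: Optional[Dict[int, str]] = None) -> str:
    """Convert a cleaned rectangular tree image into a Newick string."""
    labels = labels or {}
    width = max((len(row) for row in img), default=0)
    for x in range(width):
        rows = [y for y in range(len(img)) if ink(img, y, x)]
        if rows:
            break
    else:
        raise ValueError("image contains no ink pixels")
    if len(rows) > 1:
        # Leftmost column is the root's vertical connector (no root stem).
        tree = parse_node(img, rows[0], x, labels)
    else:
        # Leftmost ink is a root stem; its own length is not reported.
        tree, _ = trace_branch(img, rows[0], x, labels)
    return tree + ";"


print(image_to_newick(EXAMPLE, {0: "A", 4: "B", 8: "C"}))
# prints ((A:4,B:4):3,C:7);
```

The mutual recursion between trace_branch and parse_node mirrors the nesting of clades, and every branch leaving a vertical connector is treated the same way, so multifurcating nodes need no special case. Branch lengths here are in pixels; converting them to evolutionary units would require the figure's scale bar, which this sketch ignores.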
Record ID: pubmed-3111373
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Software article), published by BioMed Central on 2011-05-20
Copyright © 2011 Hughes; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.