Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana

Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid lifecycle and short adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherent...

Bibliographic Details
Main Authors: Sangiovanni, Mara; Vigilante, Alessandra; Chiusano, Maria Luisa
Format: Online Article Text
Language: English
Published: MDPI 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009786/
https://www.ncbi.nlm.nih.gov/pubmed/24833233
http://dx.doi.org/10.3390/biology2041465
_version_ 1782479804425043968
author Sangiovanni, Mara
Vigilante, Alessandra
Chiusano, Maria Luisa
author_facet Sangiovanni, Mara
Vigilante, Alessandra
Chiusano, Maria Luisa
author_sort Sangiovanni, Mara
collection PubMed
description Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid lifecycle and short adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherently complex organization, since it has undergone ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the genome complexity, limiting its role as a reference. The identification of paralogs and single copy genes within a highly duplicated genome is a prerequisite for understanding its organization and evolution and for improving its exploitation in comparative genomics. This identification is still controversial, even in the widely studied Arabidopsis genome, in part because of the lack of a reference bioinformatics pipeline that could exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, discussing all the methodological issues that may strongly affect the results, their quality and their reliability. This approach was used to analyze the organization of Arabidopsis nuclear protein coding genes. Besides classifying computationally defined paralogs into networks and single copy genes into different classes, it unraveled further intriguing aspects concerning the genome annotation and the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community.
format Online
Article
Text
id pubmed-4009786
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-4009786 2014-05-07 Biology (Basel) Article MDPI 2013-12-09 /pmc/articles/PMC4009786/ /pubmed/24833233 http://dx.doi.org/10.3390/biology2041465 Text en © 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana
title_sort exploiting a reference genome in terms of duplications: the network of paralogs and single copy genes in arabidopsis thaliana
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009786/
https://www.ncbi.nlm.nih.gov/pubmed/24833233
http://dx.doi.org/10.3390/biology2041465