
Exploiting orthology and de novo transcriptome assembly to refine target sequence information

BACKGROUND: The ability to generate recombinant drug target proteins is important for drug discovery research as it facilitates the investigation of drug-target-interactions in vitro. To accomplish this, the target’s exact protein sequence is required. Public databases, such as Ensembl, UniProt and...


Bibliographic Details
Main authors:	Söllner, Julia F., Leparc, Germán, Zwick, Matthias, Schönberger, Tanja, Hildebrandt, Tobias, Nieselt, Kay, Simon, Eric
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6533699/ https://www.ncbi.nlm.nih.gov/pubmed/31122257 http://dx.doi.org/10.1186/s12920-019-0524-5

_version_	1783421261723992064
author	Söllner, Julia F. Leparc, Germán Zwick, Matthias Schönberger, Tanja Hildebrandt, Tobias Nieselt, Kay Simon, Eric
author_sort	Söllner, Julia F.
collection	PubMed
description	BACKGROUND: The ability to generate recombinant drug target proteins is important for drug discovery research as it facilitates the investigation of drug-target-interactions in vitro. To accomplish this, the target’s exact protein sequence is required. Public databases, such as Ensembl, UniProt and RefSeq, are extensive protein and nucleotide sequence repositories. However, many sequences for non-human organisms are predicted by computational pipelines and may thus be incomplete or incorrect. This could lead to misinterpreted experimental outcomes due to gaps or errors in orthologous drug target sequences. Transcriptome analysis by RNA-Seq has been established as a standard method for gene expression analysis. Apart from this common application, paired-end RNA-Seq data can also be used to obtain full coverage cDNA sequences via de novo transcriptome assembly. METHODS: To assess whether de novo transcriptome assemblies can be used to determine a protein’s sequence by searching the assembly for a known orthologous sequence, we generated 3 × 6 = 18 tissue specific assemblies (three organs: brain, kidney and liver; six species: human, mouse, rat, dog, pig and cynomolgus monkey). These assemblies and the manually curated human protein sequences from UniProtKB/Swiss-Prot were used in a reciprocal BLAST search to identify best matching hits. We automated and generalised our approach and present the a&o-tool, a workflow which exploits de novo assemblies of paired-end RNA-Seq data and orthology information for target sequence validation and refinement across related species. Furthermore, the a&o-tool extracts best hits’ sequences from a reciprocal BLAST search, translates them into protein sequences, computes a multiple sequence alignment and quantifies the refinement. RESULTS: For the three human assemblies we observed a hit rate greater than 60% with 100% sequence coverage and identity. 
For assemblies from the other species we observed similar hit rates and coverage with highest identities for cynomolgus monkey. CONCLUSIONS: In summary, we show how to refine protein sequences using RNA-Seq data and sequence information from closely related species. With the a&o-tool we provide a fully automated pipeline to perform refinement including cDNA translation and multiple sequence alignment for visual inspection. The major prerequisite for applying the a&o-tool is high quality sequencing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-019-0524-5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6533699
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6533699 2019-05-29 Exploiting orthology and de novo transcriptome assembly to refine target sequence information Söllner, Julia F. Leparc, Germán Zwick, Matthias Schönberger, Tanja Hildebrandt, Tobias Nieselt, Kay Simon, Eric BMC Med Genomics Research Article BioMed Central 2019-05-23 /pmc/articles/PMC6533699/ /pubmed/31122257 http://dx.doi.org/10.1186/s12920-019-0524-5 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Exploiting orthology and de novo transcriptome assembly to refine target sequence information
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6533699/ https://www.ncbi.nlm.nih.gov/pubmed/31122257 http://dx.doi.org/10.1186/s12920-019-0524-5
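
The reciprocal BLAST search mentioned in the abstract can be illustrated with a minimal sketch. It is not the a&o-tool's implementation. It assumes that both search directions were exported in the 12-column tabular layout of BLAST+ (`-outfmt 6`, with the bit score in the last column) and that the "best" hit is the one with the highest bit score; the record does not state the selection criterion actually used. All sequence IDs (`P1`, `contig_1`, ...) and the helper names are hypothetical.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query ID to the subject ID of its highest-bit-score hit.

    Assumes the 12-column BLAST+ tabular layout:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_text, reverse_text):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_text)   # e.g. curated proteins vs. assembly
    reverse = best_hits(reverse_text)   # e.g. assembly contigs vs. proteins
    return sorted(
        (query, subject)
        for query, subject in forward.items()
        if reverse.get(subject) == query
    )


def make_table(rows):
    """Build tab-separated text from lists of values (demo helper)."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)


# Hypothetical demo data: two proteins searched against two contigs.
FORWARD = make_table([
    ["P1", "contig_1", 99.0, 300, 3, 0, 1, 300, 10, 909, 1e-150, 200.0],
    ["P1", "contig_2", 80.0, 250, 50, 0, 1, 250, 5, 754, 1e-90, 150.0],
    ["P2", "contig_2", 70.0, 120, 36, 0, 1, 120, 5, 364, 1e-40, 90.0],
])
REVERSE = make_table([
    ["contig_1", "P1", 99.0, 300, 3, 0, 10, 909, 1, 300, 1e-150, 210.0],
    ["contig_2", "P1", 80.0, 250, 50, 0, 5, 754, 1, 250, 1e-90, 160.0],
    ["contig_2", "P2", 70.0, 120, 36, 0, 5, 364, 1, 120, 1e-40, 80.0],
])

if __name__ == "__main__":
    print(reciprocal_best_hits(FORWARD, REVERSE))
```

In the demo data, `P2` finds `contig_2` as its best hit, but `contig_2` matches `P1` better in the reverse direction, so only the `P1`/`contig_1` pair is reciprocal. Requiring agreement in both directions is what filters out such one-sided matches.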
