NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences

Bibliographic Details
Main Authors: Khelik, Ksenia; Lagesen, Karin; Sandve, Geir Kjetil; Rognes, Torbjørn; Nederbragt, Alexander Johan
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5508607/
https://www.ncbi.nlm.nih.gov/pubmed/28701187
http://dx.doi.org/10.1186/s12859-017-1748-z
_version_ 1783249901702873088
author Khelik, Ksenia
Lagesen, Karin
Sandve, Geir Kjetil
Rognes, Torbjørn
Nederbragt, Alexander Johan
collection PubMed
description BACKGROUND: Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration.
RESULTS: We here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with existing alternatives, called QUAST and dnadiff.
CONCLUSIONS: We have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available on https://github.com/uio-cels/NucDiff under the MPL license.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1748-z) contains supplementary material, which is available to authorized users.
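The description notes that the differences are written as GFF3 files so that they can feed an analysis pipeline. As a rough illustration of that kind of downstream use, the Python sketch below tallies GFF3 feature lines by one attribute. It relies only on the generic GFF3 layout (nine tab-separated columns, with `key=value` pairs in the ninth). The attribute key `Name`, the helper names, and the example records are assumptions made for illustration, not NucDiff's documented output.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_gff3_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value pairs joined by ';') into a dict."""
    attributes = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def count_differences(lines: Iterable[str], attribute: str = "Name") -> Counter:
    """Tally GFF3 feature lines by the value of one attribute.

    The attribute key is an assumption; when it is absent, the GFF3
    type column (column 3) is used instead.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        attributes = parse_gff3_attributes(columns[8])
        counts[attributes.get(attribute, columns[2])] += 1
    return counts


# Made-up records for illustration only; not real NucDiff output.
EXAMPLE = [
    "##gff-version 3",
    "contig1\texample\tregion\t10\t10\t.\t.\t.\tID=d1;Name=substitution",
    "contig1\texample\tregion\t42\t45\t.\t.\t.\tID=d2;Name=insertion",
    "contig2\texample\tregion\t7\t7\t.\t.\t.\tID=d3;Name=substitution",
]

if __name__ == "__main__":
    for kind, n in count_differences(EXAMPLE).most_common():
        print(kind, n)
```

To run it on a real file, pass an open file handle instead of `EXAMPLE` and set `attribute` to whichever key the file actually uses to label difference types.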
format Online
Article
Text
id pubmed-5508607
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5508607 2017-07-17 NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences Khelik, Ksenia Lagesen, Karin Sandve, Geir Kjetil Rognes, Torbjørn Nederbragt, Alexander Johan BMC Bioinformatics Software BioMed Central 2017-07-12 /pmc/articles/PMC5508607/ /pubmed/28701187 http://dx.doi.org/10.1186/s12859-017-1748-z Text en
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences
topic Software