Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly
A combination of short‐insert paired‐end and mate‐pair libraries with large insert sizes is a standard way to generate genome assemblies with high contiguity. Third‐generation sequencing techniques are also used to improve the quality of assembled genomes. However, both mate‐pair libr...
Main Authors: | Zhou, Tao; Lu, Liang; Li, Chenhong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2023 |
Subjects: | Research Articles |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9833964/ https://www.ncbi.nlm.nih.gov/pubmed/36644701 http://dx.doi.org/10.1002/ece3.9745 |
_version_ | 1784868355797155840 |
---|---|
author | Zhou, Tao Lu, Liang Li, Chenhong |
author_facet | Zhou, Tao Lu, Liang Li, Chenhong |
author_sort | Zhou, Tao |
collection | PubMed |
description | A combination of short‐insert paired‐end and mate‐pair libraries with large insert sizes is a standard way to generate genome assemblies with high contiguity. Third‐generation sequencing techniques are also used to improve the quality of assembled genomes. However, both mate‐pair libraries and third‐generation libraries require high‐molecular‐weight DNA, which makes them unsuitable for samples containing only degraded DNA. An in silico method that generates mate‐pair libraries from a reference genome was devised for assembling target genomes. Although this method significantly improved the contiguity and completeness of assembled genomes, the assemblies contained a high level of errors, and the way reference genomes were used was not optimized. Here, we tested different strategies for using reference genomes to generate in silico mate‐pairs. The results showed that using a closely related reference genome from the same genus was more effective than using divergent references. Retaining only the in silico mate‐pairs conserved between two references and using them to guide genome assembly reduced the number of misassemblies (18.6%–46.1%) and increased the contiguity of assembled genomes (9.7%–70.7%), while maintaining gene completeness at a level similar to or marginally lower than that obtained with the current method. Finally, we developed a pipeline for the optimized in silico method and compared it with another reference‐guided assembler, RagTag. We found that RagTag produced longer scaffolds (17.8 Mbp vs 3.0 Mbp), but resulted in a much higher misassembly rate (85.68%) than our optimized in silico mate‐pair method. The optimized in silico pipeline developed in this study should facilitate further studies on genomics, population genetics, and conservation of endangered species. |
format | Online Article Text |
id | pubmed-9833964 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9833964 2023-01-13 Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly Zhou, Tao Lu, Liang Li, Chenhong Ecol Evol Research Articles A combination of short‐insert paired‐end and mate‐pair libraries with large insert sizes is a standard way to generate genome assemblies with high contiguity. Third‐generation sequencing techniques are also used to improve the quality of assembled genomes. However, both mate‐pair libraries and third‐generation libraries require high‐molecular‐weight DNA, which makes them unsuitable for samples containing only degraded DNA. An in silico method that generates mate‐pair libraries from a reference genome was devised for assembling target genomes. Although this method significantly improved the contiguity and completeness of assembled genomes, the assemblies contained a high level of errors, and the way reference genomes were used was not optimized. Here, we tested different strategies for using reference genomes to generate in silico mate‐pairs. The results showed that using a closely related reference genome from the same genus was more effective than using divergent references. Retaining only the in silico mate‐pairs conserved between two references and using them to guide genome assembly reduced the number of misassemblies (18.6%–46.1%) and increased the contiguity of assembled genomes (9.7%–70.7%), while maintaining gene completeness at a level similar to or marginally lower than that obtained with the current method. Finally, we developed a pipeline for the optimized in silico method and compared it with another reference‐guided assembler, RagTag. We found that RagTag produced longer scaffolds (17.8 Mbp vs 3.0 Mbp), but resulted in a much higher misassembly rate (85.68%) than our optimized in silico mate‐pair method. The optimized in silico pipeline developed in this study should facilitate further studies on genomics, population genetics, and conservation of endangered species. John Wiley and Sons Inc. 2023-01-11 /pmc/articles/PMC9833964/ /pubmed/36644701 http://dx.doi.org/10.1002/ece3.9745 Text en © 2023 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Articles Zhou, Tao Lu, Liang Li, Chenhong Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly |
title | Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly |
title_full | Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly |
title_fullStr | Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly |
title_full_unstemmed | Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly |
title_short | Optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly |
title_sort | optimization of the “in‐silico” mate‐pair method improves contiguity and accuracy of genome assembly |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9833964/ https://www.ncbi.nlm.nih.gov/pubmed/36644701 http://dx.doi.org/10.1002/ece3.9745 |
work_keys_str_mv | AT zhoutao optimizationoftheinsilicomatepairmethodimprovescontiguityandaccuracyofgenomeassembly AT luliang optimizationoftheinsilicomatepairmethodimprovescontiguityandaccuracyofgenomeassembly AT lichenhong optimizationoftheinsilicomatepairmethodimprovescontiguityandaccuracyofgenomeassembly |
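
The description above mentions keeping only in silico mate‐pairs that are conserved between two references. The record does not say how the authors' pipeline does this, so the following is only a toy sketch of the general idea. The function names (`in_silico_mate_pairs`, `is_conserved`, `keep_conserved`), the exact-match criterion, the `tolerance` parameter, and the choice to keep both reads in forward orientation are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

MatePair = Tuple[str, str]


def in_silico_mate_pairs(ref: str, read_len: int, insert_size: int,
                         step: int = 1) -> Iterator[MatePair]:
    """Cut read pairs from a reference so that each pair spans insert_size bases.

    Simplification (assumption): both reads are kept on the forward strand;
    real mate-pair libraries have a specific read orientation.
    """
    for start in range(0, len(ref) - insert_size + 1, step):
        left = ref[start:start + read_len]
        right = ref[start + insert_size - read_len:start + insert_size]
        yield left, right


def is_conserved(pair: MatePair, other_ref: str, insert_size: int,
                 tolerance: int) -> bool:
    """Return True if both reads occur in other_ref at roughly the same spacing.

    Hypothetical criterion: exact string matches, with the right read allowed
    to shift by at most `tolerance` bases from its expected position.
    """
    left, right = pair
    pos = other_ref.find(left)
    while pos != -1:
        # Window in which the right read must lie entirely.
        lo = max(0, pos + insert_size - len(right) - tolerance)
        hi = pos + insert_size + tolerance
        if right in other_ref[lo:hi]:
            return True
        pos = other_ref.find(left, pos + 1)
    return False


def keep_conserved(pairs: List[MatePair], other_ref: str, insert_size: int,
                   tolerance: int) -> List[MatePair]:
    """Filter pairs from one reference down to those supported by a second one."""
    return [p for p in pairs if is_conserved(p, other_ref, insert_size, tolerance)]
```

In this sketch, a pair cut from one reference survives only if a second reference places the same two reads at a similar distance. That is one simple way to discard pairs spanning a rearrangement specific to a single reference. A realistic implementation would use read alignment rather than exact substring search.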