De novo genome assembly for third-generation sequencing data

Second-generation sequencing techniques opened the door to research on a worldwide scale
because the cost of DNA sequencing dropped significantly. However, second-generation sequencing
technology has some drawbacks, chiefly short read length. In 2017, new devices that use
real-time sequencing became available. This approach, called "third-generation sequencing",
reaches read lengths of 20 kbp with an error rate of about 15%. As a consequence, new DNA
assemblers have been developed.

In this article we propose an implementation of an overlap-graph-based de novo assembly
algorithm for third-generation sequencing data. The proposed method combines graph algorithms
and dynamic programming, optimized with a MinHash filter. The solution has been tested on both
simulated and real bacterial data obtained from an Oxford Nanopore MinION sequencer.
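
To illustrate the role a MinHash filter can play in such a pipeline, the following is a minimal
sketch, not the dnaasm implementation. It assumes that each read is reduced to a fixed-size
sketch of minimum k-mer hash values, that the fraction of matching sketch positions estimates
the Jaccard similarity of the two k-mer sets, and that only pairs above a threshold are passed
on to the more expensive dynamic-programming alignment. All function names, the k-mer length,
the sketch size and the threshold are hypothetical choices made for this example.

```python
import hashlib
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Seeded 64-bit hash of a k-mer; each seed acts as one hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=5, num_hashes=64):
    """Build a MinHash sketch: the minimum hash value per hash function."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimated_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def candidate_pairs(reads, k=5, num_hashes=64, threshold=0.1):
    """Return read-name pairs whose estimated similarity reaches threshold.

    reads maps a read name to its sequence. Only the returned pairs would
    be handed to a dynamic-programming aligner in a later step.
    """
    sketches = {name: minhash_sketch(seq, k, num_hashes)
                for name, seq in reads.items()}
    return [(a, b)
            for a, b in combinations(sorted(sketches), 2)
            if estimated_jaccard(sketches[a], sketches[b]) >= threshold]
```

The filter compares fixed-size sketches instead of whole reads, so the cost of screening a pair
does not depend on read length. Lowering the threshold keeps more true overlaps between noisy
reads at the price of more alignments to compute.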

The algorithm is included in the "OLC" module of the dnaasm de novo assembler. The dnaasm
application provides a command-line interface as well as a web browser-based client. The source
code, a demo web application, and a Docker image are available at the dnaasm project web page:

http://dnaasm.sourceforge.net.

Author: Mateusz Forc
Conference: Title