Designing q-Unique DNA Sequences with Integer Linear Programs and Euler Tours in De Bruijn Graphs

Abstract

DNA nanoarchitectures require carefully designed oligonucleotides with certain non-hybridization guarantees, which can be formalized as the q-uniqueness property on the sequence level. We study the optimization problem of finding a longest q-unique DNA sequence. We first present a convenient formulation as an integer linear program on the underlying De Bruijn graph that allows a variety of constraints to be incorporated flexibly; solution times for practically relevant values of q are short. We then provide additional insights into the problem structure using the quotient graph of the De Bruijn graph with respect to the equivalence relation induced by reverse complementarity. Specifically, for odd q the quotient graph is Eulerian, so finding a longest q-unique sequence is equivalent to finding an Euler tour and can be solved in time linear in the length of the output string. For even q, self-complementary edges complicate the problem, and the graph has to be Eulerized by deleting a minimum number of edges. Two sub-cases arise: for one of them we present a complete solution, while the other remains open.
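As an illustration of the property discussed in the abstract, the following Python sketch checks q-uniqueness and lists self-complementary q-grams. It is not taken from the paper. It assumes a working definition that the abstract does not state: a sequence is q-unique if no q-gram occurs more than once among the q-grams of the sequence and of its reverse complement. The function names are hypothetical. The second helper shows why even q is harder: a q-gram can equal its own reverse complement only when q is even, which is what gives rise to self-complementary edges.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_q_unique(seq, q):
    """Check q-uniqueness under the ASSUMED definition from the lead-in:
    every q-gram occurs at most once across seq and its reverse complement."""
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - q + 1):
            counts[strand[i:i + q]] += 1
    return all(count == 1 for count in counts.values())


def self_complementary_qgrams(q):
    """Enumerate q-grams that equal their own reverse complement."""
    qgrams = ("".join(p) for p in product("ACGT", repeat=q))
    return [g for g in qgrams if g == reverse_complement(g)]
```

Under this assumed definition, a self-complementary q-gram can never appear in a q-unique sequence, because it would occur once on each strand. For odd q the list returned by `self_complementary_qgrams` is empty, since the middle base would have to be its own complement.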

Authors
  • D'Addario, Marianna
  • Kriege, Nils M.
  • Rahmann, Sven
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
German Conference on Bioinformatics (GCB)
Divisions
Data Mining and Machine Learning
Event Location
Jena, Germany
Event Type
Conference
Event Dates
20-22 September 2012
Series Name
OASIcs
ISSN/ISBN
978-3-939897-44-6
Publisher
Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Page Range
pp. 82-92
Date
20 September 2012
Official URL
https://doi.org/10.4230/OASIcs.GCB.2012.82