Structure of the space of taboo-free sequences

Abstract

Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition sequence can be lethal. Motivated by this observation, we study the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. The taboo-set is referred to as T and any allowed string as a taboo-free string. We consider the graph Γ_n(T) whose vertices are taboo-free strings of length n and whose edges connect two taboo-free strings if their Hamming distance equals 1. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids deleterious taboos. We describe the construction of the vertex set of Γ_n(T). Then we state conditions under which Γ_n(T) and its suffix subgraphs are connected. Moreover, we provide a simple algorithm that can determine, for an arbitrary T, whether all these graphs are connected. We conclude that bacterial taboo-free Hamming graphs are nearly always connected, although four properly chosen taboos are enough to disconnect one of their suffix subgraphs.
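To make the graph in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm from the paper: it enumerates every string of length n, keeps those that contain no taboo as a substring, and runs a breadth-first search over single-letter substitutions. The function names (`taboo_free_strings`, `is_connected`) and the convention that an empty graph counts as connected are assumptions made for this illustration, and the approach is only practical for small n.

```python
from collections import deque
from itertools import product


def taboo_free_strings(alphabet, taboos, n):
    """List all length-n strings over `alphabet` with no taboo as a substring."""
    strings = ("".join(p) for p in product(alphabet, repeat=n))
    return [s for s in strings if not any(t in s for t in taboos)]


def is_connected(alphabet, taboos, n):
    """Check by BFS whether the taboo-free Hamming graph of length n is connected.

    Two taboo-free strings are adjacent if they differ in exactly one position.
    Assumption for this sketch: a graph with no vertices counts as connected.
    """
    vertices = set(taboo_free_strings(alphabet, taboos, n))
    if not vertices:
        return True
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        # Try every single-letter substitution and keep the taboo-free ones.
        for i, current in enumerate(s):
            for letter in alphabet:
                if letter == current:
                    continue
                neighbor = s[:i] + letter + s[i + 1:]
                if neighbor in vertices and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return len(seen) == len(vertices)


if __name__ == "__main__":
    # Toy binary alphabet: forbidding both "AB" and "BA" leaves only "AA" and "BB".
    print(is_connected("AB", {"AB", "BA"}, 2))
    # Forbidding only "AB" leaves "AA", "BA", "BB", linked through "BA".
    print(is_connected("AB", {"AB"}, 2))
```

In the first toy case the two remaining strings are at Hamming distance 2, so no walk of single substitutions joins them; this is the kind of disconnection the abstract refers to.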

Authors
  • Manuel, Cassius
  • von Haeseler, Arndt
Shortfacts
Category
Journal Paper
Divisions
Bioinformatics and Computational Biology
Journal or Publication Title
bioRxiv Cold Spring Harbor Laboratory
ISSN
none
Publisher
bioRxiv Cold Spring Harbor Laboratory
Place of Publication
New York
Number
824847
Date
30 October 2020