Structure of the space of taboo-free sequences

Abstract

Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition site can be lethal. Motivated by this observation, we study the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. The taboo-set is referred to as T, and any allowed string is called a taboo-free string. We consider the so-called Hamming graph Γ_n(T), whose vertices are the taboo-free strings of length n and whose edges connect two taboo-free strings if their Hamming distance equals one. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids taboos. We describe the construction of the vertex set of Γ_n(T). Then we state conditions under which Γ_n(T) and its suffix subgraphs are connected. Moreover, we provide an algorithm that determines whether all these graphs are connected for an arbitrary T. As an application of the algorithm, we show that about 87% of bacteria listed in REBASE have a taboo-set that induces connected taboo-free Hamming graphs, because they have fewer than four type II restriction enzymes. On the other hand, four properly chosen taboos are enough to disconnect one suffix subgraph, and consequently the connectivity of taboo-free Hamming graphs could change depending on the composition of restriction sites.
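The definition of the taboo-free Hamming graph can be illustrated with a small brute-force sketch. This is not the algorithm from the paper, which is not reproduced in this record; it simply enumerates all strings of length n, discards those containing a taboo, and checks connectivity by breadth-first search. The function names (`taboo_free_strings`, `neighbors`, `is_connected`) are hypothetical, and the approach is only feasible for small n because it enumerates every string over the alphabet.

```python
from collections import deque
from itertools import product


def taboo_free_strings(alphabet, taboos, n):
    """Return the set of length-n strings that contain no taboo as a substring."""
    words = ("".join(p) for p in product(alphabet, repeat=n))
    return {w for w in words if not any(t in w for t in taboos)}


def neighbors(word, alphabet, vertices):
    """Yield taboo-free strings at Hamming distance one from `word`."""
    for i, old in enumerate(word):
        for new in alphabet:
            if new != old:
                candidate = word[:i] + new + word[i + 1:]
                if candidate in vertices:
                    yield candidate


def is_connected(alphabet, taboos, n):
    """Check by breadth-first search whether the taboo-free Hamming graph is connected.

    Assumption: an empty vertex set is treated as connected.
    """
    vertices = taboo_free_strings(alphabet, taboos, n)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for other in neighbors(word, alphabet, vertices):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(vertices)
```

For example, over the binary alphabet "ab" with taboos "ab" and "ba", the only taboo-free strings of length 2 are "aa" and "bb", which differ in two positions, so `is_connected("ab", {"ab", "ba"}, 2)` reports a disconnected graph.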

Authors
  • Manuel, Cassius
  • von Haeseler, Arndt
Shortfacts
Category: Journal Paper
Divisions: Bioinformatics and Computational Biology
Journal or Publication Title: Journal of Mathematical Biology
ISSN: 0303-6812
Publisher: Springer Nature
Place of Publication: Heidelberg
Page Range: pp. 1029-1057
Volume: 81
Date: 17 September 2020