SUBS: Finding a Motif in DNA | Ben Cunningham

SUBS: Finding a Motif in DNA

Problem by Rosalind · on July 1, 2012

Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).

The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of ‘U’ in “AUGCUUCAGAAAGGUCUUACG” are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
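As a quick check of the 1-based position convention, here is a minimal base-R sketch that recovers the positions of ‘U’ in the example string. The variable names `symbols` and `u_positions` are illustrative only and are not part of the original solution.

```r
s <- "AUGCUUCAGAAAGGUCUUACG"

# Split the string into single characters; R vectors are 1-based,
# which matches the problem's definition of position.
symbols <- strsplit(s, "")[[1]]

# Indices of every symbol equal to "U"
u_positions <- which(symbols == "U")
print(u_positions)
```

Because R indexes from 1, the result lines up with the positions listed above without any offset.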

A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = “AUGCUUCAGAAAGGUCUUACG”, then s[2:5] = “UGCU”.
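In R, the start-and-end representation maps directly onto `substr()`, which takes inclusive 1-based start and stop positions. A small sketch, assuming the example substring “UGCU” spans positions 2 through 5 (the name `piece` is just for illustration):

```r
s <- "AUGCUUCAGAAAGGUCUUACG"

# substr(x, start, stop) uses inclusive, 1-based positions
piece <- substr(s, 2, 5)
print(piece)
```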

The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).

Given: Two DNA strings s and t (each of length at most 1 kbp).

Return: All locations of t as a substring of s.

Sample Dataset

GATATATGCATATACTT
ATAT

Sample Output

2 4 10
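Before reaching for regular expressions, the definition can be checked directly with a sliding window: take every window of the motif's length and keep the start positions where the window equals the motif. This is a minimal base-R sketch, not the approach used in this post; `find_locations` is a hypothetical helper name.

```r
# Hypothetical helper (not from the original post): return every 1-based
# start position at which motif t occurs in s, overlaps included.
find_locations <- function(s, t) {
  n <- nchar(s)
  m <- nchar(t)
  # An empty motif or one longer than s has no locations
  if (m == 0 || m > n) return(integer(0))
  starts <- seq_len(n - m + 1)
  # substring() is vectorised over start/stop, so this yields every
  # window of length m in a single call
  windows <- substring(s, starts, starts + m - 1)
  starts[windows == t]
}

locations <- find_locations("GATATATGCATATACTT", "ATAT")
cat(locations, sep = " ")
```

With inputs of at most 1 kbp, comparing every window is cheap, so the simple version is likely sufficient here.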

R

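library(magrittr)

f <- "subs.txt"

# Read the file once: line 1 is the DNA string s, line 2 is the motif t.
input <- readLines(f)
s <- input[1]
t <- input[2]

# Wrapping t in a zero-width lookahead lets the regex match at every start
# position, so overlapping occurrences are reported too. gregexpr() returns
# 1-based positions, which is what the problem asks for.
sprintf("(?=%s)", t) %>%
  gregexpr(s, perl = TRUE) %>%
  unlist() %>%
  cat(sep = " ")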
2 4 10
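The lookahead matters because `gregexpr()` otherwise returns only non-overlapping matches: after matching at position 2, the search resumes past the end of that match and misses the occurrence at position 4. A short sketch of the contrast on the sample data (the names `plain` and `lookahead` are illustrative):

```r
s <- "GATATATGCATATACTT"
t <- "ATAT"

# Plain pattern: each match consumes its characters, so overlaps are skipped
plain <- as.integer(gregexpr(t, s, perl = TRUE)[[1]])

# Zero-width lookahead: nothing is consumed, so every start position is tested
lookahead <- as.integer(gregexpr(sprintf("(?=%s)", t), s, perl = TRUE)[[1]])

cat("plain:    ", plain, "\n")
cat("lookahead:", lookahead, "\n")
```

Note that `gregexpr()` returns -1 when there is no match at all, so a stricter version would filter that value out before printing.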