TRAN: Transitions and Transversions | Ben Cunningham

Problem by Rosalind · on December 4, 2012

For DNA strings $s_1$ and $s_2$ having the same length, their transition/transversion ratio $R(s_1, s_2)$ is the ratio of the total number of transitions to the total number of transversions, where symbol substitutions are inferred from mismatched corresponding symbols as when calculating Hamming distance (see “Counting Point Mutations”).

Given: Two DNA strings $s_1$ and $s_2$ of equal length (at most 1 kbp).

Return: The transition/transversion ratio $R(s_1, s_2)$.
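
For reference, a transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), while a transversion swaps a purine for a pyrimidine or the reverse. The definition can be sketched in base R as below. This is only an illustrative sketch: `tran_ratio` is a hypothetical helper name, and it assumes both inputs contain only the symbols A, C, G and T.

```r
# Hypothetical helper for illustration; assumes inputs contain only A, C, G, T.
tran_ratio <- function(s1, s2) {
  stopifnot(nchar(s1) == nchar(s2))

  # Split each string into single upper-case bases.
  a <- strsplit(toupper(s1), "")[[1]]
  b <- strsplit(toupper(s2), "")[[1]]

  # Positions where the two strings differ (as in Hamming distance).
  mismatch <- a != b

  # A mismatch is a transition when both bases are purines (A, G)
  # or both are pyrimidines (C, T), i.e. they share purine status.
  purines <- c("A", "G")
  transition <- mismatch & ((a %in% purines) == (b %in% purines))

  # Transitions divided by transversions (the remaining mismatches).
  sum(transition) / sum(mismatch & !transition)
}

# A->G, G->A and C->T are transitions; T->G is a transversion.
tran_ratio("AGCT", "GATG")  # 3
```

If the strings contain no transversions, the division yields `Inf` (or `NaN` when there are no mismatches at all), so a caller may want to handle that case separately.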

Sample Dataset

>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT


Sample Output

1.21428571429


R

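library(dplyr)
library(purrr)
library(seqinr)

f <- "tran.txt"

# Read both FASTA records. seqinr lowercases DNA by default, so convert
# back to upper case and split each sequence into single bases.
# (tibble() replaces the deprecated data_frame().)
dna <- tibble(
  raw = read.fasta(f, as.string = TRUE),
  dna =
    toupper(raw) %>%
    strsplit(split = "")
)

# Pair the bases position by position and flag each pair as a transition
# (both purines, A/G, or both pyrimidines, C/T), then keep mismatches only.
s <-
  tibble(
    a = unlist(dna$dna[1]), b = unlist(dna$dna[2]),
    t = map2_lgl(a, b, function(x, y) {
      all(c(x, y) %in% c("A", "G")) || all(c(x, y) %in% c("C", "T"))
    })
  ) %>%
  filter(a != b)

# Ratio of transitions to transversions among the mismatched positions.
cat(sum(s$t) / (nrow(s) - sum(s$t)))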

1.214286
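
The printed value agrees with the sample output but shows fewer digits, because `cat` prints at most 7 significant digits by default. If more digits are wanted, something like the following sketch could be used. The value `17 / 14` is only a stand-in that matches the sample output to the digits shown; in the script above it would be the computed ratio.

```r
# Stand-in for the computed ratio; 17 / 14 matches the sample output
# to the digits shown and is used here purely for illustration.
r <- 17 / 14

# Default formatting: up to 7 significant digits.
cat(r, "\n")    # 1.214286

# Fixed formatting with 11 decimal places, as in the sample output.
out <- sprintf("%.11f", r)
cat(out, "\n")  # 1.21428571429
```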