MRNA: Inferring mRNA from Protein
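For positive integers $a$ and $n$, $a$ modulo $n$ (written $a \bmod n$ in shorthand) is the remainder when $a$ is divided by $n$. For example, $29 \bmod 11 = 7$ because $29 = 11 \times 2 + 7$.
Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect to the modulo operation. We say that $a$ and $b$ are congruent modulo $n$ if $a \bmod n = b \bmod n$; in this case, we use the notation $a \equiv b \pmod{n}$.
Two useful facts in modular arithmetic are that if $a \equiv b \pmod{n}$ and $c \equiv d \pmod{n}$, then $a + c \equiv b + d \pmod{n}$ and $a \times c \equiv b \times d \pmod{n}$. To check your understanding of these rules, you may wish to verify these relationships for $a = 29$, $b = 73$, $c = 10$, $d = 32$, and $n = 11$.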
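The sum and product rules can be checked numerically. The sketch below uses illustrative values (29, 73, 10, 32, with modulus 11) chosen so that both premises hold; the helper `congruent` is a hypothetical name introduced only for this check.

```r
# Illustrative values, assumed for this check.
# The third value is named cc to avoid masking base::c.
a <- 29; b <- 73; cc <- 10; d <- 32; n <- 11

# Hypothetical helper: TRUE when x and y leave the same remainder modulo n
congruent <- function(x, y, n) (x %% n) == (y %% n)

# Premises: a is congruent to b, and cc is congruent to d, modulo n
premises <- congruent(a, b, n) && congruent(cc, d, n)

# Consequences: sums and products stay congruent
sum_rule <- congruent(a + cc, b + d, n)
product_rule <- congruent(a * cc, b * d, n)

cat(premises, sum_rule, product_rule, "\n")  # prints: TRUE TRUE TRUE
```

The product rule is the one that matters for this problem: it is what allows a long product to be reduced after every multiplication without changing the final remainder.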
As you will see in this exercise, some Rosalind problems will ask for a (very large) integer solution modulo a smaller number to avoid the computational pitfalls that arise with storing such large numbers.
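To make the pitfall concrete: R stores ordinary numbers as double-precision floats, which represent integers exactly only up to 2^53. The sketch below shows the loss of precision just past that limit, then a running product that is reduced after every multiplication so it never grows large. The function name `mod_product` and its `modulus` parameter are assumptions made for this illustration.

```r
# Past 2^53, adjacent integers are no longer distinguishable as doubles
print(2^53 + 1 == 2^53)  # TRUE

# Hypothetical helper: product of `factors` modulo `modulus`,
# reducing after each multiplication so intermediates stay below
# modulus * max(factors)
mod_product <- function(factors, modulus = 1e6) {
  Reduce(function(acc, f) (acc * f) %% modulus, factors, 1)
}

# 6^8 = 1679616, so the remainder modulo 1,000,000 is 679616
print(mod_product(rep(6, 8)))  # 679616
```

By the product rule for congruences, reducing at each step gives the same remainder as reducing the exact product once at the end.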
Given: A protein string of length at most 1000 aa.
Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don’t neglect the importance of the stop codon in protein translation.)
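    library(dplyr)
    library(readr)

    f <- "mrna.txt"
    cod_f <- "rna_codon_table.csv"

    # Count the codons for each amino acid (stop codons are labelled "Stop")
    cod <- read_csv(cod_f)$acid %>%
      table(prot = .) %>%
      as_tibble()

    # Split the protein string into single residues
    prot <- readLines(f) %>%
      strsplit(split = "") %>%
      unlist()

    # Append the stop codon, then look up the codon count for every position
    df <- tibble(prot = c(prot, "Stop")) %>%
      left_join(cod, by = "prot")

    # Multiply the counts, reducing modulo 1,000,000 at every step
    n <- Reduce(function(x, y) (x * y) %% 1E6, df$n)
    cat(n)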
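As a cross-check that does not depend on the codon table file, the same count can be sketched in base R with the codon multiplicities of the standard genetic code written out by hand. The names `codon_counts`, `stop_codons`, and `count_rna` are hypothetical and introduced only for this example; the 20 amino-acid counts plus the 3 stop codons sum to 64, the total number of codons.

```r
# Number of RNA codons per amino acid in the standard genetic code
codon_counts <- c(A = 4, C = 2, D = 2, E = 2, F = 2, G = 4, H = 2, I = 3,
                  K = 2, L = 6, M = 1, N = 2, P = 4, Q = 2, R = 6, S = 6,
                  T = 4, V = 4, W = 1, Y = 2)
stop_codons <- 3  # UAA, UAG, UGA

# Hypothetical helper: number of RNA strings that translate to `protein`,
# including the terminating stop codon, modulo `modulus`
count_rna <- function(protein, modulus = 1e6) {
  residues <- strsplit(protein, "")[[1]]
  # Fail early on characters that are not one-letter amino-acid codes
  stopifnot(all(residues %in% names(codon_counts)))
  factors <- c(unname(codon_counts[residues]), stop_codons)
  Reduce(function(acc, f) (acc * f) %% modulus, factors, 1)
}

# M has 1 codon, A has 4, and there are 3 stop codons: 1 * 4 * 3 = 12
print(count_rna("MA"))  # 12
```

Hard-coding the counts trades flexibility for self-containment; reading them from a table, as in the solution above, is easier to adapt to a non-standard genetic code.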