SPLC: RNA Splicing | Ben Cunningham

Problem by Rosalind · on July 23, 2012

After identifying the exons and introns of an RNA string, we only need to delete the introns and concatenate the exons to form a new string ready for translation.

Given: A DNA string s (of length at most 1 kbp) and a collection of substrings of s acting as introns. All strings are given in FASTA format.

Return: A protein string resulting from transcribing and translating the exons of s. (Note: Only one solution will exist for the dataset provided.)

Sample Dataset

>Rosalind_10
ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG
>Rosalind_12
ATCGGTCGAA
>Rosalind_15
ATCGGTCGAGCGTGT

Sample Output

MVYIADKQHVASREAYGHMFKVCA
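
As a quick check of the splicing step on the sample dataset, here is a minimal base-R sketch. It assumes the introns can be removed as literal substrings, one after another; the variable names (`s`, `introns`, `spliced`) are illustrative only.

```r
# Sample DNA string and its two introns, copied from the dataset above.
s <- paste0(
  "ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATG",
  "ATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG"
)
introns <- c("ATCGGTCGAA", "ATCGGTCGAGCGTGT")

# Fold over the introns, deleting each one from the running string.
# fixed = TRUE treats the intron as a literal string, not a regex.
spliced <- Reduce(
  function(x, intron) gsub(intron, "", x, fixed = TRUE),
  introns,
  s
)

nchar(s)        # 100
nchar(spliced)  # 75, i.e. 25 codons including the final stop codon
```

The two introns account for 10 + 15 = 25 bases, so the spliced string has 75 bases, which is a multiple of three as translation requires.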

R

library(dplyr)
library(readr)
library(seqinr)

f <- "splc.txt"
cod_f <- "rna_codon_table.csv"

cod <- read_csv(cod_f)

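# Read every FASTA record: the first is the DNA string, the rest are introns.
# tibble() replaces the deprecated data_frame().
dna <-
  tibble(
    raw = read.fasta(f, as.string = TRUE),
    id = names(raw),
    s = toupper(raw)  # read.fasta() lowercases DNA by default
  )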

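# Start from the full DNA string and delete each intron in turn.
# fixed = TRUE matches the intron literally rather than as a regex.
s <- Reduce(function(x, y) gsub(y, "", x, fixed = TRUE), dna$s)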

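# Transcribe (T -> U), split into codons, and look each codon up in the
# codon table, which is expected to have columns `codon` and `acid`.
prot <-
  tibble(
    codon =
      gsub("T", "U", s) %>%
      gsub("(.{3})", "\\1 ", .) %>%
      strsplit(" ") %>%
      unlist()
  ) %>%
  left_join(cod, by = "codon") %>%
  filter(acid != "Stop")  # drop the stop codon from the output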
  
cat(prot$acid, sep = "")
MVYIADKQHVASREAYGHMFKVCA
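
The script above depends on rna_codon_table.csv, whose contents are not shown here. As a hedged alternative, the sketch below builds the standard genetic code in base R and translates the spliced sample string without any external file. The names `codon_table` and `translate` are hypothetical helpers, not part of the original solution, and unlike the `filter()` call above this version stops at the first stop codon instead of dropping stop codons wherever they occur.

```r
# Standard genetic code, with codons ordered U, C, A, G at each position
# (first base varies slowest, third base fastest). "*" marks a stop codon.
bases <- c("U", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
acids <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
codon_table <- setNames(acids, codons)

# Translate an RNA string codon by codon, stopping at the first stop codon.
translate <- function(rna) {
  n <- nchar(rna) %/% 3                      # ignore any trailing partial codon
  starts <- seq(1, by = 3, length.out = n)
  aa <- unname(codon_table[substring(rna, starts, starts + 2)])
  stop_at <- match("*", aa)
  if (!is.na(stop_at)) aa <- aa[seq_len(stop_at - 1)]
  paste(aa, collapse = "")
}

# Spliced sample DNA (introns already removed), transcribed to RNA.
spliced <- paste0(
  "ATGGTCTACATAGCTGACAAACAGCACGTAGCATCTCGAGAGGCATATGGTCACATG",
  "TTCAAAGTTTGCGCCTAG"
)
protein <- translate(chartr("T", "U", spliced))
cat(protein, "\n")
```

For the sample dataset both approaches give the same protein string, because the only stop codon sits at the very end of the spliced sequence.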