# GRPH: Overlap Graphs

A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail and head is represented by (but *not* by ). A directed loop is a directed edge of the form .

For a collection of strings and a positive integer , the overlap graph for the strings is a directed graph in which each string is represented by a node, and string is connected to string with a directed edge when there is a length suffix of that matches a length prefix of , as long as ; we demand to prevent directed loops in the overlap graph (although directed cycles may be present).

Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.

Return: The adjacency list corresponding to . You may return edges in any order.

## Sample Dataset

```
>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
```

## Sample Output

```
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
```

# R

```
library(dplyr)
library(seqinr)
library(stringr)
library(tidyr)
f <- "grph.txt"
raw <- unlist(read.fasta(f, as.string = TRUE))
dna <- data_frame(
id = names(raw),
x = toupper(raw),
beg = str_extract(x, "^.{3}"),
end = str_extract(x, ".{3}$")
)
adj <-
dna %>%
inner_join(dna, by = c("end" = "beg")) %>%
filter(id.x != id.y) %>%
mutate(out = paste(id.x, id.y))
cat(adj$out, sep = "\n")
```

```
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
```