MKVToolNix and R | Ben Cunningham

MKVToolNix and R

Written by Ben Cunningham · on March 23, 2018

About a year ago, I started remuxing a lot of my videos (TV shows and movies). I mainly wanted to clean up subtitle track names and remove unwanted audio tracks, so MKVToolNix was my go-to. For movies, the GUI was sufficient, but converting an entire series one file at a time was a pain.

This seemed like something I could speed up with R, so I’ve developed the idioms below to help. Because every set of files has its own problems, building them out into a package doesn’t seem right. However, I do think there’s value in sharing my most common workflow.

library(tidyverse)

#' Fetch MKV track information
#'
#' The three functions below fetch track information for one or more MKV
#' files by calling the `mediainfo` command-line tool, which must be on the
#' PATH. What I ultimately want is a tidy data frame with one row per track
#' that I can filter, mutate, etc.
mediainfo <- function(file, verbose = TRUE) {
  purrr::map_df(file, mediainfo_file, verbose = verbose)
}

mediainfo_file <- function(file, verbose) {
  
  if (verbose) message(sprintf('Getting MediaInfo "%s"...', file))
  
  # shQuote() protects file names containing spaces, quotes, or `$`
  cmd <- sprintf("mediainfo --Output=XML %s", shQuote(file))
  raw <- paste(system(cmd, intern = TRUE), collapse = "\n")
  
  xml <- xml2::read_xml(raw)
  # rvest::xml_nodes() is deprecated; xml2 can select the <track> nodes itself
  track <- xml2::xml_find_all(xml, ".//track")
  
  purrr::map_df(track, mediainfo_track, verbose = verbose) %>%
    tidyr::fill(Complete_name)
  
}

mediainfo_track <- function(x, verbose) {
  
  type <- xml2::xml_attr(x, "type")
  if (type == "Menu") {
    if (verbose) message(sprintf('MediaInfo track type "%s" skipped.', type))
    return(NULL)
  }
  
  child <- xml2::xml_children(x)
  y <- xml2::xml_text(child)
  
  stats::setNames(y, xml2::xml_name(child)) %>%
    t() %>%
    tibble::as_tibble() %>%
    dplyr::mutate(type = type)
  
}


#' Individual directory workflow
#' 
#' Everything below gets applied to an individual directory (i.e., one
#' series or season) and gets updated depending on the type of remuxing
#' to be done.
setwd("~/Videos/Show")
dir.create("Remuxed")

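# Get files to process (MKV files only, so the new "Remuxed" directory is
# not picked up) and track info; mediainfo() is already vectorized over files
f <- list.files(".", pattern = "\\.mkv$", ignore.case = TRUE)
info <- mediainfo(f)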

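# Set track-level options. The filter drops English audio tracks (and audio
# tracks with no Language tag, because that comparison yields NA); adjust it
# to match whichever tracks you want removed.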
tracks_df <-
  info %>%
  filter(!(type == "Audio" & Language == "English")) %>%
  mutate(
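    # mkvmerge track IDs are zero-based, while MediaInfo IDs start at 1
    tid = as.integer(ID) - 1,
    # Quote the value so language names containing spaces survive the shell
    track_name = ifelse(
      !is.na(Language),
      sprintf("--track-name %s", shQuote(sprintf("%s:%s", tid, Language))),
      NA
    ),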
    audio_track = ifelse(type == "Audio", tid, NA)
  )

# Aggregate multi-track options
files_df <-
  tracks_df %>%
  group_by(Complete_name) %>%
  summarize(
    track_names =
      track_name[!is.na(track_name)] %>%
      paste(collapse = " "),
    audio_tracks =
      audio_track[!is.na(audio_track)] %>%
      paste(collapse = ",") %>%
      sprintf("--audio-tracks %s", .)
  )

# Set file name, episode title, and format mkvmerge command
cmd_df <-
  files_df %>%
  mutate(
    new_name = sprintf("Remuxed/%s", Complete_name),
    ep = str_extract(new_name, "S\\d{2}E\\d{2}"),
    title = sprintf("--title 'My Show (%s)'", ep),
    # Quote both paths so file names containing spaces are passed intact
    cmd = sprintf(
      "mkvmerge -o %s %s %s %s %s",
      shQuote(new_name),
      title,
      track_names,
      audio_tracks,
      shQuote(Complete_name)
    )
  )

# Run each mkvmerge command
walk(cmd_df$cmd, system)
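
Before running anything against a whole season, it can help to preview the commands first. The sketch below is a minimal illustration and not part of the workflow above: `build_cmd()`, `dry_run`, and the toy `files` data frame are hypothetical names, and the file names and options are made up. It quotes both paths with `shQuote()` and only prints the commands unless `dry_run` is switched off.

```r
# Toy stand-in for cmd_df (hypothetical file names and options)
files <- data.frame(
  Complete_name = c("My Show S01E01.mkv", "My Show S01E02.mkv"),
  options = c("--audio-tracks 1", "--audio-tracks 1,2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one mkvmerge command per input file,
# quoting both paths so that spaces survive the shell
build_cmd <- function(input, options, out_dir = "Remuxed") {
  output <- file.path(out_dir, basename(input))
  sprintf("mkvmerge -o %s %s %s", shQuote(output), options, shQuote(input))
}

cmds <- build_cmd(files$Complete_name, files$options)

dry_run <- TRUE  # assumption: flip to FALSE once the output looks right
if (dry_run) {
  # Print the commands for inspection without executing them
  writeLines(cmds)
} else {
  # system() returns the exit status; a non-zero status flags a command
  # that is worth checking by hand
  status <- vapply(cmds, system, integer(1))
  print(cmds[status != 0])
}
```

Keeping the command construction in one small function also makes it easy to check a single file by hand before looping over the rest.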