Data science blog | Ben Cunningham

Data science blog

2017

Ingesting Excel Sheets Fast
September 2, 2017
I helped instruct an intro to R course at work last fall. One of the most common questions I’ve received from attendees since that time is, “How do I read a bunch of Excel files really fast?” More precisely, what this usually means is, “How do I append a bunch...
I recently started using the Private Internet Access VPN on Ubuntu. I followed the instructions on the PIA site, but noticed all of my connection attempts were timing out. Eventually, I ran across this great 30 second read from Eric von Foerster and realized the issue had to do with...
College football is almost here. In a fury of anticipation, I spent my weekend reimagining the information in the awesome “Video Boards of the Power 5 Conferences” infographics made by @JediASU (see the ACC, B1G, Big XII, Pac-12, and SEC images linked). I can’t take credit for the idea (the...
Remapped!
July 23, 2017
I like my ThinkPad. I bought an X220 a little under a year ago to replace my hulking T530 and it’s been great. I’ve especially enjoyed ditching Lenovo’s new chiclet console for a “regular” keyboard. But there are a few things about the old layout that I miss. First up:...
Recursive CTEs
May 1, 2017
Something I’ve run into a lot lately is databases with transactional data that has been compressed into some relatively small number of observations. Most often, a start and end date are supplied for each observation, along with some implied frequency. Take for example the following: DROP TABLE Transactions; CREATE TABLE...
Today I found an old email from Richard Stallman. Yes, Richard Stallman, president of the Free Software Foundation and all things idiosyncratic in computing. I had asked him about Death Grips and their latest album, NO LOVE DEEP WEB. And yeah, Death Grips, presidents of Third Worlds Records and all...
Since the release of tidytext, I’ve been a lot more interested in working with semi-unstructured text. Still, most of the language I have been interested in analyzing comes from my favorite movies and television shows. Subtitles, widely available online for free, provide an obvious bridge to that language data. When...
Visualizing Dead Presidents
January 20, 2017
I’ve heard quite a bit recently about President Trump being the oldest individual to take the executive oath. I’m not a big history buff, but all the hubbub did get me wondering about the timeline and ages of past presidents. I’ve also been looking for a chance to recreate the...

2016

What's the Risk?
November 26, 2016
I’ve been using R for a little over two years now. One of the most redeeming things about learning it has always been the little eureka moments along the way where I’ve used it to put some received wisdom to the test. Sometimes just stemming from daydreams of my undergrad...
Natural language processing, especially opinion mining, is (apparently) hot right now. It isn’t something that I’ve paid attention to in the past, but over the last six months, I’ve noticed more and more people talking about it. More specifically, I’ve heard more and more non-technical people asking about it (which...
The Setup
July 17, 2016
I’m finally settling into my new home in New York, but I just don’t feel my best unless I have a project to do. So in lieu of a more data-informed blog post, here’s my take on The Setup. What hardware do you use? I bought a Lenovo ThinkPad T530...
Almost two months ago today, I submitted my final undergraduate paper, Targeted Direct Mailings: An Application of Elastic Net Regularization to Marketing Strategy. In it, I presented a quantitative approach to marketing strategy for the Paralyzed Veterans of America group (PVA) based on the well-known KDD Cup 1998 dataset. While...
This morning marks the end of my foolhardy, week-long adventure that was migrating this site to Jekyll. If you are unfamiliar with the software, Jekyll is basically just a static site generator — but a very popular one, at that. Instead of using databases, it takes content (usually written in...
It’s spring break in Iowa City, but things aren’t any less collegiate around here — March Madness play-ins kick off in just under an hour and I’m pulling out all the stops for feckless sports analytics. From irresponsible win margin regressions to lousy Kaggle classifier entries, I’ve done my best...
Choropleth Maps with Leaflet
February 4, 2016
Like any loyal Hadleyverse user, I love ggplot2. Paired with ggmap, visualizing spatial data is just as much a breeze as traditional plotting with the package. Wrapping all this with a library like animation adds another neat layer of perspective. But short of developing a Shiny project, there’s not a...
It’s been a slow week. Frozen inside my apartment by another sub-zero Midwest winter, I’ve had a lot of time the past few days to rewatch some of my favorite television. Er, actually a lot of television — since Monday, I’ve marathoned through 88 (and counting) episodes of It’s Always...
The Fastest Bar Crawl
January 15, 2016
Starting tonight, my radio co-host is reliving a timeless adventure from her father’s university days — hitting up every bar in Iowa City in one weekend. In honor of the feat, and motivated by this post by my professor Sam Burer, I decided to try to find the path that...