Recursive CTEs | Ben Cunningham

Recursive CTEs

Written by Ben Cunningham · on May 1, 2017

Something I’ve run into a lot lately are databases with transactional data that has been compressed into some relatively small number of observations. Most often, a start and end date is supplied for each observation, along with some implied frequency. Take for example the following:

DROP TABLE Transactions;
CREATE TABLE Transactions (
  "Customer" VARCHAR(10),
  "Payment"  DECIMAL(3, 2),
  "Start"    DATE,
  "End"      DATE
INSERT INTO Transactions ("Customer", "Payment", "Start", "End") VALUES
('Ben',     9.99, '2017-01-01', '2017-04-30'),
('Shelley', 4.50, '2016-11-15', '2017-01-14');

Let’s assume these are monthly payments. Shelley then, in this example, makes a payment of $4.50 on the 15th of every month from November to December (but not in January).

Between my coworkers and Stack Overflow, I’ve seen some efficient ways to “unpack” this data into multiple lines with SQL. However, I’ve become a bit obsessed with recursive CTEs lately, so I think the following solution is just kind of cool:

SELECT MAX((DATE_PART('YEAR',  "End") - DATE_PART('YEAR',  "Start")) * 12 +
           (DATE_PART('MONTH', "End") - DATE_PART('MONTH', "Start")))       "Max Month"
  FROM Transactions

Months(i) AS (
SELECT 0         "i"
SELECT x."i" + 1 "i"
  FROM Months x
 CROSS JOIN MaxMonths m
 WHERE x."i" < m."Max Month"

SELECT t."Customer"                           "Customer",
       t."Payment"                            "Payment",
       t."Start" + INTERVAL '1 MONTH' * m."i" "Date"
  FROM Transactions t
  LEFT JOIN Months  m ON t."Start" + INTERVAL '1 MONTH' * m."i" <= t."End";
##   Customer Payment                Date
## 1      Ben    9.99 2017-01-01 00:00:00
## 2      Ben    9.99 2017-02-01 00:00:00
## 3      Ben    9.99 2017-03-01 00:00:00
## 4      Ben    9.99 2017-04-01 00:00:00
## 5  Shelley    4.50 2016-11-15 00:00:00
## 6  Shelley    4.50 2016-12-15 00:00:00

The first CTE calculates the maximum difference (in months) across all Start and End observations in the set, . The second CTE (the recursive part) returns all . Finally, the root query joins each to the Transactions table where . From here, it’s a trivial task then to add months to Start.

Another oddity that I’ve recently tried addressing with recursive CTEs is non-atomic columns. With a bit of planning, it’s not too tricky to split and spread a delimited value into multiple rows.

DROP TABLE Products;
  "Supplier" VARCHAR(10),
  "Items"    VARCHAR(50)
INSERT INTO Products ("Supplier", "Items") VALUES
('Apple',   'iPhone 6S;Macbook Air'),
('Samsung', 'Galaxy S8;Galaxy Note 7;Galaxy Mega');

To get Products into a more normal form, we just need to repeatedly extract all of the characters to the left of the delimiter and bind each result to the set.

WITH RECURSIVE SplitItems("Supplier", "Item", "Items") AS (
SELECT "Supplier"                                                         "Supplier",
       LEFT("Items", STRPOS("Items" || ';', ';') - 1)                     "Item",
       OVERLAY("Items" PLACING '' FROM 1 FOR STRPOS("Items" || ';', ';')) "Items"
  FROM Products
SELECT "Supplier"                                                         "Supplier",
       LEFT("Items", STRPOS("Items" || ';', ';') - 1)                     "Item",
       OVERLAY("Items" PLACING '' FROM 1 FOR STRPOS("Items" || ';', ';')) "Items"
  FROM SplitItems
 WHERE "Items" != ''

SELECT "Supplier",
  FROM SplitItems;

Here, the first element of each observation of Items, say , is extracted into the Item field and superficially popped off its stack (in this case, by overlaying a zero-length string in its place in the delimited list). That query result (everything before UNION ALL) is ultimately bound with the result of (i.e., the second half of the CTE).

##   Supplier          Item
## 1    Apple     iPhone 6S
## 2  Samsung     Galaxy S8
## 3    Apple   Macbook Air
## 4  Samsung Galaxy Note 7
## 5  Samsung   Galaxy Mega

Note that I wrote everything in this post for a PostgreSQL database, but the concepts are certainly not system-specific. Be sure to let me know if you have other amusing (or serious) daily uses for this kind of recursion in your SQL work.