Introduction
Recursion in SQL is a lesser known technique, this technique is just like we expect as it would work in any developing language such as C, Javascript, …, it allows us to create a query that will be able to recursively get and process data to find a result. Think of these examples as possible use cases:
creating a calendar
finding the dates of easter
finding the shortest travel route through a country
Counting days
and many many more. Also please note that in this article I will make use of the Oracle DB the keywords used here are different then in MSSQL but the techniques are the same.
Keywords
Before we can start with learning how to write recursive queries, we first need to know the keywords that we can use.
(START WITH) CONNECT BY
The definition is: (START WITH condition_1) CONNECT BY condition_2 where the START WITH clause is optional.
START WITH specifies rows that are the root(s) of the hierarchical query.
CONNECT BY defines the relationship between the parent rows and the child rows of the hierarchy. it must include the keyword PRIOR when referring to the parent row.
We then have some special cases of the CONNECT BY keyword:
CONNECT_BY_PATH: This will merge the results together of a CONNECT BY with the given delimiter. Example: connect_by_path(country, ‘;’)
CONNECT_BY_ROOT: Unary operator only valid in hierarchical queries, this returns the column value using data from the root row when a column is qualified with this operator.
CONNECT_BY_ISLEAF: pseudocolumn that returns 1 when the current row is a leaf of the tree defined by the CONNECT_BY condition, else it returns 0
CONNECT_BY_ISCYCLE: pseudocolumn that returns 1 if the current row has a child which is also its ancestor, else it returns 0 (Note: You have to use the NOCYCLE parameter of the CONNECT_BY clause to use this)
level
level is a pseudocolumn in Oracle that can be used in 2 different ways:
Use it as the keyword for identifying the hierarchy level in a numeric format. 1 = root row, 2 = child of a root, …
The second way is to use it as a ‘n’ row generator, this will increment level for each time the query is looped.
DUAL
Not a table needed in recursion, but used in this article. We use this so we can create data without the need of an actual table.
Basic queries
Creating a simple counter
the purpose of this simple exercise is to get to know the structure and the base keywords of writing recursive queries. That’s why it is always useful to start with one of the most basic queries out there.
We want to be able to create a counter recursively that will count from 0 to 10 by just using SQL. The result would then be 11 rows stating 0 to 10.
To write this query we need to think of how recursion works properly. This works by writing a SELECT clause that will select data, if we want to select data from our recursive part then we need to use the keyword level this keyword can as explained before be used as a ‘n’ row generator.
Then after we created the SELECT we will say: “Run this query as long as it is smaller then 12” (12 because 11 - 1 = 10, which is the last value we need). We do this by writing CONNECT BY level <= 11. This will now loop the query as long as level is not 11, it will then increment level every time the query is looped.
Query
SELECT level - 1 FROM dual CONNECT BY level <= 11;
Finding all remaining days of this month
Almost the same as the previous query, we select the current date, we add the level column to it and we substract with 1 so that we can get level as 0 (since we want today to).
Then we say, as long as this is smaller or equal to the last day of the month, loop.
Query
SELECT sysdate + level - 1 FROM dual CONNECT BY EXTRACT(DAY FROM sysdate) + level - 1 <= EXTRACT(DAY FROM LAST_DAY(sysdate))
Counting every sunday of the month for the year 2014
To show how we construct this query, I will create every step in detail.
First we will write a query that can show us the first day of the year 2014
SELECT TO_DATE('01/01/2014', 'DD/MM/YYYY') FROM dual;
Secondly, we will need to say: Loop this query while it is not the last day of the year (which is always the 31st of December). So we use a recursive query here that will add (level - 1) to set level to 0 to the 1st of January:
SELECT TO_DATE('01/01/2014', 'DD/MM/YYYY') + (level - 1) FROM dual CONNECT BY TO_DATE('01/01/2014', 'DD/MM/YYYY') + (level - 1) <= LAST_DAY(TO_DATE('31/12/2014', 'DD/MM/YYYY'));
On the end, we group by the year and the month and we perform a count on the day where the day is a Sunday (so to_char(date, ‘d’) = 7:
SELECT 2014 year , EXTRACT(MONTH FROM TO_DATE('01/01/2014', 'DD/MM/YYYY') + (level - 1)) month , COUNT(CASE WHEN TO_CHAR(TO_DATE('01/01/2014', 'DD/MM/YYYY') + (level - 1), 'd') = 7 THEN 1 END) sundays FROM dual CONNECT BY TO_DATE('01/01/2014', 'DD/MM/YYYY') + (level - 1) <= LAST_DAY(TO_DATE('31/12/2014', 'DD/MM/YYYY')) GROUP BY 2014, EXTRACT(MONTH FROM TO_DATE('01/01/2014', 'DD/MM/YYYY') + (level - 1)) ORDER BY month ASC;
START WITH, PRIOR and SYS_CONNECT_BY_PATH queries
Merging all synonyms from a country into one column delimited by ;
Let’s say we are given this as a begin:
WITH x AS ( SELECT regios.name country , synoniemen.name , row_number() OVER(PARTITION BY eid ORDER BY dup) amount , count(1) OVER(PARTITION BY eid) total FROM synonyms JOIN regios ON eid = cid AND parent = 'EUR' ) SELECT country, ltrim(sys_connect_by_path(name, ';'), ';') names FROM x
Because we are using SYS_CONNECT_BY_PATH we will need to use CONNECT BY. The thing we want to do is merge every column of the same country with each other.
So in our CONNECT BY we will say that the previous country is equal to the current country and the amount of synonyms is 1 less of the current amount. This results into: CONNECT BY PRIOR country = country AND PRIOR amount = amount - 1
We also say that the amount needs to start with 1, since this is our parent. So we type: START WITH amount = 1
And last but not least, we want just the row that has everything merged together, so we say: WHERE level = total.
Query
WITH x AS ( SELECT regios.name country , synoniemen.name , row_number() OVER(PARTITION BY eid ORDER BY dup) amount , count(1) OVER(PARTITION BY eid) total FROM synonyms JOIN regios ON eid = cid AND parent = 'EUR' ) SELECT country, ltrim(sys_connect_by_path(name, ';'), ';') names FROM x WHERE level = total START WITH amount = 1 CONNECT BY PRIOR country = country and PRIOR amount = amount - 1;
Hierarchical Queries
Selecting the leaf elements from belgium
We start with the parent is Belgium, and then we loop for every parent is the previous hasc. Then when CONNECT_BY_ISLEAF=1 we display it.
Query
SELECT name, niveau, level - 1 FROM regios WHERE CONNECT_BY_ISLEAF = 1 START WITH parent = 'BE' CONNECT BY parent = PRIOR hasc ORDER BY parent, name;
Select every city that falls under the City Ghent
For this we will again use a recursive query, we use START WITH name = ‘Ghent’ and we will connect by parent = PRIOR hasc again.
Query
SELECT hasc, name, niveau, level FROM regios START WITH name = 'Ghent' CONNECT BY parent = PRIOR hasc;
Upward recursion, select every entity above where hasc = ‘BE’
We do exactly the same as going down, we only set the PRIOR different.
Query
SELECT SELECT hasc, niveau, level, connect_by_isleaf FROM regios START WITH hasc = 'BE' CONNECT BY PRIOR parent = hasc;
Select the cities under hasc ‘BE.OV’ and add identation
For this we will use the function LPAD, LPAD is going to add a padding from the left with the given length and the string1 where it will add to. We use a ‘.’ and a length of 10 so that this shows a proper identation where needed.
Then the rest is again the same. We however only apply identation to where the level > 0
Query
SELECT LPAD('.', 10 * (level - 1), '.') || name || ' (' || hasc || ')' FROM regios START WITH parent = 'BE.OV' CONNECT BY PRIOR hasc = parent;
CONNECT_BY_ROOT and SYS_CONNECT_BY_PATH functions
Creating a recursive query to create a path of the hierarchy.
Let’s create a query that will select every city of Belgium and connect the hierarchy in Belgium with a /. This all on 1 line. We then will check if the ROOT entity of this hierarchy is Belgium by using the CONNECT_BY_ROOT function.
To do this we will select the name, the level and the connect_by_root as columns and we will also create the parents column. This last column is special because we will use the sys_connect_by_path function to connect every name together with a delimiter that we give it.
Then we filter only the leaves and we start with the name = Belgie as our first element. We then use CONNECT BY to recursively loop every result as long as the parent is the previous hasc.
Query
SELECT name , ltrim(sys_connect_by_path(name, '/'), '/') parents , level , connect_by_root name
Comments (0)
Sign in to post comments.