12th Feb 2021 15 minutes read

6 Advanced SQL Queries for Analyzing Financial Data

data analysis

Table of Contents

Introducing the Database
6 Examples of Advanced SQL Queries for Financial Analysis
How Do You Like Using Advanced SQL Queries in Financial Data Analysis?

Are you an advanced SQL user who’s new to finance? Do you want to learn how to use your knowledge to analyze financial data? The article will give you some guidance.

You probably know that SQL is a potent tool for analyzing all sorts of data. The bigger and more complex data, the more beneficial SQL becomes. In certain situations, financial data can be very complicated; producing a sophisticated analysis requires sophisticated tools. SQL is one of these tools, and the better you know some advanced SQL practices, the easier it becomes to perform complex financial data analyses and create financial reports. I’m going to show you some examples of how to do that, drawn from my own experience.

The SQL knowledge level required in this article – such as window functions, GROUP BY extensions, and recursive queries – is covered in the LearnSQL.com Advanced SQL track.

Introducing the Database

All six examples will use this database, which consists of five tables:

country
card_type
customer
card_number
card_transaction

We’ll imagine that this is the database of a credit card processing company and you’re their go-to employee when it comes to analyzing financial data.

Now, let’s get into the details of each table.

The table country has the following attributes:

id – The country’s ID and the primary key (PK) of this table.
country_name – The name of the country.

The next table is card_type, with these attributes:

id – The card type’s ID; the primary key (PK).
card_type_name – The card type’s name.

The table customer consists of the columns:

id – The customer’s ID; the primary key (PK).
NIN – The customer’s national identification number.
first_name – The customer’s first name.
last_name – The customer’s last name.
country_id – The country’s ID; this is a foreign key (FK) that references the table country.

Next comes the card_number table with the attributes:

id – The card’s ID; the primary key (PK).
card_number – The card number.
customer_id – The customer’s ID; a foreign key (FK) that references the table customer.
card_type_id – The card type’s ID; a foreign key (FK) that references the table card_type.

The last is the card_transaction table, which has these columns:

id – The transaction’s ID; the primary key (PK).
date – The transaction date.
amount – The transaction amount, in dollars.
card_number_id – The card’s ID; a foreign key (FK) that references the table card_number.

Now that you’re familiar with the database, let’s move on to the examples!

6 Examples of Advanced SQL Queries for Financial Analysis

1: Group Data by Year and Quarter

Financial data usually needs to be grouped in specific time periods or time buckets. You’ll certainly need to group data into years and quarters if you’re creating financial reports. I’m going to show you how to do that. Learning this principle will allow you to group data on any other level (i.e. months, weeks, or days, depending on the data you have).

In this example, I’ll group data by years and quarters and show the number of transactions. To achieve that, I’ll use the table card_transaction and this code:

SELECT	EXTRACT(YEAR FROM date) AS year,
		EXTRACT(QUARTER FROM date) AS quarter,
		COUNT(amount) AS number_of_transactions
FROM card_transaction
GROUP BY EXTRACT(YEAR FROM date), EXTRACT(QUARTER FROM date)
ORDER BY EXTRACT(YEAR FROM date) ASC, EXTRACT(QUARTER FROM date);

The code uses the function EXTRACT() to get the years and quarters. This is a standard SQL function, which will work in MySQL and PostgreSQL. However, if you’re using SQL Server, you’ll have to use a different function – I’ll explain which one in a moment.

When you use the EXTRACT() function, you’re required to specify the period you want and the column this function will use to return the desired period. First, I want to get years from the date column. Then I want quarters, also from the date column. After that, I have to count the number of transactions, which I did using the COUNT() function. The data is grouped by the EXTRACT() functions, which represent years and quarters. Finally, the data is ordered by years and quarters ascendingly, since I want to see the first quarter of the first year at the top of the report. I’m sure you want to see the result:

year	quarter	number_of_transactions
2019	1	131
2019	2	132
2019	3	138
2019	4	99
2020	1	129
2020	2	123
2020	3	138
2020	4	110

This report looks quite lovely, I must say!

Now, if you’re using SQL Server, you’ll have to use the DATEPART() function instead of EXTRACT(). Here’s the first part of the code, just to make sure you understand. You follow the same principle in the remainder of the code:

SELECT	DATEPART(YEAR, date) AS year,
		DATEPART(QUARTER, date) AS quarter,
		COUNT(amount) AS number_of_transactions
...

2: Calculate Running Totals

Your task now is to show the running totals for all transactions made in December 2020. The report has to be aggregated at the card type level. To create this report, you’ll need to introduce window functions and JOINs to the code.

Before going any further, maybe you should check your knowledge of SQL. You can do that in the SQL Reporting track. This track will also teach you how to create basic SQL reports and perform revenue trend and customer behavior analyses.

The code that will return the desired data is:

SELECT	DISTINCT (ct.date),
		cty.card_type_name,
		SUM (ct.amount) OVER (PARTITION BY cty.card_type_name ORDER BY ct.date ASC) AS transaction_running_total
FROM card_transaction ct 
JOIN card_number cn ON ct.card_number_id = cn.id 
JOIN card_type cty ON cn.card_type_id = cty.id
WHERE date > '2020-11-30' AND date <= '2020-12-31'
ORDER BY cty.card_type_name ASC;

The code first selects the specific date, as there can be multiple daily transactions by the same card type, even the same card number. Then follows the card_type_name column. The last column in the query is transaction_running_total. To calculate the running total, I’ve used a window function. First, the data in the column amount is summed. Then, by using the OVER() clause, I’ve specified that the running total should be calculated on a card type level; hence PARTITION BY cty.card_type_name. Finally, I want the running total to be calculated from the oldest to the newest date: ORDER BY date ASC.

To get the data, I had to join three tables. The first join connects the card_transaction and card_number tables. The second join references the table card_type, which is how I get the card type name in my report. I’ve assigned aliases to all the joined tables; that way, I had to type less. The result is filtered using the WHERE clause, which will get me only December 2020 transactions. I’ve decided to order data by the card type name in an ascending sort.

Running the code will result in the report shown below:

date	card_type_name	transaction_running_total
2020-12-03	diners-club-international	8,988.79
2020-12-05	diners-club-international	23,403.95
2020-12-10	diners-club-international	38,396.95
2020-12-12	diners-club-international	51,525.07
2020-12-13	diners-club-international	61,643.00
2020-12-27	diners-club-international	89,522.36
2020-12-01	maestro	15,712.84
2020-12-03	maestro	31,737.02
2020-12-07	maestro	49,407.66
2020-12-08	maestro	60,526.36
2020-12-09	maestro	77,920.67
2020-12-12	maestro	92,465.81
2020-12-18	maestro	93,938.04
2020-12-19	maestro	110,541.99
2020-12-21	maestro	124,455.78
2020-12-23	maestro	127,626.83
2020-12-25	maestro	147,227.82
2020-12-26	maestro	170,589.49
2020-12-30	maestro	195,366.68
2020-12-01	visa-electron	16,881.70
2020-12-03	visa-electron	34,257.49
2020-12-13	visa-electron	51,982.98
2020-12-15	visa-electron	60,691.21
2020-12-22	visa-electron	80,816.65
2020-12-24	visa-electron	100,459.96
2020-12-29	visa-electron	104,595.89
2020-12-30	visa-electron	115,599.67

If you want to know more about cumulative totals in SQL, read this article to learn what running totals are and how to calculate them.

3: Calculate Running Averages

The next example is creating a report that will show every transaction in December 2020 and its amount for Visa Electron cards. Additionally, you’ll show the average daily transaction amount using a moving average. Take a look at the code:

SELECT	ct.date,
		cty.card_type_name,
		SUM(ct.amount) AS daily_sum,
		AVG(SUM(ct.amount)) OVER (ORDER BY ct.date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS transaction_running_average
FROM card_transaction ct 
JOIN card_number cn ON ct.card_number_id = cn.id 
JOIN card_type cty ON cn.card_type_id = cty.id
WHERE ct.date > '2020-11-30' AND date <= '2020-12-31' 
AND cty.card_type_name = 'visa-electron'
GROUP BY ct.date, cty.card_type_name
ORDER BY cty.card_type_name;

I’ve first selected the columns that will show the transaction’s date and the card type name. Then I’ve summed the daily transaction amounts, with the result shown in the column daily_sum. I’ve had to do that because there may be several transactions made daily by Visa Electron. Then I’ve used this daily sum of the transaction values to calculate its average. However, to get the moving average, I have to use the OVER() clause. I want the moving average to be calculated from the first to the last day of December 2020, so the operation is ordered by date ascending.

In calculating the moving averages, I’ve decided to use a three-day moving average, defined by the following part of the code: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. This tells the code to use the current row and the previous two rows (three rows/three dates in total) to calculate the moving average.

The tables are then joined in precisely the same way as in the previous example. There are two conditions in the WHERE clause; one defines the dates taken into account; the other defines the card type.

This query will give you the following table:

date	card_type_name	daily_sum	transaction_running_average
2020-12-01	visa-electron	16,881.70	16,881.70
2020-12-03	visa-electron	17,375.79	17,128.75
2020-12-13	visa-electron	17,725.49	17,327.66
2020-12-15	visa-electron	8,708.23	14,603.17
2020-12-22	visa-electron	20,125.44	15,519.72
2020-12-24	visa-electron	19,643.31	16,158.99
2020-12-29	visa-electron	4,135.93	14,634.89
2020-12-30	visa-electron	11,003.78	11,594.34

If you’re not familiar with moving averages, let me explain how they work. You already know this is a three-day moving average, which uses three rows to calculate the average.

For the date ‘2020-12-01’, the average in the table above is the same as the daily sum. This is because the query has only the data in the current row to use – there are no preceding rows.

For the date ‘2020-12-03’, the moving average is calculated as follows: (16,881.70 + 17,375.79) / 2 = 17,128.75. Be careful here! The moving average is not calculated in the following way: (16,881,70 + 17,375.79) / 3 = 11,419.16. That’s because there is only one previous row, or only two values to average.

The moving average for the date ‘2020-12-13’ is calculated in the following way: (16,881.70 + 17,375.79 + 17,725.49) / 3 = 17,327.66.

4: Time Series Analysis

A very common requirement when working with financial data is to analyze a time series (i.e. the difference between time periods, such as day-to-day or month-to-month). For example, suppose your boss has told you to create a report that will show the daily changes of transaction values made by customers from China during December 2020. The sales department in China is not happy with December 2020’s performance, so they want to analyze this month in detail to discover where there was a drop in transactions.

To create such a report, you’ll again need a SQL window function. This time it’ll be the LAG() function, which will allow you to fetch data from the previous row. Here’s how you should do it:

SELECT	ct.date,
		SUM(ct.amount) AS daily_sum,
		(SUM(ct.amount)-LAG(SUM(ct.amount)) OVER (ORDER BY ct.date ASC)) 
AS daily_difference,
		co.country_name
FROM card_transaction ct 
JOIN card_number cn ON ct.card_number_id = cn.id 
JOIN customer cu ON cn.customer_id = cu.id 
JOIN country co ON cu.country_id = co.id
WHERE ct.date > '2020-11-30' AND date <= '2020-12-31'
AND co.country_name = 'China'
GROUP BY ct.date, co.country_name;

The query starts with a familiar process: it selects the date, then it calculates the daily sum of the transactions (in case there are several daily transactions from China). To calculate the daily difference, you need to deduct the previous day’s transaction sum from the current day’s sum. This calculation is done by this part of the query:

(SUM(ct.amount)-LAG(SUM(ct.amount)) OVER (ORDER BY ct.date ASC)) AS daily_difference.

The previous day’s transaction sum is returned by the LAG() function combined with the OVER() clause. The operation should be performed from the oldest to the newest date, which we specify in ORDER BY ct.date ASC. The last column in the query is the country_name column from the table country.

Three tables are joined to get the required data; once again I’ve used aliases. There are two conditions in the WHERE clause, one defining the date and the other defining the country. Finally, the result is grouped by date and country name.

Run the query to get this table:

date	daily_sum	daily_difference	country
2020-12-01	16,881.70	NULL	China
2020-12-03	16,024.18	-857.52	China
2020-12-07	17,670.64	1,646.46	China
2020-12-08	2,856.29	-14,814.35	China
2020-12-09	17,394.31	14,538.02	China
2020-12-12	14,545.14	-2,849.17	China
2020-12-18	1,472.23	-13,072.91	China
2020-12-19	10,821.76	9,349.53	China
2020-12-22	20,125.44	9,303.68	China
2020-12-23	3,171.05	-16,954.39	China
2020-12-24	19,643.31	16,472.26	China
2020-12-25	19,600.99	-42.32	China
2020-12-26	17,514.61	-2,086.38	China
2020-12-29	4,135.93	-13,378.68	China
2020-12-30	26,393.10	22,257.17	China

The first value is NULL because there are no rows before the first one, i.e. the first row doesn’t have a row it can be compared with.

More on the LAG() function can be found in this article on calculating the difference between two rows.

If you like what window functions offer, you can find out more examples of using window functions. If you want to practice window functions, try our SQL Window Functions course.

5: Add Multiple Grouping Levels

The report I showed you in the first example is quite good, but that doesn’t mean it can’t be better. What I’m missing there is, for instance, a count subtotal for 2019 and 2020 and a grand total – something like pivot tables in Excel. This report can be easily enhanced by using the ROLLUP() function. Let’s add some subtotals and show all the values on a card type level, too. Here’s how to do it:

SELECT	EXTRACT(YEAR FROM ct.date) AS year,
		EXTRACT(QUARTER FROM ct.date) AS quarter,
		COUNT(ct.amount) AS number_of_transactions,
		cty.card_type_name
FROM card_transaction ct 
JOIN card_number cn ON ct.card_number_id = cn.id 
JOIN card_type cty ON cn.card_type_id = cty.id
GROUP BY ROLLUP(EXTRACT(YEAR FROM ct.date), EXTRACT(QUARTER FROM ct.date), cty.card_type_name);

The first part of the code is taken directly from the first example. It uses the EXTRACT() function to categorize dates into years and quarters; then it counts the number of transactions using the COUNT() function. The last column the code will select is card_type_name from the table card_type.

The data is selected from the tables card_transaction, card_number, and card_type, which are joined. Now comes the magical part – using ROLLUP(), which is an extension of the GROUP BY clause. After writing GROUP BY in the code, you simply use the ROLLUP() function to specify multiple grouping levels in your report. Since your task is to group data on a yearly, quarterly, and card type level, those columns should be in the ROLLUP(). Run the above code and you’ll get this nice report:

year	quarter	number_of_transactions	card_type_name
2019	1	49	diners-club-international
2019	1	46	maestro
2019	1	36	visa-electron
2019	1	131	NULL
2019	2	44	diners-club-international
2019	2	44	maestro
2019	2	44	visa-electron
2019	2	132	NULL
2019	3	44	diners-club-international
2019	3	56	maestro
2019	3	38	visa-electron
2019	3	138	NULL
2019	4	23	diners-club-international
2019	4	42	maestro
2019	4	34	visa-electron
2019	4	99	NULL
2019	NULL	500	NULL
2020	1	39	diners-club-international
2020	1	59	maestro
2020	1	31	visa-electron
2020	1	129	NULL
2020	2	33	diners-club-international
2020	2	50	maestro
2020	2	40	visa-electron
2020	2	123	NULL
2020	3	41	diners-club-international
2020	3	57	maestro
2020	3	40	visa-electron
2020	3	138	NULL
2020	4	26	diners-club-international
2020	4	48	maestro
2020	4	36	visa-electron
2020	4	110	NULL
2020	NULL	500	NULL
NULL	NULL	1000	NULL

Don’t let the NULL values scare you! Everything’s OK with the report; NULL values appear when there’s a subtotal, total, or grand total. Here’s the first part of the table:

year	quarter	number_of_transactions	card_type_name
2019	1	49	diners-club-international
2019	1	46	maestro
2019	1	36	visa-electron
2019	1	131	NULL

This table shows the number of transactions for the first quarter of 2019 by card type. The subtotal of all transactions in the first quarter is 49 + 46 +36 = 131. The logic is the same for the rest of the table. So, for instance, when you see a row like the one below, it means the total number of transactions for the year 2019 is 500:

year	quarter	number_of_transactions	card_type_name
2019	NULL	500	NULL

ROLLUP() and other GROUP BY extensions are cool; I like them a lot! If you want to see just how useful they can be, then our GROUP BY Extensions course is right for you.

Let’s now go to our last example of SQL financial data analysis queries.

6: Create a Revenue Report on a Yearly Level

This will probably be the most complex query I’ll show you, but I think it’ll be worth it. Learning what this query does will allow you to create complex reports and perform statistical analyses, both of which are very often required in the financial industry.

Your task is to create a report that will show the 2020 revenue, 2019 revenue, and the total revenue. All three categories have to be displayed on a customer level. Remember, you’re working for a credit card processing company, so revenue is the same as the transaction amount. You also need to assign categories to the customer; if the client brought $1,000,000 or more in total revenue, they are categorized as ‘Platinum’. If the total revenue is below $1,000,000, the customer should be classified as ‘Gold’. Here’s the query:

SELECT	cu.NIN,
		cu.first_name,
		cu.last_name,
		SUM(ct.amount) AS total_revenue_per_customer,
		CASE
			WHEN SUM(ct.amount) >= 1000000 THEN 'Platinum'
			WHEN SUM(ct.amount) < 1000000 THEN 'Gold'
		END AS customer_category,
		SUM(CASE WHEN ct.date >= '2019-01-01' AND ct.date < '2020-01-01' THEN ct.amount ELSE 0 END) AS revenue_2019,
		SUM(CASE WHEN ct.date >= '2020-01-01' AND ct.date < '2021-01-01' THEN ct.amount ELSE 0 END) AS revenue_2020
FROM card_transaction ct 
JOIN card_number cn ON ct.card_number_id = cn.id 
JOIN customer cu ON cn.customer_id = cu.id
GROUP BY cu.NIN, cu.first_name, cu.last_name
ORDER BY total_revenue_per_customer DESC;

Let’s start with the easiest part: the query selects the NIN, first_name, and the last_name columns from the table customer. Then it sums the amounts, which will be the total revenue. After that follows the categorization of the customers using a CASE WHEN statement. The first WHEN assigns the category ‘Platinum’, while the other assigns the category ‘Gold’. Those statements are closed by END, and this new column will be named customer_category.

Next, I had to specify conditions for the columns that will contain the numbers for 2019 revenue and 2020 revenue. To do that, I again used the CASE WHEN statement. For the 2019 revenue, the condition is that the dates should be equal to 2019-01-01 or above/newer, but below/older than 2020-01-01. This column is named revenue_2019.

The same principle is applied when creating the column revenue_2020.

To get the data, you need to join three tables: card_transaction, card_number, and customer. All three tables have aliases associated with them.

At the end of the query, the data is grouped by the NIN, first_name, and last_name columns because you want the data on a customer level. Also, the result is ordered by the total revenue in descending order to look nicer.

Here’s the sexy table that will make you cool amongst the data geeks:

NIN	first_name	last_name	total_revenue_per_customer	customer_category	revenue_2019	revenue_2020
116-17-3179	Even	Sturt	1,098,891.00	Platinum	602,075.43	496,815.57
654-50-1963	Korie	Heims	1,091,108.71	Platinum	536,126.43	554,982.28
675-95-5293	Brier	Drillingcourt	1,058,022.84	Platinum	461,799.16	596,223.68
568-26-1849	Margette	Henlon	1,040,565.01	Platinum	525,759.81	514,805.20
836-72-0333	Nikolaos	Kolakowski	1,024,073.74	Platinum	512,434.92	511,638.82
642-47-8286	Jude	Knivett	994,881.03	Gold	534,644.07	460,236.96
552-56-0279	Lilli	Layson	991,257.18	Gold	416,496.63	574,760.55
405-45-9879	Ninnette	Cockitt	965,413.18	Gold	516,239.21	449,173.97
487-13-1311	Tarrance	Angrock	946,170.32	Gold	472,225.09	473,945.23
254-88-4824	Leon	Souter	944,216.96	Gold	528,915.58	415,301.38

How Do You Like Using Advanced SQL Queries in Financial Data Analysis?

I intended to show you relatively advanced SQL practices that allow you to analyze financial data and create reports. This is not everything SQL can do, but maybe I got you interested in some advanced SQL features. All six examples were based on my experience as a data analyst. I’ve done many such reports in my life, and I would have been thrilled if I knew all those SQL possibilities when I started working; it would have been much easier for me.

It’s never too late to learn advanced SQL, but it’s always better to start earlier. If you have the opportunity, enroll in the Advanced SQL track or see our guide on how to practice advanced SQL with our platform to discover more SQL practice opportunities at advanced level.

Tags:

data analysis