12th Feb 2021 15 minutes read 6 Advanced SQL Queries for Analyzing Financial Data Tihomir Babic data analysis Table of Contents Introducing the Database 6 Examples of Advanced SQL Queries for Financial Analysis 1: Group Data by Year and Quarter 2: Calculate Running Totals 3: Calculate Running Averages 4: Time Series Analysis 5: Add Multiple Grouping Levels 6: Create a Revenue Report on a Yearly Level How Do You Like Using Advanced SQL Queries in Financial Data Analysis? Are you an advanced SQL user who’s new to finance? Do you want to learn how to use your knowledge to analyze financial data? The article will give you some guidance. You probably know that SQL is a potent tool for analyzing all sorts of data. The bigger and more complex data, the more beneficial SQL becomes. In certain situations, financial data can be very complicated; producing a sophisticated analysis requires sophisticated tools. SQL is one of these tools, and the better you know some advanced SQL practices, the easier it becomes to perform complex financial data analyses and create financial reports. I’m going to show you some examples of how to do that, drawn from my own experience. The SQL knowledge level required in this article – such as window functions, GROUP BY extensions, and recursive queries – is covered in the LearnSQL.com Advanced SQL track. Introducing the Database All six examples will use this database, which consists of five tables: country card_type customer card_number card_transaction We’ll imagine that this is the database of a credit card processing company and you’re their go-to employee when it comes to analyzing financial data. Now, let’s get into the details of each table. The table country has the following attributes: id – The country’s ID and the primary key (PK) of this table. country_name – The name of the country. The next table is card_type, with these attributes: id – The card type’s ID; the primary key (PK). card_type_name – The card type’s name. The table customer consists of the columns: id – The customer’s ID; the primary key (PK). NIN – The customer’s national identification number. first_name – The customer’s first name. last_name – The customer’s last name. country_id – The country’s ID; this is a foreign key (FK) that references the table country. Next comes the card_number table with the attributes: id – The card’s ID; the primary key (PK). card_number – The card number. customer_id – The customer’s ID; a foreign key (FK) that references the table customer. card_type_id – The card type’s ID; a foreign key (FK) that references the table card_type. The last is the card_transaction table, which has these columns: id – The transaction’s ID; the primary key (PK). date – The transaction date. amount – The transaction amount, in dollars. card_number_id – The card’s ID; a foreign key (FK) that references the table card_number. Now that you’re familiar with the database, let’s move on to the examples! 6 Examples of Advanced SQL Queries for Financial Analysis 1: Group Data by Year and Quarter Financial data usually needs to be grouped in specific time periods or time buckets. You’ll certainly need to group data into years and quarters if you’re creating financial reports. I’m going to show you how to do that. Learning this principle will allow you to group data on any other level (i.e. months, weeks, or days, depending on the data you have). In this example, I’ll group data by years and quarters and show the number of transactions. To achieve that, I’ll use the table card_transaction and this code: SELECT EXTRACT(YEAR FROM date) AS year, EXTRACT(QUARTER FROM date) AS quarter, COUNT(amount) AS number_of_transactions FROM card_transaction GROUP BY EXTRACT(YEAR FROM date), EXTRACT(QUARTER FROM date) ORDER BY EXTRACT(YEAR FROM date) ASC, EXTRACT(QUARTER FROM date); The code uses the function EXTRACT() to get the years and quarters. This is a standard SQL function, which will work in MySQL and PostgreSQL. However, if you’re using SQL Server, you’ll have to use a different function – I’ll explain which one in a moment. When you use the EXTRACT() function, you’re required to specify the period you want and the column this function will use to return the desired period. First, I want to get years from the date column. Then I want quarters, also from the date column. After that, I have to count the number of transactions, which I did using the COUNT() function. The data is grouped by the EXTRACT() functions, which represent years and quarters. Finally, the data is ordered by years and quarters ascendingly, since I want to see the first quarter of the first year at the top of the report. I’m sure you want to see the result: yearquarternumber_of_transactions 20191131 20192132 20193138 2019499 20201129 20202123 20203138 20204110 This report looks quite lovely, I must say! Now, if you’re using SQL Server, you’ll have to use the DATEPART() function instead of EXTRACT(). Here’s the first part of the code, just to make sure you understand. You follow the same principle in the remainder of the code: SELECT DATEPART(YEAR, date) AS year, DATEPART(QUARTER, date) AS quarter, COUNT(amount) AS number_of_transactions ... 2: Calculate Running Totals Your task now is to show the running totals for all transactions made in December 2020. The report has to be aggregated at the card type level. To create this report, you’ll need to introduce window functions and JOINs to the code. Before going any further, maybe you should check your knowledge of SQL. You can do that in the SQL Reporting track. This track will also teach you how to create basic SQL reports and perform revenue trend and customer behavior analyses. The code that will return the desired data is: SELECT DISTINCT (ct.date), cty.card_type_name, SUM (ct.amount) OVER (PARTITION BY cty.card_type_name ORDER BY ct.date ASC) AS transaction_running_total FROM card_transaction ct JOIN card_number cn ON ct.card_number_id = cn.id JOIN card_type cty ON cn.card_type_id = cty.id WHERE date > '2020-11-30' AND date <= '2020-12-31' ORDER BY cty.card_type_name ASC; The code first selects the specific date, as there can be multiple daily transactions by the same card type, even the same card number. Then follows the card_type_name column. The last column in the query is transaction_running_total. To calculate the running total, I’ve used a window function. First, the data in the column amount is summed. Then, by using the OVER() clause, I’ve specified that the running total should be calculated on a card type level; hence PARTITION BY cty.card_type_name. Finally, I want the running total to be calculated from the oldest to the newest date: ORDER BY date ASC. To get the data, I had to join three tables. The first join connects the card_transaction and card_number tables. The second join references the table card_type, which is how I get the card type name in my report. I’ve assigned aliases to all the joined tables; that way, I had to type less. The result is filtered using the WHERE clause, which will get me only December 2020 transactions. I’ve decided to order data by the card type name in an ascending sort. Running the code will result in the report shown below: datecard_type_nametransaction_running_total 2020-12-03diners-club-international8,988.79 2020-12-05diners-club-international23,403.95 2020-12-10diners-club-international38,396.95 2020-12-12diners-club-international51,525.07 2020-12-13diners-club-international61,643.00 2020-12-27diners-club-international89,522.36 2020-12-01maestro15,712.84 2020-12-03maestro31,737.02 2020-12-07maestro49,407.66 2020-12-08maestro60,526.36 2020-12-09maestro77,920.67 2020-12-12maestro92,465.81 2020-12-18maestro93,938.04 2020-12-19maestro110,541.99 2020-12-21maestro124,455.78 2020-12-23maestro127,626.83 2020-12-25maestro147,227.82 2020-12-26maestro170,589.49 2020-12-30maestro195,366.68 2020-12-01visa-electron16,881.70 2020-12-03visa-electron34,257.49 2020-12-13visa-electron51,982.98 2020-12-15visa-electron60,691.21 2020-12-22visa-electron80,816.65 2020-12-24visa-electron100,459.96 2020-12-29visa-electron104,595.89 2020-12-30visa-electron115,599.67 If you want to know more about cumulative totals in SQL, read this article to learn what running totals are and how to calculate them. 3: Calculate Running Averages The next example is creating a report that will show every transaction in December 2020 and its amount for Visa Electron cards. Additionally, you’ll show the average daily transaction amount using a moving average. Take a look at the code: SELECT ct.date, cty.card_type_name, SUM(ct.amount) AS daily_sum, AVG(SUM(ct.amount)) OVER (ORDER BY ct.date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS transaction_running_average FROM card_transaction ct JOIN card_number cn ON ct.card_number_id = cn.id JOIN card_type cty ON cn.card_type_id = cty.id WHERE ct.date > '2020-11-30' AND date <= '2020-12-31' AND cty.card_type_name = 'visa-electron' GROUP BY ct.date, cty.card_type_name ORDER BY cty.card_type_name; I’ve first selected the columns that will show the transaction’s date and the card type name. Then I’ve summed the daily transaction amounts, with the result shown in the column daily_sum. I’ve had to do that because there may be several transactions made daily by Visa Electron. Then I’ve used this daily sum of the transaction values to calculate its average. However, to get the moving average, I have to use the OVER() clause. I want the moving average to be calculated from the first to the last day of December 2020, so the operation is ordered by date ascending. In calculating the moving averages, I’ve decided to use a three-day moving average, defined by the following part of the code: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. This tells the code to use the current row and the previous two rows (three rows/three dates in total) to calculate the moving average. The tables are then joined in precisely the same way as in the previous example. There are two conditions in the WHERE clause; one defines the dates taken into account; the other defines the card type. This query will give you the following table: datecard_type_namedaily_sumtransaction_running_average 2020-12-01visa-electron16,881.7016,881.70 2020-12-03visa-electron17,375.7917,128.75 2020-12-13visa-electron17,725.4917,327.66 2020-12-15visa-electron8,708.2314,603.17 2020-12-22visa-electron20,125.4415,519.72 2020-12-24visa-electron19,643.3116,158.99 2020-12-29visa-electron4,135.9314,634.89 2020-12-30visa-electron11,003.7811,594.34 If you’re not familiar with moving averages, let me explain how they work. You already know this is a three-day moving average, which uses three rows to calculate the average. For the date ‘2020-12-01’, the average in the table above is the same as the daily sum. This is because the query has only the data in the current row to use – there are no preceding rows. For the date ‘2020-12-03’, the moving average is calculated as follows: (16,881.70 + 17,375.79) / 2 = 17,128.75. Be careful here! The moving average is not calculated in the following way: (16,881,70 + 17,375.79) / 3 = 11,419.16. That’s because there is only one previous row, or only two values to average. The moving average for the date ‘2020-12-13’ is calculated in the following way: (16,881.70 + 17,375.79 + 17,725.49) / 3 = 17,327.66. 4: Time Series Analysis A very common requirement when working with financial data is to analyze a time series (i.e. the difference between time periods, such as day-to-day or month-to-month). For example, suppose your boss has told you to create a report that will show the daily changes of transaction values made by customers from China during December 2020. The sales department in China is not happy with December 2020’s performance, so they want to analyze this month in detail to discover where there was a drop in transactions. To create such a report, you’ll again need a SQL window function. This time it’ll be the LAG() function, which will allow you to fetch data from the previous row. Here’s how you should do it: SELECT ct.date, SUM(ct.amount) AS daily_sum, (SUM(ct.amount)-LAG(SUM(ct.amount)) OVER (ORDER BY ct.date ASC)) AS daily_difference, co.country_name FROM card_transaction ct JOIN card_number cn ON ct.card_number_id = cn.id JOIN customer cu ON cn.customer_id = cu.id JOIN country co ON cu.country_id = co.id WHERE ct.date > '2020-11-30' AND date <= '2020-12-31' AND co.country_name = 'China' GROUP BY ct.date, co.country_name; The query starts with a familiar process: it selects the date, then it calculates the daily sum of the transactions (in case there are several daily transactions from China). To calculate the daily difference, you need to deduct the previous day’s transaction sum from the current day’s sum. This calculation is done by this part of the query: (SUM(ct.amount)-LAG(SUM(ct.amount)) OVER (ORDER BY ct.date ASC)) AS daily_difference. The previous day’s transaction sum is returned by the LAG() function combined with the OVER() clause. The operation should be performed from the oldest to the newest date, which we specify in ORDER BY ct.date ASC. The last column in the query is the country_name column from the table country. Three tables are joined to get the required data; once again I’ve used aliases. There are two conditions in the WHERE clause, one defining the date and the other defining the country. Finally, the result is grouped by date and country name. Run the query to get this table: datedaily_sumdaily_differencecountry 2020-12-0116,881.70NULLChina 2020-12-0316,024.18-857.52China 2020-12-0717,670.641,646.46China 2020-12-082,856.29-14,814.35China 2020-12-0917,394.3114,538.02China 2020-12-1214,545.14-2,849.17China 2020-12-181,472.23-13,072.91China 2020-12-1910,821.769,349.53China 2020-12-2220,125.449,303.68China 2020-12-233,171.05-16,954.39China 2020-12-2419,643.3116,472.26China 2020-12-2519,600.99-42.32China 2020-12-2617,514.61-2,086.38China 2020-12-294,135.93-13,378.68China 2020-12-3026,393.1022,257.17China The first value is NULL because there are no rows before the first one, i.e. the first row doesn’t have a row it can be compared with. More on the LAG() function can be found in this article on calculating the difference between two rows. If you like what window functions offer, you can find out more examples of using window functions. If you want to practice window functions, try our SQL Window Functions course. 5: Add Multiple Grouping Levels The report I showed you in the first example is quite good, but that doesn’t mean it can’t be better. What I’m missing there is, for instance, a count subtotal for 2019 and 2020 and a grand total – something like pivot tables in Excel. This report can be easily enhanced by using the ROLLUP() function. Let’s add some subtotals and show all the values on a card type level, too. Here’s how to do it: SELECT EXTRACT(YEAR FROM ct.date) AS year, EXTRACT(QUARTER FROM ct.date) AS quarter, COUNT(ct.amount) AS number_of_transactions, cty.card_type_name FROM card_transaction ct JOIN card_number cn ON ct.card_number_id = cn.id JOIN card_type cty ON cn.card_type_id = cty.id GROUP BY ROLLUP(EXTRACT(YEAR FROM ct.date), EXTRACT(QUARTER FROM ct.date), cty.card_type_name); The first part of the code is taken directly from the first example. It uses the EXTRACT() function to categorize dates into years and quarters; then it counts the number of transactions using the COUNT() function. The last column the code will select is card_type_name from the table card_type. The data is selected from the tables card_transaction, card_number, and card_type, which are joined. Now comes the magical part – using ROLLUP(), which is an extension of the GROUP BY clause. After writing GROUP BY in the code, you simply use the ROLLUP() function to specify multiple grouping levels in your report. Since your task is to group data on a yearly, quarterly, and card type level, those columns should be in the ROLLUP(). Run the above code and you’ll get this nice report: yearquarternumber_of_transactionscard_type_name 2019149diners-club-international 2019146maestro 2019136visa-electron 20191131NULL 2019244diners-club-international 2019244maestro 2019244visa-electron 20192132NULL 2019344diners-club-international 2019356maestro 2019338visa-electron 20193138NULL 2019423diners-club-international 2019442maestro 2019434visa-electron 2019499NULL 2019NULL500NULL 2020139diners-club-international 2020159maestro 2020131visa-electron 20201129NULL 2020233diners-club-international 2020250maestro 2020240visa-electron 20202123NULL 2020341diners-club-international 2020357maestro 2020340visa-electron 20203138NULL 2020426diners-club-international 2020448maestro 2020436visa-electron 20204110NULL 2020NULL500NULL NULLNULL1000NULL Don’t let the NULL values scare you! Everything’s OK with the report; NULL values appear when there’s a subtotal, total, or grand total. Here’s the first part of the table: yearquarternumber_of_transactionscard_type_name 2019149diners-club-international 2019146maestro 2019136visa-electron 20191131NULL This table shows the number of transactions for the first quarter of 2019 by card type. The subtotal of all transactions in the first quarter is 49 + 46 +36 = 131. The logic is the same for the rest of the table. So, for instance, when you see a row like the one below, it means the total number of transactions for the year 2019 is 500: yearquarternumber_of_transactionscard_type_name 2019NULL500NULL ROLLUP() and other GROUP BY extensions are cool; I like them a lot! If you want to see just how useful they can be, then our GROUP BY Extensions course is right for you. Let’s now go to our last example of SQL financial data analysis queries. 6: Create a Revenue Report on a Yearly Level This will probably be the most complex query I’ll show you, but I think it’ll be worth it. Learning what this query does will allow you to create complex reports and perform statistical analyses, both of which are very often required in the financial industry. Your task is to create a report that will show the 2020 revenue, 2019 revenue, and the total revenue. All three categories have to be displayed on a customer level. Remember, you’re working for a credit card processing company, so revenue is the same as the transaction amount. You also need to assign categories to the customer; if the client brought $1,000,000 or more in total revenue, they are categorized as ‘Platinum’. If the total revenue is below $1,000,000, the customer should be classified as ‘Gold’. Here’s the query: SELECT cu.NIN, cu.first_name, cu.last_name, SUM(ct.amount) AS total_revenue_per_customer, CASE WHEN SUM(ct.amount) >= 1000000 THEN 'Platinum' WHEN SUM(ct.amount) < 1000000 THEN 'Gold' END AS customer_category, SUM(CASE WHEN ct.date >= '2019-01-01' AND ct.date < '2020-01-01' THEN ct.amount ELSE 0 END) AS revenue_2019, SUM(CASE WHEN ct.date >= '2020-01-01' AND ct.date < '2021-01-01' THEN ct.amount ELSE 0 END) AS revenue_2020 FROM card_transaction ct JOIN card_number cn ON ct.card_number_id = cn.id JOIN customer cu ON cn.customer_id = cu.id GROUP BY cu.NIN, cu.first_name, cu.last_name ORDER BY total_revenue_per_customer DESC; Let’s start with the easiest part: the query selects the NIN, first_name, and the last_name columns from the table customer. Then it sums the amounts, which will be the total revenue. After that follows the categorization of the customers using a CASE WHEN statement. The first WHEN assigns the category ‘Platinum’, while the other assigns the category ‘Gold’. Those statements are closed by END, and this new column will be named customer_category. Next, I had to specify conditions for the columns that will contain the numbers for 2019 revenue and 2020 revenue. To do that, I again used the CASE WHEN statement. For the 2019 revenue, the condition is that the dates should be equal to 2019-01-01 or above/newer, but below/older than 2020-01-01. This column is named revenue_2019. The same principle is applied when creating the column revenue_2020. To get the data, you need to join three tables: card_transaction, card_number, and customer. All three tables have aliases associated with them. At the end of the query, the data is grouped by the NIN, first_name, and last_name columns because you want the data on a customer level. Also, the result is ordered by the total revenue in descending order to look nicer. Here’s the sexy table that will make you cool amongst the data geeks: NINfirst_namelast_nametotal_revenue_per_customercustomer_categoryrevenue_2019revenue_2020 116-17-3179EvenSturt1,098,891.00Platinum602,075.43496,815.57 654-50-1963KorieHeims1,091,108.71Platinum536,126.43554,982.28 675-95-5293BrierDrillingcourt1,058,022.84Platinum461,799.16596,223.68 568-26-1849MargetteHenlon1,040,565.01Platinum525,759.81514,805.20 836-72-0333NikolaosKolakowski1,024,073.74Platinum512,434.92511,638.82 642-47-8286JudeKnivett994,881.03Gold534,644.07460,236.96 552-56-0279LilliLayson991,257.18Gold416,496.63574,760.55 405-45-9879NinnetteCockitt965,413.18Gold516,239.21449,173.97 487-13-1311TarranceAngrock946,170.32Gold472,225.09473,945.23 254-88-4824LeonSouter944,216.96Gold528,915.58415,301.38 How Do You Like Using Advanced SQL Queries in Financial Data Analysis? I intended to show you relatively advanced SQL practices that allow you to analyze financial data and create reports. This is not everything SQL can do, but maybe I got you interested in some advanced SQL features. All six examples were based on my experience as a data analyst. I’ve done many such reports in my life, and I would have been thrilled if I knew all those SQL possibilities when I started working; it would have been much easier for me. It’s never too late to learn advanced SQL, but it’s always better to start earlier. If you have the opportunity, enroll in the Advanced SQL track or see our guide on how to practice advanced SQL with our platform to discover more SQL practice opportunities at advanced level. Tags: data analysis