10th Feb 2021 11 minutes read Want to Learn Advanced PostgreSQL? Try This! Tihomir Babic learn sql PostgreSQL Table of Contents Do I Have a Choice? Which DB Programming Language is The Most Common? Why Learn PostgreSQL? It’s Popular Its Use is Trending Upward It’s Completely Free It’s Community-Driven Learn Using the Advanced SQL in PostgreSQL Track What Does It Contain? PostgreSQL Window Functions Recursive Queries in PostgreSQL 2 Examples of Advanced PostgreSQL PostgreSQL Window Functions Example PostgreSQL CTE in Practice Want to Learn Basic and Advanced PostgreSQL? Why and how should you learn advanced PostgreSQL? I’ll give you some tips that will help you achieve your goals. Keeping up with technology is hard; anyone who has foolishly tried to do so can confirm this (me, for example!). Most of the time, it just doesn't work. And to be brutally honest, keeping up with all the technologies all the time is pointless. You shouldn’t know everything about everything; it’s better to know as much as you need within the technology you use or want to use. Defining your need can be hard in a world with so many options. And, with SQL, there are many options, like the level (advanced, intermediate, fundamental) or dialect (PostgreSQL, MySQL, etc.). Spoiler alert! Learning advanced PostgreSQL is one of the best decisions you can make. Becoming an expert in it will make you better at what you do and may help you find a much better job. Do I Have a Choice? Yes, you do have a choice; that is the problem! There are literally hundreds of Database Management Systems (DBMS) used worldwide today: PostgreSQL, MS SQL Server, MySQL, Oracle, IBM Db2, MongoDB, Microsoft Access, Elasticsearch – just to name a few. One of the usual methods of choosing DBMS is looking at what other people or businesses are doing. It at least narrows down the choices. In the best-case scenario, the logic “so many people are doing it for a reason” will result in making a good decision. Which DB Programming Language is The Most Common? When looking at available database management systems (DBMSs), your first question should be ‘How will I communicate with this?’. Not all DBMSs use the same programming language, so you should choose one that uses a language you already know (or want to learn). If you’re not sure, look at a popularity ranking. You’ll notice that databases that use SQL repeatedly appear at the top of the list. For instance, according to Statista, the most popular DBMSs are: Oracle MySQL Microsoft SQL Server PostgreSQL All of them use SQL to query (get information from) the database. According to the principle of popularity, it would be beneficial to know SQL. If your (desired) job has anything to do with data, it will most probably require SQL knowledge. And if you intend to pursue a career that requires SQL knowledge, here are some job interview questions (and answers) that you might want to know if you decide to change your job. By learning SQL, you’ll be able to work in all four of those DBMSs. I’m speaking from experience; I started learning SQL with Microsoft SQL Server and I’m able to work in PostgreSQL without any problems. As with any language, if you know the standard, you’ll be able to understand (more or less) any dialect. However, every SQL dialect is specific and offers different possibilities and functions that other dialects don’t. Maybe you’re asking why I’m recommending PostgreSQL when I myself learned on SQL Server? The answer is simple; the company where I worked used SQL Server, so they sent me to a SQL Server DBA course. That can be another criterion; see what your company or other companies are using and learn that dialect. That being said, you won’t make a mistake learning any of those four SQL dialects, but there are several reasons why PostgreSQL might be the better choice among good choices. Why Learn PostgreSQL? It’s Popular All four above-mentioned SQL dialects are popular, and knowledge of one dialect can be easily applied to other dialects. However, according to the Stack Overflow survey, if you choose PostgreSQL, you’ll choose the second-most-popular SQL dialect. (Read more about it in The Most Popular Databases in 2020.) Source: Stack Overflow If you want to be in good company, you can also have a look at some of the biggest and most famous companies that use PostgreSQL. Its Use is Trending Upward Some other surveys, like DB-Engines, rank PostgreSQL as the fourth-most-popular DBMS. However, learning something like SQL is an investment, and an investment is made with the future benefits in mind. Thus, it’s always good to take potential popularity into account. In this department, learning PostgreSQL would be a wise choice. If you take a look at the graph, it’s the only DBMS using SQL that has an almost constant, and relatively steep, upward trend since 2013. Source: DB-Engines It’s Completely Free PostgreSQL is an open-source DBMS. This means using PostgreSQL is entirely free, as in all the features are available without charge. You don’t get a trial version, and you don’t have to put up with some stripped-down edition of the software. You just download it, install it, and you’re able to use every single feature. (The installation is not a big deal; on Windows machines, you can install it in about 5 minutes.) To see how PostgreSQL compares with several other open-source databases, read this article on SQL in the open-source world. It’s Community-Driven The above features would be enough for many companies to use PostgreSQL. But it has one more neat trick: as open-source software, you can customize it by developing plugins (or using plugins developed by others ). Knowledge is shared freely, which makes a PostgreSQL community very vibrant. Learn Using the Advanced SQL in PostgreSQL Track If you’re serious about learning PostgreSQL, I recommend you look at the Advanced SQL in PostgreSQL track. It’s primarily designed for users who already know SQL and want to get to a more advanced level. Even though this learning track is for more advanced users, I’m suggesting that beginners look at it too. Why? Because when you see what PostgreSQL offers, you may feel more motivated to master the basics of SQL. It will remind you why you started learning SQL in the first place. What Does It Contain? The advanced PostgreSQL learning track consists of two courses: Window Functions in PostgreSQL Recursive Queries in PostgreSQL PostgreSQL Window Functions As the name says, you’ll learn about window functions, which are useful in creating reports with deltas, running totals, and ranked data. In short, this is when the real data analysis starts. More specifically, you’ll learn what an OVER() clause is and how to use it with aggregate functions like COUNT() or in combination with the WHERE clause. After learning that, you’ll get to know more advanced window function uses by learning how to use PARTITION BY with the OVER() clause. You’ll also learn ranking window functions and how to define the window frame. The course will then take you to the analytics functions and teach you how to use PARTITION BY with ORDER BY. Finally, you’ll learn how to use the window function in the various parts of the query. Recursive Queries in PostgreSQL The second part of this path deals with recursive queries. And to learn recursive queries, you first have to learn common table expressions (CTEs), which are also called WITH clauses. CTEs are similar to subqueries, but they allow you to write your code faster and make it easier to read. In this part, you’ll learn CTE syntax. You’ll also learn how to use multiple CTEs and how CTEs fare in comparison with subqueries. Next, you’ll move on to nested CTEs (i.e. a CTE within a CTE). Since this is a course about recursive queries, the next step is to show you how to create recursive CTEs and process the trees and graphs using them. Finally, you’ll learn how to use CTEs with data-modifying statements, such as INSERT, DELETE, and UPDATE. 2 Examples of Advanced PostgreSQL I bet you want to see the above concepts for yourself, don’t you? Then let’s look at a window function and a CTE. It’s okay if you don’t understand these examples; I’ll explain them as we go. Hopefully, it will whet your appetite for advanced SQL. PostgreSQL Window Functions Example Let’s imagine you’re working for a company named Groovy, Inc. The database you have at your disposal is a table named groovyinc_revenue. This database contains the data on the company's revenue in the last five years. It has the following columns: id – The quarter’s ID and the table’s primary key (PK). quarter – The quarter’s name. revenue – The revenue amount, in dollars. Now that you have the data you need, you could use windows functions in PostgreSQL to create a nice report. As an example, let’s demonstrate a report that will show the running totals and differences in revenue between two quarters (i.e. quarterly deltas). Here’s how to achieve it: SELECT quarter, revenue, SUM(revenue) OVER (ORDER BY quarter ASC) AS running_total, revenue - LAG(revenue) OVER (ORDER BY quarter ASC) AS quarterly_delta FROM groovyinc_revenue; The first part of the code is not complicated if you know SQL basics; the code simply selects the columns quarter and revenue from the database. Then comes my first window function, which is SUM(). You’re probably familiar with the SUM() function alone. But when used with the OVER() clause, it becomes a window function. So what does it do? SQL window functions in general allow you to aggregate the data, but unlike regular aggregate functions, they don’t group the query result in a single row. SUM() as a window function allows us to calculate the running total ( i.e. the sum of all preceding rows plus the current row). In this code, I first sum the revenue by using the SUM() function. By using the OVER() clause, I’m defining it as a window function. And by adding ORDER BY in ascending order, I’m telling it to perform the operation from the oldest quarter (1Q2016) to the newest quarter (4Q2020). In plain language, this part of the code says: sum the values in the column revenue over the entire table and do that in ascending order by quarters. The other window function in the above code calculates the quarterly delta, i.e. the difference in revenue between quarters. Mathematically, it is done by deducting the previous quarter from the actual one. In the code, I’ve done that by using the LAG() function. The LAG() function allows you to get data from the previous row(s). First, I’ve taken the values from the column revenue, which is the actual quarter revenue. Then the LAG() function takes the previous quarter’s revenue and does that quarter by quarter in ascending order. When you run the code, you get quite an informative report: quarterrevenuerunning_totalquarterly_delta 1Q20161,256,748.321,256,748.32NULL 1Q20171,374,441.322,631,189.64117,693.00 1Q20182,252,211.224,883,400.86877,769.90 1Q20193,657,412.328,540,813.181,405,201.10 1Q2020889,642.559,430,455.73-2,767,769.77 2Q20161,836,457.6511,266,913.38946,815.10 2Q20171,125,568.6112,392,481.99-710,889.04 2Q20181,866,357.9414,258,839.93740,789.33 2Q2019332,115.6314,590,955.56-1,534,242.31 2Q2020654,221.3915,245,176.95322,105.76 3Q20161,023,457.9816,268,634.93369,236.59 3Q20173,268,745.2619,537,380.192,245,287.28 3Q20181,025,874.9220,563,255.11-2,242,870.34 3Q20192,841,369.7523,404,624.861,815,494.83 3Q2020987,411.2224,392,036.08-1,853,958.53 4Q20162,147,855.3126,539,891.391,160,444.09 4Q2017987,441.9927,527,333.38-1,160,413.32 4Q2018123,425.4127,650,758.79-864,016.58 4Q20193,214,442.6630,865,201.453,091,017.25 4Q2020421,556.3331,286,757.78-2,792,886.33 There is a NULL value as delta in the first row; this is because there is no row that precedes the first row. The first row is, well, the first row and “nothing compares to it”, to paraphrase Prince. PostgreSQL CTE in Practice You’re an employee of the very famous non-existing company SensyLab. The database you’re using has the sensylab_sales table, which contains the following columns: id – The sales ID number and the table’s primary key (PK). product – The name of the product. date – The sale date. sales – The sale amount in dollars. This table contains sales amounts for every day and for every product. How would you create a report that shows the average sales by product, the total average sales, and the difference between these two values? Take a look at this example with a CTE: WITH average_by_product AS ( SELECT product, AVG(sales) AS avg_sales_by_product FROM sensylab_sales GROUP BY product ), total_average AS ( SELECT avg(sales) avg_total_sales FROM sensylab_sales ) SELECT *, (avg_sales_by_product - avg_total_sales) AS difference FROM average_by_product, total_average; CTEs start with the WITH keyword. After that follows the name of the CTE, which is average_by_product in this case. The CTE name should be followed by AS, and after that you simply write your SELECT statement inside the parentheses. In this case, the SELECT statement selects the column product from the table sensylab_sales, and then calculates the average sales value. The result is grouped by the column product, which in turn will show the average sales value by product. When you write the next CTE, you don’t need to write WITH again; just separate the CTEs by a comma and start the second CTE by its name – in this case, total_average. This is a simple CTE that calculates the total average sales value. Once you have the total average and average by product, the next part of the query is a SELECT statement where you’ll use the CTEs you created. First, I’ve selected all columns from both CTEs. You do that by treating the CTEs like tables; that’s why you can find average_by_product and total_average in the FROM clause. Besides all columns from those tables/CTEs, there is an additional one. This additional column contains the difference between the average sales by product and the total average. To calculate that, I’ve simply deducted the average sales by product from the total average sales, both of which were calculated by the CTE. Do you want to see the query’s result? I’m sure you do, and here it is: productavg_sales_by_productavg_total_salesdifference FancySchmancy12,225.5817,608.01-5,382.43 GigaFlex35,264.7517,608.0117,656.74 PowerSnap5,333.7017,608.01-12,274.31 Want to Learn Basic and Advanced PostgreSQL? In this article, I’ve explained why learning SQL is a good idea and listed a few reasons why PostgreSQL may be your preferred dialect. If you want to learn PostgreSQL, the LearnSQL tracks are the best solution (in my opinion). These great interactive courses that give you theory and the opportunity to test it in practice. Since the Advanced SQL in PostgreSQL is, well, advanced, maybe you’re wondering how you get to a basic level in the first place. The most logical choice is to see what the SQL Basics in PostgreSQL can offer you. Finally, having PostgreSQL on your computer is very advisable; that way you can practice and get to know the software’s features. Installing it is really not a big deal; just follow this article to install it in five minutes. Tags: learn sql PostgreSQL