14th Jun 2022 10 minutes read These SQL Queries Will Help You With Data Analysis Himanshu Kathuria data analysis Table of Contents What Is SQL? Basic SQL Constructs and Queries 1. SELECT and FROM 2. WHERE 3. ORDER BY 4. CASE 5. GROUP BY 6. JOIN Next Steps for Learning SQL Data Analysis Need to streamline your everyday data analysis tasks? This article explains how basic SQL queries can help. If you are planning a corporate career or any field that involves working with data, you must have come across SQL, or Structured Query Language. SQL was first developed in the 1970s and it continues to be the industry standard for database interactions. Many of you may already have decided to learn this language, given how powerful it is. Or maybe you’re wondering whether it is worth learning at all. SQL is certainly one of job hunters’ most valuable skills. For roles like data analyst or data engineer, SQL is your daily bread and butter. In other positions, it may not seem as necessary. However, I strongly feel that learning SQL gives you an edge in working with data that will make your business decisions more robust. It will give you the ability to comfortably analyze large volumes of data and present your results in a succinct manner to any audience. In this article, I will cover the basic SQL queries and constructs that will get you started with your data analysis needs. But before I jump into it, let me take you through what SQL is and its general applications. What Is SQL? SQL is a standard language for accessing and manipulating databases. In short, it helps you ‘talk’ to your database. You can use it to retrieve information already stored in the database, create new tables, and store or update data as needed. Sounds simple, right? It is. But that is where its power lies: It is simple, but it produces very useful results. Organizations that follow a data-centric approach to their business decisions thrive. And if you present your ideas using data analysis, you are much more likely to get a buy-in from your stakeholders. In fact, every company needs data analysis in one form or another to grow. SQL helps make this process of analyzing data seamless. It can be used for countless data-related applications no matter the domain – tech, HR, finance, marketing, sales – or the size of the company. For instance, SQL can help you: Analyze customer data and generate insights related to customer behavior. Analyze your PnL data to reveal your biggest cost heads. Perform general financial analyses. Understand which departments have the most expensive payroll/workforce. Find areas in your operations with the most errors or inefficiencies. Identify your best-performing salespeople, stores, locations, regions, etc. Analyze bottlenecks in your website and other data-related tasks. Of course, these are merely some examples. In reality, this list is almost endless. If you’re a business owner or manager, you may even want your employees to learn SQL or to incorporate SQL courses into your onboarding process. At the very least, you’ll probably want to learn SQL yourself so you can make data-backed decisions. Now that you have an idea of what SQL is and how it can be used, let's dive into some basic queries to get you started in your data analysis tasks. Basic SQL Constructs and Queries There are many more SQL clauses and functions than we can cover in this article, but the following are the foundation of most SQL queries. Also, the examples cited here are built on a limited dataset. In reality, you will work with thousands of rows and many tables. Don’t worry; the constructs remain the same irrespective of the amount of data. These basic queries will work on any database table. In fact, one of the major differentiators of SQL is scalability; it sets SQL apart from its Excel and similar tools. Here are the basic parts of a SQL query: 1. SELECT and FROM The SELECT keyword is used to retrieve data from the database. You can specify the names of the columns (i.e. the type of data you want to see, like customer name or age), or you can see all the data in the table. The FROM keyword specifies the table(s) from which the data is to be retrieved. For instance, say you want to get a list of all the customers in your database with their name and email information. This information is in the Customers table shown below: Customer_IDCustomer_NameEmailCityCountryAgeSex 1211Antonioxxxx@xxxx.comParisFR32Male 1212Paulineyyyy@yyyy.comLondonUK42Female 1213Matteozzzz@zzzz.comNew YorkUS28Male 1214Sethaaa@bb.comSydneyAU32Male 1215Agathabbb@ccc.comWashington DCUS24Female Here’s the query we’d use to get a list of all the customer names with their email addresses: SELECT Customer_Name, Email FROM Customers; And the output: Customer_NameEmail Antonioxxxx@xxxx.com Paulineyyyy@yyyy.com Matteozzzz@zzzz.com Sethaaa@bb.com Agathabbb@ccc.com You can learn more about SELECT in the SQL Basics course and in the article How Do You Write a SELECT Statement in SQL, which includes tons of examples. 2. WHERE Suppose you run an ecommerce website that’s has a promotion for US customers. In this case, you want to send out the email only to US customers; the list you want to select should only have US customers. This is when you can use the WHERE clause. So, if you use this query … SELECT Customer_Name, Email FROM Customers WHERE Country = 'US'; … you’ll get these results: Customer_NameEmail Matteozzzz@zzzz.com Agathabbb@ccc.com The WHERE clause is used to filter specific rows based on the value of one or more columns (in this case, it’s the Country column, which is not shown in the results). Now say you only want to list females from the US. You can filter for multiple conditions in WHERE if you use the AND keyword: SELECT Customer_Name, Email FROM Customers WHERE Country = 'US' AND Sex = 'FEMALE'; Here’s the result: Customer_NameEmail Agathabbb@ccc.com In the above query, the value for Country and Sex for each row is evaluated against the conditions in the WHERE clause. Only those rows where both conditions are true will be returned. If you want to return the results when either (or any) of the WHERE conditions is true, use the OR keyword: SELECT Customer_Name, Email FROM Customers WHERE Country = 'US' OR Sex = 'FEMALE'; This selects all female customers as well as all customers who live in the US. Notice the difference in the results: Customer_NameEmail Paulineyyyy@yyyy.com Matteozzzz@zzzz.com Agathabbb@ccc.com Need more resources on the WHERE clause? Check out these articles: The Complete Guide to the SQL WHERE Clause How to Write a WHERE Clause in SQL What Is the SQL WHERE Clause? 3. ORDER BY Say you have another table in your database called the Orders table. Order_IdItemQuantityCustomer_IdTotal_Value 23A62411212$100 22Ae3431213$210 13A21141213$249 11AXXX41212$212 You want to see the top orders and their details. One way to do this is to list orders in descending order (10–1, Z–A) by value. You can use something like this query: SELECT * FROM Orders ORDER BY Total_Value DESC; And you’d get: Order_IdItemQuantityCustomer_IdTotal_Value 13A21141213$249 11AXXX41212$212 22Ae3431213$210 23A62411212$100 Using a ‘*’ after SELECT tells the SQL engine to retrieve all the columns for a table. The ORDER BY clause directs that the results be shown in a particular order. By default, the results are returned in ascending order (1–10, A–Z); however, since we explicitly mention the DESC keyword, our results are returned in descending order. If you'd like to know more, check out the excellent Detailed Guide to SQL ORDER BY. 4. CASE Typically, a business will have thousands of rows in the Orders table. You may not always want to see the total value of each order. Instead, maybe you want to categorize them based on value or some other logic. If you want to categorize orders into ‘High’ or ‘Low’ value – depending on whether the order value is above or below $150 – you can use the CASE construct: SELECT Order_Id, Total_Value CASE WHEN Total_value > 150 THEN 'HIGH' WHEN Total_value < 150 THEN 'LOW' ELSE 'MEDIUM' END as Order_Category FROM Orders; And this is what you’d get: Order_IdTotal_ValueOrder_Category 13$249HIGH 11$212HIGH 22$210HIGH 23$100LOW In this query, each row is evaluated by the CASE function. Rows are assigned a category based on the first condition (i.e. WHEN .. THEN ..) that evaluates to true. Since the orders with the IDs of 13, 11 and 22 are above $150, they are categorized as ‘HIGH’. The order with ID 23 is assigned ‘LOW’. The ELSE specifies a default return value if none of the conditions evaluate to true for any row. On the LearnSQL.com blog, we have several interesting articles on using CASE in SQL queries. Here are just a few: How to Use CASE in SQL What Is CASE in SQL? How to Use CASE WHEN With SUM() in SQL 5. GROUP BY Oftentimes, to generate insights from large data sets, we may want to group similar items together. We can use this information to make informed decisions; grouping the data helps us find patterns, behaviors, or metrics. This grouping is called aggregation; it can be achieved with the SQL GROUP BY clause. You generally use a function to calculate the group metric. Commonly used aggregate functions are SUM(), COUNT(), AVG(), MAX(), and MIN(). With these, you can do many things, e.g. calculating the total expenses for a department or counting the number of employees in each location. Say you run an offer in your business where you send a free gift to the customer who spends the most on your website. You can use SQL GROUP BY clause here to retrieve the top customer in terms of total order value. Here’s the query: SELECT Customer_Id, SUM(Total_Value) FROM Orders GROUP BY Customer_Id ORDER BY 2 DESC; Here's the result: Customer_IdSUM(Total_Value) 1213$459 1212$312 Clearly, your top customer is Customer_Id 1213. The query works by first selecting all the unique Customer_Id values and then using GROUP BY to find the aggregate total sum for each (i.e. the total value of all their orders). The ORDER BY 2 DESC clause displays the results in decreasing order by SUM(Total_Value). For more information on GROUP BY in data analysis, check out What Is GROUP BY in SQL?. 6. JOIN In the previous example, we found the Customer_id of the highest-spending customer. But the Orders table doesn’t store any other customer information, such as name or communication details. You’d need to write another query to the Customers table to retrieve the required data – or you could just use JOIN. The SQL JOIN feature lets you select information from multiple tables in a single query. Typically, you will connect two tables based on one or more column values common to both tables. In our case we can use Customer_Id as the common field. It will join the Customers and Orders tables: SELECT b.Customer_Name, b.Email, a.Customer_Id, SUM(Total_Value) FROM Orders a JOIN Customers b ON a.Customer_id = b.Customer_Id GROUP BY b.Customer_Name, b.Email, a.Customer_Id ORDER BY 4 DESC; This is the output: Customer_NameEmailCustomer_IdSUM(Total_Value) Matteozzzz@zzzz.com1213$459 Paulineyyyy@yyyy.com1212$312 Now the query also returns the customer details you needed. It works by comparing the Customer_Id fields in both tables and then retrieving only those rows where it finds a match for the values. Notice that the customer IDs 1211, 1214 and 1215 are not in the results; there were no matching rows in the Orders table for these values. If you want to display all the values – even those which do not have a corresponding match in the other table – you can use something called an OUTER JOIN. Joins can be somewhat tricky to understand, but they are one of the most important concepts in SQL. Feel free to check out this course on SQL JOINs. It offers 93 exercises covering all types of JOINs (there are several). Next Steps for Learning SQL Data Analysis I hope these queries have given you enough ideas to get started with data analysis using SQL. If you don’t understand the queries perfectly, don’t worry. You can take our SQL Basics course to practice them, or our comprehensive SQL from A to Z learning path to get a full view of what this language can do. The ultimate goal is not just to learn SQL, but use it in your day to day work and develop a habit of using data analysis to make decisions. The examples I have shown you are of course very simple. I just wanted you to understand the main concepts and see how easy it is to create SQL queries. When the time comes, you’ll learn to create more sophisticated queries. They'll make your work easier and maybe even attract the general admiration of your colleagues (or boss). Knowing SQL is awesome! Once you have learned the basics of SQL, ask yourself these questions when preparing for any business decision: What data do I need to support or aid my decision? Do I have that data in my database already? If yes, how do I retrieve and present the data so it helps me make my decision? Being data-driven is a trait you can inculcate with practice. So, keep practicing and keep learning! Tags: data analysis