23rd Sep 2021 7 minutes read Find Top Salaries by Department with SQL Himanshu Kathuria GROUP BY Table of Contents The Sample Dataset What Is Aggregation? The SQL GROUP BY Clause Sample Structure of a Query With a GROUP BY Clause SELECT Columns and Aggregate Function FROM <table(s)> and WHERE <conditions> Listing the Columns for Grouping With a GROUP BY Conditions on the Aggregate Function Using a HAVING Clause Sorting With an ORDER BY Clause Ready to Write Your SQL GROUP BY Query? This article shows how to get the total salary by department for an organization, using the GROUP BY functionality of SQL. We explain the SQL GROUP BY clause in detail along with conditional filtering using a HAVING clause. Understanding the salary outlay for an organization is one of the common starting points for business leaders. It helps them see who in the organization has the highest total salary cost, department-wise. You may also want to see the distribution of salary payments to get a better sense of the cost allocation. In this article, I will illustrate how you can use the SQL GROUP BY to achieve this. If you are a professional who can use such an analysis, or if you are a student trying to understand the SQL GROUP BY functionality through a practical use case, read on! Before we go about solving the problem, let’s take a look at the dataset we will use. The Sample Dataset employees: employeenumberlastnamelevelannual_salarydepartment 1056Patterson1010000Finance 1076Firrel57000Marketing 1088Patterson1012500Finance 1102Bondur25000Human Resources 1143Bow25000Sales 1165Jennings25000Sales 1166Thompson1010000Marketing Given above is the employees table with five columns, with self-explanatory column names. employeenumber: Unique identifier for the employee. lastname: The employee’s last name. level: The employee’s level in the hierarchy. annual_salary: Annual compensation for the employee. department: The employee’s department. Now, let’s say you want to find the total annual salary given to employees belonging to different departments, like finance, marketing, human resources, and sales. This type of data operation is known as aggregation. What Is Aggregation? Since it is difficult to analyze each row separately especially with large data sets, it is often useful to group similar data together to understand some statistics for each of those groups. This is known as aggregation. You might want to group various customer segments (aggregating by customer segment), calculate their average order size, calculate total sales by region (aggregating by geography), or check the total number of items by various sellers on a website (aggregating by seller). All these are examples of data aggregation which can help in analyzing data and generating insights. So, here, we want to aggregate the employees by department, then calculate the total salary (i.e., the sum of the salaries) for all employees belonging to that department. You may think you can easily do that for the table above by adding up the salaries by hand. But you will surely need a faster way if you have thousands of employees! This is where SQL GROUP BY clause can help you. The SQL GROUP BY Clause The SQL GROUP BY clause helps you aggregate data into groups and then calculate related statistics for that group. For our case, you can use the following query for calculating the total salary by department. The comments briefly explain the function of each construct in the query. Query: SELECT department, -- column to be returned SUM(annual_salary) -- aggregate function FROM employees -- table name GROUP BY department; -- column to use for grouping Output: departmentSUM(annual_salary) Finance22500 Marketing17000 Sales10000 Human Resources5000 To help you understand this better, let me dive a little deeper into the structure of this query and explain each clause or keyword used. Sample Structure of a Query With a GROUP BY Clause If I were to draw a skeleton or sample structure of an SQL query that uses a GROUP BY clause, it would look something like this. You may not use all of the constructs all of the time, but it’s useful to understand them. Sample structure: SELECT <columns>, <aggregate function> FROM <table name> WHERE <conditions> GROUP BY <columns> HAVING <aggregate condition> ORDER BY <columns> In our query to find the total salary by department, we used only the SELECT, FROM, and GROUP BY clauses. Let’s take a look at how you can use each of these constructs. SELECT Columns and Aggregate Function In this part of the query, you specify the columns to be aggregated and returned. So, for the total salary by department, department is one of the columns to be returned. You also choose the aggregate function here. The aggregate function is the metric or statistic you want to calculate for the grouped column. In our case, SUM() is the aggregate function. SQL also provides other useful built-in aggregate functions. Take a look at this table for various aggregate functions and a sample use case for each. Aggregate FunctionExample Use Case SUM()Find the total salary by department COUNT()Find the number of employees in each department MAX()Find the highest salary paid in each department MIN()Find the lowest salary paid in each department AVG()Find the average salary for each department To change the statistic, all you need to do is use the appropriate function. For instance, if you want to calculate the average salary instead, you can use: Query: SELECT department, AVG(annual_salary) FROM employees GROUP BY department; Output: departmentAVG(annual_salary) Finance11250 Marketing8500 Sales5000 Human Resources5000 You can also use multiple aggregate functions together. For example: Query: SELECT department, AVG(annual_salary), SUM(annual_salary) FROM employees GROUP BY department; Output: departmentAVG(annual_salary)SUM(annual_salary) Finance1125022500 Marketing850017000 Sales500010000 Human Resource50005000 FROM <table(s)> and WHERE <conditions> In this section, you specify the table(s) you want to get the columns from and any conditions that you may want to apply on the selected columns. Let’s say you want to get the data for only two departments – marketing and sales – from the employees table. Query: SELECT department, SUM(annual_salary) FROM employees WHERE department in (‘Marketing’,’Sales’) GROUP BY department; Output: departmentSUM(annual_salary) Marketing17000 Sales10000 Listing the Columns for Grouping With a GROUP BY In this part of the query, you specify the columns you want to use for grouping the data. We have already seen grouping by department. One thing to be careful about here: if you are using multiple columns in your SELECT statement, you need to include them all here in the GROUP BY clause, except the column(s) being used by the aggregate function(s). If you don’t, you will probably get an error. So, for example, say you want to SELECT and GROUP BY both department and level. The query would look like this. Query: SELECT department, level, SUM(annual_salary) FROM employees WHERE department in (‘Marketing’,’Sales’) GROUP BY department, level; Output: departmentlevelSUM(annual_salary) Marketing57000 Marketing1010000 Sales210000 Conditions on the Aggregate Function Using a HAVING Clause You can choose to refine and filter the output of a query with an aggregate function by using a HAVING clause. A HAVING clause evaluates the condition(s) on the output of an aggregate function and returns the rows satisfying that criteria. So, if you want to find every department whose average salary is greater than 7000, you can use the following query. Query: SELECT department, SUM(annual_salary) FROM employees GROUP BY department HAVING AVG(annual_salary)>7000; Output: departmentSUM(annual_salary) Finance22500 Marketing17000 Note: Don’t confuse the WHERE clause with the HAVING clause. While a WHERE clause filters records from tables, a HAVING clause filters groups. Sorting With an ORDER BY Clause Finally, you can sort your results by using an ORDER BY clause. It can be used to sort the results in ascending or descending order. To sort the departments in alphabetical order, you can use the following query. Query: SELECT department, level, SUM(annual_salary) FROM employees WHERE department in (‘Marketing’,’Sales’,’Human Resources’) GROUP BY department,level ORDER BY department asc; -- asc is used for ascending, desc for descending Output: departmentlevelSUM(annual_salary) Human Resource25000 Marketing57000 Marketing1010000 Sales210000 For more examples and use cases of the GROUP BY clause, check out this article. Ready to Write Your SQL GROUP BY Query? If you have read the article this far, I am confident that you are ready to write your SQL GROUP BY query to get the desired result. I used a lot of example queries in this article to acquaint you with the process of writing queries. Trust me – practice is the best way to get better at writing queries. To practice queries like these and more, you can check out the SQL Practice track by LearnSQL. It is a comprehensive way to further hone your skill. SQL is a very powerful tool, not just for data aggregation but also for many other use cases that require data crunching and manipulation. In fact, it is a must-have skill not just for data analysts but for anyone who aspires to work in an environment where decisions are made based on data. If you are new to SQL and want to learn more about how to write SQL queries, I strongly recommend the course by LearnSQL for writing basic queries. It will give you a strong foundation for analyzing data with SQL and augment your career further. So, what are you waiting for? Get started today! Tags: GROUP BY