28th Nov 2024 12 minutes read SQL Queries for Google Sheets Ekre Ceannmor Data Analysis Table of Contents Why Use SQL with Google Sheets? Using Google Sheets’ QUERY Function General Syntax Filtering Data Sorting Data Aggregate Functions and Grouping Combining Data From Multiple Sheets Using SQL Queries Makes Google Sheets More Powerful Power up your data analysis skills by utilizing SQL queries for Google Sheets! This comprehensive guide will demonstrate how to use the QUERY function to combine the power of SQL with the accessibility of Google Sheets! Looking to make data analysis in Google Sheets faster and more powerful? The QUERY function lets you filter, sort, and analyze data just like you would in SQL – without leaving your spreadsheet! In this article, we will break down everything you need to get started using SQL queries for Google sheets. Let’s dive in! Why Use SQL with Google Sheets? Google Sheets is a popular online spreadsheet service that allows users to comfortably work and collaborate from any place with an Internet connection. Along with other spreadsheets, Google Sheets is a popular tool for data analysis, being easy to use and beginner friendly. However, creating complex reports with just the basic functions available in Google Sheets can be hard. Tasks like filtering complex data, creating multi-level summaries, or combining large datasets often require a lot of manual work, making the process time-consuming and error-prone. That’s where SQL comes in! Google Sheets allows you to write SQL-like queries right inside your spreadsheets, operating on existing data and giving you dynamic responses just like regular functions. Sound interesting? Let's look at how you can start using SQL with Google Sheets. Using Google Sheets’ QUERY Function Google Sheets’ QUERY function has a similar syntax to SQL, so any of your existing skills will translate nicely. We will discuss the syntax in detail in a moment, but first, let’s see some sample data that we will be using to showcase the QUERY function: This is an excerpt from one of the databases in our SQL Basics course, simplified to fit our current needs. General Syntax The general syntax for using the QUERY function is as follows: =QUERY(data_range, query, [headers]) The components are: data_range is the range of rows where the source data is stored. query is the query itself, which we will discuss in a moment. headers is the number of header rows in the range of data (most often 1 or 0). This is optional. It’s automatically set if it’s not provided, so this parameter is omitted most of the time. Let’s take a look at some example queries that use the above dataset. Note: Here and in all the subsequent examples, data_range means the range of rows for the QUERY function, i.e. the table where the data is stored for the SQL query. Google Sheets vs. SQL Sample Queries Google Sheets SQL Explanation =QUERY(data_range, "select *", 1) SELECT * FROM data_range; Retrieves all data in the data range. =QUERY(data_range, "select *") SELECT * FROM data_range; Also retrieves all data in the range, but lets Google Sheets decide how many header rows there are. We will avoid the header argument from now on, as Google Sheets can reliably infer it. =QUERY(data_range, "select A, B") SELECT Department First_Name FROM data_range; Retrieves only the department names and first names of the employees (stored in columns A and B) Filtering Data The QUERY function’s syntax for filtering data is similar to the SQL syntax: Google Sheets: =QUERY(data_range, “select * where ”) SQL: SELECT * FROM data_range WHERE ; Let’s examine some examples of how the QUERY function compares to SQL in terms of filtering data: Google Sheets SQL Explanation =QUERY(data_range, "select * where D > 3000") SELECT * FROM data_range WHERE Salary > 3000; Filters using numbers. Retrieves all data for employees who have a salary (column D) greater than 3000. =QUERY(data_range, "select * where A = 'IT'") SELECT * FROM data_range WHERE Department = ‘IT’; Filters using text. Retrieves all data for employees who work in the IT department (column A). =QUERY(data_range, "select * where E = ‘Trainee’ and D >= 3000”) SELECT * FROM data_range WHERE Position = ‘Trainee’ AND Salary >= 3000; Filters using multiple conditions. Retrieves all data for trainees (column E) that have a salary of 3000 or more (column D) =QUERY(data_range, “select * where B starts with ‘J’”) SELECT * FROM data_range WHERE First_Name LIKE ‘J%’; Filters by a given part of a string. Retrieves all data about employees whose name starts with the letter J. There also are other functions of the same type in Google Sheets; see the full list below. The WHERE condition can filter numbers, text, and much more: Numbers support all of these operators: <=, <, >, >=, =, !=, <>. Both != and <> mean “not equals” and can be used interchangeably. To filter null values, you should use is null or is not null. Note that <column> = null and <column>!= null will give you an error. Several functions are available to filter text data within the QUERY function: starts with filters by the first characters of the string. starts with ‘A’ will match all of these: “Accounting”, “Amelia”, “A”. ends with filters by the last characters in the string. ends with ‘nt’ will match all of these: “management”, “Ant”, “nt”. contains filters by the substring. contains ‘r’ will match all of these: “Harry”, “Taylor”, “r”. matches filters using regular expressions (regex), i.e. a pattern that describes a string. like filters similarly to LIKE in SQL. In Sheets, it supports two wildcards: “%” for matching 0 or more characters, and “_” for matching one character. like ‘_o%’ will match all of these: “Poole”, “ooze”, “to”. To join several conditions into one, you can use the keywords and and or: <condition1> and <condition2> will be true only if both are true. <condition1> or <condition2> will be true when either or both are true. To filter by the inverse of the condition you can use the not keyword: where D > 3000 would give the same result as where not D <= 3000. Sorting Data Sorting in the QUERY function is also similar to sorting in SQL. Here’s the basic syntax: Google Sheets: =QUERY(data_range, “select * order by <column> [asc/desc]”) SQL: SELECT * FROM data_range ORDER BY <column> [ASC/DESC]; Let’s take a look at how they compare using some examples: Google Sheets SQL Explanation =QUERY(data_range, “select B order by B”) SELECT First_Name FROM data_range ORDER BY First_Name; Sorts by one column; ASCending order is implied. Returns all employee first names in alphabetical order. =QUERY(data_range, “select A, B order by A, B desc”) SELECT Department, First_Name FROM data_range ORDER BY Department, First_Name DESC; Sorts by multiple columns in different order. Returns the departments and names of the employees, sorted by the department in alphabetical order and, within each department, by name in reverse alphabetical order. If you do not specify the ordering direction, ascending will be used by default. This means that strings will be shown in alphabetical order and numbers will be shown from smallest to largest. Specifying … order by A asc … explicitly will get the same result as avoiding specifying the order: … order by A … . If you want to specify different ordering directions for different columns, you have to specify the order for every column. For example, this … =QUERY(data_range, “select A, B order by A, B, C desc”) … will sort only the C column in descending order. Here is the proper way to sort all columns in decreasing order: =QUERY(data_range, “select A, B order by A desc, B desc, C desc”) Aggregate Functions and Grouping The QUERY function supports several SQL aggregate functions. In case you’re not familiar with them, aggregate functions take a group of rows as an input and return a single value. (Groups are usually based on shared values in a given column; we’ll discuss more about that in a moment. If you don’t specify a grouping column, the entire dataset is the group.) The aggregate functions supported in Google Sheets’ QUERY are: count(col) – Returns the number of non-null values from the column col. max(col) – Returns the biggest value from col. When comparing dates, earlier dates are treated as “smaller”. Strings are compared alphabetically and are case sensitive. min(col) –Returns the smallest value from col. sum(col) - Returns the total sum of numeric values in col. avg(col) - Returns the average of numeric values in col. Here are some examples of aggregate functions being used by themselves, without grouping: Google Sheets SQL Explanation =QUERY(data_range, “select avg(D)”) SELECT AVG(Salary) FROM data_range; Using the avg() function on its own makes it use the whole data range as the input. Returns the average salary for all employees in the company. =QUERY(data_range, “select count(B)”) SELECT COUNT(First_Name) FROM data_range; Same principle; without groups, count() counts the number of rows in the B column for the whole dataset. Returns the total number of employees by counting their first names. Note that null values are omitted; to get the proper count of rows, we need to make sure we are counting a not-null column. To group the rows by a column value in QUERY, we use the group by keyword (just like in SQL). Here is a comparison between the two syntaxes: Google Sheets: =QUERY(data_range, “select <data> group by <columns>”) SQL: SELECT <data> FROM data_range GROUP BY <columns>; The group by keyword groups the rows based on shared values in the provided column(s), creating a single row for each distinct value. When using group by, all final columns must either be an aggregate function or be used in the group by clause. The result is automatically sorted by the grouping columns, but this can be overridden by using order by. Here are some examples of using group by with aggregate functions: Google Sheets SQL Explanation =QUERY(data_range, “select A, avg(D) group by A”) SELECT Department, AVG(Salary) FROM data_range GROUP BY Department ORDER BY Department; Returns the average salary between employees of each department. Note that the SQL equivalent includes the ORDER BY clause, but QUERY does not. When using group by with QUERY, the result is automatically sorted. =QUERY(data_range, “select E, count(B) group by E” order by count(B) desc) SELECT Position, COUNT(First_Name) FROM data_range GROUP BY Position ORDER BY COUNT(First_Name) DESC; Returns the number of employees working in each position. Notice that in QUERY we override the default sort order to show the positions with the most employees first. =QUERY(A1:E14, "select A, E, avg(D) group by A, E") SELECT Department, Position, AVG(Salary) FROM data_range GROUP BY Department, Position ORDER BY Department, Position; This is an example of grouping by multiple columns. Returns the average salary for each position in each department. The QUERY function uses the same order of columns when sorting the data as the order specified when grouping. Combining Data From Multiple Sheets Most of the time, you’d probably like to have a separate sheet with all the source data. This section will describe how to use the QUERY function on a sheet other than the source data. We’ll also cover how to combine different data sources in one QUERY. Retrieving Data from Another Sheet Google Sheets allows you to refer to a different sheet when choosing a data source. You can do this by adding the name of the sheet and an exclamation point (!) to the range of your dataset. For example, let’s assume we’re currently using the “analysis” sheet and our source data is in the “data” sheet in the same spreadsheet. (Note: You can reference data from other spreadsheets only if you import it into your current spreadsheet.) We can specify the range like this: =QUERY(data!A1:E14, “select *”) Important: If your sheet has spaces or special characters in the name, you should surround the name of the sheet with single quotes (‘’), like this: =QUERY(‘Sheet with a complex name’!A1:E14, “select *”) Now you know how to better format your sheets by separating the source data from the final reports. Combining Multiple Data Sources You can also combine data from multiple data ranges in one QUERY function. There are two ways in which the data will be combined: Vertically: The data from range 1 will be put on top of the data from range 2, increasing the final number of rows. Horizontally: The data from range 1 will be put side-by-side with the data from range 2, increasing the final number of columns. The first option works like a UNION in SQL. The data in both ranges should be similar; when referencing a column, data from both ranges will be shown. To use this method of combining sheets, replace the data range in the query with the list of ranges, separated by semicolons and encased in curly braces {}, like this: =QUERY({sheet1!A1:B1;sheet2!A1:B1}, “select *”) The second option works like a JOIN in SQL. Every row from range 1 is appended with a row from range 2, with the joining condition being the relative row number. In other words, the first row from range 1 gets appended to the first row of range 2. This method has a similar syntax to the previous one, but it uses commas instead of semicolons to separate the data ranges: =QUERY({sheet1!A1:B1,sheet2!A1:B1}, “select *”) You can learn more about how UNION and JOIN work in SQL in our extensive SQL Basics course. It includes more than 100 exercises that will help you learn and practice the most important SQL syntax. Each exercise in this interactive SQL course is designed to help you solidify your understanding. As you complete the exercises, you'll gain the confidence to use these commands in the real world. Whether you're just starting out or need a refresher, our step-by-step approach ensures you'll learn at your own pace. By the end of the course, you'll have a solid foundation in SQL that will open up opportunities for data analysis, reporting, and more. Using SQL Queries Makes Google Sheets More Powerful Using Google Sheets’ QUERY function is a great way to combine the power of SQL with the usability of Google Sheets – without having to learn much additional syntax. With it, you can create easily shareable reports that your colleagues can use without having to leave Google Sheets or understand advanced SQL structures. In this article, we’ve looked at the syntax for the QUERY function and its similarities and differences compared to regular SQL. As you can see, your existing SQL skills can be easily transferred to writing queries within Google Sheets. Want to explore more SQL integrations with Google’s tools? Read about how to use SQL with Google Analytics. Tags: Data Analysis