12th Mar 2024 10 minutes read Data Analyst's Guide to SQL Indexing: Fix Slow Queries Jeffrey Edison sql query performance Table of Contents SQL and Database Queries What Is an SQL Index? Database Indexing Example #1 Database Indexing Example #2 Creating an SQL Index SQL Indexing Best Practices What’s Next with SQL Indexing? Does your SQL query response time leave something to be desired? Or maybe you don't know if your queries could be faster. In this article, we’ll explain how SQL indexing can help. Diving into SQL is like unlocking a superpower. It's all about mastering the art of querying your database effectively so you can retrieve the information you need quickly and easily. But what happens when you ask a question that's complex enough to slow response time from seconds to minutes – or even longer? That's when you say hello to SQL indexing. In this article, I'll share everything you need to know about SQL indexing and why it should be the next step in optimizing your SQL queries. Let's dive in! SQL and Database Queries SQL is a language for querying and maintaining databases. It allows you to store and view data as well as update that data. Managing data is crucial for data analytics and business intelligence. We need to manage the underlying data so that we can analyze it and draw conclusions from it: Which products are selling well? Which customers are buying a lot? Which customers aren’t paying on time? Once we have the data, then we can answer these types of questions. But first, we need to "ask questions" or “make requests” about our data. Those requests are called queries – e.g. something like “For last month's sales, count the number of times each product was sold and provide me with the sum of the sales”. Of course, as we ask more complex questions, our queries become more complex. We may need to combine (or JOIN) information from multiple sources so that we can extract the information that we need. For example, it's useful to know that product ID 2123876123 sold the most during the previous quarter, but it is even more useful to know that that product is a blue sweatshirt. As the complexity of our queries increases, it becomes more challenging and time-consuming for our database system to respond to the query. We need to speed up our SQL queries, but how? This is where the fun begins. There is a subtle art to understanding when a query is responding slowly and how to improve its performance. We could try to modify our query and make it more efficient, but SQL performance tuning is only going to take us so far. At some point, we need to perform query optimization. This makes it easier for the database system to retrieve the results that we are looking for, which improves query performance. This is where indexes come in. What Is an SQL Index? Like the index of a book, SQL indexes prepare the database system for more efficient data retrieval. Creating indexes is straightforward; we have a comprehensive course on indexes that will teach you everything you need to know. This includes how they are created within the database, the SQL syntax for creating an index, and when to create an index. There are over 50 exercises and an estimated 10 hours of learning. If you'd like an introduction to the theoretical side of indexes, we also have additional articles for you about SQL indexing basics and What Is a Database Index? For our purposes, we won't worry about the details of how the database builds an index or its underlying B-tree structure. Instead, we’ll focus on how the database uses an index. Suffice it to say that a B-tree (balanced tree) index allows us to access any row in the database in the same amount of time. Database indexing speeds up data retrieval. Think back to our analogy; it's easier to find the page that refers to Abraham Lincoln by looking for “Lincoln, Abraham” in the book's index. (Compare this to looking through every single page for a mention of Mr. Lincoln and you’ll get the idea.) Using an index, finding the reference to any particular person in the book would take the same amount of time. On the other hand, imagine the book has no index and you have to scan every page to find a name. Locating any specific reference will take an unknown, variable amount of time – people who appear on the first pages will be found more quickly than people who appear at the end (unless we scan from back to front). SQL indexing works in the same way. An index is applied to a column that makes it easier for the database to return information for a search. Should we create indexes for every column in a table? Wouldn’t this speed up the database? No. Indexing too many columns will adversely affect database performance, making it very slow to add and update rows. The current best practice is to only index columns that are frequently used for ordering or sorting data. Database Indexing Example #1 Let's assume that you need to store information about people: their first name, last name, street name, street number, postal/ZIP code, city, country, phone number, and date of birth. Now, let's assume that you have millions of records stored in this database. Would you likely search (i.e., query) the database based on last name? Yes, probably. Would you likely query the database using only the first name? No, probably not; it's more likely that you would query based on first name plus last name. But maybe you want to see when a particular first name was the most popular in the dataset. In that case, you’d be asking “in what year were the most people with this first name born?” Each of these cases would require a different approach to indexing. If we are searching by last name only, we would create an index on the last_name column. If we were searching by first and last name, we’d index both the first_name column and the last_name column. In the third case, we’d create an index on the first_name column alone. In short, there are different SQL indexing techniques. By applying these different techniques, we can achieve SQL performance tuning. The creation of indexes depends on how you are going to use the data – or, in other words, how you are going to query the data. Database Indexing Example #2 Imagine we have a person table with four columns: ssn (social security number, which is similar to a national ID number), first_name, last_name, and zip_code (postal code). When we have millions of records, running an SQL query to find a row based on the last name takes almost a minute – even for such a simple table. Can you imagine waiting a minute for the application to retrieve the information that you need each time you query it? In this example, it took 46 seconds to scan through 40 million records. There was no index on the last_name column, so the database system had to read every record in the table to check which records matched a particular last name. We desperately need to speed up this SQL query. In this situation, we will create an index on the column last_name. This will speed up the SQL query by three orders of magnitude (approximately 3,000 times faster) to 15 milliseconds: a huge gain in SQL query performance. To index a database, we need to define: The name of the index. Which column(s) will have the index. The name of the table containing these columns. Remember, we shouldn't create indexes for every column or every combination of columns. If we do, we risk killing the database and creating an application that is not responsive when users are adding or updating records. Let’s see how this is done. Creating an SQL Index The syntax of creating an index is simple. To create one simple index on one column of the table, we use the following statement. In this case, it will index the zip_code column of our person table: CREATE INDEX index_zip ON person (zip_code); It’s not complicated. However, remember that each index must be updated when new rows are added and existing rows are modified or deleted. These updates take time; if you overuse indexes, then indexes can cause your database and its application to become slower or even too slow for users. You can also alter existing indexes or drop (remove) them, but that’s outside the scope of this article. There are different types of indexes: Unique indexes maintain data integrity by defining that no two rows in the table can have the same value for the unique index. Primary indexes are a specific type of unique index, but there can only be one per table. The primary index is created when the table is created. As its name implies, it is created from the primary key of the table. Secondary indexes are additional indexes that are created on demand (using CREATE INDEX) and can be dropped. Secondary indexes can have duplicate, non-unique values (i.e. can have more than one row with the same value). Composite indexes (multi-column indexes) are indexes that include multiple columns. Indexes are important, but over-usage is as damaging as under- or non-usage. A well-implemented indexing strategy is key. You can avoid unnecessary indexes by understanding how the data in your tables will be maintained. But be sure to include important indexes based on how your data will be accessed and used. SQL Indexing Best Practices Let's dive into indexing best practices. Don't index every table. Small tables do not require indexes, as a table scan will be more efficient than looking through the index and then retrieving the data from the table. Don't index every column. I hope that this is obvious from previous examples. Indexing every column adds overhead to keep those indexes up-to-date and slows down other database operations. Index columns that you filter on (i.e. often use in WHERE clauses). Don't index large columns. A large field within your table will result in a large index. Index foreign keys. This improves the performance of JOIN Only use multi-column indexes when appropriate. Multi-column indexes are great. (Remember our example of indexing on both first name and last name so that we can efficiently query on that combination). However, composite indexes are more challenging, as you have to consider the order of the columns within the index. Shall we create an index on the first name and last name or on the last name and first name? These are two different indexes. Which one will perform more efficiently? The response will depend on the SQL queries. Generally, a single-column index is enough (and it saves time). Use indexes to pre-sort data. Repeatedly sorting data can be avoided when an index is added with the sort order (i.e. ascending or descending). Make sure things are working as expected. Use EXPLAIN PLAN to verify that your indexes are helping. Now that we have a strategy for creating indexes, let's consider how we maintain and tune them. You need to monitor your indexes: check the index statistics to understand the usage of the indexes and check the query plan to analyze how your SQL queries are using indexes. SQL index maintenance involves reorganizing or rebuilding indexes. Reorganizing an index is less intensive than a full rebuild. Both reorganizing and rebuilding an index are resource-intensive and they may or may not improve performance. So, analyze your indexes before you maintain them. Do not assume that maintaining an index will improve the SQL query performance. What’s Next with SQL Indexing? Now that we have gone through what indexes are and the best practices for creating and maintaining them, the next step for you is to dive deeper into indexes. Our Understanding Indexes course explains how indexes work and guides you through how and when to create an index. Its hands-on exercises will help you quickly grasp what you need to keep your queries fast and efficient. Happy learning! Tags: sql query performance