10th Apr 2017 Updated: 24th Aug 2017 10 minutes read Preventing Common SQL Mistakes Francisco Claria SQL Tips Table of Contents Legibility and Formatting Naming Tables and Fields Using Aliases Understanding Syntax and Reserved Words "…It used to work!" Writing Solid Queries Understand the Structure Understand What Your Query Should Retrieve Validate with Some Real Data One More Thing to Remember Regardless of the engine you are using (SQL Server, MySQL, Oracle, etc.), you can prevent common errors and simplify the debugging process. This article will discuss some of the common SQL mistakes you’ll face and will help you correct them easily. One of the best ways to prevent the most common SQL mistakes is to keep your queries clear and readable. It’s very easy to forget the ideas behind your code! If you have to revisit it (and you will), messy code can be hard to understand. So let’s start with some tips for writing clearer SQL code. Legibility and Formatting select person_id from address adr, `option` opt, option_address_type opt_adt where adr.country_id = 1 and opt.person_id = adr.person_id and opt_adt.option_id = opt.option_id and opt_adt.type_id = adr.type_id and ((opt.sample = 1 OR opt.sample = 22) and opt.param = false Do you find that easy to read? I don’t. But this same query looks much better when properly formatted: SELECT person_id FROM address adr, "option" opt, option address_type opt_adt WHERE adr.country_id = 1 AND opt.person_id = adr.person_id AND opt_adt.option_id = opt.option_id AND opt_adt.type_id = adr.type_id AND ((opt.sample = 1 OR opt.sample = 22) AND opt.param = FALSE It’s now quite clear that there is an extra parenthesis after the 4th AND, which will make the query fail. Naming Tables and Fields The positive impact of using naming conventions consistently is sometimes underestimated. In addition to following a convention, give tables and fields the most self-explanatory names that you can. For instance, if you store records for users and their purchases in two tables, you could name the tables users and users_purchases. It’s apparent what each table stores. Naming the same tables "uss" and "usspchs" is not clear, so it’s not a good idea. Also, don’t switch from singular (user) to plural (users_purchases) table names; pick one and stick with it as part of your convention. It’ll most likely help you avoid common SQL mistakes. The same applies for table columns. Sometimes it is not clear what columns are for. A user role column that could have been named something like role or user_role, might end up being called "param" or another undescriptive name. If you are worried about table names getting too large and therefore queries becoming too verbose, you can use table aliases. Using Aliases Aliases allow you to write more compact queries. However, if you are not careful, you may end up with unreadable lines of SQL leading to common SQL mistakes. Here’s an example: SELECT a.id, b.name FROM platform_events AS a LEFT JOIN platform_events_calendars AS b ON (b.platform_event_id = a.id) WHERE a.enabled = 1 AND b.platform is not null Better aliases would make this query more descriptive: SELECT Event.id, Calendar.name FROM platform_events AS Event LEFT JOIN platform_event_calendars AS Calendar ON (Calendar.platform_event_id = Event.id) WHERE Event.enabled = 1 AND Calendar.platform IS NOT NULL Understanding Syntax and Reserved Words SQL parsers will generally point out the location of one of the most common SQL mistakes: a syntax error. Be sure to check the documentation if you are in doubt. Sometimes the parser will return an empty message. Or it might give you a message for an error that you have a hard time finding. If this happens, check your parentheses and quotes; it’s easy to forget to properly close them. Also, pay special attention to how you use reserved words. Try not to use them as part of your table or column names. For instance, if you have a column named desc, maxvalue, or order, this name could get misinterpreted by the parser. Name columns and tables descriptively, without using reserved words. This will help prevent possible collisions. In the former example, you could name the columns sale_description, speed_maxvalue, or purchase_order. If you absolutely must use reserved words, enclose the field names that use them with double quotes ("), as per the SQL standard. However, your specific engine may use other escape characters. For example, MySQL uses single quotes (‘). Say that you have a table called table with a column called from. You could write the query by escaping the reserved words, like this: SELECT "from" FROM "table"; While you can use escape characters to get round the use of reserved words, consider that every database documentation advises you not to use reserved words as object names. It’s really prone to common SQL mistakes. "…It used to work!" If a query stops working for no apparent reason, look for: A schema change A change in the database user’s permissions A change to the underlying engine (upgrading or downgrading) The query’s error message will tell you where you should look for your SQL mistake. If you test the query in your environment and it works, then look at three scenarios shown above. You may have an outdated database schema, or your users have a different engine version or different permissions than you. Writing Solid Queries Now that we have covered some easy-to-spot, common SQL mistakes, let’s turn our attention to tips for writing solid queries. Understand the Structure First, I recommend familiarizing yourself with your database’s structure and information. To understand the general structure: Look at all existing tables in your model. Pay extra attention to the tables that you need to use for your query. Get familiar with your tables’ columns and their primary keys. Figure out how these tables relate to others; check for foreign keys. Check for unique columns and indexes. Next, look at the data you will use in your query: Are there any records? How many rows are available in the tables? Find any columns holding null and empty values. Look at the various data values (numeric, Boolean, date, etc.) that are stored as strings. Beware accented characters! (Hint: Pay attention to the database’s charset.) Sometimes the data is wrongly encoded and stored with weird symbols instead of accented characters. This will make your query fail. Knowing your data will help you approach a query in a more enlightened way and avoid common SQL mistakes. For instance, working with dates that are stored as strings is different than working with dates stored as DATETIME types. Or what if you are not supposed to have null values in a required column but your table definition allows null values? Or what if your table has many millions of records? Your query will have to make optimal use of indexes to prevent really slow execution times. These are all things that you want to know before you start writing! There is no standard SQL command that will get all tables from a database. It depends on your database engine. Usually you will have a tool for visually exploring the tables that you can use. If not, check the engine’s documentation. For instance: MySQL: USE database_name; SHOW TABLES Oracle: SELECT * FROM dba_tables MS SQL: SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE' Knowing this, you could start exploring the available columns data with a simple SELECT: SELECT * FROM your_desired_table Then you could see how many rows the table has: SELECT COUNT(*) FROM your_desired_table Depending on your engine, you can explore the table structure in greater detail using something like: MySQL: DESCRIBE your_table_name Oracle: DESC your_table_name MS SQL: SP_HELP your_table_name Understand What Your Query Should Retrieve Sometimes queries are not as simple as SELECTing a name FROM a table, which can lead to common SQL mistakes. You may need to use averages, summarizations, groups, or values that depend on whether another related table satisfies a given value. My best advice is to be really specific in what you expect and try to consider all possible scenarios. These techniques can help you to validate your result records and simplify a complex scenario to avoid common SQL mistakes: Populate testing tables with well-known data of your own and hit those in your query. If you are using so many columns that they overwhelm the output of your SELECT, start with only the most meaningful columns and add others as you can. Use hard-coded values strategically in your query. For instance, replace a subselect with a known fixed value (or a list of values) so you can better focus on other aspects of the query. If you are using subqueries, test those independently before you try the entire query. This will ensure the subqueries are working as you intend. Test complex expressions (like ones involving math) separately. When possible, test and add the component parts of a complex query incrementally to ensure each step works. Suppose you have a table like this: CREATE TABLE "syscities" ( "id" INT(11) NOT NULL, "cityname" VARCHAR(512) NOT NULL, "population" INT(11) NOT NULL ); If you have a visual tool, you can use that to populate the table. Otherwise, use some INSERTs to create a few test values: INSERT INTO "syscities" ("id", "cityname", "population") VALUES (1, 'Manchester', 1000000); INSERT INTO "syscities" ("id", "cityname", "population") VALUES (2, 'Los Angeles', 2000000); Having values that you know beforehand and makes it easier to see if your queries are working properly. For instance, if you wanted to select the ID of the city with the largest population, it’s easy to see that ID 2 (Los Angeles) would be the right value to return: SELECT syscities.id FROM syscities ORDER BY syscities.population DESC LIMIT 1; Validate with Some Real Data Before jumping into all of your actual data, I suggest grabbing only a few records for validation purposes. (In this case, "few" could be a couple of records – or maybe hundreds or even thousands if you’re working with large datasets.) Put these known records into a testing table and work with that. Once you see that everything looks good, test the query with the full set. It will help you avoid common SQL mistakes. Always keep in mind that data will change over time, so you’ll need to plan ahead and test your query with some common scenarios: No rows: What if your query (and its subqueries) are used on a table with no records? Only one row: What if there is only one record in the table? A few rows: Testing with odd and even numbers of rows may return different results. Make sure you use both even and odd amounts of records when you test. Many, many rows: Have you considered the possibility of having a really huge dataset? If some of the tables referenced in your query could become very large, you must anticipate the effect this can have on the query; maybe it kills the response time or even renders the query useless.For instance, the MySQL function expr IN (value,...) is limited to the max_allowed_packet value. If you put more than that number of elements in the list, the query will fail. Let’s continue with the syscities table to illustrate this. Suppose I create the following (suboptimal) query: SELECT syscities.cityname FROM syscities WHERE syscities.id IN (SELECT id FROM syscities WHERE population > 10000) If you are storing records for all major cities worldwide, the SELECT used in the WHERE condition would retrieve thousands of records, which could cause the IN clause to get too many IDs. The entire statement could fail. It’s one of the most common SQL mistakes. One More Thing to Remember Hopefully this guide will help you prevent some common SQL mistakes. Remember, the server you use (Oracle, MS SQL Server, MySQL, etc.) will impact how you learn SQL, since each one has its own interpretation of the SQL standard. And pay close attention when you search for help online, as you may find solutions that don’t apply to the server (or version) you are using. If you have any comments, questions, or suggestions that could help other SQL learners, please share them in the comment section below. Tags: SQL Tips