18th Jul 2024 · 20 minutes read

SQL Project for Portfolio: Northwind Store

by Tihomir Babic

SQL portfolio projects are an important part of a data analyst's education. How do you start a project, and where do you go with the insights you discover? We'll use the Northwind store database to answer this.

Doing a solid SQL project is an essential element of any data analyst's learning process. An SQL project is a great learning tool because it makes you use SQL on a real-world dataset. This is especially important if you're a beginner with few opportunities to work with real-world data. By working with multiple real-world datasets, you learn about the issues you may encounter in practice. As a bonus, a few SQL portfolio projects are always a nice thing to see on a resume.

The question is, how do you prepare an SQL portfolio project? Once you find a free online dataset you want to analyze, what do you do next?

We'll answer these questions using the Northwind store database. You can find this dataset in the SQL Databases for Practice course, where it is one of six datasets, alongside data from a university, blog traffic, athletic scores, a music store, and the Museum of Modern Art (MoMA). The course is part of the SQL Practice track, where you can practice aggregation, JOINs, subqueries, CTEs, CASE WHEN, and other important SQL topics. If you need to refresh your knowledge of the areas important for reporting, try our Creating Basic SQL Reports course.

Let's now take the Northwind store dataset and use it for an SQL project for a data analyst portfolio. I'll do this project in PostgreSQL, but everything I do is transferable to other databases with minor syntax tweaks. As we go, feel free to consult our free SQL for Data Analysis Cheat Sheet.

The Northwind Store Dataset

The only info I'll give you about this database is that it consists of six tables:

categories – A list of product categories.
channels – A list of sources through which the store acquires customers.
customers – A list of the store's customers.
order_items – A list of the products included in each order.
orders – A list of orders placed by customers.
products – A list of the products the store offers.

We'll gather the rest of the info by doing the project. In fact, this kind of database inspection should be every project's initial stage, before any analysis starts.

1. Database Inspection

This stage of an SQL project involves getting to know your data: table and column names, primary and foreign keys, the relationships between the tables, and the data types in each table.

Inspecting Table Names

A primitive way of learning about the tables in a dataset is to find them in the object browser of the RDBMS into which you imported the dataset, e.g., PostgreSQL, SQL Server, or MySQL.
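If you work in psql, PostgreSQL's command-line client, there's an even quicker shortcut: the \dt meta-command lists the tables in the current search path. This is a psql convenience rather than portable SQL, so the query-based approach below is what transfers to other databases.

-- psql meta-command: list the tables in the current search path
\dt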
In PostgreSQL, you can write this query to get a list of all tables in a database:

SELECT table_name
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
  AND table_type = 'BASE TABLE'
ORDER BY table_schema, table_name;

It looks for data in information_schema, where you can find database metadata. Since I want the name of every table, I query the tables view and put table_name in the SELECT. The first condition in WHERE filters out the system schemas and leaves only user-defined tables. The second condition ensures that only base tables are listed, excluding views and other table-like objects.

Here's the list of tables in the Northwind database:

table_name
categories
channels
customers
order_items
orders
products

Inspecting Column Information

Now we want to understand the details of each table better. Knowing their columns is a good start. We can again query information_schema to get important info about the columns:

SELECT table_name,
       column_name,
       data_type,
       is_nullable,
       column_default
FROM information_schema.columns
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_name, ordinal_position;

We can find table names, column names, and the data type of each column by listing table_name, column_name, and data_type in the SELECT. The is_nullable column tells us whether the column accepts NULL values. Here's the output (the column_default values are omitted):

table_name | column_name | data_type | is_nullable
categories | category_id | integer | NO
categories | category_name | character varying | NO
categories | description | text | YES
channels | id | integer | NO
channels | channel_name | character varying | NO
customers | customer_id | integer | NO
customers | email | character varying | NO
customers | full_name | character varying | NO
customers | address | character varying | YES
customers | city | character varying | YES
customers | region | character varying | YES
customers | postal_code | character varying | YES
customers | country | character varying | YES
customers | phone | character varying | YES
customers | registration_date | timestamp without time zone | NO
customers | channel_id | integer | NO
customers | first_order_id | integer | YES
customers | first_order_date | timestamp without time zone | YES
customers | last_order_id | integer | YES
customers | last_order_date | timestamp without time zone | YES
order_items | order_id | integer | NO
order_items | product_id | integer | NO
order_items | unit_price | numeric | NO
order_items | quantity | smallint | NO
order_items | discount | numeric | NO
orders | order_id | integer | NO
orders | customer_id | integer | NO
orders | order_date | timestamp without time zone | YES
orders | total_amount | numeric | NO
orders | ship_name | character varying | YES
orders | ship_address | character varying | YES
orders | ship_city | character varying | YES
orders | ship_region | character varying | YES
orders | ship_postalcode | character varying | YES
orders | ship_country | character varying | YES
orders | shipped_date | timestamp without time zone | YES
products | product_id | integer | NO
products | product_name | character varying | NO
products | category_id | integer | NO
products | unit_price | numeric | YES
products | discontinued | boolean | NO

This way, we have all the info in one place, which makes it easier to investigate. First, the column names tell us what data each table holds. The data types all seem logical; for example, there are no IDs stored as character varying, so there will be no need to convert data into a suitable format at this stage.

Regarding NULLs, we get some valuable information about the tables:

The table categories allows the category description to be NULL.
The table channels allows no NULLs at all.
The table customers allows NULLs in many columns, including country.
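Since country is one of the nullable columns, it's worth checking how widespread the missing values are before building anything on top of that column. A minimal sketch – COUNT(column) counts only non-NULL values, so the difference from COUNT(*) is the number of NULLs:

SELECT COUNT(*) AS all_customers,
       COUNT(country) AS customers_with_country,
       COUNT(*) - COUNT(country) AS customers_missing_country
FROM customers;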
If we want to create a geographical segmentation of customers, these NULLs could cause problems for us. The output above also shows that no columns in the table order_items are nullable. For the products table, it's interesting that the unit price can be NULL. And it seems a bit strange that the table orders has so many nullable columns: it's effectively possible to have an order ID and no other information about the order. We have to keep that in mind when analyzing this table.

Finding Primary and Foreign Keys

The next step of the inspection is understanding how the tables work together. To do that, we first need to find their primary keys (PKs) and foreign keys (FKs). PKs show which column(s) a table uses to uniquely identify rows, which helps you understand each table's structure. FKs show which columns are linked to a primary key in another table; this is the basis for understanding the relationships between the tables. You can list both PKs and FKs using the query below:

SELECT kcu.table_name,
       kcu.column_name,
       tc.constraint_type
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.constraint_schema = tc.constraint_schema
WHERE tc.constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY')
ORDER BY kcu.table_name ASC, tc.constraint_type DESC;

This time, we need two views from information_schema: table_constraints and key_column_usage. The table_constraints view contains info about the constraints on each table; the key_column_usage view contains info about the columns that carry key constraints. The two views are joined on their two common columns: constraint_name (the name of the constraint) and constraint_schema (the schema containing the constraint). The WHERE clause keeps only the primary key and foreign key constraints. As a final step, I order the output alphabetically by table name and then reverse-alphabetically by constraint type.

The code outputs this table:

table_name | column_name | constraint_type
categories | category_id | PRIMARY KEY
channels | id | PRIMARY KEY
customers | customer_id | PRIMARY KEY
customers | last_order_id | FOREIGN KEY
customers | first_order_id | FOREIGN KEY
customers | channel_id | FOREIGN KEY
order_items | order_id | PRIMARY KEY
order_items | product_id | PRIMARY KEY
order_items | order_id | FOREIGN KEY
order_items | product_id | FOREIGN KEY
orders | order_id | PRIMARY KEY
orders | customer_id | FOREIGN KEY
products | product_id | PRIMARY KEY
products | category_id | FOREIGN KEY

The tables categories and channels are the only ones with a PK but no FK. All the other tables have one PK and at least one FK. The special case is the table order_items, whose PK constraint spans two columns: order_id and product_id. In other words, the table has a composite primary key: the unique identifier of a row is the combination of the two columns, so a given product can appear only once in a given order. A closer look reveals that the same two columns are also foreign keys in this table – the primary key columns come from other tables.
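The PRIMARY KEY constraint already guarantees that each (order_id, product_id) combination is unique, but the duplicate-check pattern below is worth knowing for tables without declared keys; an empty result confirms the composite key holds. A quick sketch:

SELECT order_id,
       product_id,
       COUNT(*) AS row_count
FROM order_items
GROUP BY order_id, product_id
HAVING COUNT(*) > 1;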
Understanding the Relationships Between the Tables

Listing PKs and FKs already gives us some idea of the relationships between the tables. However, we can understand them better by showing each foreign key's parent and child tables and columns. Why would we do that? A foreign key always points to another table's primary key, so listing all the parent and child tables and columns is an upgrade on the previous query: we'll easily see the columns shared between tables. This is useful for understanding the relationships between the tables, knowing which columns you can use to join them, and seeing whether two tables can be joined directly or only through another table. Here's the code:

SELECT ccu.table_name AS parent_table,
       ccu.column_name AS parent_column,
       kcu.table_name AS child_table,
       kcu.column_name AS child_column
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu
  ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY kcu.table_name;

In the query, we use info from the table_constraints, key_column_usage, and constraint_column_usage views. The constraint_column_usage view shows the columns used by a constraint; we use it to show the FK's parent table and column, i.e., the table and column where the FK is actually a PK. The data from key_column_usage shows the table and column of each FK.

parent_table | parent_column | child_table | child_column
orders | order_id | customers | last_order_id
orders | order_id | customers | first_order_id
channels | id | customers | channel_id
orders | order_id | order_items | order_id
products | product_id | order_items | product_id
customers | customer_id | orders | customer_id
categories | category_id | products | category_id

From the output above, we see that the tables orders and customers are directly connected in both directions: orders.customer_id references the customers primary key, while the customers columns first_order_id and last_order_id reference the orders primary key. The table orders is also connected to the table order_items via order_id. The table channels is directly connected only to the table customers. We've already established that the primary keys of the table order_items are also foreign keys; the output shows they come from the tables orders and products. Finally, the table categories is connected only to the table products.

2. Data Exploration

When you start working with a new database, the first step should be exploring the data to gain some basic info about each table. It's good practice to look at the following:

Counting rows.
Finding minimum and maximum values (especially for dates).
Listing distinct categories.
Counting distinct values in each category.
Counting rows per category value.
Sums and averages of key numerical values.

This is not an exhaustive list, and it can change with the table you're exploring. However, it's a sensible minimum and should work for most tables. In general, this step boils down to fundamental data aggregation and grouping. I'll show you how it works on one table from our database, and then you can explore the other tables the same way.

Exploring the orders Table

Count of Rows

As expected, we use the COUNT() aggregate function here:

SELECT COUNT(*) AS number_of_rows
FROM orders;

I use COUNT() with an asterisk; this counts all the rows, including those containing NULLs.

number_of_rows
11,618

The table orders has 11,618 rows. In other words, there are 11,618 orders.

Minimum and Maximum Values

Let's see which columns are suitable for use with the MIN() and MAX() functions.
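One way to check is to go back to information_schema.columns and list only the date columns of the orders table. A quick sketch (in this dataset, all date columns are of the timestamp without time zone type, as we saw earlier):

SELECT column_name,
       data_type
FROM information_schema.columns
WHERE table_name = 'orders'
  AND data_type = 'timestamp without time zone';

This returns order_date and shipped_date.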
Perfect – we can find the oldest and the most recent dates on which an order was placed and shipped:

SELECT MIN(order_date) AS oldest_order_date,
       MAX(order_date) AS latest_order_date,
       MIN(shipped_date) AS oldest_shipped_date,
       MAX(shipped_date) AS latest_shipped_date
FROM orders;

From the output, we can see that the first order was placed on 2017-01-01 and the last on 2024-06-24. As for the shipping dates, the earliest is 2017-01-06 and the most recent is 2024-06-23.

oldest_order_date | latest_order_date | oldest_shipped_date | latest_shipped_date
2017-01-01 0:00:00 | 2024-06-24 0:00:00 | 2017-01-06 0:00:00 | 2024-06-23 0:00:00

Listing Categories

We don't need any aggregate functions for this:

SELECT DISTINCT ship_country
FROM orders
ORDER BY ship_country;

In the query, we use DISTINCT to show each country only once. Here's the list of countries in this table:

ship_country
Argentina
Austria
Belgium
Brazil
Canada
Denmark
Finland
France
Germany
Ireland
Italy
Mexico
Norway
Poland
Portugal
Spain
Sweden
Switzerland
UK
USA
Venezuela

It will also be useful to see a list of cities:

SELECT DISTINCT ship_city
FROM orders
ORDER BY ship_city;

As there are many cities, here's a partial list:

ship_city
Aachen
Abilene
Achille
Adelphi
Adrian
Akron
Albany
Alberton
Albuquerque
…
Yucca Valley

Counting Distinct Values in Each Category

Let's now see the number of countries to which Northwind has delivered shipments:

SELECT COUNT(DISTINCT ship_country) AS number_of_countries
FROM orders;

We again use COUNT(), but this time we add DISTINCT and reference the column we want to count. The output shows there are 21 different countries to which the company delivers:

number_of_countries
21

We can do the same for cities:

SELECT COUNT(DISTINCT ship_city) AS number_of_cities
FROM orders;

There are 494 unique cities:

number_of_cities
494

Even though we didn't list them as a separate category, knowing how many customers have placed orders will be useful:

SELECT COUNT(DISTINCT customer_id) AS number_of_customers_with_orders
FROM orders;

We use COUNT() to count customer IDs. DISTINCT is necessary because a customer appears once for every order they have placed. There are 761 unique customers who have ordered at least once from the company:

number_of_customers_with_orders
761

Counting Rows per Category Value

For this, we want to list each category value (as we did earlier) and count the number of rows for each value. For example, this returns the number of orders per country:

SELECT ship_country,
       COUNT(*) AS number_of_orders
FROM orders
GROUP BY ship_country
ORDER BY number_of_orders DESC;

I select the ship_country column and use COUNT(*) to count all rows. For the count to be per country, I need the GROUP BY ship_country clause. The result is the number of orders by country, and the output shows the USA is the biggest market, with 9,789 orders:

ship_country | number_of_orders
USA | 9,789
France | 236
Brazil | 233
Germany | 230
UK | 144
Spain | 123
Mexico | 118
Venezuela | 103
Argentina | 87
Canada | 82
Italy | 64
Austria | 58
Portugal | 58
Belgium | 56
Denmark | 56
Finland | 47
Norway | 30
Sweden | 28
Poland | 27
Ireland | 26
Switzerland | 23

Sum and Average Values

If your data is suitable, you should find the averages and totals of the numerical values. We can do that for the column total_amount, which is the value of an order:

SELECT SUM(total_amount) AS orders_total_value,
       AVG(total_amount) AS average_order_value
FROM orders;

To get these two calculations, we use the SUM() and AVG() aggregate functions.
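A small practical note: AVG() on a numeric column can return a long tail of decimals. To get neat output like the table below, you can wrap both calculations in ROUND() – a minor variation of the same query:

SELECT ROUND(SUM(total_amount), 2) AS orders_total_value,
       ROUND(AVG(total_amount), 2) AS average_order_value
FROM orders;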
We can see that the total store revenue is a little over 19 million and the average order value is 1,636.15:

orders_total_value | average_order_value
19,008,819.69 | 1,636.15

Bonus Data Exploration

As I mentioned, the previous aggregations are the minimum you should do for each table in the database. The exploration below goes a bit further (but not too far) than simple aggregation and GROUP BY. While those remain the basis, you can also employ other concepts, such as filtering data (using WHERE and/or HAVING), extracting date parts or the time, using CASE WHEN to label data, and so on. Let's look at several examples.

Number of Products

We can employ the COUNT() function to find the total number of products the store sells:

SELECT COUNT(*) AS number_of_active_products
FROM products
WHERE discontinued IS FALSE;

The condition in the WHERE clause keeps only the products that are not discontinued, i.e., the products the store currently sells. From the earlier section, we know that the column discontinued is of the boolean type, so we use the IS FALSE operator to include only non-discontinued products in the count. The count is 69:

number_of_active_products
69

Revenue by Year

A simple report showing revenue by year can be created using SUM():

SELECT EXTRACT(YEAR FROM order_date) AS revenue_year,
       SUM(total_amount) AS revenue
FROM orders
GROUP BY revenue_year;

We use the EXTRACT() function to get only the year from each order date. Then we sum the total amounts of the orders, grouping by year to show each year's value separately. The results show that the company's peak year was 2018. The year with the lowest revenue is 2024, but this might simply be because the year wasn't over at the time of the analysis. Another interesting detail is that there's no revenue data at all for the years 2019–2022. This should be checked to find out whether the data is missing for a reason or by mistake.

revenue_year | revenue
2017 | 3,088,759.84
2018 | 9,368,330.91
2023 | 4,646,048.11
2024 | 1,905,680.83

Customer Segmentation by Country

We have data about the customers' countries, so an overview of the number of customers in each country would be informative. We're not interested in customers who have never placed an order, as they would artificially inflate the numbers; we want only customers who actually order from us. Here's the query:

SELECT country,
       COUNT(*) AS number_of_customers
FROM customers
WHERE first_order_id IS NOT NULL
GROUP BY country
ORDER BY number_of_customers DESC;

We select and group by country and use the COUNT() aggregate function to find the number of customers. If the column first_order_id is not NULL, the customer has placed at least one order; this is the condition we use in WHERE. The data is ordered from the highest to the lowest number of customers.

The output shows that Northwind's largest market in terms of customers is the USA:

country | number_of_customers
USA | 697
Germany | 8
France | 8
Brazil | 8
UK | 5
Venezuela | 4
Spain | 4
Mexico | 4
Argentina | 3
Canada | 3
Belgium | 2
Denmark | 2
Portugal | 2
Finland | 2
Italy | 2
Austria | 2
Sweden | 1
Poland | 1
Ireland | 1
Switzerland | 1
Norway | 1

Given the lead in both customers and orders, the USA is very likely also the biggest market by revenue – but rather than assuming it, we can verify it in seconds.
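A minimal verification sketch, reusing the orders table (note that this groups by the shipping country, which may occasionally differ from the customer's own country):

SELECT ship_country,
       SUM(total_amount) AS revenue
FROM orders
GROUP BY ship_country
ORDER BY revenue DESC
LIMIT 5;

If the USA tops this list by a wide margin, as the order counts strongly suggest, the conclusion holds.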
3. Advanced Data Analysis and Actionable Insights

What we've done so far is a good start. However, the data analysis in a portfolio SQL project shouldn't stop at simple aggregation and exploration of each table. We'll now go beyond that and write more complex queries that let us take action and improve Northwind's business. For example, we may want to see how sales change depending on the channel. Let's see what we get and then decide on the next steps.

Sales by Channel

For each channel, we want to show the total sales, average order value, total number of orders, and number of unique customers. We also want to rank the channels by total sales. Here's the query:

SELECT ch.channel_name,
       SUM(o.total_amount) AS total_sales,
       AVG(o.total_amount) AS average_order_value,
       COUNT(o.order_id) AS total_orders,
       COUNT(DISTINCT o.customer_id) AS unique_customers,
       RANK() OVER (ORDER BY SUM(o.total_amount) DESC) AS sales_rank
FROM channels ch
LEFT JOIN customers c ON ch.id = c.channel_id
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY ch.channel_name
ORDER BY sales_rank;

This query uses several aggregations: SUM() to calculate total sales, AVG() for the average order value, and COUNT() for the total number of orders and (with DISTINCT) for unique customers. Then we use the RANK() window function to rank the channels by total sales. Window functions are evaluated after aggregation, which is why an aggregate like SUM() may appear inside OVER (). However, the column alias total_sales isn't visible within the same SELECT list, so we can't write RANK() OVER (ORDER BY total_sales DESC); instead, we repeat the whole SUM() expression – a simple copy-and-paste job.

We use data from the tables channels and orders. However, these two tables can't be joined directly, as they share no column; we have to join them via the table customers. Here's the output:

channel_name | total_sales | average_order_value | total_orders | unique_customers | sales_rank
Organic Search | 14,003,046.95 | 1,603.10 | 8,735 | 565 | 1
Direct | 2,436,649.06 | 1,799.59 | 1,354 | 91 | 2
Referral | 1,034,734.45 | 1,645.05 | 629 | 46 | 3
Social | 837,378.77 | 1,824.35 | 459 | 32 | 4
Paid Search | 483,824.24 | 1,645.66 | 294 | 19 | 5
Email | 213,186.22 | 1,450.25 | 147 | 8 | 6

We see that most of Northwind's sales come through organic search. The data is internally consistent: the ranking by total sales also reflects the numbers of orders and unique customers. It's interesting that Organic Search doesn't have the highest average order value. This calls for some improvement!

Let's envisage a revenue-increase strategy that focuses on raising the average order value in every channel rather than on acquiring new customers. We probably won't be able to take a uniform approach based only on the marketing channel: within each segment, there might be customers with polar-opposite spending habits. We can assume this especially applies to Organic Search, which is by far the largest channel. So, we need to learn more about the order value distribution within each channel.

Order Value Distribution

Let's calculate the average order value, the median, the upper and lower quartiles, and the standard deviation for each channel:

SELECT ch.channel_name,
       AVG(total_amount) AS average_order_value,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) AS median_order_value,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_amount) AS upper_quartile_order_value,
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_amount) AS lower_quartile_order_value,
       STDDEV(total_amount) AS order_value_stddev
FROM channels ch
LEFT JOIN customers c ON ch.id = c.channel_id
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY ch.channel_name;

The average order value is calculated by AVG(). The median (i.e., the 50th percentile), the upper quartile, and the lower quartile are calculated using the PERCENTILE_CONT() ordered-set aggregate function, with the desired percentile specified in parentheses. For the percentile calculation, the data has to be sorted in ascending order; we do that in the WITHIN GROUP clause. Finally, STDDEV() calculates the standard deviation.
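One nuance worth knowing: PERCENTILE_CONT() interpolates between values, so the number it returns may not match any actual order. If you'd rather get a value that really occurs in the column, PERCENTILE_DISC() is the discrete counterpart – for example:

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY total_amount) AS median_actual_order_value
FROM orders;

Either function is fine for our segmentation; just use one consistently.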
Here's the output:

channel_name | average_order_value | median_order_value | upper_quartile_order_value | lower_quartile_order_value | order_value_stddev
Direct | 1,799.59 | 1,005.90 | 2,166.80 | 517.40 | 2,323.63
Email | 1,450.25 | 960.00 | 1,970.41 | 492.50 | 1,655.97
Organic Search | 1,603.10 | 1,007.70 | 2,018.20 | 480.00 | 1,916.39
Paid Search | 1,645.66 | 1,083.00 | 2,104.40 | 486.55 | 1,813.22
Referral | 1,645.05 | 1,064.00 | 2,034.50 | 482.90 | 1,969.01
Social | 1,824.35 | 1,122.80 | 2,123.20 | 559.90 | 2,319.10

Let's analyze the metrics for the first channel. The Direct channel's standard deviation is 2,323.63, which is higher than the mean. This indicates high variability: there are probably outliers or a wide range of order values. The lower and upper quartiles show that the middle 50% of orders fall between 517.40 and 2,166.80; the high standard deviation, however, means many orders lie outside this range. The median is significantly lower than the mean, which indicates a right-skewed distribution, i.e., a small number of high-value orders pulls the mean up. We can analyze each channel the same way.

This gives us another idea. To create customized revenue-increase strategies, we can further segment the customers by the upper and lower quartiles. We'll consider all the customers above the upper quartile high spenders; the low spenders will be the customers below the lower quartile. That way, we can adapt our strategy to each channel and each group's spending habits.

High- and Low-Value Customers by Channel

I'll do this calculation for only one marketing channel. You can repeat it for all the others: the code stays the same; you only change the channel name in WHERE. (A sketch after the high-value list below shows how to compute every channel's cutoffs in one query.) The customers in the upper quartile are high-value customers, and we want them on a separate list:

WITH customer_order_totals AS (
  SELECT c.customer_id,
         c.full_name,
         SUM(o.total_amount) AS total_order_value,
         ch.channel_name
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  JOIN channels ch ON c.channel_id = ch.id
  WHERE ch.channel_name = 'Direct'
  GROUP BY c.customer_id, c.full_name, ch.channel_name
),

upper_quartile_value AS (
  SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_order_value) AS upper_quartile
  FROM customer_order_totals
)

SELECT cot.customer_id,
       cot.full_name,
       cot.total_order_value,
       cot.channel_name
FROM customer_order_totals cot
JOIN upper_quartile_value uqv
  ON cot.total_order_value >= uqv.upper_quartile
ORDER BY cot.total_order_value DESC;

The first CTE calculates each customer's total order value using SUM() and a join of the tables customers and orders. In WHERE, we filter out all the channels except Direct. The second CTE calculates the upper quartile in the now-familiar manner. The final SELECT joins the two CTEs; since upper_quartile_value returns a single row, the join condition simply keeps the customers whose total order value is at or above the upper quartile. Here are the first ten rows of the output:

customer_id | full_name | total_order_value | channel_name
134 | Barry Michael | 79,371.50 | Direct
152 | Carolann Williams | 64,365.21 | Direct
7 | Frédérique Citeaux | 61,865.74 | Direct
17 | Sven Ottlieb | 57,251.14 | Direct
64 | Sergio Gutiérrez | 55,140.75 | Direct
490 | Alice Blevins | 54,736.24 | Direct
8 | Martín Sommer | 54,499.55 | Direct
303 | Gregory Mack | 52,554.20 | Direct
316 | Jeff Heard | 51,976.31 | Direct
129 | Stephan Bufford | 50,868.70 | Direct

In total, there are 23 customers in the output. We can use this information to target these high-value customers. For example, we can create loyalty programs in which they get special discounts, personalized service, redeemable points for purchases, VIP membership, and so on.
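Here's the sketch promised above. Instead of re-running the analysis once per channel, you can compute both quartile cutoffs for every channel in a single query by grouping the percentile calculation by channel. The CTE mirrors the one we just used, minus the channel filter:

WITH customer_order_totals AS (
  SELECT c.customer_id,
         ch.channel_name,
         SUM(o.total_amount) AS total_order_value
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  JOIN channels ch ON c.channel_id = ch.id
  GROUP BY c.customer_id, ch.channel_name
)

SELECT channel_name,
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_order_value) AS lower_quartile,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_order_value) AS upper_quartile
FROM customer_order_totals
GROUP BY channel_name;

You could then join these per-channel cutoffs back to the customer totals to label high and low spenders across all channels at once.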
Similarly, we can list the customers whose total order value is below the lower quartile:

WITH customer_order_totals AS (
  SELECT c.customer_id,
         c.full_name,
         SUM(o.total_amount) AS total_order_value,
         ch.channel_name
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  JOIN channels ch ON c.channel_id = ch.id
  WHERE ch.channel_name = 'Direct'
  GROUP BY c.customer_id, c.full_name, ch.channel_name
),

lower_quartile_value AS (
  SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_order_value) AS lower_quartile
  FROM customer_order_totals
)

SELECT cot.customer_id,
       cot.full_name,
       cot.total_order_value,
       cot.channel_name
FROM customer_order_totals cot, lower_quartile_value lqv
WHERE cot.total_order_value <= lqv.lower_quartile
ORDER BY cot.total_order_value ASC;

This query is almost the same as the previous one, except that this time we calculate the lower quartile and look for the customers whose totals fall below it. (One small stylistic difference: the comma-separated FROM is an implicit CROSS JOIN, which is harmless here because the second CTE returns a single row.) This time, the query returns 191 customers. Here are the first ten rows:

customer_id | full_name | total_order_value | channel_name
939 | Shannon Aguilar | 98.40 | Direct
997 | Barbra Armstrong | 251.50 | Direct
687 | Andrew Scott | 452.90 | Direct
787 | Dennis Myer | 912.00 | Direct
917 | Les Allen | 991.40 | Direct
921 | Shelby Turgeon | 1,162.25 | Direct
560 | Nancy Wiggins | 1,425.80 | Direct
678 | Tracey Thomas | 2,555.20 | Direct
756 | Dora Rowlands | 2,713.50 | Direct
715 | George Scott | 2,906.50 | Direct

These customers need to be approached differently: they have to be motivated to spend more. Strategies to increase revenue from them may include personalized product bundles, free shipping above a certain amount, discounts that grow with the order size, or a free gift above a certain order amount.

Now do the same analysis for all the other channels and think about what tactics you could use to increase the order values in each of them.

Ready for Your Own SQL Portfolio Project?

In this article, we showed how to prepare an SQL portfolio project using the Northwind store database. Of course, if you feel creative enough, you can also create your own dataset.

Completing an SQL project for your portfolio is an important part of preparing for the hiring process. Portfolios are a great showcase of your practical SQL data analysis skills. The above project is just an example; there are many more analysis ideas you can try out in our SQL Databases for Practice course, which we highly recommend!

Tags: sql practice, sql project