In this blog post, we make it through the 2 means of combining data: row-wise (UNION, INTERSECT, etc.) and column-wise (JOIN, etc.)
Table of content
Row-wise combination
UNION [ALL]
INTERSECT [ALL]
EXCEPT [ALL]
Column-wise combination
[INNER] JOIN
LEFT JOIN
RIGHT JOIN
FULL [OUTER] JOIN
NATURAL [INNER, LEFT, RIGHT] JOIN
CROSS JOIN
self-JOIN
For illustrative purposes, we will be using the Pagila database, which is a port of the Sakila database (for MySQL) to Postgresql – the SQL version of this blog post. Pagila can be downloaded from here.
Row-wise combination
Row-wise combination means we combine by appending rows of 2 or more query results together.
For example, we have the first query returns a data with rows, columns and the second query returns a data with rows, columns. Then, after a row-wise combination, the combined result may have () rows and columns.
Prerequisites for a row-wise combination of 2 queries are:
UNION [ALL]
The UNION
operator combines data row-wise by appending all the rows together, however, for duplicated rows, only one of them is kept. The UNION ALL
also appends all the rows but does not eliminate duplicated rows.
Example 1:
Get all actors whose first name is CHRISTIAN or whose last name is AKROYD. Duplicate rows should be eliminated.
( SELECT actor_id, first_name, last_name FROM actor WHERE first_name = 'CHRISTIAN' ) UNION ( SELECT actor_id, first_name, last_name FROM actor WHERE last_name = 'AKROYD' );
actor_id | first_name | last_name |
---|---|---|
92 | KIRSTEN | AKROYD |
58 | CHRISTIAN | AKROYD |
182 | DEBBIE | AKROYD |
61 | CHRISTIAN | NEESON |
10 | CHRISTIAN | GABLE |
Example 2:
Get all actors whose first name is CHRISTIAN or whose last name is AKROYD. Duplicate rows are kept.
( SELECT actor_id, first_name, last_name FROM actor WHERE first_name = 'CHRISTIAN' ) UNION ALL ( SELECT actor_id, first_name, last_name FROM actor WHERE last_name = 'AKROYD' );
actor_id | first_name | last_name |
---|---|---|
10 | CHRISTIAN | GABLE |
58 | CHRISTIAN | AKROYD |
61 | CHRISTIAN | NEESON |
58 | CHRISTIAN | AKROYD |
92 | KIRSTEN | AKROYD |
182 | DEBBIE | AKROYD |
INTERSECT [ALL]
The INTERSECT
operator returns rows that appear in both inputs.
Giving an ALL
after INTERSECT
makes it return duplicated rows. Note that for duplicated rows, the number of duplicates appears in the result equals the minimum number of duplicates in the 2 inputs. (Look at the figure below, pay attention to the number of red rows.)
Example 1:
get the customer-ids that satisfy all the below conditions:
- customer-id < 200
- these customers have a rental at 2017-02-14 15:16:03
- these customers have ever made a payment of more than 10$
- order by the customer-ids
Duplicated rows must be eliminated.
( SELECT customer_id FROM rental WHERE rental_date='2017-02-14 15:16:03' AND customer_id < 200 ORDER BY customer_id ) INTERSECT ( SELECT customer_id FROM payment WHERE amount > 10 AND customer_id < 200 ) ORDER BY customer_id;
customer_id |
---|
21 |
29 |
33 |
87 |
114 |
163 |
168 |
178 |
Example 2:
The conditions are the same as in example 1, however, duplicated rows are kept.
( SELECT customer_id FROM rental WHERE rental_date='2017-02-14 15:16:03' AND customer_id < 200 ORDER BY customer_id ) INTERSECT ALL ( SELECT customer_id FROM payment WHERE amount > 10 AND customer_id < 200 ) ORDER BY customer_id;
customer_id |
---|
21 |
29 |
33 |
87 |
114 |
163 |
163 |
168 |
178 |
EXCEPT [ALL]
A EXCEPT B
returns the elements appear in A but not appear in B. The process is:
- Firstly, it finds all the common rows appear in both A and B. For duplicated rows, the minimum number of times appear in the 2 inputs is recorded. In other words, find the intersection of A and B.
- Secondly, it subtracts these rows from A and takes the remaining.
- With
ALL
, it returns the result. WithoutALL
, it eliminates duplicated rows and returns.
Example 1:
get the customer-ids that satisfy:
- customer-id < 20
- these customers have a rental at 2017-02-14 15:16:03
- have never had a payment of more than 10$
- order by the customer-ids
Duplicated rows must be eliminated.
( SELECT customer_id FROM rental WHERE rental_date='2017-02-14 15:16:03' AND customer_id < 20 ORDER BY customer_id ) EXCEPT ( SELECT customer_id FROM payment WHERE amount > 10 ) ORDER BY customer_id;
customer_id |
---|
5 |
9 |
11 |
14 |
15 |
Example 2:
The conditions are the same as in example 1, however, duplicated rows are kept.
( SELECT customer_id FROM rental WHERE rental_date='2017-02-14 15:16:03' AND customer_id < 20 ORDER BY customer_id ) EXCEPT ALL ( SELECT customer_id FROM payment WHERE amount > 10 ) ORDER BY customer_id;
customer_id |
---|
5 |
9 |
11 |
14 |
15 |
15 |
Column-wise combination
Column-wise combining means we append columns of data together. We also refer to column-wise combination as a JOIN.
For example, we have the first query returns a data with rows, columns and the second query returns a data with rows, columns. Then after joining, the result may have () columns, while the number of rows depends on the type of JOIN we use.
Let’s say we join 2 tables. If we do not state which columns are used to match each row of a table to one or more rows of the other table, every row from one table will be matched with every row from the other. That is, if the first table has rows and the second table has rows, the result will have () rows.
Normally, we do state the matching condition, this can be done using the clause ON
.
[INNER] JOIN
As the default type of join operation is the inner join, we can use JOIN
and INNER JOIN
interchangeably.
INNER JOIN means the result will consist of rows that appear in both input tables. This is opposite to outer join, in which rows that appear in only 1 of the 2 input tables are also valid. Look at the examples and the subsequent JOIN operations will help you be clear about these differences.
Example 1:
Show id, first name, last name and postal code of all customers whose first name is JAMIE.
SELECT customer_id AS id , first_name, last_name, postal_code FROM customer JOIN address ON customer.address_id=address.address_id WHERE first_name='JAMIE';
id | first_name | last_name | postal_code |
---|---|---|---|
146 | JAMIE | RICE | 90732 |
531 | JAMIE | WAUGH | 88408 |
LEFT JOIN
LEFT JOIN
is a type of outer join, which means not only the rows that appear in both inputs are included in the result. Instead, the rows that appear in the left table but not appear in the right table are also accepted. In this case, the missing values are set to NULL
.
Example 1:
Show the language id, name of the language and the titles of films that use each language. Remember to list all the languages, even those that have no corresponding films. Order the result by language id, decreasingly. Show the first 10 rows.
SELECT language.language_id, name, film.title FROM language LEFT JOIN film ON language.language_id=film.language_id ORDER BY language.language_id DESC LIMIT 10;
language_id | name | title |
---|---|---|
6 | German | |
5 | French | |
4 | Mandarin | |
3 | Japanese | |
2 | Italian | |
1 | English | ACADEMY DINOSAUR |
1 | English | ACE GOLDFINGER |
1 | English | ADAPTATION HOLES |
1 | English | AFFAIR PREJUDICE |
1 | English | AFRICAN EGG |
(The blank values in column TITLE mean NULLs)
RIGHT JOIN
Similar to LEFT JOIN
, RIGHT JOIN
is also a type of outer join, in which rows that appear in the right input but not in the left input are also included in the result.
Example 1:
Show the language id, name of the language and the titles of films that use each language. Remember to list all the languages, even those that have no corresponding films. Order the result by language id, decreasingly. Show the first 10 rows. (The same as example 1 for LEFT JOIN
above.)
SELECT language.language_id, name, film.title FROM film RIGHT JOIN language ON language.language_id=film.language_id ORDER BY language.language_id DESC LIMIT 10;
language_id | name | title |
---|---|---|
6 | German | |
5 | French | |
4 | Mandarin | |
3 | Japanese | |
2 | Italian | |
1 | English | ACADEMY DINOSAUR |
1 | English | ACE GOLDFINGER |
1 | English | ADAPTATION HOLES |
1 | English | AFFAIR PREJUDICE |
1 | English | AFRICAN EGG |
(The blank values in column TITLE mean NULLs)
FULL [OUTER] JOIN
FULL JOIN and FULL OUTER JOIN are interchangeable, they both include in the result all rows that appear in at least one of the 2 input tables.
Example 1:
We are curious about if there is any email belonging to both a customer and a staff. Show customer id and staff id for each email in the database.
SELECT customer_id, staff_id FROM customer FULL JOIN staff ON customer.email=staff.email;
customer_id | staff_id |
---|---|
1 | |
2 | |
502 | |
70 | |
35 | |
316 | |
445 | |
224 | |
377 | |
141 |
(The blank values mean NULLs. There are a total of 601 rows returned.)
NATURAL [INNER, LEFT, RIGHT] JOIN
NATURAL JOIN
relieves us from having to tell exactly which columns should be matched when joining. Instead, it implicitly takes all the common column names as matching criteria.
We can also specify which type of join (inner or left or right) to be executed. By default, NATURAL JOIN
assumes an inner join.
Example 1:
Get all information from a natural (inner) join of the customer and rental tables.
These 2 tables have a common column: customer_id. Normally, this query will work:
SELECT * FROM customer NATURAL JOIN rental;
customer_id | last_update | store_id | … |
---|
However, the result shows no rows at all. This is because, in this Pagila database, every table has a column named last_update. Thus, when NATURAL JOIN joins these 2 tables, it requires both customer_id and last_update be equal for 2 rows to be matched, and of course, no pair of rows satisfies this criterion.
Lesson learned: using NATURAL JOIN
requires carefulness.
To solve this issue, we should exclude last_update from the 2 tables, so that rows will be matched using only the customer_id.
SELECT * FROM ( ( SELECT customer_id, first_name, last_name , email, address_id, activebool , create_date, active FROM customer ) a NATURAL JOIN ( SELECT rental_id, rental_date, inventory_id, customer_id , return_date, staff_id FROM rental ) b );
customer_id | first_name | last_name | … |
---|---|---|---|
459 | TOMMY | COLLAZO | … |
408 | MANUEL | MURRELL | … |
333 | ANDREW | PURDY | … |
222 | DELORES | HANSEN | … |
549 | NELSON | CHRISTENSON | … |
269 | CASSANDRA | WALTERS | … |
239 | MINNIE | ROMERO | … |
126 | ELLEN | SIMPSON | … |
399 | DANNY | ISOM | … |
142 | APRIL | BURNS | … |
CROSS JOIN
CROSS JOIN
matches every row from one table to each row from the other, thus, if at the beginning, the tables have and rows, the result will contain () rows in total.
Note that we do not specify any matching criteria for CROSS JOIN
.
Example 1:
Show each possible pair of manager id and staff id.
SELECT manager_staff_id, staff_id FROM staff CROSS JOIN store;
manager_staff_id | staff_id |
---|---|
1 | 1 |
1 | 2 |
2 | 1 |
2 | 2 |
Example 2:
The requirement is the same as in Example 1, however, we will call the cross join in a different way.
SELECT manager_staff_id, staff_id FROM staff, store;
manager_staff_id | staff_id |
---|---|
1 | 1 |
1 | 2 |
2 | 1 |
2 | 2 |
self-JOIN
Self-join is nothing more than joining a table with itself. The little difference between self-join and joining 2 different tables is that in a self-join, we have to make an alias for each instance of the joining table.
A self-join may be either [INNER] JOIN
, LEFT JOIN
, RIGHT JOIN
, FULL JOIN
or even CROSS JOIN
.
Example 1:
Show all pairs of customers who have the same email address.
SELECT * FROM customer c1 JOIN customer c2 ON c1.email=c2.email AND c1.customer_id <> c2.customer_id;
customer_id | store_id | first_name | … |
(The result indicates that there is no pair of customers who share the same email.)
References:
- Postgresqltutorial’s UNION, INTERSECT, EXCEPT, INNER JOIN, LEFT JOIN, FULL JOIN, NATURAL JOIN, self-JOIN
- Sqlzoo’s JOIN tutorial: link