Wednesday, 28 September 2022

Joining Tables In SQL

In a typical relational database, data is often distributed in different tables. This is by design; normalization is utilized for scalability and data integrity. However, this leads to a common challenge. What if the data required is found in multiple different tables?


Joining tables!

That is where Joins come in. There are several types of Joins, and in order to make this as system-agnostic as possible, I will be covering Joins that are common to most database systems. For examples, we will be using the following tables below.

tblPlayers

Id Name Club RoleId
1 Alisson Becker Liverpool 1
2 Roberto Firmino Liverpool 6
3 Erling Haaland Manchester City 6
4 Bruno Fernandes Manchester United 0
5 Harry Kane Tottenham 6
6 Richarlisson Tottenham 6
7 Andy Robertson Liverpool 3
8 Édouard Mendy Chelsea 1
9 Adam Smith Bournemouth 4
10 Nathan Redmond Southampton 5
11 James Tarkowski Everton 2
12 David De Gea Manchester United 1
13 Anthony Martial Manchester United 0
14 John Stones Manchester City 3
15 Adam Lallana Brighton 0
16 Ben White Arsenal 2
17 Marquinhos Arsenal 0
18 James Maddison Leicester City 0
19 Jonny Evans Leicester City 0
20 Jamie Vardy Leicester City 0


tblRoles

Id Name Type
1 Goalkeeper DEF
2 Center-back DEF
3 Full-back DEF
4 Defensive Midfielder MID
5 Attacking Midfielder MID
6 Forward ATK
7 Winger MID


Since I'm a soccer nut, there's a table of football players, tblPlayers, and a table of roles, tblRoles. I know that modern football has evolved beyond static roles, but bear with me; this is only an example. In tblPlayers, the RoleId column is a foreign key to tblRoles, corresponding to that table's Id column.

Not all the rows in tblPlayers has a corresponding entry in tblRoles. Some of the entries in the RoleId column is a 0, which corresponds to nothing in tblRoles. While I do know the information, I have deliberately done this to illustrate excluded rows in different Joins.

Inner Joins

When one table is joined to another via an Inner Join, it means that all rows from the first table where the foreign key matches with an entry in the second table, are returned.

A sample INNER JOIN query
SELECT p.Id as Id, p.Name, r.Name as Role
FROM tblPlayers p  as Name
INNER JOIN tblRoles r
ON r.Id = p.RoleId

13 results
Id Name Role
1 Alisson Becker Goalkeeper
2 Roberto Firmino Forward
3 Erling Haaland Forward
5 Harry Kane Forward
6 Richarlisson Forward
7 Andy Robertson Full-back
8 Édouard Mendy Goalkeeper
9 Adam Smith Defensive Midfielder
10 Nathan Redmond Attacking Midfielder
11 James Tarkowski Center-back
12 David De Gea Goalkeeper
14 John Stones Full-back
16 Ben White Center-back


There are only 13 out of 20 players in the result set. You can see that players such as James Maddison and Jamie Vardy do not make the list. This is because their RoleId is 0, which does not have a corresponding value in the Id column of tbLRoles.

Venn Diagram of INNER JOIN



Outer Joins

There are two kinds of Outer Joins - the Left Outer Join and the Right Outer Join. Normally, this is shortened to just "Left Join" or "Right Join" respectively. This means that all the rows in one table are returned, with null values from the other table where no data is available.

A sample LEFT JOIN query
SELECT p.Id as Id, p.Name, r.Name as Role
FROM tblPlayers p  as Name
LEFT JOIN tblRoles r
ON r.Id = p.RoleId

20 results
Id Name Role
1 Alisson Becker Goalkeeper
2 Roberto Firmino Forward
3 Erling Haaland Forward
5 Harry Kane Forward
6 Richarlisson Forward
7 Andy Robertson Full-back
8 Édouard Mendy Goalkeeper
9 Adam Smith Defensive Midfielder
10 Nathan Redmond Attacking Midfielder
11 James Tarkowski Center-back
12 David De Gea Goalkeeper
13 Anthony Martial NULL
14 John Stones Full-back
15 Adam Lallana NULL
16 Ben White Center-back
17 Marquinhos NULL
18 James Maddison NULL
19 Jonny Evans NULL
20 Jamie Vardy NULL


In a Left Join, the first table takes precedence. Therefore, in the resultant table, all records in tblPlayers are returned - a full 20 out of 20. However, those who have a RoleId of 0 will have a null value in the Role column because there is no such value in the tblRoles table.

Venn Diagram of LEFT JOIN



A sample RIGHT JOIN query
SELECT p.Id as Id, p.Name, r.Name as Role
FROM tblPlayers p  as Name
RIGHT JOIN tblRoles r
ON r.Id = p.RoleId

13 results
Id Name Role
1 Alisson Becker Goalkeeper
2 Roberto Firmino Forward
3 Erling Haaland Forward
5 Harry Kane Forward
6 Richarlisson Forward
7 Andy Robertson Full-back
8 Édouard Mendy Goalkeeper
9 Adam Smith Defensive Midfielder
10 Nathan Redmond Attacking Midfielder
11 James Tarkowski Center-back
12 David De Gea Goalkeeper
14 John Stones Full-back
16 Ben White Center-back
NULL NULL Winger


In a Right Join, the second table takes precedence. Therefore, in the resultant table, all records in tblRoles are present. There may be repeats where RoleId is referenced more than once. You will see that for the role "Winger", there is an entry; however, since there are no entries in tbPLayers that ave this role, the value for the Name column is null.

Venn Diagram of RIGHT JOIN



Last words on Joins!

Joins are incredibly useful when data needs to be assembled. And often, this is necessary to create a coherent and comprehensive data set.

What Joins to use, however, are the usual challenge. It is up to the individual programmer or data analyst to decide what is best for the current context.

Thanks for JOINing,
T___T

No comments:

Post a Comment