Skip to content

Latest commit

Β 

History

History
143 lines (100 loc) Β· 5.57 KB

File metadata and controls

143 lines (100 loc) Β· 5.57 KB

πŸ”— Topic 10.1: Why We Need Joins

One of the most frequent and important operations in SQL. Every time a website loads a user's profile with their order history, every time a dashboard groups products by category β€” it's using a JOIN. Without joins, relational databases would be useless.


1. Definition

JOIN: A SQL operation that combines rows from two or more tables based on a related column β€” usually a Primary Key from one table matching a Foreign Key from another.


2. The Problem Joins Solve

Why not just store everything in one big table? Let's see what happens:

The "Everything in One Table" Disaster:

Order_ID Customer_Name Customer_Email Customer_Phone Product_Name Product_Price
1001 Rohan Sharma rohan@email.com 9876543210 iPhone 15 79999
1002 Rohan Sharma rohan@email.com 9876543210 AirPods 14999
1003 Priya Singh priya@email.com 8765432109 iPhone 15 79999

Problems:

  • Redundancy: Rohan's email is stored twice. If he changes it, we must update every order row.
  • Update Anomaly: Change "iPhone 15" price in 1 row β€” other rows still show the old price.
  • Deletion Anomaly: Delete Order 1003 β†’ Priya's contact info is gone forever.

The Normalized (Correct) Solution: Store customers in their own table, products in their own table, and orders in their own table. Then use JOINs to put the picture back together when needed.


3. How Joins Work (Primary Key + Foreign Key)

The join "link" is always based on the PK-FK relationship:

Customers table              Orders table
Customer_ID (PK) ←────────  Customer_ID (FK)
Customer_Name                Order_ID
Email                        Total_Amount
-- This JOIN reconstructs the relationship at read time:
SELECT c.Customer_Name, o.Total_Amount
FROM Customers c
INNER JOIN Orders o ON c.Customer_ID = o.Customer_ID;

4. How It Works Internally (Nested Loop Join β€” PRO LEVEL)

MySQL's most common join algorithm is the Nested Loop Join (NLJ):

  1. MySQL picks the "Outer Table" (usually the smaller or first table).
  2. For each row in the outer table, it does a lookup in the inner table using the join key.
  3. If the inner table's join column is indexed β†’ B-Tree O(log N) lookup. ⚑ Fast!
  4. If it is NOT indexed β†’ full table scan for EVERY row. 🐌 Catastrophic!

Critical Rule: Always index your Foreign Keys. A missing index on a join column in a million-row table can turn a 20ms query into a 30-second query.

-- After creating the join, always check performance!
EXPLAIN SELECT c.Name, o.Total FROM Customers c JOIN Orders o ON c.ID = o.Customer_ID;
-- Look for 'type: ALL' β†’ means no index is being used!

5. Types of Joins (Overview)

Join Type What it returns
INNER JOIN Only rows that have a match in BOTH tables
LEFT JOIN All rows from the left table + matches from right (NULL if no match)
RIGHT JOIN All rows from the right table + matches from left
FULL OUTER JOIN All rows from both tables (simulated in MySQL with UNION)
CROSS JOIN Every possible combination of rows (Cartesian Product)
SELF JOIN A table joined to itself

6. The Basic JOIN Syntax Template

SELECT t1.column1, t2.column2
FROM table1 AS t1
[JOIN TYPE] table2 AS t2 ON t1.shared_column = t2.shared_column
WHERE optional_filter
ORDER BY optional_sort;

Always use short table aliases (c for Customers, o for Orders). Without aliases, column names become ambiguous and queries become unreadable.


7. Real-Life Examples

Example 1 β€” E-commerce Invoice:

SELECT c.Customer_Name, o.Order_ID, o.Total_Amount, o.Order_Date
FROM Customers c
INNER JOIN Orders o ON c.Customer_ID = o.Customer_ID
WHERE o.Status = 'Delivered'
ORDER BY o.Order_Date DESC;

Example 2 β€” Hospital: Patient + Doctor + Ward:

SELECT p.Patient_Name, d.Doctor_Name, w.Ward_Name
FROM Patients p
INNER JOIN Doctors d ON p.Doctor_ID = d.Doctor_ID
INNER JOIN Wards w ON p.Ward_ID = w.Ward_ID;

8. Common Mistakes

  • Forgetting the ON clause (Accidental CROSS JOIN): If you write SELECT * FROM A, B or JOIN without ON, MySQL joins every row of A with every row of B. 1,000 rows Γ— 1,000 rows = 1,000,000 rows of garbage. Always write the ON condition!
  • Not using aliases: SELECT Employees.Name, Departments.Name is verbose and error-prone on large queries. Use e.Name, d.Name with aliases.
  • Joining on un-indexed Foreign Keys: The most common cause of slow queries in production. Run SHOW INDEX FROM table_name to verify your FK columns are indexed.

9. Tips & Best Practices

  • Always index Foreign Key columns. Run CREATE INDEX idx_orders_customer ON Orders(Customer_ID) after creating the FK.
  • Use EXPLAIN before any complex join goes to production to verify the optimizer is using indexes.
  • Prefer explicit JOIN over comma syntax. Use A JOIN B ON A.id = B.id, NOT FROM A, B WHERE A.id = B.id. The explicit syntax is clearer and safer.

10. Mini Practice Tasks

  • Task 1: A Students table and a Courses table are linked by Student_ID. Write the JOIN query to display each student's name alongside the course they're enrolled in.
  • Task 2: Why is missing an index on a Foreign Key column so dangerous for JOIN performance? What happens internally when the index is missing?
  • Task 3: You have an Orders table and a Customers table. Write an EXPLAIN query to check if the join is using an index.