What Is It?
What Are Set Operations?
Set operations combine the results of two or more SELECT queries into a single result set. Instead of joining tables side-by-side (like JOINs do), they stack results on top of each other. The mathematical analogy is the set operations from school: union (all members), intersection (common members), difference (members of A not in B).
SQL defines four set operations: UNION, UNION ALL, INTERSECT, and EXCEPT (called MINUS in Oracle). MySQL supports UNION and UNION ALL natively. As of MySQL 8.0.31 it supports INTERSECT and EXCEPT too, but many production MySQL versions (and MariaDB) do not, so we cover the workarounds carefully.
Sample Tables Used Throughout This Chapter
-- Customers who signed up in India
CREATE TABLE customers_in (id INT, name VARCHAR(50), email VARCHAR(100));
INSERT INTO customers_in VALUES
(1, 'Aarav Kumar', 'aarav@mail.com'),
(2, 'Priya Sharma', 'priya@mail.com'),
(3, 'Rohan Mehta', 'rohan@mail.com'),
(4, 'Meera Iyer', 'meera@mail.com');
-- Customers who signed up in UK
CREATE TABLE customers_uk (id INT, name VARCHAR(50), email VARCHAR(100));
INSERT INTO customers_uk VALUES
(101, 'Priya Sharma', 'priya@mail.com'), -- same person, dual account!
(102, 'Ishaan Gupta', 'ishaan@mail.co.uk'),
(103, 'Ananya Reddy', 'ananya@mail.co.uk');
-- For reconciliation examples
CREATE TABLE users_active (user_id INT, email VARCHAR(100));
INSERT INTO users_active VALUES (1,'a@x.com'),(2,'b@x.com'),(3,'c@x.com');
CREATE TABLE users_purchased (user_id INT, email VARCHAR(100));
INSERT INTO users_purchased VALUES (2,'b@x.com'),(3,'c@x.com'),(4,'d@x.com');
Why Does It Matter?
Why Set Operations Matter
1. Combining Similar Data From Multiple Sources
Many real systems keep partitioned tables: orders_2024, orders_2025, orders_2026 or sales_emea, sales_apac, sales_amer. When you need a unified view across all of them, UNION ALL stacks the rows into one result — no changes to underlying schema required.
2. Data Reconciliation and Diffing
'Which users signed up but have not yet purchased?' 'Which products exist in the warehouse but not in the current catalog?' These are set-difference problems. EXCEPT (or LEFT JOIN + IS NULL) answers them.
3. UNION vs UNION ALL Is a Performance Decision
UNION removes duplicates by performing an implicit sort or hash. UNION ALL just concatenates. On large datasets, UNION can be dramatically slower. Knowing when you can get away with UNION ALL is a real interview question and a real engineering decision.
4. Emulating Missing Operators Is Basic MySQL Survival
Older MySQL versions (pre-8.0.31) and most MariaDB installations lack INTERSECT and EXCEPT. Every working MySQL developer must know the JOIN/EXISTS equivalents. This chapter covers both.
Detailed Explanation
Detailed Explanation
1. UNION - Combine and Deduplicate
UNION stacks the results of two SELECTs and removes duplicate rows from the combined output.
-- All distinct customers across India and UK
SELECT name, email FROM customers_in
UNION
SELECT name, email FROM customers_uk;Output:
+---------------+-------------------+
| name | email |
+---------------+-------------------+
| Aarav Kumar | aarav@mail.com |
| Priya Sharma | priya@mail.com |
| Rohan Mehta | rohan@mail.com |
| Meera Iyer | meera@mail.com |
| Ishaan Gupta | ishaan@mail.co.uk |
| Ananya Reddy | ananya@mail.co.uk |
+---------------+-------------------+Priya appears only once even though she is in both tables, because (name, email) matches exactly. UNION's implicit deduplication requires sorting or hashing, which costs time and memory.
2. UNION ALL - Combine Without Deduplication (Faster)
SELECT name, email FROM customers_in
UNION ALL
SELECT name, email FROM customers_uk;Output now has Priya twice:
-- 4 rows from customers_in + 3 rows from customers_uk = 7 rowsUse UNION ALL when:
- You know duplicates cannot occur (different primary keys, disjoint data).
- You actually want duplicates (e.g., auditing, concatenating logs).
- Performance matters. UNION ALL is almost always faster than UNION.
3. Rules for UNION / UNION ALL
Both sides of the operation must satisfy:
- Same number of columns.
- Compatible data types in corresponding positions. SQL will implicitly convert where reasonable, but mixing INT and VARCHAR usually fails.
- Column names come from the first SELECT. The second query's column names are ignored. Aliases on the first query propagate.
-- Column name 'customer_name' comes from the first query only
SELECT name AS customer_name, email FROM customers_in
UNION
SELECT name, email FROM customers_uk;4. ORDER BY in UNIONs
ORDER BY applies to the combined result, and must be placed at the end:
SELECT name, email FROM customers_in
UNION
SELECT name, email FROM customers_uk
ORDER BY name; -- Sorts the merged listPutting ORDER BY inside one branch is invalid unless it is inside a subquery with LIMIT. You cannot order each branch independently within a UNION.
5. INTERSECT - Common Rows
INTERSECT returns rows present in BOTH queries. PostgreSQL, SQL Server, Oracle, and MySQL 8.0.31+ support it natively:
-- Users who are both active AND purchased
SELECT email FROM users_active
INTERSECT
SELECT email FROM users_purchased;MySQL workaround for older versions: use INNER JOIN or IN:
-- Equivalent using INNER JOIN
SELECT DISTINCT a.email
FROM users_active a
JOIN users_purchased p ON a.email = p.email;
-- Equivalent using IN
SELECT DISTINCT email FROM users_active
WHERE email IN (SELECT email FROM users_purchased);6. EXCEPT (MINUS) - Difference
EXCEPT returns rows in the first query but NOT in the second. Oracle calls it MINUS.
-- Active users who have NOT yet purchased
SELECT email FROM users_active
EXCEPT
SELECT email FROM users_purchased;MySQL workaround: LEFT JOIN + IS NULL, or NOT EXISTS:
-- LEFT JOIN version
SELECT a.email FROM users_active a
LEFT JOIN users_purchased p ON a.email = p.email
WHERE p.email IS NULL;
-- NOT EXISTS version (prefer this — NULL-safe)
SELECT a.email FROM users_active a
WHERE NOT EXISTS (
SELECT 1 FROM users_purchased p WHERE p.email = a.email
);Output for sample data: user 1 (a@x.com) is active but has not purchased.
7. Chaining Multiple UNIONs
You can chain set operations. Precedence: INTERSECT binds tighter than UNION/EXCEPT. Use parentheses when unsure:
SELECT name FROM customers_in
UNION
SELECT name FROM customers_uk
UNION
SELECT name FROM customers_usa;8. Performance: When to Use UNION ALL Instead of UNION
UNION performs deduplication which requires sorting or a hash table, both memory-hungry. On a 10-million-row combined result, this can take minutes and consume gigabytes of RAM. UNION ALL is a simple concatenation.
Rule: use UNION ALL unless you specifically need deduplication. If duplicates would be produced by the data but you want them removed, consider adding a condition that prevents overlap rather than relying on UNION:
-- Instead of this (UNION dedupe is expensive)
SELECT * FROM orders WHERE status = 'shipped'
UNION
SELECT * FROM orders WHERE priority = 'high';
-- Write this (no duplicates by construction)
SELECT * FROM orders
WHERE status = 'shipped' OR priority = 'high';9. Use Cases
- Active users from multiple tables: Paid users + trial users + preview users as one active-user stream (UNION ALL).
- Data reconciliation: What is in table A but missing from table B? (EXCEPT)
- Cross-system matching: Emails that exist in both CRM and billing. (INTERSECT)
- Partitioned tables: Yearly tables unified into one view (UNION ALL).
Code Examples
SELECT name, email FROM customers_in
UNION
SELECT name, email FROM customers_uk
ORDER BY name;SELECT name, email FROM customers_in
UNION ALL
SELECT name, email FROM customers_uk
ORDER BY name;SELECT name, email, 'India' AS source FROM customers_in
UNION ALL
SELECT name, email, 'UK' AS source FROM customers_uk
ORDER BY name;-- Native INTERSECT (MySQL 8.0.31+, PostgreSQL, SQL Server)
SELECT email FROM users_active
INTERSECT
SELECT email FROM users_purchased;-- Works in any MySQL version
SELECT DISTINCT a.email
FROM users_active a
JOIN users_purchased p ON a.email = p.email;
-- Alternative: IN
SELECT DISTINCT email FROM users_active
WHERE email IN (SELECT email FROM users_purchased);SELECT email FROM users_active
EXCEPT
SELECT email FROM users_purchased;-- Method 1: LEFT JOIN + IS NULL
SELECT a.email FROM users_active a
LEFT JOIN users_purchased p ON a.email = p.email
WHERE p.email IS NULL;
-- Method 2: NOT EXISTS (preferred — NULL-safe)
SELECT a.email FROM users_active a
WHERE NOT EXISTS (
SELECT 1 FROM users_purchased p WHERE p.email = a.email
);-- Simulate yearly partitioned tables
CREATE TABLE sales_2024 (id INT, amount DECIMAL(10,2));
CREATE TABLE sales_2025 (id INT, amount DECIMAL(10,2));
CREATE TABLE sales_2026 (id INT, amount DECIMAL(10,2));
INSERT INTO sales_2024 VALUES (1, 1000), (2, 2000);
INSERT INTO sales_2025 VALUES (1, 3000), (2, 4500);
INSERT INTO sales_2026 VALUES (1, 5200);
-- Combined view across all three years (no duplicates by id collision — different years)
SELECT id, amount, 2024 AS year FROM sales_2024
UNION ALL
SELECT id, amount, 2025 FROM sales_2025
UNION ALL
SELECT id, amount, 2026 FROM sales_2026
ORDER BY year, id;Common Mistakes
Using UNION When UNION ALL Is Enough (Performance Trap)
-- Combining disjoint date ranges; duplicates are impossible
SELECT * FROM orders WHERE order_date >= '2025-01-01'
UNION
SELECT * FROM orders WHERE order_date < '2025-01-01';SELECT * FROM orders WHERE order_date >= '2025-01-01'
UNION ALL
SELECT * FROM orders WHERE order_date < '2025-01-01';Mismatched Column Counts
SELECT name, email FROM customers_in
UNION
SELECT name FROM customers_uk;SELECT name, email FROM customers_in
UNION
SELECT name, email FROM customers_uk;SELECT name, NULL AS email FROM ....Placing ORDER BY Inside a UNION Branch
SELECT name FROM customers_in ORDER BY name
UNION
SELECT name FROM customers_uk;SELECT name FROM customers_in
UNION
SELECT name FROM customers_uk
ORDER BY name;(SELECT ... ORDER BY ... LIMIT N) UNION (SELECT ...).Using NOT IN for EXCEPT When Subquery Has NULL
-- Get emails in active but not in purchased
SELECT email FROM users_active
WHERE email NOT IN (SELECT email FROM users_purchased);SELECT email FROM users_active a
WHERE NOT EXISTS (
SELECT 1 FROM users_purchased p WHERE p.email = a.email
);Summary
- UNION combines two SELECT results and removes duplicates. UNION ALL combines them without deduplication and is significantly faster.
- Both branches must have the same number of columns and compatible data types. Column names come from the first SELECT.
- ORDER BY applies to the combined result and must appear at the very end of the entire UNION statement.
- Use UNION ALL whenever duplicates are impossible or acceptable — it skips the expensive sort/hash deduplication step.
- INTERSECT returns rows present in both queries. MySQL 8.0.31+ supports it natively; older versions use INNER JOIN or IN.
- EXCEPT (MINUS in Oracle) returns rows in the first query but not in the second. Workaround: LEFT JOIN + IS NULL, or NOT EXISTS.
- NOT EXISTS is always safer than NOT IN when emulating EXCEPT — NOT IN fails silently if the subquery contains NULL.
- Add a constant column (like 'India' or 2024) to tag rows with their source when combining similar tables with UNION ALL.
- Chained UNIONs are associative. Parentheses rarely change the result but help readability when mixing UNION with INTERSECT or EXCEPT.
- Classic use cases: unifying partitioned tables (orders_2024/2025/2026), data reconciliation (what is in A not in B), cross-system matching.