What Is It?
What Is GROUP BY?
In the last chapter you computed aggregates over the whole table or over rows filtered by WHERE. But what if you want one aggregate per department, or per city, or per month? That is what GROUP BY does.
GROUP BY splits the rows into groups based on the values of one or more columns, then applies aggregate functions within each group. The result has one row per group.
Example: SELECT department, AVG(salary) FROM employees GROUP BY department returns one row for each department with that department's average salary.
What Is HAVING?
Once you have grouped rows, you often want to filter the groups themselves. "Show only departments with more than 5 employees." "Show only cities where total sales exceed 100000." That is HAVING.
WHERE filters rows before grouping. HAVING filters groups after. This is the single most important distinction in this chapter and a favorite interview question.
Why Does It Matter?
Why GROUP BY Is Essential
1. Every Summary Report Uses GROUP BY
"Revenue per country," "Orders per month," "Active users per day," "Enrollments per course" — all GROUP BY queries. Business intelligence is 80% GROUP BY.
2. It's Where SQL Gets Powerful
Simple SELECT and WHERE queries are useful but limited. GROUP BY unlocks real analytics: top-selling products, busiest hours, best-performing regions. Many data analyst roles exist specifically to write complex GROUP BY queries.
3. HAVING Answers the "Which Group" Questions
"Which courses have more than 100 students?" "Which products have never been returned?" "Which days had zero signups?" All need HAVING because you are filtering GROUPS based on their aggregate properties.
4. Clause Order Misunderstandings Kill Careers
Using an alias in WHERE, putting HAVING before GROUP BY, referencing an aggregate in WHERE — these are career-limiting mistakes in interviews. This chapter nails down the exact order.
5. Every Interview Tests This
"Find departments where the average salary is above 70000" and variants are asked in 100% of SQL interviews. After this chapter, you will write these in your sleep.
Detailed Explanation
Detailed Explanation
Reference Table: sales
We use a sales table for this chapter:
CREATE TABLE sales (
id INT PRIMARY KEY,
salesperson VARCHAR(50),
region VARCHAR(30),
product VARCHAR(40),
amount DECIMAL(10,2),
sale_date DATE
);
INSERT INTO sales VALUES
(1, 'Aarav Sharma', 'North', 'Laptop', 55000, '2026-01-10'),
(2, 'Priya Patel', 'South', 'Phone', 25000, '2026-01-15'),
(3, 'Rohan Gupta', 'North', 'Laptop', 60000, '2026-01-20'),
(4, 'Ananya Singh', 'East', 'Tablet', 18000, '2026-02-05'),
(5, 'Aarav Sharma', 'North', 'Phone', 22000, '2026-02-12'),
(6, 'Vikram Reddy', 'South', 'Laptop', 58000, '2026-02-18'),
(7, 'Priya Patel', 'South', 'Tablet', 20000, '2026-02-25'),
(8, 'Rohan Gupta', 'North', 'Phone', 28000, '2026-03-03'),
(9, 'Karan Mehta', 'West', 'Laptop', 65000, '2026-03-10'),
(10, 'Sneha Iyer', 'East', 'Phone', 24000, '2026-03-15'),
(11, 'Aarav Sharma', 'North', 'Tablet', 19000, '2026-03-22'),
(12, 'Ananya Singh', 'East', 'Laptop', 54000, '2026-04-02'),
(13, 'Karan Mehta', 'West', 'Phone', 26000, '2026-04-08'),
(14, 'Diya Nair', 'West', 'Laptop', 62000, '2026-04-15'),
(15, 'Priya Patel', 'South', 'Laptop', 57000, '2026-04-20');1. Basic GROUP BY
Syntax: SELECT group_columns, aggregate(col) FROM table GROUP BY group_columns.
-- Total sales per region
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region;Rows with the same region are collapsed into one group. SUM(amount) sums the amounts within each group. Result has 4 rows — one per region.
2. The Rule: Non-Aggregated Columns MUST Be in GROUP BY
Standard SQL requires every column in SELECT to be either (a) inside an aggregate function, or (b) listed in GROUP BY. This is strict:
-- CORRECT
SELECT region, SUM(amount) FROM sales GROUP BY region;
-- BAD SQL (but MySQL may allow in lenient mode)
SELECT region, salesperson, SUM(amount) FROM sales GROUP BY region;
-- salesperson isn't aggregated and isn't in GROUP BY — what should MySQL return?MySQL's default strict mode (ONLY_FULL_GROUP_BY, since 5.7) throws an error. Old lenient mode silently picked an arbitrary value — a dangerous bug. Always follow the rule.
3. GROUP BY Multiple Columns
You can group by combinations:
-- Sales per (region, product)
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product;Groups are unique (region, product) pairs. North-Laptop, North-Phone, South-Laptop, etc.
4. HAVING — Filtering Groups
HAVING applies AFTER grouping. You can use aggregates in HAVING:
-- Regions with total sales > 100000
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 100000;First, GROUP BY creates region groups with their sums. Then HAVING keeps only groups where the sum exceeds 100000.
5. WHERE vs HAVING — The Critical Distinction
WHERE filters rows BEFORE aggregation. HAVING filters groups AFTER.
-- Only Laptop sales, per region, where that region has > 100000 in laptops
SELECT region, SUM(amount) AS laptop_revenue
FROM sales
WHERE product = 'Laptop' -- filter rows first: keep only Laptop sales
GROUP BY region
HAVING SUM(amount) > 100000; -- filter groups: keep only big laptop regionsWHERE trims the input. HAVING trims the groups. Different jobs, different clauses.
6. Order of Clauses — Syntax vs Logic
Syntactic order (how you write it):
SELECT columns
FROM table
WHERE row_conditions
GROUP BY group_columns
HAVING group_conditions
ORDER BY columns
LIMIT n;Logical execution order (how SQL actually runs it):
- FROM — identify the table
- WHERE — filter rows
- GROUP BY — form groups
- HAVING — filter groups
- SELECT — compute selected columns and aggregates
- ORDER BY — sort
- LIMIT — trim to N
Two consequences:
- WHERE cannot reference aggregates or column aliases (they don't exist yet).
- ORDER BY CAN reference aggregates and aliases (SELECT ran already).
7. GROUP BY with ORDER BY and LIMIT
The full power:
-- Top 3 regions by total sales
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY total DESC
LIMIT 3;GROUP BY creates groups, ORDER BY sorts them by the aggregate, LIMIT keeps top 3.
8. Counting Per Group
-- Number of sales per salesperson
SELECT salesperson, COUNT(*) AS num_sales, SUM(amount) AS total
FROM sales
GROUP BY salesperson
ORDER BY num_sales DESC;This answers "who made the most sales?" Aarav has 3 sales, Priya has 3, Rohan has 2, etc.
9. Multiple Aggregates Per Group
SELECT
region,
COUNT(*) AS num_sales,
SUM(amount) AS total,
AVG(amount) AS avg_sale,
MIN(amount) AS smallest,
MAX(amount) AS largest
FROM sales
GROUP BY region;Gives a complete per-region summary in one query.
10. WITH ROLLUP — Adding Subtotals
MySQL's WITH ROLLUP adds grand total (and subtotals for multi-column GROUP BY):
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region WITH ROLLUP;Returns per-region sums PLUS a final row with region=NULL and total across all regions. Useful for reports.
11. Classic Interview Patterns
Top N per group (requires window functions, chapter 18): "Top 3 salespeople per region." Cannot be done cleanly with GROUP BY alone. We will tackle this with ROW_NUMBER() later.
Filtering groups by count:
-- Regions with 4 or more sales
SELECT region, COUNT(*) AS n FROM sales
GROUP BY region HAVING COUNT(*) >= 4;Filtering groups by sum:
-- Salespeople with total sales above 100000
SELECT salesperson, SUM(amount) AS total FROM sales
GROUP BY salesperson HAVING SUM(amount) > 100000;Combining WHERE and HAVING:
-- For Laptops only, regions with more than 2 laptop sales
SELECT region, COUNT(*) AS laptop_sales FROM sales
WHERE product = 'Laptop'
GROUP BY region
HAVING COUNT(*) > 2;
Code Examples
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region;SELECT region, product, COUNT(*) AS sales_count, SUM(amount) AS revenue
FROM sales
GROUP BY region, product
ORDER BY region, product;-- Regions with total sales above 150000
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 150000;-- For Laptop sales only, regions with total laptop revenue > 100000
SELECT region, SUM(amount) AS laptop_revenue
FROM sales
WHERE product = 'Laptop' -- filter rows: keep only Laptop
GROUP BY region
HAVING SUM(amount) > 100000; -- filter groups-- Top 3 salespeople by total revenue
SELECT salesperson, SUM(amount) AS total_revenue, COUNT(*) AS num_sales
FROM sales
GROUP BY salesperson
ORDER BY total_revenue DESC
LIMIT 3;-- Salespeople who made more than 2 sales
SELECT salesperson, COUNT(*) AS num_sales, SUM(amount) AS total
FROM sales
GROUP BY salesperson
HAVING COUNT(*) > 2;-- WRONG
SELECT region, SUM(amount)
FROM sales
WHERE SUM(amount) > 100000 -- ERROR
GROUP BY region;
-- CORRECT: use HAVING
SELECT region, SUM(amount)
FROM sales
GROUP BY region
HAVING SUM(amount) > 100000;-- Total sales per month of 2026
SELECT
MONTH(sale_date) AS month,
COUNT(*) AS num_sales,
SUM(amount) AS total_revenue
FROM sales
WHERE YEAR(sale_date) = 2026
GROUP BY MONTH(sale_date)
ORDER BY month;-- Subtotals per region plus a grand total
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region WITH ROLLUP;Common Mistakes
Using WHERE to Filter Aggregates
SELECT region, SUM(amount)
FROM sales
WHERE SUM(amount) > 100000
GROUP BY region;SELECT region, SUM(amount)
FROM sales
GROUP BY region
HAVING SUM(amount) > 100000;Non-Aggregated Columns Missing From GROUP BY
SELECT region, salesperson, SUM(amount)
FROM sales
GROUP BY region;-- If you want per-region, drop salesperson:
SELECT region, SUM(amount) FROM sales GROUP BY region;
-- If you want per-(region, salesperson), include both:
SELECT region, salesperson, SUM(amount) FROM sales GROUP BY region, salesperson;Confusing Clause Order
-- Trying to write HAVING before GROUP BY
SELECT region, SUM(amount)
FROM sales
HAVING SUM(amount) > 100000
GROUP BY region;SELECT region, SUM(amount)
FROM sales
GROUP BY region
HAVING SUM(amount) > 100000;Using WHERE Alias
SELECT region, SUM(amount) AS total
FROM sales
WHERE total > 50000 -- alias not available in WHERE
GROUP BY region;-- Cannot use alias in WHERE. Options:
-- 1) Repeat the expression (but aggregates don't work in WHERE):
-- 2) Use HAVING with the alias (MySQL) or expression:
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING total > 50000; -- MySQL lets you use the alias hereExpecting GROUP BY To Return Top N Per Group
-- Goal: top 2 salespeople per region. This does NOT do that.
SELECT region, salesperson, SUM(amount)
FROM sales
GROUP BY region, salesperson
ORDER BY region, SUM(amount) DESC
LIMIT 2;-- Top 2 per group requires window functions (chapter 18):
SELECT region, salesperson, total FROM (
SELECT region, salesperson, SUM(amount) AS total,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) AS rn
FROM sales
GROUP BY region, salesperson
) t
WHERE rn <= 2;Summary
- GROUP BY groups rows that share column values, then applies aggregates within each group. Result has one row per group.
- Every column in SELECT must be either aggregated or listed in GROUP BY. MySQL's strict mode (ONLY_FULL_GROUP_BY) enforces this; always code as if enforced.
- GROUP BY multiple columns creates groups for each unique combination: GROUP BY region, product groups by (region, product) pairs.
- HAVING filters groups based on aggregate conditions. WHERE filters rows BEFORE grouping; HAVING filters GROUPS after.
- Use WHERE for row-level conditions (status = 'Delivered'). Use HAVING for aggregate-level conditions (SUM(amount) > 100000).
- Syntactic order (how you write): SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT. Cannot rearrange.
- Logical execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Aggregates and aliases are created in SELECT.
- WHERE cannot reference aggregates or SELECT aliases (they don't exist yet). HAVING and ORDER BY can (MySQL/PostgreSQL differ slightly).
- WITH ROLLUP (MySQL) adds subtotals and a grand total row where grouped columns are NULL. Useful for executive reports.
- GROUP BY alone cannot return 'top N per group.' That pattern needs window functions (ROW_NUMBER, RANK) — covered in chapter 18.