What Is It?
What Are Window Functions?
Window functions are the single most powerful feature of modern SQL. They compute a value for each row using a window of other rows — without collapsing the result set the way GROUP BY does. That difference is everything.
-- GROUP BY: one row per department, details lost
SELECT dept, AVG(salary) FROM employees GROUP BY dept;
-- Window function: every row preserved, with dept avg alongside
SELECT name, dept, salary,
AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;Notice the OVER (PARTITION BY dept) clause — that's the window. It tells MySQL: "for each row, compute AVG(salary) over the window of rows sharing its dept value." You keep the row-level detail and get aggregate context on the same row.
Once you know window functions, you will reach for them for:
- Ranking rows (top 3 per category, Nth highest salary)
- Running totals and moving averages
- Comparing each row to the previous / next row (LAG / LEAD)
- Month-over-month growth, retention curves, percentile buckets
Every FAANG / product company data analyst interview has a window function question. Most have several.
Why Does It Matter?
Why Window Functions Matter
1. They Replace Ugly Self-Joins
Pre-window-functions SQL had to compute "current salary minus previous salary per employee" with a self-join on sequence numbers. Now it's one line with LAG. Queries that used to be 30 lines become 4.
2. Interview Favorites Live Here
Nth highest salary, top 3 products per category, first and last order per customer, running totals, month-over-month growth, cumulative percentages, median — these are the classic SQL interview questions. All of them have elegant window-function solutions.
3. Analytics Dashboards Need Row + Aggregate Together
A sales dashboard shows each salesperson's revenue and what % of the team total that is. That requires aggregate context on every row. GROUP BY can't do it without a self-join. Window functions can.
4. They Express Sequential Logic Naturally
Questions like "how many days since the previous order per customer?" are naturally sequential. Window functions with PARTITION BY + ORDER BY let you walk down the timeline per partition.
5. Modern SQL Is Window-Function-Heavy
Data engineering code you will write at any job — ETL for analytics, cohort analysis, attribution windows, session sequencing — is full of window functions. If you can't write them fluently, your SQL ceiling is low.
Detailed Explanation
Detailed Explanation
1. Anatomy of a Window Function Call
function_name() OVER (
[PARTITION BY col1, col2, ...]
[ORDER BY col3 [ASC|DESC], ...]
[frame_clause]
)Three optional parts inside OVER():
- PARTITION BY — split rows into groups. The function restarts at each partition boundary. No PARTITION BY means one single partition = the whole result set.
- ORDER BY — order rows within each partition. Required for ranking functions and for cumulative computations.
- Frame clause — a sub-window within the partition. Controls running totals, moving averages, etc.
2. Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK
All three assign a number to each row. They differ in how they handle ties.
-- Sample data: salaries [100, 100, 90, 80, 80]
-- ROW_NUMBER 1 2 3 4 5 (arbitrary tiebreak, unique sequence)
-- RANK 1 1 3 4 4 (ties share rank, then JUMP the skipped numbers)
-- DENSE_RANK 1 1 2 3 3 (ties share rank, NO jump)SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;When to use which:
ROW_NUMBER— pick exactly N rows (pagination, top-3 with tiebreak). Always returns unique 1, 2, 3, ...RANK— competition ranking. 1st, 1st, 3rd (Olympic medals: two gold, no silver).DENSE_RANK— nth-distinct ranking. "Find the 3rd highest DISTINCT salary." No gaps.
3. PARTITION BY — The Game Changer
-- Top salary per department
SELECT name, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees;
-- Each dept gets its own ranking starting from 1.
-- Keep only #1 per dept
SELECT * FROM (
SELECT name, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
) t
WHERE rn = 1;"Top N per group" is the #1 window function interview question. The pattern is always: rank within partition, then filter on the rank in an outer query.
4. LAG and LEAD — Previous and Next Row
LAG(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)
LEAD(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)-- Previous order date per customer
SELECT customer_id, order_date,
LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order,
DATEDIFF(order_date,
LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date)
) AS days_since_prev
FROM orders;Arguments: LAG(col) defaults to offset 1 and default NULL. The first row of each partition has no "previous" row, so LAG returns NULL unless you pass a default.
5. FIRST_VALUE, LAST_VALUE, NTH_VALUE
-- First and last order amount per customer
SELECT customer_id, order_date, amount,
FIRST_VALUE(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_amt,
LAST_VALUE(amount) OVER (PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS last_amt
FROM orders;Trap: LAST_VALUE without an explicit frame returns the CURRENT row, not the last row of the partition. The default frame for ORDER BY is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", which stops at the current row. Always explicitly override the frame for LAST_VALUE.
6. Running Totals with Aggregates + OVER
-- Running total of daily sales
SELECT sale_date, amount,
SUM(amount) OVER (ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS running_total
FROM sales;Any aggregate (SUM, AVG, COUNT, MIN, MAX) becomes a window function by adding OVER(). Combined with a frame clause, it computes cumulatively.
7. The Frame Clause — ROWS vs RANGE
-- Generic frame syntax
ROWS BETWEEN AND
RANGE BETWEEN AND
-- start/end can be:
UNBOUNDED PRECEDING -- beginning of partition
N PRECEDING -- N rows before current
CURRENT ROW -- current row
N FOLLOWING -- N rows after
UNBOUNDED FOLLOWING -- end of partition ROWS counts physical rows: "3 rows before me." RANGE counts by value: "rows whose ORDER BY column is within 3 of mine." 95% of the time you want ROWS.
-- 7-day moving average of daily sales
SELECT sale_date, amount,
AVG(amount) OVER (ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7d
FROM sales;
-- Centered 3-row moving average
SELECT sale_date, amount,
AVG(amount) OVER (ORDER BY sale_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS ma_3_centered
FROM sales;8. NTILE for Bucketing
-- Divide employees into 4 salary quartiles
SELECT name, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;NTILE(n) spreads rows as evenly as possible across n buckets. Useful for percentile analysis, AB-testing cohort splitting, and revenue deciles.
9. PERCENT_RANK and CUME_DIST
-- Both return a fraction between 0 and 1
SELECT name, salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 2) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary), 2) AS cume_dist
FROM employees;PERCENT_RANK: (RANK - 1) / (total_rows - 1). First row = 0, last row = 1. CUME_DIST: fraction of rows with value <= current. Always > 0.
10. Classic Interview Patterns
Top N Per Group
-- Top 3 employees per dept by salary
SELECT * FROM (
SELECT name, dept, salary,
DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk <= 3;Nth Highest Salary
-- 3rd highest distinct salary
SELECT DISTINCT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk = 3;Month-Over-Month Growth
WITH monthly AS (
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY month
)
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month) * 100, 2) AS mom_pct
FROM monthly;Running Total and Cumulative Share
SELECT sale_date, amount,
SUM(amount) OVER (ORDER BY sale_date ROWS UNBOUNDED PRECEDING) AS running_total,
ROUND(100.0 * SUM(amount) OVER (ORDER BY sale_date ROWS UNBOUNDED PRECEDING)
/ SUM(amount) OVER (), 2) AS pct_of_total
FROM sales;Consecutive Rows / Gaps and Islands
-- Find streaks of consecutive login dates per user
SELECT user_id, login_date,
login_date - INTERVAL ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) DAY AS grp
FROM logins;
-- Rows with the same grp form a consecutive streak.11. Where Window Functions Can and Can't Appear
- Allowed: SELECT list and ORDER BY of the outer query.
- Not allowed: WHERE, GROUP BY, HAVING. You cannot filter on a window result directly. Wrap in a subquery/CTE and filter on the outer query.
- MySQL supports window functions from 8.0+. MariaDB from 10.2+.
Code Examples
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
dept VARCHAR(30),
salary DECIMAL(10,2)
);
INSERT INTO employees VALUES
(1, 'Aarav Sharma', 'Engineering', 95000),
(2, 'Priya Nair', 'Engineering', 95000), -- tie with Aarav
(3, 'Rohan Mehta', 'Engineering', 82000),
(4, 'Ananya Iyer', 'Engineering', 72000),
(5, 'Vikram Singh', 'Sales', 68000),
(6, 'Meera Krishnan', 'Sales', 68000), -- tie with Vikram
(7, 'Diya Bhatt', 'Sales', 55000);
SELECT name, dept, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;-- Using employees table from above
SELECT name, dept, salary
FROM (
SELECT name, dept, salary,
DENSE_RANK() OVER (
PARTITION BY dept
ORDER BY salary DESC
) AS rnk
FROM employees
) t
WHERE rnk <= 2
ORDER BY dept, salary DESC;CREATE TABLE sales (
sale_date DATE,
amount DECIMAL(10,2)
);
INSERT INTO sales VALUES
('2026-04-10', 1200),
('2026-04-11', 800),
('2026-04-12', 2400),
('2026-04-13', 950),
('2026-04-14', 1500),
('2026-04-15', 700),
('2026-04-16', 1100);
SELECT sale_date, amount,
SUM(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
ROUND(100.0 * amount / SUM(amount) OVER (), 2) AS pct_of_grand_total
FROM sales
ORDER BY sale_date;CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2)
);
INSERT INTO orders VALUES
(1, 101, '2026-01-10', 1200),
(2, 101, '2026-02-15', 800),
(3, 101, '2026-03-20', 1600),
(4, 102, '2026-02-01', 2500),
(5, 102, '2026-03-28', 900),
(6, 103, '2026-04-10', 3100);
SELECT customer_id, order_date, amount,
LAG(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS prev_order_date,
DATEDIFF(
order_date,
LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date)
) AS days_since_prev
FROM orders
ORDER BY customer_id, order_date;-- Using the sales table from the running total example
SELECT sale_date, amount,
ROUND(AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS ma_7d,
-- 3-day centered moving average
ROUND(AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
), 2) AS ma_3_centered
FROM sales
ORDER BY sale_date;-- Reusing the orders table from the LAG example
WITH monthly AS (
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY month
)
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month) * 100,
2
) AS mom_growth_pct
FROM monthly
ORDER BY month;-- Using employees table from the first example
SELECT name, dept, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile,
ROUND(PERCENT_RANK() OVER (ORDER BY salary DESC), 2) AS pct_rank
FROM employees
ORDER BY salary DESC;-- orders table from LAG example
SELECT customer_id, order_date, amount,
FIRST_VALUE(amount) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS first_order_amount,
LAST_VALUE(amount) OVER (
PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_order_amount
FROM orders
ORDER BY customer_id, order_date;Common Mistakes
LAST_VALUE Without an Explicit Frame
-- Expected: each row shows last order amount for that customer.
SELECT customer_id, order_date, amount,
LAST_VALUE(amount) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS last_amt
FROM orders;SELECT customer_id, order_date, amount,
LAST_VALUE(amount) OVER (
PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_amt
FROM orders;RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. FIRST_VALUE is fine because it still sees the partition start. LAST_VALUE is broken because the frame ends at the current row. Always specify UNBOUNDED FOLLOWING for LAST_VALUE.Using a Window Function in WHERE
-- Intent: keep only top 3 per dept
SELECT name, dept, salary
FROM employees
WHERE RANK() OVER (PARTITION BY dept ORDER BY salary DESC) <= 3;SELECT name, dept, salary
FROM (
SELECT name, dept, salary,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk <= 3;Forgetting PARTITION BY — Global Ranking When Per-Group Was Meant
-- Intent: rank employees within each dept.
SELECT name, dept, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees;
-- This ranks across ALL employees, ignoring dept.SELECT name, dept, salary,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
FROM employees;Confusing RANK and DENSE_RANK for 'Nth Highest'
-- Intent: 3rd highest DISTINCT salary.
SELECT salary FROM (
SELECT salary, RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t WHERE rnk = 3;
-- If top 2 salaries tie, RANK goes 1,1,3,... and this 'works'.
-- But if 3 salaries tie at top, RANK goes 1,1,1,4 — rnk=3 returns nothing.-- Use DENSE_RANK for Nth DISTINCT value
SELECT DISTINCT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t WHERE rnk = 3;Mixing PARTITION BY Columns Incorrectly
-- Intent: running total of amount per customer.
SELECT customer_id, order_date, amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
-- Every customer's rows are mixed together.SELECT customer_id, order_date, amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS UNBOUNDED PRECEDING
) AS running_total
FROM orders;Summary
- Window functions compute a value using a window of rows without collapsing the result set. Each row keeps its detail and gets aggregate context alongside.
- The OVER() clause has three optional parts: PARTITION BY (split rows into groups), ORDER BY (order within partition), and a frame clause (sub-window for running calculations).
- ROW_NUMBER gives unique 1..N. RANK gives ties same number then skips. DENSE_RANK gives ties same number and does NOT skip. Use DENSE_RANK for 'Nth distinct value'.
- PARTITION BY restarts the computation at each group boundary. Without it, the window is the entire result set.
- LAG(col, offset, default) returns the previous row's value within the partition. LEAD does the same for the next row. Both are essential for time-series and sequential analysis.
- Running totals use SUM/AVG with OVER(ORDER BY ... ROWS UNBOUNDED PRECEDING). Moving averages use ROWS BETWEEN N PRECEDING AND CURRENT ROW.
- ROWS counts physical rows; RANGE counts by value. Prefer ROWS unless you specifically need value-based windows.
- LAST_VALUE needs an explicit frame (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) — the default frame ends at CURRENT ROW, which makes LAST_VALUE return the current row's value.
- Window functions cannot appear in WHERE, GROUP BY, or HAVING. To filter on a window result, compute it in a subquery/CTE and filter in the outer query.
- Classic interview patterns: top N per group (ROW_NUMBER or DENSE_RANK with PARTITION BY), Nth highest salary (DENSE_RANK), month-over-month growth (LAG on monthly aggregate), running totals, moving averages, gaps and islands analysis.