Chapter 24 Advanced 58 Questions

Practice Questions — Database Design and Normalization (1NF, 2NF, 3NF, BCNF)

← Back to Notes

9 Easy

12 Medium

12 Hard

Topic-Specific Questions

Question 1

Easy

Why do we normalize databases?

Three keywords: redundancy, anomalies, integrity.

To eliminate data redundancy (each fact stored once), to avoid update/insertion/deletion anomalies, and to enable referential integrity via foreign keys. A normalized schema survives schema evolution and concurrent modification far better than a denormalized one.

Question 2

Easy

What makes a table violate 1NF?

Think about a single cell.

Any column that stores multiple values in a single cell — CSV strings, JSON arrays of scalars used as lookup keys, comma-separated foreign-key lists. In a 1NF table, every cell holds a single atomic value.

Question 3

Easy

What is a functional dependency?

X determines Y.

A rule X -> Y saying: for every X value, there is exactly one Y value. Example: roll_no -> student_name means every roll number identifies exactly one student. Functional dependencies are the foundation of normalization theory.

Question 4

Easy

What is an update anomaly?

Think about updating redundant data.

A situation where the same fact is stored in many places, and updating it requires changing every copy atomically. If even one copy is missed, the database contradicts itself. Eliminated by normalization — each fact lives in one row.

Question 5

Easy

Is this table in 1NF?

CREATE TABLE t (
    id INT PRIMARY KEY,
    tags VARCHAR(200)   -- 'sports,news,cricket'
);

Multiple values in one column?

No. The tags column stores multiple values, violating 1NF.

Question 6

Easy

What is a composite primary key?

More than one column.

A primary key that consists of two or more columns whose combined values uniquely identify a row. Typical in junction tables for N-M relationships: enrollments(student_id, course_id) where each pair is unique.

Question 7

Medium

What is the difference between 2NF and 3NF?

Partial vs transitive.

2NF forbids partial dependencies: when the PK is composite, no non-key column may depend on only part of it. 3NF forbids transitive dependencies: no non-key column may depend on another non-key column. 2NF is only ever violated with composite PKs; 3NF can be violated with any PK.

Question 8

Medium

Given (emp_id PK, emp_name, dept_id, dept_name), which normal form is violated?

dept_name depends on dept_id, not emp_id.

3NF. dept_name is transitively dependent on emp_id (emp_id -> dept_id -> dept_name). The table is in 1NF and 2NF but not 3NF. Fix: move dept_name into a departments table keyed by dept_id.

Question 9

Medium

What is a transitive dependency?

A -> B -> C, but C does not directly need A.

A dependency where a non-key column A determines another non-key column B (A -> B), and the PK determines A. So the PK determines B only through A — indirectly. 3NF requires you to break this chain by moving B into a table keyed by A.

Question 10

Medium

When would you denormalize?

Trade-off.

When a specific read path is hot enough that the JOIN cost dominates performance, and the write cost of keeping copies in sync is acceptable. Typical cases: dashboards, feeds, full-text search indexes, reporting summaries.

Question 11

Medium

What is the difference between a primary key and a surrogate key?

Natural vs invented.

A primary key is the column(s) that uniquely identify rows. A surrogate key is an invented PK (usually INT AUTO_INCREMENT or UUID) that has no business meaning. A natural key is a real-world identifier used as PK (email, ISBN). Modern practice: surrogate keys as PKs; natural keys as UNIQUE constraints.

Question 12

Medium

What is cardinality in an ER diagram?

1-1, 1-N, N-M.

The number of rows on each side of a relationship. 1-1: each row relates to at most one row on the other side (user <-> profile). 1-N: one row relates to many on the other (customer -> orders). N-M: many on each side (students <-> courses). Cardinality dictates how you model: FK column for 1-1 and 1-N, junction table for N-M.

Question 13

Hard

What is BCNF and how does it differ from 3NF in practice?

Every determinant must be a superkey.

BCNF requires that for every non-trivial FD X -> Y, X must be a superkey. 3NF allows a non-superkey X to determine a prime attribute (part of a candidate key). BCNF is strictly stronger but in practice coincides with 3NF for almost all schemas. It matters only when the table has multiple overlapping candidate keys.

Question 14

Hard

Design the minimum schema for an Instagram-like 'follows' feature. Include self-reference and the check that prevents self-following.

Junction on users.

CREATE TABLE follows(follower_id INT, followee_id INT, followed_at DATETIME, PRIMARY KEY (follower_id, followee_id), FOREIGN KEY (follower_id) REFERENCES users(id), FOREIGN KEY (followee_id) REFERENCES users(id), CHECK (follower_id <> followee_id));

Question 15

Hard

Why is ON DELETE CASCADE dangerous for invoices but appropriate for comments?

Audit requirements vs ownership.

Invoices are business-critical records. Deleting a customer should not automatically delete their invoices — regulatory, tax, and audit rules demand retention. Use ON DELETE RESTRICT so deletes fail until the operator handles the invoices. Comments, on the other hand, only exist to decorate their post; deleting the post cascade-deletes comments because a comment without its post is noise.

Question 16

Hard

Why is storing unit_price on order_items not a 3NF violation even though products already has a price?

Time.

Because order_items.unit_price is a snapshot of the price at the moment of sale. Products.price can (and will) change. If you always JOIN to products, historical orders show today's price, not the price they were sold at — bad for accounting. The duplication is not redundant: it captures a fact that would otherwise be lost.

Question 17

Hard

Can a table be in 3NF but not in BCNF? Give a sketch.

Non-superkey determinants.

Yes, when the table has multiple overlapping candidate keys. Example: (course_id, instructor) -> room AND (room, time_slot) -> instructor. There are two candidate keys. A non-superkey column determines another, but because the determined column is prime (part of another CK), 3NF's exception allows it. BCNF does not — it forbids any non-superkey determinant.

Mixed & Application Questions

Question 1

Easy

What normal form removes repeating groups?

First.

1NF

Question 2

Easy

Give an example of a 1-1 relationship.

User and profile.

A user has one profile; a profile belongs to one user. Model: a FK user_id UNIQUE on the profiles table (the UNIQUE constraint enforces the 1-1 cardinality).

Question 3

Easy

Which cascade option causes child rows to be deleted along with the parent?

Keyword is in the name.

ON DELETE CASCADE

Question 4

Medium

Is (student_id, course_id) PK, grade in 2NF if student_name is added as a column?

student_name depends on student_id alone.

No. student_name depends only on part of the PK (student_id), a partial dependency. Table violates 2NF. Fix: move student_name into students(id, name).

Question 5

Medium

Why is a junction table necessary for many-to-many relationships?

Neither side can hold the relationship alone.

Because you cannot store a many-to-many relationship as a foreign key on either side: a single column cannot hold multiple foreign keys without violating 1NF. The junction table represents the relationship itself, with a row for each pair, and can carry relationship-specific attributes (grade, role, enrolled_at).

Question 6

Medium

Give a real case where ON DELETE SET NULL is the right choice.

Child can survive parent's absence.

Employee.manager_id -> Employee.id with ON DELETE SET NULL. When a manager leaves, their reports remain employed — they just lose a manager until reassigned. SET NULL preserves the rows and leaves them in a recoverable state.

Question 7

Medium

The table has PK = order_id and columns (customer_id, customer_name, shipping_city). Which normal form is violated?

customer_name depends on customer_id.

3NF. customer_name and shipping_city depend on customer_id, which is a non-key column. This is a transitive dependency: order_id -> customer_id -> customer_name.

Question 8

Medium

Describe when you would use a UUID PK instead of INT AUTO_INCREMENT.

Distributed systems.

When IDs must be generated by multiple servers without coordination (distributed systems, offline-first clients, merging data from several databases) or when you want IDs that are unguessable (public URLs). Trade-offs: UUIDs are 16 bytes (vs 4 for INT), slower for index scans, and not contiguous — which hurts insert performance on clustered indexes.

Question 9

Hard

Normalize this to 3NF: (invoice_id, customer_id, customer_name, item_id, item_name, item_price, qty).

Three tables minimum.

Four tables: customers(id, name); items(id, name, price); invoices(id, customer_id, FK to customers); invoice_items(invoice_id, item_id, qty, unit_price, PK=(invoice_id,item_id), FKs to invoices and items). Note unit_price snapshot on invoice_items — captures price at sale time.

Question 10

Hard

Why are denormalized counts (e.g. post_count on categories) risky in distributed or replicated setups?

Consistency under concurrency.

Updates to the count race against concurrent inserts/deletes on the children. Without careful locking (or atomic counters), the count drifts. In replicated systems, the count may be computed differently on master and replica if the replica's trigger logic differs or if statement-based replication mis-orders events. Either use transactions + FOR UPDATE on the parent row, or accept eventual consistency and reconcile the count via a scheduled job.

Question 11

Hard

Design a minimal WhatsApp-style chat schema: users, one-to-one chats, group chats, messages.

Conversations table as the unifying entity.

users(id, phone UNIQUE); conversations(id, type ENUM('DM','GROUP')); conversation_members(conv_id, user_id, PK=(conv_id,user_id)); messages(id, conv_id, sender_id, body, sent_at). DMs and groups are both conversations, distinguished by type and the count of members. Messages always belong to a conversation, not to a user pair — that keeps 1-1 and group chats symmetric.

Question 12

Hard

What is the smallest number of tables needed for an e-commerce system with users, products, categories, carts, and orders?

Count entities and junctions.

Seven: users, categories, products (FK to category), carts (FK to user), cart_items (junction: PK=(cart_id,product_id), with qty), orders (FK to user), order_items (junction: PK=(order_id,product_id), with qty and unit_price).

Question 13

Hard

Explain why 'email' should usually be a UNIQUE column, not a primary key.

Stability.

Emails change. If email is the PK, every FK that references it must be updated in sync (either manually or via ON UPDATE CASCADE), and every index that contains it must be rebuilt. The overhead is large. A surrogate PK (id) stays fixed forever; email moves to a UNIQUE constraint for uniqueness enforcement without being the anchor that other rows point to.

Question 14

Hard

Does 4NF matter in practice?

Multi-valued dependencies are rare.

Rarely. 4NF eliminates multi-valued dependencies — situations where a table encodes two independent multi-valued facts about one entity (e.g. a student's courses AND their phone numbers in one table). The violations usually scream at you as '1NF with two repeating groups' and you split them into two child tables without needing the 4NF theory. In 15 years of practice, 3NF plus common sense catches almost everything.

Question 15

Hard

Draft a brief migration plan to move from a 2NF-violating order_items (with product_name copied) to a 3NF-compliant schema with a products table.

Safe rollout.

(1) Create products(id, name, price). (2) Backfill products from DISTINCT (product_id, product_name, product_price) in order_items_bad. (3) Add product_id FK (already present, just verify). (4) Dual-write for a week: writes hit both the old and new structure. (5) Stop using product_name/product_price columns from order_items_bad in queries; read from the JOIN. (6) Drop the columns. Do it in small reversible steps, verifying each with row-count checks and query correctness.

Question 16

Medium

Why is a junction table's primary key usually the pair (fk_a, fk_b) and not a separate surrogate id?

Think uniqueness and duplicate prevention.

The composite PK enforces the business rule that each (fk_a, fk_b) pair can appear at most once — you cannot enroll Aarav in DBMS twice. A surrogate id would permit duplicates unless you also added a UNIQUE constraint on the pair, which makes the surrogate redundant. The one exception is when you need a stable small integer to reference the junction row from yet another table — then add a surrogate id alongside the composite UNIQUE.

Multiple Choice Questions

MCQ 1

What does 1NF require?

A. Single-column primary keys
B. Atomic values in every column; no repeating groups
C. All columns to be NOT NULL
D. No foreign keys

Answer: B
B is correct. 1NF forbids lists, CSVs, or multi-valued columns. Each cell must hold one value.

MCQ 2

Which form eliminates partial dependencies on composite keys?

A. 1NF
B. 2NF
C. 3NF
D. BCNF

Answer: B
B is correct. 2NF is defined specifically to eliminate partial dependencies — a concern only when the PK is composite.

MCQ 3

Which form eliminates transitive dependencies?

A. 1NF
B. 2NF
C. 3NF
D. 4NF

Answer: C
C is correct. 3NF forbids non-key columns from determining other non-key columns.

MCQ 4

How do you model a many-to-many relationship?

A. A foreign key on each side
B. A junction/associative table with a composite PK of the two FKs
C. A VARCHAR column with CSV-encoded IDs
D. You can't

Answer: B
B is correct. The junction table represents the relationship itself and can hold relationship-specific attributes.

MCQ 5

What does ON DELETE CASCADE do?

A. Blocks the delete
B. Sets the FK to NULL
C. Deletes referencing child rows automatically
D. Raises a warning

Answer: C
C is correct. Child rows are deleted when their parent is deleted. Use only for ownership hierarchies.

MCQ 6

Which is TRUE about surrogate keys?

A. They must be natural identifiers
B. They are invented IDs with no business meaning, stable across entity changes
C. They must be UUIDs
D. They cannot be primary keys

Answer: B
B is correct. Surrogate keys (INT AUTO_INCREMENT, UUID) are stable and decoupled from business data — ideal for PKs.

MCQ 7

Which NF is violated by (order_id, product_id) PK + product_name?

A. 1NF
B. 2NF (partial dependency on product_id)
C. 3NF
D. 4NF

Answer: B
B is correct. product_name depends on only part of the composite PK, a partial dependency.

MCQ 8

Which NF is violated by (emp_id PK, dept_id, dept_name)?

A. 1NF
B. 2NF
C. 3NF (transitive dependency through dept_id)
D. BCNF only

Answer: C
C is correct. dept_name depends on dept_id, a non-key column — a transitive dependency via emp_id -> dept_id -> dept_name.

MCQ 9

Which is NOT a legitimate reason to denormalize?

A. Hot read path dominated by expensive JOINs
B. Dashboard that queries thousands of times per minute
C. Developer preference — 'it looks cleaner'
D. Analytics/reporting tables that must be pre-aggregated

Answer: C
C is correct. Denormalization must be driven by measured performance, not aesthetics.

MCQ 10

Which cardinality does enrollments(student_id, course_id) represent?

A. 1-1
B. 1-N
C. N-M
D. No relationship

Answer: C
C is correct. A student enrolls in many courses; a course has many students. Junction table is the N-M pattern.

MCQ 11

Why store unit_price on order_items despite products.price existing?

A. 3NF violation for no reason
B. To capture price at the moment of sale, since products.price changes over time
C. To speed up writes
D. Because MySQL requires it

Answer: B
B is correct. unit_price is a historical fact, not redundant — products.price may change, but the sale's price is fixed.

MCQ 12

BCNF is strictly stronger than 3NF because:

A. It allows more columns per table
B. It requires every determinant to be a superkey (no exceptions for prime attributes)
C. It forbids composite keys
D. It requires SERIALIZABLE isolation

Answer: B
B is correct. 3NF allows a non-superkey to determine a prime attribute; BCNF does not.

MCQ 13

Best choice for an 'invoices' FK to customers, where legal retention is required?

A. ON DELETE CASCADE
B. ON DELETE SET NULL
C. ON DELETE RESTRICT
D. ON DELETE NO ACTION (deferred)

Answer: C
C is correct. RESTRICT blocks accidental deletes; operator must clear invoices first. Loss of invoices due to CASCADE can be a legal nightmare.

MCQ 14

Why do we keep normalized design as the source of truth even when denormalized copies exist?

A. Normalized tables are always faster
B. So that copies can be regenerated and reconciled when they drift; normalized is the canonical version of facts
C. Because ORMs require it
D. To comply with ACID

Answer: B
B is correct. Denormalization is a cache. When it drifts, the normalized tables are the ground truth you rebuild from.

MCQ 15

Which is NOT a valid reason to prefer UUID PKs over AUTO_INCREMENT?

A. Multi-master distributed ID generation
B. Unguessable public URLs
C. Faster clustered-index inserts
D. Offline-first clients that create rows before reaching the server

Answer: C
C is correct. UUIDs are worse for clustered-index insert locality. All the others are valid UUID use cases.

MCQ 16

In an ER diagram, what does a 'weak entity' mean?

A. A table that cannot exist without a parent; its PK includes the parent's PK
B. A table with no indexes
C. A view
D. A denormalized copy

Answer: A
A is correct. Example: order_items (weak) depends on orders (strong) — the order_id is part of the PK.

MCQ 17

What is the correct first step when asked to 'design the schema for X'?

A. Write CREATE TABLE immediately
B. Draw the ER diagram: list entities, relationships, and cardinalities
C. Add indexes
D. Choose the isolation level

Backfill departments from DISTINCT. Verify no orphans. Drop the redundant column last.

-- Step 1: create the new table
CREATE TABLE departments (
    id INT PRIMARY KEY,
    name VARCHAR(50)
);

-- Step 2: backfill from existing rows
INSERT INTO departments (id, name)
SELECT DISTINCT dept_id, dept_name
FROM employees_bad
WHERE dept_id IS NOT NULL;

-- Step 3: rename the old table (backup) and create the new clean one
RENAME TABLE employees_bad TO employees_old;

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    emp_name VARCHAR(50),
    dept_id INT,
    FOREIGN KEY (dept_id) REFERENCES departments(id)
);

-- Step 4: copy the clean data
INSERT INTO employees (emp_id, emp_name, dept_id)
SELECT emp_id, emp_name, dept_id FROM employees_old;

-- Step 5: verify counts match
SELECT COUNT(*) FROM employees_old;
SELECT COUNT(*) FROM employees;

-- Step 6: once verified in production, drop the old table
DROP TABLE employees_old;

Need to Review the Concepts?

Go back to the detailed notes for this chapter.

Read Chapter Notes

Want to master SQL and databases with a mentor?

Explore our MySQL Masterclass