← All topics/Data persistence

Practical interview questions

Scenario-style prompts with sample answer outlines. Focus is on how you would design and reason in real codebases.

Question 6

Normalized vs denormalized

In simple terms, what does normalized mean for a database? What does denormalized mean? Give a couple of real-world examples.

Answer outline

Normalized means you store each fact in one logical place and link related rows with IDs. You avoid copying the same data onto many rows, so when something changes you usually update it once.

Denormalized means you duplicate or pre-aggregate data on purpose, often to make reads faster or queries simpler, knowing you must keep copies in sync or accept stale cached values.

Principles

  • Normalized example: User has id + displayName; Post has authorId pointing to that user. The name lives on one user row, not repeated on every post.
  • Denormalized example: a Post row stores commentCount so the feed can show counts without running COUNT(*) on comments for every row—faster, but you must increment/decrement that field when comments change (or recompute periodically).
  • Denormalized example: caching authorName on each Post for a read-heavy feed avoids a join per row, but if the user renames themselves you must update many posts or tolerate out-of-date names until refresh.
Sketch — normalized vs duplicated
// Normalized: one source for the name
User(id: "u1", name: "Alex")
Post(id: "p1", authorId: "u1", text: "Hello")

// Denormalized for read speed: name copied onto post
Post(id: "p1", authorId: "u1", authorName: "Alex", text: "Hello")

Follow-up angles

  • Teams often start normalized and denormalize only where profiling shows the need.