Handling Hierarchical Data with Closure Tables in SQL
If you’ve ever tried to store categories, comments, org charts, or folders in a database, you’ve touched tree
data. In the next few minutes, you’ll learn how Closure Tables in SQL make this easy: store unlimited-depth trees, fetch all descendants or ancestors with a single indexed query, and even move entire subtrees without breaking data.
Think of a tree as a family: each item has a parent, some have children, and you often need to ask questions like “who are all your descendants?” or “what’s your full path back to the root?” That’s where Closure Tables shine.
Why Trees Are Tricky in Relational Databases
Relational databases are fantastic at rows and joins, but trees are paths. The classic parent_id
column (Adjacency List) is simple to write, yet harder to read at depth: fetching all descendants usually needs recursive queries (repeating “and who are their children?”). Those work, but they’re harder to write, tune, and reason about, especially when trees get deep or wide and you care about database performance.
Other approaches exist. Materialized Path stores paths like /1/4/9
, which reads fast but requires rewriting paths when you move nodes. Nested Set gives fast subtree reads, but inserts and moves are expensive. Closure Tables store every ancestor–descendant pair so reads are simple and predictable.
If the questions you ask most often are “all descendants,” “all ancestors,” or “the full subtree,” Closure Tables are a friendly choice.
Closure Tables, Explained for Humans
Imagine keeping a list of every family connection, direct and indirect. For each node, you store a self row: the node is its own ancestor at depth 0, and a row for each ancestor above it: parent at depth 1, grandparent at depth 2, and so on.
That’s a Closure Table. With this in place, fetching all ancestors or all descendants is just a single join filtered by depth
. No recursion in your query, no guesswork, and it works for trees of any depth.
A Tiny Example You’ll Build
We’ll model a simple product category tree:
Root — Electronics — Cameras, and Root — Books.
We’ll use PostgreSQL in the examples, but the pattern works in any SQL database that supports standard constraints and indexes.
The Schema You Can Copy-Paste
Keep the nodes
table simple and add a closure table that records all ancestor–descendant pairs with their distance.
-- Your items (categories, comments, folders, etc.)
CREATE TABLE nodes (
id BIGSERIAL PRIMARY KEY,
parent_id BIGINT REFERENCES nodes(id),
name TEXT NOT NULL,
is_deleted BOOLEAN NOT NULL DEFAULT FALSE
);
-- Closure table: all paths through the tree
CREATE TABLE node_closure (
ancestor_id BIGINT NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
descendant_id BIGINT NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
depth INTEGER NOT NULL CHECK (depth >= 0),
PRIMARY KEY (ancestor_id, descendant_id)
);
-- Helpful indexes for common lookups
CREATE INDEX node_closure_desc_idx ON node_closure (descendant_id, ancestor_id, depth);
CREATE INDEX node_closure_anc_depth_idx ON node_closure (ancestor_id, depth);
A quick intuition: each node gets a self-row (id → id, depth 0
). Adding a child means linking that child to all of the parent’s ancestors (plus the parent).
Insert the Root and a Few Children
Insert the root, then add children. The pattern is: insert node, insert its self-row, then link it to all ancestors of the parent.
-- 1) Insert the root and its self-row
WITH ins AS (
INSERT INTO nodes (parent_id, name) VALUES (NULL, 'Root')
RETURNING id
)
INSERT INTO node_closure (ancestor_id, descendant_id, depth)
SELECT id, id, 0 FROM ins;
-- 2) Helper function in SQL form: add a child under a parent
-- Insert child
WITH child AS (
INSERT INTO nodes (parent_id, name) VALUES (:parent_id, :child_name)
RETURNING id
)
-- Insert self-row
INSERT INTO node_closure (ancestor_id, descendant_id, depth)
SELECT id, id, 0 FROM child;
-- Link child to all ancestors of the parent
INSERT INTO node_closure (ancestor_id, descendant_id, depth)
SELECT p.ancestor_id, c.id, p.depth + 1
FROM node_closure AS p
CROSS JOIN (SELECT id FROM nodes WHERE name = :child_name) AS c
WHERE p.descendant_id = :parent_id;
Example steps to create the sample tree: add 'Electronics'
and 'Books'
under Root, then add 'Cameras'
under 'Electronics'
. The closure table will hold self rows, direct links, and indirect links like (Root,Cameras,2)
.
Querying Without Recursion
Immediate parent or children are easy with the nodes
table, but closure-based queries are where the magic happens.
-- Immediate parent
SELECT p.*
FROM nodes AS c
JOIN nodes AS p ON p.id = c.parent_id
WHERE c.id = :id AND p.is_deleted = FALSE;
-- Immediate children (via closure, depth = 1)
SELECT n.*
FROM node_closure nc
JOIN nodes n ON n.id = nc.descendant_id
WHERE nc.ancestor_id = :id
AND nc.depth = 1
AND n.is_deleted = FALSE;
All ancestors of a node:
SELECT a.*, nc.depth
FROM node_closure nc
JOIN nodes a ON a.id = nc.ancestor_id
WHERE nc.descendant_id = :id
AND nc.depth > 0
AND a.is_deleted = FALSE
ORDER BY nc.depth ASC; -- root first
All descendants (full subtree):
SELECT d.*, nc.depth
FROM node_closure nc
JOIN nodes d ON d.id = nc.descendant_id
WHERE nc.ancestor_id = :id
AND nc.depth > 0
AND d.is_deleted = FALSE
ORDER BY nc.depth ASC; -- closest children first
Limit by depth (for example, only two levels down):
SELECT d.*
FROM node_closure nc
JOIN nodes d ON d.id = nc.descendant_id
WHERE nc.ancestor_id = :id
AND nc.depth BETWEEN 1 AND 2
AND d.is_deleted = FALSE;
Breadcrumb path from the root:
SELECT string_agg(a.name, ' > ' ORDER BY nc.depth) AS path
FROM node_closure nc
JOIN nodes a ON a.id = nc.ancestor_id
WHERE nc.descendant_id = :id;
Moving a Subtree Safely
When you move a node, you want its entire subtree to follow. Wrap it in a transaction and prevent cycles (don’t move a node under its own descendant).
-- Abort if the new parent is inside the node's subtree
SELECT 1
FROM node_closure
WHERE ancestor_id = :node_id AND descendant_id = :new_parent
LIMIT 1; -- if this returns a row, abort
BEGIN;
-- 1) Update the parent pointer (single source of truth)
UPDATE nodes SET parent_id = :new_parent WHERE id = :node_id;
-- 2) Remove old paths from former ancestors to the subtree
DELETE FROM node_closure
WHERE descendant_id IN (
SELECT descendant_id FROM node_closure WHERE ancestor_id = :node_id
)
AND ancestor_id IN (
SELECT ancestor_id FROM node_closure
WHERE descendant_id = :node_id AND ancestor_id <> descendant_id
);
-- 3) Insert new paths from the new ancestors to the subtree
INSERT INTO node_closure (ancestor_id, descendant_id, depth)
SELECT super.ancestor_id, sub.descendant_id, super.depth + sub.depth + 1
FROM node_closure AS super
CROSS JOIN node_closure AS sub
WHERE super.descendant_id = :new_parent
AND sub.ancestor_id = :node_id;
COMMIT;
This reconnects every descendant of the moved node to the new ancestry with correct depths. No path rewriting. No brittle logic.
Deleting Nodes: Soft vs Hard
Keeping history is often worth it. A simple soft delete:
UPDATE nodes SET is_deleted = TRUE WHERE id = :node_id;
If you need to remove a full subtree:
BEGIN;
DELETE FROM nodes
WHERE id IN (
SELECT descendant_id FROM node_closure WHERE ancestor_id = :node_id
);
COMMIT;
To delete a single node but keep its children, reparent them first:
BEGIN;
-- Promote children to this node’s parent
UPDATE nodes
SET parent_id = (SELECT parent_id FROM nodes WHERE id = :node_id)
WHERE parent_id = :node_id;
-- Remove closure rows involving only this node
DELETE FROM node_closure
WHERE ancestor_id = :node_id OR descendant_id = :node_id;
-- Finally remove the node
DELETE FROM nodes WHERE id = :node_id;
COMMIT;
Performance in Plain English
Reads are fast because you’ve already done the “heavy thinking” when inserting paths. The storage cost grows with the number of ancestor–descendant pairs:
In the worst case (a long chain), pairs grow roughly with n²
. In balanced trees, growth looks closer to n log n
. Subtree moves are predictable: the work is proportional to (number of new parent’s ancestors) × (size of the moved subtree). Good indexes keep queries quick: primary key on (ancestor_id, descendant_id)
and indexes supporting lookups by ancestor and by descendant, as shown in the schema.
If you’re used to recursive queries, Closure Tables feel like a shortcut. You trade a bit more write work for dramatically simpler reads.
Common Mistakes to Avoid
Don’t forget self-rows (id → id, depth 0
). Always insert them with the node. Prevent cycles: before a move, confirm the new parent isn’t inside the node’s subtree. Keep nodes.parent_id
and node_closure
in sync by doing updates in a single transaction. Add indexes for both ancestor-first and descendant-first lookups. Remember to filter out is_deleted
in reads.
When Closure Tables Are a Great Fit
Choose them when you frequently ask for entire subtrees, all ancestors, or breadcrumb paths; when depth is unknown or unbounded; and when you need reliable subtree moves. They’re beginner-friendly once you see how the path rows are stored, and they scale well for read-heavy workloads.
Hands-On: Spin Up a Sandbox and Try It
Quick playground using Docker:
docker run --name pg -e POSTGRES_PASSWORD=pg -p 5432:5432 -d postgres:16
Then:
Create the schema from this guide. Insert a small tree: Root, Electronics, Books, Cameras under Electronics. Run queries for all descendants of Root, the breadcrumb path to Cameras, move Electronics under Books, and rerun the queries. You’ll see how straightforward it is to query tree structures without writing recursive SQL.
Resources
PostgreSQL: Recursive queries (for comparison and deeper understanding)
A pragmatic overview of hierarchy modeling patterns (ltree extension)
Your Next Step
Copy the schema, create a tiny 7–10 node tree, and run the subtree and path queries. If you want help applying this to your schema, share your current tables and a sample tree and I’ll sketch the exact SQL. If you enjoyed this, subscribe to get more practical patterns for working with data, no fluff, just tools you can use today.
What is a Closure Table and what problem does it solve?
A Closure Table stores every ancestor–descendant pair (including self with depth 0). This lets you query all ancestors or all descendants of a node with simple, fast indexed joins, even for deep trees.
What are the core tables in a Closure Table design?
You need a nodes
table for the actual items (e.g., id
, parent_id
, name
) and a node_closure
table with ancestor_id
, descendant_id
, and depth
(the path length). The primary key is (ancestor_id, descendant_id)
, and foreign keys reference nodes
.
How do I add the root and then a new child in a Closure Table?
Insert the root into nodes
with parent_id = NULL
, insert the root’s self-row into node_closure
(root, root, 0
). To add a child: insert the child into nodes
, insert its self-row, then insert rows linking all of the parent’s ancestors to the child with depth = parent_depth + 1
.
How do I query all ancestors of a specific node?
Use node_closure
to find ancestors of the node, excluding depth = 0
, join to nodes
to get details, and order by depth
ascending (root first).
How do I query all descendants of a specific node?
Use node_closure
to find descendants of the node (depth > 0
), join to nodes
for details, and order by depth
ascending (closest children first).
How do I move a single node to a different parent?
In a transaction: check the new parent isn’t inside the node’s subtree (to avoid cycles); update the node’s parent_id
in nodes
; delete old closure rows that connect the node’s former ancestors to its subtree; insert new closure rows connecting the new ancestors to the subtree with correct depth values.
Can I move an entire subtree under a new parent, and how does that work?
Yes. The same move process applies to the node and all its descendants. The closure rows are rebuilt so every ancestor of the new parent now prefixes the moved subtree with correct depth values.
Should I use soft deletes or hard deletes for nodes in a hierarchy?
Soft deletes are recommended for safety and auditability: add an is_deleted
flag and filter it in queries. Hard deletes remove rows, which is simpler but riskier without backups.
What performance considerations and indexes should I use?
The closure table grows with the number of ancestor–descendant pairs (watch for very deep or chain-like trees). Important indexes: primary key on (ancestor_id, descendant_id)
; additional helpful indexes on (descendant_id, ancestor_id, depth)
and (ancestor_id, depth)
to speed common queries.
What common pitfalls should I avoid with Closure Tables?
Don’t forget the self-rows (A → A, depth 0
); avoid circular references when moving nodes; keep parent_id
and closure data in sync; remember to filter out is_deleted
in queries; and ensure you index for both ancestor-first and descendant-first lookups.