Database Design Is Hard to Undo
Most code mistakes are cheap: a bad function gets refactored, a wrong variable gets renamed, a broken component gets rewritten. Database mistakes are different. A poorly normalized schema, a missing index, a column that should have been nullable — these live in production, in backups, in millions of rows that all need to be migrated. They compound.
AI doesn't eliminate that risk. But it changes where you catch the mistakes. Design conversations that used to happen with a senior engineer over a whiteboard can now happen with AI before you write the first CREATE TABLE. The critique comes cheap, the rework comes early, and the schema you ship is better.
This guide covers the full lifecycle: initial schema design, normalization, indexing, migrations, query optimization, and evolving a database that's already in production.
Part 1: Schema Design
The best time to involve AI in database design is at the very beginning — before any tables exist. Describe your domain in plain language. Let AI draft the schema. Then critique it before you commit to anything.
Starting from a Description
You don't need to know your exact schema before talking to AI. Start with what the system needs to do:
I'm building a project management tool. Here's what it needs to support:
- Organizations have many members, each with a role (owner, admin, member)
- Organizations have projects; projects have a name, description, status, and optional due date
- Projects have tasks. Tasks have: title, description, status (todo/in-progress/done), priority (low/medium/high/urgent), optional assignee, optional due date
- Tasks can have subtasks (same structure, but nested one level deep — no deeper)
- Users can comment on tasks. Comments support markdown and can be edited.
- Every change to a task is logged in an activity feed (who changed what and when)
- Users can be members of multiple organizations
Design a normalized relational schema for PostgreSQL. For each table: column names, types, constraints, and a brief explanation of the design decision.
A solid AI response produces something like this:
-- Users (shared across organizations)
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
avatar_url TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Organizations
CREATE TABLE organizations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
slug TEXT NOT NULL UNIQUE, -- URL-safe identifier
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Membership: one user can belong to many orgs, with a role per org
CREATE TABLE org_members (
org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
role TEXT NOT NULL CHECK (role IN ('owner', 'admin', 'member')),
joined_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (org_id, user_id)
);
-- Projects belong to one organization
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
name TEXT NOT NULL,
description TEXT,
status TEXT NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'archived', 'completed')),
due_date DATE,
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Tasks and subtasks in one table (parent_id = NULL means top-level)
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
parent_id UUID REFERENCES tasks(id) ON DELETE CASCADE, -- NULL = top-level task
title TEXT NOT NULL,
description TEXT,
status TEXT NOT NULL DEFAULT 'todo'
CHECK (status IN ('todo', 'in-progress', 'done')),
priority TEXT NOT NULL DEFAULT 'medium'
CHECK (priority IN ('low', 'medium', 'high', 'urgent')),
assignee_id UUID REFERENCES users(id) ON DELETE SET NULL,
due_date DATE,
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Enforce max one level of nesting with a trigger (PostgreSQL-safe)
CREATE OR REPLACE FUNCTION enforce_task_depth()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
IF NEW.parent_id IS NULL THEN
RETURN NEW;
END IF;
IF EXISTS (
SELECT 1
FROM tasks parent
WHERE parent.id = NEW.parent_id
AND parent.parent_id IS NOT NULL
) THEN
RAISE EXCEPTION 'Nested subtasks deeper than one level are not allowed';
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER trg_enforce_task_depth
BEFORE INSERT OR UPDATE OF parent_id ON tasks
FOR EACH ROW
EXECUTE FUNCTION enforce_task_depth();
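With the trigger in place, a quick sanity check might look like this (the UUIDs and parameter names are placeholders, not part of the schema above):

```sql
-- Hypothetical smoke test for the depth trigger; IDs are placeholders.
-- A top-level task:
INSERT INTO tasks (id, project_id, title, created_by)
VALUES ('11111111-1111-1111-1111-111111111111', :project_id, 'Top-level', :user_id);

-- Allowed: one level of nesting
INSERT INTO tasks (id, project_id, parent_id, title, created_by)
VALUES ('22222222-2222-2222-2222-222222222222', :project_id,
        '11111111-1111-1111-1111-111111111111', 'Subtask', :user_id);

-- Rejected: a subtask of a subtask raises
-- ERROR:  Nested subtasks deeper than one level are not allowed
INSERT INTO tasks (project_id, parent_id, title, created_by)
VALUES (:project_id, '22222222-2222-2222-2222-222222222222', 'Too deep', :user_id);
```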
-- Comments on tasks
CREATE TABLE task_comments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
author_id UUID NOT NULL REFERENCES users(id),
body TEXT NOT NULL, -- stored as markdown
edited_at TIMESTAMPTZ, -- NULL means never edited
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Activity log: immutable record of changes
CREATE TABLE task_activity (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
actor_id UUID NOT NULL REFERENCES users(id),
action TEXT NOT NULL, -- e.g. 'status_changed', 'assigned', 'commented'
payload JSONB, -- old/new values or other context
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
The Critique Step
Never accept the first draft of a schema. Ask AI to attack it before you build anything on top of it:
Critique this schema. Look for:
- Missing constraints that could allow bad data
- Columns that will cause problems at scale
- Relationships that are modeled incorrectly
- Missing tables for features that will obviously be needed
- Normalization issues (over- or under-normalized)
- Anything that will be painful to migrate later
Common issues a good critique surfaces:
- Nesting rules can drift if they're only enforced in app code. If max-depth constraints matter for data integrity, enforce them with a trigger (as in the example) or centralize validation in one service path.
- No soft delete. As written, deleting a user fails outright — task_comments.author_id and task_activity.actor_id are NOT NULL with no ON DELETE behavior — and switching to ON DELETE SET NULL would silently drop attribution. Consider a deleted_at column or a retained deleted-users record.
- JSONB payload has no schema. task_activity.payload is flexible now but will be queried in specific ways later. Document the expected shape per action type, or model common fields explicitly.
- No org-level project permissions. Currently, any org member can see all projects. If project-level visibility is ever needed (private projects within an org), the schema has no hook for it.
- Missing: task ordering. No position column for drag-and-drop ordering within a project. Adding this to millions of rows later is painful.
Spend 30 minutes in this loop before writing any application code. Every issue caught here is an issue that doesn't become a migration. Once data is in production, schema changes have a cost — catching them at the whiteboard stage is free.
Asking for Design Alternatives
When a design decision is non-obvious, ask AI to present the tradeoffs:
For the activity log, I see two common approaches: (1) store everything in one task_activity table with a JSONB payload, or (2) use separate tables per event type (task_status_changes, task_assignments, etc). What are the tradeoffs? Which fits better for an activity feed that needs to show "Alice changed status from todo to done" type messages?
This kind of explicit tradeoff question gets useful answers. AI will explain that the single-table approach is simpler to query for feeds but harder to enforce payload shape; the multi-table approach is strongly typed but requires a UNION or application-level assembly to render a feed. For an activity feed that needs to display heterogeneous events in chronological order, the single table with documented payload shapes is usually the right call — and AI will say so directly.
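To make the tradeoff concrete, here is a sketch of the feed query the single-table design enables — assuming, per the documented-payload convention, that 'status_changed' events carry old and new values in the payload:

```sql
-- Sketch: render a task's activity feed from the single task_activity table.
-- Assumes 'status_changed' payloads look like {"old": "todo", "new": "done"}.
SELECT
  u.name AS actor,
  a.action,
  a.payload->>'old' AS old_value,
  a.payload->>'new' AS new_value,
  a.created_at
FROM task_activity a
JOIN users u ON u.id = a.actor_id
WHERE a.task_id = $1
ORDER BY a.created_at DESC
LIMIT 20;
```

One query, chronological order, heterogeneous events — the application only needs a per-action template to turn each row into an "Alice changed status from todo to done" message. The per-event-type design would need a UNION across every event table to produce the same feed.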
Part 2: Normalization and Relationships
Normalization is the process of organizing a schema to reduce redundancy and ensure data integrity. Under-normalized schemas store the same data in multiple places, causing update anomalies. Over-normalized schemas split data across so many tables that simple queries require five joins.
AI is good at both diagnosing normalization issues and explaining the tradeoffs when you deliberately denormalize for performance.
Diagnosing Normalization Problems
Is there a normalization issue with this table?
CREATE TABLE orders (
id UUID PRIMARY KEY,
customer_id UUID NOT NULL,
customer_email TEXT NOT NULL, -- repeated from users table
customer_name TEXT NOT NULL, -- repeated from users table
product_id UUID NOT NULL,
product_name TEXT NOT NULL, -- repeated from products table
product_price DECIMAL(10,2) NOT NULL,
quantity INTEGER NOT NULL,
total DECIMAL(10,2) NOT NULL, -- computed: price * quantity
status TEXT NOT NULL
);
AI will identify three issues here: customer_email and customer_name should come from a join to users, not be stored redundantly; product_name is also redundant but with a nuance — in an orders context, you might want to snapshot the product name at order time in case the product is renamed later; and total is a derived column that should be computed, not stored. It will also distinguish between the cases where denormalization is a legitimate choice (historical snapshot of product price) vs. a maintenance problem (current customer name).
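A normalized version might look like the following sketch (table and column names are illustrative). Note that unit_price and product_name are kept deliberately — that's denormalization with a reason, snapshotting the product as it was at order time:

```sql
-- Sketch of a normalized orders design.
CREATE TABLE orders (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id UUID NOT NULL REFERENCES users(id),  -- name/email come via join
  status TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE order_items (
  order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  product_id UUID NOT NULL REFERENCES products(id),
  product_name TEXT NOT NULL,           -- deliberate snapshot at order time
  unit_price DECIMAL(10,2) NOT NULL,    -- deliberate snapshot at order time
  quantity INTEGER NOT NULL CHECK (quantity > 0),
  PRIMARY KEY (order_id, product_id)
);

-- total is derived, not stored:
-- SELECT SUM(unit_price * quantity) FROM order_items WHERE order_id = $1;
```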
Modeling Many-to-Many Relationships
Many-to-many relationships are one of the most common sources of schema confusion. Ask AI to explain the pattern and when to add payload to the junction table:
Users can be assigned to tasks. A task can have multiple assignees. A user can be assigned to multiple tasks. Model this relationship. Should the junction table carry any additional data?
CREATE TABLE task_assignees (
task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
assigned_by UUID NOT NULL REFERENCES users(id), -- who made the assignment
assigned_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (task_id, user_id)
);
-- Index for the other direction: find all tasks for a user
CREATE INDEX idx_task_assignees_user ON task_assignees(user_id);
The key insight AI surfaces: junction tables often need payload. Here, knowing who assigned whom and when is valuable for the activity feed. Without it, you'd need to reconstruct that from the activity log.
Self-Referencing Relationships
Hierarchical data (categories, org charts, thread replies) is tricky to model. Ask AI to compare approaches for your specific depth requirements:
I need to model product categories that can be nested up to 3 levels deep (e.g., Electronics → Laptops → Gaming Laptops). Compare: (1) adjacency list with parent_id, (2) a closure table, (3) a materialized path. Which fits best for a read-heavy product catalog where I need to query "all products in this category and all subcategories"?
This is exactly the kind of question where AI earns its keep. It will compare write complexity, query complexity, and performance characteristics for each approach at your stated depth, and recommend the adjacency list for shallow hierarchies (3 levels is manageable with a recursive CTE) or the materialized path for deeper trees. It will write the recursive CTE for you too.
WITH RECURSIVE category_tree AS (
-- Base case: the category we're querying
SELECT id, name, parent_id
FROM categories
WHERE id = $1
UNION ALL
-- Recursive case: children of each node
SELECT c.id, c.name, c.parent_id
FROM categories c
INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT p.*
FROM products p
WHERE p.category_id IN (SELECT id FROM category_tree)
ORDER BY p.name;
Part 3: Indexes and Query Performance
A schema that's correct but slow is still a problem. Indexes are the primary tool — but the wrong indexes are nearly as bad as none, because they add write overhead and false confidence. AI is useful both for deciding what to index and for diagnosing slow queries.
Getting Indexing Recommendations
Here are the most common queries my application will run against the tasks table. Recommend indexes for each and explain the reasoning.
- Fetch all tasks in a project, ordered by priority then created_at
- Fetch all tasks assigned to a user across all projects
- Fetch all overdue tasks (due_date < today AND status != 'done') in an organization
- Full-text search on task titles within a project
- Count tasks by status for a project dashboard
-- Query 1: tasks in a project, ordered by priority and created_at
-- Composite index covering the filter (project_id) and sort columns
CREATE INDEX idx_tasks_project_priority_created
ON tasks(project_id, priority, created_at DESC);
-- Query 2: tasks assigned to a user
-- task_assignees already has idx_task_assignees_user — that's enough.
-- The join back to tasks uses the primary key.
-- Query 3: overdue tasks in an organization
-- Needs org context via project, then filter on due_date and status.
-- Partial index — only indexes rows where status != 'done', reducing index size.
CREATE INDEX idx_tasks_overdue
ON tasks(project_id, due_date)
WHERE status != 'done';
-- To scope by org, join through projects:
-- idx_tasks_project_priority_created can't help here — it's on tasks, not projects.
-- Add an index on projects(org_id) if org-level queries are common:
CREATE INDEX idx_projects_org ON projects(org_id);
-- Query 4: full-text search on task titles
CREATE INDEX idx_tasks_title_fts
ON tasks USING gin(to_tsvector('english', title));
-- Query 5: count by status for a dashboard.
-- idx_tasks_project_priority_created doesn't contain status, so add a
-- narrow index that lets PostgreSQL answer the counts with an index-only scan:
CREATE INDEX idx_tasks_project_status
ON tasks(project_id, status);
Partial Indexes Are Underused
A partial index (WHERE status != 'done') indexes only the rows matching the condition. For a tasks table where 80% of rows are done, a partial index on active tasks is 5× smaller and faster than a full index. Ask AI: "Is this a good candidate for a partial index?" when your query filters on a low-cardinality column.
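One way to verify the savings on your own data is to compare the two variants directly with PostgreSQL's catalog functions (index names here are illustrative):

```sql
-- Build both variants, then compare their on-disk size.
CREATE INDEX idx_tasks_due_full ON tasks(project_id, due_date);
CREATE INDEX idx_tasks_due_active ON tasks(project_id, due_date)
  WHERE status != 'done';

SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_tasks_due_full', 'idx_tasks_due_active');
```

If the size ratio matches your done/active ratio, drop the full index and keep the partial one.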
Diagnosing a Slow Query
Paste the query, the EXPLAIN ANALYZE output, and ask AI to interpret it:
This query is taking 800ms on a table with 500k rows. Here's the query and the EXPLAIN ANALYZE output. What's wrong and how do I fix it?
SELECT t.*, u.name as assignee_name
FROM tasks t
LEFT JOIN task_assignees ta ON ta.task_id = t.id
LEFT JOIN users u ON u.id = ta.user_id
WHERE t.project_id = $1
AND t.status != 'done'
ORDER BY t.priority DESC, t.created_at DESC
LIMIT 50;
Limit (cost=12847.23..12847.35 rows=50) (actual time=798.432..798.445 rows=50)
-> Sort (cost=12847.23..12872.23 rows=10000) (actual time=798.429..798.435 rows=50)
Sort Key: t.priority DESC, t.created_at DESC
Sort Method: external merge Disk: 4096kB
-> Hash Left Join (cost=5234.00..12597.23 rows=10000) (actual time=234.2..791.1 rows=10000)
Hash Cond: (ta.task_id = t.id)
-> Seq Scan on tasks t (cost=0.00..7123.45 rows=10000) ...
Filter: ((status <> 'done') AND (project_id = $1))
AI reads the EXPLAIN ANALYZE output fluently and spots the problem: the Seq Scan on tasks line means no index is being used, so the planner is scanning the entire table. The Sort Method: external merge Disk means the sort spilled to disk. It will recommend a partial index on (project_id, priority DESC, created_at DESC) WHERE status != 'done' and explain why the column order matters: with project_id fixed by equality, the index already stores rows in the requested sort order, so both the full scan and the disk-spilling sort disappear.
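One concrete shape the fix can take — a judgment call that assumes active tasks are the common query — is a partial index that bakes the status filter into the index predicate and matches the ORDER BY exactly:

```sql
-- Equality column first, then the exact sort order the query requests.
-- Only rows with status != 'done' are indexed, keeping the index small.
CREATE INDEX idx_tasks_project_active_sorted
  ON tasks(project_id, priority DESC, created_at DESC)
  WHERE status != 'done';
```

Re-run EXPLAIN ANALYZE after creating it: the plan should show an Index Scan with no Sort node at all, since the index delivers rows in the requested order.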
Writing Optimized Queries
Write a query for a project dashboard that returns, in one round trip:
- Total task count
- Count per status (todo, in-progress, done)
- Count of overdue tasks
- Count of unassigned tasks
- The 5 most recently updated tasks with their assignee names
Project ID is $1. Today's date is $2. PostgreSQL.
WITH task_stats AS (
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = 'todo') AS todo_count,
COUNT(*) FILTER (WHERE status = 'in-progress') AS in_progress_count,
COUNT(*) FILTER (WHERE status = 'done') AS done_count,
COUNT(*) FILTER (WHERE due_date < $2 AND status != 'done') AS overdue_count,
COUNT(*) FILTER (
WHERE NOT EXISTS (SELECT 1 FROM task_assignees ta WHERE ta.task_id = tasks.id)
) AS unassigned_count
FROM tasks
WHERE project_id = $1
),
recent_tasks AS (
SELECT
t.id,
t.title,
t.status,
t.priority,
t.updated_at,
array_agg(u.name ORDER BY ta.assigned_at) FILTER (WHERE u.id IS NOT NULL) AS assignees
FROM tasks t
LEFT JOIN task_assignees ta ON ta.task_id = t.id
LEFT JOIN users u ON u.id = ta.user_id
WHERE t.project_id = $1
GROUP BY t.id, t.title, t.status, t.priority, t.updated_at
ORDER BY t.updated_at DESC
LIMIT 5
)
SELECT
(SELECT row_to_json(task_stats) FROM task_stats) AS stats,
json_agg(recent_tasks ORDER BY recent_tasks.updated_at DESC) AS recent
FROM recent_tasks;
Part 4: Migrations
A migration is a versioned, repeatable change to a database schema. Good migrations are: atomic (succeed completely or roll back), backwards compatible where possible, and tested before they run in production. AI accelerates writing them — but you still need to review them carefully.
Writing a Migration from a Description
Write a PostgreSQL migration to add a labels feature to the task management schema. Requirements:
- Labels belong to an organization (not a project — shared across all projects in the org)
- Each label has a name and a color (hex string)
- Tasks can have multiple labels; labels can be on multiple tasks
- Label names must be unique per organization
- Deleting a label removes it from all tasks, but doesn't delete the task
Include both UP and DOWN migrations.
-- UP
CREATE TABLE labels (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
name TEXT NOT NULL,
color TEXT NOT NULL CHECK (color ~ '^#[0-9a-fA-F]{6}$'),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (org_id, name)
);
CREATE TABLE task_labels (
task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
label_id UUID NOT NULL REFERENCES labels(id) ON DELETE CASCADE,
PRIMARY KEY (task_id, label_id)
);
CREATE INDEX idx_task_labels_label ON task_labels(label_id);
CREATE INDEX idx_labels_org ON labels(org_id);
-- DOWN
DROP TABLE IF EXISTS task_labels;
DROP TABLE IF EXISTS labels;
Always Ask for the DOWN Migration
Even if you never run it, writing the DOWN migration forces you to think about reversibility. If the DOWN migration would be destructive or impossible (e.g., you can't un-split a column once rows are written with the new format), that's important information to have before you run UP in production.
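A concrete example of a DOWN that can't faithfully restore data — splitting a name column (a sketch; column names and the splitting rule are illustrative):

```sql
-- UP: split name into first_name / last_name (lossy in reverse)
ALTER TABLE users ADD COLUMN first_name TEXT, ADD COLUMN last_name TEXT;
UPDATE users SET
  first_name = split_part(name, ' ', 1),
  last_name  = NULLIF(substring(name FROM position(' ' IN name) + 1), '');
ALTER TABLE users DROP COLUMN name;

-- DOWN: the best you can do is approximate the original
ALTER TABLE users ADD COLUMN name TEXT;
UPDATE users SET name = concat_ws(' ', first_name, last_name);
-- Original spacing, middle names, and ordering nuances are gone for good —
-- worth knowing before UP ever runs in production.
```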
Safe Migrations on Live Databases
Some migrations look harmless but lock tables in production. Ask AI to flag danger before you run anything:
I need to add a NOT NULL column position INTEGER to the tasks table for drag-and-drop ordering. The table has 2 million rows in production. Is there a safe way to do this without locking the table?
AI will correctly explain the problem: ALTER TABLE tasks ADD COLUMN position INTEGER NOT NULL fails outright on a table that already has rows (there is no default to fill the column), and adding it with a DEFAULT forced a full table rewrite before PostgreSQL 11, locking writes for the duration. It will describe the safe pattern:
-- Step 1: Add the column as nullable, without a default (fast — no table rewrite)
ALTER TABLE tasks ADD COLUMN position INTEGER;
-- Step 2: Backfill in batches (no lock, runs in background)
-- Run this in chunks to avoid long transactions:
UPDATE tasks SET position = subquery.rn
FROM (
SELECT id, ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY created_at) AS rn
FROM tasks
) AS subquery
WHERE tasks.id = subquery.id
AND tasks.position IS NULL;
-- Step 3: Once backfill is complete and verified, add the NOT NULL constraint
-- In PostgreSQL 12+, use NOT VALID + VALIDATE for large tables:
ALTER TABLE tasks ADD CONSTRAINT tasks_position_not_null
CHECK (position IS NOT NULL) NOT VALID;
-- Then validate in the background (doesn't hold a lock):
ALTER TABLE tasks VALIDATE CONSTRAINT tasks_position_not_null;
-- Finally, if needed, convert to true NOT NULL (may still require a brief lock):
ALTER TABLE tasks ALTER COLUMN position SET NOT NULL;
ALTER TABLE tasks DROP CONSTRAINT tasks_position_not_null;
Renaming Columns Safely
I need to rename tasks.description to tasks.body in a live PostgreSQL database. Application code is deployed independently of migrations. How do I do this without downtime?
AI explains the expand-contract pattern: add the new column (body), deploy code that writes to both, backfill old rows, deploy code that reads from the new column only, then drop the old column. Four steps, four deployments, zero downtime. It will generate the SQL for each step.
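The SQL side of those steps is small — the care is in sequencing the deploys around them. A sketch (the dual-write and read-switch happen in application code between the migrations):

```sql
-- Migration 1 (before deploy 1): add the new column
ALTER TABLE tasks ADD COLUMN body TEXT;

-- Deploy 1: application writes to both description and body.

-- Migration 2: backfill rows written before deploy 1 (run in batches on big tables)
UPDATE tasks SET body = description
WHERE body IS NULL AND description IS NOT NULL;

-- Deploy 2: application reads and writes body only.

-- Migration 3 (after deploy 2 is stable): drop the old column
ALTER TABLE tasks DROP COLUMN description;
```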
Part 5: Evolving a Schema That's Already in Production
The hardest database work isn't the initial design — it's changing a schema after real data is in it. AI helps most here by acting as a review partner who knows the failure modes.
Impact Analysis Before Making Changes
Before any schema change, ask AI what will break:
I want to split the users table into users (auth data only) and user_profiles (display name, avatar, preferences). Here's the current schema: [paste schema]. What will break and what's the safest migration path?
A thorough AI response covers: every foreign key that references users.id (those stay as-is); every query that selects from users (those need to be updated or use a view); API endpoints that return user data (need updating); the migration itself; and a recommended view to ease the transition:
-- Create a view that looks like the old users table
-- Lets old queries keep working while you migrate them
CREATE VIEW users_legacy AS
SELECT
u.id,
u.email,
u.created_at,
p.name,
p.avatar_url
FROM users u
LEFT JOIN user_profiles p ON p.user_id = u.id;
Modeling Schema Versions in TypeScript
When using a query builder like Kysely, ask AI to generate the TypeScript types that match your schema:
Generate Kysely database types for this PostgreSQL schema: [paste schema]. Include the Database interface and all table interfaces. Use strict types — no any.
import { ColumnType, Generated, Insertable, Selectable, Updateable } from 'kysely';
export interface Database {
users: UsersTable;
organizations: OrganizationsTable;
org_members: OrgMembersTable;
projects: ProjectsTable;
tasks: TasksTable;
task_comments: TaskCommentsTable;
task_activity: TaskActivityTable;
labels: LabelsTable;
task_labels: TaskLabelsTable;
}
interface UsersTable {
id: Generated<string>;
email: string;
name: string;
avatar_url: string | null;
created_at: ColumnType<Date, never, never>; // auto-set, never inserted or updated manually
}
interface TasksTable {
id: Generated<string>;
project_id: string;
parent_id: string | null;
title: string;
description: string | null;
status: 'todo' | 'in-progress' | 'done';
priority: 'low' | 'medium' | 'high' | 'urgent';
assignee_id: string | null;
due_date: ColumnType<Date | null, string | null, string | null>;
created_by: string;
created_at: ColumnType<Date, never, never>;
updated_at: ColumnType<Date, never, string>;
}
// Derived types for use in application code
export type User = Selectable<UsersTable>;
export type NewUser = Insertable<UsersTable>;
export type UserUpdate = Updateable<UsersTable>;
export type Task = Selectable<TasksTable>;
export type NewTask = Insertable<TasksTable>;
export type TaskUpdate = Updateable<TasksTable>;
Part 6: Prompt Patterns That Work
Database design prompts have a higher failure rate than code generation prompts because the stakes are higher and the context is more domain-specific. These patterns consistently produce better output.
Always Include Your Constraints
Generic schema prompts produce generic schemas. Specific constraints produce specific, useful designs:
// Include as many as apply:
- Database: PostgreSQL 15 / MySQL 8 / SQLite
- Scale expectation: 100 users / 1M rows / 10M rows
- Read/write ratio: mostly reads / mostly writes / balanced
- Consistency requirements: eventual OK / strong required
- Specific query patterns: [list your most common queries]
- Existing tables it must integrate with: [paste relevant schema]
- Non-negotiable constraints: multi-tenancy, soft deletes, audit log, etc.
Ask for Rationale, Not Just SQL
"Write the schema" produces SQL you have to trust. "Write the schema and explain each design decision" produces SQL you can evaluate:
Design the schema and for each table, explain: why it's structured this way, what constraint enforces the key business rule, and what the most common query against it will look like.
Use AI to Review Your Own Schema
If you've already designed a schema, AI is useful as a second opinion:
Here's a schema I designed: [paste schema]. I'm going to build on this for the next 2 years. What are the three most likely things I'll regret about this schema in 18 months?
The "what will I regret" framing reliably produces forward-looking critique rather than surface-level nitpicks. AI will think about growth, changing requirements, and schema evolution — not just whether the SQL is syntactically correct.
Iterating on a Schema in One Conversation
Keep the schema in the conversation and revise it incrementally. Paste the current state, ask for a specific change, paste the updated version back:
Here's the current schema after your suggestions: [paste revised schema]. Now add support for task templates — reusable task structures that can be instantiated into a project. Templates belong to an organization. A template can have subtasks. Show only the new tables needed.
This iterative pattern — revise, paste back, extend — keeps AI grounded in what actually exists rather than reimagining the schema from scratch on every turn.
The Schema Is a Design Document
The best database design conversations with AI feel like talking with an experienced DBA: you describe what the system needs to do, they propose a structure, you challenge it, and you land on something better than either of you would have reached alone. The difference is that this DBA is available at 11pm before a product launch and charges nothing for a second opinion.
Use the critique step. Ask for rationale. Request the migration, not just the schema. The fifteen minutes of design conversation before you write the first CREATE TABLE are the highest-leverage database work you'll do.
AI-Assisted Database Design — Summary
- Design first, code second — Describe your domain in plain language. Let AI draft the schema. Then critique before building anything on top of it.
- The critique prompt — Ask for missing constraints, normalization issues, scalability problems, and what you'll regret in 18 months. Do this before committing to any design.
- Ask for tradeoffs — When a design decision is non-obvious (adjacency list vs. closure table, single activity table vs. typed events), ask AI to compare approaches for your specific requirements.
- Indexes follow queries — List your most common queries, then ask for indexing recommendations. Don't index in the abstract.
- Partial indexes — If you filter on a low-cardinality column, ask whether a partial index fits. They're smaller, faster, and underused.
- Safe migrations — Before running any DDL on a large table, ask AI whether it will lock and how to make it safe. The expand-contract pattern avoids downtime.
- Always get the DOWN migration — Writing it forces reversibility thinking. If it's impossible, that's important to know before running UP.
- Impact analysis first — Before changing a production schema, ask what will break. AI reads foreign key relationships and application patterns and surfaces breakage before you run anything.
Related Guides
Build a REST API from Spec to Deployment
A complete project that puts database design into practice: schema, migrations, typed queries, integration tests, and Docker deployment.
Testing with AI
Covers integration testing against real databases specifically: in-memory setup, seed data patterns, and testing migrations safely.