TL;DR
A new tool called DDL to Data converts SQL CREATE TABLE statements into populated test datasets, preserving foreign-key links and honoring types and constraints. The core engine uses deterministic pattern matching; an optional AI "Story Mode" can add narrative-consistent trends.
What happened
The author launched DDL to Data for teams that need populated staging or test databases without pulling production copies or maintaining custom seed scripts. Users paste CREATE TABLE definitions and receive realistic-looking rows: fields that resemble emails or timestamps, uniqueness maintained, and foreign-key relationships preserved. The service requires no local setup and targets PostgreSQL and MySQL outputs. The underlying generator uses deterministic pattern matching and runs quickly with no token costs; an opt-in Story Mode layers AI on top to produce higher-level narratives such as seasonal churn. In a discussion about scaling, the developer described practical choices for large exports (streaming generation to avoid memory bloat, Parquet for compressed storage, batched SQL inserts, or direct COPY operations for speed) and covered foreign-key handling and parallelization limits at very large row counts.
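The post does not show the engine's internals, so the sketch below is only an illustration of what deterministic, name- and type-driven pattern matching can look like; every rule, column name, and helper here is invented rather than taken from the tool.

```python
import random
import string
from datetime import datetime, timedelta

# Hypothetical sketch: pick realistic-looking values from column name and SQL type.
# This is NOT the DDL to Data engine, just an illustration of the general idea.
def make_value(column_name: str, sql_type: str, row_index: int):
    name = column_name.lower()
    rng = random.Random(row_index)  # seeded per row, so output is reproducible
    if "email" in name:
        user = "".join(rng.choices(string.ascii_lowercase, k=8))
        return f"{user}@example.com"
    if name.endswith("_at") or "timestamp" in sql_type.lower():
        base = datetime(2024, 1, 1)
        return (base + timedelta(minutes=rng.randrange(525_600))).isoformat()
    if "name" in name:
        return rng.choice(["Alice", "Bob", "Carol", "Dave"])
    if sql_type.upper().startswith(("INT", "BIGINT", "SERIAL")):
        return rng.randrange(1, 10_000)
    return "".join(rng.choices(string.ascii_lowercase, k=12))  # fallback text

# Example: three rows for a hypothetical users(email, created_at) table.
rows = [
    (make_value("email", "VARCHAR(255)", i), make_value("created_at", "TIMESTAMP", i))
    for i in range(3)
]
print(rows)
```

Because every branch is keyed off the schema alone, this style of generator needs no AI calls and produces the same rows for the same input, which is consistent with the "deterministic, no token costs" claim.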
Why it matters
- Avoids risky production data copies and the operational overhead of masking and security reviews.
- Generates complete, constraint-respecting datasets from schema alone, saving time versus hand-written seeds.
- No client-side integration required, unlike libraries such as Faker, which need field-by-field configuration in code.
- Deterministic engine runs quickly and does not incur AI token costs unless Story Mode is enabled.
Key facts
- Input: paste CREATE TABLE statements; output: populated test data.
- Preserves foreign key relationships and honors uniqueness constraints.
- Generates realistic values (for example, emails and reasonable timestamps) rather than purely random strings.
- No setup or configuration required; works with PostgreSQL and MySQL.
- Core engine uses deterministic pattern matching and executes in milliseconds, according to the author.
- Optional Story Mode uses AI to produce narrative-coherent datasets (e.g., seasonal trends).
- For large exports, the developer recommends streaming so the generator never holds all rows in memory at once.
- Format and write strategies discussed: Parquet for compression, batched SQL inserts (~1,000 rows/statement), and direct DB COPY for fastest ingestion.
- Foreign-key handling at scale: pre-generate parent primary keys and reference them from child rows (rough sketches of the streaming/batching approach and a chunked Parquet write follow this list).
- Parallel generation is straightforward, but serialized writes are a bottleneck; chunk-then-merge is being considered but not shipped.
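The post includes no code for these steps, so the following is a rough, hypothetical sketch of streaming generation with ~1,000-row INSERT batches and parent keys generated up front; the table, column, and function names are invented, not the tool's API.

```python
import itertools
import random

BATCH_SIZE = 1_000  # matches the ~1,000 rows/statement figure above

def generate_order_rows(user_ids, total_rows):
    """Lazily yield child rows that only reference pre-generated parent keys."""
    rng = random.Random(42)
    for order_id in range(1, total_rows + 1):
        yield (order_id, rng.choice(user_ids), rng.randrange(100, 10_000))

def batched_inserts(rows, table, columns, batch_size=BATCH_SIZE):
    """Consume a row stream and emit multi-row INSERT statements,
    so memory use tracks the batch size, not the total row count."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        values = ",\n".join(str(row) for row in batch)
        yield f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{values};"

# Parent keys are generated first, then reused by every child batch (FK handling).
user_ids = list(range(1, 10_001))
order_rows = generate_order_rows(user_ids, total_rows=1_000_000)
for stmt in batched_inserts(order_rows, "orders", ("id", "user_id", "amount_cents")):
    print(stmt[:80], "...")  # in practice: write to a file or execute against the DB
    break                    # only show the first batch here
```

Because each batch is pulled from a lazy generator, peak memory is proportional to the batch size rather than to the full export.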
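For the Parquet path, a chunk-at-a-time writer keeps memory bounded while still producing a single compressed file. The sketch below assumes pyarrow; the schema, chunk size, and value formulas are illustrative only, not anything DDL to Data actually uses.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Sketch: write generated rows to Parquet in chunks so memory stays bounded.
schema = pa.schema([
    ("id", pa.int64()),
    ("user_id", pa.int64()),
    ("amount_cents", pa.int64()),
])

with pq.ParquetWriter("orders.parquet", schema, compression="zstd") as writer:
    for start in range(0, 1_000_000, 100_000):  # ten chunks of 100k rows
        ids = list(range(start + 1, start + 100_001))
        chunk = pa.table(
            {
                "id": ids,
                "user_id": [i % 10_000 + 1 for i in ids],      # stays inside the parent key space
                "amount_cents": [(i * 37) % 9_900 + 100 for i in ids],
            },
            schema=schema,
        )
        writer.write_table(chunk)
```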
What to watch next
- Pricing and commercial terms — the developer said they are still working this out.
- Implementation of chunk-then-merge or other approaches to reduce write-time bottlenecks (on the roadmap).
- Broader database support beyond PostgreSQL and MySQL — not confirmed in the source.
Quick glossary
- DDL: Data Definition Language — SQL statements (like CREATE TABLE) that define database schemas and structures.
- Foreign key: A column (or set of columns) in one table that references primary-key values in another table, enforcing referential integrity.
- Parquet: A columnar storage file format that provides efficient compression and on-disk layout for large datasets.
- Faker: A commonly used library for generating synthetic data programmatically; requires coding and per-field configuration.
- COPY: A bulk import/export database operation (commonly used in PostgreSQL) that can load data efficiently without per-row SQL overhead.
Reader FAQ
Does DDL to Data use AI to generate the data?
The core generator is deterministic pattern matching; an optional Story Mode uses AI for narrative-coherent datasets.
Which databases does it support?
The source states it works with PostgreSQL and MySQL.
Is there local setup or configuration needed?
No setup or configuration is required, according to the source.
Can it handle very large datasets (for example, millions of rows)?
The developer outlined scaling considerations — streaming to avoid memory pressure, using Parquet or COPY for fast writes, and special FK handling — but full production-scale behaviors depend on implementation choices.
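On the ingestion side, COPY avoids per-row INSERT overhead. Here is a minimal sketch, assuming psycopg2, a pre-generated orders.csv, and placeholder connection and table names; none of these specifics come from the source.

```python
import psycopg2

# Hypothetical bulk load of a generated CSV via COPY (placeholder names throughout).
conn = psycopg2.connect("dbname=testdb user=dev")
try:
    with conn, conn.cursor() as cur, open("orders.csv") as f:
        cur.copy_expert(
            "COPY orders (id, user_id, amount_cents) "
            "FROM STDIN WITH (FORMAT csv, HEADER true)",
            f,
        )
finally:
    conn.close()
```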
How much does it cost?
The developer said pricing is still being figured out.
From the author
I built DDL to Data after repeatedly pushing back on "just use production data and mask it" requests. Teams needed populated databases for testing, but pulling prod meant security reviews, …
Sources
- Show HN: DDL to Data – Generate realistic test data from SQL schemas
- DDL to Data – Stop Copying Production Data Into Dev
- SQL Data Generator: Generate realistic test data fast
- Generating DDL/DML scripts for all tables/columns in a …