TL;DR
A new tool called DDL to Data converts SQL CREATE TABLE statements into populated test datasets, preserving foreign-key links and honoring types and constraints. The core engine uses deterministic pattern matching; an optional AI "Story Mode" can add narrative-consistent trends.
What happened
The author launched DDL to Data for teams that need populated staging or test databases without pulling production copies or maintaining custom seed scripts. Users paste CREATE TABLE definitions and receive realistic-looking rows: fields that resemble emails or timestamps, uniqueness maintained, and foreign-key relationships preserved. The service requires no local setup and targets PostgreSQL and MySQL outputs. The underlying generator uses deterministic pattern matching and runs quickly with no token costs; an opt-in Story Mode layers AI on top to produce higher-level narratives such as seasonal churn. In a discussion about scaling, the developer described practical choices for large exports (streaming generation to avoid memory bloat, Parquet for compressed storage, batched SQL inserts, or direct COPY operations for speed) and covered foreign-key handling and parallelization limits at very large row counts.
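The post does not show the engine's internals, so the sketch below is only an illustration of what deterministic, name- and type-driven pattern matching can look like; every rule, column name, and helper here is invented rather than taken from the tool.

```python
import random
import string
from datetime import datetime, timedelta

# Hypothetical sketch: pick realistic-looking values from column name and SQL type.
# This is NOT the DDL to Data engine, just an illustration of the general idea.
def make_value(column_name: str, sql_type: str, row_index: int):
    name = column_name.lower()
    rng = random.Random(row_index)  # seeded per row, so output is reproducible
    if "email" in name:
        user = "".join(rng.choices(string.ascii_lowercase, k=8))
        return f"{user}@example.com"
    if name.endswith("_at") or "timestamp" in sql_type.lower():
        base = datetime(2024, 1, 1)
        return (base + timedelta(minutes=rng.randrange(525_600))).isoformat()
    if "name" in name:
        return rng.choice(["Alice", "Bob", "Carol", "Dave"])
    if sql_type.upper().startswith(("INT", "BIGINT", "SERIAL")):
        return rng.randrange(1, 10_000)
    return "".join(rng.choices(string.ascii_lowercase, k=12))  # fallback text

# Example: three rows for a hypothetical users(email, created_at) table.
rows = [
    (make_value("email", "VARCHAR(255)", i), make_value("created_at", "TIMESTAMP", i))
    for i in range(3)
]
print(rows)
```

Because every branch is keyed off the schema alone, this style of generator needs no AI calls and produces the same rows for the same input, which is consistent with the "deterministic, no token costs" claim.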
Why it matters
- Avoids risky production data copies and the operational overhead of masking and security reviews.
- Generates complete, constraint-respecting datasets from schema alone, saving time versus hand-written seeds.
- No client-side integration required, unlike libraries such as Faker, which need field-by-field configuration in code.
- Deterministic engine runs quickly and does not incur AI token costs unless Story Mode is enabled.
Key facts
- Input: paste CREATE TABLE statements; output: populated test data.
- Preserves foreign key relationships and honors uniqueness constraints.
- Generates realistic values (for example, emails and reasonable timestamps) rather than purely random strings.
- No setup or configuration required; works with PostgreSQL and MySQL.
- Core engine uses deterministic pattern matching and executes in milliseconds, according to the author.
- Optional Story Mode uses AI to produce narrative-coherent datasets (e.g., seasonal trends).
- For large exports, the developer recommends streaming so the generator never holds all rows in memory at once.
- Format and write strategies discussed: Parquet for compression, batched SQL inserts (~1,000 rows/statement), and direct DB COPY for fastest ingestion.
- Foreign-key handling at scale: pre-generate parent primary keys and reference them from child rows (rough sketches of the streaming/batching approach and a chunked Parquet write follow this list).
- Parallel generation is straightforward, but serialized writes are a bottleneck; chunk-then-merge is being considered but not shipped.
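The post includes no code for these steps, so the following is a rough, hypothetical sketch of streaming generation with ~1,000-row INSERT batches and parent keys generated up front; the table, column, and function names are invented, not the tool's API.

```python
import itertools
import random

BATCH_SIZE = 1_000  # matches the ~1,000 rows/statement figure above

def generate_order_rows(user_ids, total_rows):
    """Lazily yield child rows that only reference pre-generated parent keys."""
    rng = random.Random(42)
    for order_id in range(1, total_rows + 1):
        yield (order_id, rng.choice(user_ids), rng.randrange(100, 10_000))

def batched_inserts(rows, table, columns, batch_size=BATCH_SIZE):
    """Consume a row stream and emit multi-row INSERT statements,
    so memory use tracks the batch size, not the total row count."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        values = ",\n".join(str(row) for row in batch)
        yield f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{values};"

# Parent keys are generated first, then reused by every child batch (FK handling).
user_ids = list(range(1, 10_001))
order_rows = generate_order_rows(user_ids, total_rows=1_000_000)
for stmt in batched_inserts(order_rows, "orders", ("id", "user_id", "amount_cents")):
    print(stmt[:80], "...")  # in practice: write to a file or execute against the DB
    break                    # only show the first batch here
```

Because each batch is pulled from a lazy generator, peak memory is proportional to the batch size rather than to the full export.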
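For the Parquet path, a chunk-at-a-time writer keeps memory bounded while still producing a single compressed file. The sketch below assumes pyarrow; the schema, chunk size, and value formulas are illustrative only, not anything DDL to Data actually uses.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Sketch: write generated rows to Parquet in chunks so memory stays bounded.
schema = pa.schema([
    ("id", pa.int64()),
    ("user_id", pa.int64()),
    ("amount_cents", pa.int64()),
])

with pq.ParquetWriter("orders.parquet", schema, compression="zstd") as writer:
    for start in range(0, 1_000_000, 100_000):  # ten chunks of 100k rows
        ids = list(range(start + 1, start + 100_001))
        chunk = pa.table(
            {
                "id": ids,
                "user_id": [i % 10_000 + 1 for i in ids],      # stays inside the parent key space
                "amount_cents": [(i * 37) % 9_900 + 100 for i in ids],
            },
            schema=schema,
        )
        writer.write_table(chunk)
```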
What to watch next
- Pricing and commercial terms — the developer said they are still working this out.
- Implementation of chunk-then-merge or other approaches to reduce write-time bottlenecks (on the roadmap).
- Broader database support beyond PostgreSQL and MySQL — not confirmed in the source.
Quick glossary
- DDL: Data Definition Language — SQL statements (like CREATE TABLE) that define database schemas and structures.
- Foreign key: A column (or set of columns) in one table that references primary-key values in another table, enforcing referential integrity.
- Parquet: A columnar storage file format that provides efficient compression and on-disk layout for large datasets.
- Faker: A commonly used library for generating synthetic data programmatically; requires coding and per-field configuration.
- COPY: A bulk import/export database operation (commonly used in PostgreSQL) that can load data efficiently without per-row SQL overhead.
Reader FAQ
Does DDL to Data use AI to generate the data?
The core generator is deterministic pattern matching; an optional Story Mode uses AI for narrative-coherent datasets.
Which databases does it support?
The source states it works with PostgreSQL and MySQL.
Is there local setup or configuration needed?
No setup or configuration is required, according to the source.
Can it handle very large datasets (for example, millions of rows)?
The developer outlined scaling considerations — streaming to avoid memory pressure, using Parquet or COPY for fast writes, and special FK handling — but full production-scale behaviors depend on implementation choices.
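On the ingestion side, COPY avoids per-row INSERT overhead. Here is a minimal sketch, assuming psycopg2, a pre-generated orders.csv, and placeholder connection and table names; none of these specifics come from the source.

```python
import psycopg2

# Hypothetical bulk load of a generated CSV via COPY (placeholder names throughout).
conn = psycopg2.connect("dbname=testdb user=dev")
try:
    with conn, conn.cursor() as cur, open("orders.csv") as f:
        cur.copy_expert(
            "COPY orders (id, user_id, amount_cents) "
            "FROM STDIN WITH (FORMAT csv, HEADER true)",
            f,
        )
finally:
    conn.close()
```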
How much does it cost?
The developer said pricing is still being figured out.
From the author
I built DDL to Data after repeatedly pushing back on "just use production data and mask it" requests. Teams needed populated databases for testing, but pulling prod meant security reviews, …
Sources
- Show HN: DDL to Data – Generate realistic test data from SQL schemas
- DDL to Data – Stop Copying Production Data Into Dev
- SQL Data Generator: Generate realistic test data fast
- Generating DDL/DML scripts for all tables/columns in a …