Python performance and memory numbers every programmer should know

TL;DR

A set of microbenchmarks and memory measurements for CPython 3.14.2 run on a Mac Mini M4 Pro provides concrete latencies and sizes for common Python operations and objects. The author publishes a large table (and graphs) covering memory footprints, basic ops, collections, JSON libraries, web frameworks, file I/O, databases, and async primitives; benchmark code is available on GitHub.

What happened

The author ran a battery of microbenchmarks on CPython 3.14.2 on a Mac Mini M4 Pro (macOS Tahoe) and recorded timings and memory usage for many everyday Python primitives and libraries. Results include object sizes (for example, an empty process at 15.73 MB, a small int at 28 bytes, a float at 24 bytes, and a 100-character string at 141 bytes), operation latencies (integer addition ~19 ns, list append ~28.7 ns), and collection behaviors (dict lookup ~21.9 ns, set membership ~19.0 ns, a 1,000-item list membership check ~3.85 μs). The report also compares JSON serializers (orjson, ujson, msgspec, built-in json), web frameworks (FastAPI, Starlette, Flask, Django, Litestar), file I/O, and database operations (SQLite and MongoDB). The benchmark code and data are posted to a public GitHub repository for inspection.

Why it matters

Concrete microbenchmarks help choose the right data structure when performance or memory use matters (e.g., prefer dict/set membership over list scans for large collections).
Knowing object memory footprints enables better estimates for caches and in-memory data models; __slots__ can cut memory costs dramatically for many instances.
Library-level differences (JSON encoders, web frameworks) can be measured and compared directly rather than assumed, which matters for high-throughput services.
Measured async and I/O costs show where event-loop or blocking overheads may affect throughput and where optimizations could have real impact.
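
The data-structure point above can be checked locally with a small timing sketch. This is not the author's benchmark code; the function name `membership_timings` and its parameters are assumptions made for illustration, and the lambda wrapper adds a small constant overhead to both measurements.

```python
import timeit


def membership_timings(n=1_000, repeats=2_000):
    """Time `x in container` for a list and a set holding the same n ints."""
    as_list = list(range(n))
    as_set = set(as_list)
    target = n - 1  # worst case for the list: the element is at the very end

    list_seconds = timeit.timeit(lambda: target in as_list, number=repeats)
    set_seconds = timeit.timeit(lambda: target in as_set, number=repeats)

    # Convert total seconds into nanoseconds per lookup.
    return {
        "list_ns": list_seconds / repeats * 1e9,
        "set_ns": set_seconds / repeats * 1e9,
    }


if __name__ == "__main__":
    for name, ns in membership_timings().items():
        print(f"{name}: {ns:.1f} ns per lookup")
```

Absolute numbers will differ from the article's, but the gap between the linear list scan and the hash-based set lookup should be visible on any machine.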

Key facts

Empty Python process: 15.73 MB.
Small and large ints reported as 28 bytes; a very large int (10**100) measured at 72 bytes.
Float object size: 24 bytes.
Empty list overhead: 56 bytes; a list with 1,000 ints ≈ 35.2 KB, with 1,000 floats ≈ 32.1 KB.
List append: 28.7 ns (~34.8M ops/sec); concatenation (small) ~39.1 ns.
Dict lookup by key: 21.9 ns; set membership: 19.0 ns; list membership for 1,000 items: 3.85 μs.
List comprehension for 1,000 items: 9.45 μs versus equivalent for-loop with append: 11.9 μs.
JSON: built-in json.dumps/loads (simple) ~708 ns/714 ns; orjson.dumps (complex) ~310 ns; msgspec encode (complex) ~445 ns.
Web endpoints returning JSON: FastAPI ~8.63 μs, Starlette ~8.01 μs, Flask ~16.5 μs, Django ~18.1 μs.
File ops: open+close ~9.05 μs; read 1 KB ~10.0 μs; write 1 MB ~207 μs.
Databases: SQLite insert (JSON blob) ~192 μs; MongoDB insert_one ~119 μs.
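
The object sizes listed above can be inspected with `sys.getsizeof`. The sketch below is an illustration, not the author's measurement code, and the helper names are hypothetical. `sys.getsizeof` reports only the object itself, so a second helper adds the element sizes for a list; results depend on the Python version and platform.

```python
import sys


def shallow_sizes():
    """Return sys.getsizeof() for a few sample objects, in bytes."""
    samples = {
        "int 1": 1,
        "int 10**100": 10**100,
        "float": 1.5,
        "empty list": [],
        "100-char str": "a" * 100,
    }
    return {label: sys.getsizeof(obj) for label, obj in samples.items()}


def list_with_elements_size(values):
    """Size of the list object plus the size of every element.

    An object that appears several times in the list is counted each time,
    so this overestimates memory when elements are shared.
    """
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


if __name__ == "__main__":
    for label, size in shallow_sizes().items():
        print(f"{label}: {size} bytes")
    floats = [float(i) for i in range(1_000)]
    print(f"list of 1,000 floats: {list_with_elements_size(floats)} bytes")
```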

What to watch next

Performance and memory numbers vary across hardware, operating systems, and Python versions; the source reports a single configuration, so these figures may not carry over directly to other setups.
Appending to a list occasionally triggers a reallocation when the list outgrows its allocated capacity; the report notes that these reallocations can slow individual append operations.
The source does not specify how these microbenchmarks map to real-world application bottlenecks such as end-to-end latency, concurrency effects, or network I/O.
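
The list reallocation behaviour mentioned above can be observed indirectly by watching the list's reported size while appending. This is a minimal sketch that assumes CPython's over-allocation strategy; the function name `count_resizes` is hypothetical and other interpreters may behave differently.

```python
import sys


def count_resizes(n=1_000):
    """Append n items and count how often the list's allocated size changes."""
    data = []
    last_size = sys.getsizeof(data)
    resizes = 0
    for i in range(n):
        data.append(i)
        size = sys.getsizeof(data)
        if size != last_size:
            # The backing array was reallocated to make room for more items.
            resizes += 1
            last_size = size
    return resizes


if __name__ == "__main__":
    print(f"resizes during 1,000 appends: {count_resizes()}")
```

Because the list grows in steps, most appends reuse spare capacity and only a small fraction pay for a reallocation.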

Quick glossary

CPython: The standard Python interpreter implementation written in C; the benchmarks were run on CPython 3.14.2.
__slots__: A class attribute that declares a fixed set of instance attributes, so instances are created without a per-instance __dict__; this can reduce memory usage when there are many instances.
orjson: A fast JSON serialization library for Python often used where performance is important; included among the compared JSON encoders.
list comprehension: A concise Python construct for building lists from iterables, often faster than an explicit for-loop with append.
microbenchmark: A small, targeted performance test that measures the cost of a single operation or tight set of operations, useful for comparing primitives.
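
As an illustration of the last two glossary entries, here is a small microbenchmark sketch that compares a list comprehension with an explicit loop. It is not taken from the author's repository; the function names and default parameters are assumptions.

```python
import timeit


def build_with_loop(n):
    """Build a list with an explicit for-loop and append."""
    out = []
    for i in range(n):
        out.append(i * 2)
    return out


def build_with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * 2 for i in range(n)]


def per_call_us(func, n=1_000, number=2_000, repeat=5):
    """Microseconds per call, using the fastest of several timing runs."""
    best = min(timeit.repeat(lambda: func(n), number=number, repeat=repeat))
    return best / number * 1e6


if __name__ == "__main__":
    print(f"for-loop:      {per_call_us(build_with_loop):.2f} us")
    print(f"comprehension: {per_call_us(build_with_comprehension):.2f} us")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes on the machine.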

Reader FAQ

Is the benchmark code available?
Yes — the author links a GitHub repository with the benchmark source.

Were benchmarks run on multiple platforms or Python versions?
No. The results shown come from CPython 3.14.2 on a Mac Mini M4 Pro; the source does not provide cross-platform results.

Should I always use __slots__ to save memory?
The data show large memory savings for many instances, but whether to use __slots__ depends on your program design and compatibility needs (detailed trade-offs are not fully enumerated in the source).
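
The trade-off can be sketched with two otherwise identical classes. The class names and the `instance_size` helper below are hypothetical, and the size estimate is a rough one that counts only the instance and its `__dict__` when one exists.

```python
import sys


class PointDict:
    """Regular class: each instance can carry a per-instance __dict__."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class PointSlots:
    """Slotted class: fixed attributes, no per-instance __dict__."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def instance_size(obj):
    """Rough per-instance size: the object plus its __dict__, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


if __name__ == "__main__":
    print("regular:", instance_size(PointDict(1, 2, 3)), "bytes")
    print("slotted:", instance_size(PointSlots(1, 2, 3)), "bytes")
    try:
        PointSlots(1, 2, 3).w = 4  # attribute not listed in __slots__
    except AttributeError as exc:
        print("cannot add new attribute:", exc)
```

The compatibility cost shows up in the last step: code that attaches arbitrary attributes to instances stops working once a class defines `__slots__`.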

Which JSON library is best?
Benchmarks show orjson and msgspec are faster than the built-in json for complex cases, but which is best depends on your serialization requirements and constraints.
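
One way to decide is to time the candidates on a payload that resembles your own data. The sketch below always times the built-in `json` module and adds `orjson` only if it happens to be installed. The payload and the function name `encoder_timings` are made up for illustration and are not the author's test data.

```python
import json
import timeit

try:
    import orjson  # optional third-party package; skipped if not installed
except ImportError:
    orjson = None

# Hypothetical payload; replace it with data shaped like your own.
PAYLOAD = {
    "id": 123,
    "name": "example",
    "tags": ["a", "b", "c"],
    "nested": {"score": 3.5, "active": True, "items": [1, 2, 3]},
}


def encoder_timings(number=5_000):
    """Return nanoseconds per encode call for each available encoder."""
    results = {}
    seconds = timeit.timeit(lambda: json.dumps(PAYLOAD), number=number)
    results["json.dumps"] = seconds / number * 1e9
    if orjson is not None:
        seconds = timeit.timeit(lambda: orjson.dumps(PAYLOAD), number=number)
        results["orjson.dumps"] = seconds / number * 1e9
    return results


if __name__ == "__main__":
    for name, ns in encoder_timings().items():
        print(f"{name}: {ns:.0f} ns per call")
```

Note that `json.dumps` returns a `str` while `orjson.dumps` returns `bytes`, which is one of the practical differences to weigh alongside speed.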

Source: "Python Numbers Every Programmer Should Know" (2025-12-31; tags: performance, python; 13 min read). Excerpt: "There are numbers every Python programmer should know. For example, how fast or slow is it to add…"
