How It Works
pyfake inspects a Pydantic model's field metadata at runtime and produces realistic fake values that satisfy every declared constraint — bounds, length, format, defaults, and more.
The Pipeline
flowchart TD
A["Pyfake.from_schema(MyModel)"]
B["Engine.generate(MyModel)"]
C["Resolver.resolve(FieldInfo)\nannotation -> schema"]
D["GeneratorRegistry._generate(schema)\ndispatch by type"]
E["Generator function\n(int, str, uuid4, ...)"]
F["Generated value"]
A --> B --> C --> D --> E --> F
Components
Pyfake — Public API
The Pyfake class is the user-facing entry point. It holds a Context and an Engine, both bound to the model you pass in.
from pyfake import Pyfake
# Class method shortcut — no need to instantiate manually
result = Pyfake.from_schema(MyModel, num=5, seed=42)
# Or instantiate first for repeated generation
fake = Pyfake(MyModel, seed=42)
fake.generate(num=3)
The optional seed parameter is forwarded to the Context, making generation fully deterministic and reproducible.
When as_dict=True (the default), instances are returned as plain dicts via .model_dump(). Set as_dict=False to get back the actual Pydantic model instances.
Engine — Orchestration
Engine knows nothing about types or constraints. Its only job is to walk model.model_fields and ask the GeneratorRegistry for a value for each field.
# Internally, Engine does roughly this (simplified):
def generate(model):
    data = {}
    for field_name, field_info in model.model_fields.items():
        # The registry picks a generator based on the field's metadata
        data[field_name] = registry.generate(field_info)
    return data
The resulting data dict is passed directly to the model's constructor, so Pydantic validates the output before it is returned to you.
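The walk-then-construct shape can be tried out without pyfake or Pydantic installed. The following is a minimal sketch under stated assumptions: a plain dataclass stands in for a Pydantic model, and `fake_value_for` is a hypothetical stand-in for the registry. It is not pyfake's actual code.

```python
import random
from dataclasses import dataclass, fields


@dataclass
class User:  # stand-in for a Pydantic model (assumption for this sketch)
    id: int
    name: str


def fake_value_for(field_type, rng):
    """Hypothetical stand-in for the registry: one branch per type."""
    if field_type is int:
        return rng.randint(0, 100)
    if field_type is str:
        return "".join(rng.choice("abcdefghij") for _ in range(8))
    return None


def generate(model_cls, rng):
    # Walk the declared fields, collect one value per field,
    # then hand the collected dict to the constructor.
    data = {f.name: fake_value_for(f.type, rng) for f in fields(model_cls)}
    return model_cls(**data)


user = generate(User, random.Random(0))
```

The orchestration function never inspects the field types itself; it only forwards them, which mirrors the separation of concerns described for Engine.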
Resolver — Type Resolution
Resolver takes a FieldInfo object and converts its annotation into a schema dict — a plain Python dict that encodes everything the generators need to know.
It resolves annotations recursively, handling:
| Annotation form | Resolved as |
|---|---|
| int, str, float, bool, uuid.UUID | primitive schema node |
| Optional[T] / T \| None | union node with nullable=True |
| Union[A, B] | union node with multiple variants |
| List[T], Set[T] | container node with an items sub-schema |
| Tuple[A, B] / Tuple[T, ...] | fixed or variable-length tuple |
| Dict[K, V] | dict node with keys and values sub-schemas |
| Literal["a", "b"] | literal node with a values list |
| Enum subclasses | enum node |
| Nested BaseModel | model node, recursively resolved |
| Annotated[T, Field(...)] | delegates to the inner type, merging constraints |
Constraints extracted from Field(...) — like ge, lt, min_length, max_length, multiple_of, decimal_places, pattern, format — are collected into a GeneratorArgs instance and attached to the schema node.
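To make the role of these constraints concrete, here is a sketch of how an integer generator might honour bounds and `multiple_of`. `IntArgs` and `sketch_generate_int` are hypothetical names invented for this example, and the fallback range of 2000 is an arbitrary assumption; pyfake's real `GeneratorArgs` and `generate_int` may work differently.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntArgs:  # hypothetical, far smaller than pyfake's GeneratorArgs
    ge: Optional[int] = None
    gt: Optional[int] = None
    le: Optional[int] = None
    lt: Optional[int] = None
    multiple_of: Optional[int] = None


def sketch_generate_int(args, rng):
    # Turn exclusive bounds into inclusive ones.
    low = args.ge if args.ge is not None else (None if args.gt is None else args.gt + 1)
    high = args.le if args.le is not None else (None if args.lt is None else args.lt - 1)
    # Fill in whichever side is unbounded (arbitrary span, an assumption).
    if low is None and high is None:
        low, high = -1000, 1000
    elif low is None:
        low = high - 2000
    elif high is None:
        high = low + 2000
    step = args.multiple_of or 1  # assumes a positive multiple_of
    first = -(-low // step)  # smallest k with k * step >= low
    last = high // step      # largest k with k * step <= high
    if first > last:
        raise ValueError("no integer satisfies the constraints")
    return rng.randint(first, last) * step


value = sketch_generate_int(IntArgs(ge=10, lt=50, multiple_of=5), random.Random(1))
```

Sampling the multiplier `k` instead of the value itself means every draw is valid on the first attempt, so no retry loop is needed.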
Schema node shape
Each resolved node is a dict with at minimum a type key and a generator_args key. Complex types add their own keys:
# primitive
{"type": str, "generator_args": GeneratorArgs(min_length=3, max_length=20)}
# union / Optional
{"type": "union", "nullable": True, "variants": [...]}
# list
{"type": list, "items": {...}, "generator_args": GeneratorArgs()}
# nested model
{"type": "model", "model": Address, "fields": {"street": {...}, "city": {...}}}
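The recursive resolution step can be illustrated with the standard `typing` helpers. This is a reduced sketch, not pyfake's Resolver: `sketch_resolve` is a hypothetical function, it omits `generator_args`, tuples, enums and nested models, and the key names for literal and dict nodes are assumptions based on the table above.

```python
import types
from typing import Dict, List, Literal, Optional, Union, get_args, get_origin

# typing.Union covers Optional[T]; types.UnionType covers "T | None" (3.10+).
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def sketch_resolve(annotation):
    """Hypothetical, reduced resolver: annotation -> schema dict."""
    origin = get_origin(annotation)
    args = get_args(annotation)

    if origin in UNION_ORIGINS:
        variants = [a for a in args if a is not type(None)]
        return {
            "type": "union",
            "nullable": len(variants) != len(args),
            "variants": [sketch_resolve(v) for v in variants],
        }
    if origin is Literal:
        return {"type": "literal", "values": list(args)}
    if origin in (list, set):
        return {"type": origin, "items": sketch_resolve(args[0])}
    if origin is dict:
        return {
            "type": dict,
            "keys": sketch_resolve(args[0]),
            "values": sketch_resolve(args[1]),
        }
    # Anything without type arguments is treated as a primitive node.
    return {"type": annotation}


schema = sketch_resolve(Optional[List[int]])
```

Here `schema` is a nullable union whose single variant is a list node, and that node's `items` sub-schema is the primitive `int` node.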
GeneratorRegistry — Dispatch
The registry receives a schema node and routes it to the correct generator function. The dispatch order is:
- Default shortcut: if generator_args.default is set, return it immediately.
- Union: pick a random variant; occasionally return None for nullable unions.
- Literal / Enum: pick a random value from the declared set.
- Container types (list, set, dict, tuple): generate the appropriate number of items by recursing into sub-schemas.
- Nested model: recurse into its fields dict and construct the model.
- Format-based dispatch: if generator_args.format is set (e.g. "uuid4", "date-time"), look up the matching generator by that string key.
- Primitive type dispatch: map the Python type to a generator via the _type_map table.
Below are the built-in type and format mappings:
| Key | Generator |
|---|---|
| integer | generate_int |
| number | generate_float |
| string | generate_str |
| bool | generate_bool |
| uuid / uuid4 | generate_uuid4 |
| uuid1 … uuid8 | generate_uuid1 … generate_uuid8 |
| date | generate_date |
| date-time | generate_datetime |
| time | generate_time |
If no match is found after all steps, None is returned.
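The dispatch order above can be sketched as a single function that checks each case in turn. Everything here is an assumption made for illustration: `sketch_dispatch`, `TYPE_MAP` and `FORMAT_MAP` are hypothetical names, `generator_args` is a plain dict instead of a `GeneratorArgs` instance, the 10% chance of `None` is arbitrary, and the nested-model, enum, set, dict and tuple cases are left out for brevity.

```python
import random
import uuid

TYPE_MAP = {
    int: lambda rng: rng.randint(0, 100),
    float: lambda rng: rng.random(),
    str: lambda rng: "".join(rng.choice("abcdef") for _ in range(6)),
    bool: lambda rng: rng.random() < 0.5,
}
FORMAT_MAP = {
    "uuid4": lambda rng: uuid.UUID(int=rng.getrandbits(128), version=4),
}


def sketch_dispatch(node, rng):
    args = node.get("generator_args", {})
    if "default" in args:                    # default shortcut
        return args["default"]
    if node["type"] == "union":              # union, sometimes None if nullable
        if node.get("nullable") and rng.random() < 0.1:
            return None
        return sketch_dispatch(rng.choice(node["variants"]), rng)
    if node["type"] == "literal":            # literal: pick a declared value
        return rng.choice(node["values"])
    if node["type"] is list:                 # container: recurse into items
        count = rng.randint(1, 3)
        return [sketch_dispatch(node["items"], rng) for _ in range(count)]
    if args.get("format") in FORMAT_MAP:     # format-based dispatch
        return FORMAT_MAP[args["format"]](rng)
    generator = TYPE_MAP.get(node["type"])   # primitive type dispatch
    return generator(rng) if generator else None


rng = random.Random(7)
numbers = sketch_dispatch({"type": list, "items": {"type": int}}, rng)
```

Because the format check runs before the primitive lookup, a string node carrying a `"uuid4"` format yields a UUID and never falls through to the plain string generator.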
Context — Shared Random State
Every generator receives a shared Context instance. It carries a single random.Random object, which all generators use instead of the global random module.
# Unseeded — different output each run
fake = Pyfake(MyModel)
# Seeded — identical output every run
fake = Pyfake(MyModel, seed=123)
Passing a seed through the Pyfake constructor guarantees that all fields, across all generated instances, use the same seeded random stream — making test fixtures fully reproducible.
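The reason a single shared random stream gives reproducible output can be shown with the standard library alone. `SketchContext`, `fake_int` and `fake_flag` are hypothetical names for this sketch and are not part of pyfake.

```python
import random


class SketchContext:
    """Hypothetical stand-in for Context: owns one random.Random."""

    def __init__(self, seed=None):
        self.random = random.Random(seed)


def fake_int(ctx):
    # Generators draw from the context, never from the global module.
    return ctx.random.randint(0, 1_000_000)


def fake_flag(ctx):
    return ctx.random.random() < 0.5


def run(seed):
    ctx = SketchContext(seed)
    return [(fake_int(ctx), fake_flag(ctx)) for _ in range(3)]


first, second = run(123), run(123)
```

Both runs consume the same seeded stream in the same order, so `first` and `second` are equal, and because the global `random` module is never touched, unrelated code that seeds it is unaffected.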