Source: MarkTechPost
In this tutorial, we walk through an advanced, end-to-end exploration of Polyfactory, focusing on how we can generate rich, realistic mock data directly from Python type hints. We start by setting up the environment and progressively build factories for data classes, Pydantic models, and attrs-based classes, while demonstrating customization, overrides, calculated fields, and the generation of nested objects. As we move through each snippet, we show how we can control randomness, enforce constraints, and model real-world structures, making this tutorial directly applicable to testing, prototyping, and data-driven development workflows.
import subprocess
import sys


def install_package(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])


packages = [
    "polyfactory",
    "pydantic",
    "email-validator",
    "faker",
    "msgspec",
    "attrs",
]

for package in packages:
    try:
        install_package(package)
        print(f"✓ Installed {package}")
    except Exception as e:
        print(f"✗ Failed to install {package}: {e}")

print("\n")

print("=" * 80)
print("SECTION 2: Basic Dataclass Factories")
print("=" * 80)

from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime, date
from uuid import UUID

from polyfactory.factories import DataclassFactory


@dataclass
class Address:
    street: str
    city: str
    country: str
    zip_code: str


@dataclass
class Person:
    id: UUID
    name: str
    email: str
    age: int
    birth_date: date
    is_active: bool
    address: Address
    phone_numbers: List[str]
    bio: Optional[str] = None


class PersonFactory(DataclassFactory[Person]):
    pass


# Build a single instance; every field, including the nested Address, is filled in.
person = PersonFactory.build()
print("Generated Person:")
print(f" ID: {person.id}")
print(f" Name: {person.name}")
print(f" Email: {person.email}")
print(f" Age: {person.age}")
print(f" Address: {person.address.city}, {person.address.country}")
print(f" Phone Numbers: {person.phone_numbers[:2]}")
print()

# Build several instances at once.
people = PersonFactory.batch(5)
print(f"Generated {len(people)} people:")
for i, p in enumerate(people, 1):
    print(f" {i}. {p.name} - {p.email}")

print("\n")
We set up the environment and ensure all required dependencies are installed. We also introduce the core idea of using Polyfactory to generate mock data from type hints. By initializing the basic dataclass factories, we establish the foundation for all subsequent examples.
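To make the idea of generating data from type hints concrete, here is a deliberately tiny, standard-library-only sketch. It is not Polyfactory's implementation and covers only a handful of types. The helper names (`fake_value`, `build`) and the `Location` and `User` classes are hypothetical and exist only for this illustration.

```python
import random
import string
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Optional, Union, get_args, get_origin, get_type_hints

rng = random.Random(0)  # seeded so repeated runs produce the same values


def fake_value(tp: Any) -> Any:
    """Return a random value for a small, hand-picked set of type hints."""
    if is_dataclass(tp):
        return build(tp)  # nested dataclass: recurse
    origin = get_origin(tp)
    if origin is list:  # List[X] -> a short list of X values
        (item_type,) = get_args(tp)
        return [fake_value(item_type) for _ in range(rng.randint(1, 3))]
    if origin is Union:  # Optional[X] is Union[X, None]; always fill it here
        non_none = [arg for arg in get_args(tp) if arg is not type(None)]
        return fake_value(non_none[0])
    if tp is bool:
        return rng.choice([True, False])
    if tp is int:
        return rng.randint(0, 100)
    if tp is str:
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    raise TypeError(f"no generator registered for {tp!r}")


def build(cls: type) -> Any:
    """Instantiate a dataclass by generating one value per annotated field."""
    hints = get_type_hints(cls)
    return cls(**{f.name: fake_value(hints[f.name]) for f in fields(cls)})


@dataclass
class Location:  # hypothetical nested model
    city: str
    floor: int


@dataclass
class User:  # hypothetical top-level model
    name: str
    age: int
    active: bool
    home: Location
    tags: List[str]
    nickname: Optional[str] = None


user = build(User)
print(user)
```

A real factory library handles far more types, constraints, and defaults, but the core loop is the same: read the annotations, pick a generator per type, and recurse into nested models.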
print("=" * 80)
print("SECTION 3: Customizing Factory Behavior")
print("=" * 80)

from faker import Faker
from polyfactory.fields import Use, Ignore


@dataclass
class Employee:
    employee_id: str
    full_name: str
    department: str
    salary: float
    hire_date: date
    is_manager: bool
    email: str
    internal_notes: Optional[str] = None


class EmployeeFactory(DataclassFactory[Employee]):
    __faker__ = Faker(locale="en_US")
    __random_seed__ = 42

    # A classmethod named after a field supplies that field's value.
    @classmethod
    def employee_id(cls) -> str:
        return f"EMP-{cls.__random__.randint(10000, 99999)}"

    @classmethod
    def full_name(cls) -> str:
        return cls.__faker__.name()

    @classmethod
    def department(cls) -> str:
        departments = ["Engineering", "Marketing", "Sales", "HR", "Finance"]
        return cls.__random__.choice(departments)

    @classmethod
    def salary(cls) -> float:
        return round(cls.__random__.uniform(50000, 150000), 2)

    @classmethod
    def email(cls) -> str:
        return cls.__faker__.company_email()


employees = EmployeeFactory.batch(3)
print("Generated Employees:")
for emp in employees:
    print(f" {emp.employee_id}: {emp.full_name}")
    print(f" Department: {emp.department}")
    print(f" Salary: ${emp.salary:,.2f}")
    print(f" Email: {emp.email}")
    print()

print()

print("=" * 80)
print("SECTION 4: Field Constraints and Calculated Fields")
print("=" * 80)


@dataclass
class Product:
    product_id: str
    name: str
    description: str
    price: float
    discount_percentage: float
    stock_quantity: int
    final_price: Optional[float] = None
    sku: Optional[str] = None


class ProductFactory(DataclassFactory[Product]):
    @classmethod
    def product_id(cls) -> str:
        return f"PROD-{cls.__random__.randint(1000, 9999)}"

    @classmethod
    def name(cls) -> str:
        adjectives = ["Premium", "Deluxe", "Classic", "Modern", "Eco"]
        nouns = ["Widget", "Gadget", "Device", "Tool", "Appliance"]
        return f"{cls.__random__.choice(adjectives)} {cls.__random__.choice(nouns)}"

    @classmethod
    def price(cls) -> float:
        return round(cls.__random__.uniform(10.0, 1000.0), 2)

    @classmethod
    def discount_percentage(cls) -> float:
        return round(cls.__random__.uniform(0, 30), 2)

    @classmethod
    def stock_quantity(cls) -> int:
        return cls.__random__.randint(0, 500)

    @classmethod
    def build(cls, **kwargs):
        # Derive dependent fields after the base instance has been generated.
        instance = super().build(**kwargs)
        if instance.final_price is None:
            instance.final_price = round(
                instance.price * (1 - instance.discount_percentage / 100), 2
            )
        if instance.sku is None:
            name_part = instance.name.replace(" ", "-").upper()[:10]
            instance.sku = f"{instance.product_id}-{name_part}"
        return instance


products = ProductFactory.batch(3)
print("Generated Products:")
for prod in products:
    print(f" {prod.sku}")
    print(f" Name: {prod.name}")
    print(f" Price: ${prod.price:.2f}")
    print(f" Discount: {prod.discount_percentage}%")
    print(f" Final Price: ${prod.final_price:.2f}")
    print(f" Stock: {prod.stock_quantity} units")
    print()

print()
We customize factory behavior by pinning a Faker locale and a random seed, and by defining classmethod generators for individual fields such as IDs, names, departments, and salaries. We then add calculated fields, such as a final price and an SKU, by overriding build() so that they are derived after the other values have been generated. Fields without a custom generator are still populated automatically from their type hints.
print("=" * 80)
print("SECTION 6: Complex Nested Structures")
print("=" * 80)

from enum import Enum


class OrderStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"


@dataclass
class OrderItem:
    product_name: str
    quantity: int
    unit_price: float
    total_price: Optional[float] = None


@dataclass
class ShippingInfo:
    carrier: str
    tracking_number: str
    estimated_delivery: date


@dataclass
class Order:
    order_id: str
    customer_name: str
    customer_email: str
    status: OrderStatus
    items: List[OrderItem]
    order_date: datetime
    shipping_info: Optional[ShippingInfo] = None
    total_amount: Optional[float] = None
    notes: Optional[str] = None


class OrderItemFactory(DataclassFactory[OrderItem]):
    @classmethod
    def product_name(cls) -> str:
        products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones",
                    "Webcam", "USB Cable", "Phone Case", "Charger", "Tablet"]
        return cls.__random__.choice(products)

    @classmethod
    def quantity(cls) -> int:
        return cls.__random__.randint(1, 5)

    @classmethod
    def unit_price(cls) -> float:
        return round(cls.__random__.uniform(5.0, 500.0), 2)

    @classmethod
    def build(cls, **kwargs):
        instance = super().build(**kwargs)
        if instance.total_price is None:
            instance.total_price = round(instance.quantity * instance.unit_price, 2)
        return instance


class ShippingInfoFactory(DataclassFactory[ShippingInfo]):
    @classmethod
    def carrier(cls) -> str:
        carriers = ["FedEx", "UPS", "DHL", "USPS"]
        return cls.__random__.choice(carriers)

    @classmethod
    def tracking_number(cls) -> str:
        return ''.join(cls.__random__.choices('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=12))


class OrderFactory(DataclassFactory[Order]):
    @classmethod
    def order_id(cls) -> str:
        return f"ORD-{datetime.now().year}-{cls.__random__.randint(100000, 999999)}"

    @classmethod
    def items(cls) -> List[OrderItem]:
        return OrderItemFactory.batch(cls.__random__.randint(1, 5))

    @classmethod
    def build(cls, **kwargs):
        instance = super().build(**kwargs)
        # The order total is the sum of the item totals.
        if instance.total_amount is None:
            instance.total_amount = round(sum(item.total_price for item in instance.items), 2)
        # Shipped or delivered orders must carry shipping details.
        if instance.shipping_info is None and instance.status in [OrderStatus.SHIPPED, OrderStatus.DELIVERED]:
            instance.shipping_info = ShippingInfoFactory.build()
        return instance


orders = OrderFactory.batch(2)
print("Generated Orders:")
for order in orders:
    print(f"\n Order {order.order_id}")
    print(f" Customer: {order.customer_name} ({order.customer_email})")
    print(f" Status: {order.status.value}")
    print(f" Items ({len(order.items)}):")
    for item in order.items:
        print(f" - {item.quantity}x {item.product_name} @ ${item.unit_price:.2f} = ${item.total_price:.2f}")
    print(f" Total: ${order.total_amount:.2f}")
    if order.shipping_info:
        print(f" Shipping: {order.shipping_info.carrier} - {order.shipping_info.tracking_number}")

print("\n")
We build more complex domain logic by introducing calculated and dependent fields within factories. We show how we can derive values such as final prices, totals, and shipping details after object creation. This allows us to model realistic business rules directly inside our test data generators.
print("=" * 80)
print("SECTION 7: Attrs Integration")
print("=" * 80)

import attrs
from polyfactory.factories.attrs_factory import AttrsFactory


@attrs.define
class BlogPost:
    title: str
    author: str
    content: str
    views: int = 0
    likes: int = 0
    published: bool = False
    published_at: Optional[datetime] = None
    tags: List[str] = attrs.field(factory=list)


class BlogPostFactory(AttrsFactory[BlogPost]):
    @classmethod
    def title(cls) -> str:
        templates = [
            "10 Tips for {}",
            "Understanding {}",
            "The Complete Guide to {}",
            "Why {} Matters",
            "Getting Started with {}",
        ]
        topics = ["Python", "Data Science", "Machine Learning", "Web Development", "DevOps"]
        template = cls.__random__.choice(templates)
        topic = cls.__random__.choice(topics)
        return template.format(topic)

    @classmethod
    def content(cls) -> str:
        return " ".join(Faker().sentences(nb=cls.__random__.randint(3, 8)))

    @classmethod
    def views(cls) -> int:
        return cls.__random__.randint(0, 10000)

    @classmethod
    def likes(cls) -> int:
        return cls.__random__.randint(0, 1000)

    @classmethod
    def tags(cls) -> List[str]:
        all_tags = ["python", "tutorial", "beginner", "advanced", "guide",
                    "tips", "best-practices", "2024"]
        return cls.__random__.sample(all_tags, k=cls.__random__.randint(2, 5))


posts = BlogPostFactory.batch(3)
print("Generated Blog Posts:")
for post in posts:
    print(f"\n '{post.title}'")
    print(f" Author: {post.author}")
    print(f" Views: {post.views:,} | Likes: {post.likes:,}")
    print(f" Published: {post.published}")
    print(f" Tags: {', '.join(post.tags)}")
    print(f" Preview: {post.content[:100]}...")

print("\n")

print("=" * 80)
print("SECTION 8: Building with Specific Overrides")
print("=" * 80)

# Keyword arguments to build() pin specific fields; the rest stay random.
custom_person = PersonFactory.build(
    name="Alice Johnson",
    age=30,
    email="[email protected]"
)
print("Custom Person:")
print(f" Name: {custom_person.name}")
print(f" Age: {custom_person.age}")
print(f" Email: {custom_person.email}")
print(f" ID (auto-generated): {custom_person.id}")
print()

# The same overrides apply to every instance in a batch.
vip_customers = PersonFactory.batch(
    3,
    bio="VIP Customer"
)
print("VIP Customers:")
for customer in vip_customers:
    print(f" {customer.name}: {customer.bio}")

print("\n")
We extend Polyfactory usage to validated Pydantic models and attrs-based classes. We demonstrate how we can respect field constraints, validators, and default behaviors while still generating valid data at scale. It ensures our mock data remains compatible with real application schemas.
print("=" * 80)
print("SECTION 9: Field-Level Control with Use and Ignore")
print("=" * 80)

from polyfactory.fields import Use, Ignore


@dataclass
class Configuration:
    app_name: str
    version: str
    debug: bool
    created_at: datetime
    api_key: str
    secret_key: str


class ConfigFactory(DataclassFactory[Configuration]):
    # Use(...) pins a field to the value returned by the callable.
    app_name = Use(lambda: "MyAwesomeApp")
    version = Use(lambda: "1.0.0")
    debug = Use(lambda: False)

    @classmethod
    def api_key(cls) -> str:
        return f"api_key_{''.join(cls.__random__.choices('0123456789abcdef', k=32))}"

    @classmethod
    def secret_key(cls) -> str:
        return f"secret_{''.join(cls.__random__.choices('0123456789abcdef', k=64))}"


configs = ConfigFactory.batch(2)
print("Generated Configurations:")
for config in configs:
    print(f" App: {config.app_name} v{config.version}")
    print(f" Debug: {config.debug}")
    print(f" API Key: {config.api_key[:20]}...")
    print(f" Created: {config.created_at}")
    print()

print()

print("=" * 80)
print("SECTION 10: Model Coverage Testing")
print("=" * 80)

from pydantic import BaseModel, ConfigDict
from typing import Union

# ModelFactory is not imported anywhere in the listing shown here, so import it explicitly.
from polyfactory.factories.pydantic_factory import ModelFactory


class PaymentMethod(BaseModel):
    model_config = ConfigDict(use_enum_values=True)

    type: str
    card_number: Optional[str] = None
    bank_name: Optional[str] = None
    verified: bool = False


class PaymentMethodFactory(ModelFactory[PaymentMethod]):
    __model__ = PaymentMethod


# Build one instance per variant we want the tests to exercise.
payment_methods = [
    PaymentMethodFactory.build(type="card", card_number="4111111111111111"),
    PaymentMethodFactory.build(type="bank", bank_name="Chase Bank"),
    PaymentMethodFactory.build(verified=True),
]

print("Payment Method Coverage:")
for i, pm in enumerate(payment_methods, 1):
    print(f" {i}. Type: {pm.type}")
    if pm.card_number:
        print(f" Card: {pm.card_number}")
    if pm.bank_name:
        print(f" Bank: {pm.bank_name}")
    print(f" Verified: {pm.verified}")

print("\n")

print("=" * 80)
print("TUTORIAL SUMMARY")
print("=" * 80)
print("""
This tutorial covered:
1. ✓ Basic Dataclass Factories - Simple mock data generation
2. ✓ Custom Field Generators - Controlling individual field values
3. ✓ Field Constraints - Calculated fields derived in build()
4. ✓ Pydantic Integration - Working with validated models
5. ✓ Complex Nested Structures - Building related objects
6. ✓ Attrs Support - Alternative to dataclasses
7. ✓ Build Overrides - Customizing specific instances
8. ✓ Use and Ignore - Explicit field control
9. ✓ Coverage Testing - Ensuring comprehensive test data

Key Takeaways:
- Polyfactory automatically generates mock data from type hints
- Customize generation with classmethods and decorators
- Supports multiple libraries: dataclasses, Pydantic, attrs, msgspec
- Override build() (or use PostGenerated) for calculated/dependent fields
- Override specific values while keeping others random
- Perfect for testing, development, and prototyping

For more information:
- Documentation: https://polyfactory.litestar.dev/
- GitHub: https://github.com/litestar-org/polyfactory
""")
print("=" * 80)
We cover advanced usage patterns such as explicit overrides, constant field values, and coverage testing scenarios. We show how we can intentionally construct edge cases and variant instances for robust testing. This final step ties everything together by demonstrating how Polyfactory supports comprehensive and production-grade test data strategies.
In conclusion, we demonstrated how Polyfactory enables us to create comprehensive, flexible test data with minimal boilerplate while still retaining fine-grained control over every field. We showed how to handle simple entities, complex nested structures, and Pydantic model validation, as well as explicit field overrides, within a single, consistent factory-based approach. Overall, we found that Polyfactory enables us to move faster and test more confidently, as it reliably generates realistic datasets that closely mirror production-like scenarios without sacrificing clarity or maintainability.

