The Complete Guide to Pydantic for Python Developers

Source: MachineLearningMastery.com

In this article, you will learn how to use Pydantic to validate, parse, and serialize structured data in Python using type hints.

Topics we will cover include:

  • Defining core models with type coercion and clear validation errors
  • Using optional fields, defaults, and Field constraints effectively
  • Writing custom validators, handling nested structures, and exporting JSON

Let’s not waste any more time.

Introduction

Python’s flexibility with data types is convenient when coding, but it can lead to runtime errors when your code receives unexpected data formats. Such errors are especially common when you’re working with APIs, processing configuration files, or handling user input. Data validation, therefore, becomes necessary for building reliable applications.

Pydantic addresses this challenge by providing automatic data validation and serialization using Python’s type hint system, allowing you to define exactly what your data should look like and automatically enforcing those rules.

This article covers the basics of using Pydantic for data validation using type hints. Here’s what you’ll learn:

  • Creating and validating data structures with type hints
  • Handling optional fields and default values
  • Building custom validation logic for specific requirements
  • Working with nested models and complex data structures

Let’s begin with the basics. Before you proceed, make sure Pydantic is installed (pip install pydantic) so you can follow along with the examples.

🔗 Link to the code on GitHub.

Basic Pydantic Models

Unlike manual data validation approaches that require writing extensive if-statements and type checks, Pydantic integrates well with your existing Python code. It uses Python’s type hints (which you might already be using) and transforms them into powerful validation logic.

When data doesn’t match your specifications, you get clear, actionable error messages instead of cryptic runtime exceptions. This reduces debugging time and makes your code more maintainable and self-documenting.

Pydantic models inherit from BaseModel and use Python type hints to define the expected data structure:

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

# Create a user
user = User(name="Alice", age="25", email="alice@example.com")
print(user.age)
print(type(user.age))

Output:

25
<class 'int'>

This code defines a User model with three required fields. When creating a user instance, Pydantic automatically converts the string “25” to the integer 25. If conversion isn’t possible (like passing “abc” for age), it raises a validation error with a clear message about what went wrong. This automatic type coercion is particularly useful when working with JSON data or form inputs where everything arrives as strings.
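
To see what the failure path looks like, here is a minimal sketch (not from the original article) that passes an age Pydantic cannot coerce and catches the resulting ValidationError:

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    email: str

try:
    # "abc" cannot be coerced to an integer, so validation fails
    User(name="Bob", age="abc", email="bob@example.com")
except ValidationError as e:
    print(e)  # reports an int_parsing error on the 'age' field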

Optional Fields and Defaults

Real-world data often has missing or optional fields. Pydantic handles this with Optional types and default values:

from pydantic import BaseModel, Field
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True
    category: str = Field(default="general", min_length=1)

# All these work
product1 = Product(name="Widget", price=9.99)
product2 = Product(name="Gadget", price=15.50, description="Useful tool")

The Optional[str] type means description can be a string or None. Fields with default values don’t need to be provided when creating instances. The Field() function adds validation constraints; here it ensures category has at least one character. This flexibility allows your models to handle incomplete data gracefully while still enforcing important business rules.
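
As a quick illustration (my own sketch, reusing the Product model above), violating the min_length constraint is reported the same way as a wrong type:

from pydantic import ValidationError

try:
    # An empty string violates min_length=1 on 'category'
    Product(name="Widget", price=9.99, category="")
except ValidationError as e:
    print(e)  # string_too_short error for 'category'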

Custom Validators in Pydantic

Sometimes you need validation logic beyond basic type checking. Validators let you implement custom rules:

from pydantic import BaseModel, field_validator
import re

class Account(BaseModel):
    username: str
    email: str
    password: str

    @field_validator('username')
    def validate_username(cls, v):
        if len(v) < 3:
            raise ValueError('Username must be at least 3 characters')
        if not v.isalnum():
            raise ValueError('Username must be alphanumeric')
        return v.lower()  # Normalize to lowercase

    @field_validator('email')
    def validate_email(cls, v):
        pattern = r'^[\w.-]+@[\w.-]+\.\w+$'
        if not re.match(pattern, v):
            raise ValueError('Invalid email format')
        return v

    @field_validator('password')
    def validate_password(cls, v):
        if len(v) < 8:
            raise ValueError('Password must be at least 8 characters')
        return v

account = Account(
    username="JohnDoe123",
    email="john@example.com",
    password="secretpass123"
)

Validators run automatically during model creation. They can transform data (like converting usernames to lowercase) or reject invalid values with descriptive error messages.

The cls parameter gives access to the class, and v is the value being validated. Fields are validated in the order they’re defined, and a validator can read previously validated fields by accepting a ValidationInfo argument and looking them up in info.data.
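
Here is a short sketch of that pattern (my own example, with a hypothetical Signup model): a validator that accepts a ValidationInfo argument and compares the current value against an already-validated field via info.data.

from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator

class Signup(BaseModel):
    password: str
    password_confirm: str

    @field_validator('password_confirm')
    def passwords_match(cls, v: str, info: ValidationInfo):
        # info.data contains fields that were already validated (here: password)
        if 'password' in info.data and v != info.data['password']:
            raise ValueError('Passwords do not match')
        return v

try:
    Signup(password="secretpass123", password_confirm="different")
except ValidationError as e:
    print(e)  # Value error, Passwords do not match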

Nested Models and Complex Structures

Real applications deal with hierarchical data. Pydantic makes nested validation straightforward:

from pydantic import BaseModel, field_validator
from typing import List, Optional
from datetime import datetime

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

    @field_validator('zip_code')
    def validate_zip(cls, v):
        if not v.isdigit() or len(v) != 5:
            raise ValueError('ZIP code must be 5 digits')
        return v

class Contact(BaseModel):
    name: str
    phone: str
    email: Optional[str] = None

class Company(BaseModel):
    name: str
    founded: datetime
    address: Address
    contacts: List[Contact]
    employee_count: int
    is_public: bool = False

# Complex nested data gets fully validated
company_data = {
    "name": "Tech Corp",
    "founded": "2020-01-15T10:00:00",
    "address": {
        "street": "123 Main St",
        "city": "San Francisco",
        "state": "CA",
        "zip_code": "94105"
    },
    "contacts": [
        {"name": "John Smith", "phone": "555-0123"},
        {"name": "Jane Doe", "phone": "555-0456", "email": "jane@techcorp.com"}
    ],
    "employee_count": 150
}

company = Company(**company_data)

Pydantic validates the entire structure recursively. The address gets validated according to the Address model rules, each contact in the contacts list is validated as a Contact model, and the datetime string is automatically parsed. If any part of the nested structure is invalid, you get a detailed error showing exactly where the problem occurs.

If all goes well, the company object will look like:

Company(name='Tech Corp', founded=datetime.datetime(2020, 1, 15, 10, 0), address=Address(street='123 Main St', city='San Francisco', state='CA', zip_code='94105'), contacts=[Contact(name='John Smith', phone='555-0123', email=None), Contact(name='Jane Doe', phone='555-0456', email='jane@techcorp.com')], employee_count=150, is_public=False)
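
Conversely, when a nested field is invalid, the error location traces the full path into the structure. A small sketch (my own, reusing the Company model and company_data from above):

from pydantic import ValidationError

bad_company = {**company_data, "address": {**company_data["address"], "zip_code": "ABCDE"}}

try:
    Company(**bad_company)
except ValidationError as e:
    # loc walks into the nested model: ('address', 'zip_code')
    print(e.errors()[0]['loc'])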

Working with APIs and JSON

Pydantic works well in handling API responses and JSON data, which often comes in unpredictable formats.

This example shows handling typical API challenges: mixed data types (age as string), various datetime formats, and optional fields:

from pydantic import BaseModel, Field, field_validator
from typing import Union, Optional
from datetime import datetime
import json

class APIResponse(BaseModel):
    status: str
    message: Optional[str] = None
    data: Optional[dict] = None
    timestamp: datetime = Field(default_factory=datetime.now)

class UserProfile(BaseModel):
    id: int
    username: str
    full_name: Optional[str] = None
    age: Optional[int] = Field(None, ge=0, le=150)  # Age constraints
    created_at: Union[datetime, str]  # Handle multiple formats
    is_verified: bool = False

    @field_validator('created_at', mode='before')
    def parse_created_at(cls, v):
        if isinstance(v, str):
            try:
                return datetime.fromisoformat(v.replace('Z', '+00:00'))
            except ValueError:
                raise ValueError('Invalid datetime format')
        return v

# Simulate API response
api_json = '''
{
    "status": "success",
    "data": {
        "id": 123,
        "username": "alice_dev",
        "full_name": "Alice Johnson",
        "age": "28",
        "created_at": "2023-01-15T10:30:00Z",
        "is_verified": true
    }
}
'''

response_data = json.loads(api_json)
api_response = APIResponse(**response_data)

if api_response.data:
    user = UserProfile(**api_response.data)
    print(f"User {user.username} created at {user.created_at}")

When you load the JSON response and create the user object, you’ll get the following output:

User alice_dev created at 2023-01-15 10:30:00+00:00

The mode='before' parameter on validators means they run before type conversion, allowing you to handle string inputs before they’re converted to the target type. Field constraints like ge=0, le=150 ensure age values are reasonable.
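
For instance (my own sketch, reusing the UserProfile model above), an age outside the allowed range is rejected with a constraint-specific error type:

from pydantic import ValidationError

try:
    # 200 violates the le=150 constraint on 'age'
    UserProfile(id=1, username="bob", age=200, created_at="2023-01-15T10:30:00Z")
except ValidationError as e:
    print(e.errors()[0]['type'])  # e.g. 'less_than_equal' in Pydantic v2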

Error Handling and Validation

When validation fails, Pydantic provides structured error information:

from pydantic import BaseModel, ValidationError, field_validator
from typing import List

class Order(BaseModel):
    order_id: int
    customer_email: str
    items: List[str]
    total: float

    @field_validator('total')
    def positive_total(cls, v):
        if v <= 0:
            raise ValueError('Total must be positive')
        return v

# Invalid data
bad_data = {
    "order_id": "not_a_number",
    "customer_email": "invalid_email",
    "items": "should_be_list",
    "total": -10.50  # Negative total fails the custom validator
}

try:
    order = Order(**bad_data)
except ValidationError as e:
    print("Validation errors:")
    for error in e.errors():
        field = error['loc'][0]
        message = error['msg']
        print(f"  {field}: {message}")

    # Get JSON representation of errors
    print("\nJSON errors:")
    print(e.json(indent=2))

Output:

Validation errors:
  order_id: Input should be a valid integer, unable to parse string as an integer
  items: Input should be a valid list
  total: Value error, Total must be positive

JSON errors:
[
  {
    "type": "int_parsing",
    "loc": [
      "order_id"
    ],
    "msg": "Input should be a valid integer, unable to parse string as an integer",
    "input": "not_a_number",
    "url": "https://errors.pydantic.dev/2.11/v/int_parsing"
  },
  {
    "type": "list_type",
    "loc": [
      "items"
    ],
    "msg": "Input should be a valid list",
    "input": "should_be_list",
    "url": "https://errors.pydantic.dev/2.11/v/list_type"
  },
  {
    "type": "value_error",
    "loc": [
      "total"
    ],
    "msg": "Value error, Total must be positive",
    "input": -10.5,
    "ctx": {
      "error": "Total must be positive"
    },
    "url": "https://errors.pydantic.dev/2.11/v/value_error"
  }
]

Pydantic’s error objects contain detailed information about what went wrong and where. Each error includes the field location, error type, and a human-readable message. This makes it easy to provide meaningful feedback to users or log detailed error information for debugging.
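
If you want to hand those errors to a web framework or a form layer, one common pattern is to flatten them into a field-to-message mapping. A minimal sketch (my own, reusing Order and bad_data from above; errors_by_field is a hypothetical helper, not part of Pydantic):

from pydantic import ValidationError

def errors_by_field(exc: ValidationError) -> dict:
    # Join the 'loc' tuple into a dotted path, e.g. "address.zip_code"
    return {".".join(str(part) for part in err["loc"]): err["msg"] for err in exc.errors()}

try:
    Order(**bad_data)
except ValidationError as e:
    print(errors_by_field(e))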

Serialization and Export

Converting models back to dictionaries or JSON is straightforward:

from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    name: str
    date: datetime
    attendees: int
    is_public: bool = True

event = Event(
    name="Python Meetup",
    date=datetime(2024, 3, 15, 18, 30),
    attendees=45
)

# Export to dictionary
event_dict = event.model_dump()
print(event_dict)

# Export to JSON string
event_json = event.model_dump_json()
print(event_json)

# Export with exclusions
public_data = event.model_dump(exclude={'attendees'})
print(public_data)

# Export with custom serialization
formatted_json = event.model_dump_json(indent=2)
print(formatted_json)

Output:

{'name': 'Python Meetup', 'date': datetime.datetime(2024, 3, 15, 18, 30), 'attendees': 45, 'is_public': True}

{"name":"Python Meetup","date":"2024-03-15T18:30:00","attendees":45,"is_public":true}

{'name': 'Python Meetup', 'date': datetime.datetime(2024, 3, 15, 18, 30), 'is_public': True}

{
  "name": "Python Meetup",
  "date": "2024-03-15T18:30:00",
  "attendees": 45,
  "is_public": true
}

The model_dump() and model_dump_json() methods provide flexible export options. You can exclude sensitive fields, include only specific fields, or customize how values are serialized. This is particularly useful when creating API responses where you need different representations of the same data for different contexts.
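
A couple of variations on those calls, as a sketch (assuming the same Event model as above): include= limits the export to selected fields, and mode='json' keeps the result a dictionary but converts values such as datetime to JSON-friendly types.

# Only the listed fields
public_view = event.model_dump(include={'name', 'date'})
print(public_view)

# Dictionary with JSON-compatible values (datetime becomes an ISO string)
json_ready = event.model_dump(mode='json')
print(json_ready)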

Conclusion

Pydantic transforms data validation from a tedious, error-prone task into an automatic, declarative process. Using Python’s type system, it provides runtime guarantees about your data structure while maintaining clean, readable code. Pydantic helps you catch errors early and build more reliable applications with less boilerplate code.

This article should give you a good foundation in Pydantic, from basic models to custom validators and nested structures. We’ve covered how to define data models with type hints, handle optional fields and defaults, create custom validation logic, and work with complex nested structures.

As you apply these concepts in your projects, you’ll learn additional features like serialization options, configuration settings, and advanced validation patterns. The patterns you’ve learned here will scale from simple scripts to complex applications. Keep experimenting with Pydantic’s features, and you’ll find it becomes an essential tool in your Python development workflow.
