Simplifying Data Validation in Python

While exploring AI agents, I came across two interesting libraries: Pydantic and Logfire. In this article, you will learn about Pydantic through code examples and see what it brings to the table for data validation in Python.

Pydantic is a powerful Python library that uses type annotations to validate data structures. It’s become an essential tool for many Python developers, especially those working on web applications, APIs, and data-intensive projects.

Why You Need Pydantic

  1. Robust data validation. Pydantic ensures your data matches expected types and formats.
  2. Improved code readability. Type hints make your code’s intent clearer.
  3. Automatic error handling. Get detailed error messages for invalid data.
  4. High performance. Pydantic is optimized for speed.
  5. Easy integration. Works well with popular frameworks like FastAPI and Django.

Core Features and Examples

1. Basic Model Definition

Pydantic’s core functionality is data validation. It uses Python-type hints to automatically validate the structure and types of data. When you define a Pydantic model, each field is annotated with its expected type. Pydantic then ensures that any data assigned to these fields conforms to the specified types.

Let’s start with a simple example:

from pydantic import BaseModel
from typing import List, Optional
import logfire


class User(BaseModel):
    username: str
    age: int
    email: str
    is_active: bool = True
    tags: List[str] = []
    profile_picture: Optional[str] = None


logfire.configure()
user = User(username="johndoe", age=30, email="johndoe@example.com", tags=["python", "developer"])

logfire.info(f"{user}")

In this case, Pydantic will ensure that username and email are strings and that age is an integer. If you try to create a User with invalid data types, Pydantic will raise a ValidationError.
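For illustration, here is a quick, hedged sketch of what that failure looks like; the invalid values are made up and are not part of the original example:

from pydantic import ValidationError

try:
    # age cannot be coerced to an int and email is missing entirely
    User(username="janedoe", age="not-a-number")
except ValidationError as e:
    # The error lists every failing field with a readable message
    print(e)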

You can see that I used Logfire; I will discuss it in another article. For now, it is enough to know that Logfire is an observability platform from the Pydantic team.

Pydantic can also coerce data into the expected types. For example, if you pass age="30", there won't be a ValidationError; Pydantic will coerce the value to an integer. For strict types, refer to the Pydantic documentation.
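A minimal sketch of this coercion behaviour, reusing the User model above; the strict variant relies on Pydantic's strict mode, and the exact error wording may differ between versions:

from pydantic import ValidationError

coerced = User(username="johndoe", age="30", email="johndoe@example.com")
print(type(coerced.age))  # <class 'int'> - the string "30" was coerced

# In strict mode the same input is rejected instead of coerced
try:
    User.model_validate(
        {"username": "johndoe", "age": "30", "email": "johndoe@example.com"},
        strict=True,
    )
except ValidationError as e:
    print(e)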

This is how the first example looks in an IDE, with Pydantic installed via pip install pydantic.

Pydantic on an IDE

Another simple example:

from typing import List, Optional
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str
    publication_year: int
    isbn: str
    genres: List[str]
    description: Optional[str] = None


# Creating a valid book instance

book = Book(
    title="The Hitchhiker's Guide to the Galaxy",
    author="Douglas Adams",
    publication_year=1979,
    isbn="0-330-25864-8",
    genres=["Science Fiction", "Comedy"],
)


print(book)

This example defines a Book model with various fields. Pydantic will ensure that all required fields are provided and that they match the specified types.
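As a quick, hedged illustration (not part of the original snippet), leaving out required fields raises a ValidationError that names each missing field:

from pydantic import ValidationError

try:
    # isbn and genres are required but missing here
    Book(title="Untitled", author="Unknown", publication_year=2020)
except ValidationError as e:
    print(e)  # reports "Field required" for isbn and genres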

2. Nested Models

Pydantic supports a wide range of data types, including nested models, lists, dictionaries, and more, so you can build structures that mirror your application's needs. Nesting models is especially useful for complex, hierarchical data:

from pydantic import BaseModel
from typing import List


class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str


class Author(BaseModel):
    name: str
    age: int
    address: Address


class Book(BaseModel):
    title: str
    author: Author
    genres: List[str]


# Creating a book with nested author and address

book = Book(
    title="1984",
    author=Author(
        name="George Orwell",
        age=46,
        address=Address(
            street="50 Lawford Road",
            city="London",
            country="United Kingdom",
            postal_code="N1 5BJ"
        )
    ),
    genres=["Dystopian", "Political Fiction"]
)

print(book)

This example shows how you can nest models within each other, allowing for complex, hierarchical data structures.
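Nested validation also works on plain dictionaries; a minimal sketch, assuming the Book, Author, and Address models defined above:

raw = {
    "title": "1984",
    "author": {
        "name": "George Orwell",
        "age": 46,
        "address": {
            "street": "50 Lawford Road",
            "city": "London",
            "country": "United Kingdom",
            "postal_code": "N1 5BJ",
        },
    },
    "genres": ["Dystopian", "Political Fiction"],
}

# Nested dicts are converted into Author and Address instances automatically
book = Book.model_validate(raw)
print(book.author.address.city)  # London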

3. Custom Validators

Pydantic allows you to define custom validation logic using decorators. This is useful for implementing business rules or complex validation scenarios:

import re
from pydantic import BaseModel, field_validator


class User(BaseModel):
    username: str
    email: str
    password: str

    @field_validator('username')
    @classmethod
    def username_alphanumeric(cls, v):
        assert v.isalnum(), 'Username must be alphanumeric'
        return v

    @field_validator('email')
    @classmethod
    def email_valid(cls, v):
        regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        assert re.match(regex, v), 'Invalid email format'
        return v

    @field_validator('password')
    @classmethod
    def password_strength(cls, v):
        assert len(v) >= 8, 'Password must be at least 8 characters'
        assert any(c.isupper() for c in v), 'Password must contain an uppercase letter'
        assert any(c.islower() for c in v), 'Password must contain a lowercase letter'
        assert any(c.isdigit() for c in v), 'Password must contain a digit'
        return v


# Try creating users

try:
    # "johndoe" is purely alphanumeric (an underscore would fail the check)
    user1 = User(username="johndoe", email="johndoe@example.com", password="StrongPass1")
    print("Valid user:", user1)
except ValueError as e:
    print("Validation error:", e)

try:
    user2 = User(username="alice!", email="invalid-email", password="weak")
    print("Valid user:", user2)
except ValueError as e:
    print("Validation error:", e)

This example demonstrates custom validators for username, email, and password fields, ensuring that the data meets specific criteria.
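One caveat: assert statements are skipped when Python runs with the -O flag, so raising ValueError is the more robust pattern in validators. A minimal sketch of the username check written that way (SafeUser is a hypothetical name, not from the original example):

from pydantic import BaseModel, field_validator


class SafeUser(BaseModel):
    username: str

    @field_validator('username')
    @classmethod
    def username_alphanumeric(cls, v: str) -> str:
        # Raising ValueError survives python -O, unlike assert
        if not v.isalnum():
            raise ValueError('Username must be alphanumeric')
        return v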

4. Config and Field Constraints

Pydantic models can be configured with various options to control their behavior, and you can add constraints to individual fields:

from typing import List
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Product(BaseModel):
    # frozen=True makes instances immutable; extra="forbid" rejects unknown fields
    model_config = ConfigDict(frozen=True, extra="forbid")

    id: int = Field(..., gt=0)
    name: str = Field(..., min_length=3, max_length=50)
    price: float = Field(..., ge=0)
    tags: List[str] = Field(default_factory=list, max_length=5)


# Creating a valid product

product = Product(id=1, name="Laptop", price=999.99, tags=["electronics", "computer"])
print(product)

# Attempting to create an invalid product

try:
    invalid_product = Product(
        id=0, name="TV", price=-100, tags=["a", "b", "c", "d", "e", "f"]
    )
except ValidationError as e:
    print("Validation error:", e)

# Attempting to modify the product (not allowed because frozen=True)

try:
    product.price = 899.99
except ValidationError as e:
    print("Modification error:", e)

This example shows how to use Field to add constraints to individual fields and how to configure model behavior with model_config (ConfigDict).
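Because extra="forbid" is set, unknown fields are rejected as well; a short hedged sketch, reusing the Product model and ValidationError import from above:

try:
    # "color" is not a declared field, so extra="forbid" rejects it
    Product(id=2, name="Monitor", price=199.99, color="black")
except ValidationError as e:
    print("Validation error:", e)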

5. Working With JSON

Pydantic seamlessly integrates with JSON data. You can easily parse JSON into Pydantic models and convert models back to JSON:

from typing import List
from pydantic import BaseModel

class Comment(BaseModel):
    id: int
    text: str

class Post(BaseModel):
    id: int
    title: str
    content: str
    comments: List[Comment]


# JSON data
json_data = """
{
    "id": 1,
    "title": "Hello, Pydantic!",
    "content": "This is a post about Pydantic.",
    "comments": [
        {"id": 1, "text": "Great post!"},
        {"id": 2, "text": "Thanks for sharing."}
    ]
}
"""
# Parse JSON data into a Pydantic model
post = Post.model_validate_json(json_data)
print(post)

# Convert Pydantic model back to JSON
post_json = post.model_dump_json()
print(post_json)

This example demonstrates how Pydantic can easily parse JSON data into Python objects and vice versa.
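Beyond JSON strings, models also convert to and from plain Python dicts; a small sketch assuming the Post instance above:

# Convert the model to a plain dict (nested models become nested dicts)
post_dict = post.model_dump()
print(post_dict["comments"][0]["text"])  # Great post!

# Validate a plain dict directly back into a model
post_copy = Post.model_validate(post_dict)
print(post_copy == post)  # True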

Additional Features

Settings Management

Pydantic can be used to manage application settings, providing type-safe configuration handling:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    database_url: str         # read from the DATABASE_URL environment variable
    api_key: str              # read from the API_KEY environment variable
    debug_mode: bool = False  # optional, defaults to False

settings = Settings()  # raises a ValidationError if required variables are missing
print(settings)

This allows you to load settings from environment variables or configuration files while ensuring type safety and providing default values.
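A minimal sketch of how the values get picked up, setting the environment variables in-process purely for illustration (in practice they would come from the shell or a .env file):

import os

os.environ["DATABASE_URL"] = "postgresql://localhost/mydb"
os.environ["API_KEY"] = "secret-key"

settings = Settings()
print(settings.database_url)  # postgresql://localhost/mydb
print(settings.debug_mode)    # False (default, since DEBUG_MODE is not set)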

Schema Generation

Pydantic can automatically generate JSON Schema from your models, which is useful for API documentation. Here is the schema for the User model from the first example:

import json
print(json.dumps(User.model_json_schema(), indent=4))

Output:

{
    "properties": {
        "username": {
            "title": "Username",
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "email": {
            "title": "Email",
            "type": "string"
        },
        "is_active": {
            "default": true,
            "title": "Is Active",
            "type": "boolean"
        },
        "tags": {
            "default": [],
            "items": {
                "type": "string"
            },
            "title": "Tags",
            "type": "array"
        },
        "profile_picture": {
            "anyOf": [
                {
                    "type": "string"
                },
                {
                    "type": "null"
                }
            ],
            "default": null,
            "title": "Profile Picture"
        }
    },
    "required": [
        "username",
        "age",
        "email"
    ],
    "title": "User",
    "type": "object"
}

This feature is particularly valuable when working with OpenAPI (Swagger) specifications.

You can find the Jupyter notebook with the Pydantic code here.

Conclusion

Pydantic is a versatile and powerful tool for data validation in Python. Its use of type annotations makes it intuitive for Python developers, while its extensive features allow for complex validation scenarios. By using Pydantic in your projects, you can ensure data integrity, improve code readability, and catch errors early in the development process. Whether you are building APIs, working with configuration files, or processing data from various sources, Pydantic can significantly simplify your data-handling tasks and make your code more robust and maintainable.
