Typed Django forms using Pydantic

Django Forms is a great and easy way to render, clean, and validate HTML form inputs, and it is easy to create custom fields and widgets for most needs. But one downside we have been struggling with since moving to static type checking (using MyPy and django-stubs) is the lack of type support for the validated data coming from cleaned_data. Knowing the types of the validated data helps prevent unnecessary bugs and allows for faster development, since you don't have to second-guess the underlying data.

We have created a wrapper around Django forms that, together with Pydantic, gives us more type information, as well as some runtime protection if the data somehow does not match the expected types.

from pydantic import BaseModel
from django import forms

class NoData(BaseModel): ...

class TypedForm[T_Valid: BaseModel, T_Invalid: BaseModel](forms.Form):
    Valid: type[T_Valid]
    Invalid: type[T_Invalid]

    # Alias to make it easy to specify Invalid = NoData
    NoData: type[BaseModel] = NoData

    def result(self) -> T_Valid | T_Invalid:
        if not self.is_bound:
            return self.Invalid(**{field: None for field in self.fields})

        if self.is_valid():
            return self.Valid(**self.cleaned_data)

        # The non-valid fields need to be supplied to the invalid class.
        missing_data = {key: None for key in self.errors}
        return self.Invalid(**self.cleaned_data, **missing_data)

In short, we leverage the fields and validation of Django forms to determine whether the form is valid to begin with. Depending on the outcome, we pass the cleaned data to one of two Pydantic models: one representing the valid data and one representing the invalid data (which can be partially validated). Pydantic only acts as a sanity check that cleaned_data matches the type definition; the validation itself is left entirely to Django forms.

It might not be apparent exactly how to use this or why it is even helpful. Let's put this into practice using a simple example.

forms.py

from django import forms
from pydantic import BaseModel

from typed_form import TypedForm


class ValidRegistrationData(BaseModel):
    name: str
    email: str
    age: int


class MyRegistrationForm(TypedForm):
    Valid = ValidRegistrationData
    Invalid = TypedForm.NoData

    name = forms.CharField()
    email = forms.EmailField()
    age = forms.IntegerField()

views.py

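from django.http import HttpRequest, HttpResponse
from django.shortcuts import render

from .forms import MyRegistrationForm

def create_user(*, name: str, email: str, age: int) -> User:
    # Placeholder; User is the project's user model (import omitted).
    pass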

def user_registration(request: HttpRequest) -> HttpResponse:
    form = MyRegistrationForm(request.POST or None)

    if request.POST:
        result = form.result()
        if isinstance(result, MyRegistrationForm.Valid):
            # At this point we know that 'result' only contains valid data and
            # Pydantic has verified that all the data is present.
            # We can now access the validated fields and create our user.
            user = create_user(
                name=result.name,
                email=result.email,
                age=result.age
            )

            return render(
                request,
                "registration_success.html.dj",
                {"user": user}
            )

    return render(
        request,
        "registration_form.html.dj",
        {"form": form}
    )

So why do all of this? Let's say that in a few months we change the form to use a DateField instead of an IntegerField for age. If we changed a regular form this way, our tooling would not complain about the difference in data types, since cleaned_data (which we would normally use) is typed as dict[str, Any]. With the above solution, the Pydantic definition has to change along with the form field (otherwise Pydantic rejects the data at runtime), and MyPy will then inform us, before runtime, that the type of age on the model no longer matches the type create_user expects.
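To illustrate the runtime half of this safety net, suppose the form field was switched to a DateField but the Pydantic model was left unchanged. The sketch below assumes Pydantic v2 and uses a hand-written dictionary as a stand-in for cleaned_data, so no Django setup is involved. The sample values are made up for illustration.

```python
import datetime

from pydantic import BaseModel, ValidationError


class ValidRegistrationData(BaseModel):
    name: str
    email: str
    age: int  # not updated after the form switched to a DateField


# Stand-in for what form.cleaned_data would hold once `age` is a DateField.
cleaned_data = {
    "name": "Ada",
    "email": "ada@example.com",
    "age": datetime.date(1990, 5, 17),
}

try:
    ValidRegistrationData(**cleaned_data)
except ValidationError as exc:
    # Pydantic refuses to build the model; each error names the offending field.
    failed_fields = [error["loc"][0] for error in exc.errors()]
else:
    failed_fields = []

print(failed_fields)
```

The mismatch surfaces as a ValidationError at the point where the model is built, rather than as a confusing failure somewhere further down in the view.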

This is a bit of a forced example, but hopefully you can see the benefits, especially when you are dealing with more complex views and templates.

Why the Invalid type?

This is a small tangent away from what the blog post is really about.

With the above solution you would never have to touch the Invalid definition except for setting it to TypedForm.NoData. But one of the nice features of Django forms is that even when the form doesn't pass the validation we might still get some useful data from it. Any field that passes the validation will have its cleaned data accessible through cleaned_data even while the form is not valid. This, together with us moving to use HTMX for many of our more complex forms and views, helps us when we want to rerender our forms, based on the content of the form, even when it is still partially filled/valid.

In these cases we can use the Invalid type to still get some help with typing. We declare a new model with the fields we are interested in when validation fails. Since the static type checker cannot know which fields passed validation, each field has to be declared as its specific type or None, where None means the field was not valid. With the following, we still get some type safety: MyPy informs us that result.birthday might be None and that we have to handle that case before accessing result.birthday.year.

# forms.py
import datetime

class InvalidRegistrationData(BaseModel):
    birthday: datetime.date | None

class MyRegistrationForm(TypedForm):
    Valid = ValidRegistrationData
    Invalid = InvalidRegistrationData

    birthday = forms.DateField()
    # The rest of the form implementation

# views.py
def user_registration(request: HttpRequest) -> HttpResponse:
    form = MyRegistrationForm(request.POST or None)
    result = form.result()

    if isinstance(result, form.Invalid):
        print(result.birthday.year)  # MyPy flags this: result.birthday is datetime.date | None.

Tangent over! But we might talk about this and how we use it with HTMX more in-depth in a later blog post.

Downsides

One of the potential downsides with this approach is the need to specify the expected/clean types as Pydantic models which results in more lines of code, and depending on the size of the form the models can grow quite large.

If you need full coverage of the fields between Valid and Invalid you will have some extra fields to declare, since Invalid would have to be a copy of Valid but with | None for all fields. In TypeScript you could use Partial<T> to make all fields of Valid optional, but no counterpart exists in Python yet, so we have to maintain this extra type ourselves. We have concluded that this is a downside we can live with.

Something similar to TypeScript's Partial<T> is being discussed, and when/if it arrives we could easily remove the Invalid type altogether.

Improvements

This implementation requires us to keep the Valid and Invalid types in sync. There is a chance that, when adding or removing a field, the Valid and Invalid definitions become "out of sync": a field exists in one of the definitions and not in the other, or their types don't match. To prevent this we have added validation that checks that all the fields of Invalid exist in Valid and that they are of the same types (excluding None). But I excluded that implementation from the blog post since there have been enough tangents already.

Personalkollen developer blog

Mattias Lepp
