Testing against the database with SQLAlchemy, pytest, and a factory

6/30/24 · 5 minutes · python testing sqlalchemy

When I am testing my code, I want to know that it is going to work against the production database. For obvious reasons, I do not want to run this against the actual production database.

Instead my tests should run against a copy that we are safe to break without repercussion. Additionally, we should be able to seed this test database with data of our choosing. This is a fairly common pattern referred to as an integration test.

Despite it being a common pattern, I am yet to come across a solution I am 100% satisfied with.

the Problem

To identify the problem, we must set the scene and define some success criteria.

Setting the Scene

The code in question is an API (FastAPI) that acts on a database. The API uses an ORM (SQLAlchemy). Tests are managed by pytest.

Migrations on the database are handled in a separate repo using Flyway.

Whilst this is the tech stack I use, the solution I am showing can be extended to any framework.

Defining Success

To successfully test our code, I have defined some criteria:

Use the most recent migration as a base for our test database.
Data is reset between tests.
Be able to seed different tests with different seed data.
Easily see which tests are using which data.

This will mean that the tests are always running on an up-to-date database, and edge cases in our system will be easy to cover.

the Solution

Now that we have defined the problem, let’s look at the proposed solution.

Database

For the integration tests to be actually useful, they must run against the

Migration

As mentioned previously, migrations are managed for the database.

During the CICD for the database migration repo, a Docker image is created of the database each time there is a change. This image is then pushed to a repository where it can be accessed later by the tests.

Testcontainer

Testcontainers is used to run this image whenever the integration tests are run. This ensures that tests are always run against an up-to-date database.

Here is a code example:

# conftest.py
from testcontainers.postgres import PostgresContainer


@pytest.fixture(scope="session")
def db_container():
    postgres_container = PostgresContainer(
        "ghcr.io/db-migrations/db:master",
        user="postgres",
        password="password",
        dbname="postgres",
        driver="psycopg",
    )
    postgres_container.start()
    yield postgres_container
    postgres_container.stop()

Seeding

A clean database has been setup. We now need to seed it with data to test against.

The Factory

The core of this solution is a Factory class. This is inspired by the Factory Boy package, but stripped down to the necessary essentials.

# factory.py
from typing import TypeVar

from sqlalchemy.orm import Session


T = TypeVar("T")


class _Factory:
    def __init__(self) -> None:
        self.db: None | Session = None

    def add_db_connection(self, db: Session) -> None:
        self.db = db

    def seed_model(self, model: type[T], data: dict) -> T:
        if not self.db:
            raise ValueError("Session not initialised")
        record = model(**data)
        self.db.add(record)
        self.db.flush()
        return record


factory = _Factory()

The seed_model method is able to add records into the database (I will show an example later). This utilises the SQLAlchemy models we have already defined as part of the application code.

The factory will have its database connection set during test startup.

Testing

Now we have a database spun up and a way of seeding data, we can start testing.

Test Client

A FastAPI TestClient that is connected to this database container is created as a pytest fixture and used to mock the API.

The TestClient fixture will look something like this:

# conftest.py
@pytest.fixture()
def client(db_connection: Connection):
    # needed for rollback
    db_connection.begin()

    TestingSessionLocal = sessionmaker(
        autocommit=False,
        autoflush=False,
        binds={
            DbBase: db_connection,
        },
    )

    db = TestingSessionLocal()
    # This is when the factory has it's connection set
    factory.add_db_connection(db)

    def override_get_db():
        try:
            yield db
        finally:
            db.close()

    app.dependency_overrides[get_db] = override_get_db
    client = TestClient(app)
    yield client
    db_connection.rollback()

A rollback is used to make sure that data is reset between each test.

Fixtures

To make sure our tests are readable and can be seeded with different data, I decided to utilise pytest fixtures. These are built to be explicit, modular, and scalable. This means that any code defined in a fixture will only be run if that fixture is used in the test that is running.

Example - Testing Users

Here is an example implementation where we are testing if a user is present.

# test_user.py
@pytest.fixture
def user():
    user_obj = factory.seed(
        Annotation,
        {
            "name": "John Smith",
            "email": "jsmith@gmail.com",
        },
    )
    return user_obj


def test_get_user(client: TestClient, user: User):
    response = client.get(f"/user")
    assert response.status_code == 200
    assert response.json() == [{
        "id": user.id,
        "name": "John Smith",
        "email": "jsmith@gmail.com",
    }]


def test_get_user_with_no_users(client: TestClient):
    # notice that the 'user' fixture is not used in this test
    response = client.get(f"/user")
    assert response.json() == []

We can see that if the user fixture is not used, then the user is not seeded and nothing is returned.

Flow Diagram

Pardon the messy diagram, but start at Create DB Rollback and it will help illustrate the flow that is followed for each individual test.

Problems

Between rollbacks sequences do not reset. This means that id’s will continue incrementing if you run multiple tests at once.

You can exclude id checks in your tests. Or if you are lazy like me, you can define a class that asserts equal to any integer.

# mock.py
class SequenceMock:
    """Mock a database sequence.

    On the first assertion call, the value must be an integer.
    On subsequent calls, it must be the initial integer value that was used in the
    first assertion call.
    """

    def __init__(self):
        self.saved_value = None

    def __eq__(self, other) -> bool:
        if self.saved_value is None:
            if isinstance(other, int):
                self.saved_value = other
                return True
            return False
        else:
            return self.saved_value == other

Conclusion

Whilst this solution is not perfect, it does satisfy the criteria I required for a passable solution.

Use the most recent migration as a base for our test database. ✅
Data is reset between tests. ✅
Be able to seed different tests with different seed data. ✅
Easily see which tests are using which data. ✅

If I improve this solution or find a better one, I will update this post!