Improve your Django tests with fakes and factories: Advanced usage

Dec 13, 2021
Categories: Django, Python

In part 1 of this article, we showed the basics of using fakes and factories and how they can help us write better tests.

In this blog post, we'd like to share some advanced and helpful tips about using fakes and factories that we've learned from experience.

They might help you improve slow tests and lower the chance of errors due to the wrong setup.

Factory.build() vs Factory.create()

Factory.build() returns a new object that is not yet saved in the database, while Factory.create() also saves it.

This is helpful in situations where you need the object but don't need it in the database, because skipping the save makes the test faster.

Possible use cases where you can apply this:

  • A method that receives an object and performs some validation over its fields. If this is not related to any database queries, use Factory.build() in your tests
  • A service that performs some small validation at the beginning of its definition. This service receives some model instances as arguments. When you test the validation, you can build() the passed objects if you don't need them in the database
  • A selector that is grouping some passed data. If this selector does not perform any database queries, build() the passed data instead of creating it

LazyAttribute

Now, let's take a look at some patterns that we follow when we define our factories.

We use LazyAttribute a lot. I literally can't think of a factory that we've written that doesn't use it.

Despite this fact, we've just recently found out about a really powerful feature of LazyAttribute. As written in the docs:

The LazyAttribute handles such cases: it should receive a function taking the object being built and returning the value for the field

This feature becomes very handy when you have a dependency between the fields of your models.

Let's look at the following example:

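from django.db import models
from django.db.models import F, Q


class SchoolCourse(models.Model):
    # Other fields (such as the school relation used later) are omitted for brevity.
    start_date = models.DateField()
    end_date = models.DateField()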

    class Meta:
        constraints = [
            models.CheckConstraint(
                name="school_course_start_before_end",
                check=Q(start_date__lt=F("end_date"))
            )
        ]

As you can imagine, we'd want to make sure that the SchoolCourseFactory always generates instances with proper start and end dates by default.

This is how we used to accomplish this before we started using LazyAttribute properly:

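import factory
from faker import Faker

faker = Faker()  # Assumption: a module-level Faker instance; the project may share one.


class SchoolCourseFactory(factory.django.DjangoModelFactory):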
    class Meta:
        model = SchoolCourse

    start_date = factory.LazyAttribute(lambda _: faker.past_date())
    end_date = factory.LazyAttribute(lambda _: faker.future_date())

This definition looks perfectly fine at first sight. The issue is that the two dates are generated independently. If you want a future school course and override only start_date, the end_date is still a random future date that may fall before it, so you always need to set both fields manually.

In [5]: for _ in range(100000):
   ...:     SchoolCourseFactory(start_date=faker.future_date())
   ...: 
---------------------------------------------------------------------------
CheckViolation                            Traceback (most recent call last)
~/.virtualenvs/styleguide/lib/python3.9/site-packages/django/db/backends/utils.py in _execute(self, sql, params, *ignored_wrapper_args)
     83             else:
---> 84                 return self.cursor.execute(sql, params)
     85 

CheckViolation: new row for relation "test_examples_schoolcourse" violates check constraint "school_course_start_before_end"
DETAIL:  Failing row contains (21, Each catch, each-catch, 2021-12-27, 2021-12-16, 6).

We've actually had a lot of problems with tests that fail randomly on CI just because of definitions like this one.
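A rough way to see why such tests fail only some of the time is to enumerate the possible date pairs. The sketch below uses neither factory_boy nor Faker. It assumes that both faker.future_date() calls draw independently from the next 30 days (Faker's default window, as far as we know), and the fixed today value is an arbitrary choice for illustration.

```python
from datetime import date, timedelta
from itertools import product

today = date(2021, 12, 13)  # arbitrary reference date for the illustration
window = range(1, 31)  # assumption: future dates fall 1..30 days ahead

# Every (start_date, end_date) pair that two independent draws could produce.
pairs = [
    (today + timedelta(days=start_offset), today + timedelta(days=end_offset))
    for start_offset, end_offset in product(window, repeat=2)
]

# The check constraint requires start_date < end_date.
violations = [(start, end) for start, end in pairs if not start < end]

print(f"{len(violations)} of {len(pairs)} combinations violate start_date < end_date")
# prints: 465 of 900 combinations violate start_date < end_date
```

Under these assumptions roughly half of the combinations break the constraint, which is why a test built on this factory can pass on one run and fail on the next.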

Here is how you can define the same Factory and make sure that the end_date will be after the start_date:

class SchoolCourseFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = SchoolCourse

    start_date = factory.LazyAttribute(lambda _: faker.past_date())
    end_date = factory.LazyAttribute(lambda _self: _self.start_date + timedelta(days=365))

In [14]: for course in SchoolCourseFactory.build_batch(5):
    ...:     print(course.start_date, course.end_date)
    ...: 
2021-11-26 2022-11-26
2021-11-04 2022-11-04
2021-11-19 2022-11-19
2021-11-09 2022-11-09
2021-11-21 2022-11-21

As you can see, the _self argument of the lambda function is key here: it is the object being built, so end_date can be derived from start_date.

NOTE: We try to limit problems with the default generation of the factory objects. You can always pass start_date and end_date to the Factory if you want to change this.

SelfAttribute

The SelfAttribute is another powerful tool that comes from factory_boy.

Let's look at the following models in order to illustrate it:

class Student(models.Model):
    email = models.EmailField(max_length=255)
    school = models.ForeignKey(School, related_name='students', on_delete=models.CASCADE)

    class Meta:
        unique_together = ('email', 'school', )

class Roster(models.Model):
    student = models.ForeignKey(Student, related_name='rosters', on_delete=models.CASCADE)
    school_course = models.ForeignKey(SchoolCourse, related_name='rosters', on_delete=models.CASCADE)

    start_date = models.DateField()
    end_date = models.DateField()

NOTE: The Roster model represents that a Student is taking part in a School Course.

Here is how we used to define the RosterFactory before we learned about SelfAttribute:

class RosterFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Roster

    student = factory.SubFactory(StudentFactory)
    school_course = factory.SubFactory(SchoolCourseFactory)
    start_date = factory.LazyAttribute(lambda _: faker.past_date())
    end_date = factory.LazyAttribute(lambda self: self.start_date + timedelta(days=365))

Again, everything looks perfectly fine with this definition, right?

There are a couple of problems with this implementation that might produce randomly failing tests though.

Problem 1 - Non-overlapping periods

Let's generate an instance of the above factory and check its period:

In [1]: roster = RosterFactory.build()

In [2]: roster.start_date, roster.end_date
Out [2]: (datetime.date(2021, 10, 20), datetime.date(2022, 10, 20))

In [3]: roster.school_course.start_date, roster.school_course.end_date
Out [3]: (datetime.date(2021, 11, 24), datetime.date(2022, 11, 24))

As you can see, our Roster starts before the related SchoolCourse does, so part of its period falls outside the course period. This might not be an issue for some of our tests, but it might lead to unexpected behavior in others.

Here is how we can easily solve this by using SelfAttribute:

class RosterFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Roster

    student = factory.SubFactory(StudentFactory)
    school_course = factory.SubFactory(SchoolCourseFactory)
    start_date = factory.SelfAttribute('school_course.start_date')
    end_date = factory.SelfAttribute('school_course.end_date')

In [1]: roster = RosterFactory.build()

In [2]: roster.start_date, roster.end_date
Out [2]: (datetime.date(2021, 10, 31), datetime.date(2022, 10, 31))

In [3]: roster.school_course.start_date, roster.school_course.end_date
Out [3]: (datetime.date(2021, 10, 31), datetime.date(2022, 10, 31))

This implementation says: "I want my roster period to be the same as the course period" which should be a valid statement for most of the use cases.

Again, this only solves potential problems with the default generation of the factory objects.

Problem 2 - Unexpected Relations

Another problem that you might have with the above implementation comes from relations of sub-factories. This happens relatively often and can lead to unexpected behavior as well.

Let's take our Roster object and show what we mean:

In [1]: roster.student.school
Out [1]: <School: Johnson Inc School>

In [2]: roster.school_course.school
Out [2]: <School: Bridges and Sons School>

This looks strange and could be misleading for the developers.

Instead, we'd like to have something like: "I want my roster's course to be in the school of the generated student by default".

We can achieve this by using the combination of SubFactory and SelfAttribute:

class RosterFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Roster

    student = factory.SubFactory(StudentFactory)
    school_course = factory.SubFactory(
        SchoolCourseFactory,
        school=factory.SelfAttribute('..student.school')
    )
    start_date = factory.SelfAttribute('school_course.start_date')
    end_date = factory.SelfAttribute('school_course.end_date')

In [1]: roster = RosterFactory.build()

In [2]: roster.student.school
Out [2]: <School: Rodriguez-Griffith School>

In [3]: roster.school_course.school
Out [3]: <School: Rodriguez-Griffith School>

The double-dot notation

The double-dot notation refers to the parent factory (in our case RosterFactory) from which the current sub-factory (in our case SchoolCourseFactory) is being called. This is well described in the factory_boy docs.

If the double-dot notation is not up to your taste, you can achieve the same behavior by using the LazyAttribute, making the code a bit more explicit:

class RosterFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Roster

    student = factory.SubFactory(StudentFactory)
    school_course = factory.SubFactory(
        SchoolCourseFactory,
        school=factory.LazyAttribute(lambda course: course.factory_parent.student.school)
    )
    start_date = factory.LazyAttribute(lambda _self: _self.school_course.start_date)
    end_date = factory.LazyAttribute(lambda _self: _self.school_course.end_date)

NOTE: Take a look at factory_parent here. It refers to the parent object under construction, in our case the Roster being built by RosterFactory.

Introducing helper factories

We've started using two types of "helper factories" which you might find useful as well.

The main idea is to reduce the verbosity & make the test setup cleaner.

Extending your factories

For example, if we observe that a lot of tests are dealing with Rosters that need to be in some chronological order, one after the other, we might want to do something like this:

def get_future_roster_start_date(roster_obj):
    if not roster_obj.start_after:
        return faker.future_date()

    return roster_obj.start_after + timedelta(days=faker.pyint(2, 100))

class FutureRosterFactory(RosterFactory):
    class Params:
        start_after = None

    start_date = factory.LazyAttribute(get_future_roster_start_date)

And here is how you can use it:

In [1]: roster = RosterFactory.build()

In [2]: future_roster1 = FutureRosterFactory.build(start_after=roster.start_date)

In [3]: future_roster2 = FutureRosterFactory.build(start_after=future_roster1.start_date)

In [4]: roster.start_date, future_roster1.start_date, future_roster2.start_date
Out [4]: (datetime.date(2021, 11, 25),
 datetime.date(2022, 3, 1),
 datetime.date(2022, 5, 13))

NOTE: In the Params class you can list all arguments that are factory class specific. They won't be passed to the generated instance.

In [13]: future_roster1.start_after
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-9299608b2d13> in <module>
----> 1 future_roster1.start_after

AttributeError: 'Roster' object has no attribute 'start_after'

Parent with children factory

If we observe that a lot of tests require a specific parent object to come hand-in-hand with its child objects, we might want to make our lives a bit easier.

Let's take our SchoolCourse model. You'd most likely have services and/or selectors that work with school courses that have rosters in them.

Here's a helper factory dealing with this:

class SchoolCourseWithRostersFactory(SchoolCourseFactory):
    @factory.post_generation
    def rosters(obj, create, extracted, **kwargs):
        if create:
            rosters = extracted or RosterFactory.create_batch(
                kwargs.pop('count', 5),
                **kwargs,
                student__school=obj.school  # NOTE!
            )

            obj.rosters.set(rosters)

            return rosters

And here is how you can use it:

In [1]: course1 = SchoolCourseWithRostersFactory()

In [2]: course1.rosters.count()
Out[2]: 5

In [3]: roster = RosterFactory()

In [4]: course2 = SchoolCourseWithRostersFactory(rosters=[roster])

In [5]: course2.rosters.all()
Out[5]: <QuerySet [<Roster: Roster object (6)>]>

In [6]: course3 = SchoolCourseWithRostersFactory(rosters__count=10)

In [7]: course3.rosters.count()
Out[7]: 10

There are several important points here:

  • @factory.post_generation is a post-generation hook from factory_boy. It's invoked after the model object is created
  • The obj argument is the model object that's just been generated
  • The create argument is a boolean which is True if the create() strategy is being used and False otherwise (the build() strategy)
  • extracted is the value passed for the defined attribute when the Factory is called, if any. SchoolCourseWithRostersFactory(rosters=some_generated_rosters) → extracted == some_generated_rosters
  • kwargs are the optional arguments passed via the double underscores of the defined attribute. SchoolCourseWithRostersFactory(rosters__count=10) → kwargs == {'count': 10}
  • As you may have noticed, this example is going to work only with the create() strategy. This is a limitation that comes from the fact that the rosters set is a reverse relation in the ORM, so the related objects have to exist in the database.

The moral of the story: whenever you see a pattern emerging, create additional helpers to make your tests clearer.

Conclusion

The maintainers of factory_boy have done a great job! All of the above examples along with many others can be found in their awesome documentation.

The main goal of this article is to give you some practical tips on how to improve the definition of your factories.

We've learned them the hard way. We hope that this blog post might save you from debugging tests that fail from time to time on CI.

The examples above can be found in our Django Styleguide Example.

You can find more useful Django-related tips in our blog and our Django Styleguide.
