Django related fields & the caveats of implicit SQL INNER JOINs

Ventsislav Tashev
Oct 8, 2021
Categories:Django

Django's ORM is one of the most well-designed features of the framework. It provides a wrapper around the database that allows developers to handle database operations with ease and predictability.

At least most of the time.

N.B. The code structure in this article follows HackSoft's Django style guide. Terminologies like "selector", which will be used throughout the text, are described in detail there. ✌️


Model structure

For the purpose of this article, we'll use the following models:

class User(Model):
    ...


class School(Model):
    name = CharField(max_length=256)
    users = ManyToManyField(
        User,
        through='SchoolMember',
        related_name='schools'
    )
    
    
class SchoolMember(Model):
    school = ForeignKey(
        School,
        on_delete=models.CASCADE,
        related_name='members'
    )
    user = ForeignKey(
        User,
        on_delete=models.CASCADE,
        related_name='school_memberships'
    )

Or simply said:

The requirement

With this model structure, we have a simple task - get all the schools for a given user.

One specific requirement is to compute the count of all memberships for each school in the result.

Imagine having the following setup:

The desired result when fetching all the schools for User 1 would be (as JSON):

[
    {
        "id": 1,
        "name": "School 1",
        "members_count": 2
    },
    {
        "id": 2,
        "name": "School 2",
        "members_count": 2
    }
]

The go-to approach

As we've been using Django for quite some time - this task looks straightforward. We will get all schools for the user and we will annotate the members_count via the Count aggregation function in Django.

We've implemented this simple selector:

def user_get_schools(*, user: User) -> QuerySet[School]:
    return user.schools.annotate(members_count=Count('members', distinct=True))

We thought this will compute the desired result described in the requirement, but surprisingly - it was wrong. When we call the selector with User 1, we'll get this result:

[
    {
        "id": 1,
        "name": "School 1",
        "members_count": 1  # <-- Incorrect ❌
    },
    {
        "id": 2,
        "name": "School 2",
        "members_count": 1  # <-- Incorrect ❌ 
    }
]

The returned schools are the correct ones, but the members_count is equal to 1. If we call the selector with another user - this value stays the same. We expect both schools to have members_count equal to 2, since we have two school members in each of them.

Inspecting the query

The easiest way to find what is yielding the wrong result is to inspect the raw SQL query. We've used django-debug-toolbar for the job.

The SQL query for the request looks like this:

SELECT 
  "schools_school"."id", 
  "schools_school"."name", 
  COUNT(
    DISTINCT "schools_schoolmember"."id"
  ) AS "members_count" 
FROM 
  "schools_school" 
  INNER JOIN "schools_schoolmember" ON (
    "schools_school"."id" = "schools_schoolmember"."school_id"
  ) 
WHERE 
  "schools_schoolmember"."user_id" = 1

As we're accessing user.schools via the related_name of the field for the many-to-many relationship, Django performs an INNER JOIN behind the scenes in order to compute the results.

Then as we try to aggregate the members_count in the SELECT - the results are stripped down only to the ones from the INNER JOIN (the blue selection):

This is wrong because the INNER JOIN selects records that have matching values in both tables. And since the SchoolMembers are filtered behind the scenes (see the WHERE clause in the SQL queryπŸ‘†), the COUNT aggregation "sees" only these results.

That's why we will constantly see the members_count value to actually represent the current user's members count in a school and not the count of all members in a school, which is the requirement.


This is how Django works by default - it performs an implicit INNER JOIN in order to return related entries. In most cases, this behavior is correct and you will get correct data.

In some happy place, if we just wanted to return the id and the name of the schools, without data from external tables - this blog post would've never existed.

But like everything in software - there are some caveats. πŸ™‚

It's not a bug, it's a feature.

The correct implementation

Once we know what is causing the problem, the solution comes easy.

We need to rework the selector a bit so that it does not perform an INNER JOIN behind the scenes:

def user_get_schools(*, user: User) -> QuerySet[School]:
    user_schools_ids = user.schools.values_list('id', flat=True)
    
    return School.objects\
        .filter(id__in=user_schools_ids)\
        .annotate(members_count=Count('members', distinct=True))

Here we use the related name again (user.schools), but this time we save the list of these IDs in a separate variable (user_schools_ids). Then when constructing the result query, we use the School.objects manager directly and filter the queryset with the id__in lookup so that we still get the correct user schools.

Now the query looks like this:

SELECT 
  "schools_school"."id", 
  "schools_school"."name", 
  COUNT(
    DISTINCT "schools_schoolmember"."id"
  ) AS "members_count" 
FROM 
  "schools_school" 
  LEFT OUTER JOIN "schools_schoolmember" ON (
    "schools_school"."id" = "schools_schoolmember"."school_id"
  ) 
WHERE 
  (
    "schools_school"."id" IN (
      SELECT 
        U0."id" 
      FROM 
        "schools_school" U0 
        INNER JOIN "schools_schoolmember" U1 ON (U0."id" = U1."school_id") 
      WHERE 
        U1."user_id" = 1
    )

And respectfully, the JSON result is now correct:

[
    {
        "id": 1,
        "name": "School 1",
        "members_count": 2  # <-- Correct βœ…
    },
    {
        "id": 2,
        "name": "School 2",
        "members_count": 2  # <-- Correct βœ…
    }
]

As you can see, Django now performs a LEFT OUTER JOIN when computing the query, instead of an INNER JOIN. If you look closely, it still performs an INNER JOIN, but this time it is for the inner SELECT query that we have in the WHERE clause.

Visualized as a Venn diagram, it looks like this:

You can clearly see that we now have the full set of SchoolMembers that we need while still filtering the Schools, thanks to the LEFT OUTER JOIN. πŸŽ‰


I hope that this article will help you learn more about Django ORM's behind-the-scenes mechanics when using related sets. For more interesting Django blog posts - check out the rest of the Django-related articles in our blog. πŸ™Œ