Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport libraries home performance fix #34950

Conversation

jesperhodge
Copy link
Member

@jesperhodge jesperhodge commented Jun 7, 2024

Description

Original PR: #34523

This is a fix for a performance issue on the libraries home page. It has some drawbacks so please read through the description.

Certain course instructor accounts that had hundreds of libraries and courses encountered very long loading times for the libraries home page, that is when you go to studio and click the libraries tab. Loading times for these users were 45+ seconds and some users could never open the page at all because timeout errors occurred. This never happens for users with few courses/libraries or for global staff users.

The reason for this was imperformant Python code. Python lookup time for course instructors for libraries home had a complexity of O(n*m), where n is the number of all libraries that exist on the openedx instance and m is the number of all assigned roles which is at least the number of assigned courses + assigned libraries. This code did not apply to global staff users.

We changed the lookup time to O(n).

The drawback here is that some other places are now less performant. Certain requests that include editing libraries blocks and loading the library edit modal, but also presumably other places where a user's roles need to be checked, initialize the RoleCache for a user once, which calculates all the users assigned roles. This one-time calculation is now less performant and has a runtime of O(m) where m is the number of assigned roles.

There are further optimization opportunities but the feedback we got and our evaluation of this is that the libraries_home performance problem was a huge problem for some users and is now completely solved. The loading time assuming good network speed appears to be around 3-5 seconds on average for libraries home for every user. The timeout errors have been entirely removed. On the other hand we got feedback that the library edit modal now takes "a little longer to load" but that the overall improvement is worth it.brary, which results in a complexity of O(l*r), l=libraries, r=roles. This PR improves the complexity to O(l).

The BulkRoleCache and RoleCache classes were using a python set to store all roles for a particular user. A user can have a large number of roles, and lookup speed of iterating through a set is slow (O(n)). Most roles don't have the same course id, however. So if you have the course id of the role you're looking for, we can use a dict of course ids that contain related roles. The number of roles per course id is negligible, so we arrive at a lookup speed of O(1) when looking up a user's roles that belong to a specific course id.

The BulkRoleCache now caches and stores user roles in a data structure like this:
{
user_id_1: {
course_id_1: {role1, role2, role3}, # Set of roles associated with course_id_1
course_id_2: {role4, role5, role6}, # Set of roles associated with course_id_2
[ROLE_CACHE_UNGROUPED_ROLES_KEY]: {role7, role8} # Set of roles not tied to any specific course or library. For example, Global Staff roles.
},
user_id_2: { ... } # Similar structure for another user
}

While this changes the data structure used to store roles under the hood and adds the new property roles_by_course_id to the RoleCache,
when initializing the RoleCache will store roles additionally in the previous data structure - as a flat set - in the _roles property accessible via all_roles_set. This establishes
backwards compatibility.

We are now storing roles twice in the RoleCache (in each of the two data structures), which means this takes twice as much memory, but only in the scope of a request.

Testing Instructions

Already tested everything very extensively before merging into master. I'll leave it up to you how testing for this should look.

You can't test this too well - performance will just not be significant locally. You would need a user who has at least several courses and libraries.
I guess if you really really wanted to you could go to the code place that calls accessible_libraries_iter and add a loop that calls that function 10k times just for testing, and then compare before and after this fix to see whether it has improved.
Also this all only applies for course instructors, so there will be no change if you test while being a global staff user.

Mainly we want to test now that nothing is broken.

  • log in as global admin (your normal account I guess)
  • go to studio home page
  • click on libraries button. If it takes more than 5-10 seconds something is very wrong.
  • libraries overview should load without issues
  • log out and log in as a course instructor that has access to a few courses and a few libraries (not global staff, the user must not have permission to view/edit all courses and libraries)
  • go to studio home page
  • again click on libraries button. Should still not take longer than 5-10 seconds. But you can't really say anything about performance here because that's only observable on a prod and maybe stage environment
  • open the library edit modal. Everything should work normally and not take very long. If you notice a slight decrease in performance that is expected, just make sure that it's not very extreme.

This is an attempt to fix a performance problem on the libraries home page. When you go to studio home and click on the libraries tab, on prod it will be quick for admins but extremely slow for course instructors (> 12 seconds) and leads to timeouts. It grows with the number of libraries that are assigned to the instructor.

The Python code for the request to load libraries for a particular user goes through all existing libraries and then checks all of the user's roles for each library, which results in a complexity of O(l*r), l=libraries, r=roles. This PR improves the complexity to O(l).

The BulkRoleCache and RoleCache classes were using a python set to store all roles for a particular user. A user can have a large number of roles, and lookup speed of iterating through a set is slow (O(n)). Most roles don't have the same course id, however. So if you have the course id of the role you're looking for, we can use a dict of course ids that contain related roles. The number of roles per course id is negligible, so we arrive at a lookup speed of O(1) when looking up a user's roles that belong to a specific course id.

The BulkRoleCache now caches and stores user roles in a data structure like this:
    {
        user_id_1: {
            course_id_1: {role1, role2, role3},  # Set of roles associated with course_id_1
            course_id_2: {role4, role5, role6},  # Set of roles associated with course_id_2
            [ROLE_CACHE_UNGROUPED_ROLES_KEY]: {role7, role8}  # Set of roles not tied to any specific course or library. For example, Global Staff roles.
        },
        user_id_2: { ... }  # Similar structure for another user
    }

While this changes the data structure used to store roles under the hood and adds the new property `roles_by_course_id` to the RoleCache,
when initializing the RoleCache will store roles additionally in the previous data structure - as a flat set - in the `_roles` property accessible via `all_roles_set`. This establishes
backwards compatibility.

We are now storing roles twice in the RoleCache (in each of the two data structures), which means this takes twice as much memory, but only in the scope of a request.

(cherry picked from commit e95d7e7)
@jesperhodge jesperhodge changed the base branch from master to open-release/redwood.master June 7, 2024 17:22
@jesperhodge
Copy link
Member Author

Need to test one more thing, please hold off on merging till I finish

@jesperhodge jesperhodge marked this pull request as draft June 7, 2024 17:31
@jesperhodge jesperhodge marked this pull request as ready for review June 7, 2024 19:39
@cmltaWt0 cmltaWt0 merged commit b3df1dd into open-release/redwood.master Jun 10, 2024
1 check passed
@cmltaWt0 cmltaWt0 deleted the 2u/backport-libraries-home-performance-fix branch June 10, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants