Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagging: More powerful, flexible implementation of get_filtered_tags() [FC-0036] #92

Merged
merged 16 commits into from
Oct 26, 2023

Conversation

bradenmacdonald
Copy link
Contributor

@bradenmacdonald bradenmacdonald commented Oct 6, 2023

This proposal for the Tagging API combines the implementations of get_tags(), get_filtered_tags(), and autocomplete_tags() into a single Taxonomy.get_filtered_tags() method that can implement every type of pagination and search that I'm aware we'll need.

It:

  • Provides a single API that works with static, dynamic, and free text taxonomies (dynamic means like language or user)
  • Can be used to list all the tags in the taxonomy hierarchically, get a flat paginated list of all the tags below a certain parent, search for tags that match a keyword and return their lineage too, and more.
  • Uses a fixed number of queries, generally one or two
  • Returns a QuerySet so it supports any type of pagination or iteration over the whole set
  • Can return all the tags in a taxonomy up to the max depth in tree order, using only one query
  • Also returns how many times each tag has been used, which I think is a super important piece of information to have when searching and managing Taxonomies.
  • Make the API return TagData dictionaries instead of Tag instances. It's better not to return Django models if possible since they don't distinguish between public and private API.

The idea is to keep multiple more specific functions in api.py but have them all use this get_filtered_tags() internally.

TODO:

Private ref: MNG-3940

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Oct 6, 2023
@openedx-webhooks
Copy link

Thanks for the pull request, @bradenmacdonald! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@bradenmacdonald
Copy link
Contributor Author

bradenmacdonald commented Oct 6, 2023

@pomegranited @ChrisChV This is my last openedx-learning PR from the cleanup ticket. I wanted to get your opinion on it before I spend any more time. Can you check out the diff and tell me if you like this approach? I updated all the tests in test_models.py and they're now passing so that file is probably the best place to see what the results look like.

@pomegranited
Copy link
Contributor

@bradenmacdonald Your approach looks great! I think testing it on the LightCast taxonomy should be enough? I believe @jmakowski1123 has a copy.

@bradenmacdonald
Copy link
Contributor Author

Thanks @pomegranited. Yep, we have a copy of the lightcast taxonomy and will soon have a script to import it: open-craft/taxonomy-sample-data#1 So I can soon test this with a large taxonomy.

@bradenmacdonald
Copy link
Contributor Author

@pomegranited @ChrisChV
Well I don't know if we'll have budget to finish this optimization up, but I did spend a few minutes to test the performance today and it seems good.

Before - using our current version of get_tags and get_filtered_tags:

Number of tags: 4268
Counting all tags took 5.8 ms
Listing all tags without usage counts took 102 ms
Searching for 'Cooperative' with usage counts took 0.275 ms
Cooperative Learning
# ^ Note that this search is faster but does not return the ancestors, depth, or usage count

With this change:

Number of tags: 4268
Counting all tags took 1.53 ms
Listing all tags without usage counts took 55.8 ms
Listing all tags with usage counts took 59.6 ms
Searching for 'Cooperative' with usage counts took 4.14 ms

Education and Training (None) (used: 0, children: 13)
  Teaching Subcategory (Education and Training) (used: 0, children: 28)
    Cooperative Learning (Teaching Subcategory) (used: 0, children: 0)

I tested with the LightCast taxonomy from open-craft/taxonomy-sample-data#1

Here is the script I used:

from openedx_tagging.core.tagging import api
from openedx_tagging.core.tagging.models import LanguageTaxonomy, ObjectTag, Tag, Taxonomy
import time

def _pretty_format_result(result):
    """
    Format a result to be more human readable.
    """
    return [
        f"{t['depth'] * '  '}{t['value']} ({t['parent_value']}) " +
        f"(used: {t['usage_count']}, children: {t['child_count']})"
        for t in result
    ]

def test_perf(taxonomy):
    start_time = time.time()
    count = taxonomy.tag_set.count()
    print(f"Number of tags: {count}")
    print(f"Counting all tags took {(time.time() - start_time)*1000} ms")

    start_time = time.time()
    list(taxonomy.get_filtered_tags(include_counts=False))
    print(f"Listing all tags without usage counts took {(time.time() - start_time)*1000} ms")

    start_time = time.time()
    list(taxonomy.get_filtered_tags(include_counts=True))
    print(f"Listing all tags with usage counts took {(time.time() - start_time)*1000} ms")

    start_time = time.time()
    result = taxonomy.get_filtered_tags(search_term="Cooperative")
    print(f"Searching for 'Cooperative' with usage counts took {(time.time() - start_time)*1000} ms")
    print("\n".join(_pretty_format_result(result)))


lc = Taxonomy.objects.get(name="LightCastSkillsTaxonomy")

test_perf(lc)

and to test the "before" version:

from openedx_tagging.core.tagging import api
from openedx_tagging.core.tagging.models import LanguageTaxonomy, ObjectTag, Tag, Taxonomy
import time

def _pretty_format_result(result):
    """
    Format a result to be more human readable.
    """
    return [
        f"{t.depth * '  '}{t.value}"
        for t in result
    ]

def test_perf(taxonomy):
    start_time = time.time()
    count = taxonomy.tag_set.count()
    print(f"Number of tags: {count}")
    print(f"Counting all tags took {(time.time() - start_time)*1000} ms")

    start_time = time.time()
    list(taxonomy.get_tags())
    print(f"Listing all tags without usage counts took {(time.time() - start_time)*1000} ms")

    start_time = time.time()
    result = taxonomy.get_filtered_tags(search_term="Cooperative", search_in_all=True)
    print(f"Searching for 'Cooperative' with usage counts took {(time.time() - start_time)*1000} ms")
    print("\n".join(t.value for t in result))


lc = Taxonomy.objects.get(name="LightCastSkillsTaxonomy")

test_perf(lc)

Number of tags: 4268
Counting all tags took 5.843639373779297 ms
Listing all tags without usage counts took 102.41103172302246 ms
Searching for 'Cooperative' with usage counts took 0.27489662170410156 ms
Cooperative Learning
# ^ Note that this search is faster but does not return the ancestors, depth, or usage count

@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch from 4184c05 to 005c5f3 Compare October 18, 2023 19:47
@bradenmacdonald bradenmacdonald changed the title Proposal: implementation of get_filtered_tags Tagging: More powerful, flexible implementation of get_filtered_tags() Oct 18, 2023
@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch 2 times, most recently from 2d5fb28 to 8b81b1b Compare October 19, 2023 20:49
@bradenmacdonald
Copy link
Contributor Author

@pomegranited @ChrisChV @yusuf-musleh OK, since it looks like we may need this (based on @yusuf-musleh's recent PR), I found time to finish it up. Just wanted to give you a heads up that this is almost ready. (Just need to fix some minor lint issues and make the corresponding edx-platform changes.) Does someone want to review? New ticket is MNG-3940.

@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch 2 times, most recently from ee80cf0 to 237c83d Compare October 20, 2023 01:45
@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch from 237c83d to e7bb5ba Compare October 20, 2023 01:56
@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch 2 times, most recently from 088cfe8 to ee6d554 Compare October 20, 2023 16:05
@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch from aff40fd to 941f473 Compare October 20, 2023 16:12
@bradenmacdonald bradenmacdonald changed the title Tagging: More powerful, flexible implementation of get_filtered_tags() Tagging: More powerful, flexible implementation of get_filtered_tags() [FC-0036] Oct 20, 2023
@bradenmacdonald bradenmacdonald added the FC Relates to an Axim Funded Contribution project label Oct 20, 2023
Copy link
Contributor

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is excellent, @bradenmacdonald . I've left some questions and minor comments, but I'm thrilled with the whole approach.

Could you merge latest main? There's a few conflicts from #96, and I'd like to see this whole block go away.

openedx_tagging/core/tagging/data.py Outdated Show resolved Hide resolved
openedx_tagging/core/tagging/models/base.py Outdated Show resolved Hide resolved
openedx_tagging/core/tagging/models/base.py Show resolved Hide resolved
openedx_tagging/core/tagging/models/base.py Show resolved Hide resolved
openedx_tagging/core/tagging/models/base.py Show resolved Hide resolved
tests/openedx_tagging/core/tagging/test_api.py Outdated Show resolved Hide resolved
@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch from cd40639 to b567a3f Compare October 25, 2023 19:21
@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch 2 times, most recently from 7cbaaf6 to 11d8461 Compare October 25, 2023 21:05
@bradenmacdonald
Copy link
Contributor Author

@pomegranited Thanks for the review! I have addressed all your comments and updated this with main.

@bradenmacdonald bradenmacdonald force-pushed the braden/get_filtered_tags branch from 11d8461 to 9e94ee4 Compare October 25, 2023 21:14
Copy link
Contributor

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Excellent improvement, both in structure and performance.

Needs a version bump -- but OK to merge when that's done.

@bradenmacdonald bradenmacdonald merged commit a8337dd into main Oct 26, 2023
6 checks passed
@bradenmacdonald bradenmacdonald deleted the braden/get_filtered_tags branch October 26, 2023 17:27
@openedx-webhooks
Copy link

@bradenmacdonald 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants