-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change: (org-ql--tags-at) Support tag hierarchies #146
base: master
Are you sure you want to change the base?
Conversation
e441d88
to
4194456
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for your work on this. Please forgive my nitpicking, but I do try to be consistent. :)
Thanks! I appreciate your attention to detail and clarity. I have a few more suggestions that I'll make as review comments, and then a few more things to mention separately... |
Here are a few more thoughts:
If the impact really is that small, then it should be no problem to merge it. But I would like to know the impact for users who actually use group tags as well. Thanks very much for your work on this. It seems like you're an experienced Lisper, so you're saving me a lot of work by implementing this feature. :) |
I was also considering the idea of adding a cache here myself but I have some doubts whether it will improve things. The reasons that come to mind are the following:
What is your opinion on this? Regardless, I ran a few benchmarks using the source code block you provided (I modified it to search inside a specific file for a tag that should generally result in 3 calls deep recursion) using the benchmarking utilities from you repo (although I had to change it a bit because I think a few stuff got renamed in the meantime, e.g., second -> cl-second). Here are the results of running the benchmark inside a file with >500 entries and another with >5000 entries, respectively.
What do you think? You are welcome. I am afraid that I am below average when it comes to lisp but I am glad I can help. |
IIUC, we're already using
Yeah, it starts to feel a little complicated. Looking at your benchmark results, I think we should not add a cache, or at least not at this time. A 3-18% decrease when tag groups are searched for is not that big a penalty, so it probably wouldn't be worth the complexity to add another cache.
Thanks, that's actually a better result than I expected. It does do a lot of GC, maybe because of the string concatting, but if you ran each benchmark for Would you mind doing one more test, please? What results do you get when you use the same configuration but do a search for a tag that is not part of a tag group?
You could have fooled me! :) Thanks. |
To be honest, I was also not satisfied with the test cases I considered so I took the liberty of running more benchmarks on this. Particularly, I modified the two files I used before so that every entry has the tag leading to the deepest recursion and these are the results I got for this worst case scenario, corresponding to >500 and >5000 entries, respectively:
Also, here are the results I get when searching for a tag not belonging to a tag group.
*I guess that the total duration of the benchmarks is greater compared to the first set of benchmarks because I am working on a laptop running on battery this time, otherwise it makes no sense :) |
Thanks. Okay, I have one more benchmark request: would you run the same sets, but without having group tags configured? Specifically:
That's, perhaps, the most important set of benchmarks we should consider, because it should show how these changes will affect users who don't actually use group tags. If there's a 2-16% penalty for users who do have group tags configured, when they use a Thanks very much for your diligence and patience. I don't think that many Org users use group tags, but I expect that those who do will appreciate this feature being added. |
You are welcome. Here are the results of running the benchmarks on the same set of files after removing all group definitions without changing the value of
Disappointing as it may be, it appears that a ~3% overhead is unavoidable without a cache even when no tag groups are defined. |
Hm, ok, let's be a little more explicit about methodology:
In the first benchmark I ran, it showed a 1% decrease in performance. It may be that 1-3% is within the margin of normal variation, and maybe I'm being too picky, but I'd like to avoid a 3% performance decrease when there are no group tags being used. Maybe we should test What do you think? Thanks. |
I took your advice and added a test for
Additionally, I modified the definition of
What do you think? |
org-ql.el
Outdated
(setq inherited-tags (org-ql--expand-tag-hierarchy inherited-tags)))) | ||
(let ((tag-groups (and org-group-tags | ||
org-current-tag-alist | ||
(org-tag-alist-to-groups org-current-tag-alist)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this means calling org-tag-alist-to-groups
every time this function is called on a heading whose tags are not cached. In a file with many top-level headings (e.g. I have an inbox file with years of captured top-level headings), that would mean calling this function many times, and every time it would be with the same argument and would return the same value.
If I'm correct about that, it means that this function call probably belongs elsewhere and its result should be reused, at least per-search.
I feel like we need to reconsider the design of this part of the code altogether. It might be that this group expansion should be done at another step of the search process. But please see the later comments I'm about to make...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct about that but this is only going to happen if org-current-tag-alist
is non-nil. Nonetheless, I agree with you that normally that result should be cached (and I originally thought that this was the purpose of org-tag-groups-alist
). So maybe we can add a per-file cache for that?
org-ql.el
Outdated
(let ((groups (or groups (org-tag-alist-to-groups org-current-tag-alist))) | ||
matching-group-tags) | ||
;; Remove groups with tags already in the results. | ||
(mapc (lambda (tag) (setq groups (assoc-delete-all tag groups))) tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the values these data structures have at runtime, because I haven't stepped through the code. But it seems to me like this step is something that ought to be avoided. If it seems necessary here, it might be because the design needs to be adjusted. It seems to me like this is work that shouldn't need to be done every time this function is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the above snippet, groups
essentially holds the current set of group definitions (in the first call, it should be the result of (org-tag-alist-to-groups org-current-tag-alist)
. In order to avoid iterating over the members of groups whose group-tag is already in the tags passed to this function, I simply delete them from this variable. I think that this is an improvement over the previous implementation because if this function ends up recursing, the number of iterations of the pcase-dolist
loop will be definitely less.
Thanks, that's a significant improvement. I think we need to go further, though. It seems to me that some of the work being done in this code is being done at the wrong place in the process, which causes it to be done repeatedly, every time that Before we go any further, I need your help understanding something: How is searching for group tags supposed to work exactly? What I mean is, if I have defined a group tag hierarchy like:
And I do a search for the tag In other words, an ECM (minimal working example) would be very helpful here. I think we need a minimal example Org file, with a set of group tags defined, and then we can give example searches with the expected results, and then we can (or I can--I'm sure you've already done this) reason about the steps necessary along the way. This ECM will also be helpful for writing test cases before merging this, as none of the existing test data sets have group tags (and since their tests already work as-is, I'd prefer not to modify the test data, because that would likely require updating the test results, which is a lot of work). Apologies for dragging this process on for so long. I appreciate your patience. |
I am certainly open to suggestions.
For this hierarchy, what you should get after calling
As a matter of fact, I was also looking into how to add unit tests for this feature, so should I go ahead and write a simple org-file to use as a test case?
You are welcome. On the contrary, I am sorry for not being more thorough with this. |
Thanks. After we have an ECM with sample data and example searches, I'll be able to understand it better and think about how to integrate into the existing code.
Okay, so would it be fair to call the function something like
I may be speaking prematurely, but this hints to me that we may indeed be approaching the problem from the wrong direction. The Thinking about this example, if a user searched for However, if group tags are configured per-buffer, expanding it in the pre-processing stage wouldn't work. In that case, it would seem necessary to have an additional, per-buffer query-rewriting stage. I don't feel enthusiastic about adding another complex stage like that, but it would seem like a more "correct" solution. And such per-buffer query-rewriting could be useful in other cases as well, I think.
Yes, please. Take a look at the
You're more thorough than most any other contributors, so I have no complaints. :) Thanks. |
Actually, get-ancestor-tags seems more appropriate. Assuming that a single tag is given, it would return its parent, its parent's parent and so forth.
Well, I am not really certain about this. I agree that there are actually two different ways to approach this but expanding groups when searching does not feel more efficient in terms of computation nor code maintenance. Regarding the first point, notice that each time the user specifies a tag to match, that would have to get expanded into a set (that possibly contains regular expressions) and compare each element of that set with every tag in the headings. In that case, I feel that an extra cache is imperative whereas, with the current implementation, this thing is taken care of once, when the existing cache of tags is built. Regarding the second point, I haven't looked at how the query mechanism is implemented but, in general, you would have to take care of tag expansion at every point where you do a different kind of tag search (e.g., just local tags, just inherited tags, etc). My experience so far tells me that, unless one wishes to be able to also run queries that distinguish between tags explicitly stated in the heading and tags implicitly matched because of the hierarchy, then this approach has more merits.
Great! I'll probably take a look at it some time tomorrow then. |
On the other hand, if one decides to toggle the state of |
Yes, that name seems even better.
Hm, I don't completely follow you. I think that, once we have concrete examples to reason about, it will be easier. In the meantime, though:
Note that there are specialized tags predicates, i.e. It might be worth considering to have a special predicate to match group tags. Or even if we don't end up making one, thinking about it might help us with designing the solution we end up using.
Thanks. |
f52bef0
to
ec0f8c7
Compare
Hi. Sorry for taking so long to respond. As we discussed last time, I have created a minimal org file ( Apart from that, I have pushed on a separate branch from this one (because the changes are more extensive and I think we should discuss them first), a different implementation of this feature. Particularly, I have addressed a few concerns raised earlier, namely a) calling Specifically, in order to address issues (a) and (d), I have introduced a new cache, similar to the one of
In short, now the regression is limited to around 2% instead of ~19%. Regarding issues (b) and (c), in order to avoid adding yet another cache for local and inherited group tags keys on buffer position, this time I augmented the list of values that the per buffer hash tables of As mentioned above, I placed all of these changes on a different branch mainly because I chose to modify the structure of |
ae83de1
to
890c247
Compare
Hi again, Thanks for your work on this. It sounds like you've improved the performance a lot over the initial implementation. Would you link the other branch you're talking about? I'd prefer to avoid adding another global cache if possible, but it might be worth it. Regarding the What do you think? Thanks. |
You are welcome.
This is the branch with the discussed implementation. I just merged you master on it so it should be easy to compare the changes.
Sure, sound very reasonable to me. I will take a look at the new mechanism and define a distinct predicate then. Btw, the predicates that I modified in the discussed implementation are actually (tags) and (tags-regexp), since I feel that regular expression searches on group tags also are pretty convenient. Should I define a distinct tag for regular expression queries as well? |
Thanks.
Yes, I think we should mirror them, so: The code in the new branch looks fairly straightforward, but it is a lot of new code, so I'd like to take a little time to study it. Please let me know when you push the new predicates and I'll look at it again and test it. Thanks. |
BTW, please rebase on master again as I pushed a couple of important changes, including updates to the lint/test suite. |
Hi. I have defined distinct predicates for |
Thanks. IIUC we've basically settled on going in the direction of the branch |
Done. |
Thanks for your patience. I'll try to look at this again soon. |
Since it's been a long time since the last stable release, and some important changes on master have been waiting for one, I'm moving this to 0.7. |
No problem. Let me know if you need me to help with anything. |
4f5fbc4
to
d0acc8c
Compare
477e9b9
to
f300d04
Compare
059b10c
to
77b4c2b
Compare
I think this is a simple solution for #145. It seems to work for my use cases, I haven't noticed an overhead so far and I think it is robust to cases where two tag groups reference one another thus leading to cycles and infinite recursion.