-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
file based filter #99
base: main
Are you sure you want to change the base?
file based filter #99
Conversation
58e5d2e
to
8f8976e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xinzhengzhang, thank you. This is a good start.
I added some commits with the goal of quickly making small improvements and adding docs--seemed easier to just do than to discuss. Hopefully that's okay. Explanations of why I thought they were improvements are in the commit messages. If there's anything you don't like, just say or change them. I appreciate your giving me write access.
I can see at least a few more cases I'll need to ask you to work through, all around headers. They're tricky!
- Not all (C++) headers end with .h. (Nor, indeed, any extension at all, sadly.)
Suggestion: Flip the if statement the other way, checking if the file has a source extension, otherwise treating it as a header. - Header-only targets will still present a problem, I think, since they don't have any compile actions in the targets that list them in attrs. Any ideas on how we should solve this?
- If the header implementation remains similar:
- attrs seems to operate on a stringified list of targets, rather than paths. Maybe we can use this to be more specific somehow, but if not, let's document a little better what we're doing. It's probably okay to accidentally update a few extra compile commands as as long as it remains fast enough for you, though, of course, being specific is ideal.
- I'm guessing that using a
let
variable definition for the header query is more efficient, saving a double evaluation of the target_statement, though I'm not certain.
- Update in place: Let's put the logic that updates compile_commands.json with the new entries for the --file in this tool, rather than the plugin! That way it can be eventually shared across editor plugins and you don't have to deal with overwriting.
- I think we probably want to omit the slow header search when not needed with --file?
That's what I can see on a quick read through. I'm sure additional important cases to think through remain. I'll need you to take the lead on making this robust and testing how it meets your needs.
Cheers,
Chris
P.S. I'm not sure which holidays you observe, but I'm wishing you a great set of winter celebrations!
Although they cannot be found in the target action but they should be defined in the BUILD file in
Since
Do you think we need a flag to mark override or merge? Or only --file uses merge? |
I don't really understand the meaning. If we don't specify --file=a.h, won't we go here? Or do we need an additional flag to filter out users even if they come in? Thank you for caring and I wish you Merry Christmas~ |
Hey @xinzhengzhang! Sorry for being slow--got buried in tasks over here. Resurfacing. I'm so sorry; I promise I'm usually faster, like with my initial replies. Thanks for getting back on these and for making improvements. As before, I also added some quick commits on small items that seemed better to fix than to spend time discussing. Hopefully that's okay. Explanations in the commit messages. If there's anything you don't like, just say or change them. I appreciate your giving me write access. For the things that are still open, I'll try to expand on each in more detail: The (relatively common) case of header-only targets: cc_binary(
name = "main",
srcs = ["main.cpp"],
deps = ["lib"],
)
cc_library(
name = "lib",
hdrs = ["templated.hpp"], # Or srcs
) Won't our aquery currently first filter to just lib, since it's the one with templated.hpp in hdrs...and then find no actions, since no actions are actually generated by lib, since it's header-only? If I'm right, we'll need a creative fix here. I think I suggested this one before, but the best I've thought of so far is to first try what you did here and then fall back to the inputs query type that we use for source files, running header search until we find the first one that actually uses the header. I'd love it if you'd think on it and try to come up with something even better! [There's also the case of system headers, which won't be listed in an attribute, but I think those would also be caught by the above. There's also the case of a library that may have compile actions but none that include the header being requested via Omitting the slow header search The exception is when running Attr and stringified targets Override vs merge I'm quite excited about this! But I'll need you to take the lead on making this robust. Cheers, |
Wishing you also a happy lunar new year :) |
For headers how about we divide the header extractor into these steps
Sorry, I don't quite understand what this given path is used for |
Something like that! I'd suggest a couple tweaks: For 1: I don't think we can assume that the .c will include the header. Instead, we'll have to run the header search process to check, falling back to 2 if not. (I'm not quite sure I understand the module part; as much as possible we'll want to turn those off for our analysis so that we don't depend on a build.) For 4: I think it's okay to require an example of use rather than creating a dummy target, like we currently do for header only libraries, since clangd will automatically fall back to generic flags anyway. But if you think otherwise, please do say! As above, attrs seems to operate on a stringified list of targets, rather than paths, which looked like it was getting in the way and making the query less specific. So, rather than just getting the filename, I was proposing that we might want to run another aquery, using outputs() to figure out the name of the target for a given file path. Does that make sense? |
Hey, @xinzhengzhang! I wanted to check back in on these to see how you're doing. Thanks for all your efforts; I know this is hard to get right! |
Sorry to be too busy with work recently, I will implement a basic version this weekend |
af8d4e4
to
f4441c3
Compare
Sorry again for the long time of inactivity due to work. I just rebased the main branch and squash the commits. |
Totally get the busyness. I feel it, too :) We'll definitely need to invoke query multiple times. A fixed-depth breaking loop like that works, but you might end up finding it cleaner/easier to factor out the query call-(as a nested function if you need to capture?) and then have control flow that explicitly falls through each case if empty? (Some smaller things, trying to give fast feedback: finded -> found. And you'll want to be careful about the no-action error case, since that's now tolerated for the file_flags case.) |
f4441c3
to
251bbb2
Compare
I have simplify the loop using nested function and fixed no-action case in _get_commands 251bbb2 Since I don't have much experience with Python, and it is a weak type language, the implementation of passing lambda to thread handler seems very strange. If there is a good way to modify it, please also help me to modify it |
Hi, Chris @cpsauer In addition, I want to change my vscode extension to bzlmod form, so I want to patch the dependency compile-commands. I would like to ask if you have the possibility of tagging and being referenced? Of course, it is also possible for me to use extension to use commit integration |
Promote multiple --file assert to log_error Why? It's a usage error we're communicating to the user, not a code invariant we're checking in debug mode. We could recover, but we don't, because we expect only automated usage and want to give clear feedback. More pythonic if Improve my --file flag messaging consistency Call library to escape regex Helps avoid edge cases! Also handles the most common special character, '.' Add an error to catch unsupported --file <file> form Tweak -- logic now that --file arguments could be after --file docs Make aquery spacing consistent Flip the if statement in constructor target_statment Use let to combine attr query Merge compile_commands.json if --file Small simplification: endswith already can take a tuple of possibilities (as elsewhere) Small simplification of header path for file flag. (This code section likely to be replaced later, though.) Fix my typo on --file= discoverable docs Some tweaks to merge Adds note about behavior, some edge case handling around other flags that might start with --file, permissions issues, etc.
8df2238
to
0befc09
Compare
Since the entire aquery logic is nested, there are many conflicts. I just rebase main and push -f |
Hey @xinzhengzhang. Sorry for causing--and thanks for rebasing. Apologies for being slow--got sick over here and I'm kinda dragging and backlogged. I'll try to get through this soon. |
Ran out of time--and am now traveling for the week. Will try to get to this weekend. I definitely saw some bugs on a quick scan through. Could I ask you to polish in the meantime? |
Thanks for getting back to me. I'll do my best to identify and fix the bugs. Have a great trip! |
…didates and warning
…a list of SimpleNamespace objects instead of yielding a dictionary & Removal type cast
…ter the first command has been found
Hi, @xinzhengzhang! Thanks for your quick work, but most of these are going to require some more experimenting and thought before they work robustly enough that we can get them in. I'd love to have this feature land for you and others--and happy to clarify/help--but I'll still need you to really think through the cases and how to handle them robustly. Replying to each. There are some important things for us to discuss, starting with the high level. Could I get you to first read down to the first horizontal bar (
I'm not sure quite what you're asking. Could I ask you to rephrase? To save round trips, here's what's in my head: What that discussion item is about: As we search to find a source file that includes my_header.h, we get, for free, commands that apply to other files, like the other headers included by that source file. We need to decide which of those (file, command) pairs we should add to compile_commands.json. On one hand, we could just add all the pairs we got for free. But that might kick off a lot of background indexing work, and my understanding is that you want this feature in order to minimize the indexing work required to browse files. Adding a few source files in the background is potentially quite useful for go-to-definition and the discovery of other symbols. But other headers seem fairly pointless and high cost: they don't introduce any symbols not already seen from processing a source file (which pulls in its includes) and they're likely to keep getting new (but equivalent) commands that trigger a lot of reindexing. Hence that tradeoff: Only add/update the command for my_header.h and no other headers--but still add/update commands for source files found along the way. What do you think of that tradeoff and why? Does that seem optimal or would you like something else? That was my guess, but you know your use case better than I do (like with allpaths!). You can also go with this for now, experiment around as you use it, and then adjust it if needed. Either way, we should eventually document what we decide and why, so please add to the comments until they'd have been sufficient for you to understand on a first read.
Are you doing anything special to get good reload behavior from clangd in the editor? I'm seeing that clangd needs a modify/save event to trigger it getting a new command. Is that what you're seeing?
Right. (If I'm correctly understanding what you're writing.) Do you use Windows and understand what's going on with param files for Windows? (Totally ok if no on either.) Assuming I'm right in remembering that LoSealL does some development on Windows machines--we might want to ask him if he'd look into these Windows things. Either way, would you be willing to give LoSealL commit access to this branch of your fork so he can help out as needed?
Also windows-specific stuff. No .d files involved here. Are you developing on windows at all, or should we be looking to ask LoSealL on these? [To explain more: The concern is that MSVC on windows is producing absolute paths for all headers, including in the workspace, which then won't match in statements elsewhere like
I saw your email (now edited) about timeout not working. I was similarly having trouble getting it to trip--hence the ask that you investigate. How did that resolve? Were you able to get the timeout to activate? It's not obvious to me what you're going for with the timeout code you added--nor why it's bad to be able to traverse many actions if it's done fast. Could you explain? [I had been thinking that, if we could get the timeout to work, we might want to allocate fewer seconds if there were, e.g., 3 sequential calls to aquery for different top-level targets specified in refresh_compile_commands. Perhaps 1s each instead of 3s each, to maintain a total latency number. I wasn't envisioning any fancy calculating, but you may have thought of something good I haven't!]
This is going to require thinking about what you might pass in from the outside. What about absolute paths to the canonical location of bazel-out or external (i.e. at the other end of the symlink)? Further, once we've recognized system headers (i.e. absolute paths in none of those locations?) aren't those the only ones we want to run the (expensive)
How did this resolve? We definitely want this to be fast and fast enough for you. To answer the questions you'd asked: Moving to commits More generally, could I ask you to not delete TODOs until we're confident they've been handled robustly? And put back the ones you did? Makes tracking much easier. Some significant problems remain [see below], so maybe give these a quick skim, but read the "looking back at the commits" sub-section at the bottom before editing.
Come on--we'll need to do quoting in all places where we splice a target into an aquery command, as it said! Please be proactive on these :) Wish I'd had it right from the start--but I figure you'd need to know for mutating the query and don't want to cause an unnecessary merge conflict.
Not obvious this makes things better? Appreciate the effort, but how about the aquery with inputs(), to get the targets that produce the file, suggested for exploration in the TODO? If I've misunderstood and this works well, or that doesn't work, please tell me.
Again, let's return to these at the end. We're still going to get a lot of erroneous ones of the first kind, I think, for the reasons in the (deleted) TODO. Thanks. Can you put back the documenting comments from the yield and not use SimpleNamespace, since I think we'll have to strip it for #118. I'm also realizing this part of the comment is obsolete ", with commas after each entry,"--could you delete? (more generally, if you happen upon mistakes like that, please just fix). I see you moved the loop inside--but not the target_flag_pairs template filled by bazel. Don't we want that inside, too? I think the structure here may simplify significantly if we end up with one aquery command per branch (but maybe we won't be able to get there.
If it were me, I'd be pretty tempted to reset the branch back to bd7bde1 and move forward from there. There are some good things in the commits. But since each needs work and a bunch of (TODO and other) lines need to be restored anyway, I'd think restoring to bd7bde1 and then moving forward, copying in useful parts of the previous commits, might be the way to go. Maybe I should be just doing that, but I don't want to step on your toes too hard. Moving forward, I'd like to ask that you make sure you've got higher confidence before deleting those TODOs and pushing code for review. It's totally ok to not resolve all the TODOs, especially all at once. But it's taking too much time to review and discuss rough versions of the code with issues I suspect you could easily have caught without me. Better to clarify first, discussing if needed, and tell me where you aren't sure or need me to do something. Then, focus individually on the ones where you have confidence, doing some self-review before pushing changes, making sure you've actually resolved the things in the comments and from our discussion. Since this is a big change, another tactic for changes where you're uncertain would be to file PRs against your own branch. That way, we can easily discuss each in its own context. Please don't get me wrong; I massively appreciate your energy and enthusiasm to contribute. I'm just trying to find more efficient ways for us to work together to make this happen. Thanks, |
Hi, @cpsauer! Very very thanks for your super patiently mentoring. There are some places I really didn't get what you wanted to express, such as the part in #discuss, but I fully understand after your detailed delineation
Yes, it totally align with my understanding and needs.
In my opinion, the header file should not write anything other than the frame header file, for example in the iOS should just import < Foundation/Foundation.h > and < UIKit/UIKit.h > All other nested dependencies should be in the form of previous declarations. In this view, in fact, we just need to find one source file that directive contains this header, other more indexes are just waste and interference.
That's why I lean towards this
In the vscode extension, I observed change of BUILD and WORKSPACE to refresh the compile_commands
About file_path part, I will give commit access to @LoSealL to improve the windows part together. I will reply to other comments later |
…rgets after the first command has been found" This reverts commit 0453be4.
This reverts commit 6a6777a.
… return a list of SimpleNamespace objects instead of yielding a dictionary & Removal type cast" This reverts commit 40b9131.
…ment_candidates and warning" This reverts commit 9ce1716.
…fied file label" This reverts commit f8b046e.
This reverts commit ef33509.
…be run based on timeout" This reverts commit 89f583a.
I have write minimum code replicate this bug
Now that this bug is solved, this complex estimation method is no longer needed |
Hey! Yes. Thanks for being so coachable and energetic. You're right that some of my todo notes were fairly inscrutable. Sorry about that. They were initially meant to be notes to myself and then, well, I figured I'd just clean the up a little bit, hoping they'd either make sense or you ask. Wish I'd done a better job. Other items: I'm not understanding what you're saying about the headers. Just language barrier, I think. Could I ask you to take another shot at explaining? Wish I could read Chinese as well as you can English :) Reload: Ah, sorry, I mean getting clangd to reload the new changes from an updated compile commands.json. Does that make more sense? Thanks for giving LoSealL access! (I think just ignore that stuff for now. But if you're curious but the problem will be as above. If you pass in a relative path it won't match the absolute paths returned by MSVC.) |
Re timeout: Whoa, that is very surprising to me. If you figure out why, I would be very curious. Sorry about leaving that bug for you. (I'll consult with a friend who's a python expert to try to figure out what's going on. My guess is that the threadpool tries to finish all of its actions before exiting scope, but then neglects to check for timeouts?) |
In chines we usually call file which exposed interface are "头文件". Literally translated as header file is also the header I mentioned. For some high-level languages such as swift or python or cpp module usually we do not use the word header to describe their interface. In case I'm not making my point, I'm going to go into more detail. Under the c system language, using like #include.h or some variant like #import is actually to expand its content into source.
I'm using clangd wrapped via sourcekit-lsp, and it works well when compile_commands changes, I'm not too deep into his specific implementation, you can start from https://github.com/apple/sourcekit-lsp/blob/90fd3addb28a7b3cb2ef40acca6670cd57cbdd53/Sources/SourceKitLSP/SourceKitServer.swift#L730 to refer to his line of sight if necessary
From here I saw that you will transfer absolute path extracted from process to relative path.
|
Sorry--still totally don't understand the point about the headers. Like I understand all the words but not what you think we should do here. Reload: Sounds like sourcekit has better behavior here. Heads up that clangd doesn't reload until the file is modified or saved, sadly. Not a problem for you using sourcekit, but something we'll need to keep in mind for this project. Re msvc absolute header paths: Totally agree that _file_is_in_main_workspace_and_not_external itself works fine. I was just concerned about the behavior documented there causing issues elsewhere. Is that the confusion? If so, totally my fault for writing the TODO ambiguously and poorly--feel free to move it or delete it. |
For now (in my extension) all --file are passed in as relative paths and to be honest, this is indeed the most bazelify input excepted. For absolute path contains WORKSPACE root /Users/xx/project/srcs/a/b/c I think it is easy to handle and it is current implementation.
The reason I modified the comment is because I found that the reason why I can find the header file through inputs and it is not hidden by middleman is that I use objc_library dependency swift_library, and the implementation of swift is to provide the header in the https://bazel.build/rules/lib/providers/CcInfo.html#compilation_context so that I can query, but this is not a common practice
I have tested with two header file one is upper module in srcs/app and one is underlying module in srcs/base srcs/app/x.hallpaths srcs/base/x.hallpaths
Why not just add a switcher for expensive search for allpaths and deps to user? If we can't find their dependency from attr, it is unlikely not used in current practice. When it appears, it also jumps to some useless header files that do not need to be parsed. |
|
See When I try to use aquery to get outputs which passed into next aquery process, there was hundred of actions was queried
Take SwiftCompile as an example, it dependency header to generate modulemap, but has nothing to do with this header in the compile parameter. Join all outputs from queried outputs not seems a good idea. But if I add a
|
After some digging from python source code. I found the cause of this issue Start from entry concurrent.futures.ThreadPoolExecutor.map https://github.com/python/cpython/blob/263abd333d18b8825cf6d68a5051818826dbffce/Lib/concurrent/futures/_base.py#L621
We can see that shutdown will be called when exiting the scope And in shutdown all thread has been joined That is why no timeout will be thrown and wait until all operation queues finished |
I think we'll need to pass in system headers as absolute, since there's not a relative alternative. I was thinking some editors might also try to resolve symlinks and therefore pass in absolute paths for, say, the workspace or external or bazel-out. I feel like I remember someone telling me about that...for emacs maybe? So I think we should handle it if we can do so easily... But I'd also be ok with us just issuing a fatal error if someone passes in a path that's inside
Ah, interesting. Yeah, I'd just checked cc_library. Bummer that it doesn't work in general.
Wow! Yeah, 40s is definitely too slow. What's our target--10s max overall? Could you see if adding a depth cap to rdeps (3rd argument) make things faster--or --noinclude_aspects? (Or allrdeps. See also the TODO. If that's fast enough, please also check order.) Thanks for testing on your very large codebase!
I think header-only libraries (in c++) will unfortunately be pretty common, so we can't just attr everything. Same with system libraries, but that's a different case. If the above doesn't work fast enough--and you have any clever ideas--I'm all ears.
I think it sounds like you agree, but I'm afraid I don't understand your logic about few symbols. Could you confirm? And please reexplain if you want me to understand why you agree. Sorry.
Ah sorry. the outputs function, not inputs, to figure out how a file is generated, if it's generated. But maybe it's not worth it in this first version. Could be good to make the regex a little more specific, though, by including more of the path?
Very interesting! Thank you. The key piece I hadn't realized was that they waited for completion on scope exit. I think that catches me back up! Let me know what more you need to get this done. |
You saved me! It has nothing to do with rdeps or the like, just because of aspects!
I wrote a piece of pseudocode about the handling of path is ok?
I have a question about implementation here. Because bazel aquery --output=jsonproto is output in the form of a path tree, there may be dozens of lines of code to parse the corresponding path tree. Is there any ready-made library for this piece to refer to? If not, I can do it - -
Let me state my position first, I think the content in the header file should only have the interface for a class or function which implementation in the source file.
|
:) Delighted to hear it. yay! (Nasty bug they have there--not sure if you read through the linked issue.)
Seems reasonable--and great to just auto-handle--but could you check/refine it a bit more as you code? A couple things that jump out:
Thanks for thinking about handling this case!
Might a different output format be easier to parse?
I understand now your stylistic preference for Obj-C, and I feel like that goes nicely with Include What You Use. |
I added a todolist to the description of PR to track the todo follow-up. And I opened a sub pr to solve 3 todos. For the input of file, I simplified the judgment of file conversion and manually tested several possibilities (listed in the description of pr). Can also help to see if there are other cases. |
Thanks for trying out a new review structure with me! A great experiment given the trickiness of this change. I'm starting to take a peek. (Went on a trip and scrambling to catch back up. Again, sorry for delays, but I'm doing my best.) |
#93
Usage