-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
str_util: support case‐insensitive string patterns #3991
Conversation
c552f69
to
8f00914
Compare
I think it should be a pattern kind so that the same syntax can be used in
Perhaps,
Something like @arxanas how do you deal with case-insensitive matches in git-branchless? Just using
Good point about email addresses, but I would make it case-sensitive by default mainly for consistency with command arguments and file patterns.
Sounds good to me. Thanks! |
Thank you for the review!
I was wavering on whether it should be part of the pattern or not, so I can switch back. Although I’m not sure how much use there is for searching branch names case‐insensitively, especially when Git remotes don’t support it. I didn’t bother adding it to the corresponding revset functions for that reason; I didn’t want to make people expect they could treat branch names case‐insensitively in all contexts. It would be nice for it to work in contexts where it does make sense without needing an explicit extension to the interface, though, I agree.
Sure, that works. It feels kind of awkward for some types of pattern to be rejected in some contexts, but I guess it’ll be inevitable for those code paths to error out if/when regex patterns are supported anyway.
Taking inspiration from regex syntax (a dangerous phrase!), what about…
Too weird? I like that it clearly separates out the type from the common options, and I think that we probably want something shorter than
Makes sense; it makes this less of a backwards‐incompatible change and can always be adjusted later. How about keeping |
It's unclear to me where we set a boundary (e.g.
That makes sense to me. Inline comment will help understand why it's case insensitive. |
case_sensitive
option to some revsets
I’ve switched this over to being part of string patterns, and settled on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mine()
matches case‐insensitively
unconditionally, since email addresses are conventionally
case‐insensitive and it doesn’t take a pattern anyway.
Is that already the case or do you mean that we should make it work that way? (It sounds like you're saying the former, but it doesn't look like that's what the code does.)
Ah, you mean that it behaves that way after the commit. We don't really have a convention for what present tense means in commit messages. I use present tense to describe how it behaves before the patch (not saying you have to follow that, just explaining why I was confused). |
I agree that it’s confusing; I’ll reword it. I try to stick to the Git quasi‐standard of imperative present tense, but it can get so awkward when going into detail, so I frequently slip… (Thank you for the repository invite, by the way!) |
These support more types of pattern than just substring matching.
The comments say “Can find a unique match by either name or email”, but these weren’t checking for an email match.
e9ac38f
to
d8438ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
FWIW, for mailmap, maybe we could also leverage regex:
pattern. It's more expressive and supports case-insensitive matches. I'm thinking of adding it as it'll be useful in diff_contains(pattern)
revset #2933.
Partially resolve a 1.5‐year‐old TODO comment. Add opt‐in syntax for case‐insensitive matching, suffixing the pattern kind with `-i`. Not every context supports case‐insensitive patterns (e.g. Git branch fetch settings). It may make sense to make this the default in at least some contexts (e.g. the commit signature and description revsets), but it would require some thought to avoid more confusing context‐sensitivity. Make `mine()` match case‐insensitively unconditionally, since email addresses are conventionally case‐insensitive and it doesn’t take a pattern anyway. This currently only handles ASCII case folding, due to the complexities of case‐insensitive Unicode comparison and the `glob` crate’s lack of support for it. This is unlikely to matter for email addresses, which very rarely contain non‐ASCII characters, but is unfortunate for names and descriptions. However, the current matching behaviour is already seriously deficient for non‐ASCII text due to the lack of any normalization, so this hopefully shouldn’t be a blocker to adding the interface. An expository comment has been left in the code for anyone who wants to try and address this (perhaps a future version of myself).
d8438ce
to
22c29f1
Compare
Hmm, for the |
I thought we could leverage regex to combine multiple matches, but that might not be needed? |
Oh, yes, that would work. I’m not sure if matching against |
I don't have strong feeling. This PR looks good, but I'm not sure if we like Good point about database query. Regex version would be harder to optimize in principle. |
I know I'm late to the party, but as long as we've explicitly asked for bikeshedding...
If I could choose anything unilaterally, I would probably have three kinds (using glob as the example):
Many search interfaces support smart-casing: Case-insensitive if the search term is all lowercase, and case-sensitive if the search term contains any uppercase characters. I think that's a good default for unqualified Case sensitivityFor string patterns, I think it's more user friendly to do a case-insensitive search by default, and let the user opt-in to case-sensitivity if they want it. That way the user can usually be lazy and not worry about it, but has the option to be strict when necessary. For file patterns, case-insensitive file systems are the default on Windows and macOS, and I think users are accustomed to not needing to be precise about casing. HyphenEspecially on Windows, I feel like there's a lot of typing symbols when typing revsets. PowerShell treats parens without a I don't think I can do anything about the quoting, but the hyphen feels unnecessary to me. |
Making the default case-insensitive has performance implications (especially in file matchers), so let's discuss that separately. It's rather a big change. Smart-case seems useful, and can be introduced later. The underlying machinery wouldn't be different. So the remaining issue is hyphen or not? Without hyphenation, |
I agree that a glob of some kind would be a good default for string patterns, maybe even a case‐insensitive one. It needs separate bikeshedding though, since currently some patterns default to For file patterns I’m not so sure. Case‐sensitivity is common filesystem semantics (albeit mostly on Linux rather than Windows or macOS) that I believe Jujutsu mostly inherits; I think we’d need to enforce case‐folding in trees for that to be the right choice, as nothing stops you having both Two motivations for the hyphen in
It just seemed a bit cleaner to me. The ergonomics aren’t great, but I already accepted we can’t get good ergonomics here without further discussion. I’d like That said, I mostly just implemented this to have the internals there so that the |
I’m so used to backwards compatibility being critical that I assumed it was important to get the naming right before the feature merges. But we see small breaking changes in most releases so clearly they’re acceptable. I’m happy to have this merge now with any naming and discuss again later. |
That’s the privilege of working on pre‐1.0 software :) Not that we should gratuitously break things, but I think we’re all agreed that the current state of string patterns isn’t ideal and that it’s too big a job to fix in this PR. I’ll just hit the merge button now so I can finally finish off the |
See the commit message for details. This resolves an old TODO and came up during the rework of #3951.
I’m not totally sure about the defaults here, but the functionality seems good to have.Checklist
If applicable:
CHANGELOG.md