Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mailmap: add support for .mailmap files #3951

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

emilazy
Copy link
Collaborator

@emilazy emilazy commented Jun 24, 2024

Explanation, motivation, backstory, and design decisions are in the commit message, and I’ve included comprehensive tests and hopefully‐adequate documentation.

Pulling in the people who actively discussed this on Discord for review, hope that’s okay: @nasamuffin @arxanas @PhilipMetzger

Some things I’m unsure about and could use feedback on:

  • I’m happy to hear bikeshedding about the DSL interfaces here, especially the *_raw names (*_unmapped?).

  • There is a breaking change to the Rust .author() and .committer() methods on Commit; I assume this is okay due to the current level of general churn, but I thought I should point it out. I plan to look into opening a PR to make gg do the right thing here.

  • get_wc_commit_mailmap isn’t exactly my favourite function name in the world, and it feels a little gross to have commit/tree‐related plumbing in the mailmap module (e.g., jj_lib::gitignore just has a pure model). Any suggestions for better places this could go, better names it could have, types it could hang off as a method of, etc. are welcome.

  • I added pub(crate) to jj_lib::git_backend: signature_{from,to}_git, as they’re used by the mailmap module to interface with gix-mailmap. Since this isn’t a feature specific to the Git backend, it feels wrong to have that dependency, but I’m not sure where would be good to move them to. A new jj_lib::git_util module, perhaps?

  • I don’t love the jj_cmd_ok_as thing in the CLI test, but it seemed like too much work to refactor that interface right now.

  • I don’t really know how conflicts in the .mailmap file should be handled. Ideally the entries not involved in a conflict should still be respected, but I’m not even sure how materialize_tree_value behaves here.

  • Should I split this into multiple commits? A lot of the changes are fairly entangled, but I could probably tease one or two more commits out of this.

  • Should there be more permanent documentation of the .mailmap support, and if so, where?

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@emilazy

This comment was marked as resolved.

@emilazy emilazy force-pushed the push-lyokulnkyppp branch from d61a6ef to 3065d5d Compare June 24, 2024 16:02
Copy link
Collaborator

@PhilipMetzger PhilipMetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round, I'm not totally onboard for the *_raw() templates but let's wait for other peoples opinion. And to fix the build failure, wrap the necessary stuff in #[cfg(git)] and make the dependency rely on it.

lib/src/mailmap.rs Outdated Show resolved Hide resolved
lib/src/mailmap.rs Outdated Show resolved Hide resolved
lib/src/revset.rs Show resolved Hide resolved
@emilazy emilazy force-pushed the push-lyokulnkyppp branch from 3065d5d to e896de2 Compare June 24, 2024 16:11
@emilazy

This comment was marked as outdated.

@emilazy
Copy link
Collaborator Author

emilazy commented Jun 24, 2024

For what it’s worth, the best alternative DSL API I came up with was commit.{author,committer}([canonical=true]) and {author,committer}{pattern, [canonical=true]), but that required adding support for keyword arguments to the template DSL and didn’t seem obviously better than what I ended up with currently.

@emilazy emilazy force-pushed the push-lyokulnkyppp branch from e896de2 to 6066538 Compare June 24, 2024 16:16
@emilazy

This comment was marked as outdated.

@emilazy emilazy force-pushed the push-lyokulnkyppp branch 2 times, most recently from f21c84b to 0044daf Compare June 24, 2024 16:33
lib/src/signature_util.rs Outdated Show resolved Hide resolved
lib/src/commit.rs Outdated Show resolved Hide resolved
@emilazy emilazy force-pushed the push-lyokulnkyppp branch from 0044daf to 04df057 Compare June 25, 2024 14:45
@emilazy
Copy link
Collaborator Author

emilazy commented Jun 25, 2024

Okay, I’ve dropped the canonicalizing functions from Commit and added corresponding convenience helpers on the Mailmap type instead, along with explanatory documentation on the non‐canonicalizing Commit methods, and pointers for obtaining a Mailmap. That seems better‐factored and should probably be sufficient to nudge tooling developers onto the right path. Thank you both for the feedback; hopefully this is better from a code organization POV.

I also separated out the signature conversion, and renamed get_wc_commit_mailmap to get_current_mailmap, which I dislike a lot less. It turns out that Git actually does support empty email addresses, and that real repositories (e.g. Nixpkgs) have such commits in them; it would be a practical use of .mailmap to correct these, e.g. <[email protected]> Author Name <>, so rather than bailing early on empty signatures I just convert them to Gitoxide’s types as‐is. It’s fewer Git‐specific details in backend‐agnostic code, less churn in the rest of the codebase, and overall better behaviour, so it turned out nicely.

I’ve currently left the {author,committer}{,_raw} revset and template functions rather than moving them to a global option. Git has e.g. %an/%aN pretty format pairs which are analogous to this; I assume they only have the option because it historically used to be opt‐in and they don’t have a real query language. It feels more flexible and declarative to encode the intent in the function names rather than having their semantics depend on more global state. But if people feel strongly about just having a global option to disable it I’m fine implementing that instead.

Copy link
Contributor

@nasamuffin nasamuffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for leaving the mechanical bits in a standalone commit.

GitHub won't let me comment on the commit message :) but in my opinion, you could say a little more - that the mailmap will introduce a concept of pre- and post-mapped identities. Right now, it's implied here.

Are committer and author the only types of identities that are here? What happens to, say, commit message footers like Signed-off-by: (which are mailmapped in Git, IIRC)? Is that handled differently?

@emilazy emilazy force-pushed the push-lyokulnkyppp branch from 04df057 to 90e7a18 Compare June 25, 2024 18:38
@emilazy
Copy link
Collaborator Author

emilazy commented Jun 25, 2024

Thank you for taking a look!

GitHub won't let me comment on the commit message :) but in my opinion, you could say a little more - that the mailmap will introduce a concept of pre- and post-mapped identities. Right now, it's implied here.

That’s not something I hear often :)

I think this is a good idea and I’ve added another few paragraphs (after my editor ate the first draft…); I’ve called them raw signatures and canonical signatures, as that’s the best terminology I can think of currently and it matches the code and comments. I’ve also tweaked a few doc comments to allude to these concept more explicitly, although there isn’t an in‐depth exposition in the code currently. Possibly this should be recorded more permanently in developer documentation somewhere but hopefully that can be left to a follow‐up PR – the design docs section, perhaps? Although I wouldn’t exactly say that this is a stellar piece of de novo design to flaunt, just a pragmatic stop‐gap for arguably‐regrettable decisions the majority of DVCSes have made up to this point…

Are committer and author the only types of identities that are here? What happens to, say, commit message footers like Signed-off-by: (which are mailmapped in Git, IIRC)? Is that handled differently?

As far as I can tell authors and committers are the only places Jujutsu gets and processes signatures from currently, and there is no special support for trailers. It seems like git shortlog --group=… canonicalizes identity information in trailers, but git log doesn’t on display; a strange state of affairs that speaks to the unfortunate whack‐a‐mole you talked about and that I’m trying to avoid repeating. I guess it makes some sense with the nature of trailers as an ad‐hoc interpretation layer on top of unstructured commit messages, which I also think is regrettable. Hopefully Jujutsu can at least do better there, even if it has to lower whatever additional metadata it ends up recording down to trailers for now for Git compatibility! I’ve discussed this in the commit message too as another example of where the raw–canonical signature distinction and guidelines for which to use would be relevant.

Copy link
Member

@martinvonz martinvonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. Thanks!

Please give @yuja a chance to do another round of reviews too if he wants to.

lib/Cargo.toml Outdated Show resolved Hide resolved
cli/src/commit_templater.rs Outdated Show resolved Hide resolved
cli/tests/common/mod.rs Outdated Show resolved Hide resolved
let commit = store.get_commit(&entry.commit_id()).unwrap();
let author = mailmap.author(&commit);
pattern.matches(&author.name) || pattern.matches(&author.email)
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we transform the query by frontend?

author(foo)
-> author(foo)|author(exact:lookup_reverse(foo)[0])|author(exact:lookup_reverse(foo)[1])|..

It's not exactly the same (because the query also matches the original values), but I think it's practically okay. I think it's easier for revset engine backed by database to process query.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I mentioned this in the commit message under “The mapping to canonical signatures is one‐way, and queries only match on the canonical form”. It seems a little tricky to translate back because the functions accept glob patterns; at the very least we would have to traverse the Mailmap structure and syntheize author_raw(exact:…) patterns for all possible matches. But I’m not entirely sure that would be correct in all cases, even if we accept the change in semantics. I’d have to think about it.

Rather than changing the semantics/implementation here, I think in a database backend, you could have a caching table like (commit ID or hash or something of .mailmap, raw signature that appears in commits, canonical signature from corresponding .mailmap), and an index for lookup? I don’t really know how those backends work since there isn’t a public example of one. Maybe you could even use a materialized view or something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the implementation either, and I guess Google backend can just ignore mailmap thingy at all (because .mailmap shouldn't exist there.)

That said, it'll be nice if mailmap handling is isolated to the frontend layer. Suppose the size of the mailmap is small compared to the commit history, I think it's okay to scan all entries to build a reverse mapping.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a think about this. If you wanted to avoid the additional table/view I mentioned, we can still obtain the same database performance benefits without changing semantics (which I’m a little reluctant to do, as it goes against the rubric of when raw signature content should affect UI I put in the commit message). author(a*b*) would optimize to author(a*b*) & (author_raw(a*b*) | author_raw(exact:…) | author_raw(exact:…) | …) based on the .mailmap. The latter can be queried on the database efficiently, and the author(a*b*) & is a quick post‐processing step on the results that ensures the semantics remain the same.

This wouldn’t be hard to implement, but I’m hesitant to add the complexity off the bat if it wouldn’t currently have any prospective users, especially as it would make performance strictly worse with the current backends. (Although considering the complaints I’ve heard from Googlers in the past about what a pain it is to get their username and name updated across all systems maybe they could do with having some kind of system here…)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think doing preprocessing is strictly worse in performance. If the query doesn't match any mailmap entry (which I think is common), there's no additional cost in later stage. OTOH, without preprocessing, we'll do binary search for each commit entry. I don't think the difference would be significant, though.

FWIW, if a database-backed revset engine transforms mailmap to table/view, they'll want a single mailmap resource than mailmap per expression node.

Anyway, I think this is a fundamental issue of mailmap design, and I just thought it would be nice if we can avoid spreading a mess to the backend layer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, my point is that it's nice if mailmap handling does not cross the frontend-backend boudary.

I agree with this. At the same time, mailmap is a bit of a hack, so what Emily has currently implemented in this PR seems fine to me.

Another alternative might be to pass the Mailmap into Index::evaluate_revset(). What do you think about that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative might be to pass the Mailmap into Index::evaluate_revset().

That seems slightly better.

FWIW, I also consider mailmap as a hack, and that's the reason why I insist that the query result doesn't have to be 100% accurate. It's simpler to approximate mailmap query as union of matched mailmap entries than extending backend API.

Copy link
Collaborator

@PhilipMetzger PhilipMetzger Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strawman design:

  • Mailmap could become a trait.
  • The Backend trait could gain e.g. fn read_mailmap(&self, as_of_commit: &CommitId) -> impl Mailmap. A database backend could stub this out if it doesn’t support the feature or ignore the parameter and return an interface to a database table. Handling the queries efficiently would be the responsibility of a backend that wants to support this, and could be handled as per the desugaring I outlined in lib: add support for .mailmap files #3951 (comment).
  • The in‐tree backends would share the current implementation.

Not sure if this is better or worse than the status quo, but I thought I should throw it out there.

Yes, that'd be the approach if we pursue the identity work further, but clearly is unneeded complexity for this PR. As what the current .mailmap approach should be, there I heavily agree with Yuya and Martin.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate all the feedback; I’m going to take a day to think the design space over and try and address some of the concerns here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative might be to pass the Mailmap into Index::evaluate_revset(). What do you think about that?

I tried this, and it was a mess, sadly; the state needed to be piped through the whole evaluation system and it caused a lot of churn and other problems. It basically resulted in ResolvedExpressions not being fully resolved and having to patch that up in the evaluation layer, so in terms of these layering concerns it was a lot worse.

I wasn’t sure about how to move forward here, but then I realized that it is possible to implement the front‐end desugaring of the revset queries in a way that always gives correct .mailmap results and yields simple, efficient RevsetExpressions that should work great with database indexes. The backend now no longer needs to care about .mailmap support at all. Hopefully that will make everyone happy! I’m still finishing polishing it off, so I haven’t pushed it here yet, but I’ve opened #3991 as an interim change I made along the way that resolves an old TODO and helped with the implementation.

(For what it’s worth: I do agree that .mailmap is essentially a pragmatic hack, an inherently incomplete measure, and a symptom of a deficient underlying design. Still, Git users are unlikely to ever get anything better, and for some people it’s a hack that is the difference between a marginally workable identity system and one that is completely unfit for purpose, so I do care about implementing it at least as well as Git does. I’m principled in my pragmatism; giving up on the principles of application I listed in the commit message results in concrete scenarios where it fails to meet the basic point of supporting .mailmap, in my view. Thankfully, it looks like it’s totally practical to address these layering concerns while still upholding them!)

lib/src/mailmap.rs Show resolved Hide resolved
lib/src/mailmap.rs Outdated Show resolved Hide resolved
emilazy added 4 commits July 10, 2024 06:31
The underlying predicates now match only on the name and email,
and the revset functions construct the union of the two. These are
desired for the `.mailmap` work and aren’t currently surfaced in
the user interface, but the possibility of exposing more specific
queries in the future becomes available.
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

In this model, you can think of raw signatures as being like database
keys for a table mapping them to canonical signatures. In an ideal
world, these keys would be opaque synthetic keys with no human meaning
that are only surfaced when poking into internals; the idea is to treat
them as such to the greatest extent possible given the realities of
the current object model.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow a
  query for any member to match all of them, but this would contradict
  what is actually displayed for the commits, violate the principles
  about surfacing raw signatures detailed above, and the `*_raw`
  functions may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
@emilazy emilazy force-pushed the push-lyokulnkyppp branch from 5400721 to 9b265b5 Compare July 10, 2024 07:37
@emilazy emilazy changed the title lib: add support for .mailmap files mailmap: add support for .mailmap files Jul 10, 2024
@emilazy
Copy link
Collaborator Author

emilazy commented Jul 10, 2024

I’ve pushed a pretty big rework of this; it feels cleaner now. Headline changes:

  • Added a Mailmap::empty constructor to use instead of Default::default for clarity.

  • Renamed mailmap::{get → read}_current_mailmap and added error reporting.

  • Added a WorkspaceCommandHelper::current_mailmap convenience helper, although it currently only has one call site. (Maybe it should be read_current_mailmap too? Maybe it should keep the async?) Caching is a TODO for a later date, but performance impact seems negligible anyway.

  • The details of handling .mailmap mappings are now dealt with entirely in the revset frontend; backends don’t have to care about it at all. The same correctness guarantees as the previous version are maintained, and the resulting expressions should be efficient to execute against database indexes. Everything is abstracted through simple functions like RevsetExpression::author; the mapping implementation is contained entirely within the private RevsetExpression::signature_field helper, which I have commented extensively to make the logic clear.

  • Added even more paragraphs of detail to the main commit message!

I believe I’ve addressed all the previous review comments; hopefully this is acceptable to everyone now and we can get this merged! This turned into a bigger project than I expected when I picked it as my first public contribution to Jujutsu, but I’m grateful for all the feedback and I think this has ended up in a good place.

Copy link
Collaborator

@yuja yuja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction of this PR looks good to me, thanks!

/// Commits with author field matching the pattern.
Author(SignatureField, StringPattern),
/// Commits with committer field matching the pattern.
Committer(SignatureField, StringPattern),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these enum variants can be simply AuthorName(_)/AuthorEmail(_)/..?

We're going to introduce AuthorDate(_), etc., so Author<Thing> seems more consistent. And we'll probably add author_name()/_email() revset function if needed.

@@ -2053,12 +2164,14 @@ impl<'a> RevsetParseContext<'a> {
user_email: String,
extensions: &'a RevsetExtensions,
workspace: Option<RevsetWorkspaceContext<'a>>,
mailmap: Rc<Mailmap>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Rc<_> isn't needed, but templater depends on it. If Mailmap is cached by WorkspaceCommandHelper, we can replace Rc with &'a. That said, this isn't important, and can be addressed later. (I just find it looked a bit odd because the revset module doesn't require Rc.)

self.clone()
} else {
RevsetExpression::minus(self, &RevsetExpression::union_all(others))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'll inline this to caller.

x & ~none() could be rewritten to x at later pass, but apparently we don't have a rule for that right now.

old_field_value.map(|field_value| pattern_case_insensitive.matches(field_value));
let new_field_matches = pattern.matches(new_field_value);

if !new_field_matches && old_field_matches.unwrap_or(true) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't .unwrap_or(true) mean we're building quite large negative sets? Mailmap entry doesn't usually contain old name, I suppose. Even if it's more correct, I don't think the added cost is acceptable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, my original idea was to build query without excludes, meaning author(x) matches x in raw entry or mapped entry. This isn't perfect, but I considered it would be good enough. Mailmap is a bit of a hack.

If we need to guarantee that all shadowed matches are excluded, it's probably better to pass Mailmap to evaluate_revset() as a resource.

// }
//
// When matching raw signatures, the `Mailmap` is empty and this reduces to
// `raw_field(pattern)`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's probably better to write unit tests for this query transformation function.

StringPattern::Exact(literal) => Cow::Owned(StringPattern::exact_i(literal)),
StringPattern::Substring(needle) => Cow::Owned(StringPattern::substring_i(needle)),
StringPattern::Glob(pattern) => Cow::Owned(StringPattern::GlobI(pattern.clone())),
_ => Cow::Borrowed(self),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'll list all variants exhaustively (to catch future bugs.)

pub fn new_name(&self) -> Option<&str> {
self.new_name.as_deref()
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be okay to simply make all fields public.

@mati865
Copy link
Collaborator

mati865 commented Sep 29, 2024

What is the status here? It'd be a shame to let it bit rot further.

@emilazy
Copy link
Collaborator Author

emilazy commented Sep 30, 2024

Sorry for the inactivity; I was expecting this to be a relatively quick PR and burnt out a little on the review cycles. I still intend to rebase this and respond to review comments at some point, though I am not sure if it will be possible to fully satisfy everyone here.

@lilyball
Copy link

I would really love to see this get merged as long as it's in a "good enough" state even if it's not perfect. There are repositories I don't want to use jujutsu with until it has mailmap support because I'd rather not see my deadname show up in the UI.

@yuja
Copy link
Collaborator

yuja commented Oct 1, 2024

I'd rather not see my deadname show up in the UI.

FWIW, the display part (= templater + Commit object methods) is easy, and can be merged separately. IIRC, most of the discussion here is about author()/committer() revset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants