Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add batch size cli arg and config option #335

Closed
wants to merge 4 commits into from

Conversation

brianmcgee
Copy link
Member

@brianmcgee brianmcgee commented Jul 3, 2024

Controls the max number of paths to batch up before applying them to a sequence of formatters.

Works through a --batch-size cli arg or via config:

[global]
batch_size = 10

Note

The cli arg takes precedence over the config option

TODO

Closes #334

Signed-off-by: Brian McGee [email protected]

@brianmcgee brianmcgee requested a review from zimbatm July 3, 2024 17:52
@brianmcgee brianmcgee marked this pull request as ready for review July 3, 2024 17:53
@brianmcgee brianmcgee changed the title feat: add --batch-size cli arg feat: add batch size cli arg and config option Jul 3, 2024
@brianmcgee
Copy link
Member Author

@michaelpj You could try the --batch-size arg by using this branch as a flake input.

Reducing the batch size to 128 or 256 should further improve your use case.

You can also set it via config using settings.global.batch_size = 128 if you are using treefmt-nix, just be sure to override the package.

Copy link
Member

@zimbatm zimbatm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, just added the missing doc

@brianmcgee
Copy link
Member Author

@zimbatm if we deploy this though, the docs site will update with a feature that is not released... I'm currently looking into versioned docs with vitepress.

@michaelpj
Copy link

Should batch size be a per-formatter option? The optimal batch size may well depend on the formatter.

@brianmcgee
Copy link
Member Author

Paths to be formatted are batched according to a batch key unique to the sequence of formatters that matched against that path.

Take the following config, for example:

[formatter.foo]
includes = ["foo/*.txt"]

[formatter.bar]
includes = ["bar/*.txt"]

[formatter.baz]
includes = ["*.txt"]
  • test.txt will have a batch key of baz since it only matches one formatter
  • foo/fizz.txt will have a batch key of baz:foo since it matches both formatters and their priority values are 0 (default).
  • bar/buzz.txt will have a batch key of bar:foo since it matches both formatters and their priority values are 0 (default).

Formatters are sorted by priority (in ascending order) and then by name (lexicographically in ascending order) to ensure a deterministic ordering. Once a batch reaches the configured global batch size, we spawn a separate routine to work through the sequence in order, applying each formatter in turn.

If we allow specifying a batch size per formatter in addition to the global batch size, it would have no effect if it was greater than the global batch size. Still, it could help optimize the application of a given batch to a given formatter.

If we remove the global batch size and instead rely solely upon a batch size per formatter, I think things start to break down.

So yeah, I can see value in an additional batch size option on a per-formatter basis. It would allow breaking batches into smaller chunks that are easier to swallow for certain formatters within the sequence, and we could parallelize the application too.

I'll find time to add it to this PR in the next few days.

@brianmcgee
Copy link
Member Author

@michaelpj I did a quick spike here so you can use this branch if you want to play around with configuring smaller batch sizes for ormolu.

[formatter.ormolu]
...
batch_size = 128

brianmcgee and others added 4 commits July 5, 2024 16:25
Controls the max number of paths
to batch up before applying them to a sequence of formatters.

Signed-off-by: Brian McGee <[email protected]>
@brianmcgee
Copy link
Member Author

This is too stale, I'll re-implement on top of latest main

@brianmcgee brianmcgee closed this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow providing the max batch size as a cli arg
3 participants