Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for using watchman(1) instead of walking filesystem #263

Closed
wants to merge 388 commits into from

Conversation

KAction
Copy link

@KAction KAction commented Dec 17, 2023

Anecodically, for project having 4k files and 7 different formatters, this change can reduce time it takes for treefmt(1) to figure out that it has nothing to do from ~350ms down to ~55ms, out of which getting response from watchman daemon takes ~35ms, so there is room for further improvements.

If "WATCHMAN_SOCK" environment variable is not set and "watchman(1)" command are not available, "treefmt" transparently falls back on walking the file system. Setting the environment variable is highly recommended, since is avoids subprocess invocation.

Ref: #261

Regarding further optimizations, this is what I think can be improved::

[DEBUG] load cache: 8.36ms (Δ 6.85ms)
...
[DEBUG] watchman query: 43.73ms (Δ 35.02ms)
[DEBUG] post-process watchman list: 50.55ms (Δ 6.82ms)
...

zimbatm and others added 30 commits March 2, 2021 17:36
* Build docs in nix

* Build docs in CI and update gh-pages

* Format nix
Avoid the big fat string literal
Instead of calling CLOG directly, use the "log" crate on the producer
side, and then plug CLOG as a consumer. This allows for some nice
decoupling and the log macros are also nicer on the eye.

--log-level has disappeared for now
* nix: simplify the code a bit

* docs: fix the website

Make the paths relative so they work on GitHub Pages. Fix the landing
page overriding the docs. Put the docs in the right sub-folder.
* customlog: convert emoji to unicode escape

* engine: print reformatted files on failure
* implement --stdin

A first pass at making --stdin work

* use the tempfile crate
In cases where the formatter path is relative, the "which" crate doesn't
remove the "./" in between.
* treat exit status code > 0 as an error

* adding success status check before matching status code
)

Bumps [cachix/install-nix-action](https://github.com/cachix/install-nix-action) from v12 to v13.
- [Release notes](https://github.com/cachix/install-nix-action/releases)
- [Commits](cachix/install-nix-action@v12...8d6d5e9)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [cachix/cachix-action](https://github.com/cachix/cachix-action) from v8 to v9.
- [Release notes](https://github.com/cachix/cachix-action/releases)
- [Commits](cachix/cachix-action@v8...2689c27)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Mic92 and others added 14 commits October 14, 2023 12:07
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/8d0e2444ab05f79df93b70e5e497f8c708eb6b9b' (2022-12-07)
  → 'github:hercules-ci/flake-parts/c9afaba3dfa4085dbd2ccb38dfade5141e33d9d4' (2023-10-03)
• Updated input 'mkdocs-numtide':
    'github:numtide/mkdocs-numtide/af6c4a5f7c0a59da3b557795f57dcae5707523ac' (2023-04-01)
  → 'github:numtide/mkdocs-numtide/b3008171c75083f2bf2f1dc4e6781d4737dfaa49' (2023-06-26)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/4172cdda7e56a48065475fb98c57b03b83c1fde4' (2022-12-16)
  → 'github:NixOS/nixpkgs/01441e14af5e29c9d27ace398e6dd0b293e25a54' (2023-10-11)
• Updated input 'rust-overlay':
    'github:oxalica/rust-overlay/7da2f6b3a0c32f661cb2864d7fbd1d7e6f0c7543' (2022-12-16)
  → 'github:oxalica/rust-overlay/dce60ca7fca201014868c08a612edb73a998310f' (2023-10-14)
• Updated input 'rust-overlay/flake-utils':
    'github:numtide/flake-utils/c0e246b9b83f637f4681389ecabcb2681b4f3af0' (2022-08-07)
  → 'github:numtide/flake-utils/cfacdce06f30d2b68473a46042957675eebb3401' (2023-04-11)
• Added input 'rust-overlay/flake-utils/systems':
    'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e' (2023-04-09)
• Updated input 'systems':
    'github:nix-systems/default/4e9a51a15ceb27e5141819142a7d2ee827943fc8' (2023-04-08)
  → 'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e' (2023-04-09)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Add the `hidden` and `no-hidden` flag that allows
configuring the traversal of hidden files.

Examples:

```
treefmt --hidden
```
will traverse hidden files.

```
treefmt --hidden --no-hidden
```
will not traverse hidden files, same as the default (`treefmt`).

```
treefmt --hidden --no-hidden --hidden
```
will traverse hidden files.
Flake lock file updates:

• Updated input 'rust-overlay':
    'github:oxalica/rust-overlay/dce60ca7fca201014868c08a612edb73a998310f' (2023-10-14)
  → 'github:oxalica/rust-overlay/e494404d36a41247987eeb1bfc2f1ca903e97764' (2023-10-15)

Co-authored-by: mic92-buildbot <[email protected]>
Bumps [cachix/install-nix-action](https://github.com/cachix/install-nix-action) from 23 to 24.
- [Release notes](https://github.com/cachix/install-nix-action/releases)
- [Commits](cachix/install-nix-action@v23...v24)

---
updated-dependencies:
- dependency-name: cachix/install-nix-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.37.23 to 0.37.27.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](bytecodealliance/rustix@v0.37.23...v0.37.27)

---
updated-dependencies:
- dependency-name: rustix
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [cachix/cachix-action](https://github.com/cachix/cachix-action) from 12 to 13.
- [Release notes](https://github.com/cachix/cachix-action/releases)
- [Commits](cachix/cachix-action@v12...v13)

---
updated-dependencies:
- dependency-name: cachix/cachix-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
src/engine.rs Outdated
// first place to discover a changed file that matches "includes" pattern.
//
// FIXME: There should be a cleaner solution.
fn path_excluded(gs: &GlobSet, path: &PathBuf) -> bool {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is really inefficient. For nixpkgs, I get following timings:

[DEBUG] watchman query: 404.46ms (Δ 120.62ms)
[DEBUG] post-process watchman list: 601.97ms (Δ 197.52ms)

@KAction
Copy link
Author

KAction commented Dec 19, 2023

Also, for huge repositories, writing and loading cache is slow. For nixpkgs,

load cache: 290.48ms (Δ 290.28ms)
write cache: 702.86ms (Δ 29.27ms)

I think I should make separate cache for watchman-enabled scans that keeps only since parameter.

@KAction
Copy link
Author

KAction commented Dec 19, 2023

As for excludes, I think I can at least optimize happy path and change excludes: Option<GlobSet>. Or mangle excludes pattern to add trailing /*.

@KAction
Copy link
Author

KAction commented Dec 22, 2023

Updated the pull request. Previous version misused watchman, causing it to return way too much results.
With this version, whole nixpkgs is down to 12ms for no-op, my $dayjob repository is down to 8.5ms.

@KAction
Copy link
Author

KAction commented Dec 30, 2023

error (ignored): error: SQLite database '/home/garnix/.cache/nix/eval-cache-v5/6de50559647aa37e20a3fd649274afe85c9def927c4d55696d8a84953e4ccaa3.sqlite' is busy

Should I be concerned?

Copy link
Member

@zimbatm zimbatm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, I finally had the time to take a closer look and actually run it, and it's great. The only thing missing is to fix the default user experience if watchman is not installed, and a bit of documentation on how to use that feature.

Comment on lines 57 to 60
warn!(
"watchman is not available (err = {:?}), falling back on stat(2)",
e
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most users should be able to use treefmt without watchman, so having this as a warning isn't the best user experience for them.

I propose three modes: "auto" (default) tries watchman and silently falls back on "stat," "stat" is stat, and "watchman" errors if watchman is not available.

Then, introduce a CLI flag and/or project-level global config to allow switching between modes.

}
Ok(c) => {
// This is important. Subprocess wastes ~20ms.
if !std::env::var("WATCHMAN_SOCK").is_ok() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be documented, to explain how to set it

src/formatter.rs Outdated
@@ -191,7 +201,10 @@ impl Formatter {
options: cfg.options.clone(),
work_dir,
includes,
// TODO: Avoid clone().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the list is generally super small, I don't think the TODO needs to stay.

Dmitry Bogatov added 2 commits March 22, 2024 15:20
Anecodically, for project having 4k files and 7 different formatters,
this change can reduce time it takes for treefmt(1) to figure out that
it has nothing to do from ~350ms down to ~8m.

If "WATCHMAN_SOCK" environment variable is not set and "watchman(1)"
command are not available, "treefmt" transparently falls back on walking
the file system. Setting the environment variable is highly recommended,
since is avoids subprocess invocation.

Ref: numtide#261
Writing list of all files takes non-trivial amount of time
@KAction
Copy link
Author

KAction commented Apr 5, 2024

I think I addressed all suggestions it latest revision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.