Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Questions #202

Open
unphased opened this issue Apr 8, 2024 · 5 comments
Open

Performance Questions #202

unphased opened this issue Apr 8, 2024 · 5 comments

Comments

@unphased
Copy link

unphased commented Apr 8, 2024

I'm testing this tool out and i saw some behavior that was not expected

  • I manually imported around 200k entries from one of my custom history files. two of these entries correspond to each actual command history item, but that's alright, it makes for a bigger test case anyway
  • performance seems troublesome, after doing some typing and then removing my query under the ctrl+r interface, all the CPU threads that were spawned continue to chew up CPU. I don't think I will use this because of this. I'd certainly like to note that piping a plain text file containing 200k or 1M lines into fzf and interactively fuzzy searching into it is a lot faster and more interactive than what I saw with hiSHtory.
  • After doing the import i saw some process that might have been uploading the data to the net. i get that it's encrypted but I dont like that this is default behavior.
  • After that, I ran a simple hishtory query command, it eventually returned but not before downloading i estimate 50MB from the internet. I want to understand why it would redownload my whole dataset if I imported it just earlier, which means it should probably be getting cached?

I dont want to give up just yet so I'm curious if anyone can contextualize the behavior i observed.

@ddworken
Copy link
Owner

Thank you for the feedback!

performance seems troublesome, after doing some typing and then removing my query under the ctrl+r interface, all the CPU threads that were spawned continue to chew up CPU. I don't think I will use this because of this. I'd certainly like to note that piping a plain text file containing 200k or 1M lines into fzf and interactively fuzzy searching into it is a lot faster and more interactive than what I saw with hiSHtory.

Thanks for raising this, this is something that I personally haven't run into since my total history size is relatively small (~30k commands). I'll spend some time working on setting up benchmarks for this and will see if I can improve performance here at all.

After doing the import i saw some process that might have been uploading the data to the net. i get that it's encrypted but I dont like that this is default behavior.

Yeah, it is all encrypted so it is impossible for anything else to read it. But if you'd rather not have this, see the "Offline Install Without Syncing" section in the readme. This way you can install it in a 100% offline mode without any syncing support whatsoever.

After that, I ran a simple hishtory query command, it eventually returned but not before downloading i estimate 50MB from the internet. I want to understand why it would redownload my whole dataset if I imported it just earlier, which means it should probably be getting cached?

Hmm, interesting. This is unexpected, so I'll also plan on taking a look at this.

ddworken added a commit that referenced this issue Apr 14, 2024
ddworken added a commit that referenced this issue Apr 15, 2024
…#202 (#204)

* Fix double-syncing error where devices receive entries from themselves

* Fix incorrect error message

* Add TODO

* Update TestESubmitThenQuery after making query more efficient

* Update TestDeletionRequests and remove unnecessary asserts

* Swap server_test.go to using require

* Fix incorrect require due to typo
@ddworken
Copy link
Owner

Looping back on this, I'm happy to say that:

  1. Searching performance should be significantly improved by ba21e1c. This will technically only improve performance in cases where there are many results (so it will still be slow if you're searching through 200k entries for only 1 matching result), but this should significantly improve the UX. I'm also planning on experimenting with sqlite's FTS/trigram support to see if we can improve this more.
  2. The issue of re-downloading entries that came from the given device is fixed by Fix double-syncing error where devices receive entries from themselves #202 #204.

@unphased
Copy link
Author

That's awesome! Thank you.

@ddworken ddworken changed the title couple questions Performance Questions Apr 16, 2024
@hongyi-zhao
Copy link

PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

@ddworken
Copy link
Owner

PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

Since hishtory runs entirely on the client-side and is end-to-end encrypted, postgres isn't a great fit. Postgres is generally meant to be run on a server where it contains all the data, so it isn't a good fit for the hishtory use case where the server only stores encrypted blobs and has no visibility into the data.

ddworken added a commit that referenced this issue Aug 18, 2024
ddworken added a commit that referenced this issue Aug 18, 2024
…#202 (#204)

* Fix double-syncing error where devices receive entries from themselves

* Fix incorrect error message

* Add TODO

* Update TestESubmitThenQuery after making query more efficient

* Update TestDeletionRequests and remove unnecessary asserts

* Swap server_test.go to using require

* Fix incorrect require due to typo
ddworken added a commit that referenced this issue Aug 26, 2024
* Swap to using iterators for uploading to avoid storing all chunks in memory

* Chunk uploads for reuploading

* Revert "Swap to using iterators for uploading to avoid storing all chunks in memory"

This reverts commit 632ecc5.

* Make hishtory install work even if there is zero shell history on the device

* Skip DD integration for m1 mac since it seems to fail for mysterious beta-related reasons

* Log OpenAI error to debug log for #167

* Release v0.269

* Add explicit handling for 429 error code from OpenAI

* Release v0.270

* Fix handling of new lines in commands for #163 (#170)

* Fix handling of new lines in commands for #163

* Move code for table from lib.go to query.go

* Update goldens

* Release v0.271

* Properly silence which output to fix #166

* Release v0.272

* Add || true to fully fix #166

* Release v0.273

* Improve install.py script to attempt to detect when /tmp/ is noexec (#172)

* Improve install.py script to attempt to detect when /tmp/ is noexec

* Add test to install from python script at HEAD

* Remove incorrect duplicated line

* Delete the tmp hishtory-client download since it may be dropped in CWD rather than /tmp/

* Add basic smoke test to provide test coverage for other distros (#174)

* Fix quotes on container names

* More tweaks for smoke testing

* Skip setting the hostname for smoke tests since we don't need it

* Dependencies for smoke testing

* Add cgo deps

* Install killall command

* Add two more distros for smoke testing

* Add smoke tests for arch

* Update distro-smoke-test.yml

* Remove sudo since the arch container runs as root

* Drop sudo for OpenSUSE

* Update install commands for OpenSUSE and Arch

* More tweaks to install commands

* Update arch install command

* Remove OpenSUSE since their package repos are currently returning 500 errors

* Add another dep for arch

* Move up os.remove so that the file is removed even if it fails to execute

* Move function to start of python file to make it more idiomatic

* Update go action to enable caching of dependencies

* Run integration tests in parallel to speed up testing (#175)

* Remove a few direct DB insertions to prepare for parallel tests

* Revert "Remove a few direct DB insertions to prepare for parallel tests"

This reverts commit f8a3552.

* Add rudimentary experiment of splitting tests into two chunks to make them faster

* Add missing tag

* Remove code that enforces that all goldens are used, since it is incompatible with how tests are currently split into chunks

* Lay out the framework for checking goldens being used across all test runs

* Fix missing brace

* Revert "Remove code that enforces that all goldens are used, since it is incompatible with how tests are currently split into chunks"

This reverts commit 06cc3ee.

* Add initial work towards checking that all goldens are used

* Delete incorrect and unreferenced matrix

* Upgrade actions/upload-artifact to see if that makes the download in the next job work

* Alternatively, try downloading the artifact by name

* Update golden checker to read all the golden artifacts

* Swap to using glob to enumerate all golden files, rather than hardcoding them

* Remove debugging commands

* Remove goldens that are actually used

* Remove another golden that is actually used

* Add more comprehensive support for test sharding

* Fix references to test shards and increase shard count

* Shard the fuzz test

* Add debug prints

* Mark additional tests for sharding

* Fix logic error that broke test sharding

* Remove debug print

* Fix incorrect logic with skipping the fuzz test

* Move sharding functions to testutils and add some comments

* Upgrade all setup-go actions to enable caching of deps

* Remove goldens that don't exist

* Remove new line

* Reduce delay

* Correct stage name

* Remove incorrect skip code from the first version of sharding

* Remove unused import

* Reduce number of test shards to match GitHub's limit of 5 concurrent macos jobs

* Use cask for installing homebrew to speed up github actions

* More cleanup for unused goldens

* Swap away from brew cask since it appears to be slower

* Add sync server to status -v #176 so that self-hosted users can easily confirm they're using the self-hosted server (#178)

* Release v0.274

* Make bash support lenient with empty history lines, which seems to happen for the first command or two of new installs

* Remove unnecessary sub-shell, since we just need a truthy value here

* Release v0.275

* Add web UI for querying history from the browser (#180)

As requested in #176 and #147 

* Add initail version of a web UI for querying history from the browser

* Rename webui command

* Add basic test for the web UI

* Add README for the web UI

* Add basic auth for the web server

* Add status code when panic-ing

* Release v0.276

* Add ability to disable auth and force specific creds for the web UI

* Add cleaning for integration test devices to remove DB entries

* Wire through the shell name into AI suggestions so that we can get more precise AI suggestions for the current shell

* Add support for control-A and control-E shortcuts similar to GNU readline

* Allow register new device when exceed user limit when user already exist (#181)

* Add basic readline-like support for using control-left and control-right to scroll horizontally by one word at a time

* Release v0.277

* Improve word boundary algorithm to ignore previous spaces so that control+arrow-keys will skip over repeated spaces

* Update colored golden

* Update test golden

* Update golden

* Disable colored output tests

* Add updated goldens

* Delete temporarily unused goldens

* Delete an unused file

* Bump github.com/jackc/pgx/v4 from 4.14.1 to 4.18.2 (#189)

Bumps [github.com/jackc/pgx/v4](https://github.com/jackc/pgx) from 4.14.1 to 4.18.2.
- [Changelog](https://github.com/jackc/pgx/blob/v4.18.2/CHANGELOG.md)
- [Commits](jackc/pgx@v4.14.1...v4.18.2)

---
updated-dependencies:
- dependency-name: github.com/jackc/pgx/v4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump google.golang.org/protobuf from 1.28.1 to 1.33.0 (#191)

Bumps google.golang.org/protobuf from 1.28.1 to 1.33.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Upgrade SLSA releaser due to github.com/slsa-framework/slsa-github-generator/issues/3350

* Release v0.278

* Update slsa-verifier to attempt to fix SLSA breakage

* Release v0.279

* Release v0.280

* Add better error message for SLSA failures

* Disable validation so we can push out a working binary even though SLSA is broken

* Release v0.281

* Fully disable validation to allow an emergency release due to SLSA breakage

* Release v0.282

* Update cosign too to fix slsa breakage from https://blog.sigstore.dev/tuf-root-update/

* Release v0.283

* Release v0.284

* Fix go.mod version after cosign upgrade

* Update go.sum after cosign update

* Release v0.285

* Re-enable SLSA verification now that we've updated the SLSA version throughout the repo

* Release v0.286

* Disable validation with local build since it seems to fail for some reason

* Add SLSA validation with current binary built by SLSA

* Set up tmate session to debug slsa releaser

* Add SLSA failure warning for versions broken by SLSA

* Remove tmate session for debugging

* Release v0.287

* Bump gopkg.in/go-jose/go-jose.v2 from 2.6.1 to 2.6.3 (#197)

Bumps gopkg.in/go-jose/go-jose.v2 from 2.6.1 to 2.6.3.

---
updated-dependencies:
- dependency-name: gopkg.in/go-jose/go-jose.v2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add support for horizontal scrolling of all columns for #188 (#195)

* Bump github.com/docker/docker (#193)

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 24.0.7+incompatible to 24.0.9+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v24.0.7...v24.0.9)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/sigstore/rekor from 1.0.0 to 1.2.0 (#91)

Bumps [github.com/sigstore/rekor](https://github.com/sigstore/rekor) from 1.0.0 to 1.2.0.
- [Release notes](https://github.com/sigstore/rekor/releases)
- [Changelog](https://github.com/sigstore/rekor/blob/main/CHANGELOG.md)
- [Commits](sigstore/rekor@v1.0.0...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/sigstore/rekor
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add ability to configure custom OpenAI API endpoint for #186 (#194)

* Add ability to configure custom OpenAI API endpoint for #186

* Ensure the AiCompletionEndpoint field is always initialized

* Release v0.288

* Enable colored golden tests for linux (#184)

* Enable golden tests for linux and ensure all goldens get saved as outputs

* Swap in OS specific goldens

* Update colored goldens to take into account OS version, since different macos versions have different behavior here

* Update goldens

* Re-enable golden tests

* Add missing golden

* Empty commit

* Remove linux kernel version from OS name

* Remove minor version numbers from os versions for golden files for tests

* Continue-on-error for the DD setup since it will also fail if colima fails

* Add test for horizontal scrolling other columns for #188

* Add support for forcing init without prompting via --force flag for #198

* Clean up: Remove duplicated code by calling existing utility function

* Add mouse scrolling support for #200

* Revert "Add mouse scrolling support for #200" since it breaks the ability to highlight text

This reverts commit 7d9bb66.

* Release v0.289

* Add benchmarking for searching for #202

* Add index of start time so that queries with a LIMIT clause can avoid a full table scan (for #202)

* Release v0.290

* Add --port flag for the web UI for #203

* Add additional test for smoke tests to cover syncing

* Move extra delay to a separate job to avoid wasting GH action quota by sleeping in duplicated jobs

* Release v0.291

* Revert "Add additional test for smoke tests to cover syncing"

This reverts commit 514d95b.

* Fix double-syncing error where devices receive entries from themselves #202 (#204)

* Fix double-syncing error where devices receive entries from themselves

* Fix incorrect error message

* Add TODO

* Update TestESubmitThenQuery after making query more efficient

* Update TestDeletionRequests and remove unnecessary asserts

* Swap server_test.go to using require

* Fix incorrect require due to typo

* Slow down gif per feedback in #199

* Update bubbletea to include 2b46020ca0725219da1a7d7969fa85c486181258 since it seems to help fix #185

* Fix test broken by 7ae9f15 by making sure input is sent and processed as separate events

* Fix test broken by 7ae9f15 by making sure input is sent and processed as separate events

* Fix test broken by 7ae9f15 by making sure input is sent and processed as separate events and updating the golden to reflect this

* Release v0.292

* Bump golang.org/x/net from 0.22.0 to 0.23.0 (#206)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.22.0 to 0.23.0.
- [Commits](golang/net@v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove darwin-21 goldens since they're no longer used now that GH upgraded their macos image (#210)

* Update backend to avoid persisting entries to be read by devices that have been uninstalled

* Add support for custom key bindings for #190 (#209)

* Add support for custom key bindings for #190

* Add tests for configuring custom key bindings

* Simplify key bindings test

* Add docs on custom key bindings + error message for unhandled actions

* Fix condition added in d6a6021 to also apply to rows with the go 'empty' value and not just null

* Add support for enabling/disabling syncing post-install

* Release v0.293

* fix: close file (#213)

* Release v0.294

* Move docs on custom key bindings to a more logical location

* Fix duplicate pre-saving issue reported in #215

* Revert "Fix duplicate pre-saving issue reported in #215"

This reverts commit 336b331.

* Fix duplicate pre-saving issue reported in #215 (#217)

* Release v0.295

* Add full fix for #215 along with a test to reproduce the issue (#218)

* Release v0.296

* Add ability to skip config modifications for #212 (#216)

* Add ability to skip config modifications

* Update golden names to fork on OS

* Remove incorrect newline in golden

* Add README documentation for default-filter

* Update title for section

* Release v0.297

* Add basic fix for #225 by escaping tab characters before rendering

This is a tricky bug to fix because the width of a tab character varies depending on context. This means that when we're trying to build a table and calculating the width of columns for budgeting, we can't actually know the width of a tab without knowing exactly what characters come before it. This is in theory doable, but it leads to some really complex code that I'd rather not adopt.

* Release v0.298

* Bump github.com/hashicorp/go-retryablehttp from 0.7.2 to 0.7.7 (#223)

Bumps [github.com/hashicorp/go-retryablehttp](https://github.com/hashicorp/go-retryablehttp) from 0.7.2 to 0.7.7.
- [Changelog](https://github.com/hashicorp/go-retryablehttp/blob/main/CHANGELOG.md)
- [Commits](hashicorp/go-retryablehttp@v0.7.2...v0.7.7)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-retryablehttp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add additional fallback method for retrieving the CWD to further improve the situation for #226

* Explicitly install openssl to see if it fixes smoke test errors on arch

* Add integration test for #226

* Release v0.299

* Update macos version for signer since GH dropped support for macos 11

* Release v0.300

* Swap to macos-latest to see if GH has more quota for that tag

* Release v0.301

* Release v0.302

* Upgrade to setup-go@v4 for automatic caching support

* Revert "Remove OpenSUSE since their package repos are currently returning 500 errors"

This reverts commit 6270060.

* Install git and tar for opensuse smoke tests

* Link /bin/sh for opensuse smoke tests

* Remove opensuse smoke tests

* use http.DefaultClient (#232)

* Add new short name for "ExitCode" - "$?" (#228)

* Add more short column name alternatives similar to #228

* add forceComapctMode config entry (#237)

* Add docs in readme to call out shorter column names as added in #228

* Change compact-mode setting that was added in #237 to respect the convention of taking in an argument

* Add config-get compact-mode command (as needed by #237)

* Move checking of forced compact mode into helper functions to ensure it is checked everywhere (follow up to #237)

* Add test for forced compact mode (from #237)

* ai: add some new env variables to control OpenAI requests (#231)

Co-authored-by: David Dworken <[email protected]>

* Update incorrect docs on ClientConfig struct

* Add ability for the client to configure the model via an environment variable

* Bump github.com/docker/docker (#236)

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 24.0.9+incompatible to 25.0.6+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v24.0.9...v25.0.6)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* run "make fmt" (#233)

* Add make fmt to pre-commit

* Fix import

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Nguyễn Hoàng Đức <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: guangwu <[email protected]>
Co-authored-by: Pavel Griaznov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants