
History API rewrite (discussion after #680) #735

Closed
ClementNerma opened this issue Jan 30, 2024 · 17 comments

Comments

@ClementNerma
Contributor

This issue is meant to track all discussions about the rewrite of the history, menu and completer (HMC) API, following the existing decisions and trials in #680 (history only).

Opening message from @stormasm : #680 (comment)

@ClementNerma
Contributor Author

The Big Idea concept is a Complete Refactor / Rewrite of Reedline without having History / Menus / Completer (HMC)
so tightly embedded into the Engine ---- but instead API entry points into the Engine which would enable
decoupled HMC components...

@stormasm what do you mean exactly by that? Given that you can currently disable the history and completer simply by not providing them when Reedline is created.

@stormasm
Contributor

stormasm commented Jan 30, 2024

@ClementNerma true, good point !

Think about it then from the point of view of having a History / Menu / Completer (HMC) crate system, where developers can come along and implement any one of the three components against our fundamental Reedline API into these systems...

In other words vastly loosening the Engine API to be able to accept those components (as Traits)...

Or something to that effect...

From the History API point of view, imagine you wanted to use a different persistence backend / something other than SQLite (for example)...

@ClementNerma
Contributor Author

Sorry, I don't really get what you're suggesting.

Aren't these three components traits already? PR #680 was about making it more flexible and easier to use, but it was already a trait before.

IIRC I've already implemented a custom completer, and in the current git tree it seems you can make custom menus?

@stormasm
Contributor

@ClementNerma going back to your previous PR --- one thing you mentioned in there was to leave the File format of the History API alone ---- that seems critical to me...

The current file format of the History API would remain "untouched".

I really like the way it works for many different reasons...

@stormasm changed the title from "HMC rewrite (discussion after #680)" to "History API rewrite (discussion after #680)" on Jan 30, 2024
@stormasm
Contributor

stormasm commented Jan 30, 2024

@ClementNerma let's just focus on the History API for now...
I went ahead and modified the title of the issue...
It will limit the scope of what we are talking about and probably be more productive in the short term 😄
Plus it's what you are most interested in for now (I assume?) --- and where your coding passion lies based on your previous PR....

@ClementNerma
Contributor Author

ClementNerma commented Jan 30, 2024

Seems fine to me. For the SQLite database, I feel like it could still be useful for programs that'd like to take advantage of it, so I'd like to keep the ID inside. The migration is instant for existing databases and 100% compatible with the previous schema (the only change is one new column).

Plus it's what you are most interested in for now (I assume?) --- and where your coding passion lies based on your previous PR....

Well, initially I started the PR to create a path for myself while building my own shell. But now I'd like to think this could help other people, as Reedline is the 'de facto' library for this kind of task in Rust. So why not improve it in multiple ways?

I now have a good understanding of how the history API works, so that's where I can be the most proficient for now, but I'm absolutely not against helping on other parts if help is required :)

@stormasm
Contributor

@ClementNerma sounds good to me! Again, thanks for showing an interest in improving the History API...

@ClementNerma
Contributor Author

ClementNerma commented Jan 31, 2024

I updated the PR; I feel like it should be mergeable in its current state. What do you think?

This could be the first brick of a larger API overhaul, but greatly improving the History API and making it fully implementable by anyone is a major upside IMO.

I think a later step could be to introduce a third built-in history type, one that would support all of the history features like the SQLite backend does, but using a more performant and compact storage system such as sled.

@stormasm
Contributor

@ClementNerma I don't see any new commits in the PR, can you please paste a link to the PR here? If we are talking about #680 --- the last commit I see is from a while ago...

@ClementNerma
Contributor Author

Indeed it doesn't show up in the PR 🤔
Weird, I'll check that later.

@stormasm
Contributor

@ClementNerma we are doing a nushell release on Feb 6, so it probably makes sense to hold off on submitting the PR until next week...

Prior to submitting the PR, I would recommend NOT changing anything in the schema file for SQLite...

I don't think we are going to want to have to run a migration script on developers' databases...

So I would recommend no changes to the text file or the database file...

#736

Also I would recommend running your ideas by @Hofer-Julian prior to doing any more work on this concept...

They are pretty familiar with the inner workings of how Reedline interacts and works with the History / Database...

And may have some insights on how you can move forward with your intentions of improving History.

Thanks !

@ClementNerma
Contributor Author

Ok so I'll remove the new SQLite schema and wait for people to review it.

cc @Hofer-Julian

@stormasm
Contributor

@ClementNerma since the old PR is already closed and it has lots of old stuff in there...

can you please open a new PR so we start from a clean slate...

thanks !

@saep
Contributor

saep commented Feb 3, 2024

To remove duplicates from the command history in nushell, I have created the PR
#741 and read through some of the history-related source code. Since I've spent
some time reading and thinking, I'll just write down the issues and suggestions
that came to mind.

The code snippets are from the version I looked at, f244736.

save

    /// save a history item to the database
    /// if given id is None, a new id is created and set in the return value
    /// if given id is Some, the existing entry is updated
    fn save(&mut self, h: HistoryItem) -> Result<HistoryItem>;

I don't think that a new history item must be generated if no id is given.
If I type cargo test a couple of times and I have a file-based backend that doesn't
contain duplicate commands, then it seems fine to me to just update the existing entry
and return that.
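
To make that concrete, here is a rough, self-contained sketch of such a dedup-aware save. The types are simplified stand-ins (a HistoryItem reduced to an id and a command line), not reedline's actual definitions:

    // Hypothetical, simplified stand-ins for reedline's types, not the real definitions.
    #[derive(Clone, Debug, PartialEq)]
    struct HistoryItem {
        id: Option<u64>,
        command_line: String,
    }

    struct FileBackedHistory {
        entries: Vec<HistoryItem>,
    }

    impl FileBackedHistory {
        /// Dedup-aware `save`: if the command already exists and no id was given,
        /// return the existing entry instead of appending a duplicate.
        fn save(&mut self, mut h: HistoryItem) -> HistoryItem {
            if h.id.is_none() {
                if let Some(existing) = self
                    .entries
                    .iter()
                    .find(|e| e.command_line == h.command_line)
                {
                    return existing.clone();
                }
                // New command: assign the next id and append it.
                h.id = Some(self.entries.len() as u64);
                self.entries.push(h.clone());
            }
            h
        }
    }

    fn main() {
        let mut history = FileBackedHistory { entries: Vec::new() };
        let first = history.save(HistoryItem { id: None, command_line: "cargo test".into() });
        let second = history.save(HistoryItem { id: None, command_line: "cargo test".into() });
        // Both calls resolve to the same stored entry.
        assert_eq!(first, second);
    }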

count and search

    /// count the results of a query
    fn count(&self, query: SearchQuery) -> Result<i64>;
    /// return the total number of history items
    fn count_all(&self) -> Result<i64> {
        self.count(SearchQuery::everything(SearchDirection::Forward, None))
    }
    /// return the results of a query
    fn search(&self, query: SearchQuery) -> Result<Vec<HistoryItem>>;

Search should probably return an iterator and not a Vec. nushell only shows
10 results when I press <C-r>, but if the query results in thousands of
matches, all the items must be allocated in a vector before they are shown to
me. If that takes some seconds, I have to wait some seconds, even though the
first 10 matches can probably be found in significantly less time. The
implementation may still create a vector and use .into_iter() to convert to the
API, but it would make the API more flexible without really sacrificing anything.
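
For illustration, an iterator-returning variant could look roughly like this (simplified stand-in types, not reedline's real History trait or SearchQuery):

    // Illustrative stand-ins only; not reedline's real `History` trait or `SearchQuery`.
    #[derive(Clone)]
    struct HistoryItem {
        command_line: String,
    }

    struct SearchQuery {
        contains: String,
    }

    trait History {
        /// Iterator-returning variant of `search`: callers that only want the
        /// first N matches can stop early with `.take(N)`.
        fn search(&self, query: SearchQuery) -> Box<dyn Iterator<Item = HistoryItem>>;
    }

    struct InMemoryHistory {
        items: Vec<HistoryItem>,
    }

    impl History for InMemoryHistory {
        fn search(&self, query: SearchQuery) -> Box<dyn Iterator<Item = HistoryItem>> {
            // A backend may still build a Vec internally and hand it out
            // through `.into_iter()`, as described above.
            let matches: Vec<HistoryItem> = self
                .items
                .iter()
                .filter(|item| item.command_line.contains(query.contains.as_str()))
                .cloned()
                .collect();
            Box::new(matches.into_iter())
        }
    }

    fn main() {
        let history = InMemoryHistory {
            items: vec![
                HistoryItem { command_line: "cargo test".into() },
                HistoryItem { command_line: "cargo build".into() },
                HistoryItem { command_line: "ls".into() },
            ],
        };
        // Only the first match is ever pulled out of the iterator.
        for item in history.search(SearchQuery { contains: "cargo".into() }).take(1) {
            println!("{}", item.command_line);
        }
    }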

It might even make sense to combine search and count to return a pair of the
number of results and an iterator.

File-based history

This is an append-only log with just the commands synchronized between
different processes. It's also loaded completely into memory. It has a special case
for not adding the same command if you execute it consecutively, which is a weird
hack to avoid a few duplicates. If you happen to exceed the HISTORY_SIZE, every
sync will cause somewhat expensive write operations.

It probably shouldn't contain duplicates. I would change the format to include a
time stamp and possibly a score number. Similar to zsh, we could reserve a fixed
amount of bytes to store that information. This has the advantage that we can
keep track of the last time the command was used by just changing a few bytes in
the file (a rough sketch of such a record layout is given at the end of this
section). The score can possibly be a configurable algorithm. Spontaneous ideas:

  1. number of times the command has been executed
  2. a score that decays somehow if the command hasn't been used in a while
  3. number of times the command has been executed, but the user can decrement it in
    the menu if they think it's scored too high

Without changing the file format, one could also simply not save entries that
already exist. This has the slight disadvantage that you cannot show more recent
commands earlier in the completion.
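
As promised above, here is a rough sketch of what such a fixed-width metadata prefix could look like. The layout is made up for illustration (16 hex characters of unix timestamp plus 8 hex characters of score), not the format reedline actually uses; the point is only that the prefix can later be rewritten in place without moving the rest of the file.

    // Made-up record layout, only to illustrate the fixed-width idea:
    // <16 hex chars: unix timestamp> ':' <8 hex chars: score> ':' <command> '\n'
    fn encode_entry(timestamp: u64, score: u32, command: &str) -> String {
        format!("{timestamp:016x}:{score:08x}:{command}\n")
    }

    fn decode_entry(line: &str) -> Option<(u64, u32, &str)> {
        let timestamp = u64::from_str_radix(line.get(0..16)?, 16).ok()?;
        let score = u32::from_str_radix(line.get(17..25)?, 16).ok()?;
        let command = line.get(26..)?;
        Some((timestamp, score, command))
    }

    fn main() {
        let line = encode_entry(1_706_572_800, 3, "cargo test");
        print!("{line}");
        // The metadata always occupies the first 26 bytes of an entry, so the
        // timestamp and score can be updated without shifting the command text.
        assert_eq!(
            decode_entry(line.trim_end_matches('\n')),
            Some((1_706_572_800, 3, "cargo test"))
        );
    }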

@ClementNerma
Contributor Author

I don't think that a new history item must be generated if no id is given.
If I type cargo test a couple of times and I have a file-based backend that doesn't contain duplicate commands, then it seems fine to me to just update the existing entry and return that.

This is a very ambiguous scenario that's not obvious at first glance; hence why I'm recommending separate methods for creating and updating items. Making history backends responsible for ID generation also ensures consistency within the code and guarantees that we can always identify a given HistoryItem by its ID.
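
To illustrate the direction I mean (names and signatures here are only a sketch, not the actual API from the PR): the backend generates the ID on creation, and updates go through a separate method that takes that ID.

    // Only a sketch of the "separate create/update, backend-owned ids" idea;
    // not the signatures from the actual PR.
    use std::collections::HashMap;

    #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
    struct HistoryItemId(i64);

    #[derive(Clone, Debug)]
    struct HistoryItem {
        id: HistoryItemId, // always present: the backend assigned it
        command_line: String,
    }

    trait History {
        /// Create a new entry; the backend generates and returns the id.
        fn create(&mut self, command_line: &str) -> HistoryItem;
        /// Update an entry identified by an id the backend handed out earlier.
        fn update(&mut self, id: HistoryItemId, new_command_line: &str) -> Result<(), String>;
    }

    struct InMemoryHistory {
        next_id: i64,
        items: HashMap<HistoryItemId, HistoryItem>,
    }

    impl History for InMemoryHistory {
        fn create(&mut self, command_line: &str) -> HistoryItem {
            let id = HistoryItemId(self.next_id);
            self.next_id += 1;
            let item = HistoryItem { id, command_line: command_line.to_owned() };
            self.items.insert(id, item.clone());
            item
        }

        fn update(&mut self, id: HistoryItemId, new_command_line: &str) -> Result<(), String> {
            match self.items.get_mut(&id) {
                Some(item) => {
                    item.command_line = new_command_line.to_owned();
                    Ok(())
                }
                None => Err(format!("unknown history item id {id:?}")),
            }
        }
    }

    fn main() {
        let mut history = InMemoryHistory { next_id: 0, items: HashMap::new() };
        let item = history.create("cargo tset");
        // The id returned by `create` is the only way to address this entry later.
        history.update(item.id, "cargo test").unwrap();
        println!("{:?}", history.items[&item.id]);
    }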

Search should probably return an iterator and not a Vec. nushell only shows 10 results when I press <C-r>, but if the query results in thousands of matches, all the items must be allocated in a vector before they are shown to me. If that takes some seconds, I have to wait some seconds, even though the first 10 matches can probably be found in significantly less time.

Allocating a 1M-element vector doesn't take more than a few milliseconds, probably even less. Also, the SearchQuery parameter to the search method already includes a limit argument which allows limiting the number of results (even if it's applied before additional filtering). Requiring an into_iter also adds another level of allocation if we need to collect it into a Vec later down the line.

It might even make sense to combine search and count to return a pair of the number of results and an iterator.

These are actually two very different things; for instance, in a database it would require two data fetches. In a file, even more.

It probably shouldn't contain duplicates. I would change the format to include a time stamp and possibly a score number.

Besides what I already said at the beginning of my comment, including a timestamp would break the existing text-based format. One thing I would consider (and it would be a very good addition IMO) would be to deprecate the old file format and create a new one with all the bells and whistles of the HistoryItem values. The major downside is that it would create confusion by having two different text formats.

Similar to zsh, we could reserve a fixed amount of bytes to store that information. This has the advantage, that we can
keep track of the last time the command is used by just changing a few bytes in the file. The score can possibly be a configurable algorithm. Spontaneous ideas: [...]

I also don't think this is a good idea: updating old entries would require additional writes to the file that wouldn't be needed otherwise. It also prevents the file from being append-only.

@saep
Contributor

saep commented Feb 4, 2024

Allocating a 1M-element vector doesn't take more than a few milliseconds, probably even less. Also, the SearchQuery parameter to the search method already includes a limit argument which allows limiting the number of results (even if it's applied before additional filtering). Requiring an into_iter also adds another level of allocation if we need to collect it into a Vec later down the line.

fn main() {
    let mut vec = Vec::new();
    for i in 0..1_000_000 {
        let str = format!("{}{}{}", &i, &i, &i);
        if str.contains("111") {
            vec.push((i, str));
        }
    }
    println!("{}", vec.len());
}

~80ms on my desktop and ~310ms on my laptop. Still, a history of a million commands is unrealistic, so yeah, fast enough.

It might even make sense to combine search and count to return a pair of the number of results and an iterator.

These are actually two very different things ; for instance in a database it would require two data fetching. In a file, even more.

After further thinking, I agree with keeping them separate. You don't always need the count and the additional aggregation it will cost.

I also don't think this is a good idea: updating old entries would require additional writes to the file that wouldn't be needed otherwise.

It would be the same amount of writes. Instead of writing a duplicate at the end, some bytes in the middle are altered. Since a fixed amount of bytes is reserved, the file's content doesn't have to be moved at all.
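
A minimal sketch of that in-place update with std::fs; the 25-byte "timestamp:score" prefix follows the made-up layout from my earlier sketch, not reedline's actual format:

    use std::fs::OpenOptions;
    use std::io::{Seek, SeekFrom, Write};

    /// Overwrite the fixed-width metadata of one entry in place. Because the
    /// prefix always has the same length, nothing after it has to move.
    fn touch_entry(path: &str, entry_offset: u64, timestamp: u64, score: u32) -> std::io::Result<()> {
        let mut file = OpenOptions::new().write(true).open(path)?;
        file.seek(SeekFrom::Start(entry_offset))?;
        write!(file, "{timestamp:016x}:{score:08x}")?;
        Ok(())
    }

    fn main() -> std::io::Result<()> {
        let path = "history_demo.txt";
        std::fs::write(path, "0000000065b83c00:00000003:cargo test\n")?;
        // Bump the first entry's timestamp and score without touching the command text.
        touch_entry(path, 0, 1_706_659_200, 4)?;
        println!("{}", std::fs::read_to_string(path)?);
        Ok(())
    }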

It also prevents the file from being append-only.

I'd love to remove typos from my history when they clutter my searches.

@ClementNerma
Contributor Author

I was talking about a single up-front allocation, not 1 million separate allocations, which is immensely slower :)
