-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve history API #680
Improve history API #680
Conversation
Hmmm seems like some tests don't pass because the IDs are expected to be contiguous. This wasn't mentioned in the docs, I'll see what can I do about this. |
Thanks for the PR. We're always interested in making reedline better. However, someone is going to have to convince me that |
The goal of this is still a bit unclear to me, you describe changes but not what is problematic with the current system of ids. Having the IDs private before #677 allowed us to guarantee that those possibly implicit guarantees are upheld. Maybe worth writing out what the current IDs all mean, what requirements we impose to implement session isolation/cwd aware hints etc. The constraint of the existing text file format only storing the command needs to be upheld for Nushell backwards compatibility unless we have a clear story, |
Before I go further and rework the histories' internals, @fdncred may I have your opinion on this? |
@fdncred and @sholderbach thanks for your answers!
Having the IDs private prevented anyone to make a custom For me, using random IDs is the best choice: we don't make any assumption on what an ID represents - it is only that, an ID. Not an index, or anything else. It can be generated in a variety of manners depending on the needs. For an in-memory history for instance, it could be a simple counter. In most cases, a random generator (like I used) would be the best choice to avoid ID re-use. With this PR we have a system where IDs are black boxes, and are only used to refer to a specific history item - nothing more. Navigation in the history, ordering etc. is the responsibility of the This will allow anyone to build a custom
|
By the pattern of PRs, I'm beginning to wonder if me merging those 3 other ones was the right thing to do. I'm wondering if you separated them to help us or if you're just sending PRs as you discover the need instead of a holistic approach. This PR in particular seems like quite a big departure. I'm wondering there are specific things that could be improved with our history system without breaking it this way. I hope that doesn't sound mean or rude, but we have to consider what's best for reedline but also for nushell first because reedline was created for nushell. As far as the opinion you're wanting from me, I'm not sure what you're asking about. If it's about sholderbach's comments, then, when it comes to reedline, I'm always happy to agree with him due to his vast knowledge of reedline and how reedline is used in nushell along with his general expertise with Rust. Again, we're excited to have people excited about reedline and trying to make it better. So, thank you for that. |
Mhh to me what the capabilities and requirements for your Reedline needs to improve in this area for Nushell as well, but designing against a vague outside target requires us to dedicate time to reason about much more as to not box in our Nushell-internal design space. From our perspective having a locked trait is perfectly fine. |
This was indeed the case in the beginning, up until the moment the API was entirely usable from the outside (my last PR). Now I have a pretty solid idea of what the history's API should look like (in my opinion). This PR isn't the first of many others like the previous ones, it is the final changes (more important this time) to re-design part of the history API as well as its internals.
I don't think so - I personally think using
Absolutely, and I think it would benefit nushell as well by making the history API easier to use, which would result in nushell's code related to this part easier as well.
Oh ok, I thought you were the main maintainer as you merged my last PRs.
You're welcome! I really want to make reedline the best possible both for nushell and for external projects ; I think it's an amazing tool but with great limitations (most related to terminal emulators, like the famous resizing problems). Making it as usable and complete as possible would allow it to become a standard as a line editor crate, and would greatly simplify many projects that require this kind of tools. I think there are still areas where reedline can improve: auto-inserting closing symbols like quotes and parenthesis could be interesting for instance. But overall I feel like it's mostly a finished "product" which only requires some polishing and one breaking change: the one suggested in this PR. |
Because why keep anything private when we can make it customizable from the outside? The PR I'm proposing doesn't make the history system anymore complex than it already is ; in fact I'd argue it is simpler this way. IDs are black boxes and we don't make any assumption about them - that's simpler than making assumptions everywhere about what they mean and taking the risk of introducing bugs if we handle them the wrong way somewhere.
This PR will contain the updated code for It's very hard to fix it without changing the underlying file format as well as the way HistoryItemId are handled. |
Adding to this:
You will be able to see it in ReShell (which I briefly talked about in another PR). This one will be based on I'd argue it would even be a better idea to replace both |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #680 +/- ##
==========================================
+ Coverage 49.17% 49.90% +0.72%
==========================================
Files 46 46
Lines 7924 8086 +162
==========================================
+ Hits 3897 4035 +138
- Misses 4027 4051 +24
|
CI now passes 🥳 So, there is one big problem with this PR: the underlying data (text file for This means there needs to be a migration for older files - this can be done easily with a simple function call though. But this needs a migration is required. I'm currently searching for a solution to avoid this. |
Ok so I managed to resolve the problem for the SQLite database: by default the So I perform a simple check:
This way, existing databases will be migrated seamlessly and there won't be the slightest performance hit once the migration has been done. |
The PR is now complete. Some tests still need to be added to check edge cases and how the migration behaves in different scenarios, but all the changes and features are fully implemented. To sum up a little bit: the main change is the Concretely, the new system should be invisible to users, apart from the API change (which is trivial to adapt if they want to keep the same behaviour as they currently have). Both history structs now use an internal A slight initialization cost will be induced by the entropy gathering, but it should be quick enough to be unnoticeable (less than 10 µs (microseconds) average on two of my machines in release mode). Performance will be slightly lower given the additional need to handle IDs, but it should be low enough to be entirely unnoticeable. Some benchmarks could be conducted on these changes but I'm confident it won't be perceptible to end users. Safety is increased ; some long-standing bugs have been resolved in this PR, including history truncating creating IDs collision. String escaping has also been added to the SQLite backend when searching with a pattern (using a History backends still mostly behave as they used to:
Tests are mostly unchanged, they have been slightly updated to reflect the new API but are otherwise identical - so the code coverage doesn't changes and the tests are still as reliable as previously. An interesting add in the future could be to add, either as a builtin or as an example, an @fdncred and @sholderbach may I have your opinion on this? If you agree with it, I'll write the new tests to ensure everything is working perfectly (notably the migration process), and then we'll be able to merge this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to try and improve reedline and for the summary at the bottom, it helps.
I see some good work in here that on some level makes dealing with history more intuitive. I also see some things that I don't agree with and/or have questions about.
I'm really concerned about landing this on many levels. Here's some of what I'd need to feel better about in order for me to vote yes to land this PR. I'm not quite sure how to get to a good level of comfort. Suggestions welcome.
- I need to feel good about how it's NOT going to affect nushell users.
- I need to feel certain that the changes made here are going in the right direction of where we want to take reedline in the future.
- I need to feel like this is better than what it was before.
- I need to feel like our tests cover as much as possible in order to allay my concerns.
- I think we need to have some consensus of a few nushell core-team members (up/down vote) since this is such a large PR and really changes everything related to history.
// History is "full", so we delete the oldest entry first, | ||
// before adding a new one. | ||
self.entries.pop_front(); | ||
self.len_on_disk = self.len_on_disk.saturating_sub(1); | ||
let first_id = *(self.entries.first().unwrap().0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of unwrap. I'd like to see them all removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have an idea of how to get rid of it - a previous instruction checks if we have a last element. Unfortunately, IndexMap
doesn't directly implement .pop_front()
(which is a decision on their own) so we need to get the first ID somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not objecting to getting the first ID somewhere. I'm objecting to using unwrap. I'd guess that you can match on it and handle errors and use map_err if the error is a different flavor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be misleading. Using a .map_err()
in a situation we've proven to meet the condition required for unwrapping may lead other developers to think there is a case where this could fail - while it can't, that's why I used .unwrap() (which I always use very sparingly).
Writing .first().unwrap()
is more readable, less confusing, and more correct as if we don't meet this condition something's very wrong and we cannot continue processing anything because some of the code is completely invalid.
} | ||
|
||
/// this history doesn't replace entries | ||
fn replace(&mut self, h: &HistoryItem) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of replace if neither FileBacked or SqlBacked are using it? Also, seems like the name should be update vs replace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should use it, but this hasn't been implemented yet. For now I've left these parts untouched.
I feel like changing their behaviour would be a bad idea right now as it would change how Nushell behaves.
The method separation is useful for clarifying behaviour, and here we can clearly see a "feature" is missing, while it wasn't obvious at all previously.
For histories that support replacing, this will be useful. I added it because reedline requires histories to replace history items. It could have been a call to update
, but it's how reedline is designed, so I preferred to not change that.
" | ||
create table history ( | ||
idx integer primary key autoincrement, | ||
id integer unique not null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the point of having an idx
primary key with autoincrement which is guaranteed to be unique and also having an id
.
Also, if you're going to add new columns, this is why more_info
/ExtraInfo
exists. It was put there to future-proof the db schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The starting point of this PR is that IDs should be random. So we have a query that tells "get the X next entries in the history starting from item Y" and we don't have a timestamp, we need an index for that.
Random IDs don't enable new features, they simplify how we think about IDs, and also solve some longstanding issues like ID clashes in filled FileBackedHistory
.
I thought about more_info
but 1) I feel like every structured information should be strictly explicit and typed and 2) we can't create an index on this if we put an index and other things together in it. Also, I wasn't sure weither Nushell was using it or not.
Also, this makes it more composable for future updates: if Nushell wants to use this field for other things, it won't have to deal with an index already being contained inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, one nice "side-effect" of this PR is that the migration process is re-usage ; meaning that if Nushell decides to store new informations in the future, it will be trivial to add new data by simply re-using the same migration codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The starting point of this PR is that IDs should be random. So we have a query that tells "get the X next entries in the history starting from item Y" and we don't have a timestamp, we need an index for that.
Still don't get it. It seems odd to have a primary key that we essentially don't use because we have a different "primary key" not called a primary key that we're using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We absolutely use the id
column: that's the one used for inserting, updating, removing, and fetching indexes. Everything is based on that column, idx
is only used for a specific situation: when we want to query >1 items from an existing one in either direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, good. maybe it's the name that i'm hung up on then. id and idx are very similar. maybe query_idx
or something similar. it's just not intuitive having such similar names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah renaming the column makes sense, I'll do that.
This PR is designed to not affect everyone in a perceptible manner. All the changes are hunder the hood. The only difference is the API revamp which needs a little change (concretely, we're talking propably less than a minute of code update, that's really a detail).
That entirely depends on your vision of the project, but I feel like reedline would be the best at being both a reference implementation and solid ground for Nushell AND being a nice line editing library to use in custom shells.
I feel it is way better, but that's a personal take. We now have a clearer ID system, less error-prone, a cleaner API.
Absolutely, which is why I said I would write the new tests if this PR is approved (what I mean by that is if you're okay with merging the PR at the condition of it being correctly tested, I'll write the tests).
Well, yes and no actually. It mostly works like previously, it's mostly under-the-hood changes. Especially the SQLite backend, which Nushell uses, uses an almost identical system: you just have one more column for indexes, and that's all. But I understand your concern and given reedline's under the "reponsability" of Nushell it would be a good thing to get an approval from Nushell's team. |
Clearly, this is a breaking change and changes a ton of guts. The history is all dealt with differently now that you've added a new index column. You try to maintain backwards compatibility as much as possible, which is appreciated, but any reedline user will have to deal with your changes if we accept this PR. There's no getting around it. I understand your point though. |
This PR is designed to make changes seamless. Everything that worked previously still works now. If we list all the things that change for users, it is:
No one has implemented a custom But I think it is such a small number of users with such a minimal impact that this isn't a major problem ; it would be a bigger problem if Nushell used the text file format as some tools may have relied on it ; but fortunately it uses the SQLite database instead. |
@ClementNerma we are happy you are taking the time to review and understand our code better... It is going to take our team some time to digest and understand the ramifications of your changes as well as have some internal discussions about the work you have done so that we can move forward and give you some feedback. I will review your code as well and get a better understanding of it... My first question for you is this ? In our current history implementation by default we save the same entry numerous times. For example if I type version
ls
version
ls
version
ls
history
ls
history
ls We will save all of those ls entries as well as all of the history entries and version entries. I would like to have one ls and one history and one version across my entire history file... How easy would it be to do this with your current solution just out of curiosity ? Thanks for your work --- we need folks like yourself who are willing to take on the work and stay with it over time... Our concern is that you come in and make a bunch of changes which we are not totally able to understand the motivation for them and then we are responsible over the longer term maintaining that code... Hope you understand and are willing to work with us on the reedline communities behalf. |
Of course! It's not a trivial change after all.
The SQLite backend can just perform a full-string search - check if the The file backend would require to perform a full search on the history. A way to avoid that could be to construct an in-memory Side note: I looked at a good chunk of reedline's code and saw it was very intelligently optimized in many places. So it seems like performance is really an important part of its design. In which case, adding a third (optional) backend for the history could be greatly beneficial: basing a new one on If we go this way, it will be the subject of another PR, but I think it's good to keep that in mind.
It's a pleasure to work on this project! Rest assured, if this PR is merged, I will still be here to support it if new features need to be added for instance. I have three main motivations with this PR:
Also, |
@ClementNerma I reviewed your PR and now have a better understanding of the work you have done... First of all let me say that @sholderbach is the lead person on reedline and @fdncred also knows a lot about reedline but will defer to @sholderbach for final approval on anything major like this PR. I am just another member of the nushell core team but with no expertise on reedline... As I stated in my previous comments we are happy to have folks like yourself who are willing to contribute to the reedline community but you should understand that we are incredibly stretched in our ability to manage all of the work that we have to deal with both in the reedline community and nushell community as a whole. With that said the chance of this PR landing is low. I am not going to close it because that would be up to @sholderbach or @fdncred to make the final call but in order for you to be able to be a valuable member of the community we are going to have to take a different approach. We want you to help us maintain and make reedline better over the long term so don't take this the wrong way... Our team is much happier with smaller more focused PRs with an end design goal that has been well thought out and discussed prior... So I would suggest entering into more discussion in the reedline discord channel with your ideas and possibly maintaining reedline by assisting with and commenting on PRs as they come in... Overall I like your idea of having swappable History Implementations on top of a well defined History API but in order to get there its going to take a crawl, walk, run approach which is how we like to phrase it... Also I personally do not like having the HistoryItem IDs locked down and fixed --- The History Items IDs should be created dynamically each time Reedline fires up... And storing the ItemID's in the history.txt file is just not good IMHO. It bloats the size of the history.txt file dramatically and makes it almost impossible to post process the history.txt file if someone wants to use it for other purposes. Having the very clean, nice, simple history.txt file with just the command is the way to go... |
I understand that, after all its just a FOSS project, not a paid product.
Seems like a good idea!
Allow to me disagree on that. IDs are used to differentiate items, how can you re-generate them on-the-fly? Don't forget entries can be inserted by another concurrent process interacting with the same history file / database.
I totally agree with that, but it's hard to keep it simple. How do you identify uniquely a set of entries created during runtime, knowing that you may later have to synchronize the file to disk, discover there are new entries, and insert them at a previous position in the list? We need to have something that is entirely unique to each entry, which is not possible with a plain file. If for instance you have one |
I've thought about this PR a bit lately, and thought that we could do this another way: What if For instance, let's say we keep the same file format as before. We store entries one by one, and at runtime we associate each entry with an ID (could be an incremented counter). This could also be done for the SQL database (even though I'd like to keep things the way I did them in this PR). This way we wouldn't have to change the format of the text-based format: simple lines with command text, nothing more. What do you think about that? Would it make a merge considerable? EDIT: I've updated my branch accordingly, tests all pass. This PR now uses the exact same file format than before. |
@ClementNerma I am going to give you some preliminary ideas here --- this is just a starting point so please don't start coding yet --- lets just talk about a design going forward... You might want to open up an issue and we can talk more about this there... Here are my preliminary thoughts on this... History - Menus - Completer can be referred to as HMC components The Big Idea concept is a Complete Refactor / Rewrite of Reedline without having History / Menus / Completer (HMC)
Yes --- I like this idea / most importantly greatly simplify the storage... I want the whole history API to be a drop in replacement to Reedline in the future --- that is the end futuristic goal... So imagine developers coming along and having the ability to write their own History implementation and it would Now moving on my "grand vision" for Reedline is to have the ability to have a simple Reedline with
Based on an API system that allowed you to Shut OFF History / Menus / Completers This is for example purposes only --- most people would not want to shut it off but it shows And or drop in your own History / Menus / Completers based on a well thought out API that enabled gluing on to a core Engine that accepted these components... That is where I would like to end up... The History API could be step one --- but it would entail lots of design to be able to pull it out of the Engine You could create another Reedline-History crate in your own repo to do the experimentation or something to that effect... Please start an issue so we can continue the discussion there.... Not sure if you are up for this project --- but I think it could be a cool longer term effort that would take a while to implement... Open to all sorts of ideas / strategies / criticisms of this concept --- I am just throwing ideas out there to see what people think... Nu-cli should eventually be able to accept the alternative Reedline as it is being developed and then eventually Just so you can see my ideas.... |
Just writing stuff down here as an example... If you look at the basic example in Reedline... let sig = line_editor.read_line(&prompt)?; Why does the read_line method have to take a prompt ---- I never understood that design choice ? It should just work like this... let sig = line_editor.read_line()?; |
@stormasm That would require the prompt to be stored inside the Reedline object, so mutating it from outside would become a bit tougher. It would definitely be convenient if you don't want to mutate the prompt, though. Edit: Actually, never mind, you could just do |
@ysthakur thank you ! for thinking about the prompt topic --- and updating the idea 😄 |
@ysthakur GIven that Yes, I think the current prompt API is not working to make that easy (you need to use interior mutability, unfortunately), but it probably is "the right thing to do". |
This PR reworks the history API, and brings several improvements:
History
trait, adding and replacing entries are now two distinct methods, reducing confusionHistoryItem
values always have an ID, reducing confusionHistory
values are responsible for generating IDs for new entriesFileBackedHistory
now longer has duplicate IDs in the historyFileBackedHistory
as well asSqliteBackedHistory
now generate IDs randomly, usingrand
'sSmallRng
. This RNG is created from a system-provided source of entropy during the history creation, and is designed to generate numbers quickly.This PR induces two main downsides:
First, creating a
FileBackedHistory
orSqliteBackedHistory
may now cause a slight delay for gathering entropy. I tested on two machines and got less than 1s for 1 million RNG creation, but there may be edge cases where it takes longer).Also, the
History
API contains breaking changes. In the current context, I don't consider this to be a large problem as no one seems to use this API anyway - up until two days ago, it was impossible to implement your ownHistory
due toHistoryItemId
not being constructable from the outside.Finally, a new dependency is pulled:
rand
. It only uses thesmall_rng
andgetrandom
features and so it's pretty minimal, but it's still another dependency. It is used for non-clashing history item IDs generation.Overall, this PR aims to make the API easier to write, read and use, while also reducing confusion in how the history works. It also removes what could be a bug in the future in
FileBackedHistory
which previously could have duplicate IDs for entries.