Serializable VMs #3

Open
7 of 10 tasks
jamespfennell opened this issue Jun 10, 2023 · 2 comments

jamespfennell commented Jun 10, 2023

Task list for making the serializable VMs feature complete:

  • Re-implement dynamic memory allocation so that it doesn't require the VM to have custom functionality (specifically dynamic virtual index resolvers). This is more consistent with Texcraft's emerging philosophy that the complexity of individual features should be scoped to that feature and not the whole crate.
  • Support all types of commands (macros, aliases, variables defined using \countdef)
  • Make the standard library state serializable
  • Add the \dump primitive
  • Add the ability to load a "format file" in the texcraft binary
  • Support serializing when a group is active
  • Benchmarking and performance improvements
  • Error handling in serding the commands map (and maybe VM too if there are error cases)
  • Add a trait that associates a commands map with a state. VMs for this state can then implement serde::Deserialize
  • Documentation
jamespfennell added a commit that referenced this issue Jun 10, 2023
This is the first commit in the serializable VMs project, which is a biggish
project to make the Texcraft VM serializable. The main reason this isn't
trivial in the usual Rust way is that the VM contains function pointers
(of primitives), so you can't simply serde the VM.

This commit adds (or fixes) some of the initial infrastructure (e.g., the
command key type) and adds support for serializing primitives only. Support
for other commands will come later. One of the main additions here is a
utility that makes it easy to unit test serde code.
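
To illustrate the function pointer problem, here is a minimal sketch of one way to get around it: give every primitive a stable key, write the key out instead of the pointer, and look the pointer up again when loading. Every name in it (`PrimitiveFn`, `Command`, `built_ins`, the `double` and `negate` primitives) is hypothetical and is not Texlang's actual API; the real command key type may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical signature for a primitive; the real one is more involved.
type PrimitiveFn = fn(i32) -> i32;

fn double(x: i32) -> i32 { x * 2 }
fn negate(x: i32) -> i32 { -x }

/// A command carries a stable key next to its function pointer.
/// Only the key is ever written out.
#[derive(Clone, Copy)]
struct Command {
    key: &'static str,
    func: PrimitiveFn,
}

/// The primitives that the binary knows about (illustrative only).
fn built_ins() -> Vec<Command> {
    vec![
        Command { key: "double", func: double },
        Command { key: "negate", func: negate },
    ]
}

/// Serializable form of the commands map: (control sequence name, key) pairs,
/// sorted so that the output is deterministic.
fn serialize(commands: &HashMap<String, Command>) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = commands
        .iter()
        .map(|(name, cmd)| (name.clone(), cmd.key.to_string()))
        .collect();
    out.sort();
    out
}

/// Rebuilds the commands map by resolving each key against the built-ins.
/// Fails if a key does not match any known primitive.
fn deserialize(
    entries: &[(String, String)],
    built_ins: &[Command],
) -> Result<HashMap<String, Command>, String> {
    let by_key: HashMap<&str, Command> =
        built_ins.iter().map(|c| (c.key, *c)).collect();
    let mut commands = HashMap::new();
    for (name, key) in entries {
        let cmd = by_key
            .get(key.as_str())
            .ok_or_else(|| format!("unknown primitive key: {}", key))?;
        commands.insert(name.clone(), *cmd);
    }
    Ok(commands)
}
```

With this shape, the serialized form contains only strings, so any serde format could carry it; the cost is that the deserializing binary must provide the same set of primitives.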
jamespfennell added a commit that referenced this issue Jun 12, 2023
This is to make the dynamic memory allocation system work with serializable
VMs.
jamespfennell added a commit that referenced this issue Jun 15, 2023
One bug with this (de)serializer is that the registers array overflows the
stack, I'm guessing because the call stack gets bigger than with serde_json.
I put the array behind a Box, which is unfortunate because there's a small
runtime cost to that. Maybe in the future there will be a better solution.

I also put all VMs behind boxes. I thought this might solve the stack
overflow but it didn't. Nonetheless it's worth doing because the VM can be
big.

After this commit all serde unit tests run with both JSON and MessagePack.
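
As a rough sketch of the "array behind a Box" idea, the snippet below builds a large fixed-size array on the heap without first creating it on the stack. The constant `NUM_REGISTERS` and the function `new_registers` are assumptions made for illustration; the actual register count and types in Texlang may differ.

```rust
use std::convert::TryInto;

/// Hypothetical register count, chosen only for illustration.
const NUM_REGISTERS: usize = 32_768;

/// Builds a zeroed register array directly on the heap.
///
/// `vec![0; N]` allocates on the heap, and converting the boxed slice into a
/// boxed array reuses that allocation, so the full array never has to live
/// in a stack frame.
fn new_registers() -> Box<[i32; NUM_REGISTERS]> {
    vec![0i32; NUM_REGISTERS]
        .into_boxed_slice()
        .try_into()
        .expect("the vector was created with exactly NUM_REGISTERS elements")
}
```

Accessing `registers[i]` then goes through one pointer indirection, which is the small runtime cost mentioned above.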
jamespfennell added a commit that referenced this issue Jun 21, 2023
This sneaked in in the previous commit.
jamespfennell commented

For benchmarking, it would be awesome to time (de)serializing a VM that has loaded the Plain TeX format. It may take a lot of work though before Texlang can read the Plain TeX format.

jamespfennell added a commit that referenced this issue Jun 21, 2023
To do this we need to support (de)serializing the save stack. This is a bit
of a pain as the save stack contains function pointers that need to be
cross-referenced with what's in the command map. In the end the code touches
the commands map, the variables API, and the serde module, so there are
more pub(crate)s than I would like. I tried to refactor things to minimize
cross-module deps.

But it works!
jamespfennell commented

Some perf improvement ideas I had:

  • In general, support serializing and deserializing iterators. In a bunch of places I create a new data structure (like a new instance of a map) and then serialize that. We could probably skip this intermediate phase, making serding faster and less memory intensive. Serialization should be trivial; figuring out deserialization may be tricky. Given serde's API it will be impossible to actually deserialize to an iterator.

  • When serializing the cat code map, don't serialize values that are the same as e.g. INITEX. When deserializing, initialize to the INITEX defaults and then apply the differences on top.

  • Serialize cat codes as integers, irrespective of the format.

  • For registers, serialize contiguous runs of 0s as something like 0<number of zeros>. In many serde contexts it is expected that registers mostly have their default values, so this will be much faster and more space efficient. I had a fancier idea of dividing a vector into blocks of the form <number of non-zero values><number of zeros><non-zero values> which I think is provably more space efficient in all cases. If a block starts with 0, it means the vector is over.

  • In the CS name interner, the ends vector is increasing. We should serialize the diffs between adjacent elements instead of the elements themselves. For formats with varint encoding, this will be more space efficient.
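
The block encoding for registers described above can be sketched as follows. This is one possible reading of the idea, not an agreed design: it assumes each block means "z zeros followed by n non-zero values", that the decoder already knows the total length (so trailing zeros are implied by the terminator), and that run lengths fit in an `i32`. The function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Encodes `values` as blocks `[n, z, v_1, ..., v_n]`, each meaning
/// "z zeros followed by the n non-zero values v_1..v_n".
/// A single 0 ends the stream; trailing zeros are not written at all.
/// Assumption: run lengths fit in an i32.
fn encode_blocks(values: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < values.len() {
        // Measure the run of zeros that starts this block.
        let zeros_start = i;
        while i < values.len() && values[i] == 0 {
            i += 1;
        }
        let zeros = i - zeros_start;
        // Measure the run of non-zero values that follows it.
        let nonzero_start = i;
        while i < values.len() && values[i] != 0 {
            i += 1;
        }
        let nonzero = &values[nonzero_start..i];
        if nonzero.is_empty() {
            // Only trailing zeros remain; the terminator covers them.
            break;
        }
        out.push(nonzero.len() as i32);
        out.push(zeros as i32);
        out.extend_from_slice(nonzero);
    }
    out.push(0); // terminator: a block that starts with 0
    out
}

/// Decodes a stream produced by `encode_blocks` into a vector of length `len`.
/// Returns `None` if the stream is truncated, malformed, or too long.
fn decode_blocks(encoded: &[i32], len: usize) -> Option<Vec<i32>> {
    let mut out = Vec::with_capacity(len);
    let mut pos = 0;
    loop {
        let nonzero = usize::try_from(*encoded.get(pos)?).ok()?;
        if nonzero == 0 {
            break;
        }
        let zeros = usize::try_from(*encoded.get(pos + 1)?).ok()?;
        let values = encoded.get(pos + 2..)?.get(..nonzero)?;
        let remaining = len - out.len();
        if zeros > remaining || nonzero > remaining - zeros {
            return None;
        }
        out.resize(out.len() + zeros, 0);
        out.extend_from_slice(values);
        pos += 2 + nonzero;
    }
    // Everything after the last block is zero.
    out.resize(len, 0);
    Some(out)
}
```

For a register array that is entirely at its default value, the encoded form under these assumptions is just the one-element terminator.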

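The diff idea for the interner's ends vector could look roughly like the sketch below. It assumes the ends are stored as non-decreasing `u64` offsets and uses a LEB128-style length function only to estimate what a varint format would write; `to_deltas`, `from_deltas`, and `varint_len` are hypothetical helpers, not part of Texlang or serde.

```rust
/// Replaces each element with its difference from the previous one
/// (the first element is kept as-is). Returns `None` if the input is
/// not non-decreasing.
fn to_deltas(ends: &[u64]) -> Option<Vec<u64>> {
    let mut prev = 0u64;
    let mut deltas = Vec::with_capacity(ends.len());
    for &end in ends {
        deltas.push(end.checked_sub(prev)?);
        prev = end;
    }
    Some(deltas)
}

/// Inverse of `to_deltas`: a running sum restores the original offsets.
fn from_deltas(deltas: &[u64]) -> Vec<u64> {
    let mut total = 0u64;
    deltas
        .iter()
        .map(|&d| {
            total += d;
            total
        })
        .collect()
}

/// Number of bytes a LEB128-style varint needs for `n` (7 payload bits per byte).
fn varint_len(mut n: u64) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}
```

Because control sequence names are short, the diffs stay small even when the absolute offsets grow, which is where a varint format would save space.
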