Add an AST #17

Open
hellux opened this issue Feb 22, 2023 · 10 comments

Comments

@hellux
Owner

hellux commented Feb 22, 2023

It is often useful to work with an AST rather than a sequence of events. We could implement an optional module that provides AST objects that correspond to the AST defined by the djot spec (https://github.com/jgm/djot.js/blob/main/src/ast.ts).

It would be useful to be able to create it from events, and create events from the AST so you can e.g. parse events -> create ast -> modify ast -> create events -> render events.

It could also be useful to read/write the AST from/to e.g. JSON. We may then be able to read/write ASTs identically to the reference implementation. It might also be useful in tests to match against JSON produced by the reference implementation. We should be able to automatically implement the serialization/deserialization using serde, and then the downstream client can use any serde-compatible format.

A quick sketch of what it could look like:

#[cfg(feature = "ast")]
pub mod ast {
    use super::Event;

    use std::collections::HashMap as Map;

    #[cfg(feature = "serde")]
    use serde::{Deserialize, Serialize};

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Doc {
        children: Vec<Block>,
        references: Map<String, Reference>,
        footnotes: Map<String, Footnote>,
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Reference {
        // todo
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Footnote {
        // todo
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub struct Block {
        kind: BlockKind,
        children: Vec<Block>,
    }

    #[cfg_attr(feature = "serde", derive(Deserialize, Serialize))]
    pub enum BlockKind {
        Para,
        Heading { level: usize },
        // todo
    }

    pub struct Iter<'a> {
        // todo
        _s: std::marker::PhantomData<&'a ()>,
    }

    impl<'a> Iterator for Iter<'a> {
        type Item = Event<'a>;

        fn next(&mut self) -> Option<Self::Item> {
            todo!()
        }
    }

    #[derive(Debug)]
    pub enum Error {
        EventNotEnded,
        UnexpectedStart,
        BlockInsideLeaf,
    }

    impl<'s> FromIterator<Event<'s>> for Result<Doc, Error> {
        fn from_iter<I: IntoIterator<Item = Event<'s>>>(events: I) -> Self {
            todo!()
        }
    }

    impl<'a> IntoIterator for &'a Doc {
        type Item = Event<'a>;
        type IntoIter = Iter<'a>;

        fn into_iter(self) -> Self::IntoIter {
            todo!()
        }
    }
}

clientside:

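let src = "# heading

para";

let events = jotdown::Parser::new(src);
let ast = events.collect::<Result<jotdown::ast::Doc, _>>().unwrap();
// `to_string` returns a `Result`, so unwrap it before comparing.
let json = serde_json::to_string(&ast).unwrap();

// Compare parsed values so that whitespace differences do not matter.
// The expected JSON is abbreviated: the heading node is omitted here.
let actual: serde_json::Value = serde_json::from_str(&json).unwrap();
let expected: serde_json::Value = serde_json::from_str(
    r##"
    {
      "tag": "doc",
      "references": {},
      "footnotes": {},
      "children": [
        {
          "tag": "para",
          "children": [
            {
              "tag": "str",
              "text": "para"
            }
          ]
        }
      ]
    }
    "##,
)
.unwrap();

assert_eq!(actual, expected);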
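
The `todo!()` bodies in the sketch above could be filled in along the following lines. This is only a minimal illustration of the events -> AST -> events round trip, assuming a stack of open containers. `Event`, `Kind`, `Node`, `Error`, `build` and `flatten` are simplified, hypothetical stand-ins written for this example, not jotdown's actual types or API.

```rust
// Hypothetical, reduced stand-ins for jotdown's events and the proposed AST.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(Kind),
    End(Kind),
    Str(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Para,
    Heading { level: usize },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Container { kind: Kind, children: Vec<Node> },
    Str(String),
}

#[derive(Debug, PartialEq)]
pub enum Error {
    EventNotEnded,
    UnexpectedEnd,
}

/// Build a tree from a flat event stream using a stack of open containers.
pub fn build(events: impl IntoIterator<Item = Event>) -> Result<Vec<Node>, Error> {
    // The bottom entry has no kind and collects the top-level nodes.
    let mut stack: Vec<(Option<Kind>, Vec<Node>)> = vec![(None, Vec::new())];
    for ev in events {
        match ev {
            Event::Start(kind) => stack.push((Some(kind), Vec::new())),
            Event::Str(s) => stack.last_mut().unwrap().1.push(Node::Str(s)),
            Event::End(kind) => match stack.pop() {
                // The closing event must match the innermost open container.
                Some((Some(open), children)) if open == kind => {
                    let parent = stack.last_mut().unwrap();
                    parent.1.push(Node::Container { kind, children });
                }
                _ => return Err(Error::UnexpectedEnd),
            },
        }
    }
    match stack.pop() {
        Some((None, children)) => Ok(children),
        // A container was still open when the stream ended.
        _ => Err(Error::EventNotEnded),
    }
}

/// Turn a tree back into events by a depth-first walk.
pub fn flatten(nodes: &[Node], out: &mut Vec<Event>) {
    for node in nodes {
        match node {
            Node::Str(s) => out.push(Event::Str(s.clone())),
            Node::Container { kind, children } => {
                out.push(Event::Start(*kind));
                flatten(children, out);
                out.push(Event::End(*kind));
            }
        }
    }
}
```

Calling `build` and then `flatten` on a well-formed stream should give back the original events, so any AST modification can slot in between the two steps.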
@clbarnes
Contributor

clbarnes commented Apr 26, 2023

I was going to suggest basing such an AST on the output of typify for the json-schema generated from typescript definitions in djot.js, but typify doesn't parse it.

Having an internal AST like this, as well as being able to consume and produce it in JSON form, would allow the use of jotdown as a library to write filters as standalone binaries:

djot -t json mydoc.dj | myrustbinary | djot -f json > index.html
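
The core of such a filter binary would be a function from AST to AST. Below is a rough sketch of that step only, assuming a heavily reduced, hypothetical `Node` type and an example filter `uppercase_text`; reading the JSON from stdin and writing it back to stdout (for instance via serde) is left out.

```rust
// Hypothetical, heavily reduced AST; the real djot AST has many more node kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Para(Vec<Node>),
    Heading { level: usize, children: Vec<Node> },
    Str(String),
}

/// Example filter: upper-case the text of every `Str` node, recursively.
pub fn uppercase_text(nodes: Vec<Node>) -> Vec<Node> {
    nodes
        .into_iter()
        .map(|node| match node {
            Node::Str(text) => Node::Str(text.to_uppercase()),
            Node::Para(children) => Node::Para(uppercase_text(children)),
            Node::Heading { level, children } => Node::Heading {
                level,
                children: uppercase_text(children),
            },
        })
        .collect()
}
```

In a real filter, the input to `uppercase_text` would come from deserializing the JSON produced by `djot -t json`, and the result would be serialized back for `djot -f json`.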

@hellux
Owner Author

hellux commented Apr 28, 2023

> I was going to suggest basing such an AST on the output of typify for the json-schema generated from typescript definitions in djot.js, but typify doesn't parse it.

It would be nice if the AST types could be generated automatically. The only work needed would be to convert between AST and events.

> Having an internal AST like this, as well as being able to consume and produce it in JSON form, would allow the use of jotdown as a library to write filters as standalone binaries:
>
> djot -t json mydoc.dj | myrustbinary | djot -f json > index.html

If one wants to manipulate an AST, I guess jotdown (which is mainly a parser) is not really needed here. Just need some AST types that can be serialized and deserialized.

@bdarcus

bdarcus commented Jul 20, 2023

I tried two additional conversion tools:

  1. quicktype, both with typescript and json schema input
  2. typester, which isn't intended to be used in production

Neither of them parsed it (or at least neither completed), so I'm wondering if there's something funky about that AST definition?

> If one wants to manipulate an AST, I guess jotdown (which is mainly a parser) is not really needed here. Just need some AST types that can be serialized and deserialized.

So in a scenario like this, jotdown would just be able to output the same AST as djot.js, and any filtering would be done independently?

I want to implement citation and bibliography processing using djot for this project I'm working on, once John adds support for citations, so I'm just wondering how that might work.

@hellux
Owner Author

hellux commented Jul 20, 2023 via email

@bdarcus

bdarcus commented Jul 20, 2023

I just posted a linked issue over there.

@adaszko

adaszko commented Jul 30, 2024

Hi, I'm curious what the current status of this issue is. Is the way forward still to recover the AST from a stream of events, or to generate it automatically from a schema definition?

@hellux
Owner Author

hellux commented Jul 31, 2024 via email

@bdarcus

bdarcus commented Jul 31, 2024

I concluded that the schema automatically generated from the typescript code is less than ideal. Pretty sure that's why the conversion tools don't work correctly.

FWIW, I've used https://docs.rs/schemars/latest/schemars/ in a project of mine, and the schemas it produces seem much better.

@hellux
Owner Author

hellux commented Aug 1, 2024 via email

@clbarnes
Contributor

I took a stab at implementing the AST: https://github.com/clbarnes/djot_ast

The code is a bit gross in order to maximise compatibility with the typescript impl. Most of the grossness is, at least, confined to serde stuff so shouldn't impact actual use of the AST. Integrating it with jotdown events is a task I haven't started yet.


4 participants