updated documentation and readme
rrevenantt committed Oct 25, 2020
1 parent be1ccd3 commit 57775f2
Showing 7 changed files with 77 additions and 24 deletions.
18 changes: 5 additions & 13 deletions README.md
@@ -2,7 +2,7 @@
[![docs](https://docs.rs/antlr-rust/badge.svg)](https://docs.rs/antlr-rust)
[![Crate](https://img.shields.io/crates/v/antlr_rust.svg)](https://crates.io/crates/antlr_rust)

ANTLR4 runtime for Rust programming language
[ANTLR4](https://github.com/antlr/antlr4) runtime for the Rust programming language.

Tool(generator) part is currently located in rust-target branch of my antlr4 fork [rrevenantt/antlr4/tree/rust-target](https://github.com/rrevenantt/antlr4/tree/rust-target)
Latest version is automatically built to [releases](https://github.com/rrevenantt/antlr4rust/releases) on this repository.
@@ -13,9 +13,6 @@ and [tests/my_tests.rs](tests/my_test.rs) for actual usage examples

### Implementation status

Everything is implemented, "business" logic is quite stable and well tested, but user facing
API is not very robust yet and very likely will have some changes.

For now development is going on in this repository
but eventually it will be merged to main ANTLR4 repo

@@ -40,7 +37,7 @@ Can be done after merge:
- run rustfmt on generated parser
###### Long term improvements
- generate enum for labeled alternatives without redundant `Error` option
- option to generate fields instead of getters by default
- option to generate fields instead of getters by default and make visiting based on fields
- make tree generic over pointer type and allow allocating tree nodes in an arena.
(requires GAT, otherwise it would be a problem for users that want ownership for parse tree)
- support stable rust
@@ -84,12 +81,6 @@ I.e. for `MultContext` struct will contain `a` and `b` fields containing child s
`op` field with `TerminalNode` type which corresponds to individual `Token`.
It also is possible to disable generic parse tree creation to keep only selected children via
`parser.build_parse_trees = false`.

### Key properties
- Supports full zero-copy parsing including byte parsers
(you should be able to write zero-copy serde deserializers).
- Supports downcasting in places where type is not known statically(trait objects and embedded action)
- Listener and

### Differences with Java
Although Rust runtime API has been made as close as possible to Java,
@@ -106,11 +97,12 @@ there are quite some differences because Rust is not an OOP language and is much
If you need exactly the same behavior, use `[u32]` based `InputStream`, or implement custom `CharStream`.
- In actions you have to escape `'` in rust lifetimes with `\ ` because ANTLR considers them as strings, e.g. `Struct<\'lifetime>`
- To make custom tokens you should use `@tokenfactory` custom action, instead of usual `TokenLabelType` parser option.
In Rust target TokenFactory is main customisation interface that allows to specify input type of token type.
ANTLR parser options can accept only single identifiers, while the Rust target also needs to know about the lifetime.
Also, in the Rust target `TokenFactory` is the way to specify the token type. As an example, see the [CSV](grammars/CSV.g4) test grammar.
- All rule context variables (rule argument or rule return) should implement `Default + Clone`.

### Unsafe
Currently, unsafe is used only to cast from trait object back to original type
Currently, unsafe is used only for downcasting (through another crate, `better_any`)
and to update data inside `Rc` via `get_mut_unchecked` (the returned mutable reference is used immediately and not stored anywhere)
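
To make the downcasting point concrete, here is a minimal sketch that uses the standard library's `std::any::Any` rather than the crate the runtime actually relies on. `std::any::Any` only works for `'static` types, which is presumably why a separate crate is needed for contexts that borrow from `'input`. The `MultContext` and `AddContext` structs and the `as_mult` helper below are hypothetical stand-ins, not the generated types.

```rust
use std::any::Any;

// Hypothetical stand-ins for generated rule contexts (not the real generated types).
#[derive(Debug, PartialEq)]
struct MultContext {
    op: char,
}

#[derive(Debug, PartialEq)]
struct AddContext {
    op: char,
}

// Hypothetical helper: recover the concrete type from a trait object.
// Returns `None` when the node is some other type.
fn as_mult(node: &dyn Any) -> Option<&MultContext> {
    node.downcast_ref::<MultContext>()
}

fn main() {
    let nodes: Vec<Box<dyn Any>> = vec![
        Box::new(MultContext { op: '*' }),
        Box::new(AddContext { op: '+' }),
    ];

    // `&*nodes[i]` yields `&dyn Any` for the boxed value itself;
    // `&nodes[i]` would instead treat the `Box` as the `Any` value.
    assert_eq!(as_mult(&*nodes[0]), Some(&MultContext { op: '*' }));
    assert!(as_mult(&*nodes[1]).is_none());
}
```

This safe version cannot express a context type with a non-`'static` lifetime parameter, which is the case the zero-copy runtime has to handle.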

### Versioning
6 changes: 4 additions & 2 deletions src/input_stream.rs
@@ -51,11 +51,13 @@ pub type CodePoint8BitCharStream<'a> = InputStream<&'a [u8]>;
pub type CodePoint16BitCharStream<'a> = InputStream<&'a [u16]>;
pub type CodePoint32BitCharStream<'a> = InputStream<&'a [u32]>;

impl<'a, T> CharStream<&'a [T]> for InputStream<&'a [T]>
impl<'a, T> CharStream<Cow<'a, [T]>> for InputStream<&'a [T]>
where
[T]: InputData,
{
fn get_text(&self, a: isize, b: isize) -> &'a [T] { self.get_text_inner(a, b).into() }
fn get_text(&self, a: isize, b: isize) -> Cow<'a, [T]> {
Cow::Borrowed(self.get_text_inner(a, b))
}
}

impl<'a, T> CharStream<String> for InputStream<&'a [T]>
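
The switch to `Cow<'a, [T]>` in this hunk can be illustrated with a small standalone sketch. `SliceStream` is a hypothetical stand-in for `InputStream<&'a [T]>`, and the inclusive, clamped bounds are an assumption of this sketch rather than the behavior of the real `get_text_inner`.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for an input stream over a borrowed slice.
struct SliceStream<'a, T> {
    data: &'a [T],
}

impl<'a, T: Clone> SliceStream<'a, T> {
    // Returns the elements in `start..=stop` without copying them.
    // Bounds are clamped to the input length (an assumption of this sketch).
    fn get_text(&self, start: usize, stop: usize) -> Cow<'a, [T]> {
        let data: &'a [T] = self.data;
        let end = (stop + 1).min(data.len());
        let start = start.min(end);
        Cow::Borrowed(&data[start..end])
    }
}

fn main() {
    let bytes = b"a,b,c".to_vec();
    let stream = SliceStream { data: &bytes[..] };

    let text = stream.get_text(2, 2);
    // The result borrows from `bytes`; nothing was copied.
    assert!(matches!(text, Cow::Borrowed(_)));
    assert_eq!(&*text, &b"b"[..]);

    // A caller that needs ownership can still get it explicitly.
    let owned: Vec<u8> = text.into_owned();
    assert_eq!(owned, vec![b'b']);
}
```

Returning `Cow` lets the same trait signature serve both borrowing and owning implementations, at the cost of one enum tag per returned slice.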
60 changes: 53 additions & 7 deletions src/lib.rs
@@ -16,26 +16,73 @@
#![warn(trivial_numeric_casts)]
//! # Antlr4 runtime
//!
//! **This is pre-release version.**
//! **Some small breaking changes are still possible, although none is currently planned**
//!
//! This is a Rust runtime for [ANTLR4] parser generator.
//! It is required to use parsers and lexers generated by [ANTLR4] parser generator
//!
//! This documentation refers to particular api used by generated parsers,lexers and syntax trees.
//!
//! For info on how to generate parser please refer to:
//! For info on what [ANTLR4] is and how to generate a parser, please refer to:
//! - [ANTLR4] main repository
//! - [README](https://github.com/rrevenantt/antlr4rust/blob/master/README.md) for Rust target
//! - [README] for Rust target
//!
//! [ANTLR4]: https://github.com/antlr/antlr4
//! [README]: https://github.com/rrevenantt/antlr4rust/blob/master/README.md
//!
//! ### Customization
//!
//! All input and output can be customized and optimized for a particular use case by implementing
//! the related trait. Each trait already has several implementations that should be enough for most cases.
//! For more details see the docs for the corresponding trait and its containing module.
//!
//! Currently available are:
//! - [`CharStream`] - Lexer input, stream of char values with slicing support
//! - [`TokenFactory`] - How lexer creates tokens.
//! - [`Token`] - Element of [`TokenStream`]
//! - [`TokenStream`] - Parser input, created from lexer or other token source.
//! - [`ParserRuleContext`] - Node of created syntax tree.
//!
//! ### Zero-copy and lifetimes
//!
//! This library supports full zero-copy parsing. To allow this
//! `'input` lifetime is used everywhere inside.
//! `'input` lifetime is used everywhere inside to refer to data borrowed by parser.
//! Besides a reference to the input, this can also be a [`TokenFactory`] if it returns references to tokens.
//! See [`ArenaFactory`] as an example of such behavior: it allocates tokens in an [`Arena`](typed_arena::Arena) and returns references.
//!
//! When using the generated parse tree, be careful not to require a longer lifetime after parsing.
//! If you do, you will likely get a "does not live long enough" error on the input string,
//! even though the actual lifetime conflict happens much later.
//!
//! Rust infers lifetimes from the end. This means that if something requires a longer lifetime
//! while you are using the generated tree, the error is reported on the input instead.
//!
//! If you need to generate owned versions of the parse tree, or you want simpler usage,
//! you can opt out of zero-copy by requiring `'input` to be `'static`. In this case it is easier to also use
//! types that contain "owned" in their name or constructor function, like `OwningTokenFactory`
//! or `InputStream::new_owned()`.
//!
//! ### Visitors and Listeners
//!
//! Currently visitors and listeners must outlive `'input`.
//! In general this means that visitor has either `'static` or `'input` lifetime.
//! Thus you can pass references to parsed data from the syntax tree to a listener/visitor (see the visitor test for an example).
//!
//! You can try to give the visitor references to outside data, but
//! if those references do not outlive `'input` you will get very confusing error messages,
//! so this is not recommended.
//!
//! ### Downcasting
//!
//! Rule context trait objects support downcasting, even in the zero-copy case.
//! Generic types that you can access in the generated parser from embedded actions
//! (currently these are `H:ErrorStrategy` and `I:`[`TokenStream`]) can also be downcast to concrete types.
//! To do so, use the `TidExt::downcast_*` extension methods.
//!
//! [`CharStream`]: crate::char_stream::CharStream
//! [`TokenFactory`]: crate::token_factory::TokenFactory
//! [`ArenaFactory`]: crate::token_factory::ArenaFactory
//! [`Token`]: crate::token::Token
//! [`TokenStream`]: crate::token_stream::TokenStream
//! [`ParserRuleContext`]: crate::parser_rule_context::ParserRuleContext
#[macro_use]
extern crate lazy_static;
@@ -112,8 +159,7 @@ pub mod token;
pub mod trees;
mod utils;
//pub mod tokenstream_rewriter_test;
#[doc(hidden)]
pub mod atn_type;
mod atn_type;
pub mod rule_context;
pub mod vocabulary;
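
The "Zero-copy and lifetimes" section added above can be sketched outside the runtime. The types and functions below (`BorrowedToken`, `OwnedToken`, `tokenize`, `tokenize_owned`) are hypothetical and only mimic the shape of the problem: a result that borrows from the input cannot outlive it, while an owned result can.

```rust
// Hypothetical zero-copy token: borrows its text from the input.
#[derive(Debug, PartialEq)]
struct BorrowedToken<'input> {
    text: &'input str,
}

// Hypothetical owned token: copies its text out of the input.
#[derive(Debug, PartialEq)]
struct OwnedToken {
    text: String,
}

// Zero-copy: every token is a slice of `input`.
fn tokenize<'input>(input: &'input str) -> Vec<BorrowedToken<'input>> {
    input
        .split_whitespace()
        .map(|text| BorrowedToken { text })
        .collect()
}

// Owned: the result no longer depends on the lifetime of `input`.
fn tokenize_owned(input: &str) -> Vec<OwnedToken> {
    input
        .split_whitespace()
        .map(|text| OwnedToken { text: text.to_owned() })
        .collect()
}

fn main() {
    let owned = {
        let input = String::from("a + b");

        let borrowed = tokenize(&input);
        assert_eq!(borrowed[1].text, "+");

        // Returning `borrowed` from this block instead would be rejected with
        // "`input` does not live long enough", because the tokens would be
        // used after `input` is dropped.
        tokenize_owned(&input)
    };

    assert_eq!(owned.len(), 3);
    assert_eq!(owned[2].text, "b");
}
```

The owned variant pays for one allocation per token, which is the trade-off the docs describe when opting out of zero-copy.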

6 changes: 4 additions & 2 deletions src/parser_rule_context.rs
@@ -1,3 +1,7 @@
//!
//!
//!
//!
use std::any::{type_name, Any};
use std::borrow::{Borrow, BorrowMut};
use std::cell::{Ref, RefCell, RefMut};
@@ -20,8 +24,6 @@ use crate::tree::{
};
use better_any::{Tid, TidAble, TidExt};

// use crate::utils::IndexIter;

pub trait ParserRuleContext<'input>:
ParseTree<'input> + RuleContext<'input> + Debug + Tid<'input>
{
9 changes: 9 additions & 0 deletions src/token_factory.rs
@@ -192,6 +192,15 @@ pub type ArenaCommonFactory<'a> = ArenaFactory<'a, CommonTokenFactory, CommonTok

/// This is a wrapper for a token factory that allocates tokens in a separate arena.
/// It can significantly improve performance by passing tokens by reference everywhere.
///
/// Requires a `&'a Tok: Default` bound to produce invalid tokens, which can easily be implemented
/// like this:
/// ```text
/// lazy_static!{ static ref INVALID_TOKEN:CustomToken = ... }
/// impl Default for &'_ CustomToken {
/// fn default() -> Self { &**INVALID_TOKEN }
/// }
/// ```
// Box is used here because it should almost always be used for a token factory
#[derive(Tid)]
pub struct ArenaFactory<'input, TF, T>
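
The `&'a Tok: Default` pattern from the doc comment above can be written as a self-contained sketch. This version swaps `lazy_static!` for the standard library's `std::sync::OnceLock` (Rust 1.70 or newer). `CustomToken`, its fields, and the `-1` / `"<invalid>"` sentinel values are assumptions made for illustration.

```rust
use std::sync::OnceLock;

// Hypothetical custom token type; the fields are assumptions of this sketch.
#[derive(Debug, PartialEq)]
struct CustomToken {
    token_type: isize,
    text: String,
}

// Lazily initialised shared "invalid" token with a `'static` lifetime.
fn invalid_token() -> &'static CustomToken {
    static INVALID_TOKEN: OnceLock<CustomToken> = OnceLock::new();
    INVALID_TOKEN.get_or_init(|| CustomToken {
        token_type: -1,
        text: String::from("<invalid>"),
    })
}

// `Default` for a reference to the token: every call hands out the same
// shared instance, so no allocation happens after the first call.
impl Default for &'_ CustomToken {
    fn default() -> Self {
        invalid_token()
    }
}

fn main() {
    let a: &CustomToken = Default::default();
    let b = <&CustomToken>::default();

    assert_eq!(a.token_type, -1);
    assert_eq!(a.text, "<invalid>");
    // Both references point at the same static instance.
    assert!(std::ptr::eq(a, b));
}
```

The impl is accepted by the orphan rules because `CustomToken` is a local type and references are treated specially for coherence purposes.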
1 change: 1 addition & 0 deletions tests/gen/labelsparser.rs
@@ -1031,6 +1031,7 @@ where
&mut recog.base,
)))?,
}

let tmp = recog.input.lt(-1).cloned();
recog.ctx.as_ref().unwrap().set_stop(tmp);
recog.base.set_state(36);
1 change: 1 addition & 0 deletions tests/gen/simplelrparser.rs
@@ -404,6 +404,7 @@ where
recog.base.set_state(7);
recog.base.match_token(ID, &mut recog.err_handler)?;
}

let tmp = recog.input.lt(-1).cloned();
recog.ctx.as_ref().unwrap().set_stop(tmp);
recog.base.set_state(13);