Parser cannot handle enum variants whose name contains spaces, within larger structs #9

coriolinus · 2020-12-08T14:51:03Z

#[derive(parse_display::FromStr)]
enum Instruction {
    #[display("turn off")]
    TurnOff,
    #[display("toggle")]
    Toggle,
}

//succeeds
#[test]
fn parses_toggle() {
    "toggle".parse::<Instruction>().unwrap();
}

//succeeds
#[test]
fn parses_turn_off() {
    "turn off".parse::<Instruction>().unwrap();
}

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    instruction: Instruction,
    value: u64,
}

//succeeds
#[test]
fn parses_contains_toggle() {
    "toggle 123".parse::<ContainsInstruction>().unwrap();
}

//fails
#[test]
fn parses_contains_turn_on() {
    "turn off 123".parse::<ContainsInstruction>().unwrap();
}

This one was a bear. It wasn't bad in itself, but it turns out that `parse-display` has a bug which meant that it just cannot handle this input: <frozenlib/parse-display#9>. Working around that issue meant breaking out the big guns (LALRPOP), which while fun, does take some extra time. That said, I'm pretty happy about `trait ManipulateLight`; it feels like a _much_ more elegant solution to the problem of abstracting over the fundamental type of the map than what I'd previously had. I would not have been likely to come up with it if I hadn't known from the start that I'd need varying map tile types, but since I did, it was pretty nice to be able to just write the right thing. The trait bounds in `Command::apply` are definitely past what I could do in 2015, though! Also, this reflects better on 2019 me than current me, but I'm pretty happy with how well the Map and Point classes applied to this problem. I realized partway into implementation that overflow was a possibility given that there are 300 instructions and only 256 values in a u8, so it was nice to see that everything worked appropriately. This is the first exercise of 2015 that took substantially longer to run in debug than release mode. At a guess, this has to do with the parser, but I'm not willing to invest the effort to profile it right now.

frozenlib · 2020-12-08T16:59:31Z

This is by design.

Since the format is specified as #[display("{instruction} {value}")], ContainsInstruction::from_str() works as follows.

Separate the input string by the first space.
Pass the string before the space to Instruction::from_str().
Pass the string after the space to u64::from_str().

Because of this behavior, it will not work properly if the string of fields contains characters that separate the fields.

If there is only one field that may contain a space, it is possible to separate them by the last space by adding #[from_str(regex = ".*")] as follows.

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    #[from_str(regex = ".*")]
    instruction: Instruction,
    value: u64,
}

For more complex cases, I think it is better to use a function that converts the beginning of the string to a value, such as a funtion that return IResult in nom crate, instead of a function that converts the entire string to a value, such as FromStr::from_str().

frozenlib · 2020-12-08T17:15:21Z

Instead of specifying a field that may contain spaces, you can also specify a field that is guaranteed not to contain spaces.

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    instruction: Instruction,
    #[from_str(regex = r"[^\s]+")]
    value: u64,
}

coriolinus · 2020-12-08T17:48:41Z

Thank you for the quick response. I'm not familiar with the internals of this library: would it be practical to re-implement ContainsInstruction::parse with the following algorithm?

defer to Instruction::parse
- match the literal "turn off" or the literal toggle, returning the rest of the string via an internal API
match the space literal
defer to u64::parse
err if characters remain in the input string

In other words, construct a and use internally a parser which converts the beginning of a string and returns the unused parts, but present the whole-string parsing interface to the end user.

Obviously that's a pretty big feature request, but it would solve this issue. AFAICT it should be possible for regex-based sub-parsers to work properly under this design as well.

In the meantime, would you be willing to add documentation that this is a known limitation of the library?

frozenlib · 2020-12-09T03:50:51Z

Certainly, I think the above algorithm would be able to parse the string more accurately.

If I implement it, I would like to use a standard trait rather than the internal API to be able to use this algorithm even if struct contains fields that do not use parse-display. (e.g. std::u8)

a parser which converts the beginning of a string and returns the unused parts

Do you have any idea of existing standard trait that can be used for such purposes?

frozenlib · 2020-12-09T04:26:37Z

Adding the following features is also an option.

Adding #[from_str(regex = auto)] to instruction field will have the same effect as if you wrote #[from_str(regex = "turn off|toggle")].

Once Rust's specialization feature is stabilized, it may be able to omit #[from_str(regex = auto)].

This method has the following pros and cons.

Pros

Fewer modifications are needed
This feature can decide on char range to consume considering the backward string.

Cons

Use the same regular expression twice in the outer structure and the inner structure, and thus the performance may be worse.
Difficult to work with other parser combinators(e.g. nom), unlike Parser cannot handle enum variants whose name contains spaces, within larger structs #9 (comment)

coriolinus · 2020-12-09T12:47:34Z

Do you have any idea of existing standard trait that can be used for such purposes?

I don't know of any, unfortunately. AFAIK the general form would be something like

// returns `(parse result, unused portion of input)`
fn parse<'a, T: FromStr>(s: &'a str) -> (Result<T, <T as FromStr>::Err>, &'a str);

[edit] Note that in this function, the FromStr implementation has to greedily capture as many characters as it needs from the start of the string. Very likely, a different trait would suit better there.

Adding #[from_str(regex = auto)] to instruction field

This is an interesting idea! Does this feature currently exist? I didn't see it documented.

It seems to me that it would be even more useful if this annotation could be applied to enum Instruction instead of the field instruction, so that we can decorate the declaration, not every usage. This feature, assuming it causes the test cases above to pass, plus appropriate mentions in the documentation, would be sufficient in my mind to close out this issue.

frozenlib · 2020-12-09T14:02:01Z

Adding #[from_str(regex = auto)] to instruction field

This is an interesting idea! Does this feature currently exist? I didn't see it documented.

This feature is not yet implemented.

It seems to me that it would be even more useful if this annotation could be applied to enum Instruction instead of the field instruction, so that we can decorate the declaration, not every usage.

I think so too, but I can't think of a way to allow omit #[from_str(regex = auto)].

This is the same for the algorithm described in #9 (comment).

Since proc-macro cannot determine whether the field type implements the trait or not, you need to annotate whether use FromStr or the different trait described in #9 (comment).

frozenlib mentioned this issue Dec 3, 2023

Derive a parser for delimited data #36

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser cannot handle enum variants whose name contains spaces, within larger structs #9

Parser cannot handle enum variants whose name contains spaces, within larger structs #9

coriolinus commented Dec 8, 2020

frozenlib commented Dec 8, 2020 •

edited

Loading

frozenlib commented Dec 8, 2020

coriolinus commented Dec 8, 2020

frozenlib commented Dec 9, 2020

frozenlib commented Dec 9, 2020

coriolinus commented Dec 9, 2020 •

edited

Loading

frozenlib commented Dec 9, 2020

Parser cannot handle enum variants whose name contains spaces, within larger structs #9

Parser cannot handle enum variants whose name contains spaces, within larger structs #9

Comments

coriolinus commented Dec 8, 2020

frozenlib commented Dec 8, 2020 • edited Loading

frozenlib commented Dec 8, 2020

coriolinus commented Dec 8, 2020

frozenlib commented Dec 9, 2020

frozenlib commented Dec 9, 2020

coriolinus commented Dec 9, 2020 • edited Loading

frozenlib commented Dec 9, 2020

frozenlib commented Dec 8, 2020 •

edited

Loading

coriolinus commented Dec 9, 2020 •

edited

Loading