Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser cannot handle enum variants whose name contains spaces, within larger structs #9

Open
coriolinus opened this issue Dec 8, 2020 · 7 comments

Comments

@coriolinus
Copy link

#[derive(parse_display::FromStr)]
enum Instruction {
    #[display("turn off")]
    TurnOff,
    #[display("toggle")]
    Toggle,
}

//succeeds
#[test]
fn parses_toggle() {
    "toggle".parse::<Instruction>().unwrap();
}

//succeeds
#[test]
fn parses_turn_off() {
    "turn off".parse::<Instruction>().unwrap();
}

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    instruction: Instruction,
    value: u64,
}

//succeeds
#[test]
fn parses_contains_toggle() {
    "toggle 123".parse::<ContainsInstruction>().unwrap();
}

//fails
#[test]
fn parses_contains_turn_on() {
    "turn off 123".parse::<ContainsInstruction>().unwrap();
}
coriolinus added a commit to coriolinus/adventofcode-2015 that referenced this issue Dec 8, 2020
This one was a bear. It wasn't bad in itself, but it turns out that
`parse-display` has a bug which meant that it just cannot handle this input:
<frozenlib/parse-display#9>. Working around
that issue meant breaking out the big guns (LALRPOP), which while fun,
does take some extra time.

That said, I'm pretty happy about `trait ManipulateLight`; it feels
like a _much_ more elegant solution to the problem of abstracting
over the fundamental type of the map than what I'd previously had.
I would not have been likely to come up with it if I hadn't known
from the start that I'd need varying map tile types, but since I did,
it was pretty nice to be able to just write the right thing.

The trait bounds in `Command::apply` are definitely past what I could
do in 2015, though!

Also, this reflects better on 2019 me than current me, but I'm pretty
happy with how well the Map and Point classes applied to this problem.

I realized partway into implementation that overflow was a possibility
given that there are 300 instructions and only 256 values in a u8,
so it was nice to see that everything worked appropriately.

This is the first exercise of 2015 that took substantially longer to
run in debug than release mode. At a guess, this has to do with the
parser, but I'm not willing to invest the effort to profile it right now.
@frozenlib
Copy link
Owner

frozenlib commented Dec 8, 2020

This is by design.

Since the format is specified as #[display("{instruction} {value}")], ContainsInstruction::from_str() works as follows.

  • Separate the input string by the first space.
  • Pass the string before the space to Instruction::from_str().
  • Pass the string after the space to u64::from_str().

Because of this behavior, it will not work properly if the string of fields contains characters that separate the fields.

If there is only one field that may contain a space, it is possible to separate them by the last space by adding #[from_str(regex = ".*")] as follows.

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    #[from_str(regex = ".*")]
    instruction: Instruction,
    value: u64,
}

For more complex cases, I think it is better to use a function that converts the beginning of the string to a value, such as a funtion that return IResult in nom crate, instead of a function that converts the entire string to a value, such as FromStr::from_str().

@frozenlib
Copy link
Owner

Instead of specifying a field that may contain spaces, you can also specify a field that is guaranteed not to contain spaces.

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    instruction: Instruction,
    #[from_str(regex = r"[^\s]+")]
    value: u64,
}

@coriolinus
Copy link
Author

Thank you for the quick response. I'm not familiar with the internals of this library: would it be practical to re-implement ContainsInstruction::parse with the following algorithm?

  • defer to Instruction::parse
    • match the literal "turn off" or the literal toggle, returning the rest of the string via an internal API
  • match the space literal
  • defer to u64::parse
  • err if characters remain in the input string

In other words, construct a and use internally a parser which converts the beginning of a string and returns the unused parts, but present the whole-string parsing interface to the end user.

Obviously that's a pretty big feature request, but it would solve this issue. AFAICT it should be possible for regex-based sub-parsers to work properly under this design as well.

In the meantime, would you be willing to add documentation that this is a known limitation of the library?

@frozenlib
Copy link
Owner

Certainly, I think the above algorithm would be able to parse the string more accurately.

If I implement it, I would like to use a standard trait rather than the internal API to be able to use this algorithm even if struct contains fields that do not use parse-display. (e.g. std::u8)

a parser which converts the beginning of a string and returns the unused parts

Do you have any idea of existing standard trait that can be used for such purposes?

@frozenlib
Copy link
Owner

Adding the following features is also an option.

  • Adding #[from_str(regex = auto)] to instruction field will have the same effect as if you wrote #[from_str(regex = "turn off|toggle")].

Once Rust's specialization feature is stabilized, it may be able to omit #[from_str(regex = auto)].

This method has the following pros and cons.

Pros

  • Fewer modifications are needed
  • This feature can decide on char range to consume considering the backward string.

Cons

@coriolinus
Copy link
Author

coriolinus commented Dec 9, 2020

Do you have any idea of existing standard trait that can be used for such purposes?

I don't know of any, unfortunately. AFAIK the general form would be something like

// returns `(parse result, unused portion of input)`
fn parse<'a, T: FromStr>(s: &'a str) -> (Result<T, <T as FromStr>::Err>, &'a str);

[edit] Note that in this function, the FromStr implementation has to greedily capture as many characters as it needs from the start of the string. Very likely, a different trait would suit better there.

Adding #[from_str(regex = auto)] to instruction field

This is an interesting idea! Does this feature currently exist? I didn't see it documented.

It seems to me that it would be even more useful if this annotation could be applied to enum Instruction instead of the field instruction, so that we can decorate the declaration, not every usage. This feature, assuming it causes the test cases above to pass, plus appropriate mentions in the documentation, would be sufficient in my mind to close out this issue.

@frozenlib
Copy link
Owner

Adding #[from_str(regex = auto)] to instruction field

This is an interesting idea! Does this feature currently exist? I didn't see it documented.

This feature is not yet implemented.

It seems to me that it would be even more useful if this annotation could be applied to enum Instruction instead of the field instruction, so that we can decorate the declaration, not every usage.

I think so too, but I can't think of a way to allow omit #[from_str(regex = auto)].

This is the same for the algorithm described in #9 (comment).

Since proc-macro cannot determine whether the field type implements the trait or not, you need to annotate whether use FromStr or the different trait described in #9 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants