Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit imports: import 'Raw "sample.html" #2036

Merged
merged 9 commits into from
Sep 18, 2024
Merged

Conversation

vi
Copy link
Contributor

@vi vi commented Sep 10, 2024

Resolves #2033.
Resolves #1000.

Shall I continue this (i.e. try to make tests, error handling, docs, etc.) or it should be done some other way?

Shall it be based on master or e.g. on #2022?

Copy link
Member

@jneem jneem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks reasonable to me. I guess we might want to bikeshed the syntax a little...

@@ -207,7 +207,7 @@ pub enum Term {

/// An unresolved import.
#[serde(skip)]
Import(OsString),
Import{path: OsString, typ: Option<LocIdent>},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest having typ: InputFormat here, and handling the supported types (including the auto-detection from the filename) while constructing the Term.

Copy link
Contributor Author

@vi vi Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How would it interact with optionally present format like Nix? Ideally it should recognize it and preserve during round-tripping and only fail when actually trying to import the file. Maybe #[cfg] from InputFormat::Nix should be removed?
  2. Does it move complexity to the grammar.lalrpop? (It's the first time I edit such a file)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Good question. I think since InputFormat::Nix is experimental, it's ok to error if it isn't supported.
  2. I'd suggest adding the actual logic in core/src/parser/utils.rs, so that the changes to grammar.lalrpop will be very simple.

@vi
Copy link
Contributor Author

vi commented Sep 11, 2024

What do you think about fallback to InputFormat::Nickel when filename extension cannot be recognized? Maybe that #[default] is not a good idea?

Falling back from passive format to code format may affect security if there is untrusted data near filename, if .txt appender slips somehow, Nickel would suddenly start executing included (using plain syntax) file instead of just reading a config or text snippet.

Copy link
Contributor

github-actions bot commented Sep 11, 2024

🐰 Bencher Report

Branch2036/merge
Testbedubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results
BenchmarkLatencynanoseconds (ns)
fibonacci 10📈 view plot
⚠️ NO THRESHOLD
507,460.00
pidigits 100📈 view plot
⚠️ NO THRESHOLD
3,294,600.00
product 30📈 view plot
⚠️ NO THRESHOLD
820,460.00
scalar 10📈 view plot
⚠️ NO THRESHOLD
1,495,700.00
sum 30📈 view plot
⚠️ NO THRESHOLD
820,550.00
🐰 View full continuous benchmarking report in Bencher

Copy link
Contributor

🐰Bencher

ReportWed, September 11, 2024 at 13:10:13 UTC
Projectnickel
Branch2036/merge
Testbedubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

  • Latency (latency)

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results
BenchmarkLatencyLatency Results
nanoseconds (ns)
fibonacci 10➖ (view plot)496,000.00
pidigits 100➖ (view plot)3,177,500.00
product 30➖ (view plot)802,100.00
scalar 10➖ (view plot)1,482,000.00
sum 30➖ (view plot)796,270.00

Bencher - Continuous Benchmarking
View Public Perf Page
Docs | Repo | Chat | Help

@vi
Copy link
Contributor Author

vi commented Sep 12, 2024

I'd suggest having typ: InputFormat here, and handling the supported types (including the auto-detection from the filename) while constructing the Term.

I'd suggest adding the actual logic in core/src/parser/utils.rs, so that the changes to grammar.lalrpop will be very simple.

Done.

Some questions:

  • Is it OK to require the tag when filename extension is unknown?
  • Shall it preserve whether the import was explicit or implicit in Term, so that it can pretty-print it back? Or just always pretty-print with a tag? Or auto-detect when filename extension is good and omit the tag?
  • What is new_from_inputs? Is it OK to hard code it to Nickel inputs or they should be auto-detected like other files?
  • Shall there be aliases, like 'Txt or 'Text as alternative for 'Raw, or 'Ncl as alternative for 'Nickel?

@jneem
Copy link
Member

jneem commented Sep 12, 2024

Is it OK to require the tag when filename extension is unknown?

I think for backwards-compatibility we should fall back to the default when the extension is unknown. (But because the tag is a new feature, it's fine to error for an unknown tag.)

Shall it preserve whether the import was explicit or implicit in Term, so that it can pretty-print it back? Or just always pretty-print with a tag? Or auto-detect when filename extension is good and omit the tag?

I think any of those options is ok; maybe @yannham has some input. The important thing for the pretty-printer is that it round-trips when doing (pretty-print -> parse). It doesn't need to round-trip when doing (parse -> pretty-print).

What is new_from_inputs? Is it OK to hard code it to Nickel inputs or they should be auto-detected like other files?

You mean Program::new_from_inputs? I think all the Program constructors can assume nickel format.

Shall there be aliases, like 'Txt or 'Text as alternative for 'Raw, or 'Ncl as alternative for 'Nickel?

My preference is to not have unnecessary aliases, but maybe other people feel differently...

@yannham
Copy link
Member

yannham commented Sep 12, 2024

I think for backwards-compatibility we should fall back to the default when the extension is unknown. (But because the tag is a new feature, it's fine to error for an unknown tag.)

I agree.

I think any of those options is ok; maybe @yannham has some input. The important thing for the pretty-printer is that it round-trips when doing (pretty-print -> parse). It doesn't need to round-trip when doing (parse -> pretty-print).

Same. We can always special-case later (like never showing when the format is Nickel, or/and when the extension agrees), but in general pretty-printing is used for error and result reporting, and isn't guaranteed to be stable (formatting doesn't use the pretty-printer).

My preference is to not have unnecessary aliases, but maybe other people feel differently...

I agree as well. We have a raw format for exporting stuff as a pure string, so we probably want a raw format as well for importing pure text without parsing it.

We discussed the syntax in today's weekly meeting. I'll make a separate issue soon, but this doesn't need to block this PR - we can change the syntax in a second PR.

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this! Beside the syntax bikeshedding (which can be done later), and the type -> format question, this PR looks good.

Comment on lines 586 to 587
/// Attempt to specify an import, type of which is not known at the moment of compilation.
/// `explicit` determines whether explicit import type annotation was used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, I fear type is a bit ambiguous here (because imports can have a type as well). I suggest we use format consistently in the documentation and error reporting instead.

@yannham
Copy link
Member

yannham commented Sep 12, 2024

By the way, the CI is failing because some code in the lsp/nls crate needs to be updated now that Term::Import has changed.

@vi
Copy link
Contributor Author

vi commented Sep 12, 2024

By the way, the CI is failing because some code in the lsp/nls crate needs to be updated now that Term::Import has changed.

The pull request is currently marked as draft because of not everything is implemented yet and it should not be merged yet.

Other missing things (besides format field name, fallback and nls):

  • Pretty printing
  • Adding to the book (if it shares the source code with the main code)
  • More unit tests for imports

@vi
Copy link
Contributor Author

vi commented Sep 12, 2024

I think for backwards-compatibility we should fall back to the default when the extension is unknown.

Shall a warning be issued instead of a parse error in that case?

(Does/should Nickel in general have warnings?)

@yannham
Copy link
Member

yannham commented Sep 12, 2024

(Does/should Nickel in general have warnings?)

No, it doesn't. Something we've wanted to add for some time - this could be a good excuse, but it's a separate work. For now I would say let's stick to the previous behaviour (import anything unknown as Nickel, and it fails to parse, we show the parse error).

vi added 2 commits September 12, 2024 21:34
* Rename typ to format
* Bring back the fallback
* Implement pretty-printer part
* Fix compilation of NLP
@vi
Copy link
Contributor Author

vi commented Sep 12, 2024

Implemented additional changes:

  • Renamed typ to format. Moved code that deals with the format tags closer to the code that deals with filename extensions.
  • Brought back the fallback to Nickel files (the function is deliberately kept one step away from implementing the error again though).
  • Implemented the pretty-printer part (it omits the tag when needed and adds 'Nickel if previous code relied on the fallback).
  • Fixed compilation of the NLP
  • Added imports section to the book.

Not implemented:

  • Specific unit tests
  • Format tag completions for NLP.

@vi vi marked this pull request as ready for review September 12, 2024 19:59
Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! It's just missing some tests. You can take inspiration from nickel/core/tests/integration/inputs/imports, where the test files for imports are located, and nickel/core/tests/integration/inputs/imports/imported, which hosts the files imported by the former tests. If you put a .ncl file in imported which isn't a stand-alone test but just designed to be imported by another a test, as most of them are in imported, you'll need to add a heading annotation # test.type = 'skip'.

If you're not confident or can't devote more time to this, feel free to let us know and we can also take it from here.

core/src/cache.rs Show resolved Hide resolved
core/src/pretty.rs Outdated Show resolved Hide resolved
doc/manual/syntax.md Outdated Show resolved Hide resolved
doc/manual/syntax.md Outdated Show resolved Hide resolved
doc/manual/syntax.md Outdated Show resolved Hide resolved
@vi
Copy link
Contributor Author

vi commented Sep 13, 2024

Added some tests (without exhaustive tests for each possible input format).

Note that when the same file is imported twice with different formats, the cache remembers the first import and ignores the tag on further ones. Shall we address this potential confusion source? I see two main ways:

  • Enforce the consistency. Remember the format in cache and verify that the format is correct when attempting to retrieve it from cache. Prototype: vi@3ae48a9
  • Use input format as a part of the key for cache, allowing caching the same file multiple times (for distinct InputFormats). Prototype: vi@cfc46aa

@jneem
Copy link
Member

jneem commented Sep 16, 2024

I prefer the second option. I can imagine some use-case where you want to import something as a structured format and as raw text.

@vi
Copy link
Contributor Author

vi commented Sep 16, 2024

Shall I include the implementation (vi@cfc46aa) into this branch?

This is a more invasive change compared to just some explicit tag support.

@yannham
Copy link
Member

yannham commented Sep 16, 2024

@vi Yes, I think it's fine to join both here. Splitting is good usually but the second change is really motivated by the current PR and has a ~ +30/-30 diff, which IMHO is entirely reasonable (and the current PR isn't too big either).

@vi
Copy link
Contributor Author

vi commented Sep 17, 2024

Something's failed on Windows, it is not very clear from CI logs.

Does Windows mangle \n to \r\n during checkout or file read?

@yannham
Copy link
Member

yannham commented Sep 17, 2024

Mh, that's very likely, yeah. One possibility is to remove the newlines at the end of the files and call it a day. Another possibility is to do a regexp match, for example to just check the suffix prefix, or by matching the end of line with a pattern that will fit both Unix and Windows \\r?\\n or something like that.

We should probably have something like an OS-dependent std.string.new_line in the future, but for this PR we'll have to do without.

@vi
Copy link
Contributor Author

vi commented Sep 18, 2024

Is there anything more to be done here (e.g. a syntax change for #2039) ?

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, we can handle this in a second step. Thanks a lot for carrying this through @vi !

@yannham yannham added this pull request to the merge queue Sep 18, 2024
Merged via the queue into tweag:master with commit c50647d Sep 18, 2024
6 checks passed
@vi
Copy link
Contributor Author

vi commented Oct 12, 2024

Noticed possible regression:

$ cargo run -p  nickel-lang-cli eval q.q q.q
...
error: unbound identifier `import'Nickel`
  ┌─ <generated main>:1:2
  │
1 │ (import'Nickel "q.q") & (import'Nickel "q.q")
  │  ^^^^^^^^^^^^^ this identifier is unbound

@yannham
Copy link
Member

yannham commented Oct 14, 2024

I guess the pretty printer is at fault, for not inserting a space between import and 'Nickel. However we will end up with a different syntax, so I think we shouldn't bother fixing this issue separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Where is "import" formally documented? Add a string import builtin
3 participants