forked from erlang/otp
feature: String interpolation #5
Open: TD5 wants to merge 1 commit into WhatsApp:master from TD5:string-interpolation
I am also thinking of adding corresponding constant UTF-8 binary literals and patterns, e.g. |

Adds four kinds of string interpolation, split over two axes: the output type (UTF-8 binary or Unicode codepoint list) and the audience (user-facing or developer-facing formatting). The result is four general classes of syntax with interpolated values:

```
% binary format
<<"A utf-8 binary string: 4"/utf8>> = bf"A utf-8 binary string: ~2 + 2~"
```

```
% list format
"A unicode codepoint list string: 4" = lf"A unicode codepoint list string: ~2 + 2~"
```

```
% binary debug
<<"A utf-8 binary string: {4, foo, [x, y, z]}"/utf8>> = bd"A utf-8 binary string: ~{2 + 2, foo, [x, y, z]}~"
```

```
% list debug
"A unicode codepoint list string: {4, foo, [x, y, z]}" = ld"A unicode codepoint list string: ~{2 + 2, foo, [x, y, z]}~"
```

Arbitrary expressions can be nested inside string interpolation substitutions, including variables, function calls, macros and even further string interpolation expressions.

Design
======

Why list- and binary-strings?
-----------------------------

In the `string` module from the stdlib, a string is represented by `unicode:chardata()`, that is, a list of codepoints, binaries with UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.

With this in mind, the list- and binary-oriented string interpolation syntaxes accept either type of interpolated value, but the user of the interpolation determines whether they want to generate a `unicode:char_list()` or `unicode:unicode_binary()` based on which kind of interpolation they use (`bf"..."` and `bd"..."` to create binaries, or `lf"..."` and `ld"..."` to create lists).

List-strings are most useful for backwards compatibility and convenience. Binary-strings are most useful for memory-compactness and IO.

Why user- and developer-oriented strings?
-----------------------------------------

There are two similar but distinct cases where developers typically want to format strings: when logging/debugging, and when displaying data to users.

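The two cases can already be told apart with existing `io_lib:format/2` control sequences, which gives a rough feel for the split. The sketch below is only an illustration using today's stdlib: the module name `format_cases` and its two functions are hypothetical, and whether the proposed `bd"..."` and `bf"..."` forms produce exactly these outputs is an assumption, not something the description states.

```erlang
-module(format_cases).
-export([debug/1, display/1]).

%% Developer-facing stand-in: ~tp prints any term in Erlang syntax,
%% so a string keeps its quotation marks.
debug(Term) ->
    unicode:characters_to_binary(io_lib:format("~tp", [Term])).

%% User-facing stand-in: ~ts prints chardata verbatim. Passing a tuple
%% here makes io_lib:format/2 raise badarg instead of printing it.
display(Text) ->
    unicode:characters_to_binary(io_lib:format("~ts", [Text])).
```

With this sketch, `format_cases:debug("hello")` keeps the quotes and returns `<<"\"hello\"">>`, while `format_cases:display("hello")` returns `<<"hello">>`.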

When logging or debugging, the most important features are typically that any kind of term can be printed, and that the output round-trips losslessly and can be read unambiguously by developers. This means retaining runtime type information, e.g. keeping strings quoted when formatting them and printing floats with full range and resolution.

When displaying to users, the most important features are typically that the output is always human-readable and cleanly formatted. This means formatting strings verbatim, without quotation marks, and not retaining any Erlang-isms (e.g. we don't want to be printing Erlang tuples, because they won't make much sense to the average application consumer), so we'd rather get a `badarg` error to push the developer to make an explicit formatting decision.

Why no formatting options?
--------------------------

Let's consider the two use-cases introduced earlier:

- Logging/debugging: Typically you want to fire-and-forget, giving whatever value you care about to the formatter, and just let it print that value unambiguously, meaning there's no need to tweak formatting options: `bd"~Timestamp~: ~Query~ returned ~Result~"`
- Displaying to users: Typically you want to tightly control formatting, and you probably want to do so in a modular and reusable way. In that case, factoring out your formatting decision to a function, and interpolating the result of that function is probably the best way to go: `bf"Your account balance is now ~my_app:format_balance(Currency, Balance)~"`.

Notably, nothing in the design and implementation here precludes the future introduction of formatting options such as `bf"float: ~.2f(MyFloat)~"`, as one might do with `io_lib:format` etc. But existing stdlib functions can offer similar functionality, e.g. `bf"float: ~float_to_binary(MyFloat, [{decimals, 2}, compact])~"`, and can be factored out into their own reusable functions.

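As a minimal sketch of factoring a formatting decision into a reusable function, here is one possible body for the `my_app:format_balance/2` helper mentioned above. The description does not show its implementation, so everything here is an assumption: that `Currency` is a UTF-8 binary such as `<<"EUR">>`, that `Balance` is a float, and that two fixed decimals are wanted.

```erlang
-module(my_app).
-export([format_balance/2]).

%% Hypothetical implementation; the PR description only names this function.
%% Assumes Currency is a UTF-8 binary and Balance is a float.
format_balance(Currency, Balance) when is_binary(Currency), is_float(Balance) ->
    %% {decimals, 2} without compact keeps trailing zeros, e.g. 12.5 -> "12.50".
    Amount = float_to_binary(Balance, [{decimals, 2}]),
    <<Currency/binary, " ", Amount/binary>>.
```

For example, `my_app:format_balance(<<"EUR">>, 12.5)` returns `<<"EUR 12.50">>`, a plain binary that a user-facing interpolation could then embed verbatim.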

Implementation
==============

To parse interpolated strings, the scanner tracks some additional state regarding whether we are currently in an interpolated string, at which point it enables the recognition of `~` as the delimiter for interpolated expressions, and generates new tokens which represent the various components of an interpolated string.

Early during compilation and shell evaluation, interpolated strings are desugared into calls to functions from the `io_lib` module, and therefore don't impact later stages of compilation or evaluation.

The new string interpolation syntax was not previously valid syntax, so should be entirely backwards compatible with existing source code.
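
To give a feel for what such a desugaring might amount to, here is a hand-written approximation using only functions that exist in the stdlib today. It is not the PR's actual expansion: the description does not name the `io_lib` functions it calls, and the module `interp_sketch`, the `{text, _}` / `{value, _}` part representation, and the special-casing of integers are all assumptions made for illustration.

```erlang
-module(interp_sketch).
-export([binary_format/1, list_format/1]).

%% Parts is a list of {text, Chardata} and {value, Term} tuples, standing in
%% for the literal and interpolated components the scanner would produce.

%% Rough analogue of bf"...": always yields a UTF-8 binary.
binary_format(Parts) ->
    unicode:characters_to_binary(render(Parts)).

%% Rough analogue of lf"...": always yields a flat list of codepoints.
list_format(Parts) ->
    unicode:characters_to_list(render(Parts)).

render(Parts) ->
    [render_part(Part) || Part <- Parts].

render_part({text, Text}) ->
    Text;
render_part({value, Value}) when is_integer(Value) ->
    %% Assumption: integers are printed in decimal.
    integer_to_list(Value);
render_part({value, Value}) ->
    %% ~ts accepts chardata (lists or binaries); other terms raise badarg.
    io_lib:format("~ts", [Value]).
```

Under these assumptions, `interp_sketch:binary_format([{text, "A utf-8 binary string: "}, {value, 2 + 2}])` returns `<<"A utf-8 binary string: 4">>`, and `list_format/1` with the same kind of input returns the corresponding list-string, mirroring the split between `bf"..."` and `lf"..."`.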