feature: String interpolation
Adds four kinds of string interpolation, split along two axes: the result
type (UTF-8 binary or Unicode codepoint list) and the formatting style
(user-facing or developer-facing).

The result is four general classes of syntax with interpolated values:

```
% binary format
<<"A utf-8 binary string: 4"/utf8>> =
  bf"A utf-8 binary string: ~2 + 2~"
```

```
% list format
"A unicode codepoint list string: 4" =
  lf"A unicode codepoint list string: ~2 + 2~"
```

```
% binary debug
<<"A utf-8 binary string: {4, foo, [x, y, z]}"/utf8>> =
  bd"A utf-8 binary string: ~{2 + 2, foo, [x, y, z]}~"
```

```
% list debug
"A unicode codepoint list string: {4, foo, [x, y, z]}" =
  ld"A unicode codepoint list string: ~{2 + 2, foo, [x, y, z]}~"
```

Arbitrary expressions can be nested inside string interpolation
substitutions, including variables, function calls, macros and
even further string interpolation expressions.
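For comparison, the values produced by the two format-style examples can be built by hand with existing stdlib functions. The `interp_equiv` module below is hypothetical and only illustrates the resulting values. It is not the code generated by the desugaring pass, and it does not attempt to reproduce the debug-style layout.

```erlang
-module(interp_equiv).
-export([bf_equiv/0, lf_equiv/0]).

%% Same value as bf"A utf-8 binary string: ~2 + 2~", built by hand:
%% convert the substituted integer, then encode all pieces as UTF-8.
bf_equiv() ->
    unicode:characters_to_binary(["A utf-8 binary string: ",
                                  integer_to_binary(2 + 2)]).

%% Same value as lf"A unicode codepoint list string: ~2 + 2~",
%% built by concatenating codepoint lists.
lf_equiv() ->
    "A unicode codepoint list string: " ++ integer_to_list(2 + 2).
```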

Design
======

Why list- and binary-strings?
-----------------------------

In the `string` module from the stdlib, a string is represented by
`unicode:chardata()`, that is, a list of codepoints, a binary of
UTF-8-encoded codepoints (a UTF-8 binary), or a possibly nested list
mixing the two.

With this in mind, both the list- and binary-oriented interpolation
syntaxes accept either kind of string as an interpolated value. The
prefix the user chooses determines whether the result is a
`unicode:char_list()` or a `unicode:unicode_binary()`: `bf"..."` and
`bd"..."` create binaries, while `lf"..."` and `ld"..."` create lists.
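As a small illustration using only existing stdlib functions, the same `unicode:chardata()` value can be normalised to either result type. The module name `chardata_demo` is illustrative. `\x{e9}` and `\x{f6}` are the codepoints for é and ö, which take two bytes each in UTF-8, so the list and binary forms have different lengths.

```erlang
-module(chardata_demo).
-export([mixed/0, as_binary/0, as_list/0]).

%% A unicode:chardata() value mixing a codepoint list, a UTF-8 binary
%% and a nested list. \x{e9} is é and \x{f6} is ö.
mixed() ->
    ["h\x{e9}llo", <<" w\x{f6}rld"/utf8>>, [$!]].

%% Normalised to a UTF-8 binary, the result type of bf"..." and bd"...".
as_binary() ->
    unicode:characters_to_binary(mixed()).

%% Normalised to a flat codepoint list, the result type of lf"..." and ld"...".
as_list() ->
    unicode:characters_to_list(mixed()).
```

Here `length(chardata_demo:as_list())` is 12 (codepoints), while `byte_size(chardata_demo:as_binary())` is 14 (bytes).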

List-strings are most useful for backwards compatibility and convenience.
Binary-strings are most useful for memory-compactness and IO.

Why user- and developer-oriented strings?
-----------------------------------------

There are two similar, but distinct cases where developers typically
want to format strings: when logging/debugging, and when displaying
data to users.

When logging or debugging, the most important properties are typically
that any kind of term can be printed, and that the output round-trips
losslessly and reads unambiguously to developers. This means, for
example, retaining runtime type information, such as keeping strings
quoted when formatting them, and printing floats with full range and
precision.

When displaying data to users, the most important properties are
typically that the output is always human-readable and cleanly
formatted. This means, for example, formatting strings verbatim,
without quotation marks, and not exposing any Erlang-isms. An Erlang
tuple, for instance, won't make much sense to the average application
consumer, so we'd rather get a `badarg` error that pushes the developer
to make an explicit formatting decision.
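To make the two policies concrete, here is a minimal sketch in plain Erlang. The module `fmt_sketch` and its functions `user/1` and `debug/1` are hypothetical and are not the formatters used by this commit. In particular, `debug/1` relies on `io_lib:format/2` with `~tp`, so its layout (e.g. no space after commas) differs from the debug examples above.

```erlang
-module(fmt_sketch).
-export([user/1, debug/1]).

%% Hypothetical user-facing formatter: accepts only strings
%% (unicode:chardata()) and integers, and raises badarg for anything
%% else, such as tuples, to force an explicit formatting decision.
user(Bin) when is_binary(Bin) ->
    Bin;
user(Int) when is_integer(Int) ->
    integer_to_binary(Int);
user(Chars) when is_list(Chars) ->
    %% characters_to_binary/1 either raises badarg or returns an
    %% error/incomplete tuple for input it cannot convert; either way
    %% the caller ends up with badarg.
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) -> Bin;
        _ -> erlang:error(badarg, [Chars])
    end;
user(Other) ->
    erlang:error(badarg, [Other]).

%% Hypothetical developer-facing formatter: prints any term as Erlang
%% syntax, so strings stay quoted and types remain visible.
debug(Term) ->
    unicode:characters_to_binary(io_lib:format("~tp", [Term])).
```

With this sketch, `fmt_sketch:user("hi")` returns `<<"hi">>` while `fmt_sketch:debug("hi")` returns `<<"\"hi\"">>`, and `fmt_sketch:user({a, b})` raises `badarg`.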

Why no formatting options?
--------------------------

Let's consider the two use-cases introduced earlier:

- Logging/debugging: Typically you want to fire-and-forget, giving
  whatever value you care about to the formatter, and just let it
  print that value unambiguously, meaning there's no need to tweak
  formatting options: `bd"~Timestamp~: ~Query~ returned ~Result~"`
- Displaying to users: Typically you want to tightly control formatting,
  and you probably want to do so in a modular and reusable way. In that
  case, factoring out your formatting decision to a function, and
  interpolating the result of that function is probably the best way to
  go: `bf"Your account balance is now ~my_app:format_balance(Currency, Balance)~"`.

Notably, nothing in the design and implementation here precludes the
future introduction of formatting options such as `bf"float: ~.2f(MyFloat)~"`,
similar to the control sequences of `io_lib:format`. In the meantime,
existing stdlib functions already offer similar functionality, e.g.
`bf"float: ~float_to_binary(MyFloat, [{decimals, 2}, compact])~"`,
and such calls can be factored out into their own reusable functions.
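For example, fixed two-decimal output is already available from `float_to_binary/2` or from the `~.2f` control sequence of `io_lib:format/2`. The `float_fmt` module below is a hypothetical reusable wrapper showing both. Note that the `compact` option trims trailing zeros, while `~.2f` always prints exactly two decimals.

```erlang
-module(float_fmt).
-export([two_decimals/1, two_decimals_io_lib/1]).

%% Two decimals via float_to_binary/2; `compact` trims trailing zeros.
two_decimals(F) when is_float(F) ->
    float_to_binary(F, [{decimals, 2}, compact]).

%% Two decimals via the ~.2f control sequence of io_lib:format/2,
%% converted from (possibly deep) chardata to a UTF-8 binary.
two_decimals_io_lib(F) when is_float(F) ->
    unicode:characters_to_binary(io_lib:format("~.2f", [F])).
```

Both return `<<"3.14">>` for `3.14159`, but for `2.5` the first returns `<<"2.5">>` and the second `<<"2.50">>`. Such a helper could then be interpolated as `bf"float: ~float_fmt:two_decimals(MyFloat)~"`.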

Implementation
==============

To parse interpolated strings, the scanner tracks some additional state
regarding whether we are currently in an interpolated string, at which
point it enables the recognition of `~` as the delimiter for
interpolated expressions, and generates new tokens which represent the
various components of an interpolated string.

Early during compilation and shell evaluation, interpolated strings are
desugared into calls to functions from the `io_lib` module, so they
do not affect later stages of compilation or evaluation.

The new string interpolation syntax was not previously valid syntax, so
it should be fully backwards compatible with existing source code.
TD5 committed Apr 28, 2023
1 parent 517443b commit 9922f03
Showing 14 changed files with 1,810 additions and 10 deletions.
4 changes: 4 additions & 0 deletions lib/compiler/src/compile.erl
@@ -789,6 +789,7 @@ make_ssa_check_pass(PassFlag) ->

standard_passes() ->
    [?pass(transform_module),
     ?pass(desugar_interpolation),

     {iff,makedep_side_effect,?pass(makedep_and_output)},
     {iff,makedep,[
@@ -1249,6 +1250,9 @@ strip_columns(Code) ->
             erl_parse:map_anno(F, Form)
     end || Form <- Code].

desugar_interpolation(Code, #compile{options=Opt}=St) ->
    {ok, erl_desugar_interpolation:module(Code, Opt), St}.

get_core_transforms(Opts) -> [M || {core_transform,M} <- Opts].

core_transforms(Code, St) ->
33 changes: 33 additions & 0 deletions lib/stdlib/examples/erl_id_trans.erl
@@ -521,6 +521,39 @@ expr({match,Anno,P0,E0}) ->
    E1 = expr(E0),
    P1 = pattern(P0),
    {match,Anno,P1,E1};
expr({interpolation_no_subs,Anno,binary,_IsDebug,Str}) ->
    S = {string, Anno, Str},
    {bin, Anno, [{bin_element,Anno,S,default,[utf8]}]};
expr({interpolation_no_subs,Anno,list,_IsDebug,Str}) ->
    {string, Anno, Str};
expr({interpolation,Anno,
      {interpolation_head,AnnoHead,binary,IsDebug,HeadStr},
      Elems,
      {interpolation_tail,AnnoTail,TailStr}}) ->
    Elems1 =
        [ case Elem of
              {interpolation_cont,AnnoCont,Str} -> {interpolation_cont,AnnoCont,Str};
              {interpolation_subs,AnnoSubs,Expr} -> {interpolation_subs,AnnoSubs,expr(Expr)}
          end
          || Elem <- Elems ],
    {interpolation,Anno,
     {interpolation_head,AnnoHead,binary,IsDebug,HeadStr},
     Elems1,
     {interpolation_tail,AnnoTail,TailStr}};
expr({interpolation,Anno,
      {interpolation_head,AnnoHead,list,IsDebug,HeadStr},
      Elems,
      {interpolation_tail,AnnoTail,TailStr}}) ->
    Elems1 =
        [ case Elem of
              {interpolation_cont,AnnoCont,Str} -> {interpolation_cont,AnnoCont,Str};
              {interpolation_subs,AnnoSubs,Expr} -> {interpolation_subs,AnnoSubs,expr(Expr)}
          end
          || Elem <- Elems ],
    {interpolation,Anno,
     {interpolation_head,AnnoHead,list,IsDebug,HeadStr},
     Elems1,
     {interpolation_tail,AnnoTail,TailStr}};
expr({bin,Anno,Fs}) ->
    Fs2 = pattern_grp(Fs),
    {bin,Anno,Fs2};
1 change: 1 addition & 0 deletions lib/stdlib/src/Makefile
@@ -67,6 +67,7 @@ MODULES= \
erl_error \
erl_eval \
erl_expand_records \
erl_desugar_interpolation \
erl_features \
erl_internal \
erl_lint \
4 changes: 4 additions & 0 deletions lib/stdlib/src/epp.erl
@@ -1930,6 +1930,10 @@ token_src({char,_,C}) ->
token_src({string, _, X}) ->
    io_lib:write_string(X);
token_src({_, _, X}) ->
    io_lib:format("~w", [X]);
token_src({_, _, _, X}) ->
    io_lib:format("~w", [X]);
token_src({_, _, _, _, X}) ->
    io_lib:format("~w", [X]).

stringify1([]) ->