Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for improving std::fmt::Arguments and format_args!() #99012

Open
36 of 46 tasks
m-ou-se opened this issue Jul 7, 2022 · 32 comments
Open
36 of 46 tasks

Tracking issue for improving std::fmt::Arguments and format_args!() #99012

m-ou-se opened this issue Jul 7, 2022 · 32 comments
Assignees
Labels
A-fmt Area: `std::fmt` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@m-ou-se
Copy link
Member

m-ou-se commented Jul 7, 2022

Earlier this year in the libs team meeting, I presented several different ideas for alternative implementations of std::fmt::Arguments which could result in smaller binary size or higher performance. Now that #93740 is mostly done, I'll be shifting my focus to fmt::Arguments and exploring those ideas.

Currently, fmt::Arguments is the size of six pointers, and refers to three slices:

  • A &'static [&'static str] containing the literal parts around the formatting placeholders. E.g. for "a{}b{}c", these are ["a", "b", "c"].
  • A &[&(ptr, fn_ptr)] which is basically a &[&dyn Display] (but can point to Debug or Hex etc. too), pointing to the arguments. This one is not 'static, as it points to the actual arguments to be formatted.
  • A Option<&'static [FmtArgument]>, where FmtArgument is a struct containing all the options like precision, width, alignment, fill character, etc. This is unused (None) when all placeholders have no options, like in "{} {}", but is used and filled in for all place holders as soon as any placeholder uses any options, like in "{:.5} {}".

Here's a visualisation of that, for a "a{}b{:.5}c" format string:

Diagram

An important part of this design is that most of it can be stored in static storage, to minimize the amount of work that a function that needs to create/pass a fmt::Arguments needs to do. It can just refer to the static data, and only fill in a slice of the arguments.

Some downsides:

  • A fmt::Arguments is still relatively big (six pointers in size), and not a great type to pass by value. It could be just two pointers in size (one to static data, one to dynamic data), such that it fits in a register pair.
  • It costs quite a lot of static storage for some simple format strings. For example, "a{}b{}c" needs a &["a", "b", "c"], which is stored in memory as a (ptr, size) pair referencing three (ptr, size) pairs referencing one byte each, which is a lot of overhead. Small string literals with just a newline or a space are very common in formatting.
  • When even just a single formatting placeholder uses any non-standard options, such as "{:02x}", a relatively large array with all the (mostly default) formatting options is stored for all placeholders.
  • The non-static part that contains the pointers to the arguments contains the pointers to the relevant Display/Debug/etc. implementation as well, even though that second part is constant and could be static. (It's a bit tricky to split those, though.)
  • Even when formatting a simple &str argument with a simple "{}" placeholder, the full Display implementation for &str is pulled in, which include code for all the unused options like padding, alignment, etc.

Issues like those are often reason to avoid formatting in some situations, which is a shame.

None of these things are trivial to fix, and all involve a trade off between compile time, code size, runtime performance, and implementation complexity. It's also very tricky to make these tradeoffs for many different use cases at once, as the ways in which formatting is used in a program differs vastly per type of Rust program.

Still, there are many ideas that are worth exploring. It's hard to predict which one will end up being best, so this will involve several different implementations to test and benchmark.

I'll explain the different ideas one by one in the comments below as I explore them.


To do:

@m-ou-se m-ou-se added I-slow Issue: Problems and improvements with respect to performance of generated code. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. A-fmt Area: `std::fmt` labels Jul 7, 2022
@m-ou-se m-ou-se self-assigned this Jul 7, 2022
@m-ou-se
Copy link
Member Author

m-ou-se commented Jul 7, 2022

Most likely, the best alternative will look vaguely like this:

Diagram 2

That is, it tries to:

  • Make fmt::Arguments as small as possible. (Two pointers.)
  • Move as much data as possible into static storage. (Everything except for the pointers to the arguments, which must be dynamic.)

The most interesting question is what goes in the block with the question marks, the static data that encodes the format string. (Perhaps it could even include the three strings, to save the overhead of the pointers and indirection, at least in the case of tiny strings like these.)

There are many ways to represent a format string in a data structure. There's a trade off to be made between the size of that structure, runtime performance, and complexity.

One of the ideas I want to explore more is to represent it as a single array of 'commands' such as print string "a" set precision to 5 print argument 0 etc. Effectively making a (tiny, trivial) domain specific 'virtual machine' with just ~10 instructions that's easily interpreted by std::fmt::write.

This direction is something I experimented a bit with in #84823, but still requires a lot more exploration to get it to something that can be a good alternative.

Another option we discussed in the libs meeting earlier this year, is to represent it as compiled code instead of a regular data structure. format_args!() would expand to a closure, and fmt::Arguments would effectively be a &dyn Fn. This allows for inlining and optimizing the Display implementations that it uses (which might result in smaller code size if much can be optimized away), but it also means that every single format_args!() will result in its own closure, which can blow up code size and compilation time for programs that do a lot of formatting.

Unfortunately, it takes quite some time to build every single experiment, as it requires a re-implementation of not only fmt::Arguments, but als of the format_args!() macro, every time.

@m-ou-se
Copy link
Member Author

m-ou-se commented Jul 7, 2022

An interesting case is something like format_args!("{} {}", arg0, "string literal"). Ideally, that would optimize to format_args!("{} string literal", arg0), which is an idea that's tracked here: #78356

However, if that second argument is a dynamically chosen &str, we cannot optimize it away, and we end up with an unfortunate amount of indirection thanks to how format_args!() captures all arguments by reference: it captures a &&str. That looks something like this:

Diagram 3

It means that the &str needs to be temporarily placed on the stack such that a &&str can reference it. Additionally, it's unfortunate that the entire String as Display implementation is included, even though just f.write_str() would've been enough, as no special formatting options are used.

If we go for the closure idea, we could instead get something that looks like this:

Diagram 4

So,

  • A compiler-generated structure for the captures, since we now simply make use of the capturing behavior of a regular closure. It could decide to capture a copy of &str instead of requiring a &&str. (It could even just reference the outer stack frame, such that the 'captures' structure is simply the stack frame that already exists.)
  • It could inline <String as Display>::fmt and optimize away nearly all of it. (Alternatively, we could teach the builtin format_args!() macro to generate a simple write_str call for a "{}"-formatted &str, instead of a call to Display::fmt.)

@m-ou-se
Copy link
Member Author

m-ou-se commented Jul 7, 2022

A tricky problem with the closure approach, is how to implement fmt::Arguments::as_str(). It's not impossible, but it's a tricky to encode a special case for fmt::Arguments that are just a single static string literal, if fmt::Arguments just encode something like a &dyn Fn. (Without making the structure larger again. It'd be really nice if it fits in a register pair.)

@m-ou-se
Copy link
Member Author

m-ou-se commented Jul 7, 2022

If we do not go for the closure approach, a tricky problem is how to make format_args!("{}", a) expand to include the right <T as Display>::fmt function pointer in the static data, since we don't have a typeof(a) operator in Rust.

Currently we just make use of something like &dyn Display which handles that for us, but that means that the function pointer ends up in the dynamic part together with the pointer to the argument (as part of a wide pointer), instead of separately as part of the static data.

One possibility of handling this is described in #44343, but it's quite tricky to get that right. The last comment there is a comment from me from 2020 saying that I'd try to implement it. I tried it, but it turned out to be quite tricky to get right and make it compile in all cases, and it got messy quite fast. Specifically, creating just a [fn(*const Opaque); N] with that approach works out okay-ish, but creating a structure that also contains other information (flags, precision, string pieces, etc.) gets quite complicated. Perhaps there's an easier approach possible nowadays, maybe with the help of some (to be created) (unstable) const eval related feature.

@joboet
Copy link
Member

joboet commented Jul 8, 2022

In the VM approach, the "opcodes" can be encoded together with the static string: since in UTF-8, byte values >= 0xf5 are not allowed, they can be used to encode up to 10 opcodes, which could either specify formatting options or indicate that the next bytes are a pointer to a formatting function. Additionally, by using an "end" opcode, the string pointer does not need to include size, while at the same time allowing internal NUL-bytes.

@kadiwa4
Copy link
Contributor

kadiwa4 commented Jul 9, 2022

What about putting all the &'static strs into a single one and then only storing the index where each of the parts end? So for format!("a{}bc{}d", …) there would be a string literal "abcd", stored as a thin pointer to the first byte, and one end index for each slice that needs to be extracted from the string ([1, 3, 4] in this case; to extract "a", "bc" and "d").

This would almost cut the size of the current &'static [&'static str] in half.

@Amanieu
Copy link
Member

Amanieu commented Jul 13, 2022

A tricky problem with the closure approach, is how to implement fmt::Arguments::as_str(). It's not impossible, but it's a tricky to encode a special case for fmt::Arguments that are just a single static string literal, if fmt::Arguments just encode something like a &dyn Fn. (Without making the structure larger again. It'd be really nice if it fits in a register pair.)

Would using a custom trait instead of Fn help here? It would have 2 methods: one which corresponds to the original Fn and one which returns an Option<&'static str>.

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 8, 2022

I've opened a compiler MCP for changing how format_args!() expands, to make it easier to work on these improvements.

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 8, 2022

The implementation of the format_args!() builtin macro (compiler/rustc_builtin_macros/src/format.rs) currently takes the output of rustc_parse_format and processes it, resolves numeric and named arguments (e.g. matches {a} to a = 1, etc.), figures out the right display trait (e.g. UpperHex for {:X}) produces diagnostics about invalid options or missing arguments (etc.), and generates the code that format_args!() will expand to that will eventually create a fmt::Arguments object.

It's a lot of code.

Any new implementation of fmt::Arguments will need big modifications to this code.

I'm now working on splitting this code into a part into two steps:

  1. Resolving the arguments, resolving the display trait, producing diagnostics/errors, etc.
  2. Generating the code.

Between step 1 and 2, there will be a (relatively simple) intermediate representation. Then a new implementation of fmt::Arguments only needs a new implementation of step 2: converting this intermediate representation to code. Then we can leave the first step alone.

Currently, these two steps are quite interwoven, so it'll take some time and effort to separate.

Additionaly, I think it'd be nice if we could delay step 2 until later in the compilation process, making the intermediate representation part of the AST and HIR, which is what my opened an MCP for: rust-lang/compiler-team#541

@m-ou-se
Copy link
Member Author

m-ou-se commented Sep 1, 2022

I made a template PR for those who want to help out by experimenting with a new fmt::Arguments implementation: #101272

Unfortunately, it's a lot of work to try out a new idea for a fmt::Arguments implementation, since it requires not only updating the fmt::Arguments struct and its methods, but also a new core::fmt::write() implementation, and a new format_args!() builtin macro implementation.

The template PR provides a starting point that points out exactly what to implement.

I'll work on getting #100996 merged (which is for now included in the template PR), and will start on the closure approach right after. Please feel free to try out another approach (like that 'VM' idea or something else) and leave a comment here with what you're working on, and ping me when you have something ready to test. :)

There's a few benchmarks here thanks to @mojave2. It'd be nice if we could also include a representative performance benchmark for formatting in https://perf.rust-lang.org/ or at least as an optional #[bench] test in rust-lang/rust, so we have a good way to measure results.

@Kobzol
Copy link
Contributor

Kobzol commented Sep 1, 2022

We're now in the process of adding an experimental version of runtime benchmarks into rustc-perf (rust-lang/rustc-perf#1423), I'll try to add some fmt benchmarks once/if it gets merged.

@m-ou-se
Copy link
Member Author

m-ou-se commented Sep 13, 2022

Some very promising first results from the closure approach: #101568 (comment)

@m-ou-se
Copy link
Member Author

m-ou-se commented Sep 27, 2022

Update: The compiler MCP for making format_args!() its own ast node has been accepted, and #100996 has been reviewed and approved, and is about to be merged.

The closure approach produces some great results for small programs, but doesn't perform as well in all situations. It also can significantly increase compilation time.

Once #100996 is merged, I'll be much easier to make changes to fmt::Arguments. I'll start with a few small incremental changes that should be easy to review, before continuing with more experimental ideas again.

Now that the compiler MCP has been accepted, I'll also work on moving the data types from ast.rs of #100996 into the actual AST, moving the 'actual' expansion to a later point in the compilation process, to kickstart the work on #78356.

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 28, 2022
Rewrite and refactor format_args!() builtin macro.

This is a near complete rewrite of `compiler/rustc_builtin_macros/src/format.rs`.

This gets rid of the massive unmaintanable [`Context` struct](https://github.com/rust-lang/rust/blob/76531befc4b0352247ada67bd225e8cf71ee5686/compiler/rustc_builtin_macros/src/format.rs#L176-L263), and splits the macro expansion into three parts:

1. First, `parse_args` will parse the `(literal, arg, arg, name=arg, name=arg)` syntax, but doesn't parse the template (the literal) itself.
2. Second, `make_format_args` will parse the template, the format options, resolve argument references, produce diagnostics, and turn the whole thing into a `FormatArgs` structure.
3. Finally, `expand_parsed_format_args` will turn that `FormatArgs` structure into the expression that the macro expands to.

In other words, the `format_args` builtin macro used to be a hard-to-maintain 'single pass compiler', which I've split into a three phase compiler with a parser/tokenizer (step 1), semantic analysis (step 2), and backend (step 3). (It's compilers all the way down. ^^)

This can serve as a great starting point for rust-lang#99012, which will only need to change the implementation of 3, while leaving step 1 and 2 unchanged.

It also makes rust-lang/compiler-team#541 easier, which could then upgrade the new `FormatArgs` struct to an `ast` node and remove step 3, moving that step to later in the compilation process.

It also fixes a few diagnostics bugs.

This also [significantly reduces](https://gist.github.com/m-ou-se/b67b2d54172c4837a5ab1b26fa3e5284) the amount of generated code for cases with arguments in non-default order without formatting options, like `"{1} {0}"` or `"{a} {}"`, etc.
@EFanZh
Copy link
Contributor

EFanZh commented Dec 9, 2023

Not sure if someone has already shared a similar idea, but I want to share this idea that is somewhat a hybrid of the closure method and the list of instructions method from: https://blog.m-ou.se/format-args/.

The basic idea is that:

  • Separate the runtime data and the list of instructions.
  • Represent the list of instructions with a function pointer that is derived from a zero-sized nested generic type, where each nesting level represents a specific instruction.

The Arguments could be defined as:

pub struct Arguments<'a> {
    data: NonNull<()>, // Runtime data.
    f: unsafe fn(context: &mut dyn Write, data: NonNull<()>) -> fmt::Result, // Instructions.
    _phantom: PhantomData<&'a ()>,
}

Instruction list

For the the instruction list type, each instruction could have a definition like:

struct ThisInstruction<..., RestInstructions>(..., RestInstructions);

impl<..., RestInstructions> ThisInstruction<..., RestInstructions> {
    unsafe fn call(output: &mut dyn Write, data: NonNull<()>) -> fmt::Result {
        // Execute the current instruction. If we need to access the runtime data, we can store the offset of the
        // runtime data in the generic arguments using const generics.
        ...

        // Execute the rest of the list of instructions.
        RestInstructions::call(output, data)
    }
}

In this way, we can chain the list of instructions together to form a single nested type with a call function, which we can store in the Arguments struct. For example, a nested type derived from format_args!("abc{}def{:?}ghi", 1_u32, false) could be something like:

type InstructionList = WriteStr<..., WriteDisplay<u32, ..., WriteStr<..., WriteDebug<bool, ..., WriteStr<..., End>>>>>;
//                     ^^^^^^^^      ^^^^^^^^^^^^           ^^^^^^^^      ^^^^^^^^^^            ^^^^^^^^
//                      "abc"            `{}`                "def"          `{:?}`                "ghi"

The advantage of this method is that it allows different format specifications that have the same structure to share the same instruction list, which reduces the binary size, without sacrificing much runtime performance.

Of course, the detailed design could be a little more complicated, we may need to define some traits in order to get the idea to work.

First, the Formatter object only need to be created if necessary, we can use a trait to represent the behavior:

trait Context: Write {
    // Creates the `Formatter` object if necessary.
    fn with_formatter(&mut self, f: impl FnOnce(&mut Formatter) -> fmt::Result) -> fmt::Result;
}

// The `Formatter` object has not been created.
impl<'a> Context for dyn Write + 'a {
    fn with_formatter(&mut self, f: impl FnOnce(&mut Formatter) -> fmt::Result) -> fmt::Result {
        f(&mut Formatter::new(self))
    }
}

// The `Formatter` object has already been created.
impl Context for Formatter<'_> {
    fn with_formatter(&mut self, f: impl FnOnce(&mut Formatter) -> fmt::Result) -> fmt::Result {
        f(self)
    }
}

Then, we can have the Instruction trait:

trait Instruction {
    // `context` is either `&mut dyn Write` or `&mut Formatter`, depending on whether a `Formatter` object
    // have been created.
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result;

    // Not sure why this wrapper is needed, but converting `call` into a function pointer directly causes lifetime error.
    unsafe fn call_wrapper(output: &mut dyn Write, data: NonNull<()>) -> fmt::Result {
        Self::call(output, data)
    }
}

// An instruction that denotes the end of the instruction list.
struct End;

impl Instruction for End {
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result {
        Ok(())
    }
}

Then, we can implement other instructions we need:

struct WriteStr<const OFFSET: usize, R>(R);

impl<const OFFSET: usize, R> Instruction for WriteStr<OFFSET, R>
where
    R: Instruction,
{
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result {
        let s: &str = todo!("Get the string from `data` and `OFFSET`.");

        context.write_str(s)?;

        R::call(context, data)
    }
}

struct WriteDisplay<T, const OFFSET: usize, R>(PhantomData<T>, R);

impl<T, const OFFSET: usize, R> Instruction for WriteDisplay<T, OFFSET, R>
where
    T: Display,
    R: Instruction,
{
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result {
        context.with_formatter(|f| {
            let s: &T = todo!("Get the reference to `T` from `data` and `OFFSET`.");

            Display::fmt(s, f)?;

            R::call(f, data)
        })
    }
}

We can also have other instructions in the same way like WriteDebug, WriteLowerHex, SetWidth, SetPrecision, etc. If we really want to be clever, we can create some more complex instructions, maybe something like WriteDisplayAndStr, WriteDisplayWithFormat, WriteStrAndEnd, etc.

After building the type for the instruction list, we can get the function pointer using call_wrapper:

let _ = Arguments {
    data: ...,
    f: WriteStr<..., WriteDisplay<u32, ..., WriteStr<..., WriteDebug<bool, ..., WriteStr<..., End>>>>>::call_wrapper,
    _phantom: PhantomData,
};

Runtime data

For the runtime data, it can be as simple as an array of:

union Item {
    usize: usize,     // For storing static string length, placeholder width and placeholder precision.
    ptr: NonNull<()>, // For storing object reference and static string address.
}

The OFFSET generic arguments of the instruction types can be indices of the runtime data in the array.

If we arrange the runtime data carefully (maybe in the reverse order), we can even allow format specifications with the same suffix to share some instructions:

format_args!("abc{}xyz{}uvw{:?}", 1_i32,   2_f64, false);
//                 ^^^^^^^^^^^^ <- The instructions of this part can be shared with the next one.

format_args!( "{:?}foo{}bar{:?}", [4_u32], 3_f64, true);
//                 ^^^^^^^^^^^^ <- The instructions of this part can be shared with the previous one.

Arguments::as_str support

A choice would be adding a flag to indicate whether Arguments only contains static string data, the flag can exists as a field of Arguments:

pub struct Arguments<'a> {
    data: NonNull<()>,
    is_static_string: bool, // <- Indicates whether `data` only contains static string data.
    f: unsafe fn(context: &mut dyn Write, data: NonNull<()>) -> fmt::Result,
    _phantom: PhantomData<&'a ()>,
}

Or, the flag can exist in the runtime data at a predictable offset.

Another choice I am not sure if it would work, is to have a dedicated instructions function for Arguments objects that only contain a single static string.

static WRITE_STR: unsafe fn(context: &mut dyn Write, data: NonNull<()>) -> fmt::Result = |output, data| {
    output.write_str(todo!(
        "Get string from `data`, assuming it only contains a static string."
    ))
};

If the WRITE_STR static variable can have a fixed value, we can compare the instructions function in Arguments with WRITE_STR, see if they are the same function, in order to determine whether the Arguments object only contains a static string:

impl Arguments<'_> {
    pub fn as_str(&self) -> Option<&'static str> {
        if self.f == WRITE_STR {
            Some(todo!(
                "Get string from `self.data`, assuming it only contains a static string."
            ))
        } else {
            None
        }
    }
}

Arguments::new_str support

Arguments can be implemented using an enum:

enum ArgumentsInner<'a> {
    StaticString(&'static str),
    Normal {
        data: NonNull<()>,
        f: unsafe fn(context: &mut dyn Write, data: NonNull<()>) -> fmt::Result,
        _phantom: PhantomData<&'a ()>,
    }
}

pub struct Arguments<'a> {
    inner: ArgumentsInner<'a>,
}

The new_str and as_str implementations are trivial, but this implementation has cost of checking enum variant every time when fmt::write function is called.

Another way would be extending the instructions function to add an extra argument as the length of the string.

pub struct Arguments<'a> {
    data: NonNull<()>,
    maybe_string_length: usize, // We use `length != usize::MAX` to indicate that `data` points to a static string.
    f: unsafe fn(context: &mut dyn Write, data: NonNull<()>, maybe_string_length: usize) -> fmt::Result,
    _phantom: PhantomData<&'a ()>,
}

impl Arguments<'_> {
    pub fn new_str(s: &'static str) -> Self {
        Self {
            data: NonNull::from(s).cast(),
            maybe_string_length: s.len(),
            f: WRITE_STR,
            _phantom: PhantomData,
        }
    }

    pub fn as_str(&self) -> Option<&'static str> {
        if self.maybe_string_length == usize::MAX {
            None
        } else {
            Some(unsafe {
                str::from_utf8_unchecked(slice::from_raw_parts(
                    self.data.cast().as_ptr(),
                    self.maybe_string_length,
                ))
            })
        }
    }
}

The method has the cost of passing an extra argument to the instructions functions.

Drawbacks

Depending on the format specifications used, this method could introduce a lot of types (that might be deeply nested) and functions for the compiler to optimize, the compile time could be affected. Also, depending on the inlining strategy, runtime performance could also be affected because there could be a lot of function calls. Tail call optimizations could help, but is is not guaranteed to work.

@dead-claudia
Copy link

dead-claudia commented Dec 9, 2023

Edit: remove pointless quote

@EFanZh That's more or less the spirit of my suggestion here. Mine just took a different approach to representation to squeeze out much better space savings:

  • Like yours, I separated runtime and static data:
    • String segments are inline in the bytecode
    • Runtime arguments are for<T> (&T, fn(&T) -> fmt::Result) pairs.
    • I did not special-case any argument type. So the stack space isn't optimized for here, only static binary size. (You can't really optimize size much further without going really complicated and special-casing numbers and &str.)
  • I use a variable-length binary bytecode, where flags and the fill char are optional. Length information is always determined by specific flags in the first byte. This saves a lot of space for common interpolation types without special-casing too much.
  • Instead of explicit instruction prefixes, I instead leave the state implicit by alternating between pushing a text string and pushing an interpolation. I then use an interpolation count to know when to stop. This avoids having to encode "push this literal string", though it comes at the cost of not being able to convert zero-cost from &str to fmt::Arguments.
  • I further saved some space by packing extra info (precision type and width type) into the first byte.
  • I further saved some space packing flags into the fill character's spare bits.

@dead-claudia
Copy link

dead-claudia commented Dec 10, 2023

Edit: Drop trivially-desugared bool, correct some prose

The more I think about it, the more I feel something between my idea and your all's might be best: using a binary instruction format that special-cases some types while still remaining very concise.

Part of this is also to try to minimize stack setup and usage as well: arguments require about 2 stack pointers per value, and their setup requires storing (and in PIC, looking up) several pointers. This format tries to bury that overhead into a shared blob of code in the standard library.

Note: I use "M/N bytes" below as a shorthand for "M bytes on 32-bit platforms, N bytes on 64-bit platforms". Makes talking about pointer-dependent sizes much easier.

Runtime layout: 8/16 bytes in position-dependent code and 12/24 bytes in position-independent code.

struct Arguments<'a> {
    // Self-terminating
    bytecode: *const u8,
    // Used by bytecode by reference
    arguments: *const (),
    _phantom: PhantomData<&'a ()>,
    // Required for position-independent code for opcode `D1`
    // Value only read if static custom formatter used
    #[cfg(pic)]
    base_ptr: MaybeUninit<*const ()>,
}

Interpreter state + initial values:

  • Current flags: all clear/defaults
  • Alignment: left
  • Fill pointer: pointer to ASCII space
  • Fill length: 1
  • Width: unset
  • Precision: unset
  • Current argument index: 0
  • Current formatter: fmt::Display
Instructions (67 encodable, 43 used):
  • End: 00
    • Empty string literals shouldn't be emitted anyways. This also provides for a neat end without needing a length prefix, and it's maximally cheap to test on all platforms. (MIPS and RISC-V require any other immediate to be in a register first, and AArch64 has a special "branch if zero" instruction.)
  • 01-7F: Write short string literal
    • Format: 0b0NNNNNNN data...
    • Byte length is 1 + that 7-bit N, allowing up to 128 bytes
  • 80-BF: Set current argument index to small N
    • Format: 0b10NNNNNN
    • Initial argument is 0 - this doesn't need specified in most simple strings.
  • C0: Write long string literal
    • Format: 0xC0 len:usize data...
    • len is the byte length
  • C1: Set current argument index to large N
    • Format: 0xC1 index:usize
  • C2: Set fill code point to UTF-8 encoded immediate
    • Format: 0xC2 char...
    • Designed so that padding fill can just use a pointer to this as its data pointer, without needing to copy it first. The byte length is easy to calculate: (first << (first & 0x80)).count_trailing_ones() + 1.
  • C3: Set flags
    • Format: 0xC7 flags:u8
    • Bit 0: SignPlus
    • Bit 1: SignMinus
    • Bit 2: Alternate
    • Bit 3: SignAwareZeroPad
    • Bit 4: DebugLowerHex
    • Bit 5: DebugUpperHex
  • C4-C6: Set alignment
    • C4: Left
    • C5: Center
    • C6: Right
    • Designed so you can just transmute the low 2 bits.
  • C7: Reserved
  • C8-CB: Configure precision (low 2 bits determine action)
    • C8: Clear precision parameter, use default behavior
    • C9: Set precision to current usize argument
    • CA len:u8: Set precision to small immediate
    • CB len:u32: Set precision to large immediate
  • CC-CF: Configure width (low 2 bits determine action)
    • Similar to with precision, just different prefix and different field
  • D0: Format current argument as &dyn $current_formatter
  • D1: Format current argument with pointer offset to fn(&T) -> fmt::Result
    • Format: D1 ref:usize
    • Might not be reasonably possible?
  • D2: Set current formatter type to fmt::Debug
  • D3: Set current formatter type to fmt::Display
  • D4: Set current formatter type to fmt::Octal
  • D5: Set current formatter type to fmt::LowerHex
  • D6: Set current formatter type to fmt::UpperHex
  • D7: Set current formatter type to fmt::Binary
  • D8: Set current formatter type to fmt::LowerExp
  • D9: Set current formatter type to fmt::UpperExp
  • DA: Reserved
  • DB: Reserved
  • DC: Reserved
  • DD: Reserved
  • DE: Reserved
  • DF: Reserved
  • E0: Format current argument as u8 with current formatter
  • E1: Format current argument as u16 with current formatter
  • E2: Format current argument as u32 with current formatter
  • E3: Format current argument as u64 with current formatter
  • E4: Format current argument as u128 with current formatter
  • E5: Reserved
  • E6: Reserved
  • E7: Reserved
  • E8: Format current argument as i8 with current formatter
  • E9: Format current argument as i16 with current formatter
  • EA: Format current argument as iD2 with current formatter
  • EB: Format current argument as i64 with current formatter
  • EC: Format current argument as i128 with current formatter
  • ED: Reserved
  • EE: Reserved
  • EF: Reserved
  • F0: Format current argument as f32 with current formatter
  • F1: Format current argument as f64 with current formatter
  • F2: Reserved
  • F3: Reserved
  • F4: Reserved
  • F5: Reserved
  • F6: Reserved
  • F7: Reserved
  • F8: Format current argument as &str with current formatter
  • F9: Format current argument as char with current formatter
  • FA: Reserved
  • FB: Reserved
  • FC: Reserved
  • FD: Reserved
  • FE: Reserved
  • FF: Reserved

Other notes:

  • Not all static formatter types support all argument types. For instance, fmt::LowerHex isn't supported for str. This is fine and behavior can just be undefined in those cases.
  • usize and isize should delegate to their sized equivalents (u32/i32 or u64/i64 currently)
  • A trait DelegateFormat : Deref trait should be added to allow other types that wrap primitives (ex: String, NonZero*) to efficiently delegate to their pointees for formatting purposes. This would cut down on both human-created code (as blanket delegating impls could be added) and generated code.
  • There is no special fmt::Pointer for pointers. Instead, it just uses fmt::LowerHex directly with flags set as appropriate.
  • There is no special fmt::Debug for numeric types. Instead, it just delegates directly to fmt::Display/etc.
  • Some built-in types like f32 would still use impl Debug/impl Display as they do today.

For some case studies (taken from here and Mara's blog post:

I'll leave the current representations to the reader to determine.

  • "a{}b{}c", &str, u8:

    • C format string (8 bytes): "a%sb%uc\0"
    • Runtime bytecode (10 bytes): 01 61 F8 01 62 81 E2 01 63 00
    • Arguments array (8/16 bytes): &&str, &u8
    • Current size: 27/51 bytes
  • "a{}b{:.5}c", &str, f32

    • C format string (10 bytes): "a%sb%.5fc\0"
    • Runtime bytecode (12 bytes): 01 61 F8 01 62 81 CA 05 F0 01 63 00
    • Arguments array (8/16 bytes): &&str, &f32
    • Current size: 91/163 bytes
  • "> {a}{b} {c}!", &str, &str, &str

    • C format string (11 bytes): "> %s%s %s!\0"
    • Runtime bytecode (12 bytes): 02 3C 20 F8 81 F8 01 20 82 F8 01 21 00
    • Arguments array (12/24 bytes): &&str, &&str, &&str
    • Current size: 76/148 bytes
  • "{0} {0:?} {1:x}", &str, u32

    • No C format string equivalent
    • Runtime bytecode (11 bytes): F8 01 20 D2 E0 01 20 D5 81 E2 00
    • Arguments array (8/16 bytes): &&str, &u32
    • Current size: 74/146 bytes
  • "> {a}{b} {c}!", &str, impl fmt::Display, &dyn fmt::Display

    • No C format string equivalent
    • Runtime bytecode (17/21 bytes): 02 3C 20 F8 81 D1 ADDR... 01 20 82 D0 01 21 00
    • Arguments array (12/24 bytes): &&str, &impl fmt::Display, &&dyn fmt::Display
    • Current size: 76/148 bytes + &dyn fmt::Display wrapper implementation
  • "{0:?} {0:#?}", &dyn fmt::Debug

    • No C format string equivalent
    • Runtime bytecode (8 bytes): D2 D0 01 20 C7 04 D0 00
    • Arguments array (4/8 bytes): &&dyn fmt::Debug
    • Current size: 113/209 bytes + &dyn fmt::Display wrapper implementation
  • "a{1:>012}b{1:-5}c{0:#.1}", f32, impl fmt::Debug

    • No C format string equivalent
    • Runtime bytecode (35/43 bytes): 01 61 81 C7 0C C6 CE 0C D1 ADDR... 01 62 C7 02 C4 CE 05 D1 ADDR... 01 63 80 C7 04 CC CA 01 F0 00
    • Arguments array (8/16 bytes): &f32, &impl fmt::Debug
    • Current size: 144/265 bytes + &dyn fmt::Display wrapper implementation

@m-ou-se @tgross35 Thoughts? Also, what all can/should we special-case? For one, it may be worth special-casing some string flags, and deferring to the full version only when some flags exist.


As for stuff like String, I'd feel a trait-based approach makes more sense than specializing here. To give an idea (and why I feel it really should be punted to its own RFC):

  • Not only std types lile String would benefit, but even userland types like bytes::BytesBuf.
  • One could use that to choose what to delegate to, in case of multiple things.
  • It could be useful to have it return a fmt::Arguments. This could (in some cases) then be inlined somehow into the parent format_args!, making it fully zero-overhead.

@tgross35 I tried to take your feedback from last time into consideration. Tried to address all those pain points this time around. Hopefully, this spells everything out a lot better. 🙂

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 13, 2024
perf: improve write_fmt to handle simple strings

Per `@dtolnay` suggestion in serde-rs/serde#2697 (comment) - attempt to speed up performance in the cases of a simple string format without arguments:

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
+ #[inline]
  pub fn write_fmt(&mut self, f: fmt::Arguments) -> fmt::Result {
+     if let Some(s) = f.as_str() {
+         self.buf.write_str(s)
+     } else {
          write(self.buf, f)
+     }
  }
```

Hopefully it will improve the simple case for the rust-lang#99012

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 15, 2024
perf: improve write_fmt to handle simple strings

Per `@dtolnay` suggestion in serde-rs/serde#2697 (comment) - attempt to speed up performance in the cases of a simple string format without arguments:

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
+ #[inline]
  pub fn write_fmt(&mut self, f: fmt::Arguments) -> fmt::Result {
+     if let Some(s) = f.as_str() {
+         self.buf.write_str(s)
+     } else {
          write(self.buf, f)
+     }
  }
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 20, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 21, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Mar 2, 2024

I've been thinking about this issue while working to shrink a wasm module that necessarily has a lot of string formatting both for human-readable error messages and panic messages. In this context, performance is not very important, but shrinking code size and the static data structures would be very welcome (I think code size is a bit more important here than static data because the former has to be validated and often JIT-compiled to instantiate the module).

The "bytecode" ideas presented in earlier comments are very interesting from this angle, as they go to great lengths to reduce the size of the data structures while keeping set-up code as small as anyone could ask for. However, I'm worried about pushing too much complexity in the code that has to interpret fmt::Arguments. Let's say I have 100 or 200 distinct "formatting templates" in my program: shrinking the static data for most of them by a byte or two each isn't a good deal it if the "bytecode interpreter" grows by hundreds of bytes to support the smarter encoding. In fact, it might contain a bunch of code to handle "opcodes" that don't occur anywhere in my program's use of string formatting.

With respect to performance, I'm wary of the potential hidden (because it's difficult to pin down in typical benchmarks) cost of having too many distinct encodings that could plausibly be encountered somewhat regularly. This is distinct from just minimizing branches (e.g., Stream-VByte as opposed to other varint schemes), which is not always feasible. Consider modern, fast byte-aligned LZ compression like LZ4: decoding necessarily spends a lot of time on variable-size data copies, but can still run at multiple GB/s because data-dependent branches are minimized and the remaining ones carry as much weight as possible. For example, if you eat a mispredict because a literal or match length exceeded some threshold and falls off the common code path, that's usually a good thing because you get to spend more time in a single large memcpy, which is faster than having many short matches and copies.

I think there's an interesting corner of the design space that can address both of these points: minimizing the number of different opcodes and encoding formats, just leaning heavily on the fact that most integers that occur in all the static data structures are almost always very small. So, something closer to the very first design @dead-claudia described than every other variant described since then. Without wanting to commit to a very concrete proposal, here's a sketch for a variant that prioritizes the size of a canonical "interpreter" loop while still packing the entire static data into a single, rather compact blob and probably having a shot at running decently fast:

                                                                                         ┌────────────────┐    
┌───────────────────────────┐                                                            │                │    
│Runtime args array         │                                              ┌───────────► │                │    
│1 pointer per argument     │                                              │             │   (code)       │    
└───────────────────────────┘                                              │             │                │    
                 ▲                                                         │             │                │    
                 │                                                         │             └────────────────┘    
                 │  refers to                                              │                                   
                 └──────────────────────┐                                  │points to                          
                                        │    "print arg" opcode:           │                                   
                                ┌───────┴────────┬────────────────┬────────┴─────────────────────────────┐     
                                │  argument idx  │ options loc.   │    fmt fn pointer (4 or 8B), e.g.    │     
                                │    LEB128      │   LEB128       │    <i32 as LowerHex>::fmt            │     
                                └────────────────┴────────────┬───┴──────────────────────────────────────┘     
                                                              │                                                
                                                              │ byte offset from end of string data
                                                              └────────────────────────────┐                   
                                                                                           ▼                   
┌──────────────┬────────────┬─────────────────────┬──────────────┬─────────────────────────┬──────────┬───────┐
│Bytecode len  │Strings len │ Alternating opcodes:│String pieces:│ options[0]:             │options[1]│       │
│ 1-4B LEB128  │1-8B LEB128 │ (1) print str       │"foobarquux"  │ 1B align, prec/width tag│          │ ...   │
│              │            │ (2) print arg       │      ▲---    │ 1B flags                │          │       │
│              │            │ (3) print str  etc. │      │       │ other data: LEB128      │          │       │
└──────────────┴────────────┴─────────────────────┴──────┼───────┴─────────────────────────┴──────────┴───────┘
                                                      pos│ + len                                               
                              "print str" opcode:        │                                                     
                             ┌────────────────────┐      │                                                     
                             │ length (may be 0)  │      │                                                     
                             │ 1-8B LEB128        ├──────┘                                                     
                             └────────────────────┘                                                            
                              print len bytes from current pos                                                 
                              update curr. pos for next print                                                 

Some notes:

  • I'm assuming the "put function pointers into the static data" aspect will be solved somehow so so that the runtime part can be just two pointers (one to the start of the static data, the first byte of "bytecode len"). This is important for many reasons, but it's orthogonal to what the rest of the static data looks like.
  • Almost all integer quantities are LEB128-encoded, but most of them will virtually always fit into a single byte, so they can be both small and reasonably fast to decode. There's many alternatives to LEB128, some may be better, but it should be something simple and byte-aligned. Ideally, most or all values would use the same format because that's simpler to understand, implement, and optimize (for code size and/or performance).
  • I've taken a page from LZ4's book with the strict alternation of string pieces and placeholders, which mean there will sometimes be empty strings before/between placeholders, though each of them occupies only a single byte. There's also several ways to get away with just two to four distinct opcodes (without increasing the size of a short string or typical placeholder) if one would rather avoid representing empty strings entirely.
  • All string pieces are concatenated into a single buffer with their lengths baked into the bytecode. Nevertheless, fmt::Arguments::as_str can be implemented with relative ease. Basically, it needs to decode string length and bytecode length and then check that the bytecode array is just a single LEB128-encoded integer (meaning it's only a single "print str" opcode). There's probably a variation of data layout and/or varint encoding that makes this easier and cheaper than what I sketched here.
  • I went for just storing each options field completely in some reasonably compact and variable-length form, as opposed to a bunch of bytecode instructions that manipulate individual fields. The latter is not a bad approach by any means, but if every plausible set of options can be encoded in 5 bytes or so, that's also quite competitive on size. Precision and width need an enum tag to choose between three options, which can be bit-packed together with the alignment flag, and their payloads (as well as the fill character) can usually be LEB128-encoded into a single byte each. I've not formed an option about what, exactly, is the best and easiest-to-decode representation for options, but I hope this will do as a straw proposal.
  • Note that multiple placeholders can refer to the same options struct, which avoids the steep size increase from a single placeholder among many needing non-default options. As sketched here, every format string would store a copy of the default options, but there's no obstacle to repurposing e.g. an index of 0 to mean "default options" without explicitly storing it.

Edit: I no longer remember why I decided to store the string pieces out-of-line, after the "bytecode". Intermingling the two would remove the need for the separate "string len" field and simplify fmt::Arguments::as_str. The best I can come up with now is that keeping the varints entirely separate from the strings would enable varint schemes that can decode multiple integers at once more efficiently (e.g., Stream-VByte), but I find it difficult to imagine that this would make a difference in the formatting context.

@dead-claudia
Copy link

dead-claudia commented Mar 2, 2024

@hanna-kruppe To be clear, I only did a small subset that, in my experience, almost always ends up used anyways. Also, there's less cardinality than you'd think at first glance.

  • For integers, there's very little cardinality here and what little there is varies very little between types:
    • Load: load as per usual, store in union of u128/i128 or usize/isize
    • Print: do it all in a compact base 16 display loop with a few parameters:
      • Possible sign/pad
      • Divisor (either 2, 8, 10, or 16)
      • UpperHex boolean to add 32 to the ASCII character (can be done by bit merging/masking)
    • For multi-register values (say, u128), you can reuse that logic by chunking it:
      • Divide/modulo by, say, 1 billion (for 32-bit usize) for each iteration. Display the modulus and let the next iteration use the dividend.
      • After printing an iteration with a zero divisor, return. This also implicitly handles zero.
  • For floats, it's a bit more complicated. Their printing function is non-trivial, but even C's printf supports them, so you're carrying that baggage very often anyways.
  • For strings and chars, I intentionally limited it only to Display and Debug. The first is just f.pad(s) even today, and the second's also fairly simple. Characters can be turned into strings via encode_utf8 and reuse the debug path (but with single quotes).
  • For booleans, Debug and Display are equivalent and you're just doing f.write_str(if b { "true" } else { "false" }) for both. Honestly, this could (and should) just be dropped in favor of desugaring to (static) strings.

And in each of those, you're saving:

  • Vtable pointer from stack + immediate for each use
  • Generated implementation functions for built-in types

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Mar 2, 2024

I appreciate that you've put a lot of thought into all of the designs you've described, and I don't have trouble believing that they can be implemented compactly with enough heroics. However, nobody needs to step up to actually perform such heroics if the project chooses an approach that can be implemented straightforwardly without any "cliffs" w.r.t. binary size or runtime performance. This is a bit of a worse-is-better sentiment, but here's some very tangible downsides of bundling the formatting code for certain types into a bytecode interpreter:

  1. Floating point formatting (if it aims to be correct and efficient) is pretty heavyweight at the scale we're talking about. Many Rust programs sensitive to code size avoid it and also link very little (if any) C code. Plus, even if there's already one float formatting implementation linked in via C's printf or via ryu, that doesn't justify adding another one.
  2. While not as drastic as float formatting, there are many cases (possibly even my aforementioned fmt-heavy wasm module) where things like char -> UTF-8 encoding and the UTF-8-aware string escaping invoked by <str as Debug>::fmt can be avoided today. That's a good thing because those things do take a fair bit of space, at least they way they're implemented in std today.
  3. Which brings me to integer formatting: sure, you can cram it into much less space than the standard library's current implementations take. But those implementations are that way for a reason. I wouldn't want to stake the success (w.r.t. binary size) of a fmt::Arguments rewrite on also being able to replace the existing integer formatting code with something tuned for code size above everything else.

And that's not even touching on the many different "instructions" and how to decode + dispatch them. The straightforward implementations that come to mind when reading through them are either not very fast or not very compact -- even taking into account that most of the opcode space is occupied by ranges of related things that can just do something generic based on the lower N bits of the "opcode". I'm sure there's a lot of clever tricks possible beyond what I can imagine right now, but are all of them compatible with each other and how do they balance the sort-of-competing concerns of binary size and runtime performance? With so many moving pieces, it's difficult (for me, at least) to get a feeling for these things without putting in the hard work of actually writing a complete implementation or two.

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 5, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Mar 6, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang/rust#99012
* Another related (original?) issues #10761
* Previous similar attempt to fix it by by `@Kobzol` #100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang/rust#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` #100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 20, 2024
Rewrite and refactor format_args!() builtin macro.

This is a near complete rewrite of `compiler/rustc_builtin_macros/src/format.rs`.

This gets rid of the massive unmaintanable [`Context` struct](https://github.com/rust-lang/rust/blob/76531befc4b0352247ada67bd225e8cf71ee5686/compiler/rustc_builtin_macros/src/format.rs#L176-L263), and splits the macro expansion into three parts:

1. First, `parse_args` will parse the `(literal, arg, arg, name=arg, name=arg)` syntax, but doesn't parse the template (the literal) itself.
2. Second, `make_format_args` will parse the template, the format options, resolve argument references, produce diagnostics, and turn the whole thing into a `FormatArgs` structure.
3. Finally, `expand_parsed_format_args` will turn that `FormatArgs` structure into the expression that the macro expands to.

In other words, the `format_args` builtin macro used to be a hard-to-maintain 'single pass compiler', which I've split into a three phase compiler with a parser/tokenizer (step 1), semantic analysis (step 2), and backend (step 3). (It's compilers all the way down. ^^)

This can serve as a great starting point for rust-lang/rust#99012, which will only need to change the implementation of 3, while leaving step 1 and 2 unchanged.

It also makes rust-lang/compiler-team#541 easier, which could then upgrade the new `FormatArgs` struct to an `ast` node and remove step 3, moving that step to later in the compilation process.

It also fixes a few diagnostics bugs.

This also [significantly reduces](https://gist.github.com/m-ou-se/b67b2d54172c4837a5ab1b26fa3e5284) the amount of generated code for cases with arguments in non-default order without formatting options, like `"{1} {0}"` or `"{a} {}"`, etc.
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 20, 2024
More core::fmt::rt cleanup.

- Removes the `V1` suffix from the `Argument` and `Flag` types.

- Moves more of the format_args lang items into the `core::fmt::rt` module. (The only remaining lang item in `core::fmt` is `Arguments` itself, which is a public type.)

Part of rust-lang/rust#99012

Follow-up to rust-lang/rust#110616
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Rewrite and refactor format_args!() builtin macro.

This is a near complete rewrite of `compiler/rustc_builtin_macros/src/format.rs`.

This gets rid of the massive unmaintanable [`Context` struct](https://github.com/rust-lang/rust/blob/76531befc4b0352247ada67bd225e8cf71ee5686/compiler/rustc_builtin_macros/src/format.rs#L176-L263), and splits the macro expansion into three parts:

1. First, `parse_args` will parse the `(literal, arg, arg, name=arg, name=arg)` syntax, but doesn't parse the template (the literal) itself.
2. Second, `make_format_args` will parse the template, the format options, resolve argument references, produce diagnostics, and turn the whole thing into a `FormatArgs` structure.
3. Finally, `expand_parsed_format_args` will turn that `FormatArgs` structure into the expression that the macro expands to.

In other words, the `format_args` builtin macro used to be a hard-to-maintain 'single pass compiler', which I've split into a three phase compiler with a parser/tokenizer (step 1), semantic analysis (step 2), and backend (step 3). (It's compilers all the way down. ^^)

This can serve as a great starting point for rust-lang/rust#99012, which will only need to change the implementation of 3, while leaving step 1 and 2 unchanged.

It also makes rust-lang/compiler-team#541 easier, which could then upgrade the new `FormatArgs` struct to an `ast` node and remove step 3, moving that step to later in the compilation process.

It also fixes a few diagnostics bugs.

This also [significantly reduces](https://gist.github.com/m-ou-se/b67b2d54172c4837a5ab1b26fa3e5284) the amount of generated code for cases with arguments in non-default order without formatting options, like `"{1} {0}"` or `"{a} {}"`, etc.
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
More core::fmt::rt cleanup.

- Removes the `V1` suffix from the `Argument` and `Flag` types.

- Moves more of the format_args lang items into the `core::fmt::rt` module. (The only remaining lang item in `core::fmt` is `Arguments` itself, which is a public type.)

Part of rust-lang/rust#99012

Follow-up to rust-lang/rust#110616
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
perf: improve write_fmt to handle simple strings

In case format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay` original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text")  ->  f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang/rust#99012
* Another related (original?) issues rust-lang#10761
* Previous similar attempt to fix it by by `@Kobzol` #100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 13, 2024
…<try>

Evaluate `std::fmt::Arguments::new_const()` during Compile Time

Fixes rust-lang#128709

This PR aims to optimize calls to string formating macros without any arguments by evaluating `std::fmt::Arguments::new_const()` in a const context.

Currently,
`println!("hola")` compiles to `std::io::_print(std::fmt::Arguments::new_const(&["hola\n"]))`.

With this PR,
`println!("hola")` compiles to `std::io::_print(const { std::fmt::Arguments::new_const(&["hola\n"]) })`.

This is accomplished in two steps:

1.  Const stabilize `std::fmt::Arguments::new_const()`.
2.  Wrap calls to `std::fmt::Arguments::new_const()` in an inline const block when lowering the AST to HIR.

This reduces the generated code to a `memcpy` instead of multiple `getelementptr` and `store` instructions even with `-C no-prepopulate-passes -C opt-level=0`. Godbolt for code comparison: https://rust.godbolt.org/z/P7Px7de6c

This is a safe and sound transformation because `std::fmt::Arguments::new_const()` is a trivial constructor function taking a slice containing a `'static` string literal as input.

CC rust-lang#99012
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 14, 2024
…<try>

Evaluate `std::fmt::Arguments::new_const()` during Compile Time

Fixes rust-lang#128709

This PR aims to optimize calls to string formating macros without any arguments by evaluating `std::fmt::Arguments::new_const()` in a const context.

Currently,
`println!("hola")` compiles to `std::io::_print(std::fmt::Arguments::new_const(&["hola\n"]))`.

With this PR,
`println!("hola")` compiles to `std::io::_print(const { std::fmt::Arguments::new_const(&["hola\n"]) })`.

This is accomplished in two steps:

1.  Const stabilize `std::fmt::Arguments::new_const()`.
2.  Wrap calls to `std::fmt::Arguments::new_const()` in an inline const block when lowering the AST to HIR.

This reduces the generated code to a `memcpy` instead of multiple `getelementptr` and `store` instructions even with `-C no-prepopulate-passes -C opt-level=0`. Godbolt for code comparison: https://rust.godbolt.org/z/P7Px7de6c

This is a safe and sound transformation because `std::fmt::Arguments::new_const()` is a trivial constructor function taking a slice containing a `'static` string literal as input.

CC rust-lang#99012
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 28, 2024
…<try>

Evaluate `std::fmt::Arguments::new_const()` during Compile Time

Fixes rust-lang#128709

This PR aims to optimize calls to string formating macros without any arguments by evaluating `std::fmt::Arguments::new_const()` in a const context.

Currently,
`println!("hola")` compiles to `std::io::_print(std::fmt::Arguments::new_const(&["hola\n"]))`.

With this PR,
`println!("hola")` compiles to `std::io::_print(const { std::fmt::Arguments::new_const(&["hola\n"]) })`.

This is accomplished by wrapping calls to `std::fmt::Arguments::new_const()` in an inline const block when lowering the AST to HIR.

This reduces the generated code to a `memcpy` instead of multiple `getelementptr` and `store` instructions even with `-C no-prepopulate-passes -C opt-level=0`. Godbolt for code comparison: https://rust.godbolt.org/z/P7Px7de6c

This is a safe and sound transformation because `std::fmt::Arguments::new_const()` is a trivial constructor function taking a slice containing a `'static` string literal as input.

CC rust-lang#99012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-fmt Area: `std::fmt` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests