System.Text.Formatting
System.Text.Formatting APIs are similar to the existing StringBuilder and TextWriter APIs: they format values into text streams and build complex strings. Unlike those APIs, they are optimized for creating text for the Web. They format with minimal GC heap allocations (as little as 1/6 of the allocations in some scenarios) and can format directly to UTF8 streams. This can yield significant performance wins for software that does a lot of text manipulation.
var formatter = new StringFormatter();
formatter.Append(100); // Int32.ToString() is not called here, or ever
string text = formatter.ToString();
Stream stream = new MemoryStream(256);
var writer = new StreamFormatter(stream, FormattingData.InvariantUtf8);
writer.Append(100); // this writes UTF8 to the stream without creating UTF16 first
// Baseline: StringBuilder
int numbersToWrite = 100000;
var sb = new StringBuilder(numbersToWrite);
for (int i = 0; i < numbersToWrite; i++) {
    sb.Append(i % 10);
}
var text = sb.ToString();
// Same loop with StringFormatter
int numbersToWrite = 100000;
var sb = new StringFormatter(numbersToWrite);
for (int i = 0; i < numbersToWrite; i++) {
    sb.Append(i % 10);
}
var text = sb.ToString();
// Same loop with StreamFormatter, writing UTF8 directly to a stream
int numbersToWrite = 100000;
Stream stream = new MemoryStream(numbersToWrite); // this does the 100KB allocation
var sb = new StreamFormatter(stream, FormattingData.InvariantUtf8);
for (int i = 0; i < numbersToWrite; i++) {
    sb.Append(i % 10);
}
In current .NET formatting, StringBuilder (and TextWriter) would call value.ToString() on the argument passed to Append, which allocates a string. The characters of this newly allocated string would then be copied to the internal buffer of the StringBuilder. In the case of StringFormatter, the value is formatted directly into StringFormatter's buffer using a method similar to the following:
public interface IBufferFormattable {
    /// <summary>
    /// This interface should be implemented by types that want to support allocation-free formatting.
    /// </summary>
    /// <param name="buffer">The buffer to format the value into.</param>
    /// <param name="format">A pre-parsed representation of the formatting string.</param>
    /// <param name="formattingData">Provides bytes representing digits and symbols.</param>
    /// <param name="written">Returns the number of bytes that were written to the buffer.</param>
    /// <returns>False if the buffer was too small, otherwise true.</returns>
    bool TryFormat(Span<byte> buffer, Format.Parsed format, FormattingData formattingData, out int written);
}
StringFormatter would then call such methods (for all formattable types) as follows:
public void Append<T>(T value, Format.Parsed format) where T : IBufferFormattable
{
    int bytesWritten;
    // Retry with a larger buffer until the value fits in the free space.
    while (!value.TryFormat(this.FreeBuffer, format, this.FormattingData, out bytesWritten)) {
        this.ResizeBuffer();
        bytesWritten = 0;
    }
    this.CommitBytes(bytesWritten);
}
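The try-format-then-resize loop above can be tried out without this prototype. The following is a minimal sketch, not the library's implementation: it assumes a recent .NET runtime and uses the shipped System.Buffers.Text.Utf8Formatter.TryFormat in place of IBufferFormattable.TryFormat. The Utf8BufferSketch class and its members are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

// Hypothetical stand-in for the formatter described above; NOT the library's StringFormatter.
sealed class Utf8BufferSketch
{
    private byte[] _buffer;
    private int _count; // number of committed bytes

    public Utf8BufferSketch(int capacity = 16)
    {
        _buffer = new byte[Math.Max(capacity, 1)];
    }

    public void Append(int value)
    {
        int written;
        // Format straight into the free tail of the buffer as UTF8.
        // No intermediate string is created; if the value does not fit,
        // double the buffer and try again.
        while (!Utf8Formatter.TryFormat(value, _buffer.AsSpan(_count), out written))
        {
            Array.Resize(ref _buffer, _buffer.Length * 2);
        }
        _count += written; // commit the bytes that were just written
    }

    // Decodes the committed UTF8 bytes; this is the only place a string is allocated.
    public override string ToString() => Encoding.UTF8.GetString(_buffer, 0, _count);
}

static class Program
{
    static void Main()
    {
        // Deliberately tiny initial capacity so the resize path is exercised.
        var sketch = new Utf8BufferSketch(capacity: 2);
        sketch.Append(100);
        sketch.Append(-42);
        Console.WriteLine(sketch.ToString()); // prints "100-42"
    }
}
```

Doubling the buffer on failure is one possible growth policy chosen for this sketch; the source does not say how ResizeBuffer grows the buffer.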
Today's web traffic is largely text. Moreover, such text payloads are more often than not UTF8. Web server applications parse and format a lot of such text, and this library is an experiment in how we can make such text operations faster and cheaper. See https://github.com/dotnet/corefxlab/blob/master/src/System.Text.Formatting/tests/Non-AllocatingJson.cs for a glimpse of scenarios we think this library could make more efficient.
This library is a very early prototype. It is not complete, and it has bugs and performance problems that need to be fixed. Please don't try to use it in real-world software.