Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
If you're the 3%, this is an educational reference on .NET performance tricks I've gathered. By reading this I assume you understand that architecture, patterns, data flow, general performance decisions, hardware, etc. have a far larger impact on your codebase than optimising for potentially single-digit milliseconds, nanoseconds, or bytes of memory with niche and potentially dangerous strategies. For instance, you'll get far more out of caching database values than you'll ever get out of skipping local variable initialization for an object, by several orders of magnitude. I also expect that you already understand best practices, e.g. properly dealing with IDisposable objects.
You'll need to consider:
- Will I be prematurely optimising?
- Do I understand the risks involved?
- Is the time spent doing this worth it?
I've rated each optimisation here in terms of difficulty and you may gauge the difficulties differently. Remember, we're looking at the last 3%, so while some are easy to implement, they may only be measurably effective on an extremely hot path.
Difficulty | Reasoning |
---|---|
🟢 Easy | These are either well known, or simple to drop into existing code with only some knowledge. |
🟡 Intermediate | Mostly accessible. May require a bit more coding or understanding of specific caveats of use. Own research is advised. |
🔴 Advanced | Implementation and caveat understanding is critical to prevent app crashes or accidentally reducing performance. |
.NET is always evolving. Some of these may become invalid in the future because of breaking changes, or redundant because they've been baked into the framework under the hood.
The optimizations below can range from efficient object overloads to language features. They'll include relevant documentation and attribution to where I originally found them.
And some of these are "cheap tricks" and things you probably shouldn't implement in production, but are fun to include for the sake of learning.
In the shortest way possible:
- Write unit tests so you have a baseline for your functionality
- Use something like BenchmarkDotNet or the Performance Profiler in Visual Studio to get a before performance baseline
- Write your change
- Run tests
- Benchmark to see if it did improve what you expected
If you're looking for tools to help, I have: Optimization-Tools-And-Notes
Resource | Description |
---|---|
Pro .NET Memory Management | A heavy and incredibly informative read on everything memory and memory-performance in .NET. A lot of knowledge below originally was seeded by this book and should be treated as an implicit reference. |
PerformanceTricksAzureSDK | Tricks written by Daniel Marbach |
.NET Memory Performance Analysis | A thorough document on analyzing .NET performance by Maoni Stephens |
Classes vs. Structs. How not to teach about performance! | A critique of bad benchmarking by Sergiy Teplyakov |
Diagrams of .NET internals | In depth diagrams from the Dotnetos courses |
Turbocharged: Writing High-Performance C# and .NET Code | A great video describing multiple ways to improve performance by Steve Gordon |
Performance tricks I learned from contributing to open source .NET packages | A great video describing multiple ways to squeeze out performance by Daniel Marbach |
We always want our code to "run faster". But rarely do we ask – what is it running from?
.NET Performance Tips - Boxing and Unboxing
Boxing and Unboxing (C# Programming Guide)
Boxing is converting an instance of a value type to an instance of a reference type, and unboxing is the reverse. In most cases (but not all), we'll be referring to putting a value type on the heap.
This slows down performance because the runtime needs to do a two-step operation:
- allocate the new boxed object on the heap, which now has an object header and method table reference
- copy the value into that heap object
The cost is twofold, as it also eventually increases garbage collection pressure.
We can recognise this as the emitted box/unbox operations in the IL code, or by using tools such as the Clr Heap Allocation Analyzer.
No example
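As a rough illustration of the allocation cost described above, the following sketch compares storing integers in a non-generic ArrayList (each value is boxed) with a generic List<int> (no boxing). MeasureAllocatedBytes is a hypothetical helper written for this sketch, and it assumes a runtime where GC.GetAllocatedBytesForCurrentThread() is available (.NET Core 3.0 or later). The exact byte counts will vary by runtime and platform.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: reports how many bytes the current thread allocated while running the action.
static long MeasureAllocatedBytes(Action action)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

const int count = 100_000;

long boxedBytes = MeasureAllocatedBytes(() =>
{
    var list = new ArrayList(count);
    for (int i = 0; i < count; i++)
    {
        list.Add(i); // int -> object: every value is boxed into its own heap object
    }
});

long genericBytes = MeasureAllocatedBytes(() =>
{
    var list = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        list.Add(i); // stored inline as an int: no boxing
    }
});

Console.WriteLine($"ArrayList: {boxedBytes} bytes, List<int>: {genericBytes} bytes");
```

For real measurements, a BenchmarkDotNet run with its memory diagnoser is a better fit than a one-off measurement like this.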
Nullable value types wrap the value in a Nullable<T> struct, meaning there is extra wrapping, unwrapping, and null checking work happening on every operation.
// slower with nullable value types
int? Slow(int? x, int? y) {
int? sum = 0;
for (int i = 0; i < 1000; i++){
sum += x * y;
}
return sum;
}
// faster with non-nullable value types
int Fast(int x, int y) {
int sum = 0;
for (int i = 0; i < 1000; i++){
sum += x * y;
}
return sum;
}
Twitter post via Daniel Lawson
String.Compare() is a memory-efficient way to compare strings, in contrast to comparing with stringA.ToLower() == stringB.ToLower(), which, because strings are immutable, has to create a new string for each .ToLower() call.
var result = String.Compare("StringA", "StringB", StringComparison.OrdinalIgnoreCase);
Note: It's recommended to use the overload of Compare(String, String, StringComparison) rather than Compare(String, String) as per the documentation:
When comparing strings, you should call the Compare(String, String, StringComparison) method, which requires that you explicitly specify the type of string comparison that the method uses. For more information, see Best Practices for Using Strings.
.NET Performance Tips - Strings
Strings are immutable. When you concatenate many strings together, each concatenation allocates a new string and copies the existing characters into it, which takes a relatively long time.
string[] words = []; // imagine a large array of strings
// inefficient
var slowWords = "";
foreach(var word in words)
{
slowWords += word;
}
Console.WriteLine(slowWords);
// more efficient
var stringBuilder = new StringBuilder();
foreach(var word in words)
{
stringBuilder.Append(word); // Append, not AppendLine, so the output matches the += version above
}
Console.WriteLine(stringBuilder.ToString());
If you know the length of your final string, perhaps because you're concatenating various strings into one, you can use string.Create() to speed up the final string creation with minimal allocations.
It looks complex, but this method allows us to manipulate the string as if it were mutable inside the lambda via a Span<T> and guarantees us an immutable string afterwards.
Example via Stephen Toub
string s = string.Create(34, Guid.NewGuid(), (span, guid) =>
{
"ID".AsSpan().CopyTo(span);
guid.TryFormat(span.Slice(2), out _, "N");
});
Console.WriteLine(s); // ID3d1822eb1310418cbadeafbe3e7b7b9f
There is a long-standing open dotnet GitHub issue where reading larger returned query data takes a lot longer with async compared to sync.
No example
Performance Improvements in .NET 8 by Stephen Toub
HttpClient benchmarking via Niko Uusitalo
If we work with streams out of HttpClient we can eliminate a lot of memory copies.
The following example assumes the response is a JSON string:
public async Task GetStreamAsync()
{
using var stream = await _httpClient.GetStreamAsync(_url);
var data = await JsonSerializer.DeserializeAsync<List<WeatherForecast>>(stream);
}
In fact, this is such a common operation that we have a stream convenience overload for JSON:
public async Task GetFromJsonAsync()
{
var data = await _httpClient.GetFromJsonAsync<List<WeatherForecast>>(_url);
}
These two examples above eliminate copying the response from the stream to a JSON string then the JSON string into our objects.
There is also a CompileAsyncQuery available.
// EF Core compiled query
private static readonly Func<MyDbContext, IEnumerable<User>> _getUsersQuery =
EF.CompileQuery((MyDbContext context) => context.Users.ToList());
public IEnumerable<User> GetUsers()
{
return _getUsersQuery(_context);
}
In distributed systems especially, networking issues become business as usual, not exceptional circumstances. As an example, instead of HttpResponseMessage.EnsureSuccessStatusCode() you may want to consider HttpResponseMessage.IsSuccessStatusCode and then deal with it accordingly, sans exception.
No example
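A minimal sketch of the idea above: checking IsSuccessStatusCode and treating a failed response as ordinary control flow rather than throwing. TryDescribeFailure is a hypothetical helper, and the responses are constructed locally so that no network call is needed; in real code they would come from an HttpClient call.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper: reports a failure through its return value instead of an exception.
static bool TryDescribeFailure(HttpResponseMessage response, out string error)
{
    if (response.IsSuccessStatusCode)
    {
        error = string.Empty;
        return false;
    }

    // An expected failure is handled as a normal branch: no exception is created or unwound.
    error = $"Request failed with status {(int)response.StatusCode}";
    return true;
}

// Responses built by hand purely to keep the sketch self-contained.
using var ok = new HttpResponseMessage(HttpStatusCode.OK);
using var notFound = new HttpResponseMessage(HttpStatusCode.NotFound);

Console.WriteLine(TryDescribeFailure(ok, out string okError) ? okError : "Success");
Console.WriteLine(TryDescribeFailure(notFound, out string notFoundError) ? notFoundError : "Success");
```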
Asynchronous programming allows us to continue with other tasks while waiting for I/O bound calls or expensive computations. In .NET we use the async and await keywords. There's too much to cover for this documentation, but once you get used to it, it's easy enough to use daily.
The user needs to ensure they put the await keyword in for an async method call.
public async Task<string> GetFileContentsAsync(string path)
{
return await File.ReadAllTextAsync(path);
}
Official DevBlog by Stephen Toub
Official Reference for ManualResetValueTaskSourceCore
Tasks are very flexible to suit the varied needs asked of them. However, a lot of use cases of Task are simple single awaits, and in high-throughput code the allocations from Task can become a performance hit. To solve these problems we have ValueTask, which allocates less than Task. However, there are caveats that come with being less flexible than Task, such as that you shouldn't await a ValueTask more than once.
var result = await GetValue();
public async ValueTask<int> GetValue()
{
// implementation omitted
return value;
}
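To make the allocation benefit more concrete, here is a small sketch of the case where ValueTask tends to pay off: a method that often completes synchronously (a cache hit) and only sometimes needs real asynchronous work. GetLengthAsync, LoadAsync and the dictionary cache are hypothetical names for illustration, and Task.Delay stands in for real I/O.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<string, int>();

// Hypothetical lookup: synchronous on a cache hit, asynchronous on a miss.
ValueTask<int> GetLengthAsync(string key)
{
    if (cache.TryGetValue(key, out int cached))
    {
        // Synchronous completion: the result is wrapped in the struct, no Task object is allocated.
        return new ValueTask<int>(cached);
    }

    // Asynchronous path: falls back to a regular Task.
    return new ValueTask<int>(LoadAsync(key));
}

async Task<int> LoadAsync(string key)
{
    await Task.Delay(10); // stand-in for real I/O
    cache[key] = key.Length;
    return key.Length;
}

int first = await GetLengthAsync("hello");          // miss: goes through LoadAsync
ValueTask<int> second = GetLengthAsync("hello");    // hit: already completed
bool completedSynchronously = second.IsCompletedSuccessfully;
int secondResult = await second;                     // awaited exactly once

Console.WriteLine($"{first}, {secondResult}, completed synchronously: {completedSynchronously}");
```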
Class vs Struct in C#: Making Informed Choices via NDepend blog
Value vs Reference type allocation locations via Konrad Kokosa
There are times when structs may help us with performance compared to a class. Structs are lighter than classes and often end up on the stack meaning no allocations for the GC to deal with. It's better to use them in place of simple objects.
However, there are a few caveats here including the size and complexity of the struct, nullability, and no inheritance.
var coordinates = new Coords(0, 0);
public struct Coords
{
public Coords(double x, double y)
{
X = x;
Y = y;
}
public double X { get; }
public double Y { get; }
public override string ToString() => $"({X}, {Y})";
}
Parallel.ForEach() works like executing a foreach loop in parallel. Best used if you have lots of independent CPU bound work.
Be wary of threading and concurrency issues.
There is also Parallel.ForEachAsync().
List<Image> images = GetImages();
Parallel.ForEach(images, (image) => {
var resizedImage = ResizeImage(image);
SaveImage(resizedImage);
});
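For the asynchronous variant mentioned above, a minimal sketch of Parallel.ForEachAsync() (available from .NET 6) might look like the following. The ids array and the Task.Delay call are placeholders for real work, and the degree of parallelism is an arbitrary choice for the example. Shared state is updated with Interlocked to avoid the concurrency issues warned about above.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int[] ids = Enumerable.Range(1, 20).ToArray();
int total = 0;

// Cap the concurrency so a large input cannot flood a downstream resource.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

await Parallel.ForEachAsync(ids, options, async (id, cancellationToken) =>
{
    await Task.Delay(10, cancellationToken); // stand-in for an async I/O call
    Interlocked.Add(ref total, id);          // shared state must be updated atomically
});

Console.WriteLine(total); // 210, the sum of 1..20
```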
ReadOnlySpan<T> Official Reference
Memory<T> and Span<T> usage guidelines
A Complete .NET Developer's Guide to Span with Stephen Toub
Turbocharged: Writing High-Performance C# and .NET Code - Steve Gordon
All three of these "represent an arbitrary contiguous region of memory". Another way it's commonly explained is these can be thought of as a window into the memory.
For our short purposes here, the performance gains are via low level reading and manipulation of data without copying or allocating. We use them with arrays, strings, and more niche data situations.
There are caveats, such as Span types only being allowed to live on the stack, meaning they can't be used inside async calls (which is what Memory<T> allows for).
int[] items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
// Technically we can use the slicing overload of AsSpan() as well
var span = items.AsSpan();
var slice = span.Slice(1, 5); // 1, 2, 3, 4, 5
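To illustrate the async caveat, here is a small sketch where a ReadOnlyMemory<T> is passed into an async method and only turned into a span once we are back in synchronous code. SumAfterDelayAsync and Sum are hypothetical helpers, and Task.Delay stands in for real asynchronous work.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: Memory<T>/ReadOnlyMemory<T> can be a parameter of an async method; a span cannot.
static async Task<int> SumAfterDelayAsync(ReadOnlyMemory<int> data)
{
    await Task.Delay(10);   // stand-in for real asynchronous work
    return Sum(data.Span);  // take the span only when handing off to synchronous code
}

// Synchronous code can work on the span directly.
static int Sum(ReadOnlySpan<int> values)
{
    int sum = 0;
    foreach (int value in values)
    {
        sum += value;
    }
    return sum;
}

int[] items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

// AsMemory(1, 5) is a window over 1, 2, 3, 4, 5 in the original array; nothing is copied.
int result = await SumAfterDelayAsync(items.AsMemory(1, 5));
Console.WriteLine(result); // 15
```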
The weirdest way to loop in C# is also the fastest - Nick Chapsas
Improve C# code performance with Span via NDepend blog
Don't mutate the collection during looping.
For arrays and strings, we can use AsSpan().
int[] items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var span = items.AsSpan();
for (int i = 0; i < span.Length; i++)
{
Console.WriteLine(span[i]);
}
However for lists we need to use CollectionsMarshal.AsSpan()
List<int> items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var span = CollectionsMarshal.AsSpan(items);
for (int i = 0; i < span.Length; i++)
{
Console.WriteLine(span[i]);
}
If a Span<T> isn't viable, you can create your own enumerator with ref and readonly keywords. Information can be found at Unusual optimizations; ref foreach and ref returns by Marc Gravell.
How to use ArrayPool and MemoryPool in C# by Joydip Kanjilal
Allows us to rent an array of type T that has equal to or greater size than what we ask for. This helps us reduce allocations on hot paths as instead of creating and destroying array instances, we reuse them.
Note: you can also use the more flexible MemoryPool, however there are further caveats to that as it allocates to understand the owner of the item - which ArrayPool avoids.
var buffer = ArrayPool<int>.Shared.Rent(5);
for (var i = 0; i < 5; i++)
{
buffer[i] = i * 2;
}
for (var i = 0; i < 5; i++)
{
Console.WriteLine(buffer[i]);
}
ArrayPool<int>.Shared.Return(buffer);
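Because the rented array can be larger than requested, and because it should be returned even if the work throws, one possible shape for pool usage is sketched below. The size and the work done are arbitrary; the point is the slice over the requested length and the try/finally around the rental.

```csharp
using System;
using System.Buffers;

const int needed = 5;
int[] buffer = ArrayPool<int>.Shared.Rent(needed);
int sum = 0;

try
{
    // The rented array may be longer than requested and may contain stale data,
    // so work only on the slice that was asked for.
    Span<int> window = buffer.AsSpan(0, needed);

    for (int i = 0; i < window.Length; i++)
    {
        window[i] = i * 2;
    }

    foreach (int value in window)
    {
        sum += value;
    }
}
finally
{
    // Always hand the array back, even if the work above throws.
    ArrayPool<int>.Shared.Return(buffer);
}

Console.WriteLine($"Requested {needed}, rented {buffer.Length}, sum {sum}");
```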
This can be thought of as configurable/managed string interning with useful methods. Brought to us by the .NET Community Toolkit.
public string GetString(ReadOnlySpan<char> readOnlySpan)
{
return StringPool.Shared.GetOrAdd(readOnlySpan);
}
A general use pool. There are various ways to set it up.
var objectPool = ObjectPool.Create<HeavyObject>();
var pooledObject = objectPool.Get();
// then some work
objectPool.Return(pooledObject);
This is designed to be a near drop-in replacement for MemoryStream objects, with configuration and metrics.
To use an example from the link above:
private static readonly RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();
static void Main(string[] args)
{
var sourceBuffer = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 };
using (var stream = manager.GetStream())
{
stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}
}
Workstation GC Official Article
Runtime configuration options for garbage collection
.NET Memory Performance Analysis section with server GC
Twitter post with graphs from Sergiy Teplyakov
DevBlog post from Sergiy Teplyakov
Blog post about .NET 8 GC feature: DATAS by Maoni Stephens
Having concurrent threads help with garbage collection can minimise the GC pause time in an application. However, there are caveats to do with the number of logical processors and how many applications are running on the machine at once.
No example
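The GC mode itself is normally chosen through runtime configuration rather than code (see the runtime configuration article linked above). As a small sketch, the following only inspects which settings the current process ended up with, which can be handy when checking that a configuration change actually took effect.

```csharp
using System;
using System.Runtime;

// Read-only inspection of the GC configuration the process started with.
bool isServerGc = GCSettings.IsServerGC;
GCLatencyMode latencyMode = GCSettings.LatencyMode;
int logicalProcessors = Environment.ProcessorCount;

Console.WriteLine($"Server GC: {isServerGc}");
// Interactive is the mode with concurrent (background) collection enabled; Batch has it disabled.
Console.WriteLine($"Latency mode: {latencyMode}");
Console.WriteLine($"Logical processors: {logicalProcessors}");
```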
Native AOT Deployment Official Guide
.NET AOT Resources: 2022 Edition (Self plug)
AOT allows us to compile our code natively for the target running environment, removing the JIT compiler. This allows us to have a smaller deployment footprint and a faster startup time.
The caveats here are that reflection (used in a lot of .NET) is extremely limited, we have to compile for each target environment, there is no COM, and more. We can get around some of these limitations with source generators.
No example
High performance byte/char manipulation via David Fowler
Dos and Don'ts of stackalloc via Kevin Jones
stackalloc gives us a block of memory on the stack. We can easily use it with Span<T> and ReadOnlySpan<T> to work as small buffers. High-performance, stack-only work: no heap, no garbage collector.
A caveat here is it must be small to prevent stack overflows. I've seen developers at Microsoft use a max size of 256 bytes.
Example from David Fowler, linked above.
// Choose a small stack threshold to avoid stack overflow
const int stackAllocThreshold = 256;
byte[]? pooled = null;
// Either use stack memory or pooled memory
Span<byte> span = bytesNeeded <= stackAllocThreshold
? stackalloc byte[stackAllocThreshold]
: (pooled = ArrayPool<byte>.Shared.Rent(bytesNeeded));
// Write to the span and process it
var written = Populate(span);
Process(span[..written]);
// Return the pooled memory if we used it
if (pooled is not null)
{
ArrayPool<byte>.Shared.Return(pooled);
}
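The example above relies on names defined elsewhere (bytesNeeded, Populate, Process). As a smaller self-contained sketch of the same idea, the hypothetical FormatId helper below builds a short string in a stack buffer, so the only heap allocation is the final string. The "ID-" prefix and the buffer size of 16 chars are arbitrary choices for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: formats a number through a small stack buffer.
static string FormatId(int value)
{
    // 16 chars = 32 bytes, well under the small thresholds discussed above.
    Span<char> buffer = stackalloc char[16];

    "ID-".AsSpan().CopyTo(buffer);

    // An int needs at most 11 chars ("-2147483648"), so 13 remaining chars are always enough.
    bool formatted = value.TryFormat(buffer.Slice(3), out int written, default, CultureInfo.InvariantCulture);
    if (!formatted)
    {
        throw new InvalidOperationException("Buffer too small");
    }

    // The only heap allocation: the final string copied out of the stack buffer.
    return new string(buffer.Slice(0, 3 + written));
}

Console.WriteLine(FormatId(12345)); // ID-12345
```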
More easily allows us to interoperate with Memory<T>, ReadOnlyMemory<T>, Span<T>, and ReadOnlySpan<T>. A lot of performant Unsafe (class) code is abstracted away from us when using MemoryMarshal.
Few safety checks are granted to us in MemoryMarshal when compared to raw Unsafe calls, so keep in mind that even though we don't see Unsafe, we still carry the same type safety risks.
Note: This advice roughly applies to everything within the System.Runtime.InteropServices namespace, including MemoryMarshal.
The following adapted from Immo Landwerth:
ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(input.String.AsSpan());
return HashData(bytes);
Guid HashData(ReadOnlySpan<byte> bytes)
{
var hashBytes = (Span<byte>)stackalloc byte[20];
var written = SHA1.HashData(bytes, hashBytes);
return new Guid(hashBytes[..16]);
}
CollectionsMarshal gives us handles to the underlying data representations of collections. It makes use of MemoryMarshal and Unsafe under the hood and as such any lack of safety that is found there extends to this, such as type safety risks.
Assuming the developer keeps within safety boundaries, this allows us to make highly performant access to the collection.
The following is an absolutely key caveat:
Items should not be added or removed while the span/ref is in use.
While it means more than this, an easy point to remember is: don't add/remove from the collection while using the returned object.
var items = new List<int> { 1, 2, 3, 4, 5 };
var span = CollectionsMarshal.AsSpan(items);
for (int i = 0; i < span.Length; i++)
{
Console.WriteLine(span[i]);
}
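Beyond read-only looping, a rough sketch of what "handles to the underlying data" allows is shown below: writing through the span changes the list in place, and CollectionsMarshal.GetValueRefOrAddDefault() (available from .NET 6) returns a ref to a dictionary slot so a counter can be updated with a single lookup. The sample data is arbitrary, and the same rule applies: don't add or remove items while the span or ref is in use.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var items = new List<int> { 1, 2, 3, 4, 5 };

// The span aliases the list's backing array, so writes go straight into the list.
Span<int> span = CollectionsMarshal.AsSpan(items);
for (int i = 0; i < span.Length; i++)
{
    span[i] *= 10;
}
Console.WriteLine(string.Join(", ", items)); // 10, 20, 30, 40, 50

// A ref to the dictionary slot: one lookup to add-or-increment instead of two.
var counts = new Dictionary<string, int>();
foreach (string word in new[] { "a", "b", "a" })
{
    ref int count = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, word, out _);
    count++;
}
Console.WriteLine(counts["a"]); // 2
```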
C# 9 - Improving performance using the SkipLocalsInit attribute by Gérald Barré
Feature design for SkipLocalsInit
By default the CLR forces the JIT to set all local variables to their default value, meaning your variable won't be some leftover value from memory. In high performance situations, this may become noticeable and we can skip this initialization as long as we understand the risk being taken on. Also see Unsafe.SkipInit<T>.
[SkipLocalsInit]
byte SkipInitLocals()
{
Span<byte> s = stackalloc byte[10];
return s[0];
}
Unsafe code, pointer types, and function pointers
The unsafe keyword allows us to work with pointers in C# to write unverifiable code. We can allocate memory without safety from the garbage collector, use pointers, and call methods with function pointers.
However, we are on our own. No garbage collection, no security guarantee, no type safety, and other caveats.
Example via documentation:
class UnsafeTest
{
// Unsafe method: takes pointer to int.
unsafe static void SquarePtrParam(int* p)
{
*p *= *p;
}
unsafe static void Main()
{
int i = 5;
// Unsafe method: uses address-of operator (&).
SquarePtrParam(&i);
Console.WriteLine(i);
}
}
Safer than the unsafe keyword, the Unsafe class allows us to do lower level manipulation for performant code by suppressing type safety while still being tracked by the garbage collector. There are caveats, especially around type safety.
var items = new Span<int>([ 0, 1, 2, 3, 4 ]);
ref var spanRef = ref MemoryMarshal.GetReference(items);
var item = Unsafe.Add(ref spanRef, 2);
Console.WriteLine(item); // prints 2
Twitter post via Konrad Kokosa
Low Level Struct Improvements proposal
UnscopedRefAttribute documentation
For more complex data structures, having your deep down property as a ref could improve speed.
public struct D
{
public int Field;
[UnscopedRef]
public ref int ByRefField => ref Field;
}
Struct equality performance in .NET - Gérald Barré
The default value type implementation of GetHashCode() trades speed for a good hash distribution across any given value type - useful for dictionary and hashset types. This happens by reflection, which is incredibly slow when looking through our micro-optimization lens.
An example from the Jon Skeet answer above of overriding:
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ field1.GetHashCode();
hash = (hash * 16777619) ^ field2.GetHashCode();
hash = (hash * 16777619) ^ field3.GetHashCode();
return hash;
}
}
The weirdest way to loop in C# is also the fastest - Nick Chapsas
MemoryMarshal.GetReference Documentation
Using MemoryMarshal, CollectionsMarshal, and Unsafe together, we're able to loop directly on the underlying array inside a List<T> and index quickly to the next element.
Do not add or remove from the collection during looping.
var items = new List<int> { 1, 2, 3, 4, 5 };
var span = CollectionsMarshal.AsSpan(items);
ref var searchSpace = ref MemoryMarshal.GetReference(span);
for (int i = 0; i < span.Length; i++)
{
var item = Unsafe.Add(ref searchSpace, i);
Console.WriteLine(item);
}
I Lied! The Fastest C# Loop Is Even Weirder - Nick Chapsas
With this we've mostly removed the safety .NET provides us in exchange for speed. It is also difficult to understand at a glance without being familiar with MemoryMarshal methods.
Do not add or remove from the collection during looping.
int[] items = [1, 2, 3, 4, 5];
ref var start = ref MemoryMarshal.GetArrayDataReference(items);
ref var end = ref Unsafe.Add(ref start, items.Length);
while (Unsafe.IsAddressLessThan(ref start, ref end))
{
Console.WriteLine(start);
start = ref Unsafe.Add(ref start, 1);
}
How we achieved 5X faster pipeline execution by removing closure allocations by Daniel Marbach
Dissecting the local functions in C# 7 by Sergey Tepliakov
StackOverflow question on closures
Inline delegates and anonymous methods that capture variables from the enclosing scope give us closure allocations. A new object is allocated to hold the captured values for the anonymous method.
We have a few options including, but not limited to:
- If using LINQ, moving that code into a regular loop instead
- Static lambdas (lambdas are anonymous functions) that accept state
  - This helps further if the static lambda is generic, as that removes boxing of the state
Example via the above link by Daniel Marbach:
static void MyFunction<T>(Action<T> action, T state) => action(state);
int myNumber = 42;
MyFunction(static number => Console.WriteLine(number), myNumber);
If you're going to use low level threads, you'll probably be using ThreadPool as it manages the threads for us. As .NET has moved forwards, a lot of raw thread or thread pool usage has been superseded (in my experience) by PLINQ, Parallel foreach calls, Task.Factory.StartNew(), Task.Run() etc., i.e. further abstraction over threads.
The following is via the documentation above:
ThreadPool.QueueUserWorkItem(ThreadProc);
Console.WriteLine("Main thread does some work, then sleeps.");
Thread.Sleep(1000);
Console.WriteLine("Main thread exits.");
static void ThreadProc(Object stateInfo)
{
// No state object was passed to QueueUserWorkItem, so stateInfo is null.
Console.WriteLine("Hello from the thread pool.");
}
Use SIMD-accelerated numeric types documentation
SIMD-accelerated numeric types
Great self-learning post by Alexandre Mutel
Single Instruction, Multiple Data (SIMD) allows us to act on multiple values per iteration rather than just a single value via vectorization. As .NET has moved forward, it's been made easier to take advantage of this feature.
Vectorizing adds complexity to your codebase, but thankfully under the hood, common .NET methods have been written using vectorization and as such we get the benefit for free. E.g. string.IndexOf() for OrdinalIgnoreCase.
The following is a simple example:
// Initialize two arrays for the operation
int[] array1 = [1, 2, 3, 4, 5, 6, 7, 8];
int[] array2 = [8, 7, 6, 5, 4, 3, 2, 1];
int[] result = new int[array1.Length];
// Create vectors from the arrays
var vector1 = new Vector<int>(array1);
var vector2 = new Vector<int>(array2);
// Perform the addition
var resultVector = Vector.Add(vector1, vector2);
// Copy the results back into the result array
resultVector.CopyTo(result);
// Print the results
Console.WriteLine(string.Join(", ", result)); // Outputs 9, 9, 9, 9, 9, 9, 9, 9 when Vector<int>.Count is 8 (e.g. AVX2); narrower vectors fill only the first Count elements
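Since Vector<int>.Count depends on the hardware, a single vector only covers a fixed number of elements. A common pattern, sketched here with a hypothetical AddArrays helper, is to process the input in Vector<int>.Count-sized chunks and finish any remainder with a scalar loop, so the result is the same regardless of vector width.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: adds two equal-length arrays in Vector<int>.Count-sized chunks.
static int[] AddArrays(int[] left, int[] right)
{
    if (left.Length != right.Length)
    {
        throw new ArgumentException("Arrays must have equal length");
    }

    var result = new int[left.Length];
    int width = Vector<int>.Count; // hardware dependent, e.g. 4 or 8
    int i = 0;

    // Vectorized body: processes 'width' elements per iteration.
    for (; i <= left.Length - width; i += width)
    {
        var sum = new Vector<int>(left, i) + new Vector<int>(right, i);
        sum.CopyTo(result, i);
    }

    // Scalar tail: handles whatever did not fill a whole vector.
    for (; i < left.Length; i++)
    {
        result[i] = left[i] + right[i];
    }

    return result;
}

int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
int[] b = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1];

// Every element of the result is 11, whatever the vector width is.
Console.WriteLine(string.Join(", ", AddArrays(a, b)));
```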
Inline array language proposal document
The documentation states:
You likely won't declare your own inline arrays, but you use them transparently when they're exposed as System.Span<T> or System.ReadOnlySpan<T> objects from runtime APIs
Inline arrays are more a runtime feature for the .NET development team at Microsoft, as they allow them to give us features such as Span<T> and interaction with unmanaged types. They're essentially fixed-size stack allocated buffers.
Example from documentation:
[InlineArray(10)]
public struct Buffer
{
private int _element0;
}
var buffer = new Buffer();
for (int i = 0; i < 10; i++)
{
buffer[i] = i;
}
foreach (var i in buffer)
{
Console.WriteLine(i);
}
This attribute prevents the thread transition from cooperative GC mode to preemptive GC mode when applied to a DllImport. It shaves only nanoseconds and has many caveats for usage.
Example via Kevin Gosse:
public int PInvoke_With_SuppressGCTransition()
{
[DllImport("NativeLib.dll")]
[SuppressGCTransition]
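static extern int Increment(int value);
// Call the native function so the method returns a value; the argument here is only illustrative
return Increment(1);
}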
Preventing .NET Garbage Collections with the TryStartNoGCRegion API by Matt Warren
Used for absolutely critical hot paths, TryStartNoGCRegion() will attempt to disallow garbage collection until the corresponding EndNoGCRegion() call. There are many caveats here that may throw exceptions and lead to accidental misuse.
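// Request enough budget to cover everything allocated inside the region (the int[10000] below alone is about 40 KB)
if (GC.TryStartNoGCRegion(100_000))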
{
Console.WriteLine("No GC Region started successfully.");
int[] values = new int[10000];
// do work
GC.EndNoGCRegion();
}
else
{
Console.WriteLine("No GC Region failed to start.");
}
Alignment here is about CPU caches: we can spend extra effort making sure our data fits the CPU cache size. However, the CLR already puts in a lot of effort to align data for us, adding padding if needed.
However, there may be times where we have some influence, such as setting the StructLayout attribute on a struct or controlling how our nested loops access memory.
No example
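As a sketch of the nested loop point, the two hypothetical helpers below sum the same two-dimensional array in different orders. C# multidimensional arrays are laid out row by row, so the row-major version walks adjacent memory while the column-major version jumps a whole row ahead on every step. Both return the same result; any speed difference depends on array size and hardware and should be measured with a benchmark rather than assumed.

```csharp
using System;

const int size = 512;
var grid = new int[size, size];
for (int row = 0; row < size; row++)
{
    for (int col = 0; col < size; col++)
    {
        grid[row, col] = 1;
    }
}

// Row-major traversal: the inner loop walks adjacent memory, so fetched cache lines are fully used.
static long SumRowMajor(int[,] data)
{
    long sum = 0;
    for (int row = 0; row < data.GetLength(0); row++)
    {
        for (int col = 0; col < data.GetLength(1); col++)
        {
            sum += data[row, col];
        }
    }
    return sum;
}

// Column-major traversal: same result, but each inner step jumps a whole row ahead in memory.
static long SumColumnMajor(int[,] data)
{
    long sum = 0;
    for (int col = 0; col < data.GetLength(1); col++)
    {
        for (int row = 0; row < data.GetLength(0); row++)
        {
            sum += data[row, col];
        }
    }
    return sum;
}

long rowMajor = SumRowMajor(grid);
long columnMajor = SumColumnMajor(grid);
Console.WriteLine($"Row-major: {rowMajor}, column-major: {columnMajor}");
```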
While there are many options in the MethodImplOptions enum, I'd wager the most common is MethodImplOptions.AggressiveInlining, which takes a piece of code and inserts it where the caller wants it, as opposed to making a call out to a separate location, if possible.
It's more appropriate to use this for small functions that are very hot. The caveat here is that it can increase the size of your application and could make it slower overall.
The following example is from System.Math
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Clamp(byte value, byte min, byte max)
{
if (min > max)
{
ThrowMinMaxException(min, max);
}
if (value < min)
{
return min;
}
else if (value > max)
{
return max;
}
return value;
}
Cysharp/MemoryPack benchmark test
The Slice() operation on a ReadOnlySequence<T> is slow and can be worked around by wrapping the ReadOnlySequence<T> with another struct, copying it into a Span<T>, and using the Slice() operation on the span.
The following example is a simplified version from the MemoryPack benchmark tests.
ref struct SpanWriter
{
Span<byte> raw;
public SpanWriter(byte[] buffer)
{
raw = buffer;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Advance(int count)
{
raw = raw.Slice(count);
}
}
- Locks: lock, SemaphoreSlim, Interlocked, and the new System.Threading.Lock lock.
- Thread locals
- More ref
- Loop unrolling
- CLR Configs
- StringValue
- System.Text.Json.Utf8*
- Having a class as sealed
- LINQ and how there may be times on very hot paths where it might be more useful to write your own for loop; with the caveat that LINQ has fantastic optimizations with SIMD under the hood which may be in play and a developer may be de-optimizing with a for loop.