Dotnet Optimization Cheatsheet

Donald Knuth:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

If you're the 3%, this is an educational reference on .NET performance tricks I've gathered. By reading this I assume you understand that architecture, patterns, data flow, general performance decisions, hardware, etc. have a far larger impact on your codebase than optimising for potentially single-digit milliseconds, nanoseconds, or bytes of memory with niche and potentially dangerous strategies. For instance, you'll get far more out of caching database values than you'll ever get out of skipping local variable initialization for an object, by several orders of magnitude. I also expect that you already understand best practices, e.g. properly dealing with IDisposable objects.

You'll need to consider:

  1. Will I be prematurely optimising?
  2. Do I understand the risks involved?
  3. Is the time spent doing this worth it?

I've rated each optimisation here in terms of difficulty and you may gauge the difficulties differently. Remember, we're looking at the last 3%, so while some are easy to implement, they may only be measurably effective on an extremely hot path.

| Difficulty | Reasoning |
| --- | --- |
| 🟢 Easy | These are either well known, or simple to drop into existing code with only some knowledge. |
| 🟡 Intermediate | Mostly accessible. May require a bit more coding or understanding of specific caveats of use. Own research is advised. |
| 🔴 Advanced | Implementation and caveat understanding is critical to prevent app crashes or accidentally reducing performance. |

.NET is always evolving. Some of these may be invalid in the future due to breaking changes or they've been baked into the framework under the hood.

The optimizations below can range from efficient object overloads to language features. They'll include relevant documentation and attribution to where I originally found them.

And some of these are "cheap tricks" and things you probably shouldn't implement in production, but are fun to include for the sake of learning.

Beginning Optimisation

In the shortest way possible:

  1. Write unit tests so you have a baseline for your functionality
  2. Use something like BenchmarkDotNet or the Performance Profiler in Visual Studio to get a before performance baseline
  3. Write your change
  4. Run tests
  5. Benchmark to see if it did improve what you expected

If you're looking for tools to help, I have: Optimization-Tools-And-Notes

Further Resources

| Resource | Description |
| --- | --- |
| Pro .NET Memory Management | A heavy and incredibly informative read on everything memory and memory-performance in .NET. A lot of the knowledge below was originally seeded by this book and it should be treated as an implicit reference. |
| PerformanceTricksAzureSDK | Tricks written by Daniel Marbach |
| .NET Memory Performance Analysis | A thorough document on analyzing .NET performance by Maoni Stephens |
| Classes vs. Structs. How not to teach about performance! | A critique of bad benchmarking by Sergiy Teplyakov |
| Diagrams of .NET internals | In depth diagrams from the Dotnetos courses |
| Turbocharged: Writing High-Performance C# and .NET Code | A great video describing multiple ways to improve performance by Steve Gordon |
| Performance tricks I learned from contributing to open source .NET packages | A great video describing multiple ways to squeeze out performance by Daniel Marbach |

Optimisations

We always want our code to "run faster". But rarely do we ask – what is it running from?

🟢 Remove unnecessary boxing/unboxing

.NET Performance Tips - Boxing and Unboxing

Boxing and Unboxing (C# Programming Guide)

Boxing converts an instance of a value type to an instance of a reference type; unboxing is the reverse. In most cases (but not all), this means putting a value type on the heap.

This slows down performance because the runtime needs to do a two-step operation:

  1. allocate the new boxed object on the heap, which now has an object header and method table reference
  2. copy the value to the heap

The cost is twofold, as it also eventually increases garbage collection pressure.

We can recognise this as the emitted box/unbox operations in the IL code, or by using tools such as the Clr Heap Allocation Analyzer.

No example
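As a rough stand-in for an example, the sketch below contrasts a non-generic `ArrayList` (which stores `object`, so every `int` is boxed on `Add` and unboxed on read) with a generic `List<int>` (which stores the values directly). The variable names are made up for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Boxing: ArrayList stores object, so each int is copied into a new heap object.
var boxedList = new ArrayList();
for (int i = 0; i < 3; i++)
{
    boxedList.Add(i); // emits a box instruction in IL
}
int firstBoxed = (int)boxedList[0]!; // emits an unbox instruction in IL

// No boxing: List<int> stores the ints directly in its backing array.
var typedList = new List<int>();
for (int i = 0; i < 3; i++)
{
    typedList.Add(i);
}
int firstTyped = typedList[0];

Console.WriteLine($"{firstBoxed} {firstTyped}"); // 0 0
```

Both lists hold the same values; the difference only shows up in allocations and in the emitted IL.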

🟢 Use fewer nullable value types

Via Bartosz Adamczewski

Nullable value types are wrapped in a Nullable<T> struct, meaning there is extra wrapping/unwrapping and null-checking (HasValue) work happening on every operation.

// slower with nullable value types
int? Slow(int? x, int? y) {
	int? sum = 0;
	for (int i = 0; i < 1000; i++){
		sum += x * y;
	}
	return sum;
}

// faster with non-nullable value types
int? Fast(int x, int y) {
	int? sum = 0;
	for (int i = 0; i < 1000; i++){
		sum += x * y;
	}
	return sum;
}

🟢 Use String.Compare() or String.Equals()

Twitter post via Daniel Lawson

Official Reference

String.Compare() and String.Equals() are memory-efficient ways to compare strings. The memory-inefficient alternative is stringA.ToLower() == stringB.ToLower(), which, because strings are immutable, has to create a new string for each .ToLower() call.

var result = String.Compare("StringA", "StringB", StringComparison.OrdinalIgnoreCase);

Note: It's recommended to use the overload of Compare(String, String, StringComparison) rather than Compare(String, String) as via the documentation:

When comparing strings, you should call the Compare(String, String, StringComparison) method, which requires that you explicitly specify the type of string comparison that the method uses. For more information, see Best Practices for Using Strings.
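For completeness, here is a minimal sketch of the `String.Equals()` variant next to the `ToLower()` approach it replaces. The sample strings are arbitrary.

```csharp
using System;

string a = "StringA";
string b = "STRINGA";

// Allocates two temporary lowercase strings just to compare them.
bool viaToLower = a.ToLower() == b.ToLower();

// Compares the existing strings in place; no new strings are created.
bool viaEquals = string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"{viaToLower} {viaEquals}"); // True True
```

`String.Equals()` reads more naturally when you only need equality; `String.Compare()` is the one to reach for when you need ordering.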

🟢 Use StringBuilder for larger strings

.NET Performance Tips - Strings

Strings are immutable. When you concatenate many strings together each concatenation creates a new string that takes a relatively long amount of time.

string[] words = []; // imagine a large array of strings

// inefficient
var slowWords = "";

foreach(var word in words)
{
	slowWords += word;
}

Console.WriteLine(slowWords);

// more efficient
var stringBuilder = new StringBuilder();

foreach(var word in words)
{
	stringBuilder.Append(word); // Append (not AppendLine) matches the += version above
}

Console.WriteLine(stringBuilder.ToString());

🟢 string.Create()

Official Reference

Post via Dave Callan

If you know the length of your final string, perhaps you're concatenating various strings to one, you can use string.Create() to speed up the final string creation with minimal allocations.

It looks complex, but this method allows us to manipulate the string as if it were mutable inside the lambda via a Span<T> and guarantees us an immutable string afterwards.

Example via Stephen Toub

string s = string.Create(34, Guid.NewGuid(), (span, guid) => 
{
	"ID".AsSpan().CopyTo(span);
	guid.TryFormat(span.Slice(2), out _, "N");
});

Console.WriteLine(s); // ID3d1822eb1310418cbadeafbe3e7b7b9f

🟢 Don't use async with large SqlClient data

There is a long-standing open dotnet GitHub issue where returning larger query data takes a lot longer with async compared to sync.

No example

🟢 Use streams with HttpClient

Performance Improvements in .NET 8 by Stephen Toub

HttpClient benchmarking via Niko Uusitalo

If we work with streams out of HttpClient we can eliminate a lot of memory copies.

The following example assumes the response is a JSON string:

public async Task GetStreamAsync()
{
    using var stream = await _httpClient.GetStreamAsync(_url);

    var data = await JsonSerializer.DeserializeAsync<List<WeatherForecast>>(stream);
}

In fact, this is such a common operation that we have a stream convenience overload for JSON:

public async Task GetFromJsonAsync()
{
    var data = await _httpClient.GetFromJsonAsync<List<WeatherForecast>>(_url);
}

These two examples above eliminate copying the response from the stream to a JSON string then the JSON string into our objects.

🟢 EF Core compiled queries

Official Article

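A compiled query lets EF Core bypass the query cache lookup on every call, which helps when the same query runs repeatedly on a hot path. There is also an EF.CompileAsyncQuery for async queries.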

// EF Core compiled query
private static readonly Func<MyDbContext, IEnumerable<User>> _getUsersQuery =
	EF.CompileQuery((MyDbContext context) => context.Users.ToList());

public IEnumerable<User> GetUsers()
{
	return _getUsersQuery(_context);
}

🟢 Keep "exceptions, exceptional"

Via Kevlin Henney

Some numbers via Peter Morris

In distributed systems especially, networking issues become business as usual, not exceptional circumstances. As an example, instead of HttpResponseMessage.EnsureSuccessStatusCode() you may want to consider HttpResponseMessage.IsSuccessStatusCode then deal with it accordingly sans exception.

No example
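As a hedged illustration in place of one, the sketch below builds an `HttpResponseMessage` locally (an assumption, so that no network call is needed) and handles the same failure both ways.

```csharp
using System;
using System.Net;
using System.Net.Http;

// A locally constructed response stands in for a real HTTP call.
using var response = new HttpResponseMessage(HttpStatusCode.ServiceUnavailable);

// Exception-based flow: throws HttpRequestException for a non-success status code.
bool threw = false;
try
{
    response.EnsureSuccessStatusCode();
}
catch (HttpRequestException)
{
    threw = true;
}

// Branch-based flow: the expected failure is handled without throwing.
string outcome = response.IsSuccessStatusCode
    ? "ok"
    : $"retry later ({(int)response.StatusCode})";

Console.WriteLine($"{threw}: {outcome}"); // True: retry later (503)
```

The branch-based version avoids the cost of creating and unwinding an exception for something that happens routinely.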

🟡 Async

Official Article

Asynchronous programming allows us to continue with other tasks while waiting for I/O bound calls or expensive computations. In .NET we use the async and await keywords. There's too much to cover for this documentation, but once you get used to it, it's easy enough to use daily.

The user needs to ensure they put the await keyword in for an async method call.

public async Task<string> GetFileContentsAsync(string path)
{
	return await File.ReadAllTextAsync(path);
}

🟡 ValueTask

Official DevBlog by Stephen Toub

Official Reference for ManualResetValueTaskSourceCore

Tasks are very flexible to suit the varied needs asked of them. However, many uses of Task are simple single awaits, and in high-throughput code the allocations from Task can become a performance hit. To solve this we have ValueTask, which allocates less than Task. The caveat is that it's less flexible than Task, e.g. you shouldn't await a ValueTask more than once.

var result = await GetValue();


public async ValueTask<int> GetValue()
{
	// implementation omitted
	return value;
}
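One common place where this pays off is a method that usually completes synchronously, such as a cache lookup. The following is a minimal sketch under that assumption; `GetLengthAsync`, `LoadAsync`, and the dictionary cache are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<string, int>();

// Each returned ValueTask is awaited exactly once.
int first = await GetLengthAsync("hello");  // cache miss: goes through LoadAsync
int second = await GetLengthAsync("hello"); // cache hit: completes synchronously

Console.WriteLine($"{first} {second}"); // 5 5

ValueTask<int> GetLengthAsync(string key)
{
    if (cache.TryGetValue(key, out var cached))
    {
        // Synchronous path: wraps the value directly, no Task object is allocated.
        return new ValueTask<int>(cached);
    }

    // Asynchronous path: falls back to wrapping a regular Task.
    return new ValueTask<int>(LoadAsync(key));
}

async Task<int> LoadAsync(string key)
{
    await Task.Delay(10); // stands in for real I/O
    var length = key.Length;
    cache[key] = length;
    return length;
}
```

If most calls take the asynchronous path anyway, a plain `Task<int>` is likely the simpler choice.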

🟡 Structs

Official Article

Class vs Struct in C#: Making Informed Choices via NDepend blog

Value vs Reference type allocation locations via Konrad Kokosa

There are times when structs may help us with performance compared to a class. Structs are lighter than classes and often end up on the stack meaning no allocations for the GC to deal with. It's better to use them in place of simple objects.

However, there are a few caveats here including the size and complexity of the struct, nullability, and no inheritance.

var coordinates = new Coords(0, 0);

public struct Coords
{
    public Coords(double x, double y)
    {
        X = x;
        Y = y;
    }

    public double X { get; }
    public double Y { get; }

    public override string ToString() => $"({X}, {Y})";
}

🟡 Parallel.ForEach()

Official Reference

Official Article

Parallel.ForEach() works like executing a foreach loop in parallel. Best used if you have lots of independent CPU bound work.

Be wary of threading and concurrency issues.

There is also Parallel.ForEachAsync().

List<Image> images = GetImages();

Parallel.ForEach(images, (image) => {
	var resizedImage = ResizeImage(image);
	SaveImage(resizedImage);
});
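To make the concurrency warning concrete, here is a small sketch where the loop bodies share one accumulator. A plain `total += n` would race between threads, so the sketch uses `Interlocked.Add`; the numbers are arbitrary sample data.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int[] numbers = Enumerable.Range(1, 1000).ToArray();
long total = 0;

// Each iteration may run on a different thread, so the shared total
// is updated atomically instead of with a plain +=.
Parallel.ForEach(numbers, n => Interlocked.Add(ref total, n));

Console.WriteLine(total); // 500500
```

For work this small the synchronisation overhead outweighs any gain; it is only meant to show the shape of a thread-safe update.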

🟡 Span<T>, ReadOnlySpan<T>, Memory<T>

Span<T> Official Reference

ReadOnlySpan<T> Official Reference

Memory<T> Official Reference

Memory<T> and Span<T> usage guidelines

A Complete .NET Developer's Guide to Span with Stephen Toub

Turbocharged: Writing High-Performance C# and .NET Code - Steve Gordon

All three of these "represent an arbitrary contiguous region of memory". Another way it's commonly explained is these can be thought of as a window into the memory.

For our short purposes here, the performance gains are via low level reading and manipulation of data without copying or allocating. We use them with arrays, strings, and more niche data situations.

There are caveats such as Span types only being put on the stack, meaning they can't be used inside async calls (which is what Memory<T> allows for).

int[] items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

// Technically we can use the slicing overload of AsSpan() as well
var span = items.AsSpan();
var slice = span.Slice(1, 5); // 1, 2, 3, 4, 5
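The same idea applies to strings. The sketch below parses pieces of a date string through `ReadOnlySpan<char>` slices rather than `Substring()`, so no intermediate strings are allocated. The input value is made up for illustration.

```csharp
using System;

string date = "2024-05-17";
ReadOnlySpan<char> span = date.AsSpan();

// Each Slice is just a window over the original string's memory.
int year = int.Parse(span.Slice(0, 4));
int month = int.Parse(span.Slice(5, 2));
int day = int.Parse(span.Slice(8, 2));

Console.WriteLine($"{year} {month} {day}"); // 2024 5 17
```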

🟡 Faster loops with Span<T> and ReadOnlySpan<T>

The weirdest way to loop in C# is also the fastest - Nick Chapsas

Improve C# code performance with Span via NDepend blog

Don't mutate the collection during looping.

For arrays and strings, we can use AsSpan().

int[] items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var span = items.AsSpan();

for (int i = 0; i < span.Length; i++)
{
    Console.WriteLine(span[i]);
}

However for lists we need to use CollectionsMarshal.AsSpan()

List<int> items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var span = CollectionsMarshal.AsSpan(items);

for (int i = 0; i < span.Length; i++)
{
    Console.WriteLine(span[i]);
}

If a Span<T> isn't viable, you can create your own enumerator with ref and readonly keywords. Information can be found at Unusual optimizations; ref foreach and ref returns by Marc Gravell.

🟡 ArrayPool<T>

Official Reference

How to use ArrayPool and MemoryPool in C# by Joydip Kanjilal

Allows us to rent an array of type T that has equal to or greater size than what we ask for. This helps us reduce allocations on hot paths as instead of creating and destroying array instances, we reuse them.

Note: you can also use the more flexible MemoryPool, however there are further caveats to that as it allocates to understand the owner of the item - which ArrayPool avoids.

var buffer = ArrayPool<int>.Shared.Rent(5);

for (var i = 0; i < 5; i++)
{
    buffer[i] = i * 2;
}

for (var i = 0; i < 5; i++)
{
    Console.WriteLine(buffer[i]);
}

ArrayPool<int>.Shared.Return(buffer);

🟡 StringPool

Official Reference

Official Article

This can be thought of as configurable/managed string interning with useful methods. Brought to us by the .NET Community Toolkit.

public string GetString(ReadOnlySpan<char> readOnlySpan)
{
	return StringPool.Shared.GetOrAdd(readOnlySpan);
}

🟡 ObjectPool

Official Reference

Official Article

A general use pool. Has varying ways to setup.

var objectPool = ObjectPool.Create<HeavyObject>();

var pooledObject = objectPool.Get();

// then some work

objectPool.Return(pooledObject);

🟡 RecyclableMemoryStream

Official GitHub Repo

This is designed to be a near-drop in replacement for MemoryStream objects, with configuration and metrics.

To use an example from the link above:

private static readonly RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();

static void Main(string[] args)
{
	var sourceBuffer = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 };
	
	using (var stream = manager.GetStream())
	{
		stream.Write(sourceBuffer, 0, sourceBuffer.Length);
	}
}

🟡 Server Garbage Collection

Garbage Collection Article

Workstation GC Official Article

Runtime configuration options for garbage collection

.NET Memory Performance Analysis section with server GC

Twitter post with graphs from Sergiy Teplyakov

DevBlog post from Sergiy Teplyakov

Blog post about .NET 8 GC feature: DATAS by Maoni Stephens

Having concurrent threads help with garbage collection can minimise the GC pause time in an application. However there are caveats to do with the amount of logical processors and how many applications are running on the machine at once.

No example
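Server GC is a configuration choice rather than code, typically enabled with `<ServerGarbageCollector>true</ServerGarbageCollector>` in the project file. As a small sketch, the running process can confirm which mode it actually got via `GCSettings`:

```csharp
using System;
using System.Runtime;

// Reports the GC flavour the runtime picked for this process.
bool isServerGc = GCSettings.IsServerGC;
GCLatencyMode latencyMode = GCSettings.LatencyMode;

Console.WriteLine($"Server GC: {isServerGc}");
Console.WriteLine($"Latency mode: {latencyMode}");
```

Logging this at startup is a cheap way to catch a deployment where the setting didn't take effect.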

🟡 Ahead of Time Compilation (AOT)

Native AOT Deployment Official Guide

.NET AOT Resources: 2022 Edition (Self plug)

AOT allows us to compile our code natively for the target running environment - removing the JITer. This allows us to have a smaller deployment footprint and a faster startup time.

Caveats here are that reflection (used in a lot of .NET) is extremely limited, we have to compile for each target environment, there's no COM, and more. We can get around some of these limitations with source generators.

No example

🔴 stackalloc

Official Article

High performance byte/char manipulation via David Fowler

My favourite examples

Dos and Don'ts of stackalloc via Kevin Jones

stackalloc gives us a block of memory on the stack. We can easily use it with Span<T> and ReadOnlySpan<T> to work as small buffers. Highly performant stack-only work: no heap, no garbage collector.

A caveat here is it must be small to prevent stack overflows. I've seen developers at Microsoft use a max size of 256 bytes.

Example from David Fowler, linked above.

// Choose a small stack threshold to avoid stack overflow
const int stackAllocThreshold = 256;

byte[]? pooled = null;

// Either use stack memory or pooled memory
Span<byte> span = bytesNeeded <= stackAllocThreshold
    ? stackalloc byte[stackAllocThreshold]
    : (pooled = ArrayPool<byte>.Shared.Rent(bytesNeeded));

// Write to the span and process it
var written = Populate(span);
Process(span[..written]);

// Return the pooled memory if we used it
if (pooled is not null)
{
    ArrayPool<byte>.Shared.Return(pooled);
}

🔴 MemoryMarshal

Official Reference

More easily allows us to interoperate with Memory<T>, ReadOnlyMemory<T>, Span<T>, and ReadOnlySpan<T>. A lot of performant Unsafe (class) code is abstracted away from us when using MemoryMarshal.

There are few safeties granted to us in MemoryMarshal when compared to raw Unsafe calls, so keep in mind that even though you don't see Unsafe, you are still exposed to the same type safety risks.

Note: This advice roughly applies to all within the System.Runtime.InteropServices namespace, including MemoryMarshal.

The following adapted from Immo Landwerth:

ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(input.String.AsSpan());
return HashData(bytes);

Guid HashData(ReadOnlySpan<byte> bytes)
{
	var hashBytes = (Span<byte>)stackalloc byte[20];
	var written = SHA1.HashData(bytes, hashBytes);

	return new Guid(hashBytes[..16]);
}

🔴 CollectionsMarshal

Official Reference

CollectionsMarshal gives us handles to the underlying data representations of collections. It makes use of MemoryMarshal and Unsafe under the hood and as such any lack of safety that is found there extends to this, such as type safety risks.

Assuming the developer keeps within safety boundaries, this allows us to make highly performant access to the collection.

The following is an absolutely key caveat:

Items should not be added or removed while the span/ref is in use.

While it means more than this, an easy point to remember is: don't add/remove from the collection while using the returned object.

var items = new List<int> { 1, 2, 3, 4, 5 };
var span = CollectionsMarshal.AsSpan(items);

for (int i = 0; i < span.Length; i++)
{
    Console.WriteLine(span[i]);
}

🔴 SkipLocalsInit

C# 9 - Improving performance using the SkipLocalsInit attribute by Gérald Barré

Feature spec

Feature design for SkipLocalsInit

By default the CLR forces the JIT to set all local variables to their default value meaning your variable won't be some leftover value from memory. In high performance situations, this may become noticeable and we can skip this initialization as long as we understand the risk being taken on. Also see Unsafe.SkipInit<T>.

[SkipLocalsInit]
byte SkipInitLocals()
{
    Span<byte> s = stackalloc byte[10];
    return s[0];
}

🔴 unsafe (keyword)

Official Article

Unsafe code, pointer types, and function pointers

The unsafe keyword allows us to work with pointers in C# to write unverifiable code. We can allocate memory without safety from the garbage collector, use pointers, and call methods with function pointers.

However, we are on our own. No GC collection, no security guarantee, no type safety, and other caveats.

Example via documentation:

class UnsafeTest
{
    // Unsafe method: takes pointer to int.
    unsafe static void SquarePtrParam(int* p)
    {
        *p *= *p;
    }

    unsafe static void Main()
    {
        int i = 5;
        // Unsafe method: uses address-of operator (&).
        SquarePtrParam(&i);
        Console.WriteLine(i);
    }
}

🔴 Unsafe (class)

Official Reference

Safer than the unsafe keyword, the Unsafe class allows us to do lower level manipulation for performant code by suppressing type safety while still being tracked by the garbage collector. There are caveats, especially around type safety.

var items = new Span<int>([ 0, 1, 2, 3, 4 ]);
ref var spanRef = ref MemoryMarshal.GetReference(items);

var item = Unsafe.Add(ref spanRef, 2);

Console.WriteLine(item); // prints 2

🔴 Ref fields (with UnscopedRef)

Twitter post via Konrad Kokosa

Gist via Konrad Kokosa

Low Level Struct Improvements proposal

UnscopedRefAttribute documentation

For more complex data structures, having your deep down property as a ref could improve speed.

public struct D
{
    public int Field;

    [UnscopedRef]
    public ref int ByRefField => ref Field;
}

🔴 Overriding GetHashCode() and Equals for Structs

Official DevBlog

Jon Skeet SO answer

Struct equality performance in .NET - Gérald Barré

The default value type implementation of GetHashCode() trades speed for a good hash distribution across any given value type - useful for dictionary and hashset types. This happens by reflection, which is incredibly slow when looking through our micro-optimization lens.

An example from the Jon Skeet answer above of overriding:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

🔴 Even Faster Loops

The weirdest way to loop in C# is also the fastest - Nick Chapsas

MemoryMarshal.GetReference Documentation

Unsafe.Add Documentation

Using MemoryMarshal, CollectionsMarshal, and Unsafe together, we're able to loop directly on the underlying array inside a List<T> and index quickly to the next element.

Do not add or remove from the collection during looping.

var items = new List<int> { 1, 2, 3, 4, 5 };
var span = CollectionsMarshal.AsSpan(items);
ref var searchSpace = ref MemoryMarshal.GetReference(span);

for (int i = 0; i < span.Length; i++)
{
    var item = Unsafe.Add(ref searchSpace, i);
    Console.WriteLine(item);
}

🔴 Fastest Loops

I Lied! The Fastest C# Loop Is Even Weirder - Nick Chapsas

With this we've mostly removed the safety .NET provides us in exchange for speed. It is also difficult to understand at a glance without being familiar with MemoryMarshal methods.

Do not add or remove from the collection during looping.

int[] items = [1, 2, 3, 4, 5];

ref var start = ref MemoryMarshal.GetArrayDataReference(items);
ref var end = ref Unsafe.Add(ref start, items.Length);

while (Unsafe.IsAddressLessThan(ref start, ref end))
{
    Console.WriteLine(start);
    start = ref Unsafe.Add(ref start, 1);
}

🔴 Removing Closures

How we achieved 5X faster pipeline execution by removing closure allocations by Daniel Marbach

Dissecting the local functions in C# 7 by Sergey Tepliakov

StackOverflow question on closures

Inline delegates and anonymous methods that capture variables from the enclosing scope give us closure allocations. A new object is allocated to hold the captured values for the anonymous method.

We have a few options including, but not limited to:

  1. If using LINQ, moving that code into a regular loop instead
  2. Static lambdas (lambdas are anonymous functions) that accept state
    1. Further helpful if the static lambda is generic, as that removes boxing

Example via the above link by Daniel Marbach:

static void MyFunction<T>(Action<T> action, T state) => action(state);

int myNumber = 42;
MyFunction(static number => Console.WriteLine(number), myNumber);

🔴 ThreadPool

Official Reference

Official Guide

If you're going to use low level threads, you'll probably be using ThreadPool as it manages the threads for us. As .NET has moved forwards, a lot of raw thread or thread pool usage has been superseded (in my experience) by PLINQ, Parallel foreach calls, Task.Factory.StartNew(), Task.Run() etc., i.e. further abstraction over threads.

The following is via the documentation above:

ThreadPool.QueueUserWorkItem(ThreadProc);
Console.WriteLine("Main thread does some work, then sleeps.");
Thread.Sleep(1000);

Console.WriteLine("Main thread exits.");

static void ThreadProc(Object stateInfo) 
{
    // No state object was passed to QueueUserWorkItem, so stateInfo is null.
    Console.WriteLine("Hello from the thread pool.");
}

🔴 Vectorizing

Use SIMD-accelerated numeric types documentation

SIMD-accelerated numeric types

Vectorization Guidelines

Great self-learning post by Alexandre Mutel

Single Instruction, Multiple Data (SIMD) allows us to act on multiple values per iteration rather than just a single value via vectorization. As .NET has moved forward, it's been made easier to take advantage of this feature.

Vectorizing adds complexity to your codebase, but thankfully under the hood, common .NET methods have been written using vectorization and as such we get the benefit for free. E.g. string.IndexOf() for OrdinalIgnoreCase.

The following is a simple example:

// Initialize two arrays for the operation
int[] array1 = [1, 2, 3, 4, 5, 6, 7, 8];
int[] array2 = [8, 7, 6, 5, 4, 3, 2, 1];
int[] result = new int[array1.Length];

// Create vectors from the arrays
var vector1 = new Vector<int>(array1);
var vector2 = new Vector<int>(array2);

// Perform the addition
var resultVector = Vector.Add(vector1, vector2);

// Copy the results back into the result array
resultVector.CopyTo(result);

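// Print the results
// Only the first Vector<int>.Count elements are added, so the output depends on hardware.
// With 256-bit vectors (Count == 8) this outputs: 9, 9, 9, 9, 9, 9, 9, 9
Console.WriteLine(string.Join(", ", result));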
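Because `Vector<int>.Count` varies by hardware, real code usually processes the input in vector-sized chunks and finishes the leftovers with a scalar loop. The following is a minimal sketch of that pattern; the array length of 19 is chosen only so that a remainder exists for common vector widths.

```csharp
using System;
using System.Numerics;

int[] left = new int[19];
int[] right = new int[19];
for (int k = 0; k < left.Length; k++)
{
    left[k] = k;
    right[k] = 100 - k;
}

int[] sum = new int[left.Length];
int width = Vector<int>.Count; // number of ints per vector on this machine
int i = 0;

// Vectorized part: add 'width' elements per iteration.
for (; i <= left.Length - width; i += width)
{
    var a = new Vector<int>(left, i);
    var b = new Vector<int>(right, i);
    (a + b).CopyTo(sum, i);
}

// Scalar tail: handle whatever did not fill a whole vector.
for (; i < left.Length; i++)
{
    sum[i] = left[i] + right[i];
}

Console.WriteLine(string.Join(", ", sum)); // every element is 100
```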

🔴 Inline Arrays

C# 12 release notes

Inline array language proposal document

Fun example via David Fowler

The documentation states:

You likely won't declare your own inline arrays, but you use them transparently when they're exposed as System.Span<T> or System.ReadOnlySpan<T> objects from runtime APIs

Inline arrays are more a runtime feature for the .NET development team at Microsoft as it allows them to give us features such as Span<T> and interaction with unmanaged types. They're essentially fixed-size, stack-allocated buffers.

Example from documentation:

[InlineArray(10)]
public struct Buffer
{
    private int _element0;
}

var buffer = new Buffer();
for (int i = 0; i < 10; i++)
{
    buffer[i] = i;
}

foreach (var i in buffer)
{
    Console.WriteLine(i);
}

🔴 SuppressGCTransition

Official Reference

Great writeup by Kevin Gosse

This attribute prevents the thread transition from cooperative GC mode to preemptive GC mode when applied to a DllImport. It shaves only nanoseconds and has many caveats for usage.

Example via Kevin Gosse:

public int PInvoke_With_SuppressGCTransition()
{
	// Call the native function; 42 is an arbitrary argument for illustration
	return Increment(42);

	[DllImport("NativeLib.dll")]
	[SuppressGCTransition]
	static extern int Increment(int value);
}

🔴 GC.TryStartNoGCRegion()

Official Reference

Preventing .NET Garbage Collections with the TryStartNoGCRegion API by Matt Warren

Used for absolutely critical hot paths, TryStartNoGCRegion() will attempt to disallow garbage collection until the corresponding EndNoGCRegion() call. There are many caveats here that may throw exceptions, and lead to accidental misuse.

// The requested size (in bytes) must cover every allocation made inside the region
if (GC.TryStartNoGCRegion(64 * 1024))
{
	Console.WriteLine("No GC Region started successfully.");

	int[] values = new int[10000]; // roughly 40 KB, within the 64 KB budget

	// do work

	GC.EndNoGCRegion();
}
else
{
	Console.WriteLine("No GC Region failed to start.");
}

🔴 Alignment

DevBlog Post

Example via Egor Bogatov

Alignment is about how our data is laid out relative to the CPU caches, and we can spend extra effort making sure our data is fitted to the CPU cache size. However, the CLR will put in a lot of effort to align this for us by adding padding if needed.

However, there may be times where we have some influence, such as setting the StructLayout attribute on a struct or controlling how our nested loops access memory.

No example
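As a rough sketch of the nested loop point only (not of StructLayout), the code below walks the same 2D array in two orders. C# multidimensional arrays are stored row by row, so the first traversal touches memory sequentially while the second jumps a full row between reads. The array size is arbitrary, and any timing difference should be measured rather than assumed.

```csharp
using System;

const int size = 256;
var grid = new int[size, size];
for (int r = 0; r < size; r++)
{
    for (int c = 0; c < size; c++)
    {
        grid[r, c] = r + c;
    }
}

// Row-major traversal: consecutive reads are adjacent in memory.
long rowMajorSum = 0;
for (int r = 0; r < size; r++)
{
    for (int c = 0; c < size; c++)
    {
        rowMajorSum += grid[r, c];
    }
}

// Column-major traversal: each read is 'size' elements away from the last one.
long columnMajorSum = 0;
for (int c = 0; c < size; c++)
{
    for (int r = 0; r < size; r++)
    {
        columnMajorSum += grid[r, c];
    }
}

Console.WriteLine(rowMajorSum == columnMajorSum); // True
```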

🔴 [MethodImpl(MethodImplOptions.AggressiveInlining)]

Official Reference

While there are many in the MethodImplOptions enum, I'd wager the most common is MethodImplOptions.AggressiveInlining which takes a piece of code and inserts it where the caller wants it as opposed to making a call out to a separate location, if possible.

It's more appropriate to use this for small functions that are very hot. A caveat here is that it can increase the size of your application and could make it slower overall.

The following example is from System.Math

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte Clamp(byte value, byte min, byte max)
{
	if (min > max)
	{
		ThrowMinMaxException(min, max);
	}

	if (value < min)
	{
		return min;
	}
	else if (value > max)
	{
		return max;
	}

	return value;
}

🔴 Improving ReadOnlySequence<T> performance

Official Reference

Twitter post via @neuecc

Cysharp/MemoryPack benchmark test

The Slice() operation on a ReadOnlySequence<T> is slow and can be worked around by wrapping the ReadOnlySequence<T> with another struct and copying it into a Span<T> and using the Slice() operation on span.

The following example is a simplified version from the MemoryPack benchmark tests.

ref struct SpanWriter
{
    Span<byte> raw;

    public SpanWriter(byte[] buffer)
    {
        raw = buffer;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Advance(int count)
    {
        raw = raw.Slice(count);
    }
}

To be written