Add low latency project setting for RenderingDevice #11200

Open
KeyboardDanni opened this issue Nov 21, 2024 · 2 comments

Describe the project you are working on

2D pixel art action games where rendering is lightweight and low latency is preferred.

Describe the problem or limitation you are having in your project

Continuation of godotengine/godot#94898 and, by extension, godotengine/godot#75830.

Namely, Godot currently places a hardcoded minimum of 2 on rendering/rendering_device/vsync/frame_queue_size both in the property and the initialization code. This adds a significant amount of unwanted latency - my findings are available here. The issue can be worked around by using the OpenGL renderer, but driver issues combined with limitations in the Compatibility renderer are pushing me toward using Forward+ instead... and input latency is currently the one remaining blocker for me.
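
For context, the current behavior amounts to a clamp along these lines (an illustrative sketch in plain C++, not the actual engine source):

```cpp
#include <algorithm>
#include <cstdint>

// The requested value of rendering/rendering_device/vsync/frame_queue_size is
// floored at 2, so even a lightweight 2D scene keeps at least two frames in
// flight between the CPU and the presented image.
uint32_t effective_frame_queue_size(uint32_t requested_frame_queue_size) {
	return std::max<uint32_t>(2u, requested_frame_queue_size);
}
```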

Describe the feature / enhancement and how it helps to overcome the problem or limitation

This proposal is intended to address the objections raised in godotengine/godot#94898 by introducing a new project setting and command-line parameter to give both game developers and end users the ability to choose whether to prefer high throughput/framerate or low input latency.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

A new property would be introduced within ProjectSettings: rendering/rendering_device/low_latency_mode (or a similar name). This property is an enum that takes one of the following values (a sketch follows this list):

  • PREFER_HIGH_FRAMERATE (default) - The renderer should respect the values used in frame_queue_size and swapchain_image_count and prioritize higher throughput by allowing the CPU to queue additional frames in advance. Good for benchmarks, demos, and non-action or story-based games where input latency is unimportant, or for games with complex scenes where low latency would have a significant negative impact on the framerate.
  • PREFER_LOW_LATENCY_CONSISTENT - The renderer should use a low latency extension (from AMD or nVidia where available) or use frame_queue_size = 1, to avoid allowing the CPU to run too far ahead of the currently presented frame. The renderer should not attempt to perform any latency reduction that could cause significant jitter in input timing. Intended for use with precision platformers, rhythm games, and other applications where low latency is desired but fluctuations in input timing could cause issues.
  • PREFER_LOW_LATENCY_AGGRESSIVE - The renderer should use a low latency extension or technique, similar to PREFER_LOW_LATENCY_CONSISTENT. The renderer may attempt to perform latency reduction that could cause input jitter, as long as it results in a significant latency improvement. Use of this mode may introduce microstutter based on fluctuations in frame timing across the CPU or GPU.
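
As a rough sketch of the above (names and registration details are illustrative, assuming the GLOBAL_DEF_RST(PropertyInfo, default) pattern used for other restart-requiring rendering settings):

```cpp
#include "core/config/project_settings.h"

// Proposed enum values; none of these names exist in Godot yet.
enum LowLatencyMode {
	LOW_LATENCY_PREFER_HIGH_FRAMERATE = 0, // keep frame_queue_size / swapchain_image_count as configured
	LOW_LATENCY_PREFER_CONSISTENT = 1, // vendor extension or frame_queue_size = 1, no jitter-inducing tricks
	LOW_LATENCY_PREFER_AGGRESSIVE = 2, // additionally allow techniques that may introduce input jitter
};

static void register_low_latency_setting() {
	// Hypothetical registration, alongside the other RenderingDevice defaults.
	GLOBAL_DEF_RST(PropertyInfo(Variant::INT, "rendering/rendering_device/low_latency_mode",
			PROPERTY_HINT_ENUM, "Prefer High Framerate,Prefer Low Latency (Consistent),Prefer Low Latency (Aggressive)"),
			LOW_LATENCY_PREFER_HIGH_FRAMERATE);
}
```

Making PREFER_HIGH_FRAMERATE the default (value 0) keeps existing projects on their current behavior.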

The purpose of this enum is to make the developer's intent more explicit. The naming of the default value, PREFER_HIGH_FRAMERATE, is chosen deliberately to indicate that a tradeoff is being made between framerate and low latency. Any documentation should clearly illustrate, with examples, when one would want to use each of the modes based on the target application.

There is also a deliberate separation between latency reduction modes that may or may not introduce significant input timing jitter. Some applications such as rhythm games rely on being able to "calibrate" audio and video latency as specific, static values during the setup process, and changes in frame timing could negatively impact the experience if latency reduction tries to perform late input reads based on this timing. By making this distinction in the API, the developer can set expectations for what types of latency reduction are and aren't okay.

Since everyone's machine is different and developers don't always expose the settings they should, the executable would also receive a new command-line parameter as an "escape hatch" that allows end users to override the setting. When specified, the value set in the low_latency_mode project setting is ignored, changing the setting via script has no effect, and reading the setting will not reflect the overridden value, to discourage applications from trying to thwart this mechanism. The values are as follows (see the sketch after this list):

  • --low-latency=off - equivalent to PREFER_HIGH_FRAMERATE
  • --low-latency=on - equivalent to PREFER_LOW_LATENCY_CONSISTENT
  • --low-latency=aggressive - equivalent to PREFER_LOW_LATENCY_AGGRESSIVE
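
A minimal sketch of the intended flag-to-mode mapping (plain C++ for illustration; the actual parsing would hook into Godot's existing command-line handling, and these names are hypothetical):

```cpp
#include <optional>
#include <string>

enum class LowLatencyMode { PREFER_HIGH_FRAMERATE, PREFER_LOW_LATENCY_CONSISTENT, PREFER_LOW_LATENCY_AGGRESSIVE };

// Maps a raw argument to a mode; returns nullopt when the argument is not the
// --low-latency flag or carries an unknown value.
std::optional<LowLatencyMode> parse_low_latency_flag(const std::string &arg) {
	const std::string prefix = "--low-latency=";
	if (arg.compare(0, prefix.size(), prefix) != 0) {
		return std::nullopt;
	}
	const std::string value = arg.substr(prefix.size());
	if (value == "off") {
		return LowLatencyMode::PREFER_HIGH_FRAMERATE;
	}
	if (value == "on") {
		return LowLatencyMode::PREFER_LOW_LATENCY_CONSISTENT;
	}
	if (value == "aggressive") {
		return LowLatencyMode::PREFER_LOW_LATENCY_AGGRESSIVE;
	}
	return std::nullopt;
}
```

Whether an unknown value warns or silently falls back to the project setting is left open here.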

There is currently no expectation that the low latency mode can be changed at runtime (since it would rely on reinitialization of the swapchain, among other things), but it should still be possible for standalone projects to change this setting and have it take effect on engine restart, so that it can at least be included in in-game settings dialogs.

There is no expectation that PREFER_LOW_LATENCY_CONSISTENT or even PREFER_LOW_LATENCY_AGGRESSIVE will have any effect if the vsync mode is "disabled" or "mailbox". However, they should still have an effect if the vsync mode is "adaptive", as traditional vsync is still in play as long as the framerate doesn't drop below the display's refresh rate.
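
Expressed as a check (the enum mirrors DisplayServer's vsync modes; the helper itself is hypothetical):

```cpp
// Local mirror of DisplayServer's vsync modes, for illustration only.
enum class VSyncMode { DISABLED, ENABLED, ADAPTIVE, MAILBOX };

bool low_latency_mode_applies(VSyncMode vsync_mode) {
	switch (vsync_mode) {
		case VSyncMode::DISABLED:
		case VSyncMode::MAILBOX:
			return false; // presentation never blocks the CPU, so there is no queued-frame latency to trim
		case VSyncMode::ENABLED:
		case VSyncMode::ADAPTIVE:
			return true; // traditional vsync pacing is (at least potentially) in effect
	}
	return true;
}
```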

Q: Isn't this bad practice? Why should it be an option in the engine?

A: Depends on what's meant by bad practice. If we're aiming for throughput and high framerates, then yeah, having the CPU wait on the GPU so soon is bad practice, because you're leaving potential performance gains on the table. But sometimes the scene is basic enough that almost nothing is gained from queueing this many frames in advance, and the emergence of latency reduction technologies from nVidia and AMD shows that there is a demand for low latency gaming (and it may also be a requirement for frame generation). The purpose of the proposed enum is for the developer to make an informed decision between high framerate and low latency (and to allow users to override this decision should the developer stop updating their game).

(And also because having to make a custom build of Godot means antivirus software is more likely to throw a false positive.)

Q: Why is the enum so high-level? Why not have specific project settings for nVidia Reflex, AMD Anti-Lag, etc.?

A: Right now, low-latency extensions are vendor-specific and, in the case of Vulkan, are not widely available according to gpuinfo.org (1) (2), even amongst the vendors they were designed for. Both low-lag extensions and waitable swapchains are highly dependent on the current platform, hardware, API, Godot renderers, and system drivers. For now, we can start with just setting frame_queue_size = 1; in the future, renderer backends can implement support for additional extensions as they see fit, to cover gaps in functionality or improve the user experience in general, while minimizing surprises on the game developer side. Basically, the game developer only needs to express the intent, and the API will aim to guarantee that only certain forms of latency reduction are performed. In the future, we could add an additional project setting allowing the selection of a specific low latency backend, but for now I don't think that amount of low-level control is worth it given the current state of things.
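
As a starting point, the backend-agnostic fallback could be as simple as the following (illustrative sketch; names are hypothetical):

```cpp
#include <cstdint>

enum class LowLatencyMode { PREFER_HIGH_FRAMERATE, PREFER_LOW_LATENCY_CONSISTENT, PREFER_LOW_LATENCY_AGGRESSIVE };

uint32_t resolve_frame_queue_size(LowLatencyMode mode, uint32_t configured_frame_queue_size) {
	if (mode == LowLatencyMode::PREFER_HIGH_FRAMERATE) {
		return configured_frame_queue_size; // respect rendering/rendering_device/vsync/frame_queue_size
	}
	// Baseline for both low-latency modes: a single frame in flight. Backends
	// that later gain vendor extension support can take a smarter path here.
	return 1;
}
```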

Q: Doesn't using a DXGI swapchain directly fix this issue?

A: This is one part of the puzzle. Moving to a DXGI swapchain for OpenGL and Vulkan will improve the current situation, but still won't quite bring the latency down to par with OpenGL. Ideally, DXGI should be used together with the proposed project setting for best effect. Again, my test results are documented over here.

Q: Why not just have people buy a gaming monitor?

A: Because not everyone can afford a 240 Hz display, and I want users on 60 Hz displays to have the best possible experience.

Q: Why not just disable vsync?

A: Because that causes tearing, which can be distracting unless you crank the refresh rate way up (see above).

If this enhancement will not be used often, can it be worked around with a few lines of script?

No.

Is there a reason why this should be core and not an add-on in the asset library?

This is core rendering functionality.

@Calinou (Member) commented Nov 21, 2024

PS: I'm curious what happens to input lag if you call glFinish() every frame in Godot's OpenGL code. This should act in a similar fashion to allowing 0 queued frames in other rendering APIs.

@KeyboardDanni (Author) commented Nov 21, 2024

I could be getting my wires crossed, but as far as I'm aware you can't really have 0 queued frames, assuming that the frame currently being rendered is considered one of those "queued frames". You can get something that's effectively close to 0 queued frames, but that would involve performing late input reads.

glFinish() is an interesting beast. When it comes to synchronization, the OpenGL specification gives drivers a lot of leeway, so some implementations may not see much effect from glFinish(), and I think there may have been past instances where it actually increased input lag.

But I still went ahead and tried inserting a call to glFinish() after DisplayServer::get_singleton()->swap_buffers(); in RasterizerGLES3::gl_end_frame. On nVidia + Windows 11, the results were 2 frames with layered DXGI, roughly 2 frames with Native windowed, and surprisingly, only 1 frame with Native fullscreen. This is with vsync enabled and VRR disabled at the monitor level. For comparison, without glFinish(), there are 3 frames of lag with Native fullscreen (though layered DXGI and Native windowed remain unchanged).
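
For reference, the change amounts to this (surrounding function body omitted):

```cpp
// Inside RasterizerGLES3::gl_end_frame, after presenting:
DisplayServer::get_singleton()->swap_buffers();
glFinish(); // block the CPU until the GPU has drained all submitted GL work
```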

So it does seem like one frame of video lag is attainable after all! Keep in mind, this measurement does not account for input latency. The tester I wrote is strictly testing swapchain-induced delay, and other testing methods will be needed to measure the additional latency from input handling.

These results also explain why I was seeing improved latency with layered DXGI on OpenGL but not Vulkan. Turns out, layered DXGI is adding a frame of lag on both APIs! It's just that OpenGL was allowing more frames to queue up under Native.
