Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default serialization settings do not handle DateTime values according to Cosmos documented recommendations #4904

Open
DaRosenberg opened this issue Nov 23, 2024 · 3 comments
Labels
customer-reported Issue created by a customer needs-investigation

Comments

@DaRosenberg
Copy link

DaRosenberg commented Nov 23, 2024

Describe the bug

Serializing a DateTime into the ISO 8601 format used by default in Newtonsoft.Json, or the extended ISO 8601-1:2019 profile used by default in System.Text.Json, omits fraction digits if the fraction is 0. When time zone information is included in the DateTime being serialized this leads to unexpected results when querying because the resulting string values cannot be correctly lexicographically compared. For example 2024-11-23T12:30:00Z will be considered greater than 2024-11-23T12:30:00.123Z in Cosmos queries, even though when compared as actual DateTime values, the opposite is in fact true.

For this reason the Cosmos documentation recommends always serializing DateTime values in full round-trip format:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query/working-with-dates#storing-datetimes

Serializing into full round-trip format (.NET standard format string "O") preserves exactly 7 fraction digits even if the fraction is exactly 0. This solves the correctness problem by making all serialized DateTime values lexicographically comparable. In addition, it retains compatibility with the default ISO 8601 formats, because those allow for the fraction digits to be present even if they are all 0.

Despite this recommendation the default configuration of the Cosmos SDK for .NET (regardless of which serializer is used) makes no effort to configure for this guideline.

Expected behavior

I would expect the official Microsoft-provided .NET SDK for Cosmos DB to have default configuration that is compliant with the Cosmos recommendations.

Actual behavior

Instead, the SDK makes has no such configuration, and instead chooses to rely on the default behavior of the supported serializers, which leads to incorrect results that can be very hard to discover.

Workaround

To work around the issue, this converter can be used with the System.Text.Json serializer to ensure that serialization uses the full round-trip format. Note that conversion is only done on serialization - not during deserialization. This is because the round-trip format is compliant with the extended ISO 8601-1:2019 profile used by default. The default deserialization behavior parses up to 7 fraction digits, even if they are all zero.

public class RoundTripDateTimeConverter : JsonConverter<DateTime>
{
    public override DateTime Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        return reader.GetDateTime();
    }

    public override void Write(Utf8JsonWriter writer, DateTime value, JsonSerializerOptions options)
    {
        // The "O" standard format will always be between 27 and 33 characters depending on DateTime kind and time zone.
        Span<byte> utf8Date = new byte[33];

        if (!Utf8Formatter.TryFormat(value, utf8Date, out var numBytes, new StandardFormat('O')))
        {
            throw new FormatException("Failed to format the input DateTime value into a round-trip DateTime string.");
        }

        writer.WriteStringValue(utf8Date[..numBytes]);
    }
}

Environment summary
SDK Version: 3.46.0

@cliedeman
Copy link

Also investigating this while starting a new project using cosmos with the Ef Core Provider
The Ef core formatting only use the Z style when DateTime.Kind = Utc.

From the docs
Image

I have the following document

{
    "id": "1",
    "t1": "2021-01-01T02:00:00+02:00",
    "t2": "2021-01-01T02:00:00Z",
    "_rid": "TDIIALfNC78BAAAAAAAAAA==",
    "_self": "dbs/TDIIAA==/colls/TDIIALfNC78=/docs/TDIIALfNC78BAAAAAAAAAA==/",
    "_etag": "\"77033b18-0000-0c00-0000-674373760000\"",
    "_attachments": "attachments/",
    "_ts": 1732473718
}

but this query succeeds

SELECT * FROM c where c.t1 < '2021-01-01T02:01:00+02:00' and c.t1 > '2021-01-01T01:59:00+02:00'

which I fully expect not to work?
So its possible those docs are out of date.
Ef core will also omit the fraction digits so this issue is relevant for the Efcore Cosmos Provider too

@DaRosenberg
Copy link
Author

@cliedeman That query will succeed because the minutes are different, so even lexicographically, the comparison will never reach the point in the string where the presence or absence of fraction digits makes a difference.

@jcageman
Copy link

jcageman commented Dec 5, 2024

We typically add an additional field next to the datetime / datetimeoffsets with utcticks and compare the utcticks to prevent these issues with cosmos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer-reported Issue created by a customer needs-investigation
Projects
None yet
Development

No branches or pull requests

3 participants