Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make localdate queries behave like canonical datetime range filters #968

Merged
merged 2 commits into from
May 17, 2024

Conversation

JGreenlee
Copy link
Contributor

The existing implementation that was here was addressing the localdate fields separately and treating them like a set of filters rather than one date range from start -> end. The "set of filters" was good enough for some things, like filtering to a specific month, but not for the canonical datetime range filtering where you want everything between an arbitrary start and end datetime. The complexity in the logic to form the necessary NoSQL queries is more than expected, but after the dust settles it's about the same amount of code as the old implementation was.

In short, this implementation finds what units are used by both start and end localdates, walks down them (from largest unit to smallest unit), and combines $and and $or blocks to determine the upper and lower bound

The existing implementation that was here was addressing the localdate fields separately and treating them like a set of filters rather than one date range from start -> end.
The "set of filters" was good enough for some things, like filtering to a specific month, but not for the canonical datetime range filtering where you want everything between an arbitrary start and end datetime.
The complexity in the logic to form the necessary NoSQL queries is more than expected, but after the dust settles it's about the same amount of code as the old implementation was.

In short, this implementation finds what units are used by both start and end localdates, walks down them (from largest unit to smallest unit), and combines $and and $or blocks to determine the upper and lower bound
@JGreenlee JGreenlee marked this pull request as draft May 16, 2024 07:41
@JGreenlee
Copy link
Contributor Author

The failing test:

    def testLocalRangeRolloverQuery(self):
        """
        Search for all entries between 8:18 and 9:08 local time, both inclusive
        """
        start_local_dt = ecwl.LocalDate({'year': 2015, 'month': 8, 'hour': 8, 'minute': 18})
        end_local_dt = ecwl.LocalDate({'year': 2015, 'month': 8, 'hour': 9, 'minute': 8})
        final_query = {"user_id": self.testUUID}
        final_query.update(esdl.get_range_query("data.local_dt", start_local_dt, end_local_dt))
        entries = edb.get_timeseries_db().find(final_query).sort('data.ts', pymongo.ASCENDING)
        self.assertEqual(448, edb.get_timeseries_db().count_documents(final_query))

        entries_list = list(entries)

        # Note that since this is a set of filters, as opposed to a range, this
        # returns all entries between 18 and 8 in both hours.
        # so 8:18 is valid, but so is 9:57
        self.assertEqual(ecwe.Entry(entries_list[0]).data.local_dt.hour, 8)
        self.assertEqual(ecwe.Entry(entries_list[0]).data.local_dt.minute, 18)
        self.assertEqual(ecwe.Entry(entries_list[-1]).data.local_dt.hour, 9)
        self.assertEqual(ecwe.Entry(entries_list[-1]).data.local_dt.minute, 57)

Since the query returned from get_range_query now behaves like a proper datetime range query, we should only see the entries in the time between 8:18 and 9:08. 9:57 is not valid anymore.

@JGreenlee
Copy link
Contributor Author

The most recent run of the test found 232 entries, but expected 448. That is because it was expecting entries after 9:08.
I inspected shankari_2015-aug-27 and verified that there are indeed 232 entries if 9:08 (inclusive) is the cutoff.

I am going to modify the test to expect 232 entries.

@JGreenlee JGreenlee marked this pull request as ready for review May 16, 2024 17:54
Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGreenlee I am fine with merging this for now to unblock the dashboard changes, but this is fairly complex, so could you please add in some comments, and maybe examples of the comparison query outputs so we can understand the code better.

if len(gte_lte_query) > 0:
query_result.update({curr_field: gte_lte_query})
def get_range_query(field_prefix, start_ld, end_ld):
units = [u for u in ecwl.DATETIME_UNITS if u in start_ld and u in end_ld]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, what if u is in start_ld but not in end_ld. If I specify a month in start_ld but no month in end_ld, I intuitively want to get all trips after the start but be open-ended wrt the end. Alternatively, you should check that there always is a matching pair as I did before.

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, at a high level, this converts the local date queries into ranges, but drops the old filtering capability completely. For local dates, I think we actually want both - I want to be able to see data for a range based on local time, but I also want to be able to see only weekday trips versus weekend trips (for example).

Again, the current implementation was a weird mishmash of range and filter, so I am glad we fixed it. I am merging this now, but I do think we should add back filter support at some point, maybe when we actually need it.

@shankari shankari merged commit 36432a8 into e-mission:master May 17, 2024
5 checks passed
@JGreenlee
Copy link
Contributor Author

Will address these comments when I get a chance – trying to finish the metrics implementation first

JGreenlee added a commit to JGreenlee/e-mission-server that referenced this pull request Jun 7, 2024
I recently rewrote the local dt queries used by TimeComponentQuery (e-mission#968) to make them behave as datetime range queries. But this results in some pretty complex queries which could have nested $and and $or conditions. It felt overengineered but that was the logic required if we were to query date ranges by looking at local dt objects where year, month, day, etc are all recorded separately.
Unfortunately, those queries run super slowly on production. I think this is because the $and / $or conditions prevented MongoDB from optimizing efficiently via indexing

Looked for a different solution, I found something better and simpler.
At first I didn't think using fmt_time fields with $lt and $gt comparisons would work becuase they're ISO strings, not numbers.
But luckily MongoDB can compare strings this way. it performs a lexicographical comparison where 0-9 < A-Z < a-z (like ASCII).
ISO has the standard format 0000-00-00T00:00:00
The dashes and the T will always be in the same places, so effectively, only the numbers will be compared. And the timezone info is at the end, so it doesn't get considered as long as we don't include it in the start and end inputs.
The only thing that specifically needs to be handled is making the end range inclusive. If I query from "2024-06-01" to "2024-06-03", an entry like "2024-06-03T23:59:59-04:00" should match. But lexicographically, that comes after "2024-06-03".
Appending a "Z" to the end range solves this.

The result of all this is that the TimeQuery class (timequery.py) is now able to handle either timestamp or ISO dates, and this is what metrics.py will use for summarize_by_yyyy_mm_dd.
JGreenlee added a commit to JGreenlee/e-mission-server that referenced this pull request Jun 14, 2024
e-mission#970 (comment)

reverts e-mission#968 back to the original implementation since we now have a faster way to do "range" queries via FmtTimeQuery.

To make it clearer that FmtTimeQuery is for ranges and TimeComponentQuery is for filters, I renamed some functions and added more descriptive comments/docstrings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Address code reviews with "future"
Development

Successfully merging this pull request may close these issues.

2 participants