-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
calendarDuration colon operator bad results: over hi bound, wrong month-end normalization, etc #135
Comments
Interesting. This looks like it's somewhat about how Leap Day is handled when adding a calendar-year increment.
That last vec1(4) of 2026-03-01, higher than fecha2's 2025-01-1, is just wrong. I bet there's an off-by-one bug in the end condition there. As for the other differences: 2020 was a leap year, so Feb 29 2020 existed. But there's no Feb 29 in 2022. My code is normalizing it up to March 1, like what happens when you pass "denormalized" out-of-range values to a datetime() constructor call.
But it looks like when adding a calyear (or calmonth), Matlab normalizes the nonexistent Feb 29 leap day down to Feb 28 instead, on non-leap years. I wonder if there's documentation or reference for this behavior. The Matlab doco page for `calyears says "Calendar years account for leap days when used in calendar calculations", but it doesn't say how it accounts for them. (Same for the calmonths and calendarDuration pages, to my eyes.) Matlab datetimes and datenums are ISO calendar dates, I believe. I wonder if the ISO date/time standards specify how calendar interval arithmetic works, and how to handle leap years in them. The other aspect of this seems to be how calendar intervals work when used as the middle operand in a "lo:d:hi" two-colon expression. It is surprising to me that
In this case, once But instead, it looks like it's doing
That sounds more like (Octave's documentation for colon isn't as detailed and doesn't seem to provide any guidance here.) Fixing the upper bound bug is straightforward. I'll go ahead and do that. I want to do a bit more reading to better understand the definition/specification here around the leap day normalization and "stride". (If you have any relevant links or books, I'd be happy to hear them.) But it sounds like Tablicious here needs two changes for Matlab compatibility: have datetime + calyears normalize down instead of up for nonexistent Feb 29s, and change the colon operator to do |
About the differences with Matlab in the leap day topic, I think the better option is to match the behavior of Matlab. Some developers have said sometimes that Matlab-compatibility is a goal (see, for example, the comment of Philip Nienhuis (comment number 9) in https://savannah.gnu.org/bugs/index.php?60107) |
Yeah, Matlab compatibility is a main goal for Tablicious too, and that's almost certainly the right thing to do here. I just wish I knew why Matlab was doing it this way, and the issues around it in the "calendrical computations" area, so I could do a more complete fix. I like to have a decent understanding of the code I'm writing and the behavior it's trying to do. That way, I may be able to catch other related issues in that area (like, maybe leap seconds or daylight saving time have similar effects) and get them right too, maybe in an "elegant" way in terms of the classes' internal data representation, instead of fixing individual problems in a patchwork manner as they're discovered. In particular, maybe this "leap day / add 1 calyear" behavior isn't limited to how Either way, I plan to make this behavior change to try to match Matlab, even if I can't find more thorough documentation on it. |
Okay, I think this is actually a bigger issue than just leap days. I think it applies to any calyears or calmonths calendarDuration addition that would end up with a "denormalized" date where the day-of-month value is larger than the number of days in that month. I think Matlab's caldur behavior "caps" or rounds it down to the last day within that month, and then preserves the time-of-day part. For example, January has 31 days. February has either 28 or 29 days. So, take 2004-01-31 and add 1 calendar month, or 1 cal year and 1 cal month. Tablicious currently does this:
But I think, to match Matlab, it should produce Feb 29 2004 and Feb 28 2005 instead. This is different from how the datenum function and datetime constructor work, where denormalized days "carry over" or advance the value into the next month. E.g., The logicSo, I think the fix here is to change the calendarDuration addition logic to be:
And subtraction would do the same thing, except multiplying all the caldur components by -1 first. I think this agees with the Matlab calendarDuration doco page which says, in the Tips section, "When you add a calendarDuration array that contains more than one unit to a datetime, MATLAB® always adds the larger units first.". Note that this means that calendarDuration values do not have the normal commutative and associative math behavior that regular numbers do. (Even beyond floating-point rounding issues.) These following operations will not necessarily produce the same result.
Data structureWhile I'm in here, I think Tablicious's calendarDuration structure can be refactored to simplify it. The Sign property can be removed, letting the sign of the Years/Months/Days/Time numeric values handle it. Plus, Years and Months can be combined into a single Months property, since here a calyear is defined to be 12 calendar months. Splitting them in to years, months, and quarters can be done as part of the display formatting instead. I think that'll make for simpler code in the implementation, including in the display formatting. Plus, smaller data structure using less memory, though that probably doesn't matter, because who's going to be making big calendarDuration arrays? Unless we do so as a temporary value in the Maybe drop the IsNan property too, and just carry NaN values in the Months/Days/Time properties, normalizing them to all be NaN when any is set to NaN. They're private-access, so we can control that. Fractional cal year and month valuesAnd I think we need to prohibit fractional (non-integer) values for calyears and calmonths components. Because they're variable-length things, doing fractional operations on them may not be well-defined, or if they are, may not really be sensible, or at least would be pretty difficult to implement. I think Matlab prohibits them. We should too, to match that behavior, and keep our code and behavior simpler. Like, this here. That shouldn't be a thing.
|
In Matlab (R2024a) the results are
calyears() and calmonths() in Matlab works only with integer values of years and months. There ir also a caldays() function
|
…eatedly added inc for caldurs Addresses #135. The hi bound test was a dumb off-by-one-ish logic bug in the control flow logic. Changing the calendarDuration inc behavior to multiply the inc on each loop and add it to the original lo, instead of repeatedly adding the inc to a working value, handles "displacement" due to variable-length months and the like, where the working value can get jiggered around by month-end normalization (capping to last day of month) and similar stuff. This doesn't fix the actual month-end capping/normalization for calendarDuration arithmetic, but makes colon support it when it does get fixed, over in datetime.plus or calendarDuration.plus. That fix should be coming soon.
Okay, I think I've got a fix for the bogus upper-bound behavior over on branch caldur-math-fixes, in b26975f. This is a partial fix for this bug report. This fixes basically an off-by-one error in the upper-bound test and loop logic in the calendarDuration part of datetime.colon. This also changes the datetime.colon logic for calendarDuration to multiply the inc (caldur) and add it to lo for each successive element, to get rid of "jigger" from varying-length periods. This will make it respect the month-end "capping" normalization, once that gets added to datetime.plus. Also adds the missing mtimes method. Here's the new behavior. Look better to you?
Notice how the 01-Mar normalization toward the end didn't "stick" for subsequent elements. Should add some regression tests for this, but I'm short on free time. |
Thank you for the fix! The results of your examples in Matlab are:
Apparently there is still present the problem of 28-Feb or 01-March when dealing with leap years and durations of a whole year |
That's expected! I haven't gotten to that part yet. Need to make a modification to |
Okay, I think I've got a fix for the last-day-of-month capping, plus a couple other things, in 87f6afd. What do you think of this behavior?
Found a couple more problems while I was in there. Notably, the |
Looks great! Thank you very much! |
See this example:
When I create a sequential vector of dates using the colon operator all works apparently well as the vector stops in an element lower than the last date.
But when I use years the result is not correct:
in Matlab the result is
the last element is always <= the greater limit. The bug is also present using calmonths()
I don't know if this bug is related with #134
Andrew's notes
Concise repro code:
TODO
hi:inc:lo
case for negativeinc
values. Currently just returns an empty array.lo:inc:hi
produces a descending sequence of values, and vice versa forhi:inc:lo
giving ascending values.The text was updated successfully, but these errors were encountered: