-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: adding support for other data types in orjson_dumps #1938
Conversation
Codecov ReportAll modified lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1938 +/- ##
==========================================
- Coverage 95.56% 90.63% -4.94%
==========================================
Files 14 237 +223
Lines 519 14841 +14322
==========================================
+ Hits 496 13451 +12955
- Misses 23 1390 +1367
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just a question below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice !
The datetime formatting is like b'"2022-01-15T00:00:00"'
and numpy arrays is like b'[0,1,2,3,4]'
which is perfect
Cc @severo in case it requires changes in the front-end (might work out of the box no?) |
This should fix #86 btw :) |
By default, it will return a string. It should work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just another comment: the datetime and the Timestamp are not currently aligned
- datetime:
b'"2017-01-01T12:10:20.345000"'
- Timestamp:
b'"2017-01-01 12:10:20.345000"'
If the alignment is important, this could be achieved with:
def orjson_default(obj):
...
if isinstance(obj, pd.Timestamp):
return obj.to_pydatetime()
Thanks @albertvillanova! I added your suggestion. |
Fixed:
|
you rock! |
Part of #1443
Currently, we have 105 cache records with error types like:
The problem is at https://github.com/huggingface/datasets-server/blob/main/libs/libcommon/src/libcommon/utils.py#L120 trying to serialize the values.
After this PR, I will force refresh the affected splits (most of them are for
split-first-rows-from-streaming
)Also there is an error for "split-descriptive-statistics":
"Dict key must be str"
This was due to non-string keys in ClassLabels like ints. After this PR, I will force refresh the affected splits (49 records)