-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix decode
corner case where result isn't promoted
#163
base: main
Are you sure you want to change the base?
Conversation
src/samples.jl
Outdated
@@ -371,13 +371,15 @@ If: | |||
|
|||
sample_data isa AbstractArray && | |||
sample_resolution_in_unit == 1 && | |||
sample_offset_in_unit == 0 | |||
sample_offset_in_unit == 0 && | |||
eltype(sample_data) == typeof(sample_resolution_in_unit) == typeof(sample_offset_in_unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when called with Samples
, sample_resolution_in_unit
and sample_offset_in_unit
will always be Float64, so this effectively means decode will always return Float64
values. this is bad for both "categorical" type signals (where the signal data actually is an integer which represent an index) and for Float32
signals which a lot of ML models use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this guard is here in teh first place is that it used to be possible (in Onda 0.14 I think) to use integer 0/1 for the offset/resolution in SamplesInfo
:
Lines 66 to 67 in 2e33a6e
sample_resolution_in_unit::LPCM_SAMPLE_TYPE_UNION = convert_number_to_lpcm_sample_type(sample_resolution_in_unit), | |
sample_offset_in_unit::LPCM_SAMPLE_TYPE_UNION = convert_number_to_lpcm_sample_type(sample_offset_in_unit), |
That changed with the switch to SamplesInfoV2
:
Lines 80 to 81 in 36a7be9
sample_resolution_in_unit::Float64 | |
sample_offset_in_unit::Float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are categorical signals encoded at all? It seems like they should use encoded=false
which would address that concern. I agree that Float32
shouldn't be converted to a Float64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encoded
is not a property of the stored signal, it's specific to in-memory Samples
objects, so there's no way for a signal to be encoded or decoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely annoying...
I've revised the change to now work with categorical signals and I'll mention that there may be a pre-existing issue with |
test/samples.jl
Outdated
decoded = decode(samples, Float32) | ||
@test eltype(decoded.data) === Int16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to have it such that decode
automatically converts the element type on data which doesn't match the specified type. I think I've done enough here so I'll leave that for a future contributor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the remaining work to do on #141
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the remaining concerns are just about this portion so I'll get this done after all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I still don't totally see what this gets us over the decode!
approach from #141. Even with this, we'll still hit the issue where if the samples are already decoded you wont' get the expected eltype...
I guess the main advantage is that decode(::Samples, ::Type{T})
is no longer (as much of) a lie, but you're still not guaranteed to get T
(since we're doing a promote).
The output of decode(::Samples, ::Type{T})
woudl still change in important ways with this PR: the return value no longer aliases the input on the 0/1 case, and the eltype may be different from teh status quo. Whether that's fixing a bug or is actual breakage is kind of a judgment call; given that the current behavior is documented, that in internal discussions at Beacon this has been generally found to be intentional, and that Onda is deep in our tech stack, I'd lean towards being conservative and still calling this breaking even if the current behavior is potentially confusing.
src/samples.jl
Outdated
If `samples.encoded` is `true`, return a `Samples` instance that wraps | ||
|
||
decode(convert(T, samples.info.sample_resolution_in_unit), | ||
convert(T, samples.info.sample_offset_in_unit), | ||
samples.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this detail is still important here, since it affects the promotion that happens...
I've updated the PR one more time and now
I did a search through the Beacon tech stack on usages of If we decide to go the breaking route there's a few more things I would change to make the extra work worth it. I'm in favor of making this non-breaking and rolling it back if need be. |
@testset "categorical samples" begin | ||
# Signal must use a resolution of one and a offset of zero | ||
info = SamplesInfoV2(; sensor_type="eeg", | ||
channels=["a", "b"], | ||
sample_unit="unit", | ||
sample_resolution_in_unit=1, | ||
sample_offset_in_unit=0, | ||
sample_type=Int16, | ||
sample_rate=200) | ||
samples = Samples(rand(-Int16(20):Int16(20), 2, 5), info, false) | ||
|
||
# Requires decoding | ||
encoded = Samples(samples.data, samples.info, true) | ||
decoded = decode(encoded) | ||
@test eltype(decoded.data) === Int16 | ||
@test decoded.data === samples.data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regardless of whether we decide to change the behavior, I do think that having an explicit test for this behavior is important since it's something we are relying on for a number of Beacon-internal things.
@jrevels could we get your review on this change as you are the original author of this code? |
I was confused as the element type wasn't updated when I decoded these results:
Fixes: #141