Return Tables instead of NamedTuples. #59

evetion · 2023-08-19T14:36:37Z

Also fixes #50

This changes the return types of points and lines to Table or PartitionedTable, both of which implement the Tables interface. Technically breaking, but the tests are not broken, so I think the impact in practice is negligible. Old patterns like reduce(vcat, DataFrame.(points(g))) still work, but DataFrame(points(g)) is now possible, shorter and faster.
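As a rough illustration of what collapsing per-track partitions into one table involves, here is a minimal Base-only sketch. The helper name flatten_partitions and the toy columns are assumptions made for this example; this is not how SpaceLiDAR or DataFrames implement it.

```julia
# Two toy "partitions" (e.g. one per track), each a NamedTuple of equal-length columns.
parts = [
    (longitude = [1.0, 2.0], height = Float32[10, 11]),
    (longitude = [3.0], height = Float32[12]),
]

# Hypothetical helper: concatenate matching columns across all partitions.
# `map` over several NamedTuples with identical names returns a NamedTuple.
flatten_partitions(parts) = map((cols...) -> reduce(vcat, cols), parts...)

flat = flatten_partitions(parts)
```

The result is a single NamedTuple of vectors, which is already a valid column table for anything that consumes the Tables.jl interface.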

evetion · 2023-08-19T14:38:59Z

I also refactored the canopy/ground switches for ICESat-2 and GEDI, by just calling the underlying methods multiple times (one time for ground, one time for canopy, if both are enabled). This significantly reduces boilerplate and possible bugs.

evetion · 2023-08-19T14:50:12Z

Failing tests are unrelated (one slow download, nightly fails on HDF5, for which an issue has been made).

alex-s-gardner · 2023-08-21T19:01:22Z

Working my way through breaking changes. I had an internal function:

"""
    points_plus(granule::ICESat_Granule{}; bbox = (min_x = -Inf, min_y = -Inf, max_x = Inf, max_y = Inf))

returns the ICESat granule *WITH* granule information for each track
"""
function points_plus(
    granule::ICESat_Granule{};
    extent::Extent = world,
    )

    ptsplus = merge(points(granule, bbox = extent), (; granule_info = granule))
    return ptsplus
end

Since points now returns a table, I need to append granule as a column. Looking through the Tables.jl documentation, it's not readily clear how to do this.

Do you know of an easy way to append a new column to a table?

Thanks.

evetion · 2023-08-21T20:27:52Z

I'll add a merge method to Table so this can keep working. Note that it's a SpaceLiDAR specific table, it's not a Table from Tables.jl, but it does implement the Tables.jl interface.
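A minimal sketch of the wrapper idea described above, assuming a single-partition table backed by a NamedTuple of columns. SketchTable is a hypothetical stand-in, not the actual SpaceLiDAR type, and the Tables.jl interface methods are left out to keep it Base-only.

```julia
# Hypothetical wrapper type; owning the type lets a package dispatch on it.
struct SketchTable{NT<:NamedTuple}
    columns::NT
end

# Expose the wrapped NamedTuple so NamedTuple-based code can still reach it.
Base.parent(t::SketchTable) = t.columns

# Merging extra columns returns a new wrapper rather than a bare NamedTuple.
Base.merge(t::SketchTable, others::NamedTuple...) =
    SketchTable(merge(parent(t), others...))

t = SketchTable((longitude = [1.0, 2.0], latitude = [3.0, 4.0]))
t2 = merge(t, (; track = ["gt1l", "gt1l"]))
```

Note that this merge does no length checking, so a non-vector value (such as a granule) would still end up as a column that is not a vector.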

I'm not sure if your code ever worked for ICESat-2 and GEDI, as those returned a vector of NamedTuples (now they return a PartitionedTable).

Also, wouldn't it make more sense to store the granule info as metadata (depending on your output format)? I wrote points with the intent to have a table of equal length columns and simple types. Storing a non vector custom type goes against that, and can cause some headaches.

alex-s-gardner · 2023-08-21T20:33:11Z

> I'm not sure if your code ever worked for ICESat-2 and GEDI, as those returned a vector of NamedTuples (now they return a PartitionedTable).

I also had a method for ICESat2 and GEDI:

function points_plus(
    granule::Union{GEDI_Granule{}, ICESat2_Granule{}};
    extent::Extent = world,
    )
    ptsplus = merge.(points(granule, bbox = extent), Ref((; granule_info = granule)))
    return ptsplus
end

evetion · 2023-08-21T20:36:55Z

Can you comment on how you use the granule info? For example, you could further explode it, or store it as metadata in arrow. We could make it default?

I now recover similar info from the filename (my files only have the extensions renamed), which also isn't ideal.

alex-s-gardner · 2023-08-21T20:41:56Z

> Also, wouldn't it make more sense to store the granule info as metadata (depending on your output format)? I wrote points with the intent to have a table of equal length columns and simple types. Storing a non vector custom type goes against that, and can cause some headaches.

The way I've set up my pipeline is to segment the data by "geotiles", that is, X by X degree geographic extents that contain all mission data within the X degree bounding box. Data is extracted from the raw files using points and inserted as rows within a DataFrame. Each row represents a single dataset (or beam in our case); a single row has many observations and one "granule_info". DataFrames are then saved as Arrow files. This gives me an easy way to find what has already been downloaded and extracted and what still needs to be added, by appending to a DataFrame without needing to update an external file list that can become out of sync with the data files.

This approach is working well, but I will eventually overhaul the whole pipeline so that each row of a DataFrame contains a single point. When I do this I will make heavy use of FillArrays.

alex-s-gardner · 2023-08-21T20:46:12Z

The biggest bottleneck for my global processing pipeline is file I/O. This is why I've moved to implementing geotiles that segment the data by location.

evetion · 2023-08-22T08:43:45Z

I've added merge methods. Your points_plus should work again, and there's no need to have two separate methods for ICESat and ICESat-2 anymore. merge(points(g::ICESat2_Granule), (;g)) should just work.

evetion · 2023-08-23T14:05:21Z

Did you find other instances where this PR broke your code? I will hold off on releasing a new version until I've got this compatible with EarthData and have some version of HDF5Tables in.

alex-s-gardner · 2023-08-23T21:26:50Z

using SpaceLiDAR
using DataFrames
g = ICESat_Granule{:GLAH06}("GLAH06_634_1102_001_0073_1_01_0001.H5", "/Users/gardnera/data/icesat/GLAH06/034/raw/GLAH06_634_1102_001_0073_1_01_0001.H5", (type=:GLAH06, phase=1, rgt=0, instance=73, cycle=1, segment=1, version=1, revision=634), Vector{Vector{Vector{Float64}}}[])
tbl = merge(points(g), (; g))
DataFrame(tbl)

results in a MethodError: no method matching iterate(::ICESat_Granule{:GLAH06})

I think in this case we want the length of an ICESat_Granule to be 1 so that the named tuple occupies its own table cell.

evetion · 2023-08-24T07:40:02Z

Yeah, that won't work. Defining the granule as something to iterate on (which also requires a getindex on it) will lead to ERROR: DimensionMismatch: column :longitude has length 729117 and column :g has length 1.

But the error comes because DataFrame wants columns of matching length, and your points_plus function never did that, right? So DataFrame(points_plus(g)) has never worked before? (I went a few commits back to test it.)

alex-s-gardner · 2023-08-24T16:58:55Z

You're correct... apologies... the exact code that I'm trying to get working again is:

if mission == :ICESat
    df = DataFrame(points_plus.(row.granules, extent = row.extent))
elseif mission == :ICESat2 || mission == :GEDI
    df = reduce(vcat, DataFrame.(points_plus.(row.granules, extent = row.extent)))
end

With this PR I can create the tables:

using SpaceLiDAR
using DataFrames
g = [ICESat_Granule{:GLAH06}("GLAH06_634_1102_001_0073_1_01_0001.H5", "/Users/gardnera/data/icesat/GLAH06/034/raw/GLAH06_634_1102_001_0073_1_01_0001.H5", (type = :GLAH06, phase = 1, rgt = 0, instance = 73, cycle = 1, segment = 1, version = 1, revision = 634), Vector{Vector{Vector{Float64}}}[]),
 ICESat_Granule{:GLAH06}("GLAH06_634_1102_001_0073_2_01_0001.H5", "/Users/gardnera/data/icesat/GLAH06/034/raw/GLAH06_634_1102_001_0073_2_01_0001.H5", (type = :GLAH06, phase = 1, rgt = 0, instance = 73, cycle = 1, segment = 2, version = 1, revision = 634), Vector{Vector{Vector{Float64}}}[])]

tbl = merge.(points.(g), [(; granule_info = f) for f in g])

but I'm unable to make a DataFrame where each row makes up a single granule (ICESat) or beam (GEDI & ICESat2).

This makes sense, as points now returns a table where each observation occupies a single row. Your new approach is absolutely the way to go, but I need some way to provide backwards compatibility. To make my code work I need a way to reverse the Fill arrays and table properties so that I can treat a SpaceLiDAR Table as the original NamedTuple.

alex-s-gardner · 2023-08-24T20:49:10Z

As I mentioned before, I have been meaning to refactor my code to move away from storing each granule as its own row. Maybe this PR will force me to finally make the change. It's a fairly major change on my end as everything is built around rows = granules.

The one thing that is really lacking when moving from rows = granules to rows = single observation is the ability to trace the observation back to its source file. We can save this in the metadata, but as soon as we start concatenating tables it gets hard to maintain traceability.

What do you think is the best path forward? As of now merge will not work for this as it needs to know the number of rows returned from points to create the FillArray. Maybe merge could be modified to create a FillArray if the input is non-iterable.
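One way to picture the suggestion above is a merge that expands non-vector values to the column length before merging. The sketch below is Base-only and uses fill, which allocates; FillArrays.Fill(value, n) would be the lazy equivalent. The helper name merge_scalars is hypothetical, and the sketch assumes the table has at least one column.

```julia
# Hypothetical helper: expand any non-vector value to the length of the
# existing columns, then merge it in as a regular column.
function merge_scalars(cols::NamedTuple, extra::NamedTuple)
    n = length(first(cols))  # assumes at least one existing column
    expanded = map(v -> v isa AbstractVector ? v : fill(v, n), extra)
    return merge(cols, expanded)
end

cols = (longitude = [1.0, 2.0, 3.0], latitude = [4.0, 5.0, 6.0])
out = merge_scalars(cols, (; granule_id = "ATL08_example.h5"))
```

Every column in out has the same length, so a consumer such as DataFrame would no longer hit a DimensionMismatch on the added column.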

evetion · 2023-08-25T09:00:38Z

> As I mentioned before, I have been meaning to refactor my code to move away from storing each granule as its own row. Maybe this PR will force me to finally make the change. It's a fairly major change on my end as everything is built around rows = granules.

Maybe something changed in DataFrames over time in terms of automatic repeating non-iterable column values? I've added Base.parent to the Tables in SpaceLiDAR, so you get the original (vector of) NamedTuples back (the Tables are a simple wrapper around them, so we can dispatch on a type we own):

julia> SL.points(g)
SpaceLiDAR Table with 6 partitions

julia> parent(SL.points(g))
6-element Vector{NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :phr, :sensitivity, :scattered, :saturated, :clouds, :track, :strong_beam, :classification, :height_reference, :detector_id, :reflectance, :nphotons), Tuple{Vector{Float32}, Vector{Float32}, Vector{Float32}, Vector{Float32}, ...

If that doesn't work, code along the following lines would do the trick:

reduce(vcat, DataFrame.(Pair.(:track, SL.points(g)), :granule=>g))
  0.019452 seconds (2.51 k allocations: 3.784 MiB)
6×2 DataFrame
 Row │ track                              granule                           
     │ NamedTup…                          ICESat2_…                         
─────┼──────────────────────────────────────────────────────────────────────
   1 │ (longitude = Float32[117.077, 11…  ICESat2_Granule{:ATL08}("ATL08_2…
   2 │ (longitude = Float32[117.078, 11…  ICESat2_Granule{:ATL08}("ATL08_2…
   3 │ (longitude = Float32[117.112, 11…  ICESat2_Granule{:ATL08}("ATL08_2…
   4 │ (longitude = Float32[117.096, 11…  ICESat2_Granule{:ATL08}("ATL08_2…
   5 │ (longitude = Float32[117.137, 11…  ICESat2_Granule{:ATL08}("ATL08_2…
   6 │ (longitude = Float32[117.134, 11…  ICESat2_Granule{:ATL08}("ATL08_2…

> The one thing that is really lacking when moving from rows = granules to rows = single observation is the ability to trace the observation back to its source file. We can save this in the metadata, but as soon as we start concatenating tables it gets hard to maintain traceability.

> What do you think is the best path forward? As of now merge will not work for this as it needs to know the number of rows returned from points to create the FillArray. Maybe merge could be modified to create a FillArray if the input is non-iterable.

I think metadata is something we should support, as it could be passed if you save granules individually, and it will work with most IO (Arrow, GeoDataFrames), so you would get it back after you open the file again.

But indeed if you concatenate Tables further, the metadata will be lost. I save each granule with the same filename as the HDF5 (just with a different extension), so the unique id of the granule is preserved. From the filename, all granule information can be restored (except for the polygon, which we get from search). But logic in filenames is a bit frowned upon, so another solution is to make a String column granule=Fill(filename, length)? Yes, it's only a string, but you can get the granule back with SL.granule_from_file(filename) (note I will probably rename it to just SL.granule(filename)). If you don't want a granule, you can also use icesat2_info:

SL.icesat2_info(fn)
(type = :ATL08, date = Dates.DateTime("2020-08-12T23:54:29"), rgt = 742, cycle = 8, segment = 14, version = 6, revision = 1, ascending = true, descending = false)

If you want, you could even store these attributes as their own (Fill) columns. We just need a combine(type, date, rgt, cycle, segment, version, revision) function that could give you back the id (ATL08_20200812235429_07420814_006_01.h5).
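A minimal sketch of such a function, assuming the ICESat-2 naming pattern visible in the id above (type, timestamp, rgt/cycle/segment, version, revision). The name combine_id and its NamedTuple argument are hypothetical; nothing with this name is confirmed to exist in SpaceLiDAR.

```julia
using Dates

# Hypothetical helper: rebuild an ICESat-2 style file id from exploded attributes.
function combine_id(info)
    stamp = Dates.format(info.date, dateformat"yyyymmddHHMMSS")
    # rgt is zero-padded to 4 digits, cycle and segment to 2 digits each.
    orbit = lpad(info.rgt, 4, '0') * lpad(info.cycle, 2, '0') * lpad(info.segment, 2, '0')
    return string(info.type, "_", stamp, "_", orbit, "_",
                  lpad(info.version, 3, '0'), "_", lpad(info.revision, 2, '0'), ".h5")
end

info = (type = :ATL08, date = DateTime("2020-08-12T23:54:29"), rgt = 742,
        cycle = 8, segment = 14, version = 6, revision = 1)
id = combine_id(info)  # "ATL08_20200812235429_07420814_006_01.h5"
```

Other missions use different naming schemes, so a real implementation would presumably need one method per granule type.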

alex-s-gardner · 2023-08-29T18:52:47Z

> If you want, you could even store these attributes as their own (Fill) columns.

If we went down the path of storing type, date, rgt, cycle, segment, version, revision as Fill arrays, it might take up less memory when converted from a Fill to a vector, as would be needed when saving to any format other than JLD (file names take up considerable space). Should we implement this as the default, or is that breaking?
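To make the memory trade-off concrete, here is a Base-only sketch of what a Fill-style column is: one stored value plus a length. ConstVector is a hypothetical stand-in for FillArrays.Fill, written only to show why the lazy form is cheap and why converting it to a Vector costs n copies.

```julia
# Hypothetical stand-in for a Fill column: stores a single value and a length.
struct ConstVector{T} <: AbstractVector{T}
    value::T
    len::Int
end

Base.size(v::ConstVector) = (v.len,)

function Base.getindex(v::ConstVector, i::Int)
    @boundscheck checkbounds(v, i)
    return v.value
end

lazy = ConstVector("ATL08_20200812235429_07420814_005_01.h5", 10_000)
dense = collect(lazy)  # materializes 10_000 references to the same string
```

The lazy column stays the same size regardless of the row count, while the dense one grows linearly, which is why the element size matters once the table is written to a format without Fill support.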

alex-s-gardner · 2023-08-29T20:40:50Z

Empty granules return a NamedTuple when they should return a SpaceLiDAR Table

alex-s-gardner · 2023-08-31T21:20:47Z

@evetion once the empty granules issue is fixed I should be able to use parent to make my code backwards compatible

evetion · 2023-09-03T06:54:19Z

> If we went down the path of storing type, date, rgt, cycle, segment, version, revision as Fill arrays, it might take up less memory when converted from a Fill to a vector, as would be needed when saving to any format other than JLD (file names take up considerable space). Should we implement this as the default, or is that breaking?

Adding the metadata to the table is not breaking on top of this. Adding extra columns might be a grey area, but for performance I'd rather not have extra columns that I don't need (yet), and instead make it easy to add them?

Besides, I think doing a Fill(basename(granule.id)) probably requires less column overhead/memory than all the exploded attributes? If you also need it quick as a vector, you could use InlineStrings?

evetion · 2023-09-03T07:10:51Z

So the id of the granule alone takes less space than the exploded info NamedTuple. The InlineString takes just as much space as the NamedTuple.

julia> info(g)
(type = :ATL08, date = Dates.DateTime("2020-08-12T23:54:29"), rgt = 742, cycle = 8, segment = 14, version = 5, revision = 1, ascending = true, descending = false)

julia> sizeof(info(g))
64

julia> sizeof(g.id)
39

julia> typeof(InlineString(g.id))
String63

evetion · 2023-09-03T07:12:50Z

> @evetion once the empty granules issue is fixed I should be able to use parent to make my code backwards compatible

You didn't specify which product(s) has this problem, but I think I fixed this for GLAH06. Let me know if I missed one.

evetion · 2023-09-03T15:40:23Z

Ok, last big change. I've added support for metadata, and included functions to add either id or the granule info to the tables.

DataAPI metadata support. Arrow.jl will get support for it soon: apache/arrow-julia#481, so Tables/DataFrames saved with Arrow will retain their metadata.

julia> g = SL.granule(fn)
ICESat2_Granule{:ATL08}("ATL08_20200812235429_07420814_005_01.h5", "/Users/evetion/Downloads/ATL08_20200812235429_07420814_005_01.h5", (type = :ATL08, date = Dates.DateTime("2020-08-12T23:54:29"), rgt = 742, cycle = 8, segment = 14, version = 5, revision = 1, ascending = true, descending = false), Vector{Vector{Vector{Float64}}}[])

julia> t = points(g)
SpaceLiDAR Table with 6 partitions
julia> DataAPI.metadata(t)
Dict{String, Any} with 10 entries:
  "cycle"      => 8
  "descending" => false
  "revision"   => 1
  "segment"    => 14
  "id"         => "ATL08_20200812235429_07420814_005_01.h5"
  "rgt"        => 742
  "date"       => DateTime("2020-08-12T23:54:29")
  "ascending"  => true
  "type"       => :ATL08
  "version"    => 5
julia> df = DataFrame(t)
julia> DataAPI.metadata(df) == DataAPI.metadata(t)  # metadata is propagated.

Furthermore, I've included the following functions, which should help your workflow.

julia> t = SL.add_info(t)  # adds multiple columns from `info(g)`
julia> t = SL.add_id(t)  # adds :id column
julia> t[1].id
10509-element Fill{String}, with entries equal to ATL08_20200812235429_07420814_005_01.h5
julia> t[end].revision  # info fields are added to all tracks
5180-element Fill{Int64}, with entries equal to 1

alex-s-gardner · 2023-09-04T02:17:37Z

Looks like there is an issue with subsetting:

using Extents, DataFrames, SpaceLiDAR, Dates
g = ICESat2_Granule{:ATL06}("ATL06_20181201095523_09740106_005_01.h5", "/Users/gardnera/data/icesat2/ATL06/005/raw/ATL06_20181201095523_09740106_005_01.h5", (type=:ATL06, date=Dates.DateTime("2018-12-01T09:55:23"), rgt=974, cycle=1, segment=6, version=5, revision=1, ascending=false, descending=true), Vector{Vector{Vector{Float64}}}[]);
points(g)

works great but

ext = Extent(X = (-128.0, -126.0), Y = (50.0, 52.0))
points(g, bbox=ext)

returns this error:

ERROR: MethodError: no method matching SpaceLiDAR.PartitionedTable(::Tuple{NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{Dates.DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{Dates.DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{Dates.DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{Dates.DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{Dates.DateTime}, 
Vector{Bool}, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{Dates.DateTime}, Vector{Bool}, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}})

Closest candidates are:
  SpaceLiDAR.PartitionedTable(::NamedTuple)
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/granule.jl:169
  SpaceLiDAR.PartitionedTable(::Tuple{Vararg{NamedTuple{K, V}, N}}) where {N, K, V}
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/granule.jl:167

Stacktrace:
  [1] points(granule::ICESat2_Granule{:ATL06}; tracks::NTuple{6, String}, step::Int64, bbox::Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}})
    @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL06.jl:49
  [2] points
    @ ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL06.jl:24 [inlined]
  [3] points_plus(granule::ICESat2_Granule{:ATL06}; extent::Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}})
    @ Altim ~/Documents/GitHub/Altim.jl/src/utilities.jl:103
  [4] points_plus
    @ ~/Documents/GitHub/Altim.jl/src/utilities.jl:97 [inlined]
  [5] #43
    @ ./broadcast.jl:1297 [inlined]
  [6] _broadcast_getindex_evalf
    @ ./broadcast.jl:683 [inlined]
  [7] _broadcast_getindex
    @ ./broadcast.jl:656 [inlined]
  [8] _getindex
    @ ./broadcast.jl:680 [inlined]
  [9] _broadcast_getindex
    @ ./broadcast.jl:655 [inlined]
 [10] getindex
    @ ./broadcast.jl:610 [inlined]
 [11] copyto_nonleaf!(dest::Vector{DataFrame}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{DataFrame}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#43#44"{Base.Pairs{Symbol, Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}}, Tuple{Symbol}, NamedTuple{(:extent,), Tuple{Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}}}}}, typeof(points_plus)}, Tuple{Base.Broadcast.Extruded{Vector{ICESat2_Granule{:ATL06}}, Tuple{Bool}, Tuple{Int64}}}}}}, iter::Base.OneTo{Int64}, state::Int64, count::Int64)
    @ Base.Broadcast ./broadcast.jl:1068
 [12] copy
    @ ./broadcast.jl:920 [inlined]
 [13] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Type{DataFrame}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#43#44"{Base.Pairs{Symbol, Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}}, Tuple{Symbol}, NamedTuple{(:extent,), Tuple{Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}}}}}, typeof(points_plus)}, Tuple{Vector{ICESat2_Granule{:ATL06}}}}}})
    @ Base.Broadcast ./broadcast.jl:873
 [14] geotile_build(geotile_granules::DataFrame, geotile_dir::String, mission::Symbol; warnings::Bool)
    @ Altim ~/Documents/GitHub/Altim.jl/src/utilities.jl:531
 [15] top-level scope
    @ ~/Documents/GitHub/Altim.jl/src/geotile_build_archive.jl:71

ERROR: MethodError: no method matching points(::String)

Closest candidates are:
  points(::ICESat2_Granule{:ATL03}; tracks, step, bbox)
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL03.jl:26
  points(::ICESat2_Granule{:ATL03}, ::HDF5.H5DataStore, ::AbstractString, ::Float64)
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL03.jl:89
  points(::ICESat2_Granule{:ATL03}, ::HDF5.H5DataStore, ::AbstractString, ::Float64, ::Any)
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL03.jl:89
  ...

Stacktrace:
 [1] top-level scope
   @ ~/Documents/GitHub/Altim.jl/src/geotile_build_archive.jl:87

ERROR: UndefVarError: `Dates` not defined
Stacktrace:
 [1] top-level scope
   @ ~/Documents/GitHub/Altim.jl/src/geotile_build_archive.jl:86



SpaceLiDAR Table with 6 partitions

ERROR: MethodError: no method matching points(::ICESat2_Granule{:ATL06}; extent::Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}})

Closest candidates are:
  points(::ICESat2_Granule{:ATL06}; tracks, step, bbox) got unsupported keyword argument "extent"
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL06.jl:24
  points(::ICESat2_Granule{:ATL06}, ::HDF5.H5DataStore, ::AbstractString, ::Float64) got unsupported keyword argument "extent"
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL06.jl:52
  points(::ICESat2_Granule{:ATL06}, ::HDF5.H5DataStore, ::AbstractString, ::Float64, ::Any) got unsupported keyword argument "extent"
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL06.jl:52
  ...

Stacktrace:
 [1] kwerr(::NamedTuple{(:extent,), Tuple{Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}}}}, ::Function, ::ICESat2_Granule{:ATL06})
   @ Base ./error.jl:165
 [2] top-level scope
   @ ~/Documents/GitHub/Altim.jl/src/geotile_build_archive.jl:87

ERROR: MethodError: no method matching SpaceLiDAR.PartitionedTable(::Tuple{NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{DateTime}, Vector{Bool}, 
FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{DateTime}, Vector{Bool}, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}})

Closest candidates are:
  SpaceLiDAR.PartitionedTable(::NamedTuple)
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/granule.jl:169
  SpaceLiDAR.PartitionedTable(::Tuple{Vararg{NamedTuple{K, V}, N}}) where {N, K, V}
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/granule.jl:167

Stacktrace:
 [1] points(granule::ICESat2_Granule{:ATL06}; tracks::NTuple{6, String}, step::Int64, bbox::Extent{(:X, :Y), Tuple{Tuple{Float64, Float64}, Tuple{Float64, Float64}}})
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/ra53x/src/ICESat-2/ATL06.jl:49
 [2] top-level scope
   @ ~/Documents/GitHub/Altim.jl/src/geotile_build_archive.jl:87

evetion · 2023-09-04T06:16:46Z

Are you sure you checked out this branch, including the latest commits? The type signature of your methods is old and I can't replicate over here.

alex-s-gardner · 2023-09-04T18:31:50Z

I rm SpaceLiDAR, then add https://github.com/evetion/SpaceLiDAR.jl/tree/feat/tables, then restart Julia to ensure I have the latest version.

ext = Extent{(:X, :Y),Tuple{Tuple{Float64,Float64},Tuple{Float64,Float64}}}((X=(-128.0, -126.0), Y=(50.0, 52.0)))
g = ICESat2_Granule{:ATL06}("ATL06_20181201095523_09740106_005_01.h5", "/Users/gardnera/data/icesat2/ATL06/005/raw/ATL06_20181201095523_09740106_005_01.h5", (type=:ATL06, date=DateTime("2018-12-01T09:55:23"), rgt=974, cycle=1, segment=6, version=5, revision=1, ascending=false, descending=true), Vector{Vector{Vector{Float64}}}[])

points(g, bbox=ext)

results in:

ERROR: MethodError: no method matching SpaceLiDAR.PartitionedTable(::Tuple{NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{DateTime}, Vector{Bool}, 
FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{DateTime}, Vector{Bool}, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}}, ::ICESat2_Granule{:ATL06})

Closest candidates are:
  SpaceLiDAR.PartitionedTable(::Tuple{Vararg{NamedTuple{K, V}, N}}, ::G) where {N, K, V, G}
   @ SpaceLiDAR ~/.julia/packages/SpaceLiDAR/tJtlT/src/granule.jl:170

evetion · 2023-09-04T20:40:53Z

Thanks, that error message does correspond with the latest changes. I think I fixed it. The problem was in the empty defaults, where we had a Float64[] instead of the Float32[] used for the non-empty data.

ERROR: MethodError: no method matching  # --> scroll to the right a bit
SpaceLiDAR.PartitionedTable(::Tuple{
NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, 
NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, 
NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, 
NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float32}, Vector{DateTime}, BitVector, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, 
NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{DateTime}, Vector{Bool}, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}, 
NamedTuple{(:longitude, :latitude, :height, :height_error, :datetime, :quality, :track, :strong_beam, :detector_id, :height_reference), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float32}, Vector{Float64}, Vector{DateTime}, Vector{Bool}, FillArrays.Fill{String, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Bool, 1, Tuple{Base.OneTo{Int64}}}, FillArrays.Fill{Int8, 1, Tuple{Base.OneTo{Int64}}}, Vector{Float32}}}}, ::ICESat2_Granule{:ATL06})
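
The mismatch can be read off the signature: the candidate method takes `Tuple{Vararg{NamedTuple{K, V}, N}}`, so every partition has to share one `K` (column names) and one `V` (column types). In the message above, the first four partitions have `Vector{Float32}` and `BitVector` for `height_error` and `quality`, while the last two (the empty defaults) have `Vector{Float64}` and `Vector{Bool}`. A minimal sketch of that dispatch rule in plain Julia follows; `same_schema` is a hypothetical stand-in, not SpaceLiDAR's constructor, and only two of the ten columns are reproduced.

```julia
# Hypothetical stand-in with the same argument shape as the candidate method:
# all NamedTuples in the tuple must share field names K and column types V.
same_schema(parts::Tuple{Vararg{NamedTuple{K,V},N}}) where {N,K,V} = (N, K)

filled       = (height_error = Float32[0.1], quality = trues(1))   # Vector{Float32}, BitVector
empty_ok     = (height_error = Float32[],    quality = falses(0))  # same types, zero rows
empty_broken = (height_error = Float64[],    quality = Bool[])     # Vector{Float64}, Vector{Bool}

# Column types agree, so the method applies.
matched = same_schema((filled, empty_ok))

# Mixed column types leave no single V, so dispatch fails with a MethodError.
failed = try
    same_schema((filled, empty_broken))
    false
catch err
    err isa MethodError
end
```

The point of the sketch is only that empty defaults have to use exactly the same container and element types as the populated partitions, not merely compatible ones.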

alex-s-gardner · 2023-09-05T14:45:20Z

> I think I fixed it; the problem was in the empty defaults, where we had a Float64[] instead of the Float32[] used for the non-empty data.

That seems to have fixed it.

alex-s-gardner · 2023-09-05T14:48:06Z

> I've added support for metadata, and included functions to add either id or the granule info to the tables.

I'll test this today.

alex-s-gardner · 2023-09-05T18:31:25Z

julia> t = SL.add_info(t)  # adds multiple columns from `info(g)`
julia> t = SL.add_id(t)  # adds :id column

These are fantastic. One issue: if t is an empty table, no file id or info is added. My original points_plus returned a row full of empties with info, e.g.:

julia> DataFrame(t)
6×11 DataFrame
 Row │ longitude  latitude   height     height_error  datetime    quality    track            strong_beam     detector_id  height_reference  granule_info                      
     │ Array…     Array…     Array…     Array…        Array…      BitVector  Fill…            Fill…           Fill…        Array…            ICESat2_Gra…                      
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Float64[]  Float64[]  Float32[]  Float32[]     DateTime[]  Bool[]     Fill("gt1l", 0)  Fill(false, 0)  Fill(6, 0)   Float32[]         ICESat2_Granule{:ATL06}("ATL06_2…
   2 │ Float64[]  Float64[]  Float32[]  Float32[]     DateTime[]  Bool[]     Fill("gt1r", 0)  Fill(true, 0)   Fill(5, 0)   Float32[]         ICESat2_Granule{:ATL06}("ATL06_2…
   3 │ Float64[]  Float64[]  Float32[]  Float32[]     DateTime[]  Bool[]     Fill("gt2l", 0)  Fill(false, 0)  Fill(4, 0)   Float32[]         ICESat2_Granule{:ATL06}("ATL06_2…
   4 │ Float64[]  Float64[]  Float32[]  Float32[]     DateTime[]  Bool[]     Fill("gt2r", 0)  Fill(true, 0)   Fill(3, 0)   Float32[]         ICESat2_Granule{:ATL06}("ATL06_2…
   5 │ Float64[]  Float64[]  Float32[]  Float32[]     DateTime[]  Bool[]     Fill("gt3l", 0)  Fill(false, 0)  Fill(2, 0)   Float32[]         ICESat2_Granule{:ATL06}("ATL06_2…
   6 │ Float64[]  Float64[]  Float32[]  Float32[]     DateTime[]  Bool[]     Fill("gt3r", 0)  Fill(true, 0)   Fill(1, 0)   Float32[]         ICESat2_Granule{:ATL06}("ATL06_2…

This behavior made it easy to keep track of which granules had been searched. I'm not sure if you have any clever ideas on how to handle this. I suspect that with the updated implementation (a single observation per row) this becomes difficult.
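
To illustrate why an empty table carries no id: in a column table with one observation per row, an added column needs as many elements as there are rows, so zero rows means zero ids. A sketch with a plain NamedTuple of vectors follows; `add_id_sketch` and the id string are hypothetical and only mimic what `add_id` is described as doing above.

```julia
# Hypothetical helper: append a constant :id column to a column table
# (a NamedTuple of equal-length vectors). This is not SpaceLiDAR's add_id.
function add_id_sketch(t::NamedTuple, id::AbstractString)
    nrows = length(first(t))                      # row count taken from the first column
    return merge(t, (; id = fill(String(id), nrows)))
end

populated   = (longitude = [10.0, 10.1], height = Float32[1.5, 1.6])
unpopulated = (longitude = Float64[],    height = Float32[])

with_rows    = add_id_sketch(populated, "granule_a")    # two rows, two ids
without_rows = add_id_sketch(unpopulated, "granule_a")  # zero rows, so the id appears nowhere
```

The `:id` column exists in both results, but in the empty case it holds no values, so nothing downstream can tell which granule produced it.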

evetion · 2023-09-05T19:00:48Z

Cool, in that case I will merge this!

Regarding per-row saving: I save to a single file per granule, so I can just check the granule.id from the filename (or, with this PR, open the file and read it from the metadata). I can't comment much more without knowing the rest of the data workflow (and my own is a bit ugly). Let's split this into a separate issue.
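
One way to read the filename check: with one output file per granule, the set of already searched granules can be recovered from the directory listing, empty results included. This is a sketch under assumptions; the `.arrow` extension, the directory layout, the helper names and the granule ids are all made up for illustration.

```julia
# Hypothetical layout: one output file per granule, named "<granule id>.<ext>".
# An empty result still produces a file, so it still counts as searched.
processed_ids(dir::AbstractString) =
    Set(first(splitext(f)) for f in readdir(dir) if isfile(joinpath(dir, f)))

# Return the ids that do not yet have an output file in `dir`.
function todo(ids, dir)
    done = processed_ids(dir)
    return filter(id -> id ∉ done, ids)
end

outdir = mktempdir()
touch(joinpath(outdir, "granule_a.arrow"))   # pretend granule_a was already written

remaining = todo(["granule_a", "granule_b"], outdir)
```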

Note that I will probably not release this as a version immediately (unless you want me to), as I would like to refactor the search (to use EarthData.jl) and have a first version of HDF5Tables.jl in here.

alex-s-gardner · 2023-09-05T19:07:59Z

> will merge this

Great work! This is shaping up nicely.

evetion · 2023-09-05T19:09:18Z

See #61 for the workflow discussion.

Return Tables instead of NamedTuples.

9cd195a

evetion requested a review from alex-s-gardner August 19, 2023 14:36

Add merge methods to Tables.

c0098c9

Add parent to Tables.

75d83dd

evetion added 2 commits August 25, 2023 16:06

Some deprecations and documentation.

0e11433

Fix url clash.

bc88598

Test all points output are AbstractTables.

ffda7dd

evetion added 2 commits September 3, 2023 12:21

Fix test order.

a7a4aa3

Add granule to Table and add metadata DataAPI support.

613508a

Make sure defaults align.

d7ee24c

Update documentation and show methods.

78b840a

evetion merged commit 730f655 into master Sep 6, 2023
11 of 12 checks passed

evetion deleted the feat/tables branch September 6, 2023 05:57

evetion mentioned this pull request Sep 6, 2023

Should the name of the bbox kwarg be changed to extent #47

Closed
