Bug: rt_segment_speeds KeyError... gtfs_segments related? #1324

Closed
1 of 5 tasks
edasmalchi opened this issue Dec 17, 2024 · 2 comments
Labels
admin Administrative work bug Something isn't working data Work related to the management of data gtfs-rt Work related to GTFS-Realtime open-data Work related to publishing, ingesting open data

Comments

@edasmalchi
Member

edasmalchi commented Dec 17, 2024

Where did the bug occur?
Select from the below, and be sure to affix the appropriate label to this issue (e.g. dataset, jupyterhub, metabase, analysis.calitp.org)

  • Data (the warehouse)
  • JupyterHub
  • Metabase
  • analysis.calitp.org
  • Other (add detail)

Describe the bug
Seems like the script is trying to drop a column that no longer exists.
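For anyone skimming, the failure mode can be reproduced with plain pandas. This is only a minimal sketch, not the actual code in cut_stop_segments.py: the `stop_sequence` column and the `after_step` frame are hypothetical stand-ins, and only `arrival_time1` comes from the traceback below. It also shows one possible defensive pattern (dropping only columns that are present), which is not necessarily the fix that was merged.

```python
import pandas as pd

# Hypothetical stand-in for the segments table; only arrival_time1
# is taken from the traceback, the rest is made up for illustration.
df = pd.DataFrame(
    {"stop_sequence": [1, 2], "arrival_time1": ["08:00:00", "08:05:00"]}
)

# Simulate an intermediate step that no longer hands the column back.
after_step = df.drop(columns=["arrival_time1"])

# Dropping a column that is already gone raises KeyError by default.
try:
    after_step.drop(columns=["arrival_time1"])
    raised = False
    message = ""
except KeyError as err:
    raised = True
    message = str(err)

# One defensive option: only drop the columns that are actually present.
to_drop = [c for c in ["arrival_time1"] if c in after_step.columns]
safe = after_step.drop(columns=to_drop)
```

pandas also accepts `errors="ignore"` in `DataFrame.drop`, but silently ignoring a missing column can hide exactly the kind of upstream change described here, so an explicit check may be easier to debug.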

I noticed that the script calls gtfs_segments between renaming that column and dropping it. We now seem to have version 2.1.7 installed instead of 0.1.0, so I wonder if something about that package has changed...
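A quick way to confirm which version is installed in a given environment is the standard library's `importlib.metadata`. This is a rough sketch: the distribution name `gtfs-segments` is an assumption about how the package is published, and `installed_version` and `major_version` are hypothetical helper names, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version_string.split(".")[0])


if __name__ == "__main__":
    # "gtfs-segments" is an assumed distribution name; adjust if it differs.
    found = installed_version("gtfs-segments")
    if found is None:
        print("gtfs-segments is not installed in this environment")
    else:
        print(f"gtfs-segments {found} (major version {major_version(found)})")
```

A jump in major version, such as 0.1.0 to 2.1.7, is a reasonable hint that the package's behavior may have changed, though it does not by itself show what changed.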

@tiffanychu90 any ideas? I don't have time to look into this quite yet, but might be able to circle back before going on vacation Friday.

To Reproduce
Run rt_segment_speeds pipeline

Expected behavior
rt_segment_speeds pipeline completes

Additional context

python cut_stop_segments.py
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/utils.py", line 195, in raise_on_meta_error
    yield
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 6450, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/utils.py", line 729, in drop_by_shallow_copy
    df2.drop(columns=columns, inplace=True, errors=errors)
  File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/frame.py", line 5399, in drop
    return super().drop(
  File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py", line 4505, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py", line 4546, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6934, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: "['arrival_time1'] not found in axis"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jovyan/data-analyses/rt_segment_speeds/scripts/cut_stop_segments.py", line 138, in <module>
    segments = cut_stop_segments(analysis_date)
  File "/home/jovyan/data-analyses/rt_segment_speeds/scripts/cut_stop_segments.py", line 98, in cut_stop_segments
    segments = (segments.drop(
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 5181, in drop
    return self.map_partitions(drop_by_shallow_copy, columns, errors=errors)
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 867, in map_partitions
    return map_partitions(func, self, *args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 6519, in map_partitions
    meta = _get_meta_map_partitions(args, dfs, func, kwargs, meta, parent_meta)
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 6631, in _get_meta_map_partitions
    meta = _emulate(func, *args, udf=True, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 6450, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/opt/conda/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/utils.py", line 216, in raise_on_meta_error
    raise ValueError(msg) from e
ValueError: Metadata inference failed in `drop_by_shallow_copy`.

You have supplied a custom function and Dask is unable to 
determine the type of output that that function returns. 

To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
KeyError("['arrival_time1'] not found in axis")

Traceback:
---------
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/utils.py", line 195, in raise_on_meta_error
    yield
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/core.py", line 6450, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/opt/conda/lib/python3.9/site-packages/dask/dataframe/utils.py", line 729, in drop_by_shallow_copy
    df2.drop(columns=columns, inplace=True, errors=errors)
  File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/frame.py", line 5399, in drop
    return super().drop(
  File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py", line 4505, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py", line 4546, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6934, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
@edasmalchi edasmalchi added admin Administrative work bug Something isn't working data Work related to the management of data gtfs-rt Work related to GTFS-Realtime open-data Work related to publishing, ingesting open data labels Dec 17, 2024
@tiffanychu90
Member

tiffanychu90 commented Dec 18, 2024

@edasmalchi: Closing because I already caught this...you can cherry pick this commit on this or rebase on main after I merge my #1325 in.
Yes, the package did change and now requires an arrival_time column (because it's actually wrapped more closely with other stuff the package is able to do).

@edasmalchi
Member Author

oh perfect!
