
Zone Aggregation #8

Closed
wants to merge 32 commits into from

Conversation

DavidOry
Collaborator

Placeholder pull request for zone aggregation functionality

@DavidOry DavidOry added this to the Product 1A: Zone Creator MVP milestone May 26, 2022
@DavidOry
Collaborator Author

@jpn--, @JoeJimFlood
File references have been moved to the resources directory. @JoeJimFlood: you should now be able to run the demo notebook if you're interested.

jpn-- added 2 commits June 8, 2022 10:47
these are now in `resources`
@JoeJimFlood
Collaborator

JoeJimFlood commented Jun 9, 2022

@DavidOry @jpn-- I tried cloning the zone-agg branch of the repo and wasn't able to run the notebook. It looks like it's failing while the trip list is being loaded. Do I need to run it in Docker? Or is there something else I'm missing?

Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 trips = load_trip_list("trips_sample.pq", data_dir=data_dir)

File ~\.conda\envs\test1\lib\site-packages\sandag_rsm\data_load\triplist.py:19, in load_trip_list(trips_filename, data_dir)
     17 try:
     18     if trips_filename.endswith(".pq") or trips_filename.endswith(".parquet"):
---> 19         trips = pd.read_parquet(trips_filename)
     20     else:
     21         trips = pd.read_csv(trips_filename)

File ~\.conda\envs\test1\lib\site-packages\pandas\io\parquet.py:493, in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, **kwargs)
    446 """
    447 Load a parquet object from the file path, returning a DataFrame.
    448 
   (...)
    489 DataFrame
    490 """
    491 impl = get_engine(engine)
--> 493 return impl.read(
    494     path,
    495     columns=columns,
    496     storage_options=storage_options,
    497     use_nullable_dtypes=use_nullable_dtypes,
    498     **kwargs,
    499 )

File ~\.conda\envs\test1\lib\site-packages\pandas\io\parquet.py:347, in FastParquetImpl.read(self, path, columns, storage_options, **kwargs)
    343     path = handles.handle
    345 parquet_file = self.api.ParquetFile(path, **parquet_kwargs)
--> 347 result = parquet_file.to_pandas(columns=columns, **kwargs)
    349 if handles is not None:
    350     handles.close()

File ~\.conda\envs\test1\lib\site-packages\fastparquet\api.py:751, in ParquetFile.to_pandas(self, columns, categories, filters, index, row_filter)
    747         continue
    748     parts = {name: (v if name.endswith('-catdef')
    749                     else v[start:start + thislen])
    750              for (name, v) in views.items()}
--> 751     self.read_row_group_file(rg, columns, categories, index,
    752                              assign=parts, partition_meta=self.partition_meta,
    753                              row_filter=sel, infile=infile)
    754     start += thislen
    755 return df

File ~\.conda\envs\test1\lib\site-packages\fastparquet\api.py:361, in ParquetFile.read_row_group_file(self, rg, columns, categories, index, assign, partition_meta, row_filter, infile)
    358     ret = True
    359 f = infile or self.open(fn, mode='rb')
--> 361 core.read_row_group(
    362     f, rg, columns, categories, self.schema, self.cats,
    363     selfmade=self.selfmade, index=index,
    364     assign=assign, scheme=self.file_scheme, partition_meta=partition_meta,
    365     row_filter=row_filter
    366 )
    367 if ret:
    368     return df

File ~\.conda\envs\test1\lib\site-packages\fastparquet\core.py:608, in read_row_group(file, rg, columns, categories, schema_helper, cats, selfmade, index, assign, scheme, partition_meta, row_filter)
    606 if assign is None:
    607     raise RuntimeError('Going with pre-allocation!')
--> 608 read_row_group_arrays(file, rg, columns, categories, schema_helper,
    609                       cats, selfmade, assign=assign, row_filter=row_filter)
    611 for cat in cats:
    612     if cat not in assign:
    613         # do no need to have partition columns in output

File ~\.conda\envs\test1\lib\site-packages\fastparquet\core.py:580, in read_row_group_arrays(file, rg, columns, categories, schema_helper, cats, selfmade, assign, row_filter)
    577 if name not in columns:
    578     continue
--> 580 read_col(column, schema_helper, file, use_cat=name+'-catdef' in out,
    581          selfmade=selfmade, assign=out[name],
    582          catdef=out.get(name+'-catdef', None),
    583          row_filter=row_filter)
    585 if _is_map_like(schema_helper, column):
    586     # TODO: could be done in fast loop in _assemble_objects?
    587     if name not in maps:

File ~\.conda\envs\test1\lib\site-packages\fastparquet\core.py:549, in read_col(column, schema_helper, infile, use_cat, selfmade, assign, catdef, row_filter)
    547     piece[:] = i.codes
    548 elif d and not use_cat:
--> 549     piece[:] = dic[val]
    550 elif not use_cat:
    551     piece[:] = convert(val, se)

IndexError: index 132096 is out of bounds for axis 0 with size 131469

@jpn--
Collaborator

jpn-- commented Jun 9, 2022

That's an odd error. Maybe the trips_sample.pq file somehow got corrupted during the download? I'd try deleting it and downloading it again. If the same error persists, you can try Docker, or we can dig into it tomorrow when we talk.

@JoeJimFlood
Collaborator

I had been using an environment that I use for testing things out, but I realized I should create a new environment from the yaml file in the repo. After doing that, installing pyarrow, and upgrading scipy, the notebook appears to run all the way through successfully.
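
The fix described above can be sketched as shell commands. The environment file name and environment name below are assumptions for illustration; check the repo for the actual yaml file:

```shell
# Recreate the conda environment from the repo's yaml file
# (file name and env name are assumed, not taken from the repo)
conda env create -f environment.yml -n rsm-test
conda activate rsm-test

# Install pyarrow so pandas uses it instead of fastparquet,
# and upgrade scipy as described above
pip install pyarrow
pip install --upgrade scipy
```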

Removing the existing centroid connectors and creating new ones based on aggregate zone structure
@DavidOry DavidOry linked an issue Aug 9, 2022 that may be closed by this pull request
@DavidOry DavidOry marked this pull request as ready for review October 25, 2022 16:20
@DavidOry DavidOry changed the title Zone Aggregation (WIP) Zone Aggregation Oct 25, 2022
@DavidOry
Collaborator Author

@jpn--
Can we start over with a new PR or modify this one to back out the hundreds of files committed with the model reference?

@DavidOry DavidOry mentioned this pull request Nov 9, 2022
@DavidOry
Collaborator Author

DavidOry commented Nov 9, 2022

Replaced by #19


Successfully merging this pull request may close these issues.

PRODUCT 1A Zone Creator: First Pass at Algorithm for Transit Connectors
6 participants