NYC.gov has changed all of their trip data files to Parquet. The CSV files are no longer available through the provided S3 links.
The new link is https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet
But it requires some additional processing to follow along. This mostly applies to video DE Zoomcamp 1.2.2 - Ingesting NY Taxi Data to Postgres, but it may pop up in other places throughout the course.
First, install pyarrow:
pip install pyarrow
Then convert the Parquet file to a pandas DataFrame:
import pyarrow.parquet as pq
trips = pq.read_table('yellow_tripdata_2021-01.parquet')
df = trips.to_pandas()
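As a side note, pandas can also read the Parquet file directly in one step; this is an equivalent sketch, assuming pyarrow is installed as the Parquet engine:

import pandas as pd

# Reads the Parquet file straight into a DataFrame (uses pyarrow under the hood)
df = pd.read_parquet('yellow_tripdata_2021-01.parquet')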
Finally, run this command and wait. It will take a while and then return a number when it is finished.
df.to_sql(name='yellow_taxi_data', con=engine, if_exists='replace', chunksize=100000)
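For context, engine here is the SQLAlchemy engine created earlier in the video. Below is a minimal sketch, assuming the Postgres container from the course is running locally with user root, password root, port 5432, and database ny_taxi (adjust these to your own setup), and that sqlalchemy plus a Postgres driver such as psycopg2-binary are installed:

from sqlalchemy import create_engine

# Assumed connection details from the course setup; change them to match your environment
engine = create_engine('postgresql://root:root@localhost:5432/ny_taxi')

# Write the DataFrame to Postgres in chunks of 100,000 rows
df.to_sql(name='yellow_taxi_data', con=engine, if_exists='replace', chunksize=100000)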
Alternatively, the .csv files could be added to the repo, with the course links pointing to those instead.