GTFS-RT => Dagster (runs each minute) => GeoParquet + SQLite
.
├── src/
│   └── gtfs_pipeline/        # Main pipeline package
│       ├── api_utils.py      # GTFS-RT API utilities
│       ├── definitions.py    # Pipeline definitions
│       └── assets/           # Dagster assets
├── data/                     # Data storage directory
│   ├── db/                   # SQLite database
│   └── geoparquet/           # GeoParquet files
├── read_gtfs_rt.py           # Map visualization script
├── feeds_config.yaml         # Feed configuration
└── dagster.yaml              # Dagster configuration
- Use uv to set up the project, the .env file, etc.
- Install dependencies:
pip install -e .
- Configure your GTFS-RT feeds in feeds_config.yaml:
custom_feeds:
  agency_name:
    url: "https://agency-gtfs-rt-feed-url"
    api_token: "your-api-token"  # Optional
    headers: {}  # Optional additional headers
- Start the Dagster daemon:
dagster dev
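To illustrate how the feed entries in feeds_config.yaml might be consumed, here is a minimal sketch that turns one parsed entry into an HTTP request. It is not the code in api_utils.py: the helper name build_request is hypothetical, the config is written as a plain dict standing in for the parsed YAML, and sending api_token as a Bearer token is an assumption (many agencies expect a custom header or a query parameter instead).

```python
from urllib.request import Request

# Stand-in for the parsed contents of feeds_config.yaml
# (roughly what a YAML loader would return for the example above).
config = {
    "custom_feeds": {
        "agency_name": {
            "url": "https://agency-gtfs-rt-feed-url",
            "api_token": "your-api-token",  # Optional
            "headers": {},                  # Optional additional headers
        }
    }
}


def build_request(feed: dict) -> Request:
    """Build (but do not send) a request for one feed entry. Hypothetical helper."""
    headers = dict(feed.get("headers") or {})
    token = feed.get("api_token")
    if token:
        # ASSUMPTION: the token is passed as a Bearer token. Check your
        # agency's documentation; the real pipeline may do this differently.
        headers.setdefault("Authorization", f"Bearer {token}")
    return Request(feed["url"], headers=headers)


# One prepared request per configured feed, keyed by feed name.
requests_by_feed = {
    name: build_request(feed) for name, feed in config["custom_feeds"].items()
}
```

Explicit entries under headers take precedence over the derived Authorization header, so a feed that needs a different scheme can set it directly in the config.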
The pipeline automatically collects data based on the configured schedule (default: every minute). You can monitor the pipeline through the Dagster UI.
To visualize vehicle positions from collected data:
python read_gtfs_rt.py path/to/parquet/folder
- GeoParquet Files: Vehicle position data is stored in GeoParquet format, organized by timestamp in the data/geoparquet/ directory
- SQLite Database: Metadata about collected data is stored in data/db/gtfs_rt.db
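As a rough picture of what "metadata about collected data" could look like, the sketch below records one row per collection run using Python's built-in sqlite3 module. The table name, columns, and file path are all assumptions made for illustration; the actual schema of data/db/gtfs_rt.db is not documented here and may differ.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database for the example; the pipeline's file is data/db/gtfs_rt.db.
conn = sqlite3.connect(":memory:")

# ASSUMPTION: hypothetical schema, one row per collected file.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS collections (
        feed          TEXT    NOT NULL,
        collected_at  TEXT    NOT NULL,  -- ISO 8601 timestamp, UTC
        parquet_path  TEXT    NOT NULL,
        vehicle_count INTEGER NOT NULL
    )
    """
)


def record_collection(conn, feed, collected_at, parquet_path, vehicle_count):
    """Insert one metadata row; the `with` block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO collections VALUES (?, ?, ?, ?)",
            (feed, collected_at.isoformat(), parquet_path, vehicle_count),
        )


record_collection(
    conn,
    feed="agency_name",
    collected_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    parquet_path="data/geoparquet/example.parquet",  # hypothetical file name
    vehicle_count=42,
)

rows = conn.execute("SELECT feed, vehicle_count FROM collections").fetchall()
```

Keeping the file path in the metadata row makes it possible to find the GeoParquet file for a given feed and time window without scanning the directory.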