feat: Improve flights.* dataset reproducibility #645

Merged 29 commits on Dec 20, 2024
Commits
ad1b862
feat(DRAFT): Improve `flights.*` dataset reproducibility
dangotbanned Dec 10, 2024
fb3ccc6
build(DRAFT): Generate ISO datetime comparison
dangotbanned Dec 11, 2024
2b1be70
refactor(ruff): Adjust config and fix warnings
dangotbanned Dec 12, 2024
aede7f6
feat(perf): Async requests, use `gzip` instead of `.zip`
dangotbanned Dec 12, 2024
402c2b0
refactor: Reorganize, add `_write_rezip_async` doc
dangotbanned Dec 15, 2024
b57c02f
docs(DRAFT): Improve docs
dangotbanned Dec 15, 2024
efa417c
Merge remote-tracking branch 'upstream/main' into flights-repro
dangotbanned Dec 15, 2024
f646155
refactor: Tidy up, improve doc for `Flights.download_sources`
dangotbanned Dec 16, 2024
bfff9c4
docs: Add/amend some simple docs
dangotbanned Dec 16, 2024
0a19bae
refactor: move `"flights-"` to `Spec._name_prefix`
dangotbanned Dec 16, 2024
4a05b51
docs: fill out more docs
dangotbanned Dec 16, 2024
49205dd
fix: replace `app` with `self`
dangotbanned Dec 16, 2024
3418c5a
refactor: Replace `DateTimeFormat`, `DTF_TO_FMT`, `_transform_temporal`
dangotbanned Dec 16, 2024
0c11ec4
refactor: move global scoped code into `main`
dangotbanned Dec 16, 2024
c668fd7
feat(perf): Store `.parquet` instead of `.csv.gz`
dangotbanned Dec 17, 2024
a002c39
refactor: rename, re-doc `_clean_source` -> `SourceMap.clean`
dangotbanned Dec 17, 2024
2d47fdd
docs: finish `DateRange` doc
dangotbanned Dec 17, 2024
99169b6
docs: add `Flights.scan_sources` doc
dangotbanned Dec 17, 2024
7c5eed4
docs: finish `Flights` doc
dangotbanned Dec 17, 2024
859975e
refactor: reorganize, finish docs for `SourceMap`
dangotbanned Dec 18, 2024
f4bbda8
docs: add module-level doc
dangotbanned Dec 18, 2024
951fe8c
refactor(typing): extend `DateTimeFormat` to include `None`
dangotbanned Dec 18, 2024
a48eb8f
refactor: remove unused `PlScanCsv`
dangotbanned Dec 18, 2024
8481618
Merge branch 'main' into flights-repro
dangotbanned Dec 18, 2024
8ec3adf
feat: improves `Rows` validation
dangotbanned Dec 19, 2024
1962fea
chore: replace `flights.py`
dangotbanned Dec 19, 2024
05707d9
chore: remove `flights.js`
dangotbanned Dec 19, 2024
7c49683
fix: regen with fixed random seed
dangotbanned Dec 19, 2024
cd68193
revert: remove `flights-1k.csv`
dangotbanned Dec 19, 2024
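
Commit 7c49683 regenerates the datasets with a fixed random seed. The sketch below illustrates why that makes a sampled dataset reproducible. It uses only the standard library, and `sample_rows` and the seed value `42` are hypothetical; the script in this PR is not shown on this page and may sample rows in a different way.

```python
import random


def sample_rows(rows, n_rows, seed):
    """Draw n_rows items without replacement using a dedicated, seeded RNG.

    Hypothetical helper for illustration; not taken from scripts/flights.py.
    """
    rng = random.Random(seed)  # local generator, independent of global state
    return rng.sample(rows, n_rows)


rows = list(range(100))  # stand-in for the source flight records
first = sample_rows(rows, 5, seed=42)
second = sample_rows(rows, 5, seed=42)
other = sample_rows(rows, 5, seed=7)  # a different seed starts a different stream

# The same seed and the same input give the same sample on every run.
print(first == second)
```

Because the generator is created inside the function, the result does not depend on any earlier use of the global `random` module, which is one way to keep regenerated files stable between runs.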
49 changes: 49 additions & 0 deletions _data/flights.toml
@@ -0,0 +1,49 @@
[[specs]]
start = 2001-01-01
end = 2001-03-31
n_rows = 2_000
suffix = ".json"
dt_format = "%Y/%m/%d %H:%M"

[[specs]]
start = 2001-01-01
end = 2001-03-31
n_rows = 5_000
suffix = ".json"
dt_format = "%Y/%m/%d %H:%M"

[[specs]]
start = 2001-01-01
end = 2001-03-31
n_rows = 10_000
suffix = ".json"
dt_format = "%Y/%m/%d %H:%M"

[[specs]]
start = 2001-01-01
end = 2001-03-31
n_rows = 20_000
suffix = ".json"
dt_format = "%Y/%m/%d %H:%M"

[[specs]]
start = 2001-01-01
end = 2001-03-31
n_rows = 200_000
suffix = ".json"
dt_format = "decimal"
columns = ["delay", "distance", "time"]

[[specs]]
start = 2001-01-01
end = 2001-03-31
n_rows = 200_000
suffix = ".arrow"
dt_format = "decimal"
columns = ["delay", "distance", "time"]

[[specs]]
start = 2001-01-01
end = 2001-06-30
n_rows = 3_000_000
suffix = ".parquet"
2 changes: 1 addition & 1 deletion data/flights-10k.json

Large diffs are not rendered by default.

Binary file modified data/flights-200k.arrow
Binary file not shown.
2 changes: 1 addition & 1 deletion data/flights-200k.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/flights-20k.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/flights-2k.json

Large diffs are not rendered by default.

Binary file modified data/flights-3m.parquet
Binary file not shown.
2 changes: 1 addition & 1 deletion data/flights-5k.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion datapackage.json
@@ -20,7 +20,7 @@
}
],
"version": "2.11.0",
- "created": "2024-12-14T19:35:54.935236+00:00",
+ "created": "2024-12-19T18:26:48.655674+00:00",
"resources": [
{
"name": "7zip.png",
2 changes: 1 addition & 1 deletion datapackage.md
@@ -1,5 +1,5 @@
# vega-datasets
- `2.11.0` | [GitHub](http://github.com/vega/vega-datasets.git) | 2024-12-14 19:35:54 [UTC]
+ `2.11.0` | [GitHub](http://github.com/vega/vega-datasets.git) | 2024-12-19 18:26:48 [UTC]

Common repository for example datasets used by Vega related projects.
BSD-3-Clause license applies only to package code and infrastructure. Users should verify their use of datasets
1 change: 0 additions & 1 deletion package.json
@@ -46,7 +46,6 @@
"scripts": {
"prebuild": "./scripts/make-url-index.sh > src/urls.ts && ./scripts/build_datapackage.py",
"build": "rollup -c",
- "flights": "node scripts/flights.js",
"github": "python scripts/github.py",
"release": "release-it"
}
3 changes: 1 addition & 2 deletions pyproject.toml
@@ -19,7 +19,6 @@ dev = ["ipython[kernel]>=8.30.0", "ruff>=0.8.2", "taplo>=0.9.3"]
extend-exclude = [
".venv",
"scripts/budget.py",
- "scripts/flights.py",
"scripts/github.py",
"scripts/make-unemployment-across-industries.py",
"scripts/update_countries_json.py",
@@ -111,7 +110,7 @@ select = [
[tool.pyright]
enableExperimentalFeatures = true
ignore = ["../../../**/Lib", ".venv"]
- include = ["./scripts/build_datapackage.py", "./scripts/flights2.py"]
+ include = ["./scripts/build_datapackage.py", "./scripts/flights.py"]
pythonPlatform = "All"
pythonVersion = "3.12"
reportUnusedExpression = "none"
33 changes: 0 additions & 33 deletions scripts/flights.js

This file was deleted.
