Add Copy Script [minor] #4

Merged
merged 30 commits into from
Nov 21, 2024
164d505
refactor cp script for jobs
ric-evans Nov 21, 2024
d0f3e5a
refactor cp script for jobs - 2
ric-evans Nov 21, 2024
a49bc87
rename dir `scripts`
ric-evans Nov 21, 2024
91f7e7b
add `calc_depth_to_dataset_dirs.py`
ric-evans Nov 21, 2024
b3b5fa3
<bot> update pyproject.toml
invalid-email-address Nov 21, 2024
7a87eea
fix renaming
ric-evans Nov 21, 2024
70cf337
add `calc_depth_to_dataset_dirs.py` - 2
ric-evans Nov 21, 2024
815b9cd
mypy
ric-evans Nov 21, 2024
8d27613
add `cp-dataset-histos.sh`
ric-evans Nov 21, 2024
3552a5d
relative paths
ric-evans Nov 21, 2024
87274ca
relative paths - 2
ric-evans Nov 21, 2024
332c425
add `--force` flag
ric-evans Nov 21, 2024
e64b143
add test
ric-evans Nov 21, 2024
c6415ac
add test - 2
ric-evans Nov 21, 2024
11ea247
add test - 3
ric-evans Nov 21, 2024
c2d20f6
`set -x`
ric-evans Nov 21, 2024
9e8e985
fix test
ric-evans Nov 21, 2024
791bfad
fix test - 2
ric-evans Nov 21, 2024
a6943eb
fix test - 3 (path)
ric-evans Nov 21, 2024
dc01674
fix test - 4 (comps)
ric-evans Nov 21, 2024
236fa47
fix test - 5 (existing histos)
ric-evans Nov 21, 2024
124e20a
fix test - 6 (existing histos)
ric-evans Nov 21, 2024
b48c6c1
fix test - 7 (existing histos)
ric-evans Nov 21, 2024
36c5206
fix test - 8 (existing histos)
ric-evans Nov 21, 2024
1d74ce0
fix test - 9 (existing histos)
ric-evans Nov 21, 2024
bf634b3
fix test - 10 (too fast!)
ric-evans Nov 21, 2024
c16cb54
fix test - 11 (rise prob)
ric-evans Nov 21, 2024
8003fb1
improve test - don't rely on dice rolls
ric-evans Nov 21, 2024
bb2f572
improve test - 2
ric-evans Nov 21, 2024
8b828b2
(test syntax)
ric-evans Nov 21, 2024
158 changes: 148 additions & 10 deletions .github/workflows/wipac-cicd.yaml
@@ -119,16 +119,16 @@ jobs:
set -euo pipefail
pytest -vvv tests/unit/

test-wrapper-script:
test-sample-each-dataset-sh:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
max_num_datasets:
- 1
- 25
- 75
- 100 # aka all of them (currently there are 48)
base_path:
src_path:
- /tmp/data/sim/Upgrade/2022/generated/neutrino-generator/88888
- /tmp/data/sim/IceCube/2023/filtered/CORSIKA
- /tmp/data/sim/Upgrade/2022/filtered
@@ -140,14 +140,14 @@ jobs:
uses: actions/checkout@v3
- name: Set up Python environment
uses: actions/setup-python@v4
- name: Create a mock dataset structure

- name: Create source dataset dirs/files
run: |
set -euo pipefail
job_range_dpaths=(
/tmp/data/sim/{IceCube,Upgrade}/{2022,2023}/{generated,filtered}/{CORSIKA,neutrino-generator}/{77777,88888,99999}/{00-11,22-33,44-55}
)

# Create directories and conditionally populate files
for dpath in "${job_range_dpaths[@]}"; do
mkdir -p "$dpath"/histos/
# create 1-5 pkl files
@@ -162,27 +162,27 @@ jobs:
set -euo pipefail
tree /tmp/data/sim/

- name: Run script with matrix parameters
- name: Run script
run: |
set -euo pipefail
set -x
./resources/sample-each-dataset.sh ${{ matrix.base_path }} 0.5 ${{ matrix.max_num_datasets }}
./scripts/sample-each-dataset.sh ${{ matrix.src_path }} 0.5 ${{ matrix.max_num_datasets }}

- name: Validate script execution
run: |
set -euo pipefail
echo "Max num of datasets: ${{ matrix.max_num_datasets }}"

# Count dataset directories, i.e. parents of job-range dirs such as "00-11"
available_datasets=$(find ${{ matrix.base_path }} -type d -regex ".*/[0-9]+-[0-9]+$" -exec dirname {} \; | sort -u | wc -l)
available_datasets=$(find ${{ matrix.src_path }} -type d -regex ".*/[0-9]+-[0-9]+$" -exec dirname {} \; | sort -u | wc -l)
echo "Available datasets: $available_datasets"

# Use the lesser of available_datasets and max_num_datasets for validation
expected_num_datasets=$(( available_datasets < ${{ matrix.max_num_datasets }} ? available_datasets : ${{ matrix.max_num_datasets }} ))
echo "Expected datasets: $expected_num_datasets"

# Check processed count
processed_count=$(find ${{ matrix.base_path }} -name '*.histo.hdf5' | wc -l)
processed_count=$(find ${{ matrix.src_path }} -name '*.histo.hdf5' | wc -l)
echo "Processed count: $processed_count"

if [[ $processed_count -ne $expected_num_datasets ]]; then
@@ -197,13 +197,151 @@ jobs:
set -euo pipefail
tree /tmp/data/sim/
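`test-sample-each-dataset-sh` builds its mock tree from a single brace expansion and then expects `min(available datasets, max_num_datasets)` histograms, where a dataset is any directory that holds a job-range subdirectory such as `00-11`. A minimal standalone sketch of that arithmetic, run against a throwaway copy of the same tree under `mktemp -d` (the helper name `expected_histo_count` is hypothetical, not something in the repo):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the workflow's validation arithmetic.
# $1: source path to scan, $2: max number of datasets to sample
expected_histo_count() {
    local src_path=$1 max_num_datasets=$2 available
    # A dataset dir is the parent of any job-range dir such as ".../00-11"
    available=$(find "$src_path" -type d -regex ".*/[0-9]+-[0-9]+$" -exec dirname {} \; | sort -u | wc -l)
    # One histogram per sampled dataset, capped by max_num_datasets
    echo $(( available < max_num_datasets ? available : max_num_datasets ))
}

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# Same brace pattern as the workflow, rooted in the temp dir:
# 2*2*2*2*3 = 48 dataset dirs, each with 3 job-range dirs (144 total)
job_range_dpaths=(
    "$root"/sim/{IceCube,Upgrade}/{2022,2023}/{generated,filtered}/{CORSIKA,neutrino-generator}/{77777,88888,99999}/{00-11,22-33,44-55}
)
mkdir -p "${job_range_dpaths[@]}"

# prints 1, 25, 48, 48 for the whole tree
for max in 1 25 75 100; do
    echo "max=$max whole tree -> $(expected_histo_count "$root/sim" "$max")"
done
# prints 6 ({CORSIKA,neutrino-generator} x {77777,88888,99999})
echo "max=25 Upgrade/2022/filtered -> $(expected_histo_count "$root/sim/Upgrade/2022/filtered" 25)"
```

Note that `available` counts datasets, not files; comparing it against the number of `*.histo.hdf5` files, as the workflow does, assumes the script writes exactly one histogram per sampled dataset.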

test-cp-dataset-histos-sh:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
prev_histos_setting:
- none
- overwrite
- keep
src_path:
- /tmp/data/sim/IceCube/2023/generated/neutrino-generator
- /tmp/data/sim/Upgrade/2022/
env:
DEST_DIR: /tmp/mycopy
OLD_FILE_MODTIME: 0
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Set up Python environment
uses: actions/setup-python@v4

- name: Create source dataset dirs/files
run: |
set -euo pipefail
precreate_ct=0
max_to_precreate=2

dataset_dpaths=(
/tmp/data/sim/{IceCube,Upgrade}/{2022,2023}/{generated,filtered}/{CORSIKA,neutrino-generator}/{77777,88888,99999}
)

for dpath in "${dataset_dpaths[@]}"; do
echo
echo "adding: $dpath"
histo="$dpath"/"$(basename "$dpath").histo.hdf5"
mkdir -p "$(dirname "$histo")"
set -x
touch "$histo"
set +x

# pre-create some of these files in the destination
if [[ "${{ matrix.prev_histos_setting }}" == "overwrite" || "${{ matrix.prev_histos_setting }}" == "keep" ]]; then
# check that this histo would be touched by the script (and only make some of these)
if [[ "$dpath" == "${{ matrix.src_path }}"* ]] && (( precreate_ct < max_to_precreate )); then
echo "creating 'existing' histo file"
relative_path="${dpath#*/sim/}"
dest_dataset_dir="$DEST_DIR/sim/$relative_path"
mkdir -p "$dest_dataset_dir"
set -x
touch "$dest_dataset_dir"/"$(basename "$dest_dataset_dir").histo.hdf5"
(( ++precreate_ct ))
set +x
fi
fi

done

# set the oldest file's mod time
if [[ "${{ matrix.prev_histos_setting }}" == "overwrite" || "${{ matrix.prev_histos_setting }}" == "keep" ]]; then
oldest_modtime=$(find "$DEST_DIR" -name "*.histo.hdf5" -type f -exec stat --format='%Y' {} + | sort -n | head -1)
echo "OLD_FILE_MODTIME=$oldest_modtime" >> $GITHUB_ENV
sleep 5 # wait so copies get a strictly later mtime (the test can take < 1 sec)
fi

- name: Look at src filetree (before)
run: |
set -euo pipefail
tree /tmp/data/sim/

- name: Look at dest filetree (before)
run: |
set -euo pipefail
tree $DEST_DIR || echo "no files here"

- name: Run script
run: |
set -euo pipefail
if [[ "${{ matrix.prev_histos_setting }}" == "overwrite" ]]; then
force_flag="--force"
else
force_flag=""
fi
set -x
./scripts/cp-dataset-histos.sh ${{ matrix.src_path }} $DEST_DIR $force_flag

- name: Validate copied histograms
run: |
set -euo pipefail

src_count=$(find ${{ matrix.src_path }} -name "*.histo.hdf5" | wc -l)
dest_count=$(find $DEST_DIR -name "*.histo.hdf5" | wc -l)
echo "Source histograms: $src_count"
echo "Copied histograms: $dest_count"
if [[ $src_count -ne $dest_count ]]; then
echo "Copied histograms count ($dest_count) does not match source histograms count ($src_count)!"
exit 1
fi

# check the overwriting settings
oldest_modtime=$(find "$DEST_DIR" -name "*.histo.hdf5" -type f -exec stat --format='%Y' {} + | sort -n | head -1)
echo "Oldest histo file modtime: $oldest_modtime"
echo "Previous oldest histo file modtime: $OLD_FILE_MODTIME"
case "${{ matrix.prev_histos_setting }}" in
none)
# oldest modtime should be younger (greater) than previously-stored value
if [[ $oldest_modtime -le $OLD_FILE_MODTIME ]]; then
echo "ERROR: there is an older file in here!" >&2
exit 1
fi
;;
overwrite)
# oldest modtime should be younger (greater) than previously-stored value
if [[ $oldest_modtime -le $OLD_FILE_MODTIME ]]; then
echo "ERROR: there is an older file in here, i.e. the script didn't overwrite" >&2
exit 1
fi
;;
keep)
# oldest modtime should be the previously-stored value
if [[ $oldest_modtime -ne $OLD_FILE_MODTIME ]]; then
echo "ERROR: there is no older file in here, i.e. the script did overwrite" >&2
exit 1
fi
;;
*)
echo "Error: Unknown value for prev_histos_setting: ${{ matrix.prev_histos_setting }}" >&2
exit 1
;;
esac

echo "All tests passed for src_path=${{ matrix.src_path }} and dest_dir=$DEST_DIR."

- name: Look at dest filetree (after)
run: |
set -euo pipefail
tree $DEST_DIR
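In `test-cp-dataset-histos-sh`, the pre-create step places existing histograms at `$DEST_DIR/sim/<path after /sim/>/<dataset>.histo.hdf5`, and the validation then uses the oldest destination modification time to tell whether those files were kept or overwritten. The sketch below reproduces the path mapping and the keep/overwrite check in isolation. `dest_histo_path`, `copy_histo` and `oldest_modtime` are hypothetical helpers; in particular `copy_histo` is only a stand-in for the behavior the test expects (keep existing files unless `--force` is passed), not the logic of `cp-dataset-histos.sh`, which this diff doesn't show. `touch -d @SECONDS` and `stat --format='%Y'` assume GNU coreutils, as on `ubuntu-latest`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: where a source dataset's histogram lands under the destination.
# Strip everything up to the first "/sim/", then name the file after the dataset dir.
dest_histo_path() {
    local dpath=$1 dest_dir=$2
    local relative_path="${dpath#*/sim/}"
    local dest_dataset_dir="$dest_dir/sim/$relative_path"
    echo "$dest_dataset_dir/$(basename "$dest_dataset_dir").histo.hdf5"
}

# Hypothetical stand-in for the expected copy behavior:
# keep an existing destination file unless --force is given.
copy_histo() {
    local src=$1 dest=$2 force=${3:-}
    if [[ -e "$dest" && "$force" != "--force" ]]; then
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
}

# Oldest mtime (epoch seconds) of any histogram under $1 (GNU find/stat)
oldest_modtime() {
    find "$1" -name '*.histo.hdf5' -type f -exec stat --format='%Y' {} + | sort -n | head -1
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
old=1000000000  # fixed, clearly old timestamp (2001-09-09 UTC)

src_dpath="$work/data/sim/Upgrade/2022/filtered/CORSIKA/88888"
mkdir -p "$src_dpath"
echo "histogram bytes" > "$src_dpath/88888.histo.hdf5"

for setting in keep overwrite; do
    dest=$(dest_histo_path "$src_dpath" "$work/copy-$setting")
    mkdir -p "$(dirname "$dest")"
    touch -d "@$old" "$dest"  # pre-existing, backdated copy
    if [[ $setting == overwrite ]]; then
        copy_histo "$src_dpath/88888.histo.hdf5" "$dest" --force
    else
        copy_histo "$src_dpath/88888.histo.hdf5" "$dest"
    fi
done

echo "keep:      $(oldest_modtime "$work/copy-keep")"       # still $old
echo "overwrite: $(oldest_modtime "$work/copy-overwrite")"  # newer than $old
```

Backdating the pre-existing file with `touch -d` gives the two timestamps a fixed gap, which is one way to avoid relying on `sleep 5` to push the copies into a later second.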


###########################################################################
# RELEASE
###########################################################################

release:
if: github.ref == 'refs/heads/main'
needs: [ py-setup, flake8, mypy, code-format, unit-tests, test-wrapper-script ]
needs: [ py-setup, flake8, mypy, code-format, unit-tests, test-sample-each-dataset-sh, test-cp-dataset-histos-sh ]
runs-on: ubuntu-latest
concurrency: release # prevent any possible race conditions
steps:
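The `Run script` step of `test-cp-dataset-histos-sh` leaves `$force_flag` unquoted so that an empty value expands to no argument at all; ShellCheck flags that pattern (SC2086). A common alternative is to collect optional flags in a Bash array. The sketch below only prints the argument list it would pass instead of calling `./scripts/cp-dataset-histos.sh`, and the function name `copy_args` is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print, one per line, the arguments the workflow would
# pass to ./scripts/cp-dataset-histos.sh for a given prev_histos_setting.
copy_args() {
    local src_path=$1 dest_dir=$2 setting=$3
    local -a opts=()
    if [[ "$setting" == "overwrite" ]]; then
        opts+=(--force)
    fi
    # "${opts[@]}" expands to zero words when opts is empty
    # (safe under `set -u` on Bash 4.4 and newer)
    printf '%s\n' "$src_path" "$dest_dir" "${opts[@]}"
}

copy_args /tmp/data/sim/Upgrade/2022/ /tmp/mycopy overwrite  # 3 lines, last is --force
copy_args /tmp/data/sim/Upgrade/2022/ /tmp/mycopy keep       # 2 lines
```

Inside the workflow step, the call would then read `./scripts/cp-dataset-histos.sh ${{ matrix.src_path }} "$DEST_DIR" "${opts[@]}"`.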
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -9,7 +9,7 @@ name = "icecube-simprod-histogram"
description = "Utilities for working with histograms created for simprod"
readme = "README.md"
keywords = ["histogram sampling", "simulation", "statistics"]
classifiers = ["Development Status :: 3 - Alpha", "Programming Language :: Python :: 3.11"]
classifiers = ["Development Status :: 4 - Beta", "Programming Language :: Python :: 3.11"]
requires-python = ">=3.11, <3.12"

[[project.authors]]
128 changes: 0 additions & 128 deletions resources/cp-src-histos-tree.sh

This file was deleted.
