Adding TargetValue and MolecularWeight Sampler (#172)
This PR supersedes #165 and #170 and resolves #168.

#165 implemented a sampler which would divide molecules based on their
molecular weight (either ascending or descending), and #170 added
essentially the same thing but for arbitrary target values. This PR
refactors #170 as if #165 were already implemented, which removes most
of the duplicated code.
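
To illustrate why the two samplers can share code, here is a minimal sketch of the idea, not the code in this PR: a sort-based split over arbitrary values, with a molecular-weight split expressed as a thin wrapper around it. The names `sorted_split`, `weight_split`, and `weight_of` are hypothetical, and the choice to put the first `train_size` fraction of the sorted samples into the training set is an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def sorted_split(
    values: Sequence[float],
    train_size: float = 0.75,
    descending: bool = False,
) -> Tuple[List[int], List[int]]:
    """Order sample indices by value and cut them into (train, test).

    Hypothetical helper: the first `train_size` fraction of the sorted
    indices goes to training, the remainder to testing.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    n_train = int(round(train_size * len(values)))
    return order[:n_train], order[n_train:]


def weight_split(
    molecules: Sequence[str],
    weight_of: Callable[[str], float],
    **kwargs,
) -> Tuple[List[int], List[int]]:
    """Split by molecular weight by reusing the generic value-based split.

    `weight_of` is a caller-supplied function (an assumption here); in
    practice it would wrap a cheminformatics toolkit.
    """
    weights = [weight_of(m) for m in molecules]
    return sorted_split(weights, **kwargs)


# Toy weights so the sketch runs without any chemistry dependency.
TOY_WEIGHTS = {"C": 16.04, "CC": 30.07, "CCC": 44.10, "CCCC": 58.12}
train_idx, test_idx = weight_split(["CCC", "C", "CCCC", "CC"], TOY_WEIGHTS.get)
```

With this structure the weight-based variant only has to compute one number per molecule; the ordering and cutting logic lives in one place.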

There is also some small internal cleanup, namely
00eeca6,
98b3efd, and
3b8bca1 (as well as some long-overdue
CI updates).
kspieks authored Feb 15, 2024
2 parents 8213a02 + 51f3278 commit daf00ac
Showing 24 changed files with 721 additions and 346 deletions.
27 changes: 0 additions & 27 deletions .github/workflows/check_pypi_build.yml

This file was deleted.

164 changes: 164 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,164 @@
name: Continuous Integration
on:
  schedule:
    - cron: "0 8 * * 1-5"
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: actions-id-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  check-formatting:
    name: Check Build and Formatting Errors
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Dependencies
        run: |
          python -m pip install pycodestyle isort
      - name: Check Build
        run: |
          python -m pip install .
      - name: Run pycodestyle
        run: |
          pycodestyle --statistics --count --max-line-length=150 --show-source --ignore=E203 .
      - name: Check Import Ordering Errors
        run: |
          isort --check-only --verbose .
  build-and-test:
    needs: check-formatting
    continue-on-error: true
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
        os: [ubuntu-latest, windows-latest, macos-latest]

    runs-on: ${{ matrix.os }}
    defaults:
      run:
        shell: bash -el {0}
    name: ${{ matrix.os }} Python ${{ matrix.python-version }} Subtest
    steps:
      - uses: actions/checkout@v3
      - uses: mamba-org/setup-micromamba@main
        with:
          environment-name: temp
          condarc: |
            channels:
              - defaults
              - conda-forge
            channel_priority: flexible
          create-args: |
            python=${{ matrix.python-version }}
      - name: Install Dependencies
        run: |
          python -m pip install -e .[molecules]
          python -m pip install coverage pytest
      - name: Run Tests
        run: |
          coverage run --source=. --omit=astartes/__init__.py,setup.py,test/* -m pytest -v
      - name: Show Coverage
        run: |
          coverage report -m
  ipynb-ci:
    needs: check-formatting
    strategy:
      fail-fast: false
      matrix:
        nb-file:
          ["barrier_prediction_with_RDB7/RDB7_barrier_prediction_example", "train_val_test_split_sklearn_example/train_val_test_split_example", "split_comparisons/split_comparisons", "mlpds_2023_astartes_demonstration/mlpds_2023_demo"]
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -el {0}
    name: Check ${{ matrix.nb-file }} Notebook Execution
    steps:
      - uses: actions/checkout@v3
      - uses: mamba-org/setup-micromamba@main
        with:
          environment-name: temp
          condarc: |
            channels:
              - defaults
              - conda-forge
            channel_priority: flexible
          create-args: |
            python=3.11
      - name: Install dependencies
        run: |
          python -m pip install -e .[molecules,demos]
          python -m pip install notebook
      - name: Test Execution
        run: |
          cd examples/$(dirname ${{ matrix.nb-file }})
          jupyter nbconvert --to script $(basename ${{ matrix.nb-file }}).ipynb
          ipython $(basename ${{ matrix.nb-file }}).py
  coverage-check:
    if: contains(github.event.pull_request.labels.*.name, 'PR Ready for Review')
    needs: [build-and-test, ipynb-ci]
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -el {0}
    steps:
      - uses: actions/checkout@v3
      - uses: conda-incubator/setup-miniconda@v2
        with:
          auto-update-conda: true
          python-version: "3.10"
      - name: Install Dependencies
        run: |
          python -m pip install -e .[molecules]
          python -m pip install coverage
      - name: Run Tests
        run: |
          coverage run --source=. --omit=astartes/__init__.py,setup.py,test/*,astartes/samplers/sampler.py -m unittest discover -v
      - name: Show Coverage
        run: |
          coverage report -m > temp.txt
          cat temp.txt
          python .github/workflows/coverage_helper.py
          echo "COVERAGE_PERCENT=$(cat temp2.txt)" >> $GITHUB_ENV
      - name: Request Changes via Review
        if: ${{ env.COVERAGE_PERCENT < 90 }}
        uses: andrewmusgrave/[email protected]
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          event: REQUEST_CHANGES
          body: "Increase test coverage from ${{ env.COVERAGE_PERCENT }}% to at least 90% before merging."

      - name: Approve PR if Coverage Sufficient
        if: ${{ env.COVERAGE_PERCENT > 89 }}
        uses: andrewmusgrave/[email protected]
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          event: APPROVE
          body: "Test coverage meets or exceeds 90% threshold (currently ${{ env.COVERAGE_PERCENT }}%)."

  ci-report-status:
    name: report CI status
    needs: [build-and-test, ipynb-ci]
    runs-on: ubuntu-latest
    steps:
      - run: |
          result_1="${{ needs.build-and-test.result }}"
          result_2="${{ needs.ipynb-ci.result }}"
          if test $result_1 == "success" && test $result_2 == "success"; then
            exit 0
          else
            exit 1
          fi
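
The `coverage-check` job writes the coverage report to `temp.txt`, runs `.github/workflows/coverage_helper.py`, and then reads a single number from `temp2.txt`. The helper itself is not part of this diff, so the following is only a guess at what such a script might do: find the `TOTAL` row of the report and write its percentage to a second file. The function names and the sample report are hypothetical.

```python
import re


def extract_total_percent(report_text: str) -> int:
    """Return the integer percentage from the TOTAL row of a coverage report."""
    for line in report_text.splitlines():
        if line.startswith("TOTAL"):
            match = re.search(r"(\d+)%", line)
            if match:
                return int(match.group(1))
    raise ValueError("no TOTAL row with a percentage found in coverage report")


def write_percent(report_path: str = "temp.txt", out_path: str = "temp2.txt") -> None:
    """Read the report file and write only the total percentage to out_path."""
    with open(report_path) as report_file:
        percent = extract_total_percent(report_file.read())
    with open(out_path, "w") as out_file:
        out_file.write(str(percent))


# Hypothetical report text, shaped like the table `coverage report -m` prints.
SAMPLE_REPORT = """\
Name                 Stmts   Miss  Cover   Missing
--------------------------------------------------
astartes/main.py       100      5    95%   10-14
TOTAL                  120      6    95%
"""
total = extract_total_percent(SAMPLE_REPORT)
```

Writing a bare integer keeps the later `echo "COVERAGE_PERCENT=$(cat temp2.txt)"` step trivial, since the workflow then only compares that value against 90.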
53 changes: 0 additions & 53 deletions .github/workflows/coverage_reject.yml

This file was deleted.

31 changes: 0 additions & 31 deletions .github/workflows/format_code.yml

This file was deleted.

42 changes: 0 additions & 42 deletions .github/workflows/ipynb_ci.yml

This file was deleted.

50 changes: 0 additions & 50 deletions .github/workflows/run_tests.yml

This file was deleted.

4 changes: 3 additions & 1 deletion README.md
@@ -42,7 +42,7 @@ Follow [this link](https://JacksonBurns.github.io/astartes/) for a nicely-render
Keep reading for a installation guide and links to tutorials!

## Installing `astartes`
- We recommend installing `astartes` within a virtual environment, using either `venv` or `conda` (or other tools) to simplify dependency management. Python versions 3.7, 3.8, 3.9, 3.10, 3.11, and 3.12 are supported on all platforms.
+ We recommend installing `astartes` within a virtual environment, using either `venv` or `conda` (or other tools) to simplify dependency management. Python versions 3.8, 3.9, 3.10, 3.11, and 3.12 are supported on all platforms.

> **Warning**
> Windows (PowerShell) and MacOS Catalina or newer (zsh) require double quotes around text using the `'[]'` characters (i.e. `pip install "astartes[molecules]"`).
@@ -226,8 +226,10 @@ Do not provide a `random_state` in the `hopts` dictionary - it will be overwritt
| Sample set Partitioning based on joint X-Y distances (SPXY) | 'spxy' | Interpolative | `distance_metric` | Saldhana et. al [original paper](https://www.sciencedirect.com/science/article/abs/pii/S003991400500192X) :small_blue_diamond: | Extension of Kennard Stone that also includes the response when sampling distances. |
| Mahalanobis Distance Kennard Stone (MDKS) | 'spxy' _(MDKS is derived from SPXY)_ | Interpolative | _none, see Notes_ | Saptoro et. al [original paper](https://espace.curtin.edu.au/bitstream/handle/20.500.11937/45101/217844_70585_PUB-SE-DCE-FM-71008.pdf?sequence=2&isAllowed=y) | MDKS is SPXY using Mahalanobis distance and can be called by using SPXY with `distance_metric="mahalanobis"` |
| Scaffold | 'scaffold' | Extrapolative | `include_chirality` | [Bemis-Murcko Scaffold](https://pubs.acs.org/doi/full/10.1021/jm9602928) :small_blue_diamond: as implemented in RDKit | This sampler requires SMILES strings as input (use the `molecules` subpackage) |
+ | Molecular Weight| 'molecular_weight' | Extrapolative | _none_ | ~ | Sorts molecules by molecular weight as calculated by RDKit |
| Sphere Exclusion | 'sphere_exclusion' | Extrapolative | `metric`, `distance_cutoff` | _custom implementation_ | Variation on Sphere Exclusion for arbitrary-valued vectors. |
| Time Based | 'time_based' | Extrapolative | _none_ | Papers using Time based splitting: [Chen et al.](https://pubs.acs.org/doi/full/10.1021/ci200615h) :small_blue_diamond:, [Sheridan, R. P](https://pubs.acs.org/doi/full/10.1021/ci400084k) :small_blue_diamond:, [Feinberg et al.](https://pubs.acs.org/doi/full/10.1021/acs.jmedchem.9b02187) :small_blue_diamond:, [Struble et al.](https://pubs.rsc.org/en/content/articlehtml/2020/re/d0re00071j) | This sampler requires `labels` to be an iterable of either date or datetime objects. |
+ | Target Property | 'target_property' | Extrapolative | `descending` | ~ | Sorts data by regression target y |
| Optimizable K-Dissimilarity Selection (OptiSim) | 'optisim' | Extrapolative | `n_clusters`, `max_subsample_size`, `distance_cutoff` | _custom implementation_ | Variation on [OptiSim](https://pubs.acs.org/doi/10.1021/ci025662h) for arbitrary-valued vectors. |
| K-Means | 'kmeans' | Extrapolative | `n_clusters`, `n_init` | [`sklearn KMeans`](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) | Passthrough to `sklearn`'s `KMeans`. |
| Density-Based Spatial Clustering of Applications with Noise (DBSCAN) | 'dbscan' | Extrapolative | `eps`, `min_samples`, `algorithm`, `metric`, `leaf_size` | [`sklearn DBSCAN`](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) Documentation| Passthrough to `sklearn`'s `DBSCAN`. |