Organize #4

Merged
merged 36 commits into from
Sep 24, 2024
b87eec9
Use yaml to parse parameters
runboj Mar 19, 2024
e477009
Add pyyaml
runboj Mar 20, 2024
e9ad7e5
Add tiled
runboj Mar 28, 2024
181e7d5
Update image ending
runboj Mar 28, 2024
b907877
Remove input data path'
runboj Mar 28, 2024
d87710e
Use yaml file for UMAP_example
runboj Mar 28, 2024
c17d0e5
Read parameters from yaml file
runboj Mar 28, 2024
ee190f6
Update ignore files
runboj Mar 28, 2024
d7121f9
Add uid_retrieve
runboj Apr 3, 2024
5b0c0ed
Add flake8
runboj Apr 3, 2024
ca9e086
Add pre-commit
runboj Apr 3, 2024
0b27191
Update umap to read latent vectors from autoencoder
runboj Apr 3, 2024
189caf8
updating results directories
taxe10 Apr 5, 2024
eadf5e3
fixed bug when uid_retrieve is an empty string
taxe10 Apr 5, 2024
8450885
fixed bug, compute umap mas not calling the stacked images
taxe10 Apr 6, 2024
37d9cdf
Add pyproject.toml
runboj May 28, 2024
9ad67a3
Update python version
runboj May 28, 2024
1d10c85
Move Dockerfile outside docker folder
runboj May 28, 2024
2fef752
Add unit test for umap
runboj May 28, 2024
1aef0e0
Remove docker folder
runboj May 28, 2024
4a1492a
Update readme for pyproject.toml
runboj May 28, 2024
4c3edff
Update Dockerfile as we delete requirements.txt
runboj May 28, 2024
9487709
Remove docker folder in makefile
runboj May 28, 2024
94ac5c9
Add GitHub action to check formatting and unit testing
runboj May 28, 2024
81ae6e7
Move umap run files to src folder such that toml file can run
runboj May 29, 2024
eae2c49
Use pytest
runboj May 29, 2024
5cf443e
Update tiled version
taxe10 Jun 1, 2024
916b263
Core organization and writing results back to tiled
taxe10 Jun 1, 2024
d6616c9
Adding pytest to dev
taxe10 Jun 5, 2024
0ba6ac6
Add tiled to dependencies
taxe10 Jun 5, 2024
ecc18a2
Update tiled version
taxe10 Jun 5, 2024
2677ced
Add load and save model option
taxe10 Jun 5, 2024
0e09df2
Fix bug with when data shape has length 2
taxe10 Jun 5, 2024
f73b437
Fix merge conflicts
taxe10 Jun 5, 2024
9e12aa1
Update example to reflect schema changes
taxe10 Jun 5, 2024
9a4f474
Merge pull request #5 from taxe10/main
taxe10 Jun 5, 2024
7 changes: 7 additions & 0 deletions .flake8
@@ -0,0 +1,7 @@
[flake8]
# 127 is width of the Github code viewer,
# black default is 88 so this will only warn about comments >127
max-line-length = 127
# Ignore errors due to incompatibility with black
#https://black.readthedocs.io/en/stable/guides/using_black_with_other_tools.html
extend-ignore = E203,E701
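As a rough illustration of why the two codes are ignored, the sketch below shows the kind of code that Black produces and that pycodestyle would otherwise flag. It is written from my reading of the Black compatibility guide linked in the config above, not from this repository; `values`, `window`, and `Placeholder` are hypothetical names.

```python
values = list(range(10))
lower, upper, offset = 1, 5, 2

# Black puts spaces around ":" when the slice bounds are complex expressions.
# pycodestyle reports that whitespace as E203, hence the ignore entry.
window = values[lower + offset : upper + offset]


# Black may keep a stub body ("...") on the same line as the header.
# pycodestyle reports a statement after the colon as E701.
class Placeholder: ...


print(window)  # [3, 4, 5, 6]
```

Both constructs are valid Python, so ignoring the two codes only silences style warnings and does not hide real errors.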
33 changes: 33 additions & 0 deletions .github/workflows/python-app.yml
@@ -0,0 +1,33 @@
name: dimension_reduction_pca

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.9
      uses: actions/setup-python@v5
      with:
        python-version: 3.9
        cache: 'pip'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install .
        pip install .[dev]
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest
187 changes: 184 additions & 3 deletions .gitignore
@@ -1,5 +1,186 @@
*~
data/output/
data/upload/
.file_manager_vars
data/upload/
build/


.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
data/output/
data/upload/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
EC-lab
lab book
LTspice files
TRS
proc
ESpectrum Stream 00012.bin
Frame Stream 00012.bin
Metadata 00012.json
test.zarr
misc
test_fast.zarr
zarr_utils.py


data/

.vscode/
34 changes: 34 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,34 @@
default_language_version:
  python: python3
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-ast
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-yaml
      - id: debug-statements
  - repo: https://github.com/gitguardian/ggshield
    rev: v1.25.0
    hooks:
      - id: ggshield
        language_version: python3
        stages: [commit]
  # Using this mirror lets us use mypyc-compiled black, which is about 2x faster
  - repo: https://github.com/psf/black-pre-commit-mirror
    rev: 24.2.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
        args: ["--profile", "black"]
14 changes: 14 additions & 0 deletions Dockerfile
@@ -0,0 +1,14 @@
FROM python:3.9
LABEL maintainer="THE MLEXCHANGE TEAM"

RUN apt-get update
RUN pip3 install --upgrade pip &&\
pip3 install .

WORKDIR /app/work
ENV HOME /app/work
ENV PYTHONUNBUFFERED=1

COPY umap_run.py umap_run.py
COPY src src
CMD ["echo", "running umap"]
7 changes: 5 additions & 2 deletions Makefile
@@ -17,13 +17,16 @@ test:
 	echo ${ID_USER}

 build_docker:
-	docker build -t ${IMG_WEB_SVC} -f ./docker/Dockerfile .
+	docker build -t ${IMG_WEB_SVC} -f ./Dockerfile .

+build_podman:
+	podman build -t ghcr.io/runboj/mlex_dimension_reduction_umap:main -f ./Dockerfile .
+
 run_docker:
 	docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --memory-swap -1 -it -v ${PWD}/data:/app/work/data/ ${IMG_WEB_SVC} bash

 UMAP_example:
-	docker run -u ${ID_USER $USER}:${ID_GROUP $USER} --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --memory-swap -1 -it -v ${PWD}:/app/work/ ${IMG_WEB_SVC} python umap_run.py data/example_shapes/Demoshapes.npz data/output '{"n_components": 2, "min_dist": 0.1, "n_neighbors": 7}'
+	docker run -u ${ID_USER $USER}:${ID_GROUP $USER} --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --memory-swap -1 -it -v ${PWD}:/app/work/ ${IMG_WEB_SVC} python src/mlex_dimension_reduction_umap/umap_run.py example_umap.yaml


 push_docker:
31 changes: 29 additions & 2 deletions README.md
@@ -14,11 +14,38 @@ First, build the dimension reduction image in terminal:

 Once built, you can run the following examples:
 `make UMAP_example`
-which is equivalent to first `make run_docker` then `python umap_run.py data/example_shapes/Demoshapes.npz data/output '{"n_components": 2, "min_dist": 0.1, "n_neighbors": 7}'`.
+which is equivalent to first `make run_docker` then `python umap_run.py example_umap.yaml`.

 These examples utilize the information stored in the folder /data. The computed latent vectors will be saved in data/output.

-#### TODO: run the container interactively
+## Developer Setup
+If you are developing this library, there are a few things to note.
+
+1. Install development dependencies:
+
+```
+pip install .
+pip install ".[dev]"
+```
+
+2. Install pre-commit
+This step sets up the pre-commit package. After this, each commit is checked with flake8, black, and isort.
+
+```
+pre-commit install
+```
+
+3. (Optional) If you want to check what pre-commit would do before committing, you can run:
+
+```
+pre-commit run --all-files
+```
+
+4. To run test cases:
+
+```
+python -m pytest
+```

 ## Copyright
 MLExchange Copyright (c) 2023, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
18 changes: 0 additions & 18 deletions docker/Dockerfile

This file was deleted.

10 changes: 0 additions & 10 deletions docker/requirements.txt

This file was deleted.

20 changes: 20 additions & 0 deletions example_umap.yaml
@@ -0,0 +1,20 @@
# Example parameters to execute

# I/O
io_parameters:
  uid_retrieve: # uid for feature vectors from autoencoder
  data_type: 'file' # either "file" or "tiled"
  root_uri: https://tiled-seg.als.lbl.gov/api/v1/metadata/reconstruction/rec20190524_085542_clay_testZMQ_8bit
  data_uris: ['20190524_085542_clay_testZMQ_']
  data_tiled_api_key:
  result_tiled_uri: http://localhost:8888
  result_tiled_api_key:
  uid_save:
  output_dir: 'data/output'
  load_model_path:
  save_model_path:

model_parameters:
  n_components: 2
  min_dist: 0.1
  n_neighbors: 5
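
The following is a minimal sketch of how a script could read a parameter file like this one and validate its `model_parameters` block before running UMAP. It is not the code from this pull request: `UMAPParameters`, `parse_model_parameters`, and `load_config` are hypothetical names, the sanity checks are my own assumptions, and `load_config` assumes PyYAML is installed (the commit list adds it as a dependency).

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UMAPParameters:
    """Hypothetical container mirroring the `model_parameters` block."""

    n_components: int
    min_dist: float
    n_neighbors: int


def parse_model_parameters(config: Mapping[str, Any]) -> UMAPParameters:
    """Validate the `model_parameters` block of an already-parsed config."""
    raw = config.get("model_parameters")
    if not isinstance(raw, Mapping):
        raise ValueError("missing 'model_parameters' section")
    params = UMAPParameters(
        n_components=int(raw["n_components"]),
        min_dist=float(raw["min_dist"]),
        n_neighbors=int(raw["n_neighbors"]),
    )
    # Assumed sanity checks; they are illustrative, not taken from the repository.
    if params.n_components < 1:
        raise ValueError("n_components must be at least 1")
    if params.n_neighbors < 2:
        raise ValueError("n_neighbors must be at least 2")
    if params.min_dist < 0:
        raise ValueError("min_dist must be non-negative")
    return params


def load_config(path: str) -> dict:
    """Read a YAML parameter file; requires PyYAML."""
    import yaml  # imported lazily so the validation above works without PyYAML

    with open(path, "r", encoding="utf-8") as handle:
        # safe_load returns None for an empty file, so fall back to an empty dict
        return yaml.safe_load(handle) or {}


# Usage with an in-memory dict shaped like the example file
example = {
    "io_parameters": {"data_type": "file", "uid_retrieve": None},
    "model_parameters": {"n_components": 2, "min_dist": 0.1, "n_neighbors": 5},
}
params = parse_model_parameters(example)
print(params)
```

Keys left without a value in the file, such as `uid_retrieve:` or `load_model_path:`, are loaded by `yaml.safe_load` as `None`, so downstream code should treat `None` and the empty string alike when deciding whether a value was provided.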