Frequent Directions Modules #329

Open: john-winnicki wants to merge 61 commits into base: main

john-winnicki commented Jul 19, 2023

Parallelized Fast Alpha Frequent Directions with Tree Merging and Data Projection.

The implementation is split into three modules, designed to be applied in sequence:

  1. Frequent Directions Algorithm: produces a matrix sketch of the experiment data and saves it to h5 files.
  2. Matrix Sketch Tree Merge: merges the matrix sketches in a branching (tree) fashion and saves the result to h5 files.
  3. Latent Space Projection: projects the experiment data onto the space spanned by the matrix sketch and saves the result to h5 files.
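
For readers unfamiliar with the sketching step, here is a minimal NumPy sketch of the basic Frequent Directions algorithm (the buffered variant with 2*ell rows). It is an illustration only and not the code in btx/processing/freqdir.py: the function names are hypothetical, the `alpha` and rank-adaptive options of this PR are not modelled, and there is no MPI or h5 I/O.

```python
import numpy as np


def _shrink(buf, ell):
    """Rotate the buffer with an SVD and shrink its singular values.

    After this call only the first `ell` rows of the result can be non-zero.
    """
    _, s, vt = np.linalg.svd(buf, full_matrices=False)
    # Squared (ell+1)-th singular value; zero if there are not that many.
    delta = s[ell] ** 2 if len(s) > ell else 0.0
    s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    out = np.zeros_like(buf)
    out[: len(s)] = s_shrunk[:, None] * vt
    return out


def frequent_directions(rows, ell):
    """Return an (ell x d) sketch B of the (n x d) matrix `rows`.

    Rows are streamed one at a time into a buffer of 2*ell rows; whenever
    the buffer is full it is shrunk back to at most ell non-zero rows.
    """
    _, d = rows.shape
    buf = np.zeros((2 * ell, d))
    next_free = 0
    for row in rows:
        if next_free == 2 * ell:
            buf = _shrink(buf, ell)
            next_free = ell  # rows ell..2*ell-1 are zero again
        buf[next_free] = row
        next_free += 1
    return _shrink(buf, ell)[:ell]
```

With this construction the sketch B satisfies the usual guarantee that the spectral norm of A^T A - B^T B is at most ||A||_F^2 / ell, which is what makes the sketch usable as a stand-in for the full data in the later steps.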

Example Python script for running the modules:

import sys
sys.path.append("/sdf/home/w/winnicki/btx/")
from btx.interfaces.ipsana import *
from btx.processing.freqdir import *
import numpy as np
import time
from datetime import datetime

exp = 'mfxp23120' # experiment name
run = 100 # run number
det_type = 'MfxEndstation.0:Epix10ka2M.0' # detector name, e.g. epix10k2M or jungfrau4M

currRun = datetime.now().strftime("%y%m%d%H%M%S")
writeToHere = "/sdf/data/lcls/ds/mfx/mfxp23120/scratch/winnicki/h5writes/"

stfull = time.process_time()
#SKETCHING STEP
##########################################################################################
freqDir = FreqDir(john_start=0, tot_imgs=20000, rankAdapt=False, exp=exp, run=run,
        det_type=det_type, ell=50, alpha=0.2, downsample=False, bin_factor=0, merger=False,
        mergerFeatures=0, writeDirec = writeToHere)
print("STARTING SKETCHING")
st = time.process_time()
freqDir.run()
localSketchFilename = freqDir.write()
et = time.process_time()
print("Estimated time for frequent directions rank {0}/{1}: {2}".format(freqDir.rank, freqDir.size, et - st))
#MERGING STEP
##########################################################################################
mergeTree = MergeTree(divBy=2, readFile = localSketchFilename, dataSetName="sketch", writeDirec = writeToHere)
st = time.process_time()
mergeTree.merge()
mergedSketchFilename = mergeTree.write()
et = time.process_time()
print("Estimated time merge tree for rank {0}/{1}: {2}".format(freqDir.rank, freqDir.size, et - st))
#PROJECTION STEP
##########################################################################################
appComp = ApplyCompression(john_start=0, tot_imgs=9000, rankAdapt=False, exp=exp, run=run, det_type=det_type,
        ell=50, alpha=0.2, downsample=False, bin_factor=16, merger=False, mergerFeatures=0,readFile = mergedSketchFilename,
        dataSetName="sketch", writeDirec = writeToHere)
st = time.process_time()
appComp.run()
appComp.write()
et = time.process_time()
print("Estimated time projection for rank {0}/{1}: {2}".format(appComp.rank, appComp.size, et - st))
##########################################################################################
etfull = time.process_time()
print("Estimated full time for rank {0}/{1}: {2}".format(appComp.rank, appComp.size, etfull - stfull))
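
The merging and projection steps in the script above can be illustrated with a small NumPy sketch. This is a simplified picture under stated assumptions, not the MergeTree or ApplyCompression code from this PR: `merge_sketches`, `tree_merge`, and `project` are hypothetical names, `div_by` only mimics the role that `divBy` appears to play, and everything runs in one process on in-memory arrays rather than across ranks with h5 files.

```python
import numpy as np


def merge_sketches(sketches, ell):
    """Stack several (ell x d) sketches and shrink back to ell rows."""
    stacked = np.vstack(sketches)
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    # Same shrinkage as in Frequent Directions itself.
    delta = s[ell] ** 2 if len(s) > ell else 0.0
    s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    merged = np.zeros((ell, stacked.shape[1]))
    k = min(ell, len(s))
    merged[:k] = s_shrunk[:k, None] * vt[:k]
    return merged


def tree_merge(sketches, ell, div_by=2):
    """Merge sketches level by level, `div_by` at a time, until one is left."""
    if div_by < 2:
        raise ValueError("div_by must be at least 2")
    level = list(sketches)
    while len(level) > 1:
        level = [
            merge_sketches(level[i:i + div_by], ell)
            for i in range(0, len(level), div_by)
        ]
    return level[0]


def project(images, sketch, num_components):
    """Project flattened images (n x d) onto the top directions of a sketch."""
    _, _, vt = np.linalg.svd(sketch, full_matrices=False)
    basis = vt[:num_components]  # orthonormal rows spanning the latent space
    return images @ basis.T
```

Merging works because Frequent Directions sketches are mergeable: stacking two sketches and shrinking again keeps the same kind of error bound as sketching the combined data directly, so the tree shape mainly affects how the work is parallelized.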

Example Slurm File

#!/bin/bash

#SBATCH --partition=milano --account lcls
#
#SBATCH --job-name=freqDirTest
#SBATCH --output=output-%j.txt
#SBATCH --error=output-%j.txt
#
#SBATCH --time=0-20:00:00
#SBATCH --mem-per-cpu=20G
#SBATCH --ntasks=64     

mpirun -np 64 python /sdf/home/w/winnicki/sandbox/fullRun.py
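
When the script is launched with `mpirun -np 64`, each rank has to work on its own share of the images. How btx actually partitions the images is not shown in this thread, so the following is only one plausible scheme (contiguous, near-equal blocks); `images_for_rank` is a hypothetical helper, not a btx function.

```python
def images_for_rank(rank, size, start, total):
    """Return (first_index, count) of the contiguous block handled by `rank`.

    `total` images starting at index `start` are split over `size` ranks;
    the first `total % size` ranks receive one extra image.
    """
    base, extra = divmod(total, size)
    count = base + (1 if rank < extra else 0)
    first = start + rank * base + min(rank, extra)
    return first, count


# Example: 20000 images over 64 ranks, as in the script and Slurm file above.
blocks = [images_for_rank(r, 64, 0, 20000) for r in range(64)]
```

In an MPI program, `rank` and `size` would come from the communicator (for example via mpi4py), which is omitted here to keep the example dependency-free.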

fredericpoitevin (Collaborator) commented Jul 27, 2023

Hi @john-winnicki, running your script on S3DF seemed to work. It generated a set of HDF5 files:

(base) [fpoitevi@sdfiana001 FD_john]$ ll /sdf/data/lcls/ds/mfx/mfxp23120/scratch/winnicki/h5writes/
total 14361668
-rw-rw----+ 1 fpoitevi ps-data 865077248 Jul 26 17:29 230726170857_merge.h5
-rw-rw----+ 1 fpoitevi ps-data 865077248 Jul 26 17:27 230726170857_sketch_0.h5
...
-rw-rw----+ 1 fpoitevi ps-data 865077248 Jul 26 17:25 230726170857_sketch_9.h5

It also completed without errors:

(base) [fpoitevi@sdfiana001 FD_john]$ more output-20217035.txt 
...
Estimated full time for rank 10/16: 1386.063531425

If you are happy with this, please:

  • remove OLDfreqdir.py from this PR
  • make sure you update freqdir.py
  • document how to run it in the initial conversation box above. All important information should go there, including the contents of all scripts used to run it.

Once this is done, please click on "Ready to review" and we'll continue discussing from there.

Thanks!

John Winnicki and others added 2 commits July 27, 2023 18:31
@john-winnicki john-winnicki marked this pull request as ready for review July 28, 2023 18:11
@john-winnicki john-winnicki changed the title Frequent Directions Rough Draft Frequent Directions Modules Jul 28, 2023
fredericpoitevin (Collaborator) left a comment

Thanks @john-winnicki for this great contribution!

Please check my comments and let me know if you want to discuss them offline. Thanks!

btx/processing/freqdir.py: 10 resolved review threads (9 outdated)
@fredericpoitevin fredericpoitevin self-requested a review August 4, 2023 16:26
fredericpoitevin (Collaborator) left a comment

Thanks so much @john-winnicki - this looks great!

btx/processing/freqdir.py: 5 resolved review threads (3 outdated)
John Winnicki and others added 6 commits August 8, 2023 03:14
…wards parent class (reverting old functions to pipca versions).
…ctions are housed here for Frequent Directions and PIPCA module. Also appropriately modified FD and PIPCA code. Fixed indexing issue, removed means, zeroed negative values and fixed overflowing issues. Other nice changes.
John Winnicki and others added 27 commits September 14, 2023 10:14
Disabling common mode correction in FreqDir DataRetriever.
created FD sketch tasks and workflow.
…d thumbnail generation outside of sketching. Other small updates.
…raw_sketch task able to run on more than one core.
…duler), the pythonpath needs to be given as well, otherwise python or mpirun defaults to the original environment.
…to re-enable ROI and the things in the settings (I think no threshold, but throw away zeros and apply unit variance and possibly normalization).
…ot being scaled correctly using the new boxing mechanism.
john-winnicki (Author) commented Mar 12, 2024

I pushed my most recent changes, which should allow for elog submission.

Note that the images will likely be mostly noise, since the centering function needs to be tuned.

The elog submission commands are:

elog_submit.sh -c ../yaml/default_config2.yaml -t draw_sketch -n 8
elog_submit.sh -c ../yaml/default_config2.yaml -t show_sketch -n 8

The YAML file is:

setup:
  queue: 'milano'
  root_dir: '/sdf/data/lcls/ds/mfx/mfxp23120/scratch/winnicki'
  exp: 'xppc00121'
  run: 511
  det_type: 'XppEndstation.0:Alvium.1'
  cell: ''

draw_sketch:
  exp: 'xppc00121'
  run: 511
  det_type: 'XppEndstation.0:Alvium.1'
  grabImgSteps: 16
  writeToHere: "/sdf/data/lcls/ds/mfx/mfxp23120/scratch/winnicki/sketch/"
  start_offset: 0
  num_imgs: 12000
  alpha: 0.2
  rankAdapt: False
  rankAdaptMinError: 100
  downsample: False
  eluThreshold: False
  eluAlpha: 0.01
  threshold: True
  normalizeIntensity: True
  noZeroIntensity: False
  bin_factor: 1
  minIntensity: 1000
  samplingFactor: 1
  divBy: 2
  thresholdQuantile: 0.9975
  usePSI: True
  num_components: 200

show_sketch:
  exp: 'xppc00121'
  run: 511
  det_type: 'XppEndstation.0:Alvium.1'
  outdir: "/sdf/data/lcls/ds/mfx/mfxp23120/scratch/winnicki/sketch/"
  num_imgs: 12000
  nprocs: 8
  skip_size: 4
  num_imgs_to_use: 12000
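
Because `exp`, `run`, and `det_type` are repeated in the `setup`, `draw_sketch`, and `show_sketch` sections, it is easy for them to drift apart when editing the file. A small consistency check along the following lines may help. It is a hypothetical helper, not part of btx; the dictionary below mirrors only a subset of the YAML above, and loading the file itself (for example with PyYAML) is left out.

```python
def check_shared_keys(config, keys=("exp", "run", "det_type")):
    """Return (section, key, value, expected) for every task-section value
    that disagrees with the corresponding value in `setup`."""
    reference = config["setup"]
    mismatches = []
    for section, params in config.items():
        if section == "setup":
            continue
        for key in keys:
            if key in params and params[key] != reference.get(key):
                mismatches.append((section, key, params[key], reference.get(key)))
    return mismatches


# Subset of the configuration shown above, written as a plain dict.
config = {
    "setup": {"exp": "xppc00121", "run": 511,
              "det_type": "XppEndstation.0:Alvium.1"},
    "draw_sketch": {"exp": "xppc00121", "run": 511,
                    "det_type": "XppEndstation.0:Alvium.1", "num_imgs": 12000},
    "show_sketch": {"exp": "xppc00121", "run": 511,
                    "det_type": "XppEndstation.0:Alvium.1", "num_imgs": 12000},
}

problems = check_shared_keys(config)
```

An empty list means the task sections agree with `setup`; any entries point at the section and key to fix before submitting.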

Launch the elog commands from the "launchpad" directory; the output should be written to a "sketch" directory at the same level. I have attached a picture of the result, in which I visualized 12,000 images. I selected some points using the lasso tool, and the percentage of currently displayed points is shown on the right (note that this percentage changes if you move the slider, because the slider controls which points are currently displayed). My mouse is hovering over a few of the points, which is why a hover tooltip is showing.
Screenshot 2024-03-12 at 12 46 48 PM

I'll also note that the clustering does not work well in the example above, since there are no clearly defined clusters apart from the left and right groups.
