Densification of sparse datasets #448

kif · 2021-04-28T12:50:13Z

Densification -> LimaImage
related to #443

vallsv · 2021-04-28T13:13:09Z

Just a naive question: do you really need a densify entry point?

Couldn't it be part of the convert tool?

You should be able to detect that the input is sparse, and you can request an image format as output.

vallsv · 2021-04-28T13:15:47Z

Anyway, I think you should use fabio-densify as the entry point. These entry points are shared with all the installed libraries. densify sounds too generic to me and is easy to overwrite by accident.

kif · 2021-04-28T15:32:34Z

I can check whether it can be merged into the convert part ... reasonable idea.

I use densify-Bragg since this tool is the counterpart of sparsify-Bragg found in pyFAI.

kif · 2021-04-29T08:03:24Z

I changed the name of the CLI tool to densify-Bragg to mirror the sparsify-Bragg implemented in pyFAI.

jonwright · 2021-05-07T08:49:27Z

I will jump in here with a series of potentially silly questions ... I guess this is a project for another beamline, so I missed the earlier discussions:

Where is the sparse data coming from? Is it generated in the pyFAI codebase?
Does this write out files or will it run "in memory"?
Who is asking for data that comes from a random number generator?
How are the noise characteristics matched? Did you check whether mean absolute deviation (MAD) works better than std? I was recently impressed by this explanation: https://www.youtube.com/watch?v=iKJy2YpYPe8
Could we do this packing/unpacking inside an HDF5 compression filter instead?

Perhaps I am blinded by my own code, which uses the sparse pixel values to replace images. I did not think of asking for this in fabio; it is more like a pandas (or Arrow) dataframe with (i,j,k,intensity) entries.
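For readers unfamiliar with the table-of-pixels layout mentioned above, here is a minimal sketch of it in plain Python. It is not taken from any of the codebases discussed here: the `SparsePixel` record, the `to_sparse_table` helper, the mapping of (i,j,k) onto (frame, row, col) and the simple intensity threshold are all assumptions made for illustration.

```python
from typing import NamedTuple


class SparsePixel(NamedTuple):
    """One table row; the field order is a hypothetical reading of (i,j,k,intensity)."""
    frame: int
    row: int
    col: int
    intensity: float


def to_sparse_table(stack, threshold):
    """Keep only the pixels above `threshold`, one record per pixel.

    `stack` is a list of frames, each frame a list of rows of numbers.
    A plain threshold stands in for a real peak-picking step.
    """
    return [
        SparsePixel(k, i, j, value)
        for k, frame in enumerate(stack)
        for i, row in enumerate(frame)
        for j, value in enumerate(row)
        if value > threshold
    ]


# Two tiny 2x2 frames with a single bright pixel each.
stack = [
    [[0, 0], [0, 7]],
    [[9, 0], [0, 0]],
]
table = to_sparse_table(stack, threshold=5)
```

Each record in `table` could then be loaded as one row of a dataframe, so the images themselves never need to be rebuilt.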

jonwright · 2021-05-07T08:50:44Z

Obviously, I have no objection to any of this. I just didn't understand what it is meant for.

kif · 2021-05-07T13:46:21Z

Hi Jon,

Here are your answers:

Where is the sparse data coming from? Is it generated in the pyFAI codebase?

This is the densification of the sparse format produced by pyFAI (as part of the work performed for SSX: they are paying for this development). The sparsification code is currently only available in OpenCL and only efficient on GPU. Also, incorrect results have been observed on AMD GPU. The densification code is part of FabIO since it aims at a wider distribution.

Does this write out files or will it run "in memory"?

The reading of the sparse format works in memory: open the file with fabio.open, access the frame of interest, retrieve your FabioImage, and copy it to the format you want. Of course, the densification tool (the CLI tool this PR is about) saves to HDF5 in LimaImage or EigerImage format (so far) for subsequent processing.

Who is asking for data that comes from a random number generator?

Nobody asks for data coming from a PRNG, except that the metric (R-factor) obtained on protein data after reduction with XDS is much worse without noise than with noise. To get "likely" data, one needs to add noise. In the future, this feature may be removed, but first one needs to develop direct adapters to DIALS, XDS or CrystFEL. This is a shortcut to have the beamline in production by this summer. Noise remains optional (and is now modulated).
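To make the idea concrete, here is a rough pure-Python sketch of densification with optional, scalable noise. It is not the FabIO implementation: the function name `densify`, its arguments and the data layout (a radial bin index per pixel, a background mean and deviation per bin, and a dictionary of peak pixels) are assumptions chosen for illustration. The only points taken from this thread are that the background is rebuilt from a mean and a deviation, that noise is a scaling factor between 0 and 1, and that a Mersenne Twister generator is used, which is what Python's `random.Random` provides.

```python
import random


def densify(shape, radius_bin, bg_mean, bg_std, peaks, noise=1.0, seed=None):
    """Rebuild a dense frame (list of rows) from a sparse description.

    All names here are hypothetical:
    radius_bin -- radial bin index of every pixel, in row-major order
    bg_mean    -- background mean for each radial bin
    bg_std     -- background deviation for each radial bin
    peaks      -- {flat pixel index: intensity} for the recorded pixels
    noise      -- scale factor in [0, 1]; 0 disables the noise entirely
    """
    rows, cols = shape
    rng = random.Random(seed)  # Mersenne Twister, one private instance per call
    flat = []
    for b in radius_bin:
        value = bg_mean[b]
        if noise:
            # Gaussian noise scaled by the local deviation and the noise factor
            value += noise * bg_std[b] * rng.gauss(0.0, 1.0)
        flat.append(round(value))
    # Recorded peak pixels overwrite the synthetic background
    for index, intensity in peaks.items():
        flat[index] = intensity
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# A 2x3 frame with two radial bins and one peak at flat index 4.
args = ((2, 3), [0, 0, 1, 1, 1, 0], [10.0, 20.0], [1.0, 2.0], {4: 500})
quiet = densify(*args, noise=0.0)           # [[10, 10, 20], [20, 500, 10]]
noisy = densify(*args, noise=1.0, seed=42)  # same peak, noisy background
```

With `noise=0.0` the background is exactly the per-bin mean, which is the case reported above to give worse R-factors after reduction.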

How are the noise characteristics matched?

The azimuthal mean and deviation have been obtained from a sigma-clipping as described in https://www.desy.de/~barty/cheetah/Cheetah/SFX_hitfinding.html (steps 1 to 3 of peakfinder8). The sigma-clipping enforces a normal distribution, which makes the use of the standard deviation valid. Median and MAD are nice tools, but too slow: I need to process a frame within one millisecond!
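As an illustration of sigma-clipping in general (not the OpenCL code used in pyFAI, and not peakfinder8 itself), the sketch below repeatedly discards values lying more than a few standard deviations from the mean and then reports the mean and deviation of the survivors. The function name `sigma_clip`, the cutoff of 3 and the iteration limit of 5 are assumptions.

```python
import random
import statistics


def sigma_clip(values, cutoff=3.0, max_iter=5):
    """Return (mean, std) after iteratively rejecting outliers.

    At each pass, values further than `cutoff` standard deviations from
    the current mean are dropped; the loop stops once nothing is rejected
    or after `max_iter` passes.
    """
    kept = list(values)
    for _ in range(max_iter):
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)
        if std == 0.0:
            break
        survivors = [v for v in kept if abs(v - mean) <= cutoff * std]
        if len(survivors) == len(kept):
            break  # converged: no value was rejected in this pass
        kept = survivors
    return statistics.fmean(kept), statistics.pstdev(kept)


# Synthetic "ring" of pixels: Gaussian background plus two Bragg-like outliers.
rng = random.Random(0)
ring = [rng.gauss(100.0, 5.0) for _ in range(1000)]
ring[10] = ring[500] = 10000.0
mean, std = sigma_clip(ring)
```

The two strong outliers are rejected in the first pass, so `mean` and `std` end up close to the background parameters rather than being dominated by the peaks.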

Could we do this packing/unpacking inside an HDF5 compression filter instead?

This is possible, but it was not requested by the client: they want to process those files in the same way as on other MX beamlines.

kif added 20 commits April 23, 2021 14:32

factorize some CLI tools

caea41a

fix test ... import is apparently needed.

dd53905

Start densifying ...

2d07062

save data a la LIMA

369325e

Merge remote-tracking branch 'upstream/master' into 443_densify

42fcb85

re-apply some modifications

0b8c9a4

security warninigs in GH dependencies

46f38a7

Fix nexus module

35f37fb

Enable writing LIMA files

d3b197c

Remove warning message when working on debian testing

c3e8040

Start working on the parser

8b27770

expose cli program

68fdf5a

work on densification (LIMA only)

ec18627

use MT as thread-safe pseudo random number generator

52968d7

Implement densification to lima format

57743a7

clean up

d4686d3

clean-up (again)

c74d85b

fix out of bound memory access

6745a46

Fix nexus test

2695096

Increment version since I pushed to the wrong branch ...

c6f4484

vallsv force-pushed the master branch from 2695096 to 5f67106 Compare April 28, 2021 12:56

kif requested a review from vallsv April 28, 2021 12:57

Fix python 3.6 (missing time.time_ns())

969dfe9

kif added 4 commits April 28, 2021 17:33

Implement Marsaglia

0c587ac

Fix overflow error on Python 3.6

dfd1f41

minor clean-up

c5b4b6f

densify was probably too generic.

992d0e5

I use densify-Bragg sinc this tool is the symmetric of sparsify-Bragg found in pyFAI

kif and others added 5 commits May 4, 2021 13:48

Merge branch 'master' into 443_densify

f6583bc

issue in conflict resolution

557f911

Issue in conflict resolution

9989fad

Write eiger data as single-dataset

a607056

deal with dummy values

b035003

kif changed the title ~~Increment version since I pushed to the wrong branch ...~~ Densification of sparse datasets May 4, 2021

kif added 3 commits May 5, 2021 08:06

Provide default display with silx view

ed6791a

Implement the saving of the master file

8fba78d

Noise becomes scaling factor from 0 to 1

43ff123

kif added 2 commits May 7, 2021 11:13

typo

faed702

Print warning message

c2f0010

kif added 3 commits May 10, 2021 08:38

Scaleable noise factor

da78420

Implement normalization in Python

1f525a5

Implement support for normalization (polarization effect)

9b585b7

kif merged commit ad44ffb into silx-kit:master Jul 12, 2021
