Densification of sparse datasets #448

Merged
merged 38 commits into from
Jul 12, 2021

kif
Member

@kif kif commented Apr 28, 2021

Densification -> LimaImage
related to #443

@vallsv
Contributor

vallsv commented Apr 28, 2021

Just naive question, do you really need a densify entry?

Couldn't it be part of the convert tool?

You should know that the input is sparse and you can request an image format as output.

@vallsv
Contributor

vallsv commented Apr 28, 2021

Anyway, I think you should use fabio-densify as the entry point. These entry points are shared by all installed libraries. densify sounds too generic to me and easy to overwrite by accident.

@kif
Member Author

kif commented Apr 28, 2021

I can check if it can be merged to the convert part ... reasonable idea.

kif added 4 commits April 28, 2021 17:33
I use densify-Bragg since this tool is the counterpart of sparsify-Bragg
found in pyFAI
@kif
Member Author

kif commented Apr 29, 2021

I changed the name of the CLI tool to densify-Bragg to mirror the sparsify-Bragg implemented in pyFAI.

@kif kif changed the title from "Increment version since I pushed to the wrong branch ..." to "Densification of sparse datasets" May 4, 2021
@jonwright
Collaborator

I will jump in here with a series of potentially silly questions ... I guess this is a project for another beamline and so I missed the earlier discussions :

  • where is the sparse data coming from? Is it generated in the pyFAI codebase ?
  • does this write out files or will it run "in memory" ?
  • who is asking for data that comes from a random number generator ?
  • How are the noise characteristics matched? Did you check whether mean absolute deviation (MAD) works better than std? I was recently impressed by this explanation: https://www.youtube.com/watch?v=iKJy2YpYPe8
  • could we do this packing/unpacking inside a hdf5 compression filter instead ?

Perhaps I am blinded by my own code, which uses the sparse pixel values to replace images. I did not think of asking for this in fabio: it is more like a pandas (or Arrow) dataframe with (i,j,k,intensity) entries.
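
To illustrate the (i,j,k,intensity) layout mentioned above, here is a minimal sketch of scattering such entries back into dense frames. It is not taken from this PR or from any existing code: the function name `densify_entries` is hypothetical, and reading (i, j, k) as (frame, row, column) is an assumption. Plain nested lists are used to keep the sketch dependency-free; real code would more likely use NumPy arrays.

```python
from typing import Iterable, List, Tuple

# Assumed meaning of one sparse entry: (frame, row, column, intensity).
Entry = Tuple[int, int, int, float]


def densify_entries(
    entries: Iterable[Entry],
    n_frames: int,
    shape: Tuple[int, int],
    fill: float = 0.0,
) -> List[List[List[float]]]:
    """Scatter sparse entries into a stack of dense frames.

    Pixels that have no entry keep the `fill` value.
    """
    n_rows, n_cols = shape
    # Build independent rows so that writing one pixel does not alias another.
    frames = [
        [[fill] * n_cols for _ in range(n_rows)] for _ in range(n_frames)
    ]
    for frame, row, col, intensity in entries:
        frames[frame][row][col] = intensity
    return frames


# Two recorded pixels spread over two 2x3 frames.
entries = [(0, 1, 2, 150.0), (1, 0, 0, 42.0)]
stack = densify_entries(entries, n_frames=2, shape=(2, 3))
```

Every pixel without an entry is simply set to `fill` here, so this only covers the "peaks only" case and says nothing about how the background is restored.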

@jonwright
Collaborator

Obviously, I have no objection to any of this. I just didn't understand what it is meant for.

@kif
Member Author

kif commented May 7, 2021

Hi Jon,

Here are your answers:

  • where is the sparse data coming from? Is it generated in the pyFAI codebase ?

This is the densification of the sparse format produced by pyFAI (as part of the work performed for SSX: they are paying for this development). The sparsification code is currently only available in OpenCL and is only efficient on GPU. Also, incorrect results have been observed on AMD GPUs. The densification code is part of FabIO since it aims at a wider distribution.

  • does this write out files or will it run "in memory" ?

The reading of the sparse format works in memory: open the file with fabio.open, access the frame of interest, retrieve your FabioImage, and copy it to the format you want. Of course, the densification tool (the CLI tool this PR is about) saves to HDF5 in LimaImage or EigerImage format (so far) for subsequent processing.
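
A minimal sketch of that in-memory path, for illustration only. It relies on `fabio.open`, `getframe` and the `data` attribute of a FabioImage; the helper name `read_dense_frame` and its `opener` parameter are hypothetical, and it is an assumption that the sparse reader added here exposes its frames through `getframe` like other multi-frame formats.

```python
def read_dense_frame(path, index=0, opener=None):
    """Return the pixel array of frame `index` from the file at `path`.

    `opener` defaults to fabio.open; it is a parameter only so that the
    helper can be exercised with a stand-in object when FabIO is absent.
    """
    if opener is None:
        import fabio  # deferred import: only needed for real files

        opener = fabio.open
    image = opener(path)            # FabIO picks the reader from the file
    frame = image.getframe(index)   # FabioImage for the requested frame
    return frame.data               # dense array of pixel values


# Typical call, with a hypothetical file name:
#     data = read_dense_frame("sparse.h5", index=10)
```

From the returned FabioImage one would then go through the usual FabIO conversion and writing calls to save in another format; the CLI tool does this in bulk for whole files.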

  • who is asking for data that comes from a random number generator ?

Nobody asks for data coming from a PRNG, but the metric (R-factor) obtained on protein data after reduction with XDS is much worse without noise than with noise. To get "likely" data, one needs to add noise. In the future, this feature may be removed, but first one needs to develop direct adapters to Dials, XDS or CrystFEL. This is a shortcut to have the beamline in production by this summer. Noise remains optional (and is now modulated).
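
To make the idea concrete, here is a rough sketch of rebuilding a frame from a per-bin background plus recorded peaks, with optional, scaled Gaussian noise. This is not the FabIO implementation: the function name `densify_frame`, the `radius_bin` lookup table, the `noise` scaling factor and the dictionary of peaks are all assumptions made for the example.

```python
import random


def densify_frame(shape, radius_bin, bg_mean, bg_std, peaks, noise=1.0, seed=None):
    """Rebuild a dense frame from background statistics and peak pixels.

    radius_bin[r][c] is the (assumed) radial bin of pixel (r, c);
    bg_mean[b] and bg_std[b] are the background mean and deviation of bin b;
    peaks maps (row, col) to the recorded intensity;
    noise scales the added Gaussian noise, 0 disables it.
    """
    rng = random.Random(seed)
    n_rows, n_cols = shape
    frame = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            b = radius_bin[r][c]
            value = bg_mean[b]
            if noise:
                # Noise drawn from a normal law with the bin's deviation.
                value += noise * rng.gauss(0.0, bg_std[b])
            row.append(value)
        frame.append(row)
    # Recorded pixels overwrite the synthetic background.
    for (r, c), intensity in peaks.items():
        frame[r][c] = intensity
    return frame


radius_bin = [[0, 0, 1], [0, 1, 1]]
bg_mean, bg_std = [10.0, 5.0], [2.0, 1.0]
peaks = {(1, 2): 500.0}
quiet = densify_frame((2, 3), radius_bin, bg_mean, bg_std, peaks, noise=0.0)
noisy = densify_frame((2, 3), radius_bin, bg_mean, bg_std, peaks, noise=1.0, seed=0)
```

With `noise=0.0` the background is perfectly smooth, which is the case that gave the worse R-factor; a non-zero factor restores pixel-to-pixel scatter while leaving the recorded peaks untouched.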

  • How are the noise characteristics matched ?

The azimuthal mean and deviation are obtained by sigma-clipping, as described in https://www.desy.de/~barty/cheetah/Cheetah/SFX_hitfinding.html (steps 1 to 3 of peakfinder8). The sigma-clipping enforces a normal distribution, which makes the use of the standard deviation valid. Median and MAD are nice tools, but too slow: I need to process a frame within one millisecond!
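
For readers unfamiliar with sigma-clipping, a generic sketch follows. It is neither the peakfinder8 code nor the OpenCL kernel used in pyFAI: the function name `sigma_clip`, the cutoff of 3 and the iteration limit are assumptions chosen for the example, and it works on the values of a single azimuthal ring.

```python
import statistics


def sigma_clip(values, cutoff=3.0, max_iter=5):
    """Return (mean, std) of `values` after iterative sigma-clipping.

    At each pass, values further than cutoff * std from the mean are
    discarded; iteration stops once nothing more is rejected.
    """
    kept = list(values)
    for _ in range(max_iter):
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)
        if std == 0.0:
            break  # all remaining values are identical
        clipped = [v for v in kept if abs(v - mean) <= cutoff * std]
        if len(clipped) == len(kept):
            break  # converged: no outlier left
        kept = clipped
    return statistics.fmean(kept), statistics.pstdev(kept)


# One ring of background pixels contaminated by a single Bragg peak.
ring = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0, 10.0] * 2 + [5000.0]
mean, std = sigma_clip(ring)
```

The peak is rejected on the first pass, so the returned mean and deviation describe the background only. Only sums and sums of squares are needed at each pass, whereas a median requires sorting or selection, which is the speed argument made above.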

  • could we do this packing/unpacking inside a hdf5 compression filter instead ?

This is possible, but it was not asked for by the client: they want to process those files in the same way as on other MX beamlines.

@kif kif merged commit ad44ffb into silx-kit:master Jul 12, 2021