Data reduction

Ti-Yen Lan edited this page May 18, 2018 · 22 revisions

The data reduction step can be divided into the following processes: mapping detector pixels, background estimation and peak finding, and lattice parameter estimation.

Here we generate the files needed for the EMC reconstruction. For the test dataset, this step can be skipped by using the reduced data we provide, as described in skip data reduction.

Mapping detector pixels

We start by updating the experimental parameters in the [make-detector] section of config.ini:

[make-detector]
# pixel
num_row = 2527
num_col = 2463
cx = 1285.5
cy = 1262.0
Rstop = 115.0

# meter
detd = 0.45
px = 172e-6

# angstrom
wl = 1.03324
res_max = 1.95

# beam incidence direction
sx = 0.005
sy = -0.01
sz = -1

The parameters num_row and num_col specify the detector size in pixels. The pixels are labeled by coordinates (x,y), with x = 0, 1, ..., num_row−1 and y = 0, 1, ..., num_col−1. With this choice of coordinates, the upper-left detector pixel has coordinates (0,0), and the X-ray beam is incident in the −z direction. We assume the X-ray polarization is in the y direction, because the main application of our program is the analysis of SMX data taken at storage ring synchrotron sources. The parameters (cx, cy) give the beam incidence point on the detector, and Rstop is the beamstop radius, both in pixels. The remaining parameters are detd, the sample-to-detector distance in meters; px, the side length of the square detector pixels in meters; wl, the incident X-ray wavelength in angstroms; and res_max, the maximum resolution in angstroms of the pixels that will be considered in the reconstruction. The vector (sx,sy,sz) gives the beam incidence direction. It does not have to be normalized and is typically set to (0, 0, −1).
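To make the role of these parameters concrete, here is a minimal Python sketch that reads the [make-detector] values and maps a pixel to a scattering vector. It is not the code in make-detector.c: the function names are hypothetical, the detector is assumed flat and normal to z, and the beam is assumed to travel exactly along (0, 0, −1), so sx and sy are ignored. The resolution of a pixel is taken as 1/|q| with |q| = 2 sin(θ)/wl.

```python
import configparser
import math

# Values copied from the [make-detector] section shown above.
CONFIG_TEXT = """
[make-detector]
num_row = 2527
num_col = 2463
cx = 1285.5
cy = 1262.0
Rstop = 115.0
detd = 0.45
px = 172e-6
wl = 1.03324
res_max = 1.95
sx = 0.005
sy = -0.01
sz = -1
"""


def load_detector_params(text):
    """Parse the [make-detector] section into a dict of floats.

    Note: configparser lowercases option names, so Rstop becomes "rstop".
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: float(value) for key, value in parser["make-detector"].items()}


def pixel_to_q(x, y, p):
    """Scattering vector (1/angstrom) of pixel (x, y).

    Assumption (not from the original code): flat detector normal to z and
    beam exactly along (0, 0, -1); sx and sy are ignored in this sketch.
    """
    rx = (x - p["cx"]) * p["px"]
    ry = (y - p["cy"]) * p["px"]
    rz = -p["detd"]
    norm = math.sqrt(rx * rx + ry * ry + rz * rz)
    # q = (k_out - k_in) / wl, with unit vectors k_out = r / |r|, k_in = (0, 0, -1)
    return (rx / norm / p["wl"], ry / norm / p["wl"], (rz / norm + 1.0) / p["wl"])


def pixel_is_used(x, y, p):
    """True if the pixel lies outside the beamstop and within res_max."""
    radius_pix = math.hypot(x - p["cx"], y - p["cy"])
    if radius_pix <= p["rstop"]:
        return False  # shadowed by the beamstop
    q_mag = math.sqrt(sum(c * c for c in pixel_to_q(x, y, p)))
    return 1.0 / q_mag >= p["res_max"]  # resolution in angstroms
```

With the values above, a pixel at the beam center maps to q = 0, and pixels_is_used-style checks of this kind reject both the beamstop region and pixels whose resolution is finer than res_max.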

After updating the parameters, we move to the directory make-detector and execute the command

python make-mask.py [path to frame] > run.log

to generate the file mask.dat in the directory aux, which excludes the detector gaps and the pixels shadowed by the beamstop holder. Here [path to frame] is the path to one of the data frames in CBF format. You should expect to see a masked data frame.

[Figure: example of a masked data frame]

The beamstop region will be masked out in the files that record the mapping of the detector pixels to reciprocal space, which are obtained by executing the commands:

gcc make-detector.c -O3 -lm -o det
./det ../config.ini >> run.log

Background estimation and peak finding

After moving to the directory make-background, we generate the lists of filenames associated with the data frames using the command:

python make-filelists.py [raw-data-dir]

Here [raw-data-dir] is the path to the directory that contains the cbf files downloaded from CXIDB.

Then we update the parameters in the [make-background] section in config.ini:

[make-background]
num_raw_data = 79992
hot_pix_thres = 1e4
qlen = 500

Running make-filelists.py automatically updates the value of num_raw_data, the total number of data frames. The parameter hot_pix_thres is the threshold value above which a pixel is identified as defective and masked out. In our analysis, we assume that the background scatter in each data frame is azimuthally symmetric about the incident X-ray beam; qlen is the number of equally spaced bins into which the spatial frequency magnitudes are divided for the background estimation. Finally, we execute the commands:

make
mpirun -np [nproc] ./ave_bg ../config.ini > run.log &

to estimate the pixel-wise background values and identify the outlier pixels in each frame, where [nproc] is the number of processors used in the parallel processing.
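The idea of an azimuthally symmetric background can be sketched in a few lines of Python. This is only an illustration, not the algorithm implemented in ave_bg: the function name is hypothetical, the bins are assumed to span 0 to the largest |q| in the frame, and the median per bin is an assumed estimator, since the page does not say which statistic the program uses.

```python
from statistics import median


def radial_background(frame, q_mags, qlen, hot_pix_thres):
    """Estimate a per-pixel background from an azimuthally symmetric model.

    frame  : list of pixel counts
    q_mags : list of |q| values for the same pixels
    Returns (background, hot), where background[i] is the median count of the
    radial bin containing pixel i and hot[i] flags counts above hot_pix_thres.
    """
    # Equal-width bins from 0 to the largest |q| (assumption of this sketch).
    bin_width = max(q_mags) / qlen or 1.0
    hot = [count > hot_pix_thres for count in frame]

    bins = [[] for _ in range(qlen)]
    bin_of_pixel = []
    for count, q, is_hot in zip(frame, q_mags, hot):
        b = min(int(q / bin_width), qlen - 1)  # clamp the largest |q| into the last bin
        bin_of_pixel.append(b)
        if not is_hot:  # defective pixels do not contribute to the estimate
            bins[b].append(count)

    bin_median = [median(vals) if vals else 0.0 for vals in bins]
    background = [bin_median[b] for b in bin_of_pixel]
    return background, hot
```

Pixels whose counts lie well above the background of their bin would then be the outlier candidates passed on to peak finding; the criterion for "well above" is left out here because the page does not specify it.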

Lattice parameter estimation

Next, we move to the directory make-powder to estimate the lattice parameters. The parameters in the [make-powder] section of config.ini are:

[make-powder]
min_patch_sz = 2
max_patch_sz = 10
min_num_peak = 3
max_num_peak = 20

A Bragg peak candidate is assumed to contain at least min_patch_sz but no more than max_patch_sz contiguous outlier pixels identified from the diffuse background scatter. Only the data frames with at least min_num_peak but no more than max_num_peak candidate peaks are kept for the later analysis. The enforcement of data sparsity can be removed by making max_num_peak a large integer.
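The selection rules above can be illustrated with a short Python sketch. It is not the code in make-powder.c: the function names are hypothetical, and treating only edge-sharing pixels as contiguous (4-connectivity) is an assumption, since the page does not state which connectivity is used.

```python
from collections import deque


def find_patches(outliers):
    """Group outlier pixels into contiguous patches.

    outliers : set of (x, y) pixel coordinates flagged as outliers
    Returns a list of patches, each a list of (x, y) tuples.
    Assumption: pixels sharing an edge count as contiguous (4-connectivity).
    """
    unvisited = set(outliers)
    patches = []
    while unvisited:
        seed = unvisited.pop()
        patch, queue = [seed], deque([seed])
        while queue:  # breadth-first flood fill from the seed pixel
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in unvisited:
                    unvisited.remove(nb)
                    patch.append(nb)
                    queue.append(nb)
        patches.append(patch)
    return patches


def count_peak_candidates(outliers, min_patch_sz, max_patch_sz):
    """Count patches whose size lies in [min_patch_sz, max_patch_sz]."""
    return sum(min_patch_sz <= len(p) <= max_patch_sz for p in find_patches(outliers))


def keep_frame(num_peaks, min_num_peak, max_num_peak):
    """Keep a frame only if its candidate peak count is within the limits."""
    return min_num_peak <= num_peaks <= max_num_peak
```

For example, a frame whose outliers form patches of sizes 2, 1, 3 and 2 has three candidates with min_patch_sz = 2 and max_patch_sz = 10, so it is kept when min_num_peak = 3 and max_num_peak = 20.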

By executing the commands

gcc make-powder.c -O3 -lm -o powder
./powder ../config.ini > run.log

we generate frame-peak-count.dat, peak-sz-count.dat, 1d-pseudo-powder.dat and 2d-pseudo-powder.dat.
