-
Notifications
You must be signed in to change notification settings - Fork 4
Data reduction
The data reduction step can be divided into the following processes:
- Mapping detector pixels
- Background estimation and peak finding
- Lattice parameter estimation
- Finding probable orientations
- Data conversion
- Expansion matrix calculation
Here we generate the necessary files for the EMC reconstruction. For the test dataset, this step can be skipped by directly using the reduced data provided by us, which will be described in Skipping data reduction.
We start by updating the experimental parameters in the [make-detector]
section of config.ini
:
[make-detector]
# pixel
num_row = 2527
num_col = 2463
cx = 1285.5
cy = 1262.0
Rstop = 115.0
# meter
detd = 0.45
px = 172e-6
# angstrom
wl = 1.03324
res_max = 1.95
# beam incidence direction
sx = 0.005
sy = -0.01
sz = -1
The parameters num_row
and num_col
specify the detector size in pixels.
The pixels are labeled by coordinates (x,y)
, with x = 0, 1,..., num_row−1
and y = 0, 1,..., num_col−1
.
With this choice of coordinates, the upper-left detector pixel has coordinates (0,0)
, and the X-ray beam is incident in the -z
direction.
We assume the X-ray polarization is in the y
direction, because the main application of our program is on the analysis of SMX data taken at storage ring synchrotron sources.
The parameters (cx, cy)
label the beam incidence point on the detector, and Rstop
is the beamstop radius in pixels.
The other parameters include detd
, the sample-to-detector distance, px
, the squared detector pixel size, wl
, the incident X-ray wavelength, and res_max
, the maximum resolution of the pixels that will be considered in the reconstruction.
The vector (sx,sy,sz)
indicates the beam incidence direction (does not have to be normalized), and is typically set as (0, 0, −1)
.
After updating the parameters, we move to the directory make-detector
and execute the command
python make-mask.py [path to frame] > run.log
to generate the file mask.dat
in the directory aux
to exclude the detector gaps and the pixels shadowed by the beamstop holder.
Here [path to frame]
is the path to one of the data frames in the cbf format.
You should expect to see a masked data frame that looks like:
The beamstop region will be masked out in the files that record the mapping of the detector pixels to reciprocal space, which are obtained by executing the commands:
gcc make-detector.c -O3 -lm -o det
./det ../config.ini >> run.log
After moving to the directory make-background
, we generate the lists of the filenames associated with each data frame using the command:
python make-filelists.py [raw-data-dir]
Here [raw-data-dir]
is the path to the directory that contains the cbf files downloaded from CXIDB.
Then we update the parameters in the [make-background]
section in config.ini
:
[make-background]
num_raw_data = 79992
hot_pix_thres = 1e4
qlen = 500
The execution of make-filelists.py
has automatically updated the value of num_raw_data
, the total number of data frames.
The parameter hot_pix_thres
is the threshold value beyond which a pixel is identified as defective and masked out.
In our analysis, we assume that the background scatter in each data frame is azimuthally symmetric about the incident X-ray beam, and qlen
represents the number of bins that divide the spatial frequency magnitudes with equal spacing for the background estimation.
Finally, we execute the commands:
make
mpirun -np [nproc] ./ave_bg ../config.ini > run.log &
to estimate the pixel-wise background values and identify the outlier pixels in each frame, where [nproc]
is the number of processors used in the parallel processing.
Next, we move to the directory make-powder
to estimate the lattice parameters.
The parameters in the [make-powder]
section in config.ini
are:
[make-powder]
min_patch_sz = 2
max_patch_sz = 10
min_num_peak = 3
max_num_peak = 20
A Bragg peak candidate is assumed to contain at least min_patch_sz
but no more than max_patch_sz
contiguous outlier pixels identified from the diffuse background scatter.
Only the data frames with at least min_num_peak
but no more than max_num_peak
candidate peaks are kept for the later analysis.
The enforcement of data sparsity can be removed by making max_num_peak
a large integer.
By executing the commands
gcc make-powder.c -O3 -lm -o powder
./powder ../config.ini > run.log
we generate the files frame-peak-count.dat
, patch-sz-count.dat
, 1d-pseudo-powder.dat
and 2d-pseudo-powder.dat
.
The number of candidate peaks in each data frame is recorded in frame-peak-count.dat
.
The file patch-sz-count.dat
represents the histogram of the size of contiguous outlier pixels, from which we can check if the original choice of max_patch_sz
is reasonable.
The file 1d-pseudo-powder.dat
contains three columns: the spatial frequency magnitudes, the counts of inter-peak distances in reciprocal space in each frame, and the counts of spatial frequency magnitudes of the candidate peaks.
Finally, the file 2d-pseudo-powder.dat
records the maximum photon count in each detector pixel.
In the analysis of the test dataset, we fit the lattice parameters a = 79.1 Å
and c = 38.4 Å
by assuming a primitive tetragonal lattice.
This choice can be assessed by executing the command
python plot-1d-powder.py
to plot the histograms of the inter-peak distances and the spatial frequency magnitudes of the candidate peaks. For general crystal lattices, the lattice parameters have to be estimated by fitting the histogram of the inter-peak distances. By executing the command
python plot-2d-powder.py
we plot the 2D pseudo-powder pattern to check if the original estimates of the parameters (cx, cy)
, the beam incidence point on the detector, and (sx, sy, sz)
, the beam incidence direction, are reasonable.
The whole data processing from Mapping detector pixels to here should be rerun if these parameters have to be changed.
The values of the estimated lattice parameters are stored as a 3×3 matrix
u[0] v[0] w[0]
u[1] v[1] w[1]
u[2] v[2] w[2]
in the file basis-vec.dat
in the directory aux
, where u
, v
and w
denote the basis vectors of the primitive unit cell in units of Å.
This file should be created by the user for general crystal lattices.
Here we narrow down the number of probable orientations for each frame by directly rotating the centroids of the candidate peaks over all rotation samples in reciprocal space.
An orientation is kept for a particular frame if at least min_num_peak
candidate peaks overlap with the predicted Bragg peaks.
In the [orient-peak]
section in config.ini
we have the parameters
[orient-peak]
res_cutoff = 4.0
VN = 15
gw = 2.0
The parameter res_cutoff
specifies the highest resolution of data in unit of Å that will be used in determining the probable orientations and the low-resolution EMC reconstruction.
The parameter VN
denotes the number of voxels between the closest Bragg peaks in reciprocal space, and gw
defines the radius of a Bragg peak in unit of voxel.
After updating the parameters, we move to the directory make-quaternion
to generate the rotation samples with the command
python make-rot-samples.py [num_div]
The integer [num_div]
specifies the angular resolution by δθ = 0.944/[num_div]
.
An angular resolution of at least 2gw·res_cutoff·min_rcell/VN
is required in order to not to miss any Bragg peaks.
Here min_rcell
denotes the minimum peak distance in reciprocal space, with unit of 1/Å.
For the test dataset, we have min_rcell = 1/a
, and the resulting angular resolution corresponds to [num_div] = 70
.
This command creates the file c-quaternion[num_div].bin
in the directory aux
.
Finally, we move to the directory orient-peak
, and execute the commands
mpicc mpi-sync-orient-peak.c -O3 -lm -o orient
mpirun -np [nproc] ./orient ../config.ini > run.log &
to find the probable orientations for each frame, where [nproc]
is the number of processors to be used.
The output file num_prob_orien.dat
records the number of probable orientations for each data frame.
We then move to the directory reduce-data
to convert data to the format that will be used by the EMC reconstruction.
In the [reduce-data]
section of config.ini
, we have the parameters:
[orient-peak]
nproc = 20
mpi_bgfile = [reduced-data-dir]/Data/mpi-bg_model.bin
mpi_datafile = [reduced-data-dir]/Data/mpi-datafile.bin
Here nproc
is the number of processors that will be used for the EMC reconstruction, and [reduced-data-dir]
is the directory we used in Initialization.
The files mpi_bgfile
and mpi_datafile
store the background estimates and photon counts of the data frames that will be input to the EMC algorithm, respectively.
In order to reduce the time spent on reading data, the photon counts are stored as short integers.
The frames with more than max_num_peak
identified peaks or no probable orientations found in Finding probable orientations will be excluded.
We generate mpi_bgfile
and other auxiliary files by executing the commands
gcc reduce-data.c -O3 -lm -o reduce-data
./reduce-data ../config.ini > run.log
The order of the frames is rearranged to balance the work loads between the nproc
processors based on the number of probable orientations per frame, and this information is stored in the file reduced-data_id.dat
.
The file mpi_datafile
is generated using the commands
make
./wr-data ../config.ini >> run.log &
Since the crystal diffraction signals are concentrated in the Bragg spots, we can speed up the expand (E) step of the EMC reconstruction by precalculating a look-up table that records the mapping between the Bragg peaks and the detector pixels over all rotation samples.
This can be done by moving to the directory make-Ematrix
and executing the commands
mpicc mpi-make-Emat.c -O3 -lm -o emat
mpirun -np [nproc] ./emat -low ../config.ini > run.log &
Here [nproc]
is the number of processors to be used.
The mapping is stored in the files r2peak_file
and peak2r_file
, whose locations are specified in the [make-Ematrix]
section of config.ini
.
For those who would like to try an EMC reconstruction immediately, we have provided the reduced data generated from the test dataset following the data processing procedures described above.
After creating the directory skip-data-reduction
, the reduced data can be downloaded from the website
http://cxidb.org/data/82/reduced-data
The downloaded files are moved to their appropriate locations by executing the command
python distribute-files.py [work-dir] > dist.log &
to get ready for the EMC reconstruction.
Here [work-dir]
denotes the path to the working directory, EMC-for-SMX
.