diff --git a/theme4/XS100/data-collection.md b/theme4/XS100/data-collection.md index 8b11a29..a18826c 100644 --- a/theme4/XS100/data-collection.md +++ b/theme4/XS100/data-collection.md @@ -10,18 +10,42 @@ The Cornell High Energy Synchrotron Source (CHESS) is currently home to 7 experi ## Anatomy of an experiment -*insert experiment timeline* ADD: emphasize some experiments start months before the beamtime, some start at the beamtime, etc. ADD: Section on expectations for planning is beamline/technique specfic. Includes everything from software trainings, input data points, scan strategies, to bringing your device/equipment to the beamtime - record keeping and provenance of data typically starts before the beamtime. +exptimeline -What is the users responsibility to record and what is CHESS's responsibility. +Most experiments start well in advance of the awarded beamtime - and so do the computing and cyber-infrastructure needs. -stationcomputer +While CHESS provides state-of-the-art hardware, software, computing resources and trainings, **users are responsible for the integrity of their experiment** through thoughtful planning, experimental execution, and **data handling and analysis**. This includes maintaining best practices in experimental logs, metadata tracking, and recording of researcher decisions. Due to the nature of synchrotron experiments, data integrity and interpretability - even within one research group - will depend on the practices adopted by that group. + +### Beamtime Notes and Experimental Logbook: + +It is always the responsibility of the experimenter to take **detailed** beamtime notes and a log of the data. Although work is ongoing to integrate metadata and capture requisite information in the data itself, automate workflows, visualizations, etc., it is imperative that experimenter notes are taken and ideally a copy is kept with the data on the CHESS system. 
+ +**Joint Experimenter Notes** + +Often you will have a team of researchers taking data together - it is typically best practice to keep a *collaborative* log. In addition to your team, you should share these notes with your beamline scientist, who can often provide useful details you may miss if they observe important or irregular instrument behavior that may not be obvious from the metadata streams. + +Remember - your beamline scientist is not responsible for memorizing the history of your data collected. They see so many experiments that if you ask them 2 years later whether they remember what *you* did - good luck. + +For recording notes during the beamtime, we recommend using plain text or Markdown (a formatted text file) because it is easy to read in many systems, rather than a Microsoft Word document, which has a proprietary format. Images can be rendered in Markdown. + +**Co-Locate Your Beamtime Notes with Data** + +Save your beamtime notes, a copy of them, or a link to your Google Doc (or equivalent) on the CHESS filesystem with your data (your beamline scientist will tell you where the best place is). + +**Leverage MetaData Services** -*insert image of "station computer" connected to "controls racks" connected to "experimental hutch" connected to DAQ and File system and Compute Farm* +Whenever possible, we encourage users to leverage the metadata services, beamline-specific software strategies (e.g. adding metadata to image frame headers), piping unique signals through software or hardware signals - saving these metadata *with* the data at the time of data collection. For unique aspects of your experiment, it is important to identify useful metadata to have associated with the x-ray data *ahead* of the beamtime. If signals need to be monitored, this may need to be arranged two weeks in advance (see [Bring Your Own Device (BYOD)](#bring-your-own-device-byod)). 
-#### Station Computer : Beamline Control Central +**Commenting Software** + +Comment any code produced at the beamtime. *If this code was used to make decisions about the experiment, it should be saved and referred to in your experimental log.* + +### Station Computer : Beamline Control Central Every experimental station has a **station computer** that acts as controls central. The **station computer** typically runs a number of processes and is responsible for orchestrating data collection, motor motions, synchronized triggers, metadata logging, and more. +stationcomputer + **Station Computer** ground rules: - Most users directly interact with the station computer. Users associated with the beamtime will have permission to log directly into the station computer remotely through No Machine (LINK to CLASSE). Your beamline scientist will be training you in how to run *your* experiment - pay close attention and take ownership of your role during your beamtime. Your beamline scientist may only be training your group at a specific time(s) during the beamtime - make sure all users can be present during this training and/or take notes and be prepared to train your fellow users on the basic operations. When in doubt, always communicate with your staff scientist. Some processes staff scientists will insist that the user be trained by the scientist and not a fellow user. @@ -30,9 +54,9 @@ Every experimental station has a **station computer** that acts as controls cent - The station computer has many special permissions, for instance it is able to *write* to the the CHESS DAQ (**raw** directory). When saving files such as beamtime notes, it is important to save these in the directories prescribed by your beamline scientist. 
(link to later section on CHESS file system and directories) -*insert image of annotated "station computer" screen that has many many windows opened with difference processes for data collection* +annotatedstationcomputer -This is an example of a station computer screen shot with many processes. There are 4 desktops on the station computer, each with windows spanning 4 screens. This image is of the first desktop and shows a main controls terminal (SPEC), controls screens (MEDM screens), a data reduction GUI (HEXRD), realtime DIC (running from Python shell on external computer), a LabVIEW log of metadata signals from the instrument in the hutch, and many more. +This is an example of a station computer screenshot with many processes. There are 4 desktops on the station computer, each with windows spanning 4 screens. This image is of the first desktop and shows a main controls terminal (SPEC), controls screens (MEDM screens), and a data reduction GUI (HEXRD). There will typically be even more processes running than this. Every beamline will have a unique version of this computer - some techniques even may be executed exclusively through a GUI. @@ -42,39 +66,36 @@ This section will discuss the hardware connections, motor configurations, and ov controls - -*insert image of "station computer" connected to "controls racks" connected to "experimental hutch" connected to DAQ and File system and Compute Farm - this time the image is highlighting the connections to the hardware pieces* - #### Controls Software -ADD: Preamble here about controls software- Include the difference between SPEC and EPICS. You can control Epics Devices from SPEC. +There are many controls languages and strategies across the lab. The two most common are **SPEC** and **EPICS**, which are briefly introduced here. Python-based controls are also very common. **SPEC** -Introduce SPEC as the main controls for many beamlines. -Give examples of standard SPEC commands that will be used. 
Importantly, only ever use SPEC or edit Macros with the explicit permission of your staff scientist. This may vary from beamline to beamline +SPEC is a language, loosely based on C, used for instrument control and data acquisition at many synchrotrons. + +When using SPEC, you will interact with a SPEC terminal and run a combination of “standard” SPEC commands and/or a series of compiled programs for your technique. + +Importantly, only ever use SPEC or edit Macros with the explicit permission of your staff scientist. This may vary from beamline to beamline. SPEC commands continued. -*mention spec built in functionality & spec.log files* +Below is a video of a SPEC command window with built-in and custom macros. -*insert gif/video screen grab of operating SPEC* + **EPICS** -ADD: Epics will cover drivers and PVs. +EPICS is a set of software tools and applications that provide a software infrastructure for building distributed control systems to operate devices. -Introduce EPICS as the drivers for many of the detectors and instrumentation in the lab -Show example of EPICS drivers. You may or may not be expected to interact with these drivers at your beamline +Many of the devices at CHESS leverage EPICS drivers for operation. EPICS PVs (process variables) are commonly used for signal monitoring and metadata/data logging. -Many important metadata signals can also be tracked using "EPICS PVs." While many of these PVs are used throughout data collection, they can also be an important part of data monitoring. Your beamline scientist may have you observe the monitoring page depending on your experiments sensitivity to certain signals to monitor specific station signals (link to signals.chess.cornell.edu). +Many important metadata signals can also be tracked using "EPICS PVs." While many of these PVs (process variables) are used throughout data collection, they can also be an important part of data monitoring. 
Depending on your experiment's sensitivity to certain signals, your beamline scientist may have you watch the monitoring page for specific station signals (link to signals.chess.cornell.edu). -*insert image of an epics MEDM for detector example* +An EPICS MEDM (Motif Editor and Display Manager) screen for a detector is shown in the annotated station view. **PYMCA** -*insert image of pymca* A common GUI used at the beamline is the PyMCA GUI. In addition to it's original use for XRF, this GUI can be used to load in spec.log data and plot the counters at your beamline. - -*gif/video of opening a spec.log, plotting counters, and fitting a function to the plot* +A common GUI used at the beamline is the PyMCA GUI. In addition to its original use for XRF, this GUI can be used to load in spec.log data and plot the counters at your beamline. **Python, MATLAB, etc.** @@ -82,15 +103,7 @@ You may have specific other software: Python scripts (link to python tutorial), ## Networks and Filesystems -network - -*insert image of "station computer" connected to "controls racks" connected to "experimental hutch" connected to DAQ and File system and Compute Farm - this time the image is highlighting the connection to the DAQ* - -The primary networks at the experimental stations are: -- CHESS-DAQ Network -- Isolated Station Network -- CHESS Public -- CLASSE Public +network During data collection, raw data is written directly to the **CHESS-DAQ** filesystems. The CHESS-DAQ consists of approximately 2 petabytes of dedicated online storage arrays connected to the CHESS experimental stations through a high-speed 10Gb data collection network. @@ -98,30 +111,67 @@ To protect the communication signals between the station computer, experimental Detectors often have direct fiber optic / high speed data lines to inline computing resources and/or the CHESS-DAQ. -CHESS filesystem overview and where data is saved. 
+filesystem -#### Detectors and Data Handling -expstation +The CHESS filesystem has different locations for storing raw data, reduced data, etc. These locations have different backup schedules and total storage amounts. Typical best practice is as follows: +1. Raw Data that cannot be reproduced is located in RAW/DAQ +2. Reduced Data that *can* be reproduced from Raw Data/Other Protected Data is in REDUCED DATA +3. Metadata that is small and not reproducible should be saved in METADATA (backed up nightly) +4. Data that is still being produced, does not need to be backed up, and could be processed again should be written to SCRATCH. This is a good location for testing code before performing Data Reduction. +5. For Data NOT associated with a particular beamtime, USER is an appropriate place for these projects. + +### Protected Data: Intellectual Property (IP) and Export Control + +Some data needs to be protected, e.g. data covered under Intellectual Property or Export Control agreements. +All such data must be declared and all agreements signed before ANY data is on CHESS/Cornell computing systems (including preparatory material that falls under IP or Export Control categories). -If you wish to move any data from the CHESS filesystem to another location, the preferred way of doing so is through Globus. Please see here (LINK CHESS computing) for directions on ways to transfer data from the CHESS filesystem. +Data Collection, Storage, and Analysis can be customized to comply with data agreements, including: +- Modifying isolated networks +- Mounting encrypted drives +- Configuring encrypted computers +- Modifying permissions on filesystem locations +- Securing the experimental station with an entry password +- Disconnecting streaming video to the experimental station + +### Detectors and Data Handling +expstation + +If you wish to move any data from the CHESS filesystem to another location, the preferred way of doing so is through Globus. 
Please see here (https://wiki.classe.cornell.edu/Computing/GlobusDataTransfer) for directions on ways to transfer data from the CHESS filesystem. Your beamline may be producing very large quantities of data. Due to it's size, you may not be able to take your data home or transfer it home via globus. your data in raw may only stay in hot storage for a short amount of time (6 months). Your experimental station will have best practices for how to compress or reduce this data so that it is small enough to take home or live in a different part of our filesystem. +datastorage -All data is currently saved at CHESS. The data that is living in cold storage can be restored to hot storage if needed - the process for this is located here (LINK CHESS Computing). #### Bring Your Own Device (BYOD) +Users may need to bring their own devices to the beamline - either physically in the lab or remotely connected to the CHESS networks. + +Examples include: +- Controls computer for equipment they have integrated for an experiment +- Analysis computer for on-the-fly analysis + +All devices must be approved at least two weeks in advance. It may not be possible to consider integration on a shorter timescale. + Because the CHESS-DAQ filesystems are a critical resource for data collection, *write access* is only granted to registered devices on the CHESS-DAQ network. If you wish to bring your own device to write data to the CHESS-DAQ, please discuss your needs with your staff scientist at least one month before your beamtime. Before your device can be registered on the CHESS-DAQ, it must undergo a cybersecurity evaluation by CLASSE-IT. *Read access* to the CHESS-DAQ filesystem may be obtained by registering your device for the LNS Protected network using [this request form](https://wiki.classe.cornell.edu/Computing/LaptopRegistration). #### MetaData Handling -Metadata Considerations: -There will be many parallel datastreams being collected - critical to interpreting your data. 
These may be located in many different locations. Introduce EPICS IOCs, spec.logs, other files. These will be critical to your data reduction. +Ideally, all the data necessary to **fully reproduce your results** are recorded and disseminated in a manner that others can interpret after your experiment. Ideally the provenance remains unbroken from experiment planning onward. + +Metadata and parallel data streams are generated at every stage of your experiment. CHESS is continuing to develop and implement services to help with this, from programs like Galaxy to our metadata service. + +The **metadata service** (https://wiki.classe.cornell.edu/bin/viewauth/CHESS/Private/CHESSMetadataService) provides tools to record and automatically ingest machine-readable metadata in a systematic way. It includes variables that historically were not recorded via a second data stream (e.g. the material processing parameters). + + #### On-the-fly Data Processing & Visualization @@ -157,11 +207,4 @@ CHAP pipelines can be executed from a Linux command line or from the Galaxy scie **Technique/Beamline Specific Software** -#### Beamtime Notes: - -Last and potentially the most important part: It is always the responsibility of the experimenter to take DETAILED beamtime notes and a log of the data. Although work is ongoing to integrate metadata and capture requisite information in the data itself, automate workflows, visualizations, etc., it is imperative that experimenter notes are taken and ideally a copy is kept with the data on the CHESS system. - -Your beamline scientist is not responsible for memorizing the history of your data collected. They see so many experiments, if you ask 2 years later if they rememeber what *you* did - good luck. -Example Problem: -- End Walk through a version of taking and collecting a dataset that will be done in the hands-on portion in the afternoon. 
diff --git a/theme4/XS100/xs100-figures/AnnotatedStationComputer.png b/theme4/XS100/xs100-figures/AnnotatedStationComputer.png new file mode 100644 index 0000000..bc3accf Binary files /dev/null and b/theme4/XS100/xs100-figures/AnnotatedStationComputer.png differ diff --git a/theme4/XS100/xs100-figures/CHESSFileSystem.png b/theme4/XS100/xs100-figures/CHESSFileSystem.png new file mode 100644 index 0000000..fd1196a Binary files /dev/null and b/theme4/XS100/xs100-figures/CHESSFileSystem.png differ diff --git a/theme4/XS100/xs100-figures/ExperimentTimeline.png b/theme4/XS100/xs100-figures/ExperimentTimeline.png new file mode 100644 index 0000000..abc5720 Binary files /dev/null and b/theme4/XS100/xs100-figures/ExperimentTimeline.png differ diff --git a/theme4/XS100/xs100-figures/HotWarmColdStorage.png b/theme4/XS100/xs100-figures/HotWarmColdStorage.png new file mode 100644 index 0000000..c873d60 Binary files /dev/null and b/theme4/XS100/xs100-figures/HotWarmColdStorage.png differ diff --git a/theme4/XS100/xs100-figures/MetadataService.png b/theme4/XS100/xs100-figures/MetadataService.png new file mode 100644 index 0000000..d0abcad Binary files /dev/null and b/theme4/XS100/xs100-figures/MetadataService.png differ diff --git a/theme4/XS100/xs100-figures/XCITE_SPEC.mp4 b/theme4/XS100/xs100-figures/XCITE_SPEC.mp4 new file mode 100644 index 0000000..8712d03 Binary files /dev/null and b/theme4/XS100/xs100-figures/XCITE_SPEC.mp4 differ diff --git a/theme4/XS100/xs100-figures/chessNetworks.png b/theme4/XS100/xs100-figures/chessNetworks.png new file mode 100644 index 0000000..ccda6bd Binary files /dev/null and b/theme4/XS100/xs100-figures/chessNetworks.png differ