Feat/Data handler to save data #188

Merged
merged 28 commits on Feb 27, 2024
Changes from 8 commits
Commits
28 commits
9093c5d
add functionality to select new data folder based on idx
nulinspiratie Feb 20, 2024
cdbd19e
started adding save_data
nulinspiratie Feb 21, 2024
5e5957e
basic data handler
nulinspiratie Feb 21, 2024
36d40d4
add numpy array processor
nulinspiratie Feb 22, 2024
9b90b0e
add xarray data handler
nulinspiratie Feb 22, 2024
2c61e3a
working tests, added init
nulinspiratie Feb 22, 2024
d4e7be9
add DataHandler.path
nulinspiratie Feb 22, 2024
695ca21
docs + small changes
nulinspiratie Feb 22, 2024
6e7556a
Added initialization name
nulinspiratie Feb 23, 2024
46f3681
Proper sorting of data folders
nulinspiratie Feb 23, 2024
6cbc66e
add documentation
nulinspiratie Feb 25, 2024
4cad629
add optional xarray
nulinspiratie Feb 25, 2024
c452b7a
lower min xarray version
nulinspiratie Feb 25, 2024
087388f
add xarray as to poetry extras
nulinspiratie Feb 25, 2024
798ce63
modify workflow to allow xarray
nulinspiratie Feb 25, 2024
5af5273
remove underscore for workflow
nulinspiratie Feb 25, 2024
8d52ad9
update lock file
nulinspiratie Feb 25, 2024
84a6b4e
remove min_size numpy array
nulinspiratie Feb 25, 2024
c8ee97a
add test xarray skip if not installed
nulinspiratie Feb 25, 2024
7d620b4
Reduce performance test duration
nulinspiratie Feb 25, 2024
bc07ff1
added `additional_files`
nulinspiratie Feb 25, 2024
2bbef32
black formatting
nulinspiratie Feb 25, 2024
99c1853
added info on auto using filename as name
nulinspiratie Feb 25, 2024
c5ed718
Update changelog and readme
nulinspiratie Feb 26, 2024
9bd4419
Fix attempt: windows \ to /
nulinspiratie Feb 26, 2024
47e308a
fix: create create_data without creating
nulinspiratie Feb 26, 2024
3f22f64
fix import pathlib
nulinspiratie Feb 26, 2024
a9e14ea
Allow multiple saves
nulinspiratie Feb 26, 2024
53 changes: 53 additions & 0 deletions qualang_tools/results/README.md
@@ -156,3 +156,56 @@ for i in range(len(freqs_external)):  # Loop over the LO frequencies
    # Process and plot the results
    ...
```


## Data handler
The `DataHandler` is used to easily save data once a measurement has been performed.
It saves data into an automatically generated folder with the folder structure
`{root_data_folder}/%Y-%m-%d/#{idx}_{name}_%H%M%S`:
- `root_data_folder` is the root folder for all data, defined once at the start
- `%Y-%m-%d`: All datasets are first ordered by date
- `{idx}`: Datasets are identified by an incrementing index (starting at `#1`).
  Whenever a save is performed, the index of the last saved dataset is determined and
  incremented by one.
- `name`: Each data folder has a name.
- `%H%M%S`: The time at which the dataset was saved.
This folder structure can be changed via `DataHandler.folder_pattern`.
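
A minimal sketch of setting these defaults when constructing the handler (the import path follows the package layout added in this PR; the data path is illustrative):
```python
from qualang_tools.results import DataHandler

# Root folder for all data; the path here is illustrative
data_handler = DataHandler(root_data_folder="C:/data")

# Optionally override the folder pattern (shown here with the default pattern)
data_handler = DataHandler(
    root_data_folder="C:/data",
    folder_pattern="%Y-%m-%d/#{idx}_{name}_%H%M%S",
)
```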

Data is generally saved by calling `data_handler.save_data("msmt_name", data)`,
where `data` is a dictionary.
The serialisable entries of `data` are saved to the JSON file `data.json` in the data folder,
while non-serialisable types are saved to separate files. The following non-serialisable types
are currently supported:
- Matplotlib figures
- Numpy arrays
- Xarrays

### Usage example
```python
# Assume a measurement has been performed, and all results are collected here
data = {
    "T1": 5e-6,
    "T1_figure": plt.figure(),
    "IQ_array": np.array([[1, 2, 3], [4, 5, 6]])
}

# Initialize the DataHandler
data_handler = DataHandler(root_data_folder="C:/data")

# Save results
data_folder = data_handler.save_data("T1_measurement", data=data)
print(data_folder)
# C:/data/2024-02-24/#152_T1_measurement_095214
# This assumes the save was performed on 2024-02-24 at 09:52:14
```
After calling `data_handler.save_data()`, three files are created in `data_folder`:
- `T1_figure.png`
- `arrays.npz` containing all the numpy arrays
- `data.json`, which contains:
```
{
    "T1": 5e-06,
    "T1_figure": "./T1_figure.png",
    "IQ_array": "./arrays.npz#IQ_array"
}
```
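
A minimal sketch of loading these files back (the folder path is the illustrative one printed above; the `#IQ_array` suffix in `data.json` refers to the array's key inside `arrays.npz`):
```python
import json
from pathlib import Path

import numpy as np

data_folder = Path("C:/data/2024-02-24/#152_T1_measurement_095214")

# JSON-serialisable entries, with references to the separately saved files
results = json.loads((data_folder / "data.json").read_text())

# Numpy arrays are bundled in arrays.npz, keyed by their name in `data`
arrays = np.load(data_folder / "arrays.npz")
IQ_array = arrays["IQ_array"]
```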
4 changes: 3 additions & 1 deletion qualang_tools/results/__init__.py
@@ -2,4 +2,6 @@
from qualang_tools.results.results import progress_counter
from qualang_tools.results.results import wait_until_job_is_paused

__all__ = ["fetching_tool", "progress_counter", "wait_until_job_is_paused"]
from qualang_tools.results.data_handler import DataHandler, data_processors

__all__ = ["fetching_tool", "progress_counter", "wait_until_job_is_paused", "DataHandler", "data_processors"]
6 changes: 6 additions & 0 deletions qualang_tools/results/data_handler/__init__.py
@@ -0,0 +1,6 @@
from .data_folder_tools import *
from . import data_processors
from .data_processors import DEFAULT_DATA_PROCESSORS
from .data_handler import *

__all__ = [*data_folder_tools.__all__, "data_processors", "DEFAULT_DATA_PROCESSORS", *data_handler.__all__]
197 changes: 197 additions & 0 deletions qualang_tools/results/data_handler/data_folder_tools.py
@@ -0,0 +1,197 @@
from pathlib import Path
from typing import Dict, Union, Optional
import re
from datetime import datetime


__all__ = ["DEFAULT_FOLDER_PATTERN", "extract_data_folder_properties", "get_latest_data_folder", "create_data_folder"]


DEFAULT_FOLDER_PATTERN = "%Y-%m-%d/#{idx}_{name}_%H%M%S"


def _validate_datetime(datetime_str: str, datetime_format: str) -> bool:
    """Validate a datetime string with a given format.

    :param datetime_str: The datetime string to validate.
    :param datetime_format: The format of the datetime string.
    :return: True if the datetime string is valid, False otherwise.
    """
    try:
        datetime.strptime(datetime_str, datetime_format)
    except ValueError:
        return False
    return True


def extract_data_folder_properties(
    data_folder: Path, pattern: str = DEFAULT_FOLDER_PATTERN, root_data_folder: Path = None
) -> Optional[Dict[str, Union[str, int]]]:
    """Extract properties from a data folder.

    :param data_folder: The data folder to extract properties from.
    :param pattern: The pattern to extract the properties from, e.g. "#{idx}_{name}_%H%M%S".
    :param root_data_folder: The root data folder to extract the relative path from.
        If not provided, "relative_path" is not included in the properties.

    :return: A dictionary with the extracted properties.
        Dictionary keys:
        - idx: The index of the data folder.
        - name: The name of the data folder.
        - datetime attributes "year", "month", "day", "hour", "minute", "second".
        - path: The absolute path of the data folder.
        - relative_path: The relative path of the data folder w.r.t the root_data_folder.
    """
    pattern = pattern.replace("{idx}", r"(?P<idx>\d+)")
    pattern = pattern.replace("{name}", r"(?P<name>\w+)")
    pattern = pattern.replace("%Y", r"(?P<year>\d{4})")
    pattern = pattern.replace("%m", r"(?P<month>\d{2})")
    pattern = pattern.replace("%d", r"(?P<day>\d{2})")
    pattern = pattern.replace("%H", r"(?P<hour>\d{2})")
    pattern = pattern.replace("%M", r"(?P<minute>\d{2})")
    pattern = pattern.replace("%S", r"(?P<second>\d{2})")

    if root_data_folder is not None:
        folder_path_str = str(data_folder.relative_to(root_data_folder))
    else:
        folder_path_str = data_folder.name

    regex_match = re.match(pattern, folder_path_str)
    if regex_match is None:
        return None
    properties = regex_match.groupdict()
    properties = {key: int(value) if value.isdigit() else value for key, value in properties.items()}
    properties["path"] = str(data_folder)
    if root_data_folder is not None:
        properties["relative_path"] = str(data_folder.relative_to(root_data_folder))
    return properties


def get_latest_data_folder(
    root_data_folder: Path,
    folder_pattern: str = DEFAULT_FOLDER_PATTERN,
    relative_path: Path = Path("."),
    current_folder_pattern: str = None,
) -> Optional[Dict[str, Union[str, int]]]:
    """Get the latest data folder in a given root data folder.

    Typically this is the folder within a date folder with the highest index.

    :param root_data_folder: The root data folder to search for the latest data folder.
    :param folder_pattern: The pattern of the data folder, e.g. "%Y-%m-%d/#{idx}_{name}_%H%M%S".
    :param relative_path: The relative path to the data folder. Used for recursive calls.
    :param current_folder_pattern: The current folder pattern. Used for recursive calls.
    :return: A dictionary with the properties of the latest data folder.
        Dictionary keys:
        - idx: The index of the data folder.
        - name: The name of the data folder.
        - datetime attributes "year", "month", "day", "hour", "minute", "second".
        - path: The absolute path of the data folder.
        - relative_path: The relative path of the data folder w.r.t the root_data_folder.
    """
    if isinstance(root_data_folder, str):
        root_data_folder = Path(root_data_folder)

    if not root_data_folder.exists():
        raise NotADirectoryError(f"Root data folder {root_data_folder} does not exist.")

    if current_folder_pattern is None:
        current_folder_pattern = folder_pattern

    current_folder_pattern, *remaining_folder_pattern = current_folder_pattern.split("/", maxsplit=1)

    folder_path = root_data_folder / relative_path

    if not remaining_folder_pattern:
        if "{idx}" not in current_folder_pattern:
            raise ValueError("The folder pattern must contain '{idx}' at the end.")
        # Get the latest idx
        folders = [f for f in folder_path.iterdir() if f.is_dir()]
        folders = [
            f for f in folders if extract_data_folder_properties(f, folder_pattern, root_data_folder=root_data_folder)
        ]

        if not folders:
            return None

        latest_folder = max(folders, key=lambda f: f.name)
        return extract_data_folder_properties(
            data_folder=latest_folder, pattern=folder_pattern, root_data_folder=root_data_folder
        )
    elif "{idx}" in current_folder_pattern:
        raise ValueError("The folder pattern must only contain '{idx}' in the last part.")
    else:
        # Filter out elements that aren't folders
        folders = filter(lambda f: f.is_dir(), folder_path.iterdir())
        # Filter folders that match the datetime of the current folder pattern
        folders = list(filter(lambda f: _validate_datetime(f.name, current_folder_pattern), folders))

        if not folders:
            return None

        # Sort folders by name (either datetime or index)
        sorted_folders = sorted(folders, key=lambda f: f.name, reverse=True)

        # Iterate over the folders, recursively calling get_latest_data_folder
        for folder in sorted_folders:
            sub_folder_idx = get_latest_data_folder(
                root_data_folder,
                folder_pattern=folder_pattern,
                current_folder_pattern=remaining_folder_pattern[0],
                relative_path=relative_path / folder.name,
            )
            if sub_folder_idx is not None:
                return sub_folder_idx
        return None


def create_data_folder(
    root_data_folder: Path,
    name: str,
    idx: Optional[int] = None,
    folder_pattern: str = DEFAULT_FOLDER_PATTERN,
    use_datetime: Optional[datetime] = None,
    create: bool = True,
) -> Dict[str, Union[str, int]]:
    """Create a new data folder in a given root data folder.

    First checks the index of the latest data folder and increments by one.

    :param root_data_folder: The root data folder to create the new data folder in.
    :param name: The name of the new data folder.
    :param idx: The index of the new data folder. If not provided, the index is determined automatically.
    :param folder_pattern: The pattern of the data folder, e.g. "%Y-%m-%d/#{idx}_{name}_%H%M%S".
    :param use_datetime: The datetime to use for the folder name.
    :param create: Whether to create the folder or not.
    """
    if isinstance(root_data_folder, str):
        root_data_folder = Path(root_data_folder)

    if not root_data_folder.exists():
        raise NotADirectoryError(f"Root data folder {root_data_folder} does not exist.")

    if use_datetime is None:
        use_datetime = datetime.now()

    if idx is None:
        # Determine the latest folder index and increment by one
        latest_folder_properties = get_latest_data_folder(root_data_folder, folder_pattern=folder_pattern)

        if latest_folder_properties is None:
            # Create new folder with index 1
            idx = 1
        else:
            idx = latest_folder_properties["idx"] + 1

    relative_folder_name = folder_pattern.format(idx=idx, name=name)
    relative_folder_name = use_datetime.strftime(relative_folder_name)

    data_folder = root_data_folder / relative_folder_name

    if data_folder.exists():
        raise FileExistsError(f"Data folder {data_folder} already exists.")

    if create:
        data_folder.mkdir(parents=True)

    return extract_data_folder_properties(data_folder, folder_pattern, root_data_folder)
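
For illustration, a short sketch of how these helpers combine; the root folder path is hypothetical and must already exist, and the returned dictionaries contain the keys documented in the docstrings above:
```python
from pathlib import Path

from qualang_tools.results.data_handler.data_folder_tools import (
    create_data_folder,
    get_latest_data_folder,
)

root = Path("C:/data")  # hypothetical root data folder, created beforehand

# Create e.g. "2024-02-26/#1_T1_measurement_103045" and return its properties
properties = create_data_folder(root, name="T1_measurement")
print(properties["idx"], properties["path"])

# Find the most recent data folder under the root (None if there is none yet)
latest = get_latest_data_folder(root)
if latest is not None:
    print(latest["relative_path"])
```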
112 changes: 112 additions & 0 deletions qualang_tools/results/data_handler/data_handler.py
@@ -0,0 +1,112 @@
from datetime import datetime
from pathlib import Path
import json
from typing import Any, Dict, Optional, Sequence, Union

from .data_processors import DEFAULT_DATA_PROCESSORS, DataProcessor
from .data_folder_tools import DEFAULT_FOLDER_PATTERN, create_data_folder


__all__ = ["save_data", "DataHandler"]


def save_data(
    data_folder: Path,
    data: Dict[str, Any],
    metadata: Optional[Dict[str, Any]] = None,
    data_filename: str = "data.json",
    metadata_filename: str = "metadata.json",
    data_processors: Sequence[DataProcessor] = (),
) -> Path:
    """Save data to a folder

    :param data_folder: The folder where the data will be saved
    :param data: The data to be saved
    :param metadata: Metadata to be saved
    :param data_filename: The filename of the data
    :param metadata_filename: The filename of the metadata
    :param data_processors: A list of data processors to be applied to the data
    """
    if isinstance(data_folder, str):
        data_folder = Path(data_folder)

    if not data_folder.exists():
        raise NotADirectoryError(f"Save_data: data_folder {data_folder} does not exist")

    if not isinstance(data, dict):
        raise TypeError("save_data: 'data' must be a dictionary")

    processed_data = data.copy()
    for data_processor in data_processors:
        processed_data = data_processor.process(processed_data)

    json_data = json.dumps(processed_data, indent=4)
    (data_folder / data_filename).write_text(json_data)

    if metadata is not None:
        if not isinstance(metadata, dict):
            raise TypeError("save_data: 'metadata' must be a dictionary")

        with (data_folder / metadata_filename).open("w") as f:
            json.dump(metadata, f)

    for data_processor in data_processors:
        data_processor.post_process(data_folder=data_folder)

    return data_folder


class DataHandler:
    default_data_processors = DEFAULT_DATA_PROCESSORS
    root_data_folder: Path = None
    folder_pattern: str = DEFAULT_FOLDER_PATTERN
    data_filename: str = "data.json"
    metadata_filename: str = "metadata.json"

    def __init__(
        self,
        data_processors: Optional[Sequence[DataProcessor]] = None,
        root_data_folder: Optional[Union[str, Path]] = None,
        folder_pattern: Optional[str] = None,
        path: Optional[Path] = None,
    ):
        if data_processors is not None:
            self.data_processors = data_processors
        else:
            self.data_processors = [processor() for processor in self.default_data_processors]

        if root_data_folder is not None:
            self.root_data_folder = root_data_folder
        if folder_pattern is not None:
            self.folder_pattern = folder_pattern

        self.path = path
        self.path_properties = None

    def create_data_folder(
        self, name: str, idx: Optional[int] = None, use_datetime: Optional[datetime] = None, create: bool = True
    ) -> Dict[str, Union[str, int]]:
        """Create a new data folder in the root data folder"""
        self.path_properties = create_data_folder(
            root_data_folder=self.root_data_folder,
            folder_pattern=self.folder_pattern,
            use_datetime=use_datetime,
            name=name,
            idx=idx,
            create=create,
        )
        self.path = self.path_properties["path"]
        return self.path_properties

    def save_data(self, name, data, metadata=None, idx=None, use_datetime: Optional[datetime] = None):
        if self.path is None:
            self.create_data_folder(name, idx=idx, use_datetime=use_datetime)

        return save_data(
            data_folder=self.path,
            data=data,
            metadata=metadata,
            data_filename=self.data_filename,
            metadata_filename=self.metadata_filename,
            data_processors=self.data_processors,
        )
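
A brief sketch of calling the module-level `save_data` directly rather than through `DataHandler` (the folder path and data are illustrative; the default processors are instantiated the same way `DataHandler.__init__` does above):
```python
from pathlib import Path

import numpy as np

from qualang_tools.results.data_handler.data_handler import save_data
from qualang_tools.results.data_handler.data_processors import DEFAULT_DATA_PROCESSORS

data_folder = Path("C:/data/2024-02-26/#7_rabi_101530")  # must already exist

save_data(
    data_folder=data_folder,
    data={"amplitude": 0.1, "IQ_array": np.array([[1, 2], [3, 4]])},
    metadata={"qubit": "q0"},  # saved to metadata.json
    data_processors=[processor() for processor in DEFAULT_DATA_PROCESSORS],
)
```
In typical use, `DataHandler.save_data` is preferred, since it creates the data folder automatically before delegating to this function.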