DM-43722 EFD Transform Implementation #18

Open
wants to merge 94 commits into
base: main
94 commits
57e1b52
Implement first version of EFD transformations
glaubervila May 27, 2024
83eadd3
Update misssing values in the config file
rcboufleur May 27, 2024
07ea890
Fix exception message for unimplemented dialect
rcboufleur May 27, 2024
b7c30cc
Refactor Aggregate class to handle column-wise operations
rcboufleur May 27, 2024
5e587ab
Refactor Aggregate class to Summary for better clarity and consistency
rcboufleur May 27, 2024
7134953
Refactor ATAOS_correctionOffsets_w function to use mean
rcboufleur May 27, 2024
5cc2231
the ExposureEfd table schema was changed to be compatible with config…
glaubervila May 27, 2024
8732968
Added VisitEFD table and fixed empty values
glaubervila May 27, 2024
ead6f10
Upsert transactions now commit every 100 rows. Docstrings were added …
glaubervila Jun 3, 2024
3816d47
Refactor configuration file loading and validation
glaubervila Jun 3, 2024
e1a94ee
Changed lsst_efd_client -> InfluxDB API for topic queries
glaubervila Jun 24, 2024
d65a16f
fixed lint
glaubervila Jun 24, 2024
f512a97
Lint fixes
rcboufleur Jun 24, 2024
7225480
Additional parameters inserted in config yml
rcboufleur Jun 24, 2024
11eba2d
Lint fixes
rcboufleur Jun 24, 2024
6fd11b4
Added Dockerfile for Efd Transform
glaubervila Jun 27, 2024
70be00e
The main command was added to the dockerfile
glaubervila Jun 27, 2024
f663262
Implement retrieval of packed time series from InfluxDB using API que…
rcboufleur Aug 12, 2024
4366644
Lint fixes
rcboufleur Aug 12, 2024
92462c8
Fixed lint
glaubervila Aug 20, 2024
45559a5
Fixed lint
glaubervila Aug 20, 2024
21dc88e
Implements new config file formats and validation
rcboufleur Aug 21, 2024
fb66a51
Implements unique topic querying to avoid duplicate queries
rcboufleur Aug 21, 2024
7c73a43
Changed base image to w_2024_33
glaubervila Aug 21, 2024
880e782
Connection to usdf_efd api now uses environment variables
glaubervila Aug 21, 2024
4c7c5ad
Added copy of sqlite test database
glaubervila Aug 21, 2024
0643536
Config changes refactored into transformations
rcboufleur Aug 26, 2024
613cdce
Config changes refactored into transformations
rcboufleur Aug 26, 2024
9d65166
Removed template envs
glaubervila Aug 26, 2024
c437b71
minor changes in dockerfile
glaubervila Aug 26, 2024
1ce8920
All files related to trasnform_efd have been moved to the same folder…
glaubervila Aug 26, 2024
95e9d04
The Dockerfile has been changed to the new file structure.
glaubervila Aug 26, 2024
bb93200
Processing warnings due to computation errors were accounted for
rcboufleur Aug 26, 2024
4abc0ba
Lint error fix
rcboufleur Aug 26, 2024
ef65871
ESS accelerometer fields were added
rcboufleur Aug 26, 2024
058855c
Segmentation of queries with large number of fields is implemented
rcboufleur Aug 26, 2024
e86bc75
Typo fix in the configuration file
rcboufleur Aug 26, 2024
4db0f53
Start date and end date parameters are now optional with default valu…
glaubervila Sep 30, 2024
7ef80f5
Updated summary functions
rcboufleur Oct 14, 2024
4aead0b
Fix in time format definitions
rcboufleur Oct 14, 2024
7f43ea4
Changed config file path
glaubervila Oct 14, 2024
635a8b5
Test postgresql consdb connection
glaubervila Oct 15, 2024
e30acba
Fixed missing schema name in transform efd insert queries
glaubervila Oct 16, 2024
294d816
Added more logs and test access to consdb tables
glaubervila Oct 16, 2024
f1049fc
Fix sqlalchemy get table from metadata
glaubervila Oct 16, 2024
90005f5
Fix comparisons of time aware indexes
rcboufleur Oct 16, 2024
0bd52d1
Fixed sqlalchemy table from metadata with sqlite
glaubervila Oct 16, 2024
284b0df
Column and topic mapping refactored
rcboufleur Oct 17, 2024
6745196
Fixed lint with pre-commit
glaubervila Oct 18, 2024
33fc5b7
Fixed lint
glaubervila Oct 18, 2024
6c2b725
Conficts resolved and merged.
rcboufleur Oct 24, 2024
2615fdb
Schema generator based on config files created
rcboufleur Oct 24, 2024
8ea745d
Temporary files update
rcboufleur Oct 24, 2024
c05ff0f
Column added to generate_schema.py
rcboufleur Oct 24, 2024
f46ce25
Column added to generate_schema
rcboufleur Oct 24, 2024
fd30ae8
A queue manager has been added to control execution by periods.
glaubervila Oct 24, 2024
9acbdab
minor fixes
glaubervila Oct 24, 2024
a4846da
Last minute value transformation included
rcboufleur Oct 24, 2024
7f80094
New config files generated
rcboufleur Oct 24, 2024
e4ce479
Updated instruments in testing files
rcboufleur Oct 25, 2024
8928f5c
Updated schema structures
rcboufleur Oct 25, 2024
c445d6c
Create new task after run all tasks
glaubervila Oct 25, 2024
ea09f56
Minor fix in create new tasks
glaubervila Oct 25, 2024
b66af91
Minor changes in run sh
glaubervila Nov 6, 2024
7a7f99d
Added permision
glaubervila Nov 6, 2024
df31088
fixed dockerfile
glaubervila Nov 6, 2024
9c38464
diferent schemas by instruments
glaubervila Nov 7, 2024
e114cce
Update to properly handle schemas by instrument
rcboufleur Nov 7, 2024
f4b416d
yaml extensions fixed
rcboufleur Nov 7, 2024
626f148
Fixed hardcoded schema for Queue manager
glaubervila Nov 7, 2024
927b852
Added timewindow to queue manager table
glaubervila Nov 8, 2024
1d7556a
Timewindow column added to the yaml files
rcboufleur Nov 8, 2024
160d265
Schema generator yaml files updated
rcboufleur Nov 8, 2024
38a8589
Test files updated
rcboufleur Nov 8, 2024
0e53e5c
Update Dockerfile.efdtransform
rcboufleur Nov 22, 2024
4fd17e3
Update Dockerfile.efdtransform
rcboufleur Nov 22, 2024
b43bc79
Merge remote-tracking branch 'origin/main' into tickets/DM-43722
rcboufleur Dec 6, 2024
92a2a4b
Updates configuration files
rcboufleur Dec 8, 2024
0cd3d38
Updates schemas yaml files
rcboufleur Dec 8, 2024
8ff1413
Implements unpivoted array support for array like results
rcboufleur Dec 8, 2024
6bc51b8
Updates test files
rcboufleur Dec 8, 2024
8ecd421
Implements new unpivoted tables
rcboufleur Dec 8, 2024
7cc9885
Updates column name in unpivoted tables
rcboufleur Dec 8, 2024
c889ec8
Rerun pre-commit
rcboufleur Dec 8, 2024
9de2cc6
Fixes error on github actions
rcboufleur Dec 8, 2024
28a900c
Update summary.py
rcboufleur Dec 8, 2024
59d2e2a
Fix column name bug unpivoted data
rcboufleur Dec 8, 2024
36fae9c
Realocates testing files
rcboufleur Dec 8, 2024
38c12fd
Fixes pytest for config_model
rcboufleur Dec 8, 2024
8acfdd0
Fixes bugs in processing unpivoted results
rcboufleur Dec 8, 2024
a5c1e66
Updates temporary files
rcboufleur Dec 9, 2024
84f9295
Fix trailing spaces
rcboufleur Dec 9, 2024
cbf7755
Fixes configuration files and schemas
rcboufleur Dec 9, 2024
c15a890
Update Dockerfile.efdtransform
rcboufleur Dec 9, 2024
7 changes: 7 additions & 0 deletions .github/workflows/build.yaml
@@ -31,3 +31,10 @@ jobs:
          image: ${{ github.repository }}-pq
          github_token: ${{ secrets.GITHUB_TOKEN }}
          dockerfile: Dockerfile.pqserver

      - name: Build efdtransform
        uses: lsst-sqre/build-and-push-to-ghcr@v1
        with:
          image: ${{ github.repository }}-efdtransform
          github_token: ${{ secrets.GITHUB_TOKEN }}
          dockerfile: Dockerfile.efdtransform
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -15,7 +15,7 @@ repos:
         # supported by your project here, or alternatively use
         # pre-commit's default_language_version, see
         # https://pre-commit.com/#top_level-default_language_version
-        language_version: python3.12
+        language_version: python3.11
   - repo: https://github.com/pycqa/isort
     rev: 5.13.2
     hooks:
15 changes: 15 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,15 @@
{
    "editor.formatOnSave": true,
    "[python]": {
        "editor.tabSize": 4,
        "editor.rulers": [
            79,
            110
        ]
    },
    "python.analysis.extraPaths": [
        "./python",
        "./python",
        "./python/lsst"
    ]
}
51 changes: 51 additions & 0 deletions Dockerfile.efdtransform
@@ -0,0 +1,51 @@
ARG OBS_LSST_VERSION=w_2024_47
FROM lsstsqre/centos:7-stack-lsst_distrib-${OBS_LSST_VERSION}
USER lsst
RUN source loadLSST.bash && mamba install aiokafka httpx
RUN source loadLSST.bash && pip install \
    kafkit==0.2.1 \
    lsst_efd_client==0.12.0

# Python code
COPY --chown=lsst:lsst python/lsst/consdb/efd_transform ./consdb/efd_transform

RUN mkdir data

# TODO: SQLITE TEST DATABASE
COPY --chown=lsst:lsst tmp/efd_transform/LATISS.db ./data/
COPY --chown=lsst:lsst tmp/efd_transform/LSSTComCam.db ./data/
COPY --chown=lsst:lsst tmp/efd_transform/LSSTComCamSim.db ./data/

# Environment variables that must be set:
# ------------------------------
ENV CONFIG_FILE="/opt/lsst/software/stack/consdb/efd_transform/config_LATISS.yml"
ENV INSTRUMENT="LATISS"

# Butler Access Variables
ENV BUTLER_REPO="s3://rubin-summit-users/butler.yaml"
ENV S3_ENDPOINT_URL="https://s3dfrgw.slac.stanford.edu/"
ENV LSST_RESOURCES_S3_PROFILE_embargo="https://sdfembs3.sdf.slac.stanford.edu"

# ENV AWS_ACCESS_KEY_ID="placeholder"
# ENV AWS_SECRET_ACCESS_KEY="placeholder"
ENV PGUSER="rubin"
# ENV PGPASSWORD="placeholder"

# USDF EFD API access Variables
ENV EFD="usdf_efd"
ENV EFD_USERNAME="efdreader"
# ENV EFD_PASSWORD="placeholder"

# Consdb Transform DATABASE Variables
ENV CONSDB_URL="sqlite:////opt/lsst/software/stack/data/test.db"

# Processing time interval in minutes
ENV TIMEDELTA="5"

ENV LOG_FILE="/opt/lsst/software/stack/data/transform.log"

CMD ["bash", "-c", "source loadLSST.bash; setup lsst_distrib; python ./consdb/efd_transform/transform_efd.py -c \"$CONFIG_FILE\" -i \"$INSTRUMENT\" -r \"$BUTLER_REPO\" -d \"$CONSDB_URL\" -E \"$EFD\" -t \"$TIMEDELTA\" -l \"$LOG_FILE\""]


# Example of a command used to run transform_efd with Docker
# docker run --rm -it --volume $PWD/data:/opt/lsst/software/stack/data -e CONFIG_FILE=config_LATISS.yaml -e ... consdb/efd_transform:latest
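The CMD above passes seven short flags to transform_efd.py (-c, -i, -r, -d, -E, -t, -l). A minimal sketch of how an entry point could read them with argparse follows. The `dest` names, the `required` settings, and the defaults are assumptions made for illustration; the script's actual argument handling is not shown in this diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the short flags used in the Dockerfile CMD."""
    parser = argparse.ArgumentParser(description="EFD transform entry point (sketch)")
    # Only the short flags come from the CMD; dest names are illustrative.
    parser.add_argument("-c", dest="config_file", required=True, help="YAML config path")
    parser.add_argument("-i", dest="instrument", required=True, help="e.g. LATISS")
    parser.add_argument("-r", dest="butler_repo", required=True, help="Butler repo URI")
    parser.add_argument("-d", dest="consdb_url", required=True, help="database URL")
    parser.add_argument("-E", dest="efd", required=True, help="EFD instance name")
    # The Dockerfile sets TIMEDELTA="5" (minutes); 5 is assumed as the default here.
    parser.add_argument("-t", dest="timedelta", type=int, default=5, help="minutes")
    parser.add_argument("-l", dest="log_file", default=None, help="log file path")
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse an explicit argument list so the function is easy to test."""
    return build_parser().parse_args(argv)
```

Calling `parse_cli` with the values the Dockerfile sets through ENV gives a namespace in which `timedelta` is already an integer number of minutes.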
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -44,4 +44,4 @@ asyncio_mode = "auto"
 test = ["pytest"]
 dev = [
     "documenteer[guide] < 2",
-]
+]
Empty file.
233 changes: 233 additions & 0 deletions python/lsst/consdb/efd_transform/cdb_transformed_efd_LATISS.yaml
@@ -0,0 +1,233 @@
---
name: cdb_latiss
"@id": "#cdb_latiss"
description: Transformed EFD Consolidated Database for LATISS
tables:
- name: exposure_efd
  "@id": "#exposure_efd"
  description: Transformed EFD topics by exposure.
  primaryKey:
  - "#exposure_efd.exposure_id"
  - "#exposure_efd.instrument"
  constraints:
  - name: un_exposure_id_instrument
    "@id": "#exposure_efd.un_exposure_id_instrument"
    "@type": Unique
    description: Ensure the combination of exposure_id and instrument is unique.
    columns:
    - "#exposure_efd.exposure_id"
    - "#exposure_efd.instrument"
  columns:
  - name: exposure_id
    "@id": "#exposure_efd.exposure_id"
    datatype: long
    description: Exposure unique ID.
  - name: created_at
    "@id": "#exposure_efd.created_at"
    datatype: timestamp
    value: 'CURRENT_TIMESTAMP'
    description: Timestamp when the record was created, default is the current timestamp
  - name: instrument
    "@id": "#exposure_efd.instrument"
    datatype: char
    length: 20
    description: Instrument name.

- name: exposure_efd_unpivoted
  "@id": "#exposure_efd_unpivoted"
  description: Unpivoted EFD exposure data.
  primaryKey:
  - "#exposure_efd_unpivoted.exposure_id"
  - "#exposure_efd_unpivoted.property"
  - "#exposure_efd_unpivoted.field"
  constraints:
  - name: un_exposure_property_field
    "@id": "#exposure_efd_unpivoted.un_exposure_property_field"
    "@type": Unique
    description: Ensure the combination of exposure_id, property, and field is unique.
    columns:
    - "#exposure_efd_unpivoted.exposure_id"
    - "#exposure_efd_unpivoted.property"
    - "#exposure_efd_unpivoted.field"
  columns:
  - name: exposure_id
    "@id": "#exposure_efd_unpivoted.exposure_id"
    datatype: long
    nullable: False
    description: Unique identifier for the exposure
  - name: property
    "@id": "#exposure_efd_unpivoted.property"
    datatype: string
    length: 64
    nullable: False
    value: default_property
    description: Property name for the unpivoted data
  - name: field
    "@id": "#exposure_efd_unpivoted.field"
    datatype: string
    length: 32
    nullable: False
    value: default_field
    description: Field name for the unpivoted data
  - name: value
    "@id": "#exposure_efd_unpivoted.value"
    datatype: float
    nullable: True
    description: Value corresponding to the parameter
  - name: created_at
    "@id": "#exposure_efd_unpivoted.created_at"
    datatype: timestamp
    value: 'CURRENT_TIMESTAMP'
    description: Timestamp when the record was created, default is the current timestamp

- name: visit1_efd
  "@id": "#visit1_efd"
  description: Transformed EFD topics by visit.
  primaryKey:
  - "#visit1_efd.visit_id"
  - "#visit1_efd.instrument"
  constraints:
  - name: un_visit_id_instrument
    "@id": "#visit1_efd.un_visit_id_instrument"
    "@type": Unique
    description: Ensure the combination of visit_id and instrument is unique.
    columns:
    - "#visit1_efd.visit_id"
    - "#visit1_efd.instrument"
  columns:
  - name: visit_id
    "@id": "#visit1_efd.visit_id"
    datatype: long
    description: Visit unique ID.
  - name: created_at
    "@id": "#visit1_efd.created_at"
    datatype: timestamp
    value: 'CURRENT_TIMESTAMP'
    description: Timestamp when the record was created, default is the current timestamp
  - name: instrument
    "@id": "#visit1_efd.instrument"
    datatype: char
    length: 20
    description: Instrument name.

- name: visit1_efd_unpivoted
  "@id": "#visit1_efd_unpivoted"
  description: Unpivoted EFD visit data.
  primaryKey:
  - "#visit1_efd_unpivoted.visit_id"
  - "#visit1_efd_unpivoted.property"
  - "#visit1_efd_unpivoted.field"
  constraints:
  - name: un_visit_property_field
    "@id": "#visit1_efd_unpivoted.un_visit_property_field"
    "@type": Unique
    description: Ensure the combination of visit_id, property, and field is unique.
    columns:
    - "#visit1_efd_unpivoted.visit_id"
    - "#visit1_efd_unpivoted.property"
    - "#visit1_efd_unpivoted.field"
  columns:
  - name: visit_id
    "@id": "#visit1_efd_unpivoted.visit_id"
    datatype: long
    nullable: False
    description: Unique identifier for the visit
  - name: property
    "@id": "#visit1_efd_unpivoted.property"
    datatype: string
    length: 64
    nullable: False
    value: default_property
    description: Property name for the unpivoted data
  - name: field
    "@id": "#visit1_efd_unpivoted.field"
    datatype: string
    length: 32
    nullable: False
    value: default_field
    description: Field name for the unpivoted data
  - name: value
    "@id": "#visit1_efd_unpivoted.value"
    datatype: float
    nullable: True
    description: Value corresponding to the parameter
  - name: created_at
    "@id": "#visit1_efd_unpivoted.created_at"
    datatype: timestamp
    value: 'CURRENT_TIMESTAMP'
    description: Timestamp when the record was created, default is the current timestamp

- name: transformed_efd_scheduler
  "@id": "#transformed_efd_scheduler"
  description: Transformed EFD scheduler.
  primaryKey:
  - "#transformed_efd_scheduler.id"
  constraints:
  - name: un_id
    "@id": "#transformed_efd_scheduler.un_id"
    "@type": Unique
    description: Ensure id is unique.
    columns:
    - "#transformed_efd_scheduler.id"
  columns:
  - name: id
    "@id": "#transformed_efd_scheduler.id"
    datatype: int
    nullable: False
    autoincrement: True
    description: Unique ID, auto-incremented
  - name: start_time
    "@id": "#transformed_efd_scheduler.start_time"
    datatype: timestamp
    description: Start time of the transformation interval, must be provided
  - name: end_time
    "@id": "#transformed_efd_scheduler.end_time"
    datatype: timestamp
    description: End time of the transformation interval, must be provided
  - name: timewindow
    "@id": "#transformed_efd_scheduler.timewindow"
    datatype: int
    description: Time window, in minutes, by which the start and end times are expanded
  - name: status
    "@id": "#transformed_efd_scheduler.status"
    datatype: char
    length: 20
    value: "pending"
    description: Status of the process, default is 'pending'
  - name: process_start_time
    "@id": "#transformed_efd_scheduler.process_start_time"
    datatype: timestamp
    description: Timestamp when the process started
  - name: process_end_time
    "@id": "#transformed_efd_scheduler.process_end_time"
    datatype: timestamp
    description: Timestamp when the process ended
  - name: process_exec_time
    "@id": "#transformed_efd_scheduler.process_exec_time"
    datatype: int
    value: 0
    description: Execution time of the process in seconds, default is 0
  - name: exposures
    "@id": "#transformed_efd_scheduler.exposures"
    datatype: int
    value: 0
    description: Number of exposures processed, default is 0
  - name: visits1
    "@id": "#transformed_efd_scheduler.visits1"
    datatype: int
    value: 0
    description: Number of visits recorded, default is 0
  - name: retries
    "@id": "#transformed_efd_scheduler.retries"
    datatype: int
    value: 0
    description: Number of retries attempted, default is 0
  - name: error
    "@id": "#transformed_efd_scheduler.error"
    datatype: text
    description: "Error message, if any"
  - name: created_at
    "@id": "#transformed_efd_scheduler.created_at"
    datatype: timestamp
    value: 'CURRENT_TIMESTAMP'
    description: Timestamp when the record was created, default is the current timestamp
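The unpivoted tables in this schema store one row per (exposure_id, property, field), so an array-like result becomes several rows with the same property. A minimal sketch of that layout using Python's built-in sqlite3 module follows. The SQL type mapping, the `unpivot` and `upsert_rows` helpers, and the property name are assumptions for illustration; they are not the PR's implementation. The `ON CONFLICT` clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Assumed SQLite rendering of the exposure_efd_unpivoted table defined above.
DDL = """
CREATE TABLE exposure_efd_unpivoted (
    exposure_id INTEGER NOT NULL,
    property    VARCHAR(64) NOT NULL DEFAULT 'default_property',
    field       VARCHAR(32) NOT NULL DEFAULT 'default_field',
    value       REAL,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (exposure_id, property, field)
)
"""

# Insert a row, or overwrite its value when the composite key already exists.
UPSERT = """
INSERT INTO exposure_efd_unpivoted (exposure_id, property, field, value)
VALUES (?, ?, ?, ?)
ON CONFLICT (exposure_id, property, field) DO UPDATE SET value = excluded.value
"""


def unpivot(exposure_id, prop, values):
    """Turn one array-like result (field name -> value) into one row per field."""
    return [(exposure_id, prop, field, value) for field, value in values.items()]


def upsert_rows(conn, rows):
    """Write all rows in one transaction; the context manager commits or rolls back."""
    with conn:
        conn.executemany(UPSERT, rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    # "hypothetical_property_mean" is a made-up property name.
    upsert_rows(conn, unpivot(1, "hypothetical_property_mean", {"x": 0.1, "y": 0.2, "z": None}))
    # Re-running for the same exposure updates the existing row instead of failing.
    upsert_rows(conn, unpivot(1, "hypothetical_property_mean", {"x": 0.5}))
    query = "SELECT field, value FROM exposure_efd_unpivoted ORDER BY field"
    return conn.execute(query).fetchall()
```

Keying on the composite primary key means reprocessing a time interval overwrites earlier values rather than duplicating rows, and a nullable `value` lets a failed computation be stored as NULL.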