Add features to HBase tables using full schema (master, anomaly) #751

Merged
merged 10 commits into from
Nov 6, 2023

JulienPeloton
Member

@JulienPeloton JulienPeloton commented Nov 6, 2023

IMPORTANT: Please create an issue first before opening a Pull Request.
Linked to issue(s): Closes #750

What changes were proposed in this pull request?

This PR expands the schema of objects pushed to HBase tables. Only the tables that receive full-schema objects are affected: ztf and ztf.anomaly. Because the features are structs, whereas HBase only accepts string or binary values, we cast the features to strings. In practice, this means:

>>> df = cast_features(df)
>>> df
DataFrame[..., lc_features_g: string, lc_features_r: string, ...]

You can then decode them, for example with the json package:

>>> import json
>>> pdf = df.select('lc_features_g').toPandas()
>>> features = pdf['lc_features_g'].apply(lambda x: json.loads(x))
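
For illustration, here is a minimal sketch of the round trip using only the standard library. It is not the actual cast_features implementation: it assumes the cast yields JSON text (which the json.loads decoding above implies), and the feature names and the decode_features helper are hypothetical.

```python
import json

# Hypothetical feature values for one object; the real field names
# come from the Fink schema and are not reproduced here.
features_g = {"mean": 18.2, "amplitude": 0.35, "skew": None}

# Stand-in for the cast applied before pushing to HBase:
# a struct-like mapping becomes a single JSON string.
encoded = json.dumps(features_g)


def decode_features(value):
    """Decode a JSON-encoded feature cell; return None for missing cells."""
    if value is None or value == "":
        return None
    return json.loads(value)


# Reader side: recover a dict from the stored string.
decoded = decode_features(encoded)
```

Guarding against empty cells is a design choice for this sketch: rows written before this change may not carry the feature columns, and json.loads would raise on None or an empty string.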

How was this patch tested?

CI

@JulienPeloton JulienPeloton added this to the 3.2 milestone Nov 6, 2023
@JulienPeloton
Member Author

The Sentinel job keeps failing because raw2science reads no data (despite the fact that 2 data files are seen...). Intriguing:

...
23/11/06 20:47:56 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
23/11/06 20:47:56 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
...

I've launched the test suite from within the Docker container (julienpeloton/fink-ci:dev), and I got no errors...

sonarcloud bot commented Nov 6, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no information
Duplication: 0.0%

@JulienPeloton
Member Author

Merging, since these errors are unlikely to be related to the changes here:

  • Sentinel is failing, while the tests run in the Docker container work perfectly.
  • The e2e workflow has timeout errors in the checking results section, likely related to helm.

@JulienPeloton JulienPeloton merged commit 0720ac4 into master Nov 6, 2023
14 of 19 checks passed
Development

Successfully merging this pull request may close these issues.

Add LC features in the anomaly HBase table
1 participant