Switch to GSON #12

Merged
merged 4 commits into main on Dec 6, 2024
Conversation

JulienPeloton
Member

This PR introduces some performance and stability improvements:

  • gson.Gson is more stable than json.JSONObject, especially for handling special characters (a minimal sketch is shown below)
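A minimal sketch of the idea, assuming a running py4j JavaGateway with Gson available on the JVM classpath; the wiring below is illustrative and not the actual fink-object-api client code.

```python
# Sketch only: assumes a py4j gateway server is already running with
# com.google.gson.Gson on its classpath.
from py4j.java_gateway import JavaGateway

gateway = JavaGateway()

# Build a small payload on the JVM side containing characters that are
# awkward to escape by hand.
java_map = gateway.jvm.java.util.HashMap()
java_map.put("comment", 'contains "quotes" and \\ backslashes')

# Gson serialises it without manual escaping; org.json.JSONObject is stricter
# and easier to trip up on such inputs.
gson = gateway.jvm.com.google.gson.Gson()
payload = gson.toJson(java_map)
print(payload)
```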

Medium objects (~10 alerts) remain dominated by data manipulation:

  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/utils.py:52 - download_cutout
  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:69 - create_or_update_hbase_table
  0.01 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:149 - hbase_to_dict
  0.02 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:177 - extract_rate_and_color
  0.18 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:42 - format_hbase_output
  0.18 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:28 - connect_to_hbase_table
  0.48 seconds - /home/almalinux/fink-object-api/apps/routes/objects/utils.py:24 - extract_object_data

What could be optimised: with_constellation (60 ms), extract_fink_classification_ (17 ms), and convert_datatype (40 ms). However, gateway.jvm.com.Lomikel.HBaser.HBaseClient alone peaks at 130 ms, which is difficult to beat. For large objects (>1000 alerts), the HBase data transfer starts to be felt:

  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/utils.py:52 - download_cutout
  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:69 - create_or_update_hbase_table
  0.03 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:179 - extract_rate_and_color
  0.16 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:28 - connect_to_hbase_table
  0.38 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:149 - hbase_to_dict
  0.52 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:42 - format_hbase_output
  1.71 seconds - /home/almalinux/fink-object-api/apps/routes/objects/utils.py:24 - extract_object_data
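For reference, per-function timings in the format shown above can be collected with a simple wall-clock decorator. The snippet below is only a sketch for illustration, not the profiler used to produce these numbers.

```python
import functools
import inspect
import time

def timed(func):
    """Print elapsed wall-clock time as '<seconds> - <file>:<line> - <name>'."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start
        src = inspect.getsourcefile(func)
        lineno = inspect.getsourcelines(func)[1]
        print(f"{elapsed:.2f} seconds - {src}:{lineno} - {func.__name__}")
        return result
    return wrapper
```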

Main route performance

The main route performance for a medium-size object (14 alerts, about 130 columns):

| request | time (seconds) |
|---|---|
| Lightcurve data (3 cols) | 0.1 |
| Lightcurve data (130 cols) | 0.3 |
| Lightcurve & 1 cutout data | 3.4 |
| Lightcurve & 3 cutout data | 5.4 |

Requesting cutouts is costly! With 14 alerts, that is about 0.25 seconds per cutout. Note that requesting 3 cutouts in a single call is faster than 3 separate requests with 1 cutout each, as what drives the cost is loading the full HDFS block into memory (see this discussion about the strategy behind it).
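For example, a client-side call for lightcurve data only might look like the snippet below. The endpoint URL and parameter names are assumptions made for illustration, not the exact fink-object-api contract.

```python
# Illustrative client call; the endpoint and parameter names are assumptions,
# adapt them to the deployed API.
import requests

r = requests.post(
    "https://api.fink-portal.org/api/v1/objects",  # assumed deployment URL
    json={
        "objectId": "ZTF21abfmbix",        # hypothetical object name
        "columns": "i:jd,i:magpsf,i:fid",  # few columns -> fast response (~0.1 s above)
        # "withcutouts": "True",           # requesting cutouts dominates the response time
    },
)
r.raise_for_status()
data = r.json()
```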

Note that for lightcurve data, the response time fortunately does not grow linearly with the number of alerts per object:

| request | time (seconds) |
|---|---|
| Lightcurve data (33 alerts, 130 cols) | 0.3 |
| Lightcurve data (1575 alerts, 130 cols) | 1.8 |

@JulienPeloton merged commit 9a20e21 into main on Dec 6, 2024
2 checks passed