Switch to GSON #12

Merged
merged 4 commits into main on Dec 6, 2024
Conversation

JulienPeloton
Member

This PR introduces some performance and stability improvements:

  • gson.Gson is more stable than json.JSONObject, especially for handling special characters (a minimal sketch is shown below)
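A minimal sketch of the idea, assuming a running py4j JavaGateway with Gson available on the JVM classpath; the wiring below is illustrative and not the actual fink-object-api client code.

```python
# Sketch only: assumes a py4j gateway server is already running with
# com.google.gson.Gson on its classpath.
from py4j.java_gateway import JavaGateway

gateway = JavaGateway()

# Build a small payload on the JVM side containing characters that are
# awkward to escape by hand.
java_map = gateway.jvm.java.util.HashMap()
java_map.put("comment", 'contains "quotes" and \\ backslashes')

# Gson serialises it without manual escaping; org.json.JSONObject is stricter
# and easier to trip up on such inputs.
gson = gateway.jvm.com.google.gson.Gson()
payload = gson.toJson(java_map)
print(payload)
```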

Medium objects (~10 alerts) remain dominated by data manipulation:

  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/utils.py:52 - download_cutout
  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:69 - create_or_update_hbase_table
  0.01 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:149 - hbase_to_dict
  0.02 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:177 - extract_rate_and_color
  0.18 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:42 - format_hbase_output
  0.18 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:28 - connect_to_hbase_table
  0.48 seconds - /home/almalinux/fink-object-api/apps/routes/objects/utils.py:24 - extract_object_data

What could be optimised: with_constellation (60 ms), extract_fink_classification_ (17 ms), and convert_datatype (40 ms). However, gateway.jvm.com.Lomikel.HBaser.HBaseClient alone peaks at 130 ms, which is difficult to beat. For large objects (>1000 alerts), the HBase data transfer starts to be felt:

  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/utils.py:52 - download_cutout
  0.00 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:69 - create_or_update_hbase_table
  0.03 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:179 - extract_rate_and_color
  0.16 seconds - /home/almalinux/fink-object-api/apps/utils/client.py:28 - connect_to_hbase_table
  0.38 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:149 - hbase_to_dict
  0.52 seconds - /home/almalinux/fink-object-api/apps/utils/decoding.py:42 - format_hbase_output
  1.71 seconds - /home/almalinux/fink-object-api/apps/routes/objects/utils.py:24 - extract_object_data
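For reference, per-function timings in the format shown above can be collected with a simple wall-clock decorator. The snippet below is only a sketch for illustration, not the profiler used to produce these numbers.

```python
import functools
import inspect
import time

def timed(func):
    """Print elapsed wall-clock time as '<seconds> - <file>:<line> - <name>'."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start
        src = inspect.getsourcefile(func)
        lineno = inspect.getsourcelines(func)[1]
        print(f"{elapsed:.2f} seconds - {src}:{lineno} - {func.__name__}")
        return result
    return wrapper
```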

Main route performance

The main route performance for a medium-size object (14 alerts, about 130 columns):

| request | time (seconds) |
|---|---|
| Lightcurve data (3 cols) | 0.1 |
| Lightcurve data (130 cols) | 0.3 |
| Lightcurve & 1 cutout data | 3.4 |
| Lightcurve & 3 cutout data | 5.4 |

Requesting cutouts is costly! With 14 alerts, that is about 0.25 seconds per cutout. Note that requesting 3 cutouts in a single call is faster than 3 separate requests with 1 cutout each, as what drives the cost is loading the full HDFS block into memory (see this discussion about the strategy behind it).
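For example, a client-side call for lightcurve data only might look like the snippet below. The endpoint URL and parameter names are assumptions made for illustration, not the exact fink-object-api contract.

```python
# Illustrative client call; the endpoint and parameter names are assumptions,
# adapt them to the deployed API.
import requests

r = requests.post(
    "https://api.fink-portal.org/api/v1/objects",  # assumed deployment URL
    json={
        "objectId": "ZTF21abfmbix",        # hypothetical object name
        "columns": "i:jd,i:magpsf,i:fid",  # few columns -> fast response (~0.1 s above)
        # "withcutouts": "True",           # requesting cutouts dominates the response time
    },
)
r.raise_for_status()
data = r.json()
```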

Note that for lightcurve data, the response time fortunately does not grow linearly with the number of alerts per object:

| request | time (seconds) |
|---|---|
| Lightcurve data (33 alerts, 130 cols) | 0.3 |
| Lightcurve data (1575 alerts, 130 cols) | 1.8 |

@JulienPeloton merged commit 9a20e21 into main on Dec 6, 2024
2 checks passed