Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): Add ability for indexer cache to write new schema #58414

Merged
merged 1 commit into from
Oct 26, 2023

Conversation

john-z-yang
Copy link
Member

@john-z-yang john-z-yang commented Oct 18, 2023

This PR

  • Read the sentry option sentry-metrics.indexer.write-new-cache-namespace (set to False default), and if it's True, write to the new cache schema along with the old cache schema.

Previous change in this stack PR
#58178

Motivation

Click to expand

The indexer is making the following assumptions with Memcached:

  1. We are only adding key/value pairs to Memcached during bulk_record (the indexer’s “write path”)
  2. Key/value pairs are gone from Memcached after 2 hours

It’s very easy to break 1 or 2 inadvertently. For example, we are inserting key/value pairs back into the cache during a resolve cache miss. Doing so is justified as some strings could be more read heavy than write heavy, such as users with automated scripts that are scanning for a specific key to show up.

Background

Click to expand

How accessing the cache works today in the Indexer

Method Key exists? Behaviour
bulk_record - Use the value from cache
- Emit CACHE_READ in header
bulk_record - Fetch value from Postgres
- Emit DB_READ in header
- Write {string} → {id} to cache
resolve - Use value from cache
resolve - Fetch value from Postgres
- Write {string} → {id} to cache

where bulk_record = the code path for writing indexed strings

and resolve = the code path for querying strings

Goal

Click to expand We store a timestamp of when the cache is stored in the value, and also partition the key between the write and read path.

(writer + string) → (integer + timestamp)

How this works

  • Creates a distinction between, and handles appropriately, both the read and write path of replenishing the cache
  • Log a metric in this scenario
  • Simultaneously makes us aware of, and mitigates, unexpected cache behaviors
  • Actually fixes the issue where we were replenishing the cache on the read path
Method Key exists? Timestamp valid? Behaviour
bulk_record - Use value from cache
- Emit CACHE_READ in header
bulk_record - Fetch value from Postgres
- Emit DB_READ in header
- Write br:{string} → {id}:{timestamp} to cache
- Increment metric
bulk_record - - Fetch value from Postgres
- Emit DB_READ in header
- Write br:{string} → {id}:{timestamp} to cache
resolve - Use value from cache
resolve - Use value from cache
- Write res:{string} → {id}:{timestamp} to cache
- Increment metric
resolve - - Fetch value from Postgres
- Write res:{string} → {id}:{timestamp} to cache

Deployment Plan

Click to expand
  1. Deploy read paths that can toggle between reading from (string) → (integer) and (writer + string) → (integer + timestamp) via a sentry option
  2. [THIS PR] Deploy bulk_record changes to write to both types of key/value pairs ((string) → (integer) and (writer + string) → (integer + timestamp)) on each bulk_record lookup
  3. Toggle the option deployed in step 1 to read from (writer + string) → (integer + timestamp)
  4. (observability) Deploy logic to emit metric when timestamp that is greater than 2-3 hours
  5. (mitigation) Deploy logic to reject key/value pairs when timestamp is greater than 2-3 hours
  6. Add back the resolve path cache replenishment

Local Testing

Click to expand

Started sentry with memcache and kcat on snuba-generic-metrics

kcat -b localhost:9092 -t snuba-generic-metrics -C \
  -f '\nKey (%K bytes): %k
  Value (%S bytes): %s
  Timestamp: %T
  Partition: %p
  Offset: %o
  Headers: %h\n'

Send first batch of 3 messages with the send_metrics script

python bin/send_metrics.py --use-cases custom

Confirm that all 3 values have fetch type of FIRST_SEEN ("f")

Key (-1 bytes):
  Value (495 bytes): {"tags":{"9223372036854776010":"production","9223372036854776017":"init","67563":"metric_e2e_custom_counter_v_AIXH44IB"},"version":2,"retention_days":90,"mapping_meta":{"h":{"9223372036854776010":"environment","9223372036854776017":"session.status"},"f":{"67563":"metric_e2e_custom_counter_k_AIXH44IB"},"c":{"65555":"c:custom/custom@none"}},"use_case_id":"custom","metric_id":65555,"org_id":1,"timestamp":1697664813,"project_id":3,"type":"c","value":1,"sentry_received_timestamp":1697664813.218}
  Timestamp: 1697664834437
  Partition: 0
  Offset: 408
  Headers: mapping_sources=cfh,metric_type=c

Key (-1 bytes):
  Value (491 bytes): {"tags":{"9223372036854776010":"production","9223372036854776017":"errored","67562":"metric_e2e_custom_set_v_AIXH44IB"},"version":2,"retention_days":90,"mapping_meta":{"h":{"9223372036854776010":"environment","9223372036854776017":"session.status"},"f":{"67562":"metric_e2e_custom_set_k_AIXH44IB"},"c":{"65556":"s:custom/error@none"}},"use_case_id":"custom","metric_id":65556,"org_id":1,"timestamp":1697664813,"project_id":3,"type":"s","value":[3],"sentry_received_timestamp":1697664813.218}
  Timestamp: 1697664834437
  Partition: 0
  Offset: 409
  Headers: mapping_sources=cfh,metric_type=s

Key (-1 bytes):
  Value (502 bytes): {"tags":{"9223372036854776010":"production","9223372036854776017":"healthy","67561":"metric_e2e_custom_dist_v_AIXH44IB"},"version":2,"retention_days":90,"mapping_meta":{"h":{"9223372036854776010":"environment","9223372036854776017":"session.status"},"c":{"65558":"d:custom/duration@second"},"f":{"67561":"metric_e2e_custom_dist_k_AIXH44IB"}},"use_case_id":"custom","metric_id":65558,"org_id":1,"timestamp":1697664813,"project_id":3,"type":"d","value":[4,5,6],"sentry_received_timestamp":1697664813.218}
  Timestamp: 1697664834437
  Partition: 0
  Offset: 410
  Headers: mapping_sources=cfh,metric_type=d
% Reached end of topic snuba-generic-metrics [0] at offset 411

Send the same batch of 3 messages with the send_metrics script

python bin/send_metrics.py --use-cases custom --rand-str AIXH44IB

Confirm that the new messages all have fetch type of CACHE_HIT ("c")

Key (-1 bytes):
  Value (485 bytes): {"tags":{"9223372036854776010":"production","9223372036854776017":"errored","67562":"metric_e2e_custom_set_v_AIXH44IB"},"version":2,"retention_days":90,"mapping_meta":{"h":{"9223372036854776010":"environment","9223372036854776017":"session.status"},"c":{"67562":"metric_e2e_custom_set_k_AIXH44IB","65556":"s:custom/error@none"}},"use_case_id":"custom","metric_id":65556,"org_id":1,"timestamp":1697664926,"project_id":3,"type":"s","value":[3],"sentry_received_timestamp":1697664926.943}
  Timestamp: 1697664948036
  Partition: 0
  Offset: 411
  Headers: mapping_sources=ch,metric_type=s

Key (-1 bytes):
  Value (496 bytes): {"tags":{"9223372036854776010":"production","9223372036854776017":"healthy","67561":"metric_e2e_custom_dist_v_AIXH44IB"},"version":2,"retention_days":90,"mapping_meta":{"h":{"9223372036854776010":"environment","9223372036854776017":"session.status"},"c":{"65558":"d:custom/duration@second","67561":"metric_e2e_custom_dist_k_AIXH44IB"}},"use_case_id":"custom","metric_id":65558,"org_id":1,"timestamp":1697664926,"project_id":3,"type":"d","value":[4,5,6],"sentry_received_timestamp":1697664926.943}
  Timestamp: 1697664948036
  Partition: 0
  Offset: 412
  Headers: mapping_sources=ch,metric_type=d

Key (-1 bytes):
  Value (489 bytes): {"tags":{"9223372036854776010":"production","9223372036854776017":"init","67563":"metric_e2e_custom_counter_v_AIXH44IB"},"version":2,"retention_days":90,"mapping_meta":{"h":{"9223372036854776010":"environment","9223372036854776017":"session.status"},"c":{"67563":"metric_e2e_custom_counter_k_AIXH44IB","65555":"c:custom/custom@none"}},"use_case_id":"custom","metric_id":65555,"org_id":1,"timestamp":1697664926,"project_id":3,"type":"c","value":1,"sentry_received_timestamp":1697664926.943}
  Timestamp: 1697664948036
  Partition: 0
  Offset: 413
  Headers: mapping_sources=ch,metric_type=c
% Reached end of topic snuba-generic-metrics [0] at offset 414

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 18, 2023
@john-z-yang john-z-yang changed the base branch from master to john/update-caching-idxer-schema-read October 20, 2023 16:51
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch from 5839070 to 6ac3b93 Compare October 20, 2023 16:53
@github-actions github-actions bot added the Scope: Frontend Automatically applied to PRs that change frontend components label Oct 20, 2023
@github-actions
Copy link
Contributor

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

@github-actions
Copy link
Contributor

🏷 The following changes will be made to the repository labels

Validating provided labels
Fetching labels from GitHub
Labels are already up to date

@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-read branch from d73c1b9 to 5e76277 Compare October 20, 2023 16:58
@john-z-yang john-z-yang removed the Scope: Frontend Automatically applied to PRs that change frontend components label Oct 20, 2023
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch from 6ac3b93 to 79b4f7b Compare October 20, 2023 17:07
@john-z-yang john-z-yang changed the title John/update caching idxer schema write feat(metrics): Add ability for indexer cache to write new schema Oct 20, 2023
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch from 79b4f7b to fb9d162 Compare October 20, 2023 17:32
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-read branch from 5e76277 to 57e4ecf Compare October 20, 2023 19:06
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch from fb9d162 to 05d76b1 Compare October 20, 2023 19:07
@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

Merging #58414 (e34b2da) into master (ec6f8fa) will decrease coverage by 0.01%.
Report is 90 commits behind head on master.
The diff coverage is 85.52%.

❗ Current head e34b2da differs from pull request most recent head 2733f96. Consider uploading reports for the commit 2733f96 to get more accurate results

@@            Coverage Diff             @@
##           master   #58414      +/-   ##
==========================================
- Coverage   80.60%   80.59%   -0.01%     
==========================================
  Files        5151     5151              
  Lines      225476   225419      -57     
  Branches    37986    37987       +1     
==========================================
- Hits       181736   181670      -66     
- Misses      38187    38189       +2     
- Partials     5553     5560       +7     
Files Coverage Δ
src/sentry/api/base.py 93.45% <100.00%> (+0.25%) ⬆️
src/sentry/api/endpoints/auth_index.py 84.80% <100.00%> (ø)
...dpoints/integrations/sentry_apps/authorizations.py 100.00% <100.00%> (ø)
src/sentry/hybridcloud/models/apikeyreplica.py 100.00% <100.00%> (ø)
...c/sentry/hybridcloud/models/orgauthtokenreplica.py 92.10% <100.00%> (+10.52%) ⬆️
src/sentry/hybridcloud/options.py 100.00% <100.00%> (ø)
src/sentry/incidents/logic.py 91.46% <100.00%> (+0.02%) ⬆️
src/sentry/middleware/access_log.py 94.02% <100.00%> (+0.27%) ⬆️
src/sentry/middleware/auth.py 92.98% <100.00%> (+2.62%) ⬆️
src/sentry/models/apikey.py 98.24% <100.00%> (+0.03%) ⬆️
... and 18 more

... and 13 files with indirect coverage changes

@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-read branch from 57e4ecf to 90c62ae Compare October 20, 2023 20:11
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch from 05d76b1 to 987ef58 Compare October 20, 2023 20:27
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch from 987ef58 to 9d0dc10 Compare October 20, 2023 20:59
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-read branch from 90c62ae to 3caa6a1 Compare October 24, 2023 21:53
Base automatically changed from john/update-caching-idxer-schema-read to master October 24, 2023 22:31
john-z-yang added a commit that referenced this pull request Oct 24, 2023
# This PR

- Add a `namespace` parameters to the `make_cache_key` function, and
depending on the feature flag (set to false right now) will create cache
keys from the new namespaces that are separate for the read path and
write path.
- Functionally does not change anything unless the flag is set to True

**Next change in this stack PR**
#58414
@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch 2 times, most recently from 22c9eac to e34b2da Compare October 24, 2023 23:06
@john-z-yang john-z-yang marked this pull request as ready for review October 25, 2023 04:11
@john-z-yang john-z-yang requested a review from a team as a code owner October 25, 2023 04:11
@john-z-yang john-z-yang requested a review from ayirr7 October 25, 2023 04:11
@@ -150,14 +160,28 @@ def get_many(self, namespace: str, keys: Iterable[str]) -> MutableMapping[str, O
def set_many(self, namespace: str, key_values: Mapping[str, int]) -> None:
cache_key_values = {self._make_cache_key(k): v for k, v in key_values.items()}
self.cache.set_many(cache_key_values, timeout=self.randomized_ttl, version=self.version)
if options.get("sentry-metrics.indexer.write-new-cache-namespace"):
timestamp = int(datetime.utcnow().timestamp())
name_sapced_cache_key_values = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name_sapced_cache_key_values = {
namespaced_cache_key_values = {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

cache_keys = [self._make_cache_key(key) for key in keys]
self.cache.delete_many(cache_keys, version=self.version)
self.cache.delete_many([self._make_cache_key(key) for key in keys], version=self.version)
if options.get("sentry-metrics.indexer.write-new-cache-namespace"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string option name is being used at a lot of places in the code. Same for read new cache namespace. Lets define string constants for these to avoid any possibility of copy/paste mishaps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed, thank you!

Copy link
Member

@nikhars nikhars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are not blockers. Feel free to merge after making the change

@john-z-yang john-z-yang force-pushed the john/update-caching-idxer-schema-write branch from e34b2da to 2733f96 Compare October 26, 2023 15:07
@john-z-yang john-z-yang enabled auto-merge (squash) October 26, 2023 15:44
@john-z-yang john-z-yang merged commit 15af691 into master Oct 26, 2023
@john-z-yang john-z-yang deleted the john/update-caching-idxer-schema-write branch October 26, 2023 15:54
@github-actions github-actions bot locked and limited conversation to collaborators Nov 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants