Return structured data for JS source file and source map Scraping attempts #1300

lforst · 2023-09-15T11:37:39Z

For our Source Map Debugger (Blue Thunder Edition™) initiative we require information from Symbolicator about the JS source scraping attempts that have been conducted.

Solution brainstorm

Returned payload (could be nested somewhere):

interface ReturnedPayload {
  symbolication_scraping_attempts: {
    source_file_url: string;
    source_file_scraping:
      | { result: 'not-attempted'; reason_code: number }
      | { result: 'error'; error_code: number; scraping_http_status: number }
      | { result: 'success'};
    source_map_url: string;
    source_map_scraping:
      | { result: 'not-attempted'; reason_code: number}
      | { result: 'error'; error_code: number; scraping_http_status: number }
      | { result: 'success' };
  }[];
}

// Possible error_code Mappings:
// 1: "Unknown"
// 2: "Not found"
// 3: "File too big"
// ... TBD

// Possible reason_code Mappings:
// 1: "Unknown"
// 2: "Already have source through other process"
// ... TBD

(TypeScript notation because it's the only language I know 😶‍🌫️)

loewenheim · 2023-09-26T12:27:44Z

I would guess that we also want a "forbidden" or "disabled" result variant for when scraping is disabled.

ETA: Actually I believe there are really two reasons for "not attempted": either we already have it or scraping is disabled. Consequently we should split up "not attempted" into two results and remove the reason.

loewenheim · 2023-09-26T13:03:35Z

Also: does your data intentionally only include the URL of the source file, but not that of the sourcemap?

lforst · 2023-09-26T13:31:13Z

I would guess that we also want a "forbidden" or "disabled" result variant for when scraping is disabled.

I think "forbidden" can fall under error and "disabled" under not-attempted. (With corresponding codes of course.)

Also: does your data intentionally only include the URL of the source file, but not that of the sourcemap?

Ah, thanks. Good catch! Adjusted.

loewenheim · 2023-09-26T13:36:41Z

I think "forbidden" can fall under error and "disabled" under not-attempted. (With corresponding codes of course.)

Sorry, I intended "forbidden" and "disabled" to be the same thing, I just hadn't decided what the better name was yet (it's probably "disabled").

loewenheim · 2023-09-28T12:43:48Z

I opened #1311 as a strawman implementation so that we have something to discuss.

The CompletedJsSymbolicationResponse now contains a list of "scraping attempts". The definition of these scraping attempts is

pub struct JsScrapingAttempt {
    pub url: String,
    pub result: JsScrapingResult,
}

pub enum JsScrapingResult {
    NotAttempted,
    Success,
    Failure {
        reason: JsScrapingFailureReason,
        details: String,
    },
}

pub enum JsScrapingFailureReason {
    NotFound,
    Disabled,
    InvalidHost,
    PermissionDenied,
    Timeout,
    DownloadError,
    Other,
}

So each scraping attempt has a url and a result. The result can be NotAttempted, Success, or Failure. Success and NotAttempted contain no further information (NotAttempted just means that we got the file another way), but Failure has an additional Reason and possibly details.

Here is an example from out test fixtures (written in YAML, but you'd be getting it as JSON):

scraping_attempts:
  - url: "http://example.com/index.html"
    result:
      Failure:
        reason: Disabled
  - url: "http://example.com/test.min.js"
    result: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result: NotAttempted

This differs from your design above in a few respects.

We don't bundle the scraping attempts for a source file and its sourcemap together. This could be done, but it would complicate the code a fair bit. You can still associate them because every frame in the response contains the sourcemap URL.
The details of the failure cases are different. We return a simple "reason" and possibly "details" instead of an error code and the http status. The error codes could certainly still be done, but getting the http status would need significant changes to our downloader.

Are these differences acceptable to you? Is there anything you would like to see changed?

lforst · 2023-09-29T09:16:36Z

@loewenheim Thank you! This is great <3

I have a few thoughts.

You can still associate them because every frame in the response contains the sourcemap URL.

Is that the case in the event payload we store in eventstore? Would we have to also adjust Sentry to put the sourcemap URL into the stack frame object?

We return a simple "reason" and possibly "details" instead of an error code and the http status.

HTTP status isn't too important I would say. I have reservations about the returned payload schema though. IMO it would be cleaner to always return an object in the result field. To illustrate, here is what I suggest we change the example you gave to:

scraping_attempts:
  - url: "http://example.com/index.html"
    result:
        status: Failure
        reason: Disabled
  - url: "http://example.com/test.min.js"
    result:
        status: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result:
        status: NotAttempted

That way we don't have to check for the type of result and can basically assume result.status is a string and result.reason is an optional string. This is obviously up for debate.

Additionally, I would probably return the strings in a bit more pythonian manner, e.g. not_attempted instead of NotAttempted and so on. This is a bit whatever.

In general I think we are on the right track! My main concern lies in whether we can actually properly associate source map scraping attempts to the correct stack frame.

loewenheim · 2023-09-29T09:59:57Z

@loewenheim Thank you! This is great <3

I have a few thoughts.

You can still associate them because every frame in the response contains the sourcemap URL.

Is that the case in the event payload we store in eventstore? Would we have to also adjust Sentry to put the sourcemap URL into the stack frame object?

The information already exists in the event JSON you can view on Sentry, if that's what you mean.

We return a simple "reason" and possibly "details" instead of an error code and the http status.

HTTP status isn't too important I would say. I have reservations about the returned payload schema though. IMO it would be cleaner to always return an object in the result field. To illustrate, here is what I suggest we change the example you gave to:
scraping_attempts:
  - url: "http://example.com/index.html"
    result:
        status: Failure
        reason: Disabled
  - url: "http://example.com/test.min.js"
    result:
        status: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result:
        status: NotAttempted
That way we don't have to check for the type of result and can basically assume result.status is a string and result.reason is an optional string. This is obviously up for debate.

That sounds like a good idea and would be easy to change.

Additionally, I would probably return the strings in a bit more pythonian manner, e.g. not_attempted instead of NotAttempted and so on. This is a bit whatever.

You're right, I'll change it to snake_case.

In general I think we are on the right track! My main concern lies in whether we can actually properly associate source map scraping attempts to the correct stack frame.

I'll send you an example event that I believe demonstrates that we can.

loewenheim · 2023-09-29T10:14:56Z

What do you think of this?

scraping_attempts:
  - url: "http://example.com/index.html"
    status: failure
    reason: disabled
  - url: "http://example.com/embedded.js"
    status: not_attempted
  - url: "http://example.com/embedded.js.map"
    status: not_attempted

status is always present, reason and details may be present for failures. We can also nest it inside a result the way you suggested, if you'd prefer that. Fortunately this serialization logic is very flexible.

We could also do the scraping_attempts as a map from URLs to results instead of a list. What do you think?

lforst · 2023-09-29T10:20:38Z

I'll send you an example event that I believe demonstrates that we can.

Based on our discussion that data.sourcemap is available inside stackframe objects, having source files and sourcemaps separate sounds good to me!

status is always present, reason and details may be present for failures. We can also nest it inside a result the way you suggested, if you'd prefer that. Fortunately this serialization logic is very flexible.

No this seems great! Let's do it exactly like that :)

We could also do the scraping_attempts as a map from URLs to results instead of a list. What do you think?

I think a list is fine. It keeps us more flexible down the line. Of course, there is now the cost of having to iterate but the size is bounded anyhow so 🤷 No strong opinions. If we don't have a clear favorite I would opt for a list to be consistent with other mechanisms we have (debug_meta).

loewenheim · 2023-09-29T10:22:48Z

I think a list is fine. It keeps us more flexible down the line. Of course, there is now the cost of having to iterate but the size is bounded anyhow so 🤷 No strong opinions. If we don't have a clear favorite I would opt for a list to be consistent with other mechanisms we have (debug_meta).

Makes sense, I'll leave it as a list then.

lforst mentioned this issue Sep 15, 2023

In-Product Source Maps Debugger getsentry/sentry#55662

Closed

loewenheim linked a pull request Sep 29, 2023 that will close this issue

feat: Return "scraping attempts" from JS symbolication #1311

Merged

loewenheim closed this as completed in #1311 Oct 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return structured data for JS source file and source map Scraping attempts #1300

Return structured data for JS source file and source map Scraping attempts #1300

lforst commented Sep 15, 2023 •

edited

Loading

loewenheim commented Sep 26, 2023 •

edited

Loading

loewenheim commented Sep 26, 2023

lforst commented Sep 26, 2023

loewenheim commented Sep 26, 2023

loewenheim commented Sep 28, 2023

lforst commented Sep 29, 2023

loewenheim commented Sep 29, 2023

loewenheim commented Sep 29, 2023

lforst commented Sep 29, 2023

loewenheim commented Sep 29, 2023

Return structured data for JS source file and source map Scraping attempts #1300

Return structured data for JS source file and source map Scraping attempts #1300

Comments

lforst commented Sep 15, 2023 • edited Loading

Solution brainstorm

loewenheim commented Sep 26, 2023 • edited Loading

loewenheim commented Sep 26, 2023

lforst commented Sep 26, 2023

loewenheim commented Sep 26, 2023

loewenheim commented Sep 28, 2023

lforst commented Sep 29, 2023

loewenheim commented Sep 29, 2023

loewenheim commented Sep 29, 2023

lforst commented Sep 29, 2023

loewenheim commented Sep 29, 2023

lforst commented Sep 15, 2023 •

edited

Loading

loewenheim commented Sep 26, 2023 •

edited

Loading