Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return structured data for JS source file and source map Scraping attempts #1300

Closed
Tracked by #55662
lforst opened this issue Sep 15, 2023 · 10 comments · Fixed by #1311
Closed
Tracked by #55662

Return structured data for JS source file and source map Scraping attempts #1300

lforst opened this issue Sep 15, 2023 · 10 comments · Fixed by #1311

Comments

@lforst
Copy link
Member

lforst commented Sep 15, 2023

For our Source Map Debugger (Blue Thunder Edition™) initiative we require information from Symbolicator about the JS source scraping attempts that have been conducted.

Solution brainstorm

Returned payload (could be nested somewhere):

interface ReturnedPayload {
  symbolication_scraping_attempts: {
    source_file_url: string;
    source_file_scraping:
      | { result: 'not-attempted'; reason_code: number }
      | { result: 'error'; error_code: number; scraping_http_status: number }
      | { result: 'success'};
    source_map_url: string;
    source_map_scraping:
      | { result: 'not-attempted'; reason_code: number}
      | { result: 'error'; error_code: number; scraping_http_status: number }
      | { result: 'success' };
  }[];
}

// Possible error_code Mappings:
// 1: "Unknown"
// 2: "Not found"
// 3: "File too big"
// ... TBD

// Possible reason_code Mappings:
// 1: "Unknown"
// 2: "Already have source through other process"
// ... TBD

(TypeScript notation because it's the only language I know 😶‍🌫️)

@loewenheim
Copy link
Contributor

loewenheim commented Sep 26, 2023

I would guess that we also want a "forbidden" or "disabled" result variant for when scraping is disabled.

ETA: Actually I believe there are really two reasons for "not attempted": either we already have it or scraping is disabled. Consequently we should split up "not attempted" into two results and remove the reason.

@loewenheim
Copy link
Contributor

Also: does your data intentionally only include the URL of the source file, but not that of the sourcemap?

@lforst
Copy link
Member Author

lforst commented Sep 26, 2023

I would guess that we also want a "forbidden" or "disabled" result variant for when scraping is disabled.

I think "forbidden" can fall under error and "disabled" under not-attempted. (With corresponding codes of course.)

Also: does your data intentionally only include the URL of the source file, but not that of the sourcemap?

Ah, thanks. Good catch! Adjusted.

@loewenheim
Copy link
Contributor

I think "forbidden" can fall under error and "disabled" under not-attempted. (With corresponding codes of course.)

Sorry, I intended "forbidden" and "disabled" to be the same thing, I just hadn't decided what the better name was yet (it's probably "disabled").

@loewenheim
Copy link
Contributor

I opened #1311 as a strawman implementation so that we have something to discuss.

The CompletedJsSymbolicationResponse now contains a list of "scraping attempts". The definition of these scraping attempts is

pub struct JsScrapingAttempt {
    pub url: String,
    pub result: JsScrapingResult,
}

pub enum JsScrapingResult {
    NotAttempted,
    Success,
    Failure {
        reason: JsScrapingFailureReason,
        details: String,
    },
}

pub enum JsScrapingFailureReason {
    NotFound,
    Disabled,
    InvalidHost,
    PermissionDenied,
    Timeout,
    DownloadError,
    Other,
}

So each scraping attempt has a url and a result. The result can be NotAttempted, Success, or Failure. Success and NotAttempted contain no further information (NotAttempted just means that we got the file another way), but Failure has an additional Reason and possibly details.

Here is an example from out test fixtures (written in YAML, but you'd be getting it as JSON):

scraping_attempts:
  - url: "http://example.com/index.html"
    result:
      Failure:
        reason: Disabled
  - url: "http://example.com/test.min.js"
    result: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result: NotAttempted

This differs from your design above in a few respects.

  1. We don't bundle the scraping attempts for a source file and its sourcemap together. This could be done, but it would complicate the code a fair bit. You can still associate them because every frame in the response contains the sourcemap URL.
  2. The details of the failure cases are different. We return a simple "reason" and possibly "details" instead of an error code and the http status. The error codes could certainly still be done, but getting the http status would need significant changes to our downloader.

Are these differences acceptable to you? Is there anything you would like to see changed?

@lforst
Copy link
Member Author

lforst commented Sep 29, 2023

@loewenheim Thank you! This is great <3

I have a few thoughts.

You can still associate them because every frame in the response contains the sourcemap URL.

Is that the case in the event payload we store in eventstore? Would we have to also adjust Sentry to put the sourcemap URL into the stack frame object?

We return a simple "reason" and possibly "details" instead of an error code and the http status.

HTTP status isn't too important I would say. I have reservations about the returned payload schema though. IMO it would be cleaner to always return an object in the result field. To illustrate, here is what I suggest we change the example you gave to:

scraping_attempts:
  - url: "http://example.com/index.html"
    result:
        status: Failure
        reason: Disabled
  - url: "http://example.com/test.min.js"
    result:
        status: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result:
        status: NotAttempted

That way we don't have to check for the type of result and can basically assume result.status is a string and result.reason is an optional string. This is obviously up for debate.

Additionally, I would probably return the strings in a bit more pythonian manner, e.g. not_attempted instead of NotAttempted and so on. This is a bit whatever.

In general I think we are on the right track! My main concern lies in whether we can actually properly associate source map scraping attempts to the correct stack frame.

@loewenheim
Copy link
Contributor

@loewenheim Thank you! This is great <3

I have a few thoughts.

You can still associate them because every frame in the response contains the sourcemap URL.

Is that the case in the event payload we store in eventstore? Would we have to also adjust Sentry to put the sourcemap URL into the stack frame object?

The information already exists in the event JSON you can view on Sentry, if that's what you mean.

We return a simple "reason" and possibly "details" instead of an error code and the http status.

HTTP status isn't too important I would say. I have reservations about the returned payload schema though. IMO it would be cleaner to always return an object in the result field. To illustrate, here is what I suggest we change the example you gave to:

scraping_attempts:
  - url: "http://example.com/index.html"
    result:
        status: Failure
        reason: Disabled
  - url: "http://example.com/test.min.js"
    result:
        status: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result:
        status: NotAttempted

That way we don't have to check for the type of result and can basically assume result.status is a string and result.reason is an optional string. This is obviously up for debate.

That sounds like a good idea and would be easy to change.

Additionally, I would probably return the strings in a bit more pythonian manner, e.g. not_attempted instead of NotAttempted and so on. This is a bit whatever.

You're right, I'll change it to snake_case.

In general I think we are on the right track! My main concern lies in whether we can actually properly associate source map scraping attempts to the correct stack frame.

I'll send you an example event that I believe demonstrates that we can.

@loewenheim
Copy link
Contributor

What do you think of this?

scraping_attempts:
  - url: "http://example.com/index.html"
    status: failure
    reason: disabled
  - url: "http://example.com/embedded.js"
    status: not_attempted
  - url: "http://example.com/embedded.js.map"
    status: not_attempted

status is always present, reason and details may be present for failures. We can also nest it inside a result the way you suggested, if you'd prefer that. Fortunately this serialization logic is very flexible.

We could also do the scraping_attempts as a map from URLs to results instead of a list. What do you think?

@lforst
Copy link
Member Author

lforst commented Sep 29, 2023

I'll send you an example event that I believe demonstrates that we can.

Based on our discussion that data.sourcemap is available inside stackframe objects, having source files and sourcemaps separate sounds good to me!

status is always present, reason and details may be present for failures. We can also nest it inside a result the way you suggested, if you'd prefer that. Fortunately this serialization logic is very flexible.

No this seems great! Let's do it exactly like that :)

We could also do the scraping_attempts as a map from URLs to results instead of a list. What do you think?

I think a list is fine. It keeps us more flexible down the line. Of course, there is now the cost of having to iterate but the size is bounded anyhow so 🤷 No strong opinions. If we don't have a clear favorite I would opt for a list to be consistent with other mechanisms we have (debug_meta).

@loewenheim
Copy link
Contributor

I think a list is fine. It keeps us more flexible down the line. Of course, there is now the cost of having to iterate but the size is bounded anyhow so 🤷 No strong opinions. If we don't have a clear favorite I would opt for a list to be consistent with other mechanisms we have (debug_meta).

Makes sense, I'll leave it as a list then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants