Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix freeform guardrails #918

Merged
merged 6 commits into from
Dec 31, 2023
Merged

Fix freeform guardrails #918

merged 6 commits into from
Dec 31, 2023

Conversation

ansoncfit
Copy link
Member

@ansoncfit ansoncfit commented Dec 19, 2023

We currently limit the number of destinations and OD pairs in regional analyses with time/path results. This PR fixes a couple issues with the related guardrails:

  • Fixes a conditional so that when oneToOne = true, the broker checks the number of OD pairs (which equals the number of origins) against the correct limit, rather than just allowing the analysis to proceed.
  • Checks limit on number of destinations for path analyses in broker. Workers were applying a stricter limit (5000 destinations for path analyses) than the broker, so the broker could potentially enqueue millions of tasks that would trigger an immediate error once they reached the workers.

Small refactor for brevity/legibility: task = job.templateTask

Correct logic for checking oneToOne analyses
Check limit on number of destinations for path analyses in broker
@abyrd
Copy link
Member

abyrd commented Dec 20, 2023

The underlying problem where tasks continue to be distributed to workers after an error was reported should be solved by #921. With that fixed, mismatches in validation rules between different components (UI, backend, workers) should be much less problematic and just result in a few error messages.

That said, it's also a good idea to make these checks more specific as in this PR.

int nODPairs = task.oneToOne ? nOriginsTotal : nOriginsTotal * nDestinations;
if (task.recordTimes &&
(nDestinations > MAX_FREEFORM_DESTINATIONS || nODPairs > MAX_FREEFORM_OD_PAIRS)) {
throw new AnalysisServerException(String.format(
Copy link
Member Author

@ansoncfit ansoncfit Dec 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is throw AnalysisServerException.badRequest(... preferable here?

Is it straightforward to apply filterStackTrace (moved to ExceptionUtils in #921) to the message eventually passed along to the UI?

Copy link
Member

@abyrd abyrd Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On your second question: yes, I think so. It looks like we could apply filterStackTrace to the message parameter in each call to HttpApi#respondToException (there are five of them just above the method definition).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On your first question: it looks like calling badRequest would yield an identical AnalysisServerException instance but with type BAD_REQUEST instead of UNKNOWN. It looks like the only effect would be the string BAD_REQUEST instead of UNKNOWN in a field in the JSON returned to the UI. Assuming this does not change UI behavior in any problematic way, yes badRequest does seem to better represent the source of the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out that method. I think we want to apply it to the stackTrace, not the message. I committed that below for testing on staging, but feel free to revert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. It was late when I was reading that and I mistakenly thought the callers were passing e.getStackTrace() as the message. This was probably driven by some kind of wishful thinking because I realized filterStackTrace is so narrowly suited to output from one method (throwable.printStackTrace via ExceptionUtils.stackTraceString) that it probably shouldn't be public and applicable to the more general String type. This is a minor detail but it's located in error handling code that should be foolproof (things could rapidly get annoying to debug if the signal path for the debugging messages themselves is failing). It looks like it is failsafe and would at worst return "Unknown stack frame" but there could be some edge case we're not thinking of.
Of course in current usage, both places the method is used are filterStackTrace(errorEvent.stackTrace) so we know it's always being applied to a final field that's the output of ExceptionUtils.filterStackTrace(). This should work fine but I'm still thinking about ways to clean up the method signature.
Option A: have the public method be public static String filterStackTrace (Throwable throwable) then call that in the ErrorEvent constructor. This would cause all console log traces to be filtered. All others (Slack) are already filtered.
Option B: move filterStackTrace back to a private method on ErrorEvent and always use it via a public wrapper method that only reads from ErrorEvent.stackTrace. (Won't work because we need it in RegionalWorkResult)
Option C: Add a field to ErrorEvent so it has both the full and filtered stack trace
Option D: keep the entire Throwable in a private field on the ErrorEvent and generate the stack traces on demand

Copy link
Member

@abyrd abyrd Dec 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did option C as it ensures the trace string always came from the right method but still allows for verbose stack traces in console logs. Seems to work in local testing on some missing regional results.

D looks cleaner, but it defers processing of the stack trace to the moment the log or slack message is written, thus deferring some potential failures to those background event handling threads. I'd rather front-load the stack trace processing to the point where the ErrorEvent is created, containing any delays or problems closer to the source of the original error.

abyrd
abyrd previously approved these changes Dec 28, 2023
Copy link
Member

@abyrd abyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving this as-is in case you want to merge it now. Feel free to add the exception changes discussed in the PR comments if you like in which case I can merge immediately after re-review.

BAD_REQUEST instead of UNKNOWN
@ansoncfit ansoncfit merged commit 14cbb20 into dev Dec 31, 2023
3 checks passed
@ansoncfit ansoncfit deleted the freeform-guardrails branch December 31, 2023 00:15
christophfink added a commit to DigitalGeographyLab/r5 that referenced this pull request Jan 16, 2024
* fix comment on NETWORK_FORMAT_VERSION

* Do not record opportunities 120 minutes away

* Fix freeform guardrails

Correct logic for checking oneToOne analyses
Check limit on number of destinations for path analyses in broker

* only record job when assembler created, fixes conveyal#887

* record errors (stopping job) before other checks

addresses conveyal#887

* report worker errors as much shorter stack trace

move filterStackTrace method into shared utility class

* record only one error per job in broker

addresses conveyal#919 and conveyal#887

* Add final modifier to PathResult constants

* Use specific AnalysisServerException

BAD_REQUEST instead of UNKNOWN

* Filter stack traces sent to UI

* build filtered stack trace directly from throwable

conveyal#918 (comment)

* Update to 2020 Census geometries

* Update seamless-census test fixtures

with 2020 Census geometries

* do not include stacktrace in message

rethrow AnalysisServerException since they already have clear messages

* Implement Timo Jaakkonen’s crossing delays (#4), upload package to DGL GitHub organisation

* Jaakkonen (2013)’s values

* float -> int delays

* let’s try this

* enough for today

* too many changes for such a small patch 😬

* fixed test

* shield against undefined date

* 😬

* am I allowed to push to DGL’s packages?

* linted, moved package destination to my own fork

* moved repo to fork

* build package on pull request, publish to dgl’s repo

* build package on pull request, publish to dgl’s repo

* build package on pull request, publish to dgl’s repo

* build package on pull request, publish to dgl’s repo

* build package on pull request, publish to dgl’s repo

* Run on older image, as ubuntu-latest has gradle-8, which fails (#5)

* Freeze gradle version (#6)

* Run on older image, as ubuntu-latest has gradle-8, which fails

* freeze gradle version

* fix build workflow  (#7)

* Run on older image, as ubuntu-latest has gradle-8, which fails

* freeze gradle version

* refactor workflow, correct repo package address

* fix build workflow (#8)

* Run on older image, as ubuntu-latest has gradle-8, which fails

* freeze gradle version

* refactor workflow, correct repo package address

* clean-up

* clean-up

* token

* print debug info

* configurable (per-repo) package registry (#9)

* Run on older image, as ubuntu-latest has gradle-8, which fails

* freeze gradle version

* refactor workflow, correct repo package address

* clean-up

* clean-up

* token

* print debug info

* flexible package repo address

* remove debug statement

* configurable package registry

* debugging GH actions

* debugging ii

* fix build system (#10)

* Run on older image, as ubuntu-latest has gradle-8, which fails

* freeze gradle version

* refactor workflow, correct repo package address

* clean-up

* clean-up

* token

* print debug info

* flexible package repo address

* remove debug statement

* configurable package registry

* debugging GH actions

* debugging ii

* asfd

* remove debug statements

* read custom OSM tags, use them for bike routing (#11)

* read custom OSM tags, use them for bike routing

* allocate memory

* write on correct edge

* Latest 7.x.x gradle

* Fix bicycle speed tag reading

* Allow publish step to fail (in pull requests)

* Add default speeds that represent the speed limits on Finnish roads (#13)

* revert Jaakkola implementation (#16)

* disable turning penalties for bicycle and car, do not apply perceived length multiplier for bicycle routing (#17)

* Use per-edge speeds for cycling (#19)

* Use per-edge speeds for cycling

* Timo Jakkola’s times (#21)

* save shapes of GTFS patterns (#22)

* 23 roll back jaakkola congestion timings (#24)

* save shapes of GTFS patterns

* Roll back Jaakkola congestion penalties

* make MAX_PATH_DESTINATIONS *not* final

---------

Co-authored-by: abyrd <[email protected]>
Co-authored-by: ansons <[email protected]>
Co-authored-by: Anson Stewart <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants