Unable to get failure reason for jobs that have failed #39

jewer · 2014-12-23T05:36:42Z

Happy to write this myself, but just wanted to validate direction:

Use Case: I've submitted a query asynchronously to BQ. The query fails for any number of reasons. The job will be marked as "done" but I do not have the ability to get the reason for the failure and the current check_job implementation will just say 0 rows.

Solutions, in no particular order:

Leave it alone, it's good enough
Hijack check_job and have it return richer job status information.
Add a new get_job_status method that's responsible for returning the status, including failure reasons if any.

I'm guessing number 2 is probably the best option, but wanted to pass it by before making a pull request.

Whatdya think?

tylertreat · 2014-12-23T05:47:52Z

I'm in favor of 2 as well. It might make sense to have check_job return a "job status" object which encapsulates the various job properties: id, complete, total rows, failure reason, etc. It's a breaking API change obviously, but I think it's probably justified here.

jewer · 2014-12-23T20:53:40Z

One super annoying thing about the BQ API is that the status code returned for getQueryResults on failed jobs represents the reason why the job failed, not for the getQueryResults request.

For instance, if I try to submit a query against a table that doesn't exist, getQueryResults returns a 404. If I submit syntactically incorrect SQL to a batch job, then getQueryResults return a 400. Personally, I think that's an odd API design as I expect the status code to be in context of the current request, not the job I submitted and am now checking on the status. Anyways, not a big deal; it just means that the logic in check_job will have to catch HttpError and just involve a lot more logic than I think it should have to.

I'm guessing we probably want our own shape for that job status object, just to reduce the number of breaking changes? Probably model off the current structure? If it changes in the future, we can just continue to use the same shape or decide to make another breaking API change. Something like this?

{
  "job_id": "ABCD123",
  "state": "DONE", //done, pending, running
  "total_rows": 0, //valid values 0-n, will be missing if in error state, mutually exclusive with 'error'
  "error": {  //if errors, otherwise missing
    "message": "Encountered bad SQL at line 1, column 53. Was expecting: <EOF>", //top-level message as reported by BQ
    "errors": [
       {
         "reason": "invalidQuery",
         "message": "Encountered bad SQL at line 1, column 53. Was expecting: <EOF>"
       },
       {
         "reason": "invalidQuery",
         "message": "Encountered bad SQL at line 4, column 22. Was expecting: SELECT"
       }
    ]
  }
}

tylertreat · 2014-12-24T04:13:44Z

Yeah, that seems reasonable. I think I would be in favor of making that an actual class instead of using a dict. Otherwise the user has to know what keys are set as opposed to operating on a rigid object. This also makes it a little easier to change things without breaking the API down the road.

jewer changed the title ~~Unable to check status of jobs that have failed~~ Unable to get failure reason for jobs that have failed Dec 23, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unable to get failure reason for jobs that have failed #39

Unable to get failure reason for jobs that have failed #39

jewer commented Dec 23, 2014

tylertreat commented Dec 23, 2014

jewer commented Dec 23, 2014

tylertreat commented Dec 24, 2014

Unable to get failure reason for jobs that have failed #39

Unable to get failure reason for jobs that have failed #39

Comments

jewer commented Dec 23, 2014

tylertreat commented Dec 23, 2014

jewer commented Dec 23, 2014

tylertreat commented Dec 24, 2014