ValidationError if column value is 1 or 2 #178

Open
davicorreiajr opened this issue Mar 20, 2020 · 6 comments

Comments

@davicorreiajr
Contributor

davicorreiajr commented Mar 20, 2020

I'm having a weird issue: whenever the column value is 1 or 2, I get the following error:

target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: '2.0 is not valid under any of the given schemas'

It's weird that I get the error for 1 or 2, but from 3 on it seems to work.

If it helps, the generated schema for this column is:

{
   "anyOf":[
      {
         "type":"null"
      },
      {
         "type":"number",
         "multipleOf":1e-15
      },
      {
         "type":"string"
      }
   ]
}

I'm using target-postgres along with tap-google-sheets.

davicorreiajr changed the title from "Column fails if its value is 1 or 2" to "ValidationError if column value is 1 or 2" on Mar 20, 2020

@AlexanderMann
Collaborator

AlexanderMann commented Mar 24, 2020

@davicorreiajr that is bizarre, but I imagine it has something to do with floating-point error/binary number gaps. Our code doesn't actually do those checks; that's the Python implementation of JSON Schema (we try to do as little data munging as possible when validating the records we ingest against the schema).

This looks very similar: python-jsonschema/jsonschema#247
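
To illustrate the floating-point guess above, here is a minimal sketch. It assumes the validator checks `multipleOf` by dividing two floats and testing whether the quotient is a whole number, which is how older python-jsonschema releases appear to behave; `float_multiple_of_fails` is a hypothetical helper written for this example, not part of any library.

```python
def float_multiple_of_fails(instance: float, divisor: float) -> bool:
    """Hypothetical stand-in for a float-based multipleOf check.

    Returns True when the quotient is not a whole number, i.e. when
    the value would be rejected as "not a multiple of" the divisor.
    """
    quotient = instance / divisor
    return int(quotient) != quotient


# 1e-15 has no exact binary representation, so the quotient can land
# just off a whole number for some inputs and exactly on it for others.
for value in (1.0, 2.0, 3.0):
    print(value, value / 1e-15, float_multiple_of_fails(value, 1e-15))
```

With IEEE 754 doubles, the quotients for 1.0 and 2.0 come out slightly below a whole number and are rejected, while the quotient for 3.0 rounds to exactly 3e15 and passes, which matches the behaviour reported in this issue.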

@djdevin

djdevin commented Mar 27, 2020

Another postgres target with the same issue, possible solution:

https://github.com/statsbotco/target-postgres/pull/11/files

@AlexanderMann
Collaborator

@djdevin ah, interesting. I think the best way to solve this with the smallest footprint is prolly just to use the suggestion straight from the Python JSON docs:

>>> import decimal
>>> import json
>>> json.loads('1.1', parse_float=decimal.Decimal)
Decimal('1.1')

With that sort of change, handling this issue might be trivial.
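
A rough sketch of why the `parse_float` trick could help: if both the schema and the record are parsed with `decimal.Decimal`, a `multipleOf` check can use exact decimal arithmetic instead of binary floats. The `is_multiple_of` helper and the `"value"` key below are hypothetical names used only for illustration; this is not the actual target-postgres change.

```python
import json
from decimal import Decimal

# Parse the schema and the record so that every JSON float becomes a Decimal.
schema = json.loads('{"type": "number", "multipleOf": 1e-15}', parse_float=Decimal)
record = json.loads('{"value": 2.0}', parse_float=Decimal)


def is_multiple_of(instance, divisor) -> bool:
    """Hypothetical check: exact remainder test using Decimal arithmetic."""
    return instance % divisor == 0


# Decimal('2.0') % Decimal('1E-15') is exactly zero, so the value is accepted.
print(is_multiple_of(record["value"], schema["multipleOf"]))
```

Note that the schema has to be parsed the same way as the records; a `multipleOf` that stays a binary float would bring the rounding problem back.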

@AlexanderMann
Collaborator

If someone wants to put up a PR for the above which adds a test for the original problem AND uses the simple json.loads trick, I'm happy to review and try to get this out early next week. If not, I'll try and get the change in myself! 😄

@davicorreiajr
Contributor Author

davicorreiajr commented Mar 27, 2020

Nice, thanks guys!

@AlexanderMann I'll try and do it.

@AlexanderMann
Collaborator

This just needs to be deployed now. Will close once deployed.
