Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nullValue being respected when parsing CSVs #224

Closed
wants to merge 2 commits into from

Conversation

addisonj
Copy link

This change makes it so that we look for a user specified nullValue
through the CSV parsing. This allows for handling CSVs that might use
something else other than an empty string to represent nulls.

It reuses the same flag as CSV saving, nullValue. This change should
be non-breaking.

This also pushes this behavior into inferSchema so that inferred schemas
will properly reflect the user given null value.

This change makes it so that we look for a user specified nullValue
through the CSV parsing. This allows for handling CSVs that might use
something else other than an empty string to represent nulls.

It reuses the same flag as CSV saving, `nullValue`. This change should
be non-breaking.

This also pushes this behavior into inferSchema so that inferred schemas
will properly reflect the user given null value.
@codecov-io
Copy link

Current coverage is 86.05%

Merging #224 into master will increase coverage by +0.23% as of eb0714c

@@            master    #224   diff @@
======================================
  Files           11      11       
  Stmts          501     509     +8
  Branches       148     148       
  Methods          0       0       
======================================
+ Hit            430     438     +8
  Partial          0       0       
  Missed          71      71       

Review entire Coverage Diff as of eb0714c

Powered by Codecov. Updated on successful CI builds.

@addisonj
Copy link
Author

Just now noticed #76 does the same thing 🤦

It seems like that is pretty stale and might mess with inferSchema for those cases... But does accept a range of null values (not sure how likely it is that a single CSV will have multiple types of null?)

But as is mentioned in that PR and in other issues, it really seems like null handling should be a part of CSV handling as we have large CSV files where it would be too expensive to remap just for remapping nulls.

@falaki
Copy link
Member

falaki commented Jan 6, 2016

Thanks @addisonj this looks good. I am going to merge it.

@falaki falaki closed this in b556e59 Jan 6, 2016
@addisonj
Copy link
Author

addisonj commented Jan 6, 2016

Thanks! Currently running from a fork, any ETA on when you will be cutting a new release?

@ankurmitujjain
Copy link

Is this part of release?

@ankurmitujjain
Copy link

Not working on Databricks notebook.. I tried to load csv data with ? as null values.

@rjurney
Copy link

rjurney commented Sep 25, 2016

So this seems to still not be working in 2.0.0?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants