-
-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding ndjson format #218
adding ndjson format #218
Conversation
header/header.go
Outdated
@@ -15,6 +15,7 @@ type ParserSettings struct { | |||
Version string `json:"version,omitempty"` | |||
FileFormatType string `json:"file_format_type,omitempty"` | |||
Encoding *string `json:"encoding,omitempty"` | |||
NDJSON bool `json:"ndjson,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the type of case preferred in this project
cli/cmd/transformCmd.go
Outdated
start := "[\n%s" | ||
middle := ",\n%s" | ||
end := "\n]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be able to come up with better names or create a method on parser settings that returns a struct that encapsulates these variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably lparen
and rparen
, and delim
might be a better naming option?
I'm fine with the current implementation you proposed. But do need to add unit tests - we tried very hard to keep coverage at 100%.
I'm thinking about adding a utility into https://github.com/jf-tech/go-corelib/tree/master/jsons which this omniparser uses extensively, that the utility is a json writer and encapsulates the functionalities you implement here. But that's a later optimization/refactoring. No need for this time. Just unittest coverage.
If accepted I will add the tests and refactor |
Codecov ReportAll modified lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #218 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 53 53
Lines 3041 3041
=========================================
Hits 3041 3041 ☔ View full report in Codecov by Sentry. |
Oh and just curious @jf-tech , have you thought about updating the go version? From my experience it's completely backwards compatible and has had some good improvements on the performance side and new methods |
Thoughts? And again, thanks for chiming in and contributing!! Love it! |
Thanks for the prompt reply! I think it makes sense to have it as a command line option, I'll make that change! |
cli/cmd/transformCmd.go
Outdated
@@ -39,6 +40,8 @@ func init() { | |||
|
|||
transformCmd.Flags().StringVarP( | |||
&input, "input", "i", "", "input file (optional; if not specified, stdin/pipe is used)") | |||
transformCmd.Flags().BoolVarP( | |||
&ndjson, "ndjson", "", false, "change the output format to ndjson") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jf-tech is this what you meant with a command line option or did you want something like --format
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about changing it to a long flag only --stream
. By default or not specified, it's false. The flag doesn't have a short form, only the --stream
long form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looking good. thanks for contributing. please do add unittest coverage.
cli/cmd/transformCmd.go
Outdated
@@ -39,6 +40,8 @@ func init() { | |||
|
|||
transformCmd.Flags().StringVarP( | |||
&input, "input", "i", "", "input file (optional; if not specified, stdin/pipe is used)") | |||
transformCmd.Flags().BoolVarP( | |||
&ndjson, "ndjson", "", false, "change the output format to ndjson") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about changing it to a long flag only --stream
. By default or not specified, it's false. The flag doesn't have a short form, only the --stream
long form.
cli/cmd/transformCmd.go
Outdated
|
||
if ndjson { | ||
return string(b), nil | ||
} | ||
|
||
return strings.Join( | ||
strs.NoErrMapSlice( | ||
strings.Split(jsons.BPJ(string(b)), "\n"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look the code can be a little bit optimized here:
b = string(b)
if stream {
return b, nil
}
return strings.Join(
strs.NoErrMapSlice(
strings.Split(jsons.BPJ(string(b)), "\n"),
since string(b)
needs to be done no matter what so do it early-on.
cli/cmd/transformCmd.go
Outdated
start := "[\n%s" | ||
middle := ",\n%s" | ||
end := "\n]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably lparen
and rparen
, and delim
might be a better naming option?
I'm fine with the current implementation you proposed. But do need to add unit tests - we tried very hard to keep coverage at 100%.
I'm thinking about adding a utility into https://github.com/jf-tech/go-corelib/tree/master/jsons which this omniparser uses extensively, that the utility is a json writer and encapsulates the functionalities you implement here. But that's a later optimization/refactoring. No need for this time. Just unittest coverage.
@jf-tech I'll make the changes and add tests, thank you! |
@jf-tech I'm looking through the tests and I think this should be in Maybe something like this will allow multiple places to use the same writing that the transform command uses:
which reduces
|
@jose-sherpa sorry for the delay. We were out/offline for several days. First I'm not seeing the updated PR. Did you push your updates (based on my feedbacks) to origin? So I would say don't worry about the tests for changes in this directory. Just push your latest updates to the PR and we'll take a look again and approve. |
@jf-tech hey sorry I was out of town, yeah I made the changes you suggested unless you see I missed something. The only commit I did not push was the commit containing the changes I outlined in my previous comment. If you like those changes let me know and I can push that commit |
cli/cmd/transformCmd.go
Outdated
@@ -31,6 +31,7 @@ var ( | |||
} | |||
schema string | |||
input string | |||
ndjson bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought we said changing ndjson
to stream
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok I changed the command option name but not in code, I'll make that change now
cli/cmd/transformCmd.go
Outdated
@@ -39,6 +40,8 @@ func init() { | |||
|
|||
transformCmd.Flags().StringVarP( | |||
&input, "input", "i", "", "input file (optional; if not specified, stdin/pipe is used)") | |||
transformCmd.Flags().BoolVarP( | |||
&ndjson, "stream", "", false, "change the output format to ndjson") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the argument help display text to "if specified, each record will be a standalone/full JSON blob and printed out immediately once transform is done"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, ndjson
-> stream
cli/cmd/transformCmd.go
Outdated
@@ -86,22 +89,42 @@ func doTransform() error { | |||
if err != nil { | |||
return "", err | |||
} | |||
|
|||
s := string(b) | |||
if ndjson { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
cli/cmd/transformCmd.go
Outdated
func(s string) string { return "\t" + s }), | ||
"\n"), nil | ||
} | ||
|
||
record, err := doOne() | ||
if err == io.EOF { | ||
fmt.Println("[]") | ||
if ndjson { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
cli/cmd/transformCmd.go
Outdated
if ndjson { | ||
fmt.Println("") | ||
} else { | ||
fmt.Println("[]") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well this is just written weirdly, why not something inline with:
if !stream {
println("[]")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Println writes a new line even with a blank string, should we just leave it blank?
cli/cmd/transformCmd.go
Outdated
lparen := "[\n%s" | ||
delim := ",\n%s" | ||
rparen := "\n]" | ||
if ndjson { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stream
cli/cmd/transformCmd.go
Outdated
@@ -39,6 +40,8 @@ func init() { | |||
|
|||
transformCmd.Flags().StringVarP( | |||
&input, "input", "i", "", "input file (optional; if not specified, stdin/pipe is used)") | |||
transformCmd.Flags().BoolVarP( | |||
&stream, "stream", "", false, "change the output format to ndjson") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought i had a comment on the argument help text?
While the omniparser tool outputs JSON format currently, you will often need another tool or package to stream the JSON output. While I am aware this tool will only be used for JSON output, there is a type of JSON called NDJSON which stands for new line delimited JSON. This makes it easy to stream parse and process a JSON array with no added packages or complexity since you just read each line and parse them one by one. Since a strength of omniparser is to stream parse large files, we think it makes sense to make the output easily streamable without violating the output of JSON. It also results in a smaller file size.
http://ndjson.org/