adding ndjson format #218

Merged
merged 6 commits into from
Oct 9, 2023
Changes from 2 commits
30 changes: 26 additions & 4 deletions cli/cmd/transformCmd.go
@@ -31,6 +31,7 @@ var (
}
schema string
input string
ndjson bool
Owner

I thought we agreed to rename ndjson to stream?

Contributor Author

Ah, OK. I changed the command-line option name but not the variable in the code. I'll make that change now.

)

func init() {
@@ -39,6 +40,8 @@ func init() {

transformCmd.Flags().StringVarP(
&input, "input", "i", "", "input file (optional; if not specified, stdin/pipe is used)")
transformCmd.Flags().BoolVarP(
&ndjson, "ndjson", "", false, "change the output format to ndjson")
Contributor Author

@jf-tech is this what you meant by a command-line option, or did you want something like --format?

Owner

How about changing it to a long-only flag, --stream? It has no short form, only the long --stream form, and it defaults to false when not specified.
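A minimal sketch of the long-only boolean flag described above. To stay self-contained it uses the standard library `flag` package rather than the cobra flag set the PR registers the flag on, so treat it as an illustration of the behavior, not the project's code. The helper name `parseStream` and the usage text are assumptions. With cobra/pflag, the equivalent would presumably be the `BoolVar` variant, which takes no shorthand argument.

```go
package main

import (
	"flag"
	"fmt"
)

// parseStream parses args with a hypothetical "transform" flag set and reports
// whether --stream was given. The flag has no short form and defaults to false.
func parseStream(args []string) (bool, error) {
	fs := flag.NewFlagSet("transform", flag.ContinueOnError)
	stream := fs.Bool("stream", false, "write one JSON record per line instead of a JSON array")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *stream, nil
}

func main() {
	// Show the parsed value with and without the flag.
	for _, args := range [][]string{{}, {"--stream"}} {
		stream, err := parseStream(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("args=%v stream=%v\n", args, stream)
	}
}
```

Note that the standard library parser treats `-stream` and `--stream` as equivalent, which differs from pflag's handling of single-dash arguments.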

}

func openFile(label string, filepath string) (io.ReadCloser, error) {
@@ -86,6 +89,11 @@ func doTransform() error {
if err != nil {
return "", err
}

if ndjson {
Owner

Ditto: this should be stream.

return string(b), nil
}

return strings.Join(
strs.NoErrMapSlice(
strings.Split(jsons.BPJ(string(b)), "\n"),
Owner

The code can be optimized a little here:

s := string(b)
if stream {
    return s, nil
}
return strings.Join(
	strs.NoErrMapSlice(
		strings.Split(jsons.BPJ(s), "\n"),

Since string(b) has to be done either way, do it early on.

@@ -95,13 +103,27 @@ func doTransform() error {

record, err := doOne()
if err == io.EOF {
fmt.Println("[]")
if ndjson {
Owner

Ditto: this should be stream.

fmt.Println("")
} else {
fmt.Println("[]")
}
Owner

This is written a bit awkwardly. Why not something in line with:

if !stream {
    fmt.Println("[]")
}

Contributor Author

Println writes a newline even for an empty string. Should we just leave it blank?

return nil
}
if err != nil {
return err
}
fmt.Printf("[\n%s", record)

start := "[\n%s"
middle := ",\n%s"
end := "\n]"
Contributor Author

We might be able to come up with better names, or add a method on the parser settings that returns a struct encapsulating these variables.

Owner

Perhaps lparen, rparen, and delim would be better names?

I'm fine with the implementation you proposed, but we do need unit tests: we try very hard to keep coverage at 100%.

I'm thinking about adding a utility to https://github.com/jf-tech/go-corelib/tree/master/jsons, which omniparser uses extensively: a JSON writer that encapsulates the functionality you implement here. But that's a later optimization/refactoring and isn't needed this time. For now, just unit test coverage.

if ndjson {
Owner

Rename this to stream as well.

start = "%s"
middle = "\n%s"
end = ""
}

fmt.Printf(start, record)
for {
record, err = doOne()
if err == io.EOF {
@@ -110,8 +132,8 @@ func doTransform() error {
if err != nil {
return err
}
fmt.Printf(",\n%s", record)
fmt.Printf(middle, record)
}
fmt.Println("\n]")
fmt.Println(end)
return nil
}