feat: influx inspect export parquet #25047

Open: wants to merge 16 commits into base: master-1.x
Changes from 1 commit
cmd/influx_inspect/export/export_parquet.go: 6 changes (5 additions, 1 deletion)
@@ -54,6 +54,10 @@ func (cmd *Command) writeValuesParquet(_ io.Writer, seriesKey []byte, field stri
 }
 
 func (cmd *Command) exportDoneParquet(_ string) error {
+	if len(vc) == 0 {
+		return nil
+	}
+
 	defer func() {
 		vc = nil
 	}()
@@ -87,7 +91,7 @@ func (cmd *Command) exportDoneParquet(_ string) error {
 			TagSet:   tagSet,
 			FieldSet: fieldSet,
 		}
-		// schema does not change in a table
+		// schema does not change in a table in one tsm file
Contributor

The tag set schema can change within a single TSM file from one series key to the next.

If a user writes the following point:

m0,tag0=val0 f0=1.3

The schema for the previous line is:

| col | type |
| --- | --- |
| tag0 | string (tag) |
| f0 | float (field) |

If the next write is:

m0,tag1=val0,tag2=val1 f1=false

The schema for that line is:

| col | type |
| --- | --- |
| tag1 | string (tag) |
| tag2 | string (tag) |
| f1 | bool (field) |

Therefore, the table schema must be the union of the tag and field columns across all series keys, which for the two writes above is:

| col | type |
| --- | --- |
| tag0 | string (tag) |
| tag1 | string (tag) |
| tag2 | string (tag) |
| f0 | float (field) |
| f1 | bool (field) |

Contributor Author

I assume, then, that the export has to iterate over the TSM files twice: the first pass gathers the complete schema of every table, and the second pass exports the actual data. Is that correct?

 		break
 	}
