Support to flatten JSON logs in internal pipeline #5346

Open
zyy17 opened this issue Jan 12, 2025 · 0 comments · May be fixed by #5358
Labels
C-enhancement Category Enhancements

zyy17 commented Jan 12, 2025

What type of enhancement is this?

API improvement

What does the enhancement do?

The internal GreptimeDB pipeline currently writes each JSON object using the following schema:

<key>: <json blob>

This is easy to implement and performs well. However, the stored data is hard to use unless you write another pipeline to parse the blob.

A more user-friendly approach is to flatten JSON objects. For example, suppose we want to write the following JSON:

{
  "kubernetes": {
    "container_id": "containerd://21086f05ddd5a29d194babf3879dafb85de278e31b3ac564ea08c174e8f039bc",
    "container_image": "localhost:5001/printer:latest",
    "container_image_id": "localhost:5001/printer@sha256:21b8881e5e2bcea49cca791ef20540f89881957895fd049293ee82ad8bf13e99",
    "container_name": "printer",
    "namespace_labels": {
      "control-plane": "controller-manager",
      "kubernetes.io/metadata.name": "default"
    }
  }
}

It can be flattened as:

{
  "kubernetes.container_id": "containerd://21086f05ddd5a29d194babf3879dafb85de278e31b3ac564ea08c174e8f039bc",
  "kubernetes.container_image": "localhost:5001/printer:latest",
  "kubernetes.container_image_id": "localhost:5001/printer@sha256:21b8881e5e2bcea49cca791ef20540f89881957895fd049293ee82ad8bf13e99",
  "kubernetes.container_name": "printer",
  "kubernetes.namespace_labels.control-plane": "controller-manager",
  "kubernetes.namespace_labels.kubernetes.io/metadata.name": "default"
}
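
The transformation above can be sketched in a few lines. This is only an illustration in Python, not the GreptimeDB implementation; the function name flatten and its prefix and sep parameters are hypothetical, and keeping empty objects as leaf values is an assumption.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects.
            flat.update(flatten(value, path, sep))
        else:
            # Scalars (and, by assumption, empty objects) stay as leaf values.
            flat[path] = value
    return flat


doc = {
    "kubernetes": {
        "container_name": "printer",
        "namespace_labels": {
            "control-plane": "controller-manager",
            "kubernetes.io/metadata.name": "default",
        },
    }
}
flat = flatten(doc)
# flat["kubernetes.namespace_labels.kubernetes.io/metadata.name"] == "default"
```

Note that a source key such as "kubernetes.io/metadata.name" already contains the separator, so the original nesting cannot always be recovered from the flattened key alone.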

This is similar to the flatten() function in Vector's VRL.

Arrays are hard to flatten into fields with unique keys. Instead, we can serialize each array as a JSON string, for example:

{
  "a": {
    "b": {
      "c": 1,
      "d": 2,
      "e": {
        "f": "g",
        "h": true
      },
      "u": [
        "a",
        "b",
        "c"
      ]
    }
  }
}
=>
{
  "a.b.c": 1,
  "a.b.d": 2,
  "a.b.e.f": "g",
  "a.b.e.h": true,
  "a.b.u": "[\"a\",\"b\",\"c\"]"
}
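
A possible way to handle arrays as described, again as a Python sketch rather than the actual pipeline code. The name flatten_with_arrays is hypothetical; json.dumps with compact separators is used so the string matches the example output above.

```python
import json
from typing import Any, Dict


def flatten_with_arrays(obj: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts; arrays become compact JSON strings."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_with_arrays(value, path, sep))
        elif isinstance(value, list):
            # Arrays have no natural unique keys, so store them as one string.
            flat[path] = json.dumps(value, separators=(",", ":"))
        else:
            flat[path] = value
    return flat


nested = {
    "a": {"b": {"c": 1, "d": 2, "e": {"f": "g", "h": True}, "u": ["a", "b", "c"]}}
}
result = flatten_with_arrays(nested)
# result["a.b.u"] == '["a","b","c"]'
```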

Implementation challenges

  1. Limit the nesting depth when flattening large JSON objects.
  2. Performance: flattening adds per-record work compared with writing a single JSON blob.
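
One way the depth limit could work is sketched below in Python. Everything here is an assumption for illustration: the name flatten_limited, the max_depth parameter (the maximum number of key segments in a flattened key), and the choice to serialize anything deeper than the limit as a JSON string. An explicit stack is used instead of recursion so that very deep input cannot exhaust the call stack.

```python
import json
from typing import Any, Dict, List, Tuple


def flatten_limited(obj: Dict[str, Any], max_depth: int = 3, sep: str = ".") -> Dict[str, Any]:
    """Flatten up to `max_depth` key segments; deeper values become JSON strings."""
    flat: Dict[str, Any] = {}
    # Each stack entry is (key path, value, number of segments in the path).
    stack: List[Tuple[str, Any, int]] = [(k, v, 1) for k, v in obj.items()]
    while stack:
        path, value, depth = stack.pop()
        if isinstance(value, dict) and value and depth < max_depth:
            for key, child in value.items():
                stack.append((f"{path}{sep}{key}", child, depth + 1))
        elif isinstance(value, (dict, list)):
            # Too deep, empty, or an array: keep it as a compact JSON string.
            flat[path] = json.dumps(value, separators=(",", ":"))
        else:
            flat[path] = value
    return flat


nested = {"a": {"b": {"c": 1, "d": 2}}}
shallow = flatten_limited(nested, max_depth=2)  # {"a.b": '{"c":1,"d":2}'}
full = flatten_limited(nested, max_depth=3)     # {"a.b.c": 1, "a.b.d": 2}
```

Serializing the remainder keeps the data lossless while bounding the number of generated columns; dropping or truncating deep values would be an alternative with different trade-offs.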
@zyy17 zyy17 added the C-enhancement Category Enhancements label Jan 12, 2025
@zyy17 zyy17 self-assigned this Jan 12, 2025