Support to flatten JSON logs in internal pipeline #5346

zyy17 · 2025-01-12T13:30:02Z

What type of enhancement is this?

API improvement

What does the enhancement do?

The current internal GreptimeDB pipeline writes JSON objects using the following schema:

<key>: <json blob>

This is easy to implement and performs well. However, the result is very hard to use unless you write another pipeline to parse it.

A more user-friendly approach is to flatten JSON objects. For example, if we want to write the following JSON:

{
  "kubernetes": {
    "container_id": "containerd://21086f05ddd5a29d194babf3879dafb85de278e31b3ac564ea08c174e8f039bc",
    "container_image": "localhost:5001/printer:latest",
    "container_image_id": "localhost:5001/printer@sha256:21b8881e5e2bcea49cca791ef20540f89881957895fd049293ee82ad8bf13e99",
    "container_name": "printer",
    "namespace_labels": {
      "control-plane": "controller-manager",
      "kubernetes.io/metadata.name": "default"
    }
  }
}

It can be flattened as:

{
  "kubernetes.container_id": "containerd://21086f05ddd5a29d194babf3879dafb85de278e31b3ac564ea08c174e8f039bc",
  "kubernetes.container_image": "localhost:5001/printer:latest",
  "kubernetes.container_image_id": "localhost:5001/printer@sha256:21b8881e5e2bcea49cca791ef20540f89881957895fd049293ee82ad8bf13e99",
  "kubernetes.container_name": "printer",
  "kubernetes.namespace_labels.control-plane": "controller-manager",
  "kubernetes.namespace_labels.kubernetes.io/metadata.name": "default"
}

This is similar to the flatten() function in Vector's VRL.
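
To make the intended behavior concrete, here is a minimal Python sketch of dot-separated flattening. It is only an illustration, not the pipeline's implementation; the function name `flatten` and the `sep` parameter are hypothetical.

```python
def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in obj.items():
        # Build the full path for this key, e.g. "kubernetes.container_name".
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


doc = {
    "kubernetes": {
        "container_name": "printer",
        "namespace_labels": {
            "control-plane": "controller-manager",
            "kubernetes.io/metadata.name": "default",
        },
    }
}
flat = flatten(doc)
```

Note that a source key that already contains a dot, such as "kubernetes.io/metadata.name", produces a flattened key that cannot be split back into its original segments unambiguously.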

Arrays are hard to flatten into an object with unique keys. Instead, we can marshal them as JSON strings, for example:

{
  "a": {
    "b": {
      "c": 1,
      "d": 2,
      "e": {
        "f": "g",
        "h": true
      },
      "u": [
        "a",
        "b",
        "c"
      ]
    }
  }
}
=>
{
  "a.b.c": 1,
  "a.b.d": 2,
  "a.b.e.f": "g",
  "a.b.e.h": true,
  "a.b.u": "[\"a\",\"b\",\"c\"]"
}
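
The array handling in the example above could be sketched in Python as follows. This is a rough illustration under the assumption that arrays are serialized compactly (no spaces after separators); `flatten_with_arrays` is a hypothetical name, not an existing API.

```python
import json


def flatten_with_arrays(obj, prefix=""):
    """Flatten nested dicts; keep arrays as compact JSON strings."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_with_arrays(value, path))
        elif isinstance(value, list):
            # Arrays have no natural unique keys, so store them as a string.
            flat[path] = json.dumps(value, separators=(",", ":"))
        else:
            flat[path] = value
    return flat


nested = {
    "a": {"b": {"c": 1, "d": 2, "e": {"f": "g", "h": True}, "u": ["a", "b", "c"]}}
}
result = flatten_with_arrays(nested)
```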

Implementation challenges

Limiting the depth when flattening a large JSON object;
Performance;
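
One possible way to bound the depth, sketched in Python: once a flattened key reaches the maximum number of segments, whatever is still nested under it is marshaled as a JSON string, the same way arrays are. The cutoff behavior, the `max_depth` parameter, and the name `flatten_limited` are all assumptions made for illustration; the issue does not specify them. The sketch uses an explicit stack instead of recursion so that very deep input cannot exhaust the call stack.

```python
import json


def flatten_limited(obj, max_depth=3):
    """Flatten nested dicts so that no key has more than `max_depth` segments."""
    flat = {}
    # Each stack entry is (key prefix, object to walk, depth of its keys).
    stack = [("", obj, 1)]
    while stack:
        prefix, current, depth = stack.pop()
        for key, value in current.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict) and depth < max_depth:
                # Still within the limit: descend one more level.
                stack.append((path, value, depth + 1))
            elif isinstance(value, (dict, list)):
                # At the limit (or an array): keep the rest as a JSON string.
                flat[path] = json.dumps(value, separators=(",", ":"))
            else:
                flat[path] = value
    return flat


deep = {"a": {"b": {"c": {"d": 1}}, "x": 2}}
limited = flatten_limited(deep, max_depth=2)
```

With `max_depth=2`, the object under "a.b" is not expanded further and is stored as the string `{"c":{"d":1}}`.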

zyy17 added the C-enhancement Category Enhancements label Jan 12, 2025

zyy17 self-assigned this Jan 12, 2025

zyy17 linked a pull request Jan 14, 2025 that will close this issue

refactor: support to flatten json object in greptime_identity pipeline #5358

Open

5 tasks
