Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the format of the 'output' field in the JSON payload #2985

Closed
Issif opened this issue Dec 20, 2023 · 25 comments · Fixed by #3314
Closed

Change the format of the 'output' field in the JSON payload #2985

Issif opened this issue Dec 20, 2023 · 25 comments · Fixed by #3314
Assignees
Milestone

Comments

@Issif
Copy link
Member

Issif commented Dec 20, 2023

Motivation

Right now, the default output for Falco is stdout with basic text as format. The generated log lines follow this pattern <timestamp> <Priority> <output>:

14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)

By choosing the JSON format we get the same content in the output field, with same elements timestamp and Priority, but these elements are also contained in specific fields of the JSON:

{
    "output": "14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)",
    "priority": "WARNING",
    "rule": "PTRACE attached to process",
    "time": "2023-12-20T14:37:27.505989596Z",
    "output_fields": {
        "container.info": "container.info",
        "evt.arg.flags": "evt.arg.flags",
        "evt.type": "evt.type",
        "proc.cmdline": "proc.cmdline",
        "proc.exepath": "proc.exepath",
        "proc.name": "proc.name",
        "proc.pcmdline": "proc.pcmdline",
        "proc.pname": "proc.pname",
        "proc.tty": "proc.tty",
        "user.loginuid": "user.loginuid",
        "user.name": "user.name",
        "user.uid": "user.uid"
    },
    "hostname": "host-7.local",
    "source": "syscalls",
    "tags": [
        "maturity_stable",
        "host",
        "container",
        "process",
        "mitre_privilege_escalation",
        "T1055.008"
    ]
}

It implies we have a duplication of the information between the fields, and it creates a mess for systems trying to deduplicate alerts. See this issue for Falcosidekick.

Feature

I propose to allow an opt-in to remove the timestamp and/or the priority from the outpout when the JSON format is used.

Alternatives

I created a PR for Falcosidekick allowing to reformat the output field. See PR#729

Additional context

@incertum
Copy link
Contributor

Someone else brought that up a while ago, had started something here https://github.com/falcosecurity/falco/pull/2670/files maybe let's discuss? I can see a bit of a better cleanup even under the hood than that wip PR.

@falcosecurity/falco-maintainers

@Andreagit97
Copy link
Member

the proposed changes make sense! There is no reason to duplicate the timestamp information in the output, or at least nothing I can think about at the moment!

@Andreagit97 Andreagit97 added this to the TBD milestone Dec 21, 2023
@h4l0gen
Copy link

h4l0gen commented Jan 5, 2024

Hey @Issif @incertum @Andreagit97 , i would like to work on this issue. Please guide me how can I start on this. Thanks!!

@h4l0gen
Copy link

h4l0gen commented Jan 10, 2024

/assign

@leogr
Copy link
Member

leogr commented Jan 11, 2024

allow an opt-in to remove the timestamp

👍

@Andreagit97
Copy link
Member

@h4l0gen thank you for tackling this!
As you correctly highlighted here https://kubernetes.slack.com/archives/CMWH3EH32/p1705578359716419?thread_ts=1705577834.276719&cid=CMWH3EH32 the issue is this line

formatter->tostring_withformat(evt, line, gen_event_formatter::OF_NORMAL);

there is no need to change the tostring_withformat function, probably switching the formatter from OF_NORMAL to OF_JSON should be enough

diff --git a/userspace/engine/formats.cpp b/userspace/engine/formats.cpp
index 106a58b7..c6bd07c5 100644
--- a/userspace/engine/formats.cpp
+++ b/userspace/engine/formats.cpp
@@ -43,9 +43,6 @@ std::string falco_formats::format_event(gen_event *evt, const std::string &rule,
 
        formatter = m_falco_engine->create_formatter(source, format);
 
-       // Format the original output string, regardless of output format
-       formatter->tostring_withformat(evt, line, gen_event_formatter::OF_NORMAL);
-
        if(formatter->get_output_format() == gen_event_formatter::OF_JSON)
        {
                std::string json_line;
@@ -89,6 +86,7 @@ std::string falco_formats::format_event(gen_event *evt, const std::string &rule,
                if(m_json_include_output_property)
                {
                        // This is the filled-in output line.
+                       formatter->tostring_withformat(evt, line, gen_event_formatter::OF_JSON);
                        event["output"] = line;
                }
 
@@ -127,6 +125,11 @@ std::string falco_formats::format_event(gen_event *evt, const std::string &rule,
                full_line.append("}");
                line = full_line;
        }
+       else
+       {
+               // Obtain `line` in the normal format
+               formatter->tostring_withformat(evt, line, gen_event_formatter::OF_NORMAL);
+       }
 
        return line.c_str();
 }

BTW just to answer your question on slack:

if i can found the definition of gen_event or any other function which is generating output then i can modify its definition to solve issue but I am not able to locate it. Please assist me with this, and also confirm if this approach would be effective

The function tostring_withformat is defined in libsinsp and more precisely here https://github.com/falcosecurity/libs/blob/6d57bdec7d3f0ed898c24682c7ac417a5e3d6294/userspace/libsinsp/eventformatter.cpp#L250
Just for context: libsinsp is a library that Falco uses for many things (all the host capture + filter checks + other stuff)

@Andreagit97
Copy link
Member

Andreagit97 commented Jan 18, 2024

BTW I propose to switch from

{
    "output": "14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)",
    "priority": "WARNING",
    "rule": "PTRACE attached to process",
    "time": "2023-12-20T14:37:27.505989596Z",
    "output_fields": {
        "container.info": "container.info",
        "evt.arg.flags": "evt.arg.flags",
        "evt.type": "evt.type",
        "proc.cmdline": "proc.cmdline",
        "proc.exepath": "proc.exepath",
        "proc.name": "proc.name",
        "proc.pcmdline": "proc.pcmdline",
        "proc.pname": "proc.pname",
        "proc.tty": "proc.tty",
        "user.loginuid": "user.loginuid",
        "user.name": "user.name",
        "user.uid": "user.uid"
    },
    "hostname": "host-7.local",
    "source": "syscalls",
    "tags": [
        "maturity_stable",
        "host",
        "container",
        "process",
        "mitre_privilege_escalation",
        "T1055.008"
    ]
}

to

{
    "output": "(proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)",
    "priority": "WARNING",
    "rule": "PTRACE attached to process",
    "time": "2023-12-20T14:37:27.505989596Z",
    "output_fields": {
        "container.info": "container.info",
        "evt.arg.flags": "evt.arg.flags",
        "evt.type": "evt.type",
        "proc.cmdline": "proc.cmdline",
        "proc.exepath": "proc.exepath",
        "proc.name": "proc.name",
        "proc.pcmdline": "proc.pcmdline",
        "proc.pname": "proc.pname",
        "proc.tty": "proc.tty",
        "user.loginuid": "user.loginuid",
        "user.name": "user.name",
        "user.uid": "user.uid"
    },
    "hostname": "host-7.local",
    "source": "syscalls",
    "tags": [
        "maturity_stable",
        "host",
        "container",
        "process",
        "mitre_privilege_escalation",
        "T1055.008"
    ]
}

@Issif @falcosecurity/falco-maintainers
So removing from output the 3 fields: time, priority, rule -> remove (14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt)

If we agree on this the proposed patch #2985 (comment) should be enough

@Issif
Copy link
Member Author

Issif commented Jan 18, 2024

It's good to me.

FYI, our example is not 100% accurate, the rule is not really part of the output for that specific rule, it's just a part of the configured output:

Detected ptrace PTRACE_ATTACH attempt (proc_pcmdline=%proc.pcmdline evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty %container.info)

See: https://github.com/falcosecurity/rules/blob/main/rules/falco_rules.yaml#L1105,L1117

@Andreagit97
Copy link
Member

Oh you are right rule is PTRACE attached to process while the output is Detected ptrace PTRACE_ATTACH attempt! BTW yes i think that having the rule tag in the json object should be enough even if it could be different from the rule description in the output

@poiana
Copy link
Contributor

poiana commented Apr 17, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@h4l0gen
Copy link

h4l0gen commented Apr 17, 2024

/remove-lifecycle stale

@leogr
Copy link
Member

leogr commented Apr 23, 2024

Any updates on this? Is there any blocker? 🤔

@Andreagit97
Copy link
Member

The blocker is this #3037 (comment), we need to decide what to do:

  • we can change the output of formatter->tostring_withformat(evt, line, sinsp_evt_formatter::OF_NORMAL); directly in sinsp, but not sure this is what we want since Falco is not the unique consumer of this method (@jasondellaluce)
  • We can remove from the output the 3 fields: time, priority, rule (-> remove "14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt") manually in Falco after the call to formatter->tostring_withformat(evt, line, sinsp_evt_formatter::OF_NORMAL);

@Issif
Copy link
Member Author

Issif commented Apr 26, 2024

I vote for option 2. This was my initial proposal after all. Correct me if I'm wrong, by the rule is not contained in the output, just the timestamp and the priority.

@Andreagit97
Copy link
Member

Yep we don't have the exact content of the rule tag but we have something very similar:

    "output": "14:37:27.505989596: Warning Detected ptrace PTRACE_ATTACH attempt ...",
    "rule": "PTRACE attached to process",

So Detected ptrace PTRACE_ATTACH attempt vs PTRACE attached to process.
We can also keep Detected ptrace PTRACE_ATTACH attempt in the output even if it's more or less a duplicate of the rule tag

@Issif
Copy link
Member Author

Issif commented Apr 26, 2024

This is a particular case, the output is set to display that message that's all. See https://github.com/falcosecurity/rules/blob/e65f2518b06b6e439b49451a0738115f74d224e0/rules/falco_rules.yaml#L1099,L1111

@h4l0gen
Copy link

h4l0gen commented Apr 26, 2024

Hi @Andreagit97, @Issif, and @leogr. Since we've encountered such a significant change in Falco, I'll wait for your decision on this PR before making any further changes. While I don't have a strong opinion on this at the moment, though as @Issif mentioned, the second option looks good to me for now

@leogr
Copy link
Member

leogr commented Apr 30, 2024

I still have to double-check, but I believe the 2nd option is ok.

cc @falcosecurity/falco-maintainers wdyt?

@Issif
Copy link
Member Author

Issif commented May 28, 2024

Is this feature will be integrated after the 0.38?

@FedeDP
Copy link
Contributor

FedeDP commented May 28, 2024

Nope, it is in TBD milestone since we could not reach an agreement.

@LucaGuerra
Copy link
Contributor

Hey folks, I would like to see this in 0.39.0 since it's being discussed these days. I'd like to share my opinion on the matter about a potential solution:

From what I can see the main issue is that modifying the output value would introduce breaking changes (in Falco and potentially in libsinsp as well), and I think it's a very valid point. So, let's not do that. Since we have priority, rule and time instead of removing data let's add a new field (can call it message if we don't have another idea) that contains the pure rule output without timestamps/priority/rule name. The main advantages are that this won't introduce breaking changes and it is intuitive to use when parsing the log with an aggregator; disadvantage is that the json payload will be a bit bigger because it duplicates the output but I think advantages outweigh disadvantages.

WDYT?

@Issif
Copy link
Member Author

Issif commented Jul 22, 2024

Hey folks, I would like to see this in 0.39.0 since it's being discussed these days. I'd like to share my opinion on the matter about a potential solution:

From what I can see the main issue is that modifying the output value would introduce breaking changes (in Falco and potentially in libsinsp as well), and I think it's a very valid point. So, let's not do that. Since we have priority, rule and time instead of removing data let's add a new field (can call it message if we don't have another idea) that contains the pure rule output without timestamps/priority/rule name. The main advantages are that this won't introduce breaking changes and it is intuitive to use when parsing the log with an aggregator; disadvantage is that the json payload will be a bit bigger because it duplicates the output but I think advantages outweigh disadvantages.

WDYT?

Another issue with this solution is all tools ingesting the Falco event will have to adapt (falcosidekick, talon, etc, ...) to get the benefit

@LucaGuerra
Copy link
Contributor

Another issue with this solution is all tools ingesting the Falco event will have to adapt (falcosidekick, talon, etc, ...) to get the benefit

Makes sense. On the flip side, if we need to display a message with timestamp and severity on Falcosidekick or Talon I guess we'd still need to adapt them because depending on the underlying Falco version (e.g. < 0.39.0 vs >= 0.39.0) they will need to pick different json fields so I believe it wouldn't be seamless either.

@leogr
Copy link
Member

leogr commented Jul 22, 2024

I like the idea of adding another field (e.g., message). But at this point, we should also introduce the ability to remove output from the JSON output. I think it makes totally sense to target this for 0.39.

Ultimately, all these should be opt-in settings for the JSON output. We might want to allow even more customization (i.e., choose which fields should be included and which should not).

@LucaGuerra
Copy link
Contributor

I like the idea of adding another field (e.g., message). But at this point, we should also introduce the ability to remove output from the JSON output. I think it makes totally sense to target this for 0.39.

That's exactly how it will work for Falco 0.39.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment