Sidekick Crashes After Triggering the Same Rule Multiple Times in a Short Window with Falco 0.38.2 #1011

Open
cme-incom opened this issue Oct 1, 2024 · 3 comments
Assignees
Labels
kind/bug Something isn't working
Milestone

Comments


cme-incom commented Oct 1, 2024

Describe the bug

After running Aqua Security’s kube-bench, the Falcosidekick service crashes. The crash occurs when the same Falco rule is triggered more than 15 times within a very short time window: instead of handling the burst gracefully, the process terminates with a fatal error.

How to reproduce it

1. Run Aqua Security’s kube-bench to perform security checks.
2. Ensure that a specific Falco rule is triggered more than 15 times in a very short window.

Expected behaviour

The Sidekick service should handle multiple rule triggers without crashing. It should remain stable and not be terminated.

Screenshots
No screenshots available.

Environment

  • Falco version: 0.38.2
  • OS: Talos 1.6.5
  • Kernel: 6.6.32-talos
  • Installation method: Helm

Additional context

The rule triggered:

   # Note that runsv is both in protected_shell_spawner and the
   # exclusions by pname. This means that runsv can itself spawn shells
   # (the ./run and ./finish scripts), but the processes runsv can not
   # spawn shells.
   #
   # Also, trivy uses this for vulnerability scanning and kyverno uses it to clean ephemeral reports
   # And we exclude the incom user
   - rule: Incom Run shell untrusted
     desc: > 
       An attempt to spawn a shell below a non-shell application. The non-shell applications that are monitored are 
       defined in the protected_shell_spawner macro, with protected_shell_spawning_binaries being the list you can 
       easily customize. For Java parent processes, please note that Java often has a custom process name. Therefore, 
       rely more on proc.exe to define Java applications. This rule can be noisier, as you can see in the exhaustive 
       existing tuning. However, given it is very behavior-driven and broad, it is universally relevant to catch 
       general Remote Code Execution (RCE). Allocate time to tune this rule for your use cases and reduce noise. 
       Tuning suggestions include looking at the duration of the parent process (proc.ppid.duration) to define your 
       long-running app processes. Checking for newer fields such as proc.vpgid.name and proc.vpgid.exe instead of the 
       direct parent process being a non-shell application could make the rule more robust.
     condition: >
       spawned_process
       and shell_procs
       and proc.pname exists
       and not (k8s.ns.name = trivy)
       and not (k8s.ns.name = kyverno)
       and not serf_script
       and not check_process_status
       and not (container.image.repository in (incom_network_images))
       and not (user.name = incom)
       and not (proc.pexe = /bin/containerd-shim-runc-v2)
     output: Shell spawned by untrusted binary (parent_exe=%proc.pexe parent_exepath=%proc.pexepath pcmdline=%proc.pcmdline gparent=%proc.aname[2] ggparent=%proc.aname[3] aname[4]=%proc.aname[4] aname[5]=%proc.aname[5] aname[6]=%proc.aname[6] aname[7]=%proc.aname[7] evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)
     priority: ERROR
     tags: [maturity_stable, host, container, process, shell, mitre_execution, T1059.004]

The error message from the failed pod:

2024/09/23 17:48:45 [INFO]  : Slack - POST OK (200)
2024/09/23 17:48:45 [INFO]  : Pagerduty - Create Incident OK
2024/09/28 09:25:13 [INFO]  : Slack - POST OK (200)
fatal error: concurrent map iteration and map write
goroutine 502012 [running]:
github.com/falcosecurity/falcosidekick/outputs.getSortedStringKeys(0xc00089e1e0?)
   /home/runner/work/falcosidekick/falcosidekick/outputs/utils.go:12 +0x6b
github.com/falcosecurity/falcosidekick/outputs.newSlackPayload({{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, 0x0}, ...}, ...)
   /home/runner/work/falcosidekick/falcosidekick/outputs/slack.go:75 +0x62c
github.com/falcosecurity/falcosidekick/outputs.(*Client).SlackPost(0xc0008e1d00, {{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, ...}, ...})
   /home/runner/work/falcosidekick/falcosidekick/outputs/slack.go:152 +0x78
created by main.forwardEvent in goroutine 502010
   /home/runner/work/falcosidekick/falcosidekick/handlers.go:235 +0x148
goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fce1861fed0, 0x72)
   $GOROOT/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0x3?, 0x1?, 0x0)
   $GOROOT/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
   $GOROOT/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0009dd100)
   $GOROOT/src/internal/poll/fd_unix.go:611 +0x2ac
net.(*netFD).accept(0xc0009dd100)
   $GOROOT/src/net/fd_unix.go:172 +0x29
net.(*TCPListener).accept(0xc0009c95e0)
   $GOROOT/src/net/tcpsock_posix.go:159 +0x1e
net.(*TCPListener).Accept(0xc0009c95e0)
   $GOROOT/src/net/tcpsock.go:327 +0x30
net/http.(*Server).Serve(0xc000568690, {0x3079fb0, 0xc0009c95e0})
   $GOROOT/src/net/http/server.go:3255 +0x33e
net/http.(*Server).ListenAndServe(0xc000568690)
   $GOROOT/src/net/http/server.go:3184 +0x71
main.main()
   /home/runner/work/falcosidekick/falcosidekick/main.go:934 +0x1287
goroutine 13 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000143680)
   pkg/mod/[email protected]/stats/view/worker.go:292 +0x9f
created by go.opencensus.io/stats/view.init.0 in goroutine 1
   pkg/mod/[email protected]/stats/view/worker.go:34 +0x8d
goroutine 502011 [runnable]:
net.(*OpError).Timeout(0xc0000cf400?)
   $GOROOT/src/net/net.go:507 +0x133
net/http.(*connReader).backgroundRead(0xc00067d290)
   $GOROOT/src/net/http/server.go:708 +0xa9
created by net/http.(*connReader).startBackgroundRead in goroutine 502010
   $GOROOT/src/net/http/server.go:677 +0xba
goroutine 502013 [runnable]:
bytes.(*Buffer).WriteByte(0xc000ce8980?, 0x7b?)
   $GOROOT/src/bytes/buffer.go:285 +0x9c
encoding/json.mapEncoder.encode({0xc000b16538?}, 0xc000ce8980, {0x2426d60?, 0xc00067d3b0?, 0x2426d60?}, {0x14?, 0x0?})
   $GOROOT/src/encoding/json/encode.go:737 +0x215
encoding/json.(*encodeState).reflectValue(0xc000ce8980, {0x2426d60?, 0xc00067d3b0?, 0x7c9779?}, {0x40?, 0xde?})
   $GOROOT/src/encoding/json/encode.go:321 +0x73
encoding/json.interfaceEncoder(0xc000ce8980, {0x23dde40?, 0xc0008c66f0?, 0x6f8345?}, {0x60?, 0xa6?})
   $GOROOT/src/encoding/json/encode.go:658 +0xba
encoding/json.structEncoder.encode({{{0xc00033e488, 0x8, 0x8}, 0xc000652a80, 0xc000652ab0}}, 0xc000ce8980, {0x273f520?, 0xc0008c6680?, 0xc0000f8f20?}, {0x0, ...})
   $GOROOT/src/encoding/json/encode.go:704 +0x21e
encoding/json.ptrEncoder.encode({0xc0000f8f20?}, 0xc000ce8980, {0x2275700?, 0xc0000f8f20?, 0xc0000f8f20?}, {0xa?, 0x0?})
   $GOROOT/src/encoding/json/encode.go:876 +0x23c
encoding/json.structEncoder.encode({{{0xc00033e008, 0x8, 0x8}, 0xc000652b40, 0xc000652ba0}}, 0xc000ce8980, {0x273f640?, 0xc0000f8ea0?, 0xc000b16950?}, {0x0, ...})
   $GOROOT/src/encoding/json/encode.go:704 +0x21e
encoding/json.(*encodeState).reflectValue(0xc000ce8980, {0x273f640?, 0xc0000f8ea0?, 0x4?}, {0x60?, 0x24?})
   $GOROOT/src/encoding/json/encode.go:321 +0x73
encoding/json.(*encodeState).marshal(0x411ce5?, {0x273f640?, 0xc0000f8ea0?}, {0xc8?, 0xa5?})
   $GOROOT/src/encoding/json/encode.go:297 +0xc5
encoding/json.Marshal({0x273f640, 0xc0000f8ea0})
   $GOROOT/src/encoding/json/encode.go:163 +0xd0
github.com/PagerDuty/go-pagerduty.ManageEventWithContext({0x3089ca0, 0x46aa1a0}, {{0xc000064015, 0x20}, {0x289802d, 0x7}, {0x0, 0x0}, {0x0, 0x0, ...}, ...})
   pkg/mod/github.com/!pager!duty/[email protected]/event_v2.go:175 +0x74
github.com/falcosecurity/falcosidekick/outputs.(*Client).PagerdutyPost(0xc0008e1e00, {{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, ...}, ...})
   /home/runner/work/falcosidekick/falcosidekick/outputs/pagerduty.go:34 +0x1ac
created by main.forwardEvent in goroutine 502010
   /home/runner/work/falcosidekick/falcosidekick/handlers.go:375 +0x2d28
goroutine 502010 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc000ce8690, 0x0)
   $GOROOT/src/runtime/sema.go:569 +0x159
sync.(*Cond).Wait(0xc00067d290?)
   $GOROOT/src/sync/cond.go:70 +0x85
net/http.(*connReader).abortPendingRead(0xc00067d290)
   $GOROOT/src/net/http/server.go:729 +0xa6
net/http.(*response).finishRequest(0xc000578b60)
   $GOROOT/src/net/http/server.go:1671 +0x87
net/http.(*conn).serve(0xc000897560, {0x3089e60, 0xc00066de90})
   $GOROOT/src/net/http/server.go:2045 +0x62b
created by net/http.(*Server).Serve in goroutine 1
   $GOROOT/src/net/http/server.go:3285 +0x4b4
cme-incom added the kind/bug label Oct 1, 2024
cme-incom changed the title from "Sidekick Crashes After Running Aqua Security’s Kube-Bench with Falco 0.38.2" to "Sidekick Crashes After Running Kube-Bench with Falco 0.38.2" Oct 1, 2024
cme-incom changed the title from "Sidekick Crashes After Running Kube-Bench with Falco 0.38.2" to "Sidekick Crashes After Triggering the Same Rule Multiple Times in a Short Window with Falco 0.38.2" Oct 1, 2024
Issif self-assigned this Oct 7, 2024
Issif added this to the 2.30 milestone Oct 7, 2024
Member

Issif commented Oct 7, 2024

There is another issue about this "bug" that I haven't been able to reproduce so far: falcosecurity/charts#746

Member

Issif commented Oct 7, 2024

Which version of Falcosidekick are you running: 2.29.0 or the latest (== master)?

Member

Issif commented Nov 22, 2024

Are you still facing the issue?

Issif modified the milestones: 2.30, 2.x Nov 25, 2024
Projects
Status: To do
Development

No branches or pull requests

2 participants