Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Launch envoy through capability wrapper #2050

Open
wants to merge 10,000 commits into
base: master
Choose a base branch
from

Conversation

swalberg
Copy link
Contributor

@swalberg swalberg commented Nov 21, 2019

(if capabilities are present)

When capabilities are explicitly dropped

sean~/play/ambassador (use_capabilities_wrapper *%)$ kubectl logs ambassador-7cb95448b8-mjxhp | grep cap
2019-11-21 17:59:05 AMBASSADOR INFO cap_net_bind_service is not supported, launching Envoy directly

When capabilities are present

sean~/play/ambassador (use_capabilities_wrapper *%)$ kubectl logs ambassador-bdbc4675d-hmq82 | grep cap
2019-11-21 18:00:08 AMBASSADOR INFO cap_net_bind_service is supported, launching Envoy through a wrapper
2019/11/21 18:00:08 Starting Envoy with CAP_NET_BIND_SERVICE capability
2019/11/21 18:00:08 Succeeded in setting capabilities

Additionally, if the capabilities are present, you can bind Envoy to a
low port, e.g. 80 or 443:

bash-5.0$ netstat -anp | grep 0.0.0.0:80
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      -
bash-5.0$ ps -ef | grep [w]rapper
  167 8888      0:06 {envoy} wrapper -c /ambassador/bootstrap-ads.json

Related Issues

Fixes #1841

Testing

Manual testing and added CI tests.

Todos

  • Tests
  • Documentation

Other

@swalberg
Copy link
Contributor Author

Don't merge this just yet... Going to do a bit more testing. The tests don't run locally and I want to see if they work in CI.

@swalberg swalberg force-pushed the use_capabilities_wrapper branch from 287fac7 to 8d33416 Compare November 21, 2019 21:09
@swalberg swalberg force-pushed the use_capabilities_wrapper branch from 8d33416 to f9d061c Compare December 5, 2019 16:32
@swalberg swalberg force-pushed the use_capabilities_wrapper branch from f9d061c to 0d43857 Compare December 16, 2019 13:33
@swalberg
Copy link
Contributor Author

This works fine locally, I'll be darned if I can get tests to pass (on this branch or master) on my Mac or on a new k3s cluster though.

@swalberg
Copy link
Contributor Author

With the latest rebase it's now only my tests that are failing. I'll try it again locally.

@swalberg swalberg force-pushed the use_capabilities_wrapper branch from c442dfe to 15c6d4e Compare January 22, 2020 18:47
@kflynn
Copy link
Member

kflynn commented Mar 8, 2020

Hey, now that we've gotten past the 1.0 transition I'd like to get this landed. Need help from us to get tests sorted out?

@swalberg
Copy link
Contributor Author

swalberg commented Mar 9, 2020

Hey, yes, please. Give me some time to get caught up and rebased so I know what the current problem is. Thanks!

LukeShu added a commit that referenced this pull request Apr 27, 2020
setcap(8) doesn't work on symlinks.  This was a regression in 1.3.0.
Fortunately, because #2050
hasn't landed yet, that didn't actually affect anything.
@LukeShu LukeShu mentioned this pull request Apr 27, 2020
6 tasks
@stale
Copy link

stale bot commented May 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed label May 8, 2020
@swalberg
Copy link
Contributor Author

swalberg commented May 8, 2020

A million apologies. Let me get this rebased and over the line.

@stale stale bot removed the stale Issue is stale and will be closed label May 8, 2020
@swalberg swalberg force-pushed the use_capabilities_wrapper branch from 86f52b8 to dc6922a Compare May 8, 2020 23:15
@swalberg
Copy link
Contributor Author

swalberg commented May 9, 2020

Is there still an issue running tests on a mac? I tried make test after setting the registry and k8s config and get failure in smoke tests:

Teleproxy exited with 1 error(s):
  K8S: Get https://kubernetes.docker.internal:6443/api?timeout=32s: dial tcp: lookup kubernetes.docker.internal on 127.0.0.11:53: no such host
2020/05/09 00:44:58 poll get: Get http://httptarget: dial tcp: lookup httptarget on 127.0.0.11:53: no such host
--- FAIL: TestSmoke (1.01s)
    teleproxy_test.go:44: exit status 1
    teleproxy_test.go:106: giving up because we have already failed
    teleproxy_test.go:52: os: process already finished

@kflynn
Copy link
Member

kflynn commented May 19, 2020

@swalberg I run them on a Mac a lot -- what kind of cluster were you trying to use?

@swalberg
Copy link
Contributor Author

I'm using docker-desktop. Happy to change it to whatever works.

@kflynn
Copy link
Member

kflynn commented Jun 2, 2020

I don't think the whole test suite can fit into Docker on a Mac. 🙁 Internally, we have an old tool called Kubernaut that provisions clusters in EC2, but really any cluster should work, it's just that the tests spin up a lot of Ambassadors, so a beefy cluster is a must.

Alternately, if you merge master yet again, CI should be able to run the tests for you again. You've been running this at your site, yes?

@swalberg
Copy link
Contributor Author

swalberg commented Jun 4, 2020

Oh, I can find a bigger test cluster then. I had it in our cluster for testing but since we're so far behind on Ambassador I can't use current versions. This'll change soon 🤞 .

@swalberg
Copy link
Contributor Author

Just a note that we've upgraded things here enough that I can get back to testing.

@swalberg
Copy link
Contributor Author

I'm getting crashes on the capabilities wrapper now. I see there was a minor refactoring in 58c1267 to extract it which seems related. I'm having trouble explaining what I'm seeing, but it looks like the return address of capset is getting overwritten, resulting in a segfault. BUT, if I throw a fmt.Println in that method (anywhere) it works.

I can't explain it yet, but here's a gist to show I'm not nuts: https://gist.github.com/swalberg/c4b50e395c239920863f898551a12c24

@swalberg swalberg force-pushed the use_capabilities_wrapper branch from 6af5080 to 0f9801b Compare June 29, 2020 12:55
@swalberg
Copy link
Contributor Author

Bah is there a way to re-run this? Looks like there was a connection reset in the middle of it when it tries to pull down the gcloud tools.

@swalberg swalberg force-pushed the use_capabilities_wrapper branch 2 times, most recently from a20b8a6 to 778c8bd Compare July 6, 2020 12:47
@swalberg swalberg force-pushed the use_capabilities_wrapper branch from 778c8bd to dd9d184 Compare July 6, 2020 13:22
@swalberg
Copy link
Contributor Author

swalberg commented Jul 6, 2020

Is there a way to run just one test locally? I tried make pytest TEST_NAME=CanBindToLowPort but it seems to run everything.

@kflynn
Copy link
Member

kflynn commented Jul 6, 2020

make push pytest-only PYTEST_ARGS="-k CanBindToLowPort" should do it.

LukeShu and others added 22 commits November 30, 2020 12:29
…et/dev/master/pin-pip

ambassador/builder: pin pip to 20.2.4
fix doc typo

add browser test

update changelog

add some unit tests

add comments to tests
…ite-preview

Always run website-preview, even when it'll be a no-op.
(cherry picked from commit b602adf)
…adf-master

Update gRPC example for insecure host
…/tutorial-master

[master] Revise tutorial
(if capabilities are present)

When capabilities are explicitly dropped
```
sean~/play/ambassador (use_capabilities_wrapper *%)$ kubectl logs ambassador-7cb95448b8-mjxhp | grep cap
2019-11-21 17:59:05 AMBASSADOR INFO cap_net_bind_service is not supported, launching Envoy directly
```

When capabilities are present
```
sean~/play/ambassador (use_capabilities_wrapper *%)$ kubectl logs ambassador-bdbc4675d-hmq82 | grep cap
2019-11-21 18:00:08 AMBASSADOR INFO cap_net_bind_service is supported, launching Envoy through a wrapper
2019/11/21 18:00:08 Starting Envoy with CAP_NET_BIND_SERVICE capability
2019/11/21 18:00:08 Succeeded in setting capabilities
```

Additionally, if the capabilities are present, you can bind Envoy to a
low port, e.g. 80 or 443:

```
bash-5.0$ netstat -anp | grep 0.0.0.0:80
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      -
bash-5.0$ ps -ef | grep [w]rapper
  167 8888      0:06 {envoy} wrapper -c /ambassador/bootstrap-ads.json
```

Add missing import

Try commenting out tests to see if it's me
(if capabilities are present)

When capabilities are explicitly dropped
```
sean~/play/ambassador (use_capabilities_wrapper *%)$ kubectl logs ambassador-7cb95448b8-mjxhp | grep cap
2019-11-21 17:59:05 AMBASSADOR INFO cap_net_bind_service is not supported, launching Envoy directly
```

When capabilities are present
```
sean~/play/ambassador (use_capabilities_wrapper *%)$ kubectl logs ambassador-bdbc4675d-hmq82 | grep cap
2019-11-21 18:00:08 AMBASSADOR INFO cap_net_bind_service is supported, launching Envoy through a wrapper
2019/11/21 18:00:08 Starting Envoy with CAP_NET_BIND_SERVICE capability
2019/11/21 18:00:08 Succeeded in setting capabilities
```

Additionally, if the capabilities are present, you can bind Envoy to a
low port, e.g. 80 or 443:

```
bash-5.0$ netstat -anp | grep 0.0.0.0:80
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      -
bash-5.0$ ps -ef | grep [w]rapper
  167 8888      0:06 {envoy} wrapper -c /ambassador/bootstrap-ads.json
```

Add missing import

Try commenting out tests to see if it's me
Yes, it segfaults unless that Println is there. Something about that
data.Inheritable smashing the retval on the stack.
The Ambassador container needs both the NET_BIND_SERVICE capability and
also securityContext.allowPrivilegeEscalation to be enabled, so updating
the init script to check for both. It checks to make sure it's not
disabled (love these double negatives) so that if we're on a non docker
runtime it won't get tripped up.
@swalberg swalberg force-pushed the use_capabilities_wrapper branch from a1a404f to b2d3e74 Compare December 2, 2020 15:55
Old place didn't seem to get executed first
@stale
Copy link

stale bot commented Feb 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed label Feb 2, 2021
@swalberg
Copy link
Contributor Author

Funny enough, PodSecurityPolicies are being removed from K8s, which is the reason I needed this wrapper.

Seems like there are a lot of changes since I looked at this. Is this solution still viable?

@LukeShu
Copy link
Contributor

LukeShu commented Jan 27, 2022

FWIW, I still want to get this in.

@akevdmeer
Copy link

This feature would still be nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Issue is stale and will be closed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression: cannot run as non-root user with PSP in 0.78