[question] Will Nomad Consul Connect Envoy proxy work with containerd driver? #59

Open
Oloremo opened this issue Jan 11, 2021 · 23 comments
Labels: help wanted, question

Comments

@Oloremo

Oloremo commented Jan 11, 2021

Hello,

By default, Nomad launches the Envoy proxy as a docker container: https://www.nomadproject.io/docs/job-specification/sidecar_task#default-envoy-configuration

I wonder if it could successfully run with a containerd driver.

shishir-a412ed added the question label Jan 11, 2021
@shishir-a412ed
Contributor

@Oloremo Have you tried running it?

You should be able to change the driver from docker to containerd-driver and set the image and args.

@Oloremo
Author

Oloremo commented Jan 12, 2021

Not yet. We're currently in the PoC stage with Nomad, and I wondered whether you folks had already tried it and would be willing to share your experience.

As you can see, I'm wondering whether we could use Nomad without the Docker engine installed at all.

@shishir-a412ed
Contributor

shishir-a412ed commented Jan 12, 2021

@Oloremo

Not yet. We're currently in the PoC stage with Nomad, and I wondered whether you folks had already tried it and would be willing to share your experience.

No, we haven't tried it. Let me know how it goes when you try it, and whether you run into any issues.

As you can see, I'm wondering whether we could use Nomad without the Docker engine installed at all.

This is exactly the use case nomad-driver-containerd was designed for: launching jobs in Nomad directly through containerd, without docker-engine installed at all. If you want to try it in a local environment, clone this repo and run vagrant up from the project root directory. That will spin up an Ubuntu VM with just Nomad and containerd installed (no Docker engine on the VM), where you can try launching some example jobs.

@Oloremo
Author

Oloremo commented Jan 12, 2021

Ok, we'll do some Consul Connect related experiments in Q1 2021 and I'll report back if it works.

@Oloremo
Author

Oloremo commented Mar 4, 2021

@shishir-a412ed

So I tried to run the countdash Nomad example for Service Mesh using containerd-driver 0.7.0 and containerd runtime 1.4.3.
It hits a weird issue.

The job starts, but the Envoy proxy sidecars start and then fail with:

su-exec: setgroups: Operation not permitted

Job spec:

job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
        sidecar_task {
          driver = "containerd-driver"
          config {
            #image = "${meta.connect.sidecar_image}"
            image = "docker.io/envoyproxy/envoy:v1.16.0"

            command = "/docker-entrypoint.sh"
            args = [
              "-c",
              "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
              "-l",
              "${meta.connect.log_level}",
              "--concurrency",
              "${meta.connect.proxy_concurrency}",
              "--disable-hot-restart"
            ]
          }

          logs {
            max_files     = 2
            max_file_size = 2 # MB
          }

          resources {
            cpu    = 250 # MHz
            memory = 128 # MB
          }

          shutdown_delay = "5s"
        }
      }
    }

    task "web" {
      driver = "containerd-driver"
      config {
        image = "docker.io/hashicorpnomad/counter-api:v3"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
        sidecar_task {
          driver = "containerd-driver"
          config {
            #image = "${meta.connect.sidecar_image}"
            image = "docker.io/envoyproxy/envoy:v1.16.0"

            command = "/docker-entrypoint.sh"
            args = [
              "-c",
              "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
              "-l",
              "${meta.connect.log_level}",
              "--concurrency",
              "${meta.connect.proxy_concurrency}",
              "--disable-hot-restart"
            ]
          }

          logs {
            max_files     = 2
            max_file_size = 2 # MB
          }

          resources {
            cpu    = 250 # MHz
            memory = 128 # MB
          }

          shutdown_delay = "5s"
        }
      }
    }

    task "dashboard" {
      driver = "containerd-driver"
      config {
        image = "docker.io/hashicorpnomad/counter-dashboard:v3"
      }

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }
    }
  }
}

@Oloremo
Author

Oloremo commented Mar 4, 2021

I tried adding

privileged = true

to the config {} stanza, with the same result.
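
For reference, a sketch of where that flag sits when the sidecar is overridden as in the job spec above. It assumes privileged is set inside the sidecar task's config block, as the comment describes. It only shows the placement and is not a fix, since the su-exec error was the same with it set.

```hcl
sidecar_task {
  driver = "containerd-driver"

  config {
    image = "docker.io/envoyproxy/envoy:v1.16.0"

    # Assumption: the flag goes here, next to image/command/args.
    privileged = true

    command = "/docker-entrypoint.sh"
    args = [
      "-c",
      "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
      "-l",
      "${meta.connect.log_level}",
      "--concurrency",
      "${meta.connect.proxy_concurrency}",
      "--disable-hot-restart"
    ]
  }
}
```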

@Oloremo
Author

Oloremo commented Mar 4, 2021

OK, so the error comes from the Envoy container's entrypoint script: https://github.com/envoyproxy/envoy/blob/v1.16.0/ci/docker-entrypoint.sh

I'm not sure why it can't run under the containerd driver.
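
One thing that may be worth trying, as an untested sketch: the Envoy image documentation describes an ENVOY_UID environment variable, and setting it to 0 is meant to keep Envoy running as root, in which case the entrypoint should not need su-exec to switch users. Whether the v1.16.0 script linked above behaves this way, and whether running the proxy as root is acceptable, are assumptions to check against the script itself.

```hcl
sidecar_task {
  driver = "containerd-driver"

  # Assumption: the entrypoint skips the su-exec user switch when
  # ENVOY_UID is "0". Untested with containerd-driver.
  env {
    ENVOY_UID = "0"
  }

  config {
    image   = "docker.io/envoyproxy/envoy:v1.16.0"
    command = "/docker-entrypoint.sh"
    args = [
      "-c",
      "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
      "-l",
      "${meta.connect.log_level}",
      "--concurrency",
      "${meta.connect.proxy_concurrency}",
      "--disable-hot-restart"
    ]
  }
}
```

If the sidecar then starts, that would point at the user switch in the entrypoint rather than at Envoy itself.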

@Oloremo
Author

Oloremo commented Mar 23, 2021

@shishir-a412ed Sorry for the ping; I wasn't sure if you saw the test above.

@shishir-a412ed
Contributor

@Oloremo Sorry, I should have responded earlier 🙂 . We are still working on the initial rollout, and currently integration with consul connect is not super high on the priority list. I will try to find some time this week and see if I can reproduce/debug this.

@Oloremo
Author

Oloremo commented Mar 23, 2021

Thanks for the reply!

We'll wait. :) It's not super critical for us right now but we do want to use Service Mesh in the near future.

@Oloremo
Author

Oloremo commented Apr 21, 2021

@shishir-a412ed Sorry for the ping, just wanted to check if you had time to check on this.

@shishir-a412ed
Contributor

@Oloremo Sorry, I have not been able to get to this. Did you try to debug more?
You can try adding some print statements to the bash script docker-entrypoint.sh.

e.g.

  1. Take the original bash script docker-entrypoint.sh.
  2. Create a custom version of it with your added debug statements.
  3. Build a new envoy image with your custom script as the entrypoint, and use that instead to debug more.

I'll try to find some time to debug more, but I have been juggling a few other things.
Internally, containerd-driver has been bumped down in priority, because we have a few other high-priority items we need to take care of right now.
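
The last of the debugging steps above amounts to pointing the sidecar at the rebuilt image. A minimal sketch of that step, assuming the custom image has been pushed somewhere the Nomad client can pull from. The image name registry.example.com/envoy-debug:v1.16.0 is hypothetical; the rest mirrors the job spec posted earlier.

```hcl
sidecar_task {
  driver = "containerd-driver"

  config {
    # Hypothetical image: the stock envoy image rebuilt with a copy of
    # docker-entrypoint.sh that has extra debug output added.
    image = "registry.example.com/envoy-debug:v1.16.0"

    command = "/docker-entrypoint.sh"
    args = [
      "-c",
      "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
      "-l",
      "${meta.connect.log_level}",
      "--concurrency",
      "${meta.connect.proxy_concurrency}",
      "--disable-hot-restart"
    ]
  }
}
```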

@shishir-a412ed
Contributor

@Oloremo We are doing some internal work around Consul service mesh (not using containerd-driver), and it's helpful for me to pick up the background/context. I'll see if I can find some time over the weekend to look into this. No promises :) but I will try to get to this sooner than I was planning.

@Oloremo
Author

Oloremo commented May 17, 2021

Appreciate that! I haven't been able to get back to this issue either. Please ping me if you need any additional testing.

@shishir-a412ed
Contributor

@Oloremo Also, looking at your job spec it looks like you are using both sidecar_service and sidecar_task which might be incorrect.

Looking at the official docs: https://www.nomadproject.io/docs/job-specification/sidecar_task#default-envoy-configuration

Nomad automatically launches and manages an Envoy task for use as a proxy sidecar or connect gateway, when sidecar_service or gateway are configured.

The default Envoy task is equivalent to:

When you specify:

connect {
    sidecar_service {}
}

Nomad will automatically launch the Envoy sidecar proxy for you. The sidecar_task definition just shows what it looks like under the hood. In your example, the api service (upstream) will only need:

connect {
    sidecar_service {}
}

and the dashboard service (downstream) will need to define count-api as the upstream.

connect {
   sidecar_service {
        proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
        }
   }
}

I'll take a deeper look over the weekend.

@Oloremo
Author

Oloremo commented May 17, 2021

Nomad will automatically launch the Envoy sidecar proxy for you.

With the docker driver.
I wanted to launch it with the containerd one. Since changing only the driver without a full redefinition isn't allowed yet, I did the full redefinition just to test it.

@shishir-a412ed
Contributor

@Oloremo aah I see! Makes sense 🙂

shishir-a412ed added the help wanted label Jul 15, 2021
@Oloremo
Author

Oloremo commented Mar 17, 2022

@shishir-a412ed Hey, just wanted to say that we're still interested in this. :-)

The only thing that stops us from moving to the containerd plugin is Consul Connect.

@shishir-a412ed
Contributor

@Oloremo Let me see if I can find some time to progress this.

@mister2d

Hi there. I happened to stumble upon this ticket. Setting the driver for sidecar_task is already supported in Nomad. Here is an example:

job "example" {
    datacenters = ["dc1"]
    group "api" {
        network {
            mode = "bridge"
        }
        service {
            name = "example-api"
            port = "9001"
            connect {
                sidecar_service {}
                sidecar_task {
                    driver = "containerd-driver"
                }
            }
        }
        task "web" {
            driver = "containerd-driver"
            config {
                image = "<image>"
            }
        }
    }
}

https://www.nomadproject.io/docs/job-specification/sidecar_task#driver

@Oloremo
Author

Oloremo commented Apr 13, 2022

@mister2d interesting!

I'm trying to test it with:

Jobspec
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
        sidecar_task {
          driver = "containerd-driver"
        }
      }
    }

    task "web" {
      driver = "containerd-driver"

      config {
        image = "hashicorpnomad/counter-api:v3"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
        sidecar_task {
          driver = "containerd-driver"
        }
      }
    }

    task "dashboard" {
      driver = "containerd-driver"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v3"
      }
    }
  }
}

But I'm getting a:

Recent Events:
Time                  Type               Description
2022-04-13T12:25:10Z  Killing            Sent interrupt. Waiting 5s before force killing
2022-04-13T12:25:10Z  Not Restarting     Error was unrecoverable
2022-04-13T12:25:10Z  Failed Validation  2 errors occurred:
	* failed to parse config:
	* Root value must be object: The root value in a JSON-based configuration must be either a JSON object or a JSON array of objects.

@MagicRB

MagicRB commented May 11, 2022

I can confirm it works for me. I had a Docker job running Consul Connect before, and the only thing I changed was the driver, to containerd. Connect still works. I don't really know why, but I'm happy. If you want to dig around my config, see https://gitea.redalder.org/RedAlder/systems (I'm using a patched version of this repo that adds support for Nix flakes; the changes I made shouldn't affect Consul Connect, though).

@Oloremo
Author

Oloremo commented May 11, 2022

@MagicRB It's a big repo; I clicked through a few things and saw Docker as the driver.

Anyway, it's better to test and confirm using the default countdash example, as I did above. That removes all other configuration and makes it reproducible for anyone.
