Container Network DNS Resolution Error #54

Closed
slominskir opened this issue Jul 9, 2024 · 3 comments

Comments

slominskir (Member) commented Jul 9, 2024

With the latest version of JAWS (v4.13.0), a container in the test environment can no longer correctly resolve the IP of another container on the same network by name. Resolution continues to work in production, which runs the older version (v4.12.0), and rolling the test environment back to v4.12.0 fixes the issue.

Specifically, the registrations2epics container cannot use curl to test the readiness of the container named registry, because curl resolves the name to a host on the Internet instead. Other containers on the test network resolve the local registry host IP fine. A Compose down/up cycle does not fix it, and neither does a host reboot. No obviously problematic changes were made to the registrations2epics container, but it was rebuilt, so the loosely specified tag in its Dockerfile may have pulled in an updated base image.
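The readiness check that breaks here boils down to polling the registry's HTTP endpoint until it answers. A minimal Python sketch of such a poll loop, using only the standard library, follows; the function name wait_until_ready and the timeout and interval values are hypothetical, and the actual registrations2epics image does this with curl. One difference worth knowing: urllib resolves names through the C library's getaddrinfo rather than through c-ares.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it answers HTTP 200 or roughly timeout seconds have passed.

    Hypothetical helper for illustration; returns True when ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen raises on connection errors, DNS failures and 4xx/5xx statuses.
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError and HTTPError are OSError subclasses: all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Called as wait_until_ready("http://registry:8081/schemas/types", timeout=30) from inside the Compose network, it would return True once the registry answers with 200 OK.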

The badly behaving curl command looks like:

/ $ curl -vvvv http://registry:8081/schemas/types
* Host registry:8081 was resolved.
* IPv6: 2600:9000:a70e:7345:dad6:536:9027:3018, 2600:9000:a407:fc88:1d86:6f29:fbe0:8100
* IPv4: 99.83.186.75, 75.2.49.220
*   Trying 99.83.186.75:8081...
*   Trying [2600:9000:a70e:7345:dad6:536:9027:3018]:8081...
* Immediate connect fail for 2600:9000:a70e:7345:dad6:536:9027:3018: Network unreachable
*   Trying [2600:9000:a407:fc88:1d86:6f29:fbe0:8100]:8081...
* Immediate connect fail for 2600:9000:a407:fc88:1d86:6f29:fbe0:8100: Network unreachable
^C
/ $ nslookup 99.83.186.75
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
75.186.83.99.in-addr.arpa       name = ac911f1a260dd6d18.awsglobalaccelerator.com

Compare with the good outcome of the same curl command:

/ $ curl -vvvv http://registry:8081/schemas/types
* processing: http://registry:8081/schemas/types
*   Trying 172.19.0.3:8081...
* Connected to registry (172.19.0.3) port 8081
> GET /schemas/types HTTP/1.1
> Host: registry:8081
> User-Agent: curl/8.2.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Tue, 09 Jul 2024 15:24:02 GMT
< X-Request-ID: 51dbe1a5-8116-4297-9bc4-98673fb8bef0
< Content-Type: application/vnd.schemaregistry.v1+json
< Vary: Accept-Encoding, User-Agent
< Content-Length: 26
<
* Connection #0 to host registry left intact
["JSON","PROTOBUF","AVRO"]

/ $ nslookup 172.19.0.3
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
3.0.19.172.in-addr.arpa name = registry.jaws_default

Note: You can also "fix" the problem by using the fully qualified name registry.jaws_default.
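The bad and good outputs above differ in one telling way: the bad lookup returns public Internet addresses, while the good one returns a private container-network address. A small Python check along those lines is sketched below; the helper names resolve and all_private are hypothetical. Note that socket.getaddrinfo goes through the C library's resolver (musl on Alpine), not c-ares, so it shows the system resolver's view of a name such as registry or registry.jaws_default.

```python
import ipaddress
import socket


def resolve(name: str, port: int = 8081) -> list[str]:
    """Resolve name through the C library resolver (getaddrinfo)."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True if there is at least one address and every address is in a private range."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

Run inside a container, all_private(resolve("registry")) would be expected to be True for the good case (172.19.0.3) and False for the bad one (99.83.186.75, 75.2.49.220).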

Note: We rely on the default network that docker compose creates (not to be confused with the default bridge network that docker run uses). The default Compose network is actually a custom network that is created automatically on compose up, and it is also of type bridge. The private IP address range chosen appears to change randomly on each up/down cycle. The configuration looks like this:

[root@jawstest ~]# docker network inspect jaws_default
[
    {
        "Name": "jaws_default",
        "Id": "1cdd350be2fa7d40997856b21b866f1ca6df5c490ea19f55717c4d60e9b4fc13",
        "Created": "2024-07-09T09:38:58.351226735-04:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "192.168.64.0/20",
                    "Gateway": "192.168.64.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "32858f00149e8619de017659f467b22326db0c9d8de6b6d57e9dcb3e912622bf": {
                "Name": "epics2kafka",
                "EndpointID": "e9c58d478bd1117b3cecc4f614bff8b01ad5cbd2c8b59b887ec9be3c1f1f010a",
                "MacAddress": "02:42:c0:a8:40:06",
                "IPv4Address": "192.168.64.6/20",
                "IPv6Address": ""
            },
            "4ccbd79a40c0a648f136c3bb4befe7794160dc27939a235976523ad2960b5040": {
                "Name": "web",
                "EndpointID": "526dd5e57993794d12e5589dbe206fe9d66c9879c2346e204d0a9a88fc968c14",
                "MacAddress": "02:42:c0:a8:40:07",
                "IPv4Address": "192.168.64.7/20",
                "IPv6Address": ""
            },
            "936011621b3a99076a079fa9ab142497f8706d1127f154e3c7b622f3adefad60": {
                "Name": "cli",
                "EndpointID": "4869cd731d2cfc12d408230084dcb8a809b5f0c64934c392abf25bce5bfbd6b6",
                "MacAddress": "02:42:c0:a8:40:04",
                "IPv4Address": "192.168.64.4/20",
                "IPv6Address": ""
            },
            "9631a572ace95ac3f42cec82371929d303dce1117106e3a968f89b9f696e98d6": {
                "Name": "registrations2epics",
                "EndpointID": "839c8d5d2bc72fdd12d8bfe8c4cc67244a3ac1280fddba8a1258c34a292ab8a9",
                "MacAddress": "02:42:c0:a8:40:08",
                "IPv4Address": "192.168.64.8/20",
                "IPv6Address": ""
            },
            "b2234e1f551ab59a71f302ae97606c635dd6dbc363d5999834f1d41d300008c9": {
                "Name": "kafka",
                "EndpointID": "a7079ad772b6d3093fc15984f70ea1a4732ebd8d63423cac0adef80dafaab16c",
                "MacAddress": "02:42:c0:a8:40:02",
                "IPv4Address": "192.168.64.2/20",
                "IPv6Address": ""
            },
            "e1e9bae0593623ef6491c6d6e7185c3ca047b2284ba82a2c46af3c203306eb86": {
                "Name": "registry",
                "EndpointID": "fa66abe44c6f2e02203f9d91808cea6b47d77ac0db11bc4c70a7c2a4273ef945",
                "MacAddress": "02:42:c0:a8:40:03",
                "IPv4Address": "192.168.64.3/20",
                "IPv6Address": ""
            },
            "e8887dd61eedceea530b234dba448c7652e22cd96db6cf7c13497a8e8d94d4b7": {
                "Name": "processor",
                "EndpointID": "83c1530df2d41cf07ae71d9a0926e0f7b75e02578417422bce149dccedc87487",
                "MacAddress": "02:42:c0:a8:40:05",
                "IPv4Address": "192.168.64.5/20",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "jaws",
            "com.docker.compose.version": "2.27.0"
        }
    }
]
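Because the address range changes on each up/down cycle, it helps to pull the current name-to-IP mapping out of docker network inspect and compare it with what curl resolves. A minimal Python sketch, assuming the inspect output has the shape shown above (a JSON array of networks, each with a Containers object); container_ips is a hypothetical helper name.

```python
import json


def container_ips(inspect_output: str) -> dict[str, str]:
    """Map container name -> IPv4 address from `docker network inspect` JSON."""
    networks = json.loads(inspect_output)  # the command prints a JSON array of networks
    result: dict[str, str] = {}
    for network in networks:
        for container in (network.get("Containers") or {}).values():
            cidr = container.get("IPv4Address", "")
            if cidr:
                # Strip the prefix length, e.g. "192.168.64.3/20" -> "192.168.64.3".
                result[container["Name"]] = cidr.split("/", 1)[0]
    return result
```

For example, the output of docker network inspect jaws_default could be piped into a script that calls container_ips(sys.stdin.read()); for the output above, registry maps to 192.168.64.3.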

The curl version differs between the two JAWS versions, and so do the linked libraries (so the base image did change):

JAWS v4.13.0 (registrations2epics v4.6.0)

/ $ curl -V
curl 8.5.0 (x86_64-alpine-linux-musl) libcurl/8.5.0 OpenSSL/3.1.4 zlib/1.2.13 brotli/1.0.9 libidn2/2.3.4 nghttp2/1.57.0
Release-Date: 2023-12-06
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM SSL threadsafe TLS-SRP UnixSockets
/ $ cat /etc/alpine-release
3.19.2

vs JAWS v4.12.0 (registrations2epics v4.5.0)

/ $ curl -V
curl 8.2.1 (x86_64-alpine-linux-musl) libcurl/8.2.1 OpenSSL/3.1.2 zlib/1.2.13 brotli/1.0.9 libidn2/2.3.4 nghttp2/1.55.1
Release-Date: 2023-07-26
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL threadsafe TLS-SRP UnixSockets
/ $ cat /etc/alpine-release
3.18.3
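Comparing the curl -V lines above by eye is error-prone; a minimal Python sketch that splits the first line into component versions makes the differences easy to diff. It assumes the name/version token format shown above, and curl_components is a hypothetical helper name. libcurl normally lists a c-ares/<version> token on this line when it is built with c-ares, so checking for that key is one way to confirm which resolver a given build uses; the AsynchDNS feature appears for both the threaded resolver and c-ares, so it does not distinguish them.

```python
def curl_components(version_line: str) -> dict[str, str]:
    """Parse the first line of `curl -V` into {component: version}."""
    components: dict[str, str] = {}
    tokens = version_line.split()
    if len(tokens) >= 2 and tokens[0] == "curl":
        components["curl"] = tokens[1]
    # Remaining tokens look like "libcurl/8.5.0" or "OpenSSL/3.1.4"; the
    # parenthesised host triplet has no slash and is skipped.
    for token in tokens[2:]:
        if "/" in token:
            name, _, version = token.partition("/")
            components[name] = version
    return components
```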

The /etc/resolv.conf is identical in both versions:

/ $ cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
search acc.jlab.org jlab.org
options ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [*redacted*]
# Overrides: []
# Option ndots from: internal
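The settings in this file that matter for the bug are the search list and the ndots option. A minimal Python parser for the format shown above follows; parse_resolv_conf is a hypothetical helper that covers only the nameserver, search/domain and options keywords as described in resolv.conf(5), not every directive.

```python
def parse_resolv_conf(text: str) -> dict:
    """Extract nameservers, the search list and options from resolv.conf text."""
    conf: dict = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        # resolv.conf(5): lines starting with '#' or ';' are comments.
        if not line or line.startswith(("#", ";")):
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            conf["nameservers"].append(args[0])
        elif keyword in ("search", "domain"):
            # The two keywords are mutually exclusive; the last instance wins.
            conf["search"] = args
        elif keyword == "options":
            for opt in args:
                key, _, value = opt.partition(":")  # "ndots:0" -> ("ndots", "0")
                conf["options"][key] = value
    return conf
```

Applied to the file above, it yields nameserver 127.0.0.11, the search list acc.jlab.org and jlab.org, and ndots set to 0.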

Possibly related:

slominskir (Member, Author) commented Jul 9, 2024

The issue is most likely that curl in Alpine 3.19 now uses c-ares for DNS instead of the system resolver it used in Alpine 3.18, combined with a c-ares bug where it does not support ndots:0. This is fixed in c-ares v1.31.0, but that fix landed only 3 weeks ago and has not yet made it into the upstream eclipse-temurin:11-alpine image. Unfortunately, there does not appear to be an eclipse-temurin tag specifically for Java 11 on Alpine 3.18, so rolling back is not simple.
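The ndots option decides whether a short name like registry is first tried as-is or with the search domains appended. Per resolv.conf(5), a name containing at least ndots dots is tried as an absolute name first, and otherwise the search list is tried first; the default is 1. The Python sketch below only encodes that ordering (query_order is a hypothetical name; real resolvers also stop at the first answer and handle errors). With ndots:0, registry goes to Docker's embedded DNS at 127.0.0.11 unchanged. If a resolver effectively treated ndots:0 as the default of 1, which is one possible reading of "does not support ndots:0" and an assumption here, the search-domain variants would be queried first and any answer for them would win, while a dotted name such as registry.jaws_default would still be tried as-is first under either setting.

```python
def query_order(name: str, search: list[str], ndots: int) -> list[str]:
    """Order in which a stub resolver tries names, per the resolv.conf(5) ndots rule."""
    if name.endswith("."):
        return [name[:-1]]  # fully qualified: the search list is not applied
    with_search = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + with_search  # enough dots: try the name as-is first
    return with_search + [name]      # too few dots: try the search domains first
```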

Note: This appears to be a curl-specific issue, since curl uses c-ares. The ping command, for example, works fine and apparently still uses the system DNS resolver even in Alpine 3.19.

Note: To roll back to a specific previous version of an image we must use a digest, because the tags provided by the maintainers are constantly updated and are insufficient in this case. Unfortunately, searching an image's history to find the digest to roll back to and pin is currently a gap in the Docker tooling: docker/roadmap#185
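Pinning by digest means the FROM line carries an @sha256:... suffix after (or instead of) the tag. A minimal Python sketch that flags FROM lines lacking such a pin follows; unpinned_base_images is a hypothetical helper, and it does not distinguish references to earlier build stages or scratch, which it would also report.

```python
import re

# FROM [--platform=...] <image> [AS <name>]; capture the image reference.
_FROM = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
# A digest pin ends the reference with @sha256: and 64 lowercase hex characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile: str) -> list[str]:
    """Return image references in FROM lines that are not pinned by a sha256 digest."""
    unpinned = []
    for line in dockerfile.splitlines():
        match = _FROM.match(line)
        if match and not _DIGEST.search(match.group(1)):
            unpinned.append(match.group(1))
    return unpinned
```

For example, FROM eclipse-temurin:11-alpine would be reported, while FROM eclipse-temurin:11.0.20_8-jdk-alpine@sha256:<digest> would not, where <digest> stands for the real 64-hex-digit value, which this thread does not give.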

slominskir (Member, Author) commented:

Musings:

slominskir (Member, Author) commented Jul 10, 2024

Poking around in Docker Desktop, I got lucky and noticed that the base image used by jeffersonlab/registrations2epics:4.5.0 actually corresponded to several tags, which Docker Desktop enumerated; one of them happened to be more specific and had not changed: 11.0.20_8-jdk-alpine. Once you have the tag, you can look up the digest in various ways, the easiest being on Docker Hub. If this tag hadn't existed, it isn't clear how to find the appropriate digest. There are actually separate index and manifest digests; the index apparently exists to handle multi-platform images.
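In the OCI image index format (Docker's manifest list is the equivalent), the index is a JSON document with its own digest, and its manifests array holds one entry per platform, each carrying that platform's manifest digest. A minimal Python sketch that picks the entry for one platform follows, assuming the raw index JSON has already been fetched and decoded (for example from docker buildx imagetools inspect --raw <image>); manifest_digest_for is a hypothetical name, and it ignores the optional variant field.

```python
from typing import Optional


def manifest_digest_for(index: dict, os_name: str = "linux",
                        architecture: str = "amd64") -> Optional[str]:
    """Return the per-platform manifest digest listed in an image index, if any.

    The index itself has a separate digest (the one a multi-platform tag points at);
    each entry of "manifests" references one platform's image manifest.
    """
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        # The optional "variant" field (e.g. arm v7) is ignored in this sketch.
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return entry.get("digest")
    return None
```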


slominskir added a commit to JeffersonLab/jaws-effective-processor that referenced this issue Jul 10, 2024