Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic due to Syslog when running 'docker compose up' for boulder #7853

Open
eap-nutanix opened this issue Nov 27, 2024 · 5 comments
Open

panic due to Syslog when running 'docker compose up' for boulder #7853

eap-nutanix opened this issue Nov 27, 2024 · 5 comments

Comments

@eap-nutanix
Copy link

Summary:
In theory, I'm deploying totally 'stock' boulder on stock Ubuntu 24.10 server and it appears to panic due to syslog error.
[I'm totally new to this, so I don't know what "normal" looks like...]

Steps to reproduce:
[boulder release-2024-11-19]
git clone https://github.com/letsencrypt/boulder/
cd boulder
docker compose up
Expected result:
success

Actual result:

boulder-1 | pebble-challtestsrv - 2024/11/26 02:33:38 Starting management server on :8055
boulder-1 | panic: Could not connect to Syslog: Unix syslog delivery error
boulder-1 |
boulder-1 | goroutine 1 [running]:
boulder-1 | github.com/letsencrypt/boulder/cmd.Fail(...)
boulder-1 | /boulder/cmd/shell.go:395
boulder-1 | github.com/letsencrypt/boulder/cmd.FailOnError({0xcb0ae0, 0xc000384080}, {0xbbfb4e?, 0x0?})
boulder-1 | /boulder/cmd/shell.go:408 +0xd4
boulder-1 | github.com/letsencrypt/boulder/cmd.NewLogger({0xbac594?, 0x1?})
boulder-1 | /boulder/cmd/shell.go:204 +0x9b
boulder-1 | main.main()
boulder-1 | /boulder/test/mail-test-srv/main.go:231 +0x48e
boulder-1 | Starting service mail-test-srv
boulder-1 |
boulder-1 |
boulder-1 | These processes exited early (check above for their output):
boulder-1 | '('./bin/mail-test-srv', '--closeFirst', '5', '--cert', 'test/certs/ipki/localhost/cert.pem', '--key', 'test/certs/ipki/localhost/key.pem')' with pid 244 exited 2
bpkilint-1 | hdlr.handle(record)
bpkilint-1 | File "/usr/local/lib/python3.12/logging/init.py", line 1028, in handle
bpkilint-1 | self.emit(record)
bpkilint-1 | File "/usr/local/lib/python3.12/logging/init.py", line 1168, in emit
bpkilint-1 | self.handleError(record)
bpkilint-1 | File "/usr/local/lib/python3.12/logging/init.py", line 1081, in handleError
bpkilint-1 | sys.stderr.write('--- Logging error ---\n')
bpkilint-1 | RuntimeError: reentrant call inside <_io.BufferedWriter name=''>
bpkilint-1 | Call stack:
bpkilint-1 | File "/usr/local/bin/gunicorn", line 8, in
bpkilint-1 | sys.exit(run())
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
bpkilint-1 | WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]", prog=prog).run()
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/app/base.py", line 236, in run
bpkilint-1 | super().run()
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/app/base.py", line 72, in run
bpkilint-1 | Arbiter(self).run()
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/arbiter.py", line 227, in run
bpkilint-1 | self.halt()
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/arbiter.py", line 342, in halt
bpkilint-1 | self.stop()
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/arbiter.py", line 396, in stop
bpkilint-1 | time.sleep(0.1)
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
bpkilint-1 | self.reap_workers()
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/arbiter.py", line 554, in reap_workers
bpkilint-1 | self.log.error(msg)
bpkilint-1 | File "/usr/local/lib/python3.12/site-packages/gunicorn/glogging.py", line 272, in error
bpkilint-1 | self.error_log.error(msg, *args, **kwargs)
bpkilint-1 | Message: 'Worker (pid:13) was sent SIGTERM!'
bpkilint-1 | Arguments: ()

full log: err.txt
Additional details:
Ubuntu 24.10 running on VM

docker version

Client: Docker Engine - Community
Version: 27.3.1
API version: 1.47
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:40:42 2024
OS/Arch: linux/amd64
Context: default

Server: Docker Engine - Community
Engine:
Version: 27.3.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.7
Git commit: 41ca978
Built: Fri Sep 20 11:40:42 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.23
GitCommit: 57f17b0a6295a39009d861b89e3b3b87b005ca27
runc:
Version: 1.1.14
GitCommit: v1.1.14-0-g2c9f560
docker-init:
Version: 0.19.0
GitCommit: de40ad0

docker compose version

Docker Compose version v2.30.3

@jsha
Copy link
Contributor

jsha commented Nov 27, 2024

Hm, that all should work. test/entrypoint.sh is responsible for starting up rsyslog before all the components start. And the entrypoint config in docker-compose.yml is responsible for setting that as the entrypoint.

If you want to debug further, I might add an echo "I am here" in test/entrypoint.sh to see if it's getting called.

If you just want to get things running you can edit each file in test/config/*.json to remove the syslogLevel field (don't forget to remove the trailing comma on the line above it!). This should turn off syslog-based logging entirely, in favor of stdout logging, and should avoid this particular error.

@eap-nutanix
Copy link
Author

Thanks very much for jumping in so quickly. I added "-x" to the set command in test/entrypoint.sh for verbose output, and also removed sysloglevel from test/config/*.json and it still barfs:

bproxysql-1 | 2024-11-27 20:42:09 [INFO] Latest ProxySQL version available: 2.7.1-16-g2726c27
boulder-1 | +++ dirname test/entrypoint.sh
boulder-1 | ++ cd test
boulder-1 | ++ pwd
boulder-1 | + DIR=/boulder/test
boulder-1 | + rm -f /var/run/rsyslogd.pid
boulder-1 | + service rsyslog start
boulder-1 | * Starting enhanced syslogd rsyslogd
boulder-1 | rsyslog startup failure, child did not respond within startup timeout (60 seconds)
boulder-1 | ...done.
boulder-1 | + ./test/wait-for-it.sh boulder-mysql 3306
bmysql-1 | 2024-11-27 20:43:10 6 [Warning] Aborted connection 6 to db: 'unconnected' user: 'unauthenticated' host: '10.77.77.77' (This connection closed normally without authentication)
boulder-1 | Connected to boulder-mysql:3306
boulder-1 | + ./test/wait-for-it.sh bproxysql 6032
boulder-1 | Connected to bproxysql:6032
boulder-1 | + ./test/wait-for-it.sh bpkilint 80
boulder-1 | Connected to bpkilint:80
boulder-1 | + MYSQL_CONTAINER=1
boulder-1 | + /boulder/test/create_db.sh
boulder-1 |
boulder-1 | boulder_sa_test
boulder-1 | Already exists - skipping create
boulder-1 | Applied 0 migrations
boulder-1 | ERROR 1146 (42S02) at line 38: Table 'boulder_sa_test.paused' doesn't exist
boulder-1 | ERROR 1146 (42S02) at line 59: Table 'boulder_sa_test.paused' doesn't exist
boulder-1 | Added users from ../db-users/boulder_sa.sql
boulder-1 |
boulder-1 | boulder_sa_integration
boulder-1 | Already exists - skipping create
boulder-1 | Applied 0 migrations
boulder-1 | ERROR 1146 (42S02) at line 38: Table 'boulder_sa_integration.paused' doesn't exist
boulder-1 | ERROR 1146 (42S02) at line 59: Table 'boulder_sa_integration.paused' doesn't exist
boulder-1 | Added users from ../db-users/boulder_sa.sql
boulder-1 |
boulder-1 | incidents_sa_test
boulder-1 | Already exists - skipping create
boulder-1 | Applied 0 migrations
boulder-1 | Added users from ../db-users/incidents_sa.sql
boulder-1 |
boulder-1 | incidents_sa_integration
boulder-1 | Already exists - skipping create
boulder-1 | Applied 0 migrations
boulder-1 | Added users from ../db-users/incidents_sa.sql
boulder-1 |
boulder-1 | database setup complete
boulder-1 | + [[ 0 -eq 0 ]]
boulder-1 | + exec python3 ./start.py
boulder-1 | echo bin/akamai-purger bin/crl-checker bin/bad-key-revoker bin/notify-mailer bin/nonce-service bin/rocsp-tool bin/boulder-va bin/log-validator bin/contact-auditor bin/remoteva bin/expiration-mailer bin/boulder-publisher bin/boulder-sa bin/reversed-hostname-checker bin/boulder-ra bin/boulder bin/ceremony bin/boulder-wfe2 bin/admin-revoker bin/boulder-observer bin/id-exporter bin/admin bin/crl-updater bin/crl-storer bin/cert-checker bin/ocsp-responder bin/sfe bin/boulder-ca
boulder-1 | bin/akamai-purger bin/crl-checker bin/bad-key-revoker bin/notify-mailer bin/nonce-service bin/rocsp-tool bin/boulder-va bin/log-validator bin/contact-auditor bin/remoteva bin/expiration-mailer bin/boulder-publisher bin/boulder-sa bin/reversed-hostname-checker bin/boulder-ra bin/boulder bin/ceremony bin/boulder-wfe2 bin/admin-revoker bin/boulder-observer bin/id-exporter bin/admin bin/crl-updater bin/crl-storer bin/cert-checker bin/ocsp-responder bin/sfe bin/boulder-ca
boulder-1 | GOBIN=/boulder/bin GO111MODULE=on go install -mod=vendor -tags "integration" ./...
boulder-1 | ./link.sh
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Creating HTTP-01 challenge server on 10.77.77.77:80
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Creating HTTPS HTTP-01 challenge server on 10.77.77.77:443
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Creating TCP and UDP DNS-01 challenge server on :8053
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Creating TCP and UDP DNS-01 challenge server on :8054
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Creating DoH server on :8343
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Creating DoH server on :8443
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Creating TLS-ALPN-01 challenge server on 10.88.88.88:443
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Answering A queries with 10.77.77.77 by default
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Starting challenge servers
boulder-1 | pebble-challtestsrv - 2024/11/27 20:43:16 Starting management server on :8055
boulder-1 | panic: Could not connect to Syslog: Unix syslog delivery error [recovered]
boulder-1 | panic: Unix syslog delivery error
boulder-1 |
boulder-1 | goroutine 1 [running]:
boulder-1 | github.com/letsencrypt/boulder/log.initialize()
boulder-1 | /boulder/log/log.go:101 +0x75
boulder-1 | github.com/letsencrypt/boulder/log.Get.func1()
boulder-1 | /boulder/log/log.go:131 +0x19
boulder-1 | sync.(*Once).doSlow(0xc00036db20?, 0x1?)
boulder-1 | /usr/local/go/src/sync/once.go:76 +0xb4
boulder-1 | sync.(*Once).Do(...)
boulder-1 | /usr/local/go/src/sync/once.go:67
boulder-1 | github.com/letsencrypt/boulder/log.Get()
boulder-1 | /boulder/log/log.go:129 +0x2c
boulder-1 | github.com/letsencrypt/boulder/cmd.AuditPanic()
boulder-1 | /boulder/cmd/shell.go:366 +0x38
boulder-1 | panic({0x15b04e0?, 0xc000411c60?})
boulder-1 | /usr/local/go/src/runtime/panic.go:785 +0x132
boulder-1 | github.com/letsencrypt/boulder/cmd.Fail(...)
boulder-1 | /boulder/cmd/shell.go:395
boulder-1 | github.com/letsencrypt/boulder/cmd.FailOnError({0x198bf80, 0xc000411c30}, {0x177ee16?, 0x0?})
boulder-1 | /boulder/cmd/shell.go:408 +0xd4
boulder-1 | github.com/letsencrypt/boulder/cmd.NewLogger({0x7ffe43d01e78?, 0x10?})
boulder-1 | /boulder/cmd/shell.go:204 +0x9b
boulder-1 | github.com/letsencrypt/boulder/cmd.StatsAndLogging({0x0?, 0x0?}, {{0x0?, 0x7ffe43d01e78?}, 0x10?}, {0x7ffe43d01e96, 0x5})
boulder-1 | /boulder/cmd/shell.go:184 +0x3c
boulder-1 | github.com/letsencrypt/boulder/cmd/nonce-service.main()
boulder-1 | /boulder/cmd/nonce-service/main.go:104 +0x308
boulder-1 | main.main()
boulder-1 | /boulder/cmd/boulder/main.go:134 +0x432
boulder-1 | Starting service s3-test-srv
boulder-1 | Starting service nonce-service-zinc-1
boulder-1 |
boulder-1 |
boulder-1 | These processes exited early (check above for their output):
boulder-1 | '('./bin/boulder', 'nonce-service', '--config', 'test/config/nonce-b.json', '--addr', '10.77.77.77:9401', '--debug-addr', ':8112')' with pid 248 exited 2
boulder-1 exited with code 1

@jsha
Copy link
Contributor

jsha commented Nov 27, 2024

Here's where rsyslog is failing to start up:

boulder-1 | rsyslog startup failure, child did not respond within startup timeout (60 seconds)

removed sysloglevel from test/config/*.json and it still barfs:

I realized I gave you incorrect advice here. Looking at the code again, syslogging is set up if the syslog level is set to 0 or above. And omitting it sets it to zero. You'd need to set it to -1 to completely disable syslogging.

Alternately... Have you tried docker compose build, followed again by docker compose up? It's possible there's something broken with the docker image being started.

@eap-nutanix
Copy link
Author

Ok, I set all sysloglevels to -1, no change for that or after running docker compose build.
Just for fun, I copied the rsyslog.d/boulder.rsyslog.conf file to another Ubuntu 24.10 machine and rsyslog started in a couple of seconds with no errors. rsyslog -N1 is happy with the syntax. So far, I'm unable to duplicate the hang.
To be completely honest, I'm only seeking the OCSP responder functionality. Is there a way to bring up just that and skip the other bits?

@eap-nutanix
Copy link
Author

All my previous testing was on a VM. Today, I tried on a physical server running Ubuntu 24.04 and got much further. I did a docker compose up and got a bunch of errors about various attempts to open certificates. I poked around and ran docker compose up bsetup (found this in tests/certs/README.md) and tried docker compose up and things seemed to start. I see various services listening on localhost now:

 lsof -P -i
COMMAND     PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 35510            root    4u  IPv4  45363      0t0  TCP *:4001 (LISTEN)
docker-pr 35520            root    4u  IPv6  48542      0t0  TCP *:4001 (LISTEN)
docker-pr 35526            root    4u  IPv4  42537      0t0  TCP *:4002 (LISTEN)
docker-pr 35533            root    4u  IPv6  42542      0t0  TCP *:4002 (LISTEN)
docker-pr 35539            root    4u  IPv4  39540      0t0  TCP *:4003 (LISTEN)
docker-pr 35547            root    4u  IPv6  42549      0t0  TCP *:4003 (LISTEN)

So now onto making OCSP work!
I'm now trying:

openssl ocsp -issuer ca-chain.crt.pem -cert test.crt.pem  -url http://[::0]:4002 -text
OCSP Request Data:
    Version: 1 (0x0)
    Requestor List:
        Certificate ID:
          Hash Algorithm: sha1
          Issuer Name Hash: 0CFFDCBEFDE22E830CBC193CE18A1D914C8E3D29
          Issuer Key Hash: B19AAD9226C31884B729BD9BCDC89A9D58A241C1
          Serial Number: 216D22BF210838A95D95DEBD2ABE026637009D55
    Request Extensions:
        OCSP Nonce: 
            0410C83D90713496500A349C28CCA2898778
Responder Error: unauthorized (6)

so I still need to figure out that last bit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants