Panic in the Upstairs leaves an instance in a zombie coma #1652

Open
leftwo opened this issue Aug 29, 2022 · 0 comments

While testing core files for this issue, I made the crucible upstairs panic on purpose. After the panic I did get a core file, and I can see that the propolis-server service restarted:

Aug 29 17:15:26.270 INFO accepted connection, remote_addr: [fd00:1122:3344:101::1]:63263, local_addr: [fd00:1122:3344:101::c]:12400
Aug 29 17:15:26.271 INFO request completed, response_code: 101, uri: /instance/serial, method: GET, req_id: a759e184-bbb3-4243-b7d9-0626bac14639, remote_addr: [fd00:1122:3344:101::1]:63263, local_addr: [fd00:1122:3344:101::c]:12400
Aug 29 17:15:55.387 INFO rdmsr, msr: 3221291675, sync_task: vcpu-1, component: vmm
Aug 29 17:15:55.387 INFO rdmsr, msr: 3221291673, sync_task: vcpu-1, component: vmm
Scrub at offset 335616/4194304 sp:335616
thread 'tokio-runtime-worker' panicked at 'We are going to panic now!', /home/alan/ws/crucible/upstairs/src/volume.rs:402:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Aug 29 10:17:39 Stopping because all processes in service exited. ]
[ Aug 29 10:17:39 Executing stop method (:kill). ]
[ Aug 29 10:17:39 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/propolis-server/bin/propolis-server run /var/svc/manifest/site/propolis-server/config.toml [fd00:1122:3344:101::c]:12400 --metric-addr [fd00:1122:3344:101::3]:12221 &"). ]
[ Aug 29 10:17:39 Method "start" exited with status 0. ]
Aug 29 17:17:39.784 ERRO could not query reservoir Os { code: 1, kind: PermissionDenied, message: "Not owner" }
Aug 29 17:17:39.784 INFO Metrics server will use InstanceMetricsConfig { propolis_addr: [fd00:1122:3344:101::c]:12400, metric_addr: [fd00:1122:3344:101::3]:12221 }
Aug 29 17:17:39.784 INFO Starting server...
Aug 29 17:17:39.785 INFO listening, local_addr: [fd00:1122:3344:101::c]:12400

However, my instance did not come back. The console and API think it's running:

alan@atrium:omicron-files$ oxide instance view -o myorg -p myproj debian
 time_run_state_updated | 3 minutes ago                        
 time_modified          | 3 minutes ago                        
 time_created           | 3 minutes ago                        
 run_state              | running                              
 project_id             | 637caf31-5b5b-41d7-ab16-6175dd1b98a5 
 ncpus                  | 2                                    
 memory                 | 1073741824                           
 hostname               | debian                               
 description            | debian                               
 name                   | debian                               
 id                     | f71d0d33-7b1c-47dd-888f-dcf3aa5b5b85 

I attempted to stop it, but the stop request fails:

alan@atrium:omicron-files$ oxide instance stop -o myorg -p myproj debian
Type debian to confirm stop:: debian
✘ Oxide API internal error: Internal Server Error

The propolis log reports:

Aug 29 17:20:25.096 INFO request completed, error_message_external: Internal Server Error, error_message_internal: Server not initialized (no instance), response_code: 500, uri: /instance/state, method: PUT, req_id: 05b58734-2f88-4864-b279-123d33ffc83f, remote_addr: [fd00:1122:3344:101::1]:53026, local_addr: [fd00:1122:3344:101::c]:12400
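
The mismatch above (the API records `running` while the restarted propolis-server has no instance at all) can be stated as a small check. This is only a sketch of the condition, not code from Omicron or Propolis: the types `RecordedState`, `PropolisReply`, and `Verdict` and the function `classify` are hypothetical names invented for illustration. The only details taken from the logs are the 500 response code and the internal message text.

```rust
/// Run state as recorded by the control plane (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecordedState {
    Running,
    Stopped,
}

/// Outcome of a state-change request sent to propolis-server (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
enum PropolisReply {
    Ok,
    Error { code: u16, internal: String },
}

/// Result of comparing the recorded state with what propolis-server says.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    /// The request succeeded, so the two views agree.
    Consistent,
    /// Recorded as running, but the server reports no instance.
    Zombie,
    /// Any other failure; not enough information to decide.
    Unknown,
}

/// Classify the situation seen in this report: a 500 whose internal message
/// is "Server not initialized (no instance)" while the record says running.
fn classify(recorded: RecordedState, reply: &PropolisReply) -> Verdict {
    match (recorded, reply) {
        (_, PropolisReply::Ok) => Verdict::Consistent,
        (RecordedState::Running, PropolisReply::Error { code: 500, internal })
            if internal.contains("Server not initialized (no instance)") =>
        {
            Verdict::Zombie
        }
        _ => Verdict::Unknown,
    }
}
```

With the values from the log line above, `classify(RecordedState::Running, &reply)` yields `Verdict::Zombie`. What the control plane should do on that verdict is a separate question that this sketch does not answer.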

Another attempt to stop it results in the command hanging (I gave up after 10 minutes):

alan@atrium:omicron-files$ oxide instance stop --confirm -o myorg -p myproj debian
⠁ Waiting for instance status to be `stopped`

leftwo changed the title from "Panic in the Upstairs leaves an instance in a coma" to "Panic in the Upstairs leaves an instance in a zombie coma" on Aug 30, 2022