deleter: inconsistency after AGNOS11 update (NVMe related) #33993

Open
Tracked by #34087
AlexandreSato opened this issue Nov 12, 2024 · 15 comments

Comments

@AlexandreSato
Contributor

AlexandreSato commented Nov 12, 2024

Describe the bug

I suspect that the deleter is no longer consistent after the AGNOS11 update. I haven't tested it yet to confirm, but I've already caught four people who had to delete realdata/* manually (because of a noEntry "out of storage" event).

Provide a route where the issue occurs

dongleIDs: 0b54d0594d924cd9, 075b133b6181e058, a0e5673b89222047

075b133b6181e058/000000cc--7d1fff3f36/42

openpilot version

1376096

Additional info

Not confirmed


@adeebshihadeh adeebshihadeh added this to the 0.9.8 milestone Nov 12, 2024
@adeebshihadeh
Contributor

Got any routes or dongles?

@AlexandreSato
Contributor Author

Got any routes or dongles?

Only some dongle IDs; I've added them to the description.

@nelsonjchen
Contributor

Got this error on this route:

https://connect.comma.ai/fe18f736cb0d7813/0000040d--3a202cd3b9

It actually increased by 1% after I couldn't engage anymore either. Logs/data are uploading.

I pretty much only run master-ci so this was surprising.

@AlexandreSato
Contributor Author

Got one segment where it happened:
075b133b6181e058/000000cc--7d1fff3f36/42

Interestingly, when I connected to my phone's hotspot to SSH in and run the deleter test script, the deleter started working the moment the connection was established, without me running any command, and the out-of-storage event cleared. I'll upload the route when I get home.

@korhojoa

My routes don't appear in Connect, so I can't really help there, but I got it with dongle a954cbd0682c58bd.

@michaelhonan
Contributor

Yup, mine's doing the same. Hoping the new AGNOS build in master will address it...

My dongle ID is likely useless, since I no longer have a data connection on my C3:
dc30ef2ad294a732

@adeebshihadeh
Contributor

The new AGNOS doesn't address this. Just looking into this now.

So far it seems to be all devices with NVMe drives, so it's likely a race condition in (or failure of) mounting it. Have you guys gotten the "failed to mount" alert at all?
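If the NVMe mount races with (or fails before) the deleter's free-space check, the deleter would be reading numbers for the wrong filesystem: when nothing is mounted at the mount point, `shutil.disk_usage` reports the underlying filesystem instead. A minimal sketch of a guard against that, assuming a deleter that decides based on the free-space percentage of a mount point. The function names, the `not_mounted`/`low`/`ok` statuses, and the 10% threshold are hypothetical illustration, not openpilot's actual deleter code.

```python
import os
import shutil


def free_percent(path: str) -> float:
    """Free space on the filesystem containing `path`, as a percentage of its total size."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def check_storage(mount_point: str, min_free_percent: float) -> str:
    """Hypothetical deleter guard: classify storage state before acting on it.

    Checking the mount first matters: if the NVMe isn't mounted yet (or the
    mount failed), disk_usage() would describe whatever filesystem sits
    underneath the mount point, not the NVMe itself.
    """
    if not os.path.ismount(mount_point):
        return "not_mounted"
    if free_percent(mount_point) < min_free_percent:
        return "low"  # a real deleter would start removing old segments here
    return "ok"


if __name__ == "__main__":
    # "/" is a stand-in; substitute the device's actual NVMe mount point.
    print(check_storage("/", 10.0))
```

Returning a distinct `not_mounted` status (rather than treating it as "low" or "ok") would let the caller retry later or raise an alert, instead of deleting from, or filling up, the wrong device.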

@adeebshihadeh adeebshihadeh changed the title deleter: inconsistency after AGNOS11 update deleter: inconsistency after AGNOS11 update (NVMe related) Nov 18, 2024
@michaelhonan
Contributor

Thanks for confirming!
I personally have not seen an error like that.

@nelsonjchen
Contributor

I have not seen that error.

@AlexandreSato
Contributor Author

I haven't seen that error either.

@AlexandreSato
Contributor Author

AlexandreSato commented Nov 18, 2024

Grabbed one more: a0e5673b89222047/0000003d--2ac5e373e1/8, or a0e5673b89222047/0000003c--0133a69a87/3 according to the user's description.

@adeebshihadeh adeebshihadeh mentioned this issue Nov 22, 2024
@pbassut
Contributor

pbassut commented Nov 23, 2024

Not sure if this is related, but my C3X wasn't uploading anything. After I deleted realdata, it started uploading again.

@adeebshihadeh
Contributor

adeebshihadeh commented Nov 25, 2024

Just started looking into this:

075b133b6181e058/000000cc--7d1fff3f36/42

  • eventually recovers towards the end of the drive, almost like the deleter was stuck
  • no rlogs
  • bootlog for this route is missing, not on the NVMe?

fe18f736cb0d7813/0000040d--3a202cd3b9

  • doesn't recover like the other route did
  • thanks for uploading all the logs @nelsonjchen!
  • has the bootlog, and NVMe is mounted for that bootlog

@AlexandreSato
Contributor Author

I missed it when I flashed AGNOS during the sound issue test.

@AlexandreSato
Contributor Author

Got some routes:

075b133b6181e058/0000003e--67e218c522/1
075b133b6181e058/0000003d--2e512616b9/3
I think 0000003f has something interesting too.

075b133b6181e058/0000003a--3043e775b6/25


6 participants