Data corruption on 9.013.02 ? #59

Open
ewaldc opened this issue Sep 26, 2024 · 1 comment

ewaldc commented Sep 26, 2024

I am in the process of rewriting the r8125 driver, starting from your latest version, 9.013.02. The goal is to bring it closer to the mainline r8169 driver, reduce its size, and improve performance while lowering system load, all while keeping the additional features such as RSS, PTP, etc.
While testing the three versions side by side (9.013.02, the new r8125, and mainline r8169 6.8 backported to 6.1 to fix several data corruptions), I noticed that a few of the tests failed with 9.013.02.
The test platform is a Rockchip rk3588 system with dual 8125B running Ubuntu Jammy 24.04. All drivers were compiled with the gcc 14.2.0 cross compiler for arm64 and tested with MTU 1500/9000 in each round of testing. For 9.013.02, everything was left untouched, including the Makefile. The kernel is 6.1.

On one of the more challenging tests (a 500 MB zstd-compressed archive containing many binary files):

wget "http://ports.ubuntu.com/pool/main/l/linux-firmware/linux-firmware_20240913.gita34e7a5f-0ubuntu2_arm64.deb"

both the mainline r8169 and the new r8125 driver produce the correct sha512 checksum (cksum -a sha512) of 3956d1faa6a5aebf624a42d39883594b70562d3919545cb4c18c44842df16eceddbc1a79867482c88310f63cc5033b13e1590472a4e11695d5a75ee23683905f, but the original 9.013.02 driver gives a different checksum on each download. NOTE: during the wget, a shell script also continuously uploads binary files to the test platform, adding to the stress. This can be replaced by manual sftp/scp transfers, e.g. using FileZilla or scp.
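
For anyone who wants to repeat the check several times in a row, here is a minimal sketch of how it could be scripted. It is not the script used for the results above: the function names (`sha512_of`, `check_file`, `run_rounds`) and the `ROUNDS` variable are made up for illustration, it uses `sha512sum` instead of `cksum -a sha512` (both print the same digest, only the output layout differs), and it does not start the concurrent upload load, which has to be run separately.

```shell
#!/bin/sh
# Sketch only: repeated download + SHA-512 comparison.
# URL and EXPECTED are taken from the report above; everything else is hypothetical.
URL="http://ports.ubuntu.com/pool/main/l/linux-firmware/linux-firmware_20240913.gita34e7a5f-0ubuntu2_arm64.deb"
EXPECTED="3956d1faa6a5aebf624a42d39883594b70562d3919545cb4c18c44842df16eceddbc1a79867482c88310f63cc5033b13e1590472a4e11695d5a75ee23683905f"

# Print only the hex digest of a file (sha512sum prints "digest  filename").
sha512_of() {
    sha512sum "$1" | cut -d' ' -f1
}

# check_file FILE DIGEST: succeed only if FILE hashes to DIGEST.
check_file() {
    [ "$(sha512_of "$1")" = "$2" ]
}

# run_rounds N: download the archive N times and count bad rounds.
run_rounds() {
    rounds=$1
    bad=0
    i=1
    while [ "$i" -le "$rounds" ]; do
        tmp=$(mktemp) || return 2
        if wget -q -O "$tmp" "$URL" && check_file "$tmp" "$EXPECTED"; then
            echo "round $i: OK"
        else
            echo "round $i: checksum mismatch or download error"
            bad=$((bad + 1))
        fi
        rm -f "$tmp"
        i=$((i + 1))
    done
    echo "$bad of $rounds rounds failed"
    [ "$bad" -eq 0 ]
}

# Only touch the network when asked to, e.g.: ROUNDS=10 sh verify.sh
if [ -n "${ROUNDS:-}" ]; then
    run_rounds "$ROUNDS"
fi
```

Running it once per driver and per MTU setting, with the upload load active in another shell, should make it easy to compare failure counts between the three drivers.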

Is there any chance you could run this test? If it is not reproducible, I need to find the mistake on my side or look into external factors such as gcc version, kernel version/corruption, system HW, etc. Alternatively, if confirmed, it would be good for users to know...

Thanks in advance.

ewaldc commented Dec 22, 2024

I managed to find several root causes for these corruptions.

  • With lots of small packets, frags could contain the wrong value. This also affected the mainline r8169 driver.
  • Memory barriers were missing and/or incorrectly positioned, and some additional READ_ONCE/WRITE_ONCE were required, causing issues on (fast) CPUs with 6+ cores like the RK3588.

I posted a rewrite of the driver here. It is now much closer to the mainline r8169 version, about half the size, and consumes less memory and CPU resources.
