Possible Overalignment Issue for aarch64 (Apple M1): CachePadded Aligned to 128 Bytes #131019

Closed
xpepermint opened this issue Sep 29, 2024 · 2 comments

Comments

@xpepermint
While working on a project involving my own channel implementation, I noticed that data was being loaded incorrectly from atomic variables wrapped in CachePadded. The problem appeared only on aarch64, in my case on an Apple M1. After troubleshooting for a while, I began to suspect that the alignment of CachePadded could be contributing to this behavior.

In this case, the issue did not seem to be related to any specific Rust library object. I have used CachePadded in other parts of my code without encountering problems, but in this particular case, I was working with an Arc<Vec<Commit>>, where each Commit structure included fields like Weak<AtomicUsize> and AtomicUsize. Additionally, the vector had a fairly large capacity, which might have exposed the issue.

Currently, CachePadded aligns data to 128 bytes on aarch64, likely due to the assumption that prefetchers on modern CPUs, including ARM chips, can fetch multiple cache lines at once. However, as far as I know, the actual cache line size on Apple M1 is 64 bytes. This over-alignment could be introducing unnecessary padding, leading to inefficiencies or even the incorrect behavior I’ve observed. Reducing the alignment to 64 bytes, matching the actual cache line size, resolved this issue.
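To make the padding cost concrete, here is a minimal sketch comparing 128-byte and 64-byte alignment. `Padded128`, `Padded64`, and `vec_bytes` are hypothetical stand-ins written for illustration; they are not the crossbeam-utils definition of CachePadded, and the 1024-element length is an arbitrary assumption.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;

// Hypothetical wrappers for illustration only; not the real CachePadded type.
#[allow(dead_code)]
#[repr(align(128))]
struct Padded128<T>(T);

#[allow(dead_code)]
#[repr(align(64))]
struct Padded64<T>(T);

/// Bytes occupied by `len` contiguous elements of type `T`.
fn vec_bytes<T>(len: usize) -> usize {
    len * size_of::<T>()
}

fn main() {
    // A type's size is always rounded up to a multiple of its alignment,
    // so each wrapped AtomicUsize occupies a full 128 or 64 bytes.
    println!(
        "Padded128<AtomicUsize>: size={} align={}",
        size_of::<Padded128<AtomicUsize>>(),
        align_of::<Padded128<AtomicUsize>>()
    );
    println!(
        "Padded64<AtomicUsize>:  size={} align={}",
        size_of::<Padded64<AtomicUsize>>(),
        align_of::<Padded64<AtomicUsize>>()
    );

    // Element storage for an assumed 1024-element vector of each wrapper.
    let len = 1024;
    println!("128-byte slots: {} bytes", vec_bytes::<Padded128<AtomicUsize>>(len));
    println!("64-byte slots:  {} bytes", vec_bytes::<Padded64<AtomicUsize>>(len));
}
```

The larger alignment doubles the memory used per element. On its own, extra padding changes only the layout and memory footprint, not the semantics of atomic loads and stores.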

Unfortunately, I’m unable to share the full code due to a signed policy. However, I believe the alignment strategy for CachePadded on aarch64 is worth investigating, especially in situations like mine where high capacity vectors and atomic operations are involved. Adjusting the alignment for aarch64 to 64 bytes may help prevent similar issues from arising in other cases.

I’d appreciate it if someone could take a look or clarify whether this behavior is expected. From my testing, it seems that reducing the alignment to 64 bytes mitigates the problem on M1, though I’m not entirely certain if this is the root cause.

Ref: crossbeam-rs/crossbeam#1139

@rustbot added the needs-triage label Sep 29, 2024
@workingjubilee
Member

@xpepermint The Apple M1 has an asymmetric cache line size across its different cores and caches. In fact, it uses 128-byte cache lines on its non-efficiency cores.

However, this repository does not expose CachePadded, so you have filed the issue in the wrong place.

@workingjubilee closed this as not planned Sep 29, 2024
@jieyouxu removed the needs-triage label Sep 29, 2024
@thomcc
Member

thomcc commented Sep 29, 2024

See also the output of sysctl hw.cachelinesize on my M1 mac:

hw.cachelinesize: 128
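
For anyone who wants to check this from Rust rather than the shell, a minimal sketch follows. It shells out to `sysctl -n hw.cachelinesize`, so it only returns a value on systems that expose that key (macOS); `parse_cache_line_size` and `query_cache_line_size` are hypothetical helper names, not part of any library.

```rust
use std::process::Command;

/// Parses either `hw.cachelinesize: 128` or a bare `128`.
/// Hypothetical helper for illustration.
fn parse_cache_line_size(output: &str) -> Option<usize> {
    // Take whatever follows the last ':' (or the whole string if there is none).
    let value = output.trim().rsplit(':').next()?.trim();
    value.parse().ok()
}

/// Asks `sysctl` for the reported cache line size.
/// Returns None if the command is missing, fails, or prints something unexpected.
fn query_cache_line_size() -> Option<usize> {
    let out = Command::new("sysctl")
        .args(["-n", "hw.cachelinesize"])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    parse_cache_line_size(std::str::from_utf8(&out.stdout).ok()?)
}

fn main() {
    match query_cache_line_size() {
        Some(n) => println!("reported cache line size: {n} bytes"),
        None => println!("hw.cachelinesize is not available on this system"),
    }
}
```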
