-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect ADC readings after toolchain bump #567
Comments
Paging @Patryk27 (sorry for always bugging you with these topics!) @tronje, your best bet is building a minimal reproducer example that still triggers this compiler regression. You can do this by continuously removing code from your application until you cannot remove anything without the regression vanishing. Then people here can investigate where this problem originates from. |
Hi Rahix, thanks for your response.
I've done that, you can see it here: https://github.com/tronje/adc-read-compiler-bug I'm guessing it'd be possible to minimize it even further. For my setup, I have to set up a GPIO expander and pull two of its outputs for my pin of interest to report correct values. I'd guess any other pin should exhibit the same bugged behavior. |
Hi, thanks for pinging! I'll check it tomorrow 🙂
No worries, glad to be helpful 😄 |
Alright, I've been comparing the disassembly between the working and broken firmware, and one important change that stands out is the inlining of Now, while on its own inlining shouldn't cause any issues (or rather, if inlining a function breaks the binary, then it's a bug in the codegen), narrowing the scope down is always a step forward. So, @tronje would you mind compiling your code with this updated [build]
target = "avr-specs/avr-atmega164pa.json"
rustflags = ["-Zcross-crate-inline-threshold=0"]
[unstable]
build-std = ["core"]
build-std-features = ["compiler-builtins-mangled-names"] |
Hey @Patryk27, thanks a lot for taking the time to look into this! Unfortunately, your suggested change made no difference. I get the correct values with the old toolchain, and mostly 0's with the newer toolchain, just like before. |
@tronje does the issue still occur if you compare the values instead of printing them? Like: loop {
let val = pin.analog_read(&mut adc);
if val >= 128 {
ufmt::uwriteln!(serial, "high").ok();
} else {
ufmt::uwriteln!(serial, "low").ok();
}
delay_ms(100);
} (trying to narrow it down as to whether it's the measurement that's wrong or rather the printing part) |
Your example always prints loop {
let val = pin.analog_read(&mut adc);
if val == 0 {
ufmt::uwriteln!(serial, "zero\r").ok();
} else if val < 5 {
ufmt::uwriteln!(serial, "small\r").ok();
} else {
ufmt::uwriteln!(serial, "large\r").ok();
}
delay_ms(100);
} Which mostly prints |
I see, thanks for checking! I'll try reproducing the issue in simavr and report back. |
No, thank you for taking all this time to help me out! I really, really appreciate it. |
Alright, I've been testing the firmware under simavr and I can't reproduce the problem there (i.e. everything behaves correctly). Let's try another angle:
|
Looks that way;
Doesn't make a difference. |
I looked at the assembly, but couldn't find anything obvious. Could you upload your verified working and broken versions in case it's system dependent? EDIT: that was on the original version, didn't see that you simplified it further |
Yep. Anyway, I've dumped the binaries using the following commands:
and attached them here. built-with-nightly-2023-12-28.txt |
I compared those dumps with mine and they're bit for bit identical. |
Status: since there were a couple of codegen bugs uncovered and fixed recently, and since I can't reproduce the issue in simavr, I think it makes sense to wait until #573 is resolved (which is stalled on llvm/llvm-project#106993) and see on the newest rustc. |
@tronje would you mind rechecking on the newest toolchain? (e.g. from today) Lots of issues got fixed in the meantime and while I can't reproduce the issue locally, my hopes are that maybe one of those fixes happened to cover what was going on here. |
Looks like it's still broken. Digital reads still always return
|
I did a little binary search and found the first toolchain that introduces the bug. The last working toolchain is I'll start comparing assembly outputs and changes between the two toolchains. |
Small correction: the So
edit (don't want to spam with too many comments): both versions use the same LLVM version, 17.0.6. The git log in the |
I did a
I had to
Anyway, about the possible culprit-commits. They all seem to be from the same merge request. 432fff is the merge-commit that merges them, c2fd26 seems to do the primary work, and the other two are smaller fixes. I'll admit, I have no real idea what's going on here, but I'm guessing it may be time to take this to the main rust project? For your convenience, here are the commits I mentioned as GitHub links:
edit: this is the pull request were these changes were introduced: rust-lang/rust#118991 |
So I'm not a compiler guy at all, but I built the current diff --git a/compiler/rustc_codegen_llvm/src/abi.rs b/compiler/rustc_codegen_llvm/src/abi.rs
index 2fe5ed32daa..031daa843be 100644
--- a/compiler/rustc_codegen_llvm/src/abi.rs
+++ b/compiler/rustc_codegen_llvm/src/abi.rs
@@ -5,7 +5,7 @@
use rustc_abi::Primitive::Int;
use rustc_abi::{HasDataLayout, Size};
use rustc_codegen_ssa::MemFlags;
-use rustc_codegen_ssa::mir::operand::{OperandRef, OperandValue};
+use rustc_codegen_ssa::mir::operand::OperandValue;
use rustc_codegen_ssa::mir::place::{PlaceRef, PlaceValue};
use rustc_codegen_ssa::traits::*;
use rustc_middle::ty::Ty;
@@ -248,7 +248,7 @@ fn store(
bx.lifetime_end(llscratch, scratch_size);
}
_ => {
- OperandRef::from_immediate_or_packed_pair(bx, val, self.layout).val.store(bx, dst);
+ OperandValue::Immediate(val).store(bx, dst);
}
}
} That's on top if this state: rust-lang/rust@55b7f8e I was just grasping at straws, basically, and got lucky. Can't really say I understand what the change does, though 😬 |
Thanks for the thorough report! I'm going to NixCon tomorrow, so I won't be able to take a look at it over the weekend, but I'll try to take a stab the next week. Having all this here should be sufficient to reproduce the issue locally (or at least be able to compare the differences in generated assembly). If you find anything else, of course please do write it down 😄 |
Thank you! Have fun at NixCon 🙂 I have pushed a handful of commits to my example program. We also observed a few differences in the generated assembly. The two toolchains we compared were both based on rust-lang/rust@55b7f8e, one with the diff I mentioned above, and one without. So, very similar compilers. The one with the fix generates a few extra instructions in the code for push r28
push r29
in r28, 0x3d ; 61
in r29, 0x3e ; 62
sbiw r28, 0x01 ; 1
in r0, 0x3f ; 63
cli
out 0x3e, r29 ; 62
out 0x3f, r0 ; 63
out 0x3d, r28 ; 61 And towards the end, seemingly the reverse: push r28
push r29
in r28, 0x3d ; 61
in r29, 0x3e ; 62
sbiw r28, 0x01 ; 1
in r0, 0x3f ; 63
cli
out 0x3e, r29 ; 62
out 0x3f, r0 ; 63
out 0x3d, r28 ; 61 The toolchain without the fix, so just the clean git state, does not generate this code, but the functions look almost identical otherwise. The other thing we noticed is that the Maybe that's something to go on, no idea. I'll attached my assembly dumps here. |
While that difference is interesting, I don't think that's the cause of the bug, for 2 reasons:
|
Ah, too bad. Thanks for the insight! 🙂 |
Seems like the inlining of |
Alright, I'm leaning towards a hardware quirk here - which I know sounds a bit crazy, but hear me out 😄 I've been looking at the firmware, analyzing the assembly by hand - and while the output is a bit different between toolchain versions, that's mostly because LLVM does a better job at inlining constant expressions within Overall the newest binary looks correct, I'd say. Since static eye-analysis can only get one so far, I've took a moment to add TWI (aka I2C) support to use avr_tester::AvrTester;
#[test]
fn test() {
let mut avr = AvrTester::atmega164pa()
.with_clock(10_000_000)
.load("/home/pwy/t/projects/adc-read-compiler-bug/target/avr-atmega164pa/release/adc-read-compiler-bug.elf");
avr.twi0().attach_slave_fn({
let mut reg = None;
let mut regs = [0; 256];
move |packet| {
if packet.is_start() {
return Some(packet.respond_ack());
}
if packet.is_stop() {
return Some(packet.respond_ack());
}
if packet.is_write() {
if let Some(dev_reg) = reg.take() {
println!("write_register({}, {})", dev_reg, packet.data);
regs[dev_reg] = packet.data;
} else {
reg = Some(packet.data as usize);
}
return Some(packet.respond_ack());
}
if packet.is_read() {
let reg = reg.take().expect("No register picked");
let reg_val = regs[reg];
println!("read_register({}) = {}", reg, reg_val);
return Some(packet.respond_data(reg_val));
}
None
}
});
avr.run_for_ms(1500);
avr.pins().pa0().set_high();
avr.run_for_ms(1500);
panic!("{}", avr.uart0().read::<String>());
} ... and the output looks exactly the same on both the older and the newer firmware:
Now, of course it's not a definite test - simavr could be emulating something differently (I know of buserror/simavr#453, for which I'm applying a workaround within My guess is that because the newer code uses substantially less instructions between the calls to Another totally random observation is in regards to avr-hal/avr-hal-generic/src/i2c.rs Line 400 in 15080aa
... which I guess makes sense, but for instance Arduino implementations usually (seem to) use two separate transactions: Maybe it would be worthwhile to check whether we shouldn't be doing two txs as well, like: fn write_read(
&mut self,
address: u8,
bytes: &[u8],
buffer: &mut [u8],
) -> Result<(), Self::Error> {
self.p.raw_start(address, Direction::Write)?;
self.p.raw_write(bytes)?;
self.p.raw_stop()?;
self.p.raw_start(address, Direction::Read)?;
self.p.raw_read(buffer)?;
self.p.raw_stop()?;
Ok(())
} |
That all sounds pretty plausible. Thank you so much for all that effort! I will try to play around with the I2C, I suppose. Will start with your suggestion, then maybe check if I can't get our logic analyzer connected to the hardware. |
I don't think the write_read implementation should create any problems as that is explicitly supported. I also don't see why writing immediately after reading should cause any problems, especially because you're only running it at 100kHz while the MCP supports up to 1.7MHz. So even if the issue is resolved by changing the I2C code, I think it's a good idea to check whether the root cause might be something else (similar to how inlining prevents/causes the issue). |
Hi! We're using avr-hal and avr-device with an atmega164pa uC.
We tried bumping our Rust toolchain, as you have done with 2eb28fa.
This causes incorrect ADC readings. We've literally changed nothing else, and suddenly an
analog_read
onPin<Analog, PA0>
returns usually0
, sometimes1
, but never a value in the hundreds, as would be expected in our situation.I guess it could be due to some
unsafe
weirdness in our codebase, but I mean the ADC is set up normally, and then it's just a call toanalog_read
, really. Not sure how much of our codebase I can share; I could probably build a minimal example to show the behavior. Just figured I'd ask first if anyone has any ideas about this. Works just fine onnightly-2023-12-28
.Thanks!
The text was updated successfully, but these errors were encountered: