-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Give more time to reset the BL-Touch after a failure to raise the probe #2655
Conversation
I can only test this in a few days, printer is busy right now. So I can only just comment tentatively: c367667 Yes, although one hardly notices this. Currently pin goes up and I suppose if you look closely, you can see that the head also begins to move at the same time. Actually, in one version of my test code, I was purposely doing something like that to speed up the retract move + pin up operation by having them happen simultaneously. Now, with no retract between samples, this would not be an advantage. For users who do retract between samples the effects of this change are zero as there would already have been a 4849414 Oh, yes. But for safeties sake maybe one could also apply the 1s reset time at this place also:
|
FYI, this change is only to multi_probe_end() so it would only occur at the very end of the probing tool. It should therefore be an inconsequential slow-down. The goal of this change is to ensure we don't start a print while the pin is still rising.
I'm only expecting that code path to run on really old clones. (I'd expect most users to leave pin_up_reports_not_triggered at its default.) I don't know what the sequence should be for the old clones (that can't verify the pin has been raised), so I just left it as it was before. I suspect you have more knowledge of what should be done on these old clones - so if you know of a preferred timing then that would be great. I'd implement it as a separate commit though. Thanks, |
I was aware of that. I only meant to say that: If someone does not retract between probes there is no advantage to a hack that I was playing with, namely to raise the pin and to start moving quasi simultaneously instead of serially. It was a bit OT, so sorry.
Needs to be done. I agree. |
Actually that is not the case But the internal processing that needs to be done in such a probe to effect a reset and perform the internal setup won't be so much different from probe type to clone type - and it would be safe to say that the longer the reset command itself is allowed to perform before further commands are sent to the probe, the better. Note that I am not talking about the duration of the command being held as PWM but rather the grace time after the command has been presented to the probe and a new command is given. In Marlin, both are equal as the PWM is upheld. In klipper one could easily implement this as command-time (with PWM active) followed by wait-time (with PWM inactive). I.e. command recognition time and command execution time. But I digress… FYI, even at the current state I consider the On the other hand, if someone has a faulty/flaky setup, of course he will run into the recovery scenarios but even if they are resolved well, he won't be happy and should need to fix his hardware. |
Agreed, but note that code path is not a failure case. It's the normal "raise the probe" code path. So, if we wait 1s to issue the reset then everyone with one of these old clones now waits an additional second on every probe. An alternative would be to just remove the "reset", but it doesn't seem to harm anything and it may help on some old clones that can clear state quickly. Another alternative would be to remove the pin_up and just issue the "reset" for pin_move_time. However, it was my understanding that some old clones do not raise the pin on a reset command. -Kevin |
Even ANTClabs have not reacted to my reports that RESET on their probes will also not always STOW the pin (contrary to their claims) when the probe is in a certain state.
It is, in the Oh shucks. We need to raise as fast as possible if we have just touched the bed on a downward probe. Either that or/and move the nozzle up as fast as possible. I should have played around with the original code more than I did. That's also why in Marlin I decided to just raise the probe and then after that, checking for an error condition ("is it raised?"), which raises right away and then spends some time checking, recovering Does this explain why there is sometimes an observation and even an issue that a certain amount of think time goes by after the bed touch before the pin is taken up? 100ms or so? I mean beyond the other problems that were fixed (race condition etc.) On wednesday I will have the opportunity to test this PR and I will have a look at this specific topic as well. I quote myself from #2653
|
Let me repeat: it works quite well already. And things get complicated very quickly if you want to cover all the bases on this. |
No, because that code-path is only called when the user has explicitly set On the other hand, if a user has mistakenly set that config option, then I think it could cause issues. If that is the case I suspect the solution would be to update the documentation to make it more clear that config option really should not be set unless absolutely necessary.. -Kevin |
That's a good point. Separate to this topic, the test_sensor() code path wasn't actually running pin_up - it was assuming reset would do the pin_up. Since test_sensor() is an infrequent code path, I think it would make sense to run the full reset followed by the full pin_up sequence. I added commit 3566ed3 to this branch. -Kevin |
3566ed3
to
2f5d2a3
Compare
Yes. So if "pin-up, touch mode, verify" fails, we get a retry with "reset, pin-up,...." which is exactly what the doctor ordered. In my experience, no further retries are needed, that single one should do it. Otherwise something is actually broken. But it doesn't hurt to retry a couple of times. The "reset, pin-up" sequence is important as a replacement for all places where a "reset" is needed conceptually. Looking forward to testing the PR on Wednesday |
Thanks. I don't know if you have access to a BLTouch v3.0 and BLTouch v3.1. If you do, it would also be good to see if It would be great if we could document that both -Kevin |
I have a 3.0, a pre APR5/19 3.1, a current 3.1, a 2.0 and a clone as well.
That I can understand very well and I will put it on the agenda. |
BTW this Since we (for example) concat other commands directly behind each other, or look at Marlin, keeping the PWM active, couldn't we have the |
Yes, and a delay of some kind is needed between the scheduling of two PWM updates even if the update is to turn off the PWM. Otherwise, it's possible to get obscure "Timer too close" errors in some corner cases. The reason is that the Klipper mcu code only supports queing one pwm update at a time within the mcu (if the schedule is deeper then additional updates get queued in the host serial buffer first). This is discussed a bit in #133. Currently Klipper always uses 100ms for this "minimum pin time" delay - I'm sure other values are possible, but I'm not sure it is worth optimizing given the severity of the error (and difficulty to test for it). That said, it is not necessary to delay future toolhead moves for this period. (It's only necessary to ensure no future pwm updates occur during this delay.) So, it may be possible to optimize sync_print_time() to avoid adding -Kevin |
FYI, I added that to this branch as commit 5271f35. -Kevin |
5271f35
to
9d8d099
Compare
FYI - I also added another minor optimization (primarily in commit 9d8d099) to this branch. Normally the "pin_up" command comes after the homing code has fully "finalized" which includes querying the "mcu stepper position" for each of the Z steppers - which takes a few milliseconds per stepper. This commit should allow the host to send the pin_up slightly faster (in the stow_on_each_sample=True case). This change, though, is a bit tricky and I don't have a good way to test it. -Kevin |
9d8d099
to
5c46a79
Compare
Ok, I am set up and have started testing. In loose order I will report everything I am finding out, to remove some misconceptions floating around. Currently using a After-Apr5/19 BLTouch V3.1 and when I move to older probes I will check these again. Step 1: possible difference when holding PWM (case 1) or turning PWM off (case 2) after the (Oh, and how am I doing this, sorry, I am using a modified Marlin's LCD commands to send commands directly to the probe, very convenient, with a judicious The following holds true for BOTH cases:
So it is NOT the held PWM which causes the probe to "want to be deployed" Step 2: What's with this RESET procedure???: Once again testing these by issuing single commands to the probe and using my finger to tickle the pin up and down Here's some (maybe surprising facts): If the probe is deployed and NOT in an alarm state, you can RESET it to your hearts desire, but it won't stow the pin. In that case you need to issue a STOW to stow the pin (if it is possible). If the probe is in ALARM state, a RESET will stow the pin (if it can). Here's a real tricky one: If you are in SW mode, a RESET will not stow the pin (because in SW mode you cannot ever get into an ALARM state. A RESET will turn off the SW mode in part only, as now the trigger will not respond any more on pin-push-up. You are now in a strange half-SW mode, pushing in the pin doesn't cause a trigger signal and the probe is more or less "dead". It's a BUG. An SW mode command gets you back into full SW mode, or a STOW gets you back into normal mode (and the probe will attempt to stow the pin, which might fail), or a DEPLOY will also work, but it will attempt to deploy the pin, which might not be so attractive and could fail. Conclusion: If you want to reliably reset and stow the probe when you don't know what state it is in, you need to take care of these above quirks: First, you do a STOW. This would stow the probe even if it is not in an ALARM condition. It would also cleanly terminate an open SW mode or half SW mode. Second, in an ALARM condition the previous STOW would have failed (be ignored), so now a RESET is done. This will also stow the pin if it happens to be deployed. That would the following sequence:
Why the 550ms wait time on the STOW (just as for a DEPLOY, by the way)? A pin movement command might fail because something is physically broken. Then it will take the probe about 550ms of internal retries to recognize that and then go into an ALARM condition. The final RESET will now reset such an ALARM condition and try to stow the probe. This might then finally result in an ALARM condition once again which should be checked. Edit: |
Trigger behaviour in normal mode when deployed: BLTouch V3.1: I think this is what makes the V3.1 a little finnicky in terms of how fast you need to STOW or move the nozzle up before it goes into ALARM mode. Slow controllers/printers might prefer SW mode for this. BLTouch V3.0: BLTouch V.20: Same behaviour as the V3.0 but signal is "more powerful" - I won't discuss what that means right here, no time, no space. Clone (3D-Touch): Interesting differences: When triggered, it issues a single 10ms pulse, (powerful) and turns on the red led. It doesn't do anything to the pin (it limply drops down again). It never triggers again until you do the next DEPLOY command (even though the pin might be down). It is the only probe that can have the red LED turned on when the pin is down. SW-Mode works as expected on all the above probes. |
@KevinOConnor MULTI is used (no stow between probes): works fine MULTI is off (stow between probes):
error:
|
Update: If I turn off the reactor to raise the probe on a trigger, it works fine. I will continue with my testing while maybe you can check that last commit: 5c46a79 |
Intermediate result
All of my probes including the clone want:
Setting one of these to false is semantically equivalent to turning off one of the checks being done. I realize that the names describe exactly what feature/behaviour is referred to, but for the less informed user, these are cryptic. I would much prefer:
Change the eror message in Same goes for the verify, only here the error message is already "compatible". Edit: Or of course perhaps I prefer the opposite logic:
|
Intermediate result In unhomed state,
I'll open an issue, mabye. This is potentially destructive, the probe should never be deployed in an XY unhomed situation. |
Observation If the probe is in an ALARM state, for whatever reason, then deploy actions will obviously fail:
Just like with raise probe, a (perhaps) single retry would be a nice thing and this would get over the blinking state without needing to resort to Some users are putting a sequence of Example:
|
Observation I have reduced second homing speed to 1, probe speed and lift speed to 1, trying to make everything crazy slow - hoping to reproduce the problem (the imagined? problem) where not lifting up again fast enough might cause a problem. No such luck with this version of |
All of the above comments hold valid for all my probes. I am getting tired of changing them over and over again and the little plugs and wires are wearing out. IMHO, apart from some exotic clones, the BLTouch V3.0 is probably the most critical one, because of its weak signal driver. The BLTouch V3.1 comes a close second, because it wants to be STOW-commanded or nozzle-retracted in such a timely fashion after trigger. |
Test and observation I have, for stress testing purposes, done the following: In
This has the (for testing purposes) desired effect of delaying the Pin touches bed. Movement stops ( good! ), By reducing the number of Result I can put a maximum of four That means, I have 400ms leeway on my setup. That does not seem like much, somehow. But consider: Of the about 600+ms overall that I know of from using my scope, I have used 200ms at that point (from trigger to that code point). Does that correspond to your estimates @KevinOConnor ? I suppose one could actually code a routine that determines the leeway on an unknown setup by repeatedly doing this with increasing delay until an alarm occurs. <- actually serious idea |
That's odd. The FYI, you can directly set the delay amount with something like -Kevin |
5c46a79
to
399e227
Compare
By all means, not my call to make |
re. 1: Yes, broken, drop. You were just testing to see if I really tested this :-)) |
I know. But the 4 x 100 looks more graphic in the code sequence when posting it here. AFAIK .550 gets rounded to .560, we are looking at 20ms increments. Tomorrow I will binary search to the best closer value than by 100ms steps. |
I can definitely test with my clone probe once my current print is finished. I have always needed to have "pin_up_touch_mode_reports_triggered" set to False for this probe, so it will be interesting to see if it works with it set to True (along with "pin_up_reports_not_triggered"). I have also been unable to get this probe to honour the "stow_on_each_sample" set to False with the recent refactor. The best my clone ever worked was with @FanDjango's monolithic branch before it was split up into the four different phased rollouts (I think that version had some command timing differences that never made it into the final version?). |
My clone (TriangleLab 2019 NEW 3D Touch) behaves like @FanDjango's (when issued a The difference is that I can't use:
It results in an error (probably not relevant for this issue but I'll attach a log anyway). Also: Neither of these issues are particularly relevant to this case but they do seem to represent some difference in my probe behaviour. If there is something more I should test I would be happy to, and if I should open an issue for the other problems I am seeing I will be happy to do that as well. |
@fiferboy - Thanks. Just to be clear, this branch does not make it worse for you? Also, I understand you need to set -Kevin |
Correct - this branch does not make anything worse. My working config has:
|
Further testing shows me that a maximum leeway of 360ms is possible and still probe reliably with a BLTouch V3.1 on this machine (klipper and octoprint on ASUS Tinkerboard S, 32bit LPC1768 SBASE controller). This is independant on probe downward speed and lift speed. The error state experienced in this szenario occurs before any upward movement is begun.
At least this confirms that there is not much time to diddle around before stowing this probe. |
From your
That's the result of setting This routine takes the pin up (stow) and the sets touch mode. This probe does not then report a trigger, so I would actually surmise that it does not support "SW" touch mode at all, but who knows. Anyway, it means that SW mode with a pin-up cannot be "mis-used" to check that the probe is working. It doesn't really have any other disadvantage. This setting (to "false") just "turns off" the periodic check for functionality. Oh, one possible disadvantage might be that if klipper ever supports using SW (optionally) for probing, that maybe that will not work either. We will see...
I will do some more testing with my clone to see if I can discover some optimizations.
What happens?
We're getting there. Just need to identify some more details |
I rechecked my clone and confirm once more: It supports BUT YOU ARE RIGHT and I never tested this before with this clone, Reason: It will never trigger again after the first trigger, even if the pin is down. A very small addition to this PR will fix that. Lemme see if I can add something here... |
All I was able to do was to create PR #2678 against this branch here. I will let @KevinOConnor sort this out. But @fiferboy maybe you can look at the two very small code changes and test them? You need a new option in the config though, you can also see that in the PR #2678. |
CLONE BEHAVIOUR Looking more closely at how the clone operates. When the pin is deployed and you trigger the probe, the red light goes on to show that the probe has been triggered. To re-arm it you need a new deploy, as stated before. This new deploy, if the pin is already down, will do a very quick pin-up/down. Annoying but inconsequential, just costs a little time. So deploy timing on clones should be very "gracious" to make sure the probe is really deployed. A normal deploy is faster than a deployed-deploy, so to speak. So when |
CLONE BEHAVIOUR cont. Never had this with an original BLTouch:
|
@KevinOConnor Ok, I have concluded this now and cannot currently find any show-stoppers in this. Moving on to other things now. Thanks for your efforts and the new code. |
Return the triggered state from verify_state() and update the callers to raise the error (if needed). Signed-off-by: Kevin O'Connor <[email protected]>
Simplify raise_probe() by separating out the pin_up_not_triggered case. Signed-off-by: Kevin O'Connor <[email protected]>
…lure If an error is found during the pin_up_not_triggered check, then apply the reset command for a full second (instead of just 100ms). This gives the bltouch more time to check its internal state. Signed-off-by: Kevin O'Connor <[email protected]>
Be sure to fully raise the probe prior to starting any future toolhead movements in the multi_probe_end() case. Signed-off-by: Kevin O'Connor <[email protected]>
Some clones don't raise the pin on a reset and the ANTClabs BL-Touch sometimes doesn't raise the pin either. Rework the (infrequently called) sensor test code to always issue a pin_up command before the touch command. Also, perform a reset and retry if the sensor test fails. Signed-off-by: Kevin O'Connor <[email protected]>
Signed-off-by: Kevin O'Connor <[email protected]>
399e227
to
5c8d15b
Compare
Thanks - I went ahead and merged this branch.
That's odd. One way to diagnose this would be to produce an error (when using under 500ms of delay), then immediately issue a M112 command, and then attach the resulting Klipper log file here. The diagnostic info in the log contains the timing of the responses and actions. -Kevin |
I can do that next week, I'll put it on my schedule. |
In the event that a "raise probe" operation fails, the current code will issue a "reset" command to attempt to clear the error state. However, the "reset" command is only sent for 100ms. It appears a longer time is needed ( see #2267 (comment) ). This update reworks the code so the existing RETRY_RESET_TIME delay time of 1 second is now done while the "reset" command is being issued. This should hopefully give the BL-Touch more time to update its internal state.
@FanDjango - does this change look okay to you?
-Kevin