Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stub support (ESF-126) #103

Closed
wants to merge 1 commit into from
Closed

Conversation

higaski
Copy link
Contributor

@higaski higaski commented Mar 27, 2024

Add stub loader support as requested by #92 and #84.

The stubs are pulled from https://github.com/espressif/esptool/tree/master/esptool/targets/stub_flasher. A CMake target called gen_esp_stubs pulls the json files from a specified esptool version and converts them to .h/.c files. That target does not depend on ALL and does not get created if Python isn't installed (a warning is issued though).

New Kconfig option
- SERIAL_FLASHER_STUB_ENABLED ~~
~~ defaults to y

Additions to public API:

  • esp_loader_error_t esp_loader_connect_to_stub(esp_loader_connect_args_t *connect_args)
    Upload stub loader, then connect to it instead of the internal ROM loader.
  • esp_loader_change_transmission_rate_stub(uint32_t new_transmission_rate, uint32_t old_transmission_rate)
    Like esp_loader_change_transmission_rate but has a second parameter for the stub version of the CHANGE_BAUDRATE command.

Additions to private API:
-esp_loader_error_t loader_no_stub(bool no_stub) ~~
~~ Actual implementation of esp_loader_no_stub.

  • esp_loader_error_t loader_run_stub(target_chip_t target)
    Upload and run the stub loader.

New definition
- STUB_ENABLED

@higaski higaski marked this pull request as draft March 27, 2024 15:48
@github-actions github-actions bot changed the title WIP: stub support WIP: stub support (ESF-126) Mar 27, 2024
@higaski higaski force-pushed the stub-support branch 3 times, most recently from b4d52b9 to fed960a Compare March 27, 2024 18:01
@higaski higaski changed the title WIP: stub support (ESF-126) stub support (ESF-126) Mar 27, 2024
@DNedic
Copy link
Collaborator

DNedic commented Mar 28, 2024

Hello @higaski thanks for the contribution! I will investigate the issue you are having as soon as I can.

Currently, our intent is to use the Rust flasher stub for esp-serial-flasher for stub support, however for regular serial flashing it should be 100% compatible with the C flasher stub.

@higaski
Copy link
Contributor Author

higaski commented Mar 29, 2024

Currently, our intent is to use the Rust flasher stub for esp-serial-flasher for stub support, however for regular serial flashing it should be 100% compatible with the C flasher stub.

Sure, feel free to change the way the stub loader gets generated (and or included) to whatever way you prefer. I already thought that the JSON conversion would be more of a temporary thing.

@DNedic
Copy link
Collaborator

DNedic commented Apr 3, 2024

I'm not sure we should do both compile and run-time stub toggling this way.

In the case of run-time toggling, if the compiler has visibility into the fact that stubs are not used at compile-time, it will still mark all unused symbols for removal by the linker.

Conversely, we could just leave the compile-time toggle and simplify the code. Cases where we would try stub flashing and then fall back if something doesn't work should be extreme.

Personally, I would add a boolean flag use_stub to esp_loader_connect_args_t to toggle stub use or create a new esp_loader_connect_stub function. None of these would be breaking changes either.

@DNedic
Copy link
Collaborator

DNedic commented Apr 3, 2024

Regarding the SPI_ATTACH logic for ESP8266, in esptool, ESPLoader is only a base class and the behavior for ESP8266 is overriden by its concrete class here.

Edit: I realize this is not what you are reffering to, according to the code, esptool should be sending SPI_ATTACH when using the stub only when a custom SPI config is passed with the commandline. We can skip it entirely apparently until we decide to support that.

* - ESP_LOADER_SUCCESS Success
*/
#if STUB_ENABLED
esp_loader_error_t esp_loader_no_stub(bool no_stub);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A name like esp_loader_use_stub/esp_loader_dont_use_stub would be more verbose.

@DNedic
Copy link
Collaborator

DNedic commented Apr 3, 2024

Stub support should be gated behind #if (defined SERIAL_FLASHER_INTERFACE_UART) || (defined SERIAL_FLASHER_INTERFACE_USB) for now, as the stub does not support uploading over SPI.

@higaski
Copy link
Contributor Author

higaski commented Apr 3, 2024

I'm not sure we should do both compile and run-time stub toggling this way.

The reason I've added both, compile- and run-time toggling is that embedded systems might not want to add the stubs if they don't need them. I assume adding all stubs takes a reasonable amount of flash memory. Run-time toggling on the other hand is useful for things like x86 applications where the footprint of all stubs isn't an issue...

In the case of run-time toggling, if the compiler has visibility into the fact that stubs are not used at compile-time, it will still mark all unused symbols for removal by the linker.

I don't see how this could happen. The compiler can't optimize away the stubs if their usage depends on another run-time value.

Personally, I would add a boolean flag use_stub to esp_loader_connect_args_t to toggle stub use or create a new esp_loader_connect_stub function. None of these would be breaking changes either.

I like that, it sounds like a clean solution. It would also make esp_loader_no_stub obsolete.

Stub support should be gated behind #if (defined SERIAL_FLASHER_INTERFACE_UART) || (defined SERIAL_FLASHER_INTERFACE_USB) for now, as the stub does not support uploading over SPI.

STUB_ENABLED implies SERIAL_FLASHER_INTERFACE_UART. But I guess no harm in being more explicit.




I have another issue with the stub code. When I try to switch the transmission rate like done in the examples

if (higher_transmission_rate && esp_loader_get_target() != ESP8266_CHIP) {
err = esp_loader_change_transmission_rate(higher_transmission_rate);
if (err == ESP_LOADER_ERROR_UNSUPPORTED_FUNC) {
printf("ESP8266 does not support change transmission rate command.");
return err;
} else if (err != ESP_LOADER_SUCCESS) {
printf("Unable to change transmission rate on target.");
return err;
} else {
err = loader_port_change_transmission_rate(higher_transmission_rate);
if (err != ESP_LOADER_SUCCESS) {
printf("Unable to change transmission rate.");
return err;
}
printf("Transmission rate changed changed\n");
}
}

then detecting the flash size fails.

    if (esp_loader_flash_detect_size(&flash_size) == ESP_LOADER_SUCCESS) {
        if (image_size + offset > flash_size) {
            return ESP_LOADER_ERROR_IMAGE_SIZE;
        }
        loader_port_start_timer(DEFAULT_TIMEOUT);
        RETURN_ON_ERROR( loader_spi_parameters(flash_size) );
    } else {
        loader_port_debug_print("Flash size detection failed, falling back to default");
    }

More specifically this call here

RETURN_ON_ERROR( spi_flash_command(SPI_FLASH_READ_ID, NULL, 0, &flash_id, 24) );

I can't find documentation of an 0x9F command? Is this a ROM loader only thing?

@DNedic
Copy link
Collaborator

DNedic commented Apr 3, 2024

In the case of run-time toggling, if the compiler has visibility into the fact that stubs are not used at compile-time, it will still mark all unused symbols for removal by the linker.

I don't see how this could happen. The compiler can't optimize away the stubs if their usage depends on another run-time value.

It will happen if the value is known at compile time and the effect is visible in that translation unit. In the case of using the config struct, it will not happen without link-time optimization, however if we have a separate connect function, and that function never gets called, the compiler will mark all symbols that have a sole dependency on that function for removal by the linker, stubs included.

@higaski
Copy link
Contributor Author

higaski commented Apr 3, 2024

In the case of run-time toggling, if the compiler has visibility into the fact that stubs are not used at compile-time, it will still mark all unused symbols for removal by the linker.

I don't see how this could happen. The compiler can't optimize away the stubs if their usage depends on another run-time value.

It will happen if the value is known at compile time and the effect is visible in that translation unit. In the case of using the config struct, it will not happen without link-time optimization, however if we have a separate connect function, and that function never gets called, it will discard all symbols that have a sole dependency on that function, stubs included.

So that I understand you correctly. Let's drop compile-time support in favor of a separate connect function and rely on the compiler to remove the stubs if not needed?

@DNedic
Copy link
Collaborator

DNedic commented Apr 3, 2024

In the case of run-time toggling, if the compiler has visibility into the fact that stubs are not used at compile-time, it will still mark all unused symbols for removal by the linker.

I don't see how this could happen. The compiler can't optimize away the stubs if their usage depends on another run-time value.

It will happen if the value is known at compile time and the effect is visible in that translation unit. In the case of using the config struct, it will not happen without link-time optimization, however if we have a separate connect function, and that function never gets called, it will discard all symbols that have a sole dependency on that function, stubs included.

So that I understand you correctly. Let's drop compile-time support in favor of a separate connect function and rely on the compiler to remove the stubs if not needed?

Yes, either that or only use compile-time toggling. Two toggles complicate code, and over time will create bugs when people forget to check both.

If you're concerned about the approach, try mocking up a quick test and see if the stubs are dropped by looking at the mapfile.

@DNedic
Copy link
Collaborator

DNedic commented Apr 3, 2024

I have another issue with the stub code. When I try to switch the transmission rate then detecting the flash size fails.

I assume that is again because of this line:

RETURN_ON_ERROR( loader_spi_parameters(flash_size) );

Reading flash ID is also done by esptool: https://github.com/espressif/esptool/blob/master/esptool/loader.py#L937

Edit: looks like there is a difference in the command for changing the baudrate: https://github.com/espressif/esptool/blob/master/esptool/loader.py#L1138

@DNedic
Copy link
Collaborator

DNedic commented Apr 3, 2024

When writing to RAM, we need to check if we are overwriting the stub.

@higaski
Copy link
Contributor Author

higaski commented Apr 3, 2024

Edit: looks like there is a difference in the command for changing the baudrate: https://github.com/espressif/esptool/blob/master/esptool/loader.py#L1138

Yes, this works. Now I'm starting to miss function overloads...
Any preferences how this should be handled with the function

esp_loader_error_t esp_loader_change_transmission_rate(uint32_t transmission_rate)

The stub version would require the old baud rate as second parameter.
I've thought about adding a baudrate variable to

typedef struct {
    uint32_t sync_timeout;  /*!< Maximum time to wait for response from serial interface. */
    int32_t trials;         /*!< Number of trials to connect to target. If greater than 1,
                               100 millisecond delay is inserted after each try. */
    uint32_t baudrate;  /*!< Like this maybe? */
} esp_loader_connect_args_t;

and then store a local copy of it. This would then allow setting the old baud rate for the CHANGE_BAUDRATE command depending on whether the stub loader is used or not.

/edit
Or maybe add another function

esp_loader_error_t esp_loader_change_transmission_rate_from_to(uint32_t from, uint32_t to)

but then the user would have to know those subtle differences between stub and ROM loader.


// stub loader sends a custom SLIP packet of the sequence OHAI
uint8_t buff[4];
err = SLIP_receive_packet(buff, sizeof(buff) / sizeof(buff[0]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a problem here where the UART/USB specific protocol details are bleeding into the common source file, and we are adding an ugly #ifdef to a common module, defeating the purpose.

Can we move loader_run_stub() into protocol_uart.c?

Copy link
Contributor Author

@higaski higaski Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
But half of the functions inside the common module are already not really common are they? The header contains an ifdef as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, however

  1. We can address this at a later time
  2. They will be common once stub gets adapted for use over SPI (though they will depend on the stub use)

Copy link
Contributor Author

@higaski higaski Apr 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DNedic
Copy link
Collaborator

DNedic commented Apr 5, 2024

The stub version would require the old baud rate as second parameter.
I've thought about adding a baudrate variable to

typedef struct {
    uint32_t sync_timeout;  /*!< Maximum time to wait for response from serial interface. */
    int32_t trials;         /*!< Number of trials to connect to target. If greater than 1,
                               100 millisecond delay is inserted after each try. */
    uint32_t baudrate;  /*!< Like this maybe? */
} esp_loader_connect_args_t;
and then store a local copy of it. This would then allow setting the old baud rate for the CHANGE_BAUDRATE command depending on whether the stub loader is used or not.

/edit
Or maybe add another function

esp_loader_error_t esp_loader_change_transmission_rate_from_to(uint32_t from, uint32_t to)

but then the user would have to know those subtle differences between stub and ROM loader.

For this reason, the fact that we now have to track if the stub is running for some things (like checking whether we are overwriting it), and the fact that stubs are now a part of the build process, whether users use them or not (along with attempting to pull from esptool repo) I am leaning more and more to a preprocessor only toggle without the possibility of deciding on stub use at runtime at all.

@dobairoland WDYT?

@higaski
Copy link
Contributor Author

higaski commented Apr 5, 2024

For this reason, the fact that we now have to track if the stub is running for some things (like checking whether we are overwriting it), and the fact that stubs are now a part of the build process, whether users use them or not (along with attempting to pull from esptool repo) I am leaning more and more to a preprocessor only toggle without the possibility of deciding on stub use at runtime at all.

I'd like to keep the runtime toggle if possible. I've already built it into my application.

/edit
Even with a compile time only switch you'd need to check if the stub is running for the RAM command.

@DNedic
Copy link
Collaborator

DNedic commented Apr 5, 2024

Even with a compile time only switch you'd need to check if the stub is running for the RAM command.

Unless I am missing something, if we always upload the stub with esp_loader_connect() when the compile-time switch is enabled, we don't have to check anything at runtime.

@higaski
Copy link
Contributor Author

higaski commented Apr 6, 2024

Even with a compile time only switch you'd need to check if the stub is running for the RAM command.

Unless I am missing something, if we always upload the stub with esp_loader_connect() when the compile-time switch is enabled, we don't have to check anything at runtime.

When uploading the bootloader (mem* functions) you can't check for RAM being overwritten. Of course the stub loader gets written to the stub loader destination. So, even with a compile time switch you'd need a flag for this special case to make sure that check does not run before the stub is in place.

Maybe we can find a compromise and have both again, compile- and run time switch?

/edit
Pulling the stubs from the repo is not part of the build process. This is an optional target.

CMakeLists.txt Outdated Show resolved Hide resolved
@DNedic
Copy link
Collaborator

DNedic commented Apr 6, 2024

When uploading the bootloader (mem* functions) you can't check for RAM being overwritten. Of course the stub loader gets written to the stub loader destination. So, even with a compile time switch you'd need a flag for this special case to make sure that check does not run before the stub is in place.

Sorry, maybe I didn't elaborate well enough. To upload anything using mem*() functions you first need to connect using esp_loader_connect(). If the stub is always uploaded during esp_loader_connect(), and we know the stub size, we can safely know where the user is not allowed to write to RAM.

@higaski
Copy link
Contributor Author

higaski commented Apr 6, 2024

When uploading the bootloader (mem* functions) you can't check for RAM being overwritten. Of course the stub loader gets written to the stub loader destination. So, even with a compile time switch you'd need a flag for this special case to make sure that check does not run before the stub is in place.

Sorry, maybe I didn't elaborate well enough. To upload anything using mem*() functions you first need to connect using esp_loader_connect(). If the stub is always uploaded during esp_loader_connect(), and we know the stub size, we can safely know where the user is not allowed to write to RAM.

Then the stub loader would require its own esp_loader_mem_start (and friends) functions. Currently I use it to upload the stub and those functions don't know if the stub is already in place or not.

@DNedic
Copy link
Collaborator

DNedic commented Apr 6, 2024

Then the stub loader would require its own esp_loader_mem_start (and friends) functions. Currently I use it to upload the stub and those functions don't know if the stub is already in place or not.

You're correct, my apologies.

"stub_flasher_32c3.json", # ESP32C3_CHIP
"stub_flasher_32s3.json", # ESP32S3_CHIP
"stub_flasher_32c2.json", # ESP32C2_CHIP
"reserved.json", # ESP32_RESERVED0_CHIP
Copy link
Collaborator

@DNedic DNedic Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause an exception, and write " // " + file_to_download + "\n" " {},\n" "\n" to the file, however if we are adding support for a new chip, and mistype the .json filename, now we're going to need to debug things to find out that the reason the stub info all 0s is because of that.

An even bigger problem is when this overwrites existing stubs for people without an internet connection at the moment of building.

Perhaps we can add special handling for reserved.json, and either not handle the exception, or print a friendly error message that the specified file could not be found (and that it is possibly because the host has no internet connection).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target which creates the stubs does not run during the build. I've intentionally chosen not to do that. If you don't explicitly invoke the "gen_flasher_stubs" target the stubs will not change!

I'd have preferred to scan the GitHub repo and only download existing .json files, but I wanted to match the existing target enumeration order. That's also the reason why there absolutely needs to be a hole for the currently reserved target. Otherwise the following enum value would be off by one.

I agree that this isn't ideal though, but getting rid of the "reserved" value is not an option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason my brain hallucinated the PRE_BUILD argument, my bad.

Still, we should not fail by writing zeroes, but instead notify the user. We don't have to get rid of the "reserved" value, just add a check for it and work around it by writing " // " + file_to_download + "\n" " {},\n" "\n".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codegen of the stubs now fails if the HTTP request fails for whatever reason.

https://github.com/higaski/esp-serial-flasher/commit/e4cbcae6c606b79850abc46dbd9a30457a613752

@DNedic
Copy link
Collaborator

DNedic commented Apr 6, 2024

Overall, the PR seems in a solid state now, I will dedicate some time to test it on hardware. @radimkarnis If you have the time please take a look as well.

@higaski
Copy link
Contributor Author

higaski commented May 24, 2024

I apologize, I should learn to read with comprehension... I've thought esp-flasher-stub is GPL but actually the esptool is. So, in fact, my tool is currently violating GPL (oops 🙈 ).

The URL option would solve my problem of course, but it sounds like an unreasonable amount of work for... well... just me. How much time would you estimate to get the >16MB issue done? Because if you can manage that over the next few months, your time would probably be better spent there.

@higaski
Copy link
Contributor Author

higaski commented Jun 6, 2024

Well?

@DNedic
Copy link
Collaborator

DNedic commented Jun 6, 2024

Unfortunately we cannot commit to a timeframe at the moment, but it is quite possible that we will add the >16MB feature to the Rust flasher stub within the next few months. I am currently resolving an issue with the first packet being lost after the baudrate change with the Rust stub, but I expect that the stub support will be merged shortly after.

If you are fine with this, we will proceed without adding additional support for using the esptool stub, otherwise we can add it.

@higaski
Copy link
Contributor Author

higaski commented Jun 6, 2024

Sounds good to me, thanks!

@DNedic
Copy link
Collaborator

DNedic commented Jul 2, 2024

Hi @higaski , please take a look at the changes merged and adapt your usecase. We've added support for overriding the stub sources as well, so you can use the esptool stub until we add support for >=16MB flash in the esp-flasher-stub project.

@higaski
Copy link
Contributor Author

higaski commented Jul 2, 2024

I appreciate the effort of adding an override for the stub sources, but I can't figure out how to use it properly.
The esptool stub files and the esp-flasher-stub files do not follow the same naming convention.

E.g., the ESP32 one is called stub_flasher_32.json in the esptool repo and esp32.json in the esp-flasher-stub repo.

So even if the path gets overriden the python code gen is going to fail isn't it?

if stub_override_path:
    with open(f"{stub_override_path}/{file_to_download}") as file_path:
        cfile.write(read_stub_json(file_path))

@DNedic
Copy link
Collaborator

DNedic commented Jul 2, 2024

Sorry for the confusion, I forgot to mention that the esptool stub has been moved to a new repository which will have releases with the same naming conventions.

@dobairoland
Copy link
Collaborator

Sorry for the confusion, I forgot to mention that the esptool stub has been moved to a new repository which will have releases with the same naming conventions.

@higaski The release is done: https://github.com/espressif/esptool-legacy-flasher-stub/releases

@higaski
Copy link
Contributor Author

higaski commented Jul 15, 2024

Little user guide would have been helpful:

This seems to work

"SERIAL_FLASHER_STUB_PULL_VERSION 1.1.0"
"SERIAL_FLASHER_STUB_PULL_SOURCE https://github.com/espressif/esptool-legacy-flasher-stub/releases/download"

@higaski
Copy link
Contributor Author

higaski commented Jul 31, 2024

@DNedic
I found some time to switch from my latest branch before the merge to version 1.4.0 and now all of a sudden I'm getting the "Flash size detection failed, falling back to default" messages thrown at me. Before I further investigate... any ideas what might be causing this? You've mentioned some timing issues before.

Here's a log of a failed try
failure.log

And here one writing the same binaries but with my old branch
success.log

/edit
Does not seem to matter if I use the new Rust or the old C stubs. Chip doesn't matter either, ESP32 and ESP32S3 don't work for me now. Flashing without stubs still works.

@DNedic
Copy link
Collaborator

DNedic commented Jul 31, 2024

Hi @higaski , the timing issues we had were due to the flasher stub not being ready for a new command immedeately after responding to a previous one. For this reason, we have added this delay. It looks like the issue you are having has identical symptoms to the ones we faced. With esptool, this issue was probably masked due to inherent OS/python/driver delays. Can you try increasing the delay and reporting back results?

One other way we could solve this is to make retries apply to all commands.

@higaski
Copy link
Contributor Author

higaski commented Jul 31, 2024

@DNedic
Sadly changing that delay to 10, 20, 50 or even 100ms did not change anything.

@DNedic
Copy link
Collaborator

DNedic commented Jul 31, 2024

If you have a logic analyzer, or an oscilloscope, is it possible for you to compare the timing between the ESP port and your own port? That might give us some clues.

@higaski
Copy link
Contributor Author

higaski commented Aug 2, 2024

@DNedic
Ahhh got it. You've switched the parameters of esp_loader_change_transmission_rate_stub around. 😄 I hadn't noticed before.

/edit
So the only thing left still not working for me is MD5 verification. I've tried to increase MD5_TIMEOUT_PER_MB but still all I get are timeouts. It fails when checking the response inside send_cmd_md5.

@DNedic
Copy link
Collaborator

DNedic commented Aug 2, 2024

So the only thing left still not working for me is MD5 verification. I've tried to increase MD5_TIMEOUT_PER_MB but still all I get are timeouts. It fails when checking the response inside send_cmd_md5.

You mentioned using 1.4.0, we should have fixed this issue in 1.4.1

@higaski
Copy link
Contributor Author

higaski commented Aug 2, 2024

So the only thing left still not working for me is MD5 verification. I've tried to increase MD5_TIMEOUT_PER_MB but still all I get are timeouts. It fails when checking the response inside send_cmd_md5.

You mentioned using 1.4.0, we should have fixed this issue in 1.4.1

Problem still exists for me in 1.4.1

According to the trace the stub response looks strange

--- WRITE ---
c0 00 13 10 00 00 00 00 00 00 10 00 00 50 68 00 00 00 00 00 00 00 00 00 00 c0 
--- READ ---
c0 01 12 00 00 00 00 00 55 0e 40 72 d4 2f 93 1b 96 90 05 bd 63 99 20 9b 00 00 c0 

Command 0x13 (SPI_FLASH_MD5) does trigger a response with 0x12.

/edit
What's even stranger is that the response contains the correct checksum...

55 0e 40 72 d4 2f 93 1b 96 90 05 bd 63 99 20 9b

is exactly what get's calculated inside esp_loader_flash_verify

@DNedic
Copy link
Collaborator

DNedic commented Aug 5, 2024

This is rather strange @higaski , unfortunately I was not able to reproduce it with the ESP32 port example. The problem, from your log does seem to be that the command for the response is wrong, however this makes me suspicious of it being a hardware problem.

Can you reproduce this exact output every time or are different bytes also wrong from time to time?

@higaski
Copy link
Contributor Author

higaski commented Aug 10, 2024

This is rather strange @higaski , unfortunately I was not able to reproduce it with the ESP32 port example. The problem, from your log does seem to be that the command for the response is wrong, however this makes me suspicious of it being a hardware problem.

Can you reproduce this exact output every time or are different bytes also wrong from time to time?

No, it always looks like this. Just tried an ESP32S3-DevKitC instead (checksum is different obviously):

--- WRITE ---
c0 00 13 10 00 00 00 00 00 00 00 00 00 40 52 00 00 00 00 00 00 00 00 00 00 c0 
--- READ ---
c0 01 12 00 00 00 00 00 5a e5 9e 4d 0e c1 15 34 c2 37 5c 67 74 f0 75 00 00 c0 

@DNedic
Copy link
Collaborator

DNedic commented Aug 10, 2024

This is rather strange @higaski , unfortunately I was not able to reproduce it with the ESP32 port example. The problem, from your log does seem to be that the command for the response is wrong, however this makes me suspicious of it being a hardware problem.
Can you reproduce this exact output every time or are different bytes also wrong from time to time?

No, it always looks like this. Just tried an ESP32S3-DevKitC instead (checksum is different obviously):

--- WRITE ---
c0 00 13 10 00 00 00 00 00 00 00 00 00 40 52 00 00 00 00 00 00 00 00 00 00 c0 
--- READ ---
c0 01 12 00 00 00 00 00 5a e5 9e 4d 0e c1 15 34 c2 37 5c 67 74 f0 75 00 00 c0 

Is it possible for you to get a logic analyzer trace of this? Also, since you're using an user port we really need to be sure that the problem is in the library and not the port/hardware combination itself.

@higaski
Copy link
Contributor Author

higaski commented Aug 10, 2024

Is it possible for you to get a logic analyzer trace of this? Also, since you're using an user port we really need to be sure that the problem is in the library and not the port/hardware combination itself.

Sure, give me some time though. I'll see to get it done till the end of the upcoming week.

@higaski
Copy link
Contributor Author

higaski commented Aug 19, 2024

I've hooked up an oscilloscope with serial decoding for now.
Sadly I don't have the exact same software (and therefor MD5 hash) I've used in my previous example anymore, but the error didn't change.

So, at the point when the MD5 hash is presumably requested it transmits

--- WRITE ---
c0 00 13 10 00 00 00 00 00 00 10 00 00 50 68 00 00 00 00 00 00 00 00 00 00 c0 

This is correct, it shows the same values on the wire.

The response according to my log though is

--- READ ---
c0 01 12 00 00 00 00 00 1e 88 ee de 77 32 e9 7e d2 48 e7 c1 aa e1 37 00 00 c0 

And this diverges from what I can see on my oscilloscope.

There it says (with quite some delay before the hash)

--- OSCILLOSCOPE ---
c0 01 13 12 00 00 00 00 00 1e 88 ee de 77 32 e9 7e 11 d2 48 e7 c1 aa e1 37 00 00 c0

So... for some reason the actual command response (0x13?) get's lost?
lost_byte

/edit
The behavior can also be observed on smaller baudrates (e.g. 115.2k)

@DNedic
Copy link
Collaborator

DNedic commented Aug 19, 2024

Thanks for the feedback @higaski ,it looks like this is a problem with either the port or the hardware, since the tracing is happening at the port level and the target is receiving and sending the correct data.

You can try using different USB to Serial adapters, it is possible that there is a slight enough baud mismatch to cause this. You can also try using the esp-usb-bridge project as an USB to Serial converter.

@higaski
Copy link
Contributor Author

higaski commented Aug 19, 2024

Wouldn't a hardware problem also be present when not doing the MD5 verify?

@DNedic
Copy link
Collaborator

DNedic commented Aug 19, 2024

It is possible, you can try checking if other instances of 0x13 get wrongly read as well, but because we cannot really reproduce your issue and the fact that tracing happens at the port level and you're using a user port means that we can't help you with the issue unfortunately.

If there infact turns out to be a problem not at the port level, for instance due to memory corruption due to buffer over/underruns in the library or any other similar cause, please make sure to report that. Since the port itself runs on desktop PCs you can try using tools like Valgrind.

@DNedic
Copy link
Collaborator

DNedic commented Oct 17, 2024

@higaski Do you happen to still have the setup you had used when the MD5 issue with the stub happened? I'm wondering what the ECO version of the target chip is.

@higaski
Copy link
Contributor Author

higaski commented Oct 17, 2024

@higaski Do you happen to still have the setup you had used when the MD5 issue with the stub happened? I'm wondering what the ECO version of the target chip is.

Yes I do, will report back next week.

@higaski
Copy link
Contributor Author

higaski commented Oct 23, 2024

@higaski Do you happen to still have the setup you had used when the MD5 issue with the stub happened? I'm wondering what the ECO version of the target chip is.

Does this contain the information you're looking for? I guess it's the chip revision?

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 188777542, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:7176
load:0x40078000,len:15564
ho 0 tail 12 room 4
load:0x40080400,len:4
load:0x40080404,len:3904
entry 0x40080640
I (31) boot: ESP-IDF v5.3 2nd stage bootloader
I (31) boot: compile time Oct 23 2024 07:27:53
I (31) boot: Multicore bootloader
I (36) boot: chip revision: v1.0

/edit
And the S3

rst:0xc (RTC_SW_CPU_RST),boot:0x18 (SPI_FAST_FLASH_BOOT)
Saved PC:0x403758e0
SPIWP:0xee
Octal Flash Mode Enabled
For OPI Flash, Use Default Flash Boot Mode
mode:SLOW_RD, clock div:1
load:0x3fce2810,len:0x178c
load:0x403c8700,len:0x4
load:0x403c8704,len:0xcb8
load:0x403cb700,len:0x2db0
entry 0x403c8914
I (37) boot: ESP-IDF v5.3 2nd stage bootloader
I (38) boot: compile time Oct 23 2024 08:06:43
I (38) boot: Multicore bootloader
I (41) boot: chip revision: v0.1

@DNedic
Copy link
Collaborator

DNedic commented Oct 23, 2024

Thanks for the info! I will test these exact chip revisions. It is possible (however unlikely that both suffer from it) that ROM versions for certain revisions can have bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants