Disabling optimizations in bootloader code. Why? #26
I didn't feel like this belonged as an issue, but it's a curiosity for me. I enabled I-Cache and D-Cache on my H7 and got significant gains in CPU-bound operations. Also disabling branch prediction? That has me scratching my head.

I found this in the bootloader assembly code for the multicore example (startup.s), but I've seen similar code elsewhere in this repo. I can't help but wonder: what is the reason for doing these things? The cache I could potentially see if you don't want to tie up valuable SRAM, but what if you do? Are there any gotchas I should be aware of? And the branch prediction disabling, that just floors me. Does it break something?

(I know I said I would take a break, but my curiosity got the better of me) :)

mrc p15, 0, r0, c1, c0, 0 // Read System Control register (SCTLR)
bic r0, r0, #(0x1 << 12) // Clear I bit 12 to disable I Cache
bic r0, r0, #(0x1 << 2) // Clear C bit 2 to disable D Cache
bic r0, r0, #0x1 // Clear M bit 0 to disable MMU
bic r0, r0, #(0x1 << 11) // Clear Z bit 11 to disable branch prediction
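
For readers following along, the four bic instructions amount to clearing one combined mask in the SCTLR value. Below is a minimal C sketch of that bit arithmetic only. The macro and function names are hypothetical and not taken from the repo, and it assumes the modified value is later written back with a matching mcr, which the excerpt does not show. It runs on any host because it never touches the real register.

```c
#include <stdint.h>

/* SCTLR bit positions, as named in the comments of the assembly excerpt. */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Hypothetical helper: the value left in r0 after the four bic instructions. */
static inline uint32_t sctlr_startup_value(uint32_t sctlr)
{
    return sctlr & ~(SCTLR_I | SCTLR_C | SCTLR_M | SCTLR_Z);
}

/* Hypothetical helper: the same four features switched back on.
 * Assumption: later init code sets all four; the thread only confirms
 * that the caches and branch prediction are re-enabled. */
static inline uint32_t sctlr_runtime_value(uint32_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_C | SCTLR_M | SCTLR_Z;
}
```

All other SCTLR bits pass through unchanged, which is why the startup code reads the register first instead of writing a constant.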
Replies: 2 comments
The caches and branch prediction are enabled in SystemInit(). They are only disabled in startup so we can set up the stack and other things without having to worry about flushing the cache or anything complex. Also, the cache does not use SRAM, so disabling it doesn't give you more available RAM (same as on the H7). The A7 is hardly usable with the caches off. The MMU can be used to set up regions that are non-cacheable, which you need to do for DMA transfers (or else keep the cache on and make sure to flush/invalidate at the right times).
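
One gotcha with the flush/invalidate route is that cache maintenance works on whole cache lines, so a DMA buffer that shares a line with other data can clobber that data when the line is invalidated. Below is a minimal sketch of the address arithmetic for rounding a buffer out to line boundaries before calling whatever clean/invalidate routine your HAL provides. The type and function names are hypothetical, and the 64-byte line size is an assumption that should be checked against your core's documentation. The actual maintenance call is left out on purpose since it depends on the project.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: data cache line size in bytes; verify for your core. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical type: a line-aligned address range covering a buffer. */
typedef struct {
    uintptr_t start;   /* rounded down to a line boundary */
    size_t    length;  /* multiple of CACHE_LINE_BYTES */
} cache_range_t;

/* Round [addr, addr + len) outwards so it covers only whole cache lines. */
static inline cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_BYTES - 1u;
    cache_range_t r;
    uintptr_t end = (addr + len + mask) & ~mask;  /* round end up */
    r.start  = addr & ~mask;                      /* round start down */
    r.length = (size_t)(end - r.start);
    return r;
}
```

If the rounded range is larger than the buffer itself, the buffer shares lines with neighbouring data. Aligning and padding DMA buffers to the line size avoids that case.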
Thanks. That clarifies a lot for me. Guess I made a poor assumption about how the cache works based on some lesser Espressif devices I'm more familiar with. In my first real use of the MCU I will be using one of the DMA controllers and five of its channels to drive TX-only SPI lines as fast as the devices will allow (they are rated for 10 MHz according to the datasheet, but will accept up to 40 MHz, and yet I've found no real improvement beyond 20 MHz for some reason, though that was on different hardware). The main challenge for me is getting LVGL divvied up across the cores. I don't want them to collide, and I don't want a scheduler. I know, I want my cake, and I want to eat it. Very demanding!