stm32: Fix cache corruption on unaligned SDCard reads#19140
stm32: Fix cache corruption on unaligned SDCard reads#19140projectgus wants to merge 3 commits intomicropython:masterfrom
Conversation
|
Code size report: |
- This is necessary if the USB MSC device is exposing the SD Card, as a USB operation might read/write SD. - However, rather than disabling USB interrupts (and all lower interrupts) unconditionally, we can check if MSC is enabled & SDCard is configured as one of the USB MSC LUNs. This happens to be necessary to reliably trigger SDCard DMA alignment bugs from hard ISRs. However, it's good to do anyhow - allows interrupt processing to continue while the CPU is blocked waiting for the SD operation. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
63a390f to
3e43ed3
Compare
|
@kwagyeman @peterzuger This PR should fix STM32 D-Cache corruption on SD reads. |
|
Thanks for working on this! The fix (even if it's not the complete solution) looks relatively clean and simple, which is great.
Did you find issues with buffers that were not 4-byte aligned? If not, could the test be tweaked to test also cases of non-4-byte alignment? |
(I assume you mean that the tests fail with the first one/two commits from this PR, to keep IRQs on while SD card reads are active. The tests don't fail on master.) I tested this PR on PYBD_SF2 and PYBD_SF6:
|
Oh, this case is the reason for what I described! That makes more sense, I thought it was a weird behaviour if the hardware was doing it. What I was seeing is that if you (for example) have a DMA operation reading into a buffer starting at address 0x13 and during the operation you have an interrupt write the byte at offset 0x12 then the whole word starting at 0x10 is corrupted. I'm pretty sure it's a race between the interrupt and this fixup logic that's doing that. The proposed "future work" change with temporary buffers will effectively align these buffers in another way, so that fixup logic can probably come out then (meaning we'll also not have the potential race any more if some code does write into that word from an interrupt).
The We could change the test so instead of testing every possible offset it only tests offsets which are a multiple of 4, but then we're actually reducing the test coverage so I think this is better. |
Oh yes, my mistake. Those tests do fail on master but very infrequently, the same as |
Rename the existing dma_alignment test to spi_dma_align, as this one uses the SPI driver. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
Will avoid corruption for unaligned reads, unless a SPI DMA operation is in progress at the same time (to be fixed separately). This fixes the unit test added in the parent commit. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
3e43ed3 to
799356e
Compare
Ah.. I'd been assuming we could use scatter/gather chained transfers, so we're only needing temporary buffers for the head & tail of any unaligned operation (this is how we did it when I was at Espressif). The newer ST GPDMA peripheral supports these, but the older ST DMA peripheral does not. Meaning H5,H7,U5,N6 all have GPDMA so we can have a generic solution with very little performance overhead, but the F7-based boards will be caught out (I think F7 is the only STM32 family we support that has D-Cache but doesn't have GPDMA). |
|
There might be a related underlying issue I ran into on #18451 - the tests @robert-hh shared here did a great job of reproducing the intermittent issue. I gave up trying to fix it there but I'll take a look at what you've done here @projectgus to see if it's transferable. |
Maybe for the older parts (F7) we can just keep the current dma protect/unprotect and just have enough MPU regions for all possible concurrent DMA uses? Which might be just SD card and SPI. |

Summary
This PR fixes an issue where SDCard read operations were prone to cache corruption on STM32s with D-Cache, if the read region was not cache-aligned. Should fix the issue reported in #19010 on stm32 - although more work is needed to fix a known corner case (see below).
Frequency of this corruption was very low because most interrupts were disabled during SDCard operations (to prevent USB interrupts, as the USB device may also expose the SDCard as a block device). This PR makes that interrupt change conditional on the SDCard being currently exposed over MSC. This makes the bug much easier to trigger (but it's also a useful change, to not unnecessarily starve interrupts while waiting for SDCard operations to complete).
Otherwise the new test case & fix are very similar to #13717, uses the
dma_protect_rx_region()function added in that PR.There is still some potential for corruption if the buffer is not 4-byte aligned and during an interrupt the CPU writes to another byte in the same 32-bit word.
This seems a limitation of the bus not the D-Cache, and won't trigger except in very unusual cases (as buffers will almost always be 4-byte aligned.)EDIT: This is a race with existing logic in sdcard_read_blocks() that shuffles around data to handle word-unaligned reads. The tests are written to avoid this case.Further work needed
The
dma_protect_rx_regionfixes applied for SPI and SDCard reads are effective provided that two "protected" DMA operations are not in progress at the same time. If two operations are in progress at the same time (i.e. a SPI read when an interrupt starts an SD read, or vice versa) then the second operation to start will clobber the MPU settings used to protect the first operation's unaligned regions.If we extend
dma_protect_rx_regionto cover even more DMA operations then this kind of race will become even more likely, so we need to use a different approach (probably temporary DMA-aligned buffer(s) covering any unaligned parts of the operation.)Additional work is also needed to apply this protection to other DMA reads on stm32 port.
Testing
test_interrupted_reads()andtest_interrupted_read_write()fail immediately on PYBD_SF2 without the fix commit from this PR, do not fail after fix is applied.test_reads()depends on CPU speculative reads so is less predictable to trigger (the other two introduce deliberate reads and writes). This one triggers approx 1 in 300 repeats without this fix, am unable to trigger in >1500 repeats after fix (the submitted test doesn't run this many repeats by default, as this takes >7min to complete).boot.pyand addedpyb.usb_mode('VCP+MSC', msc=(pyb.Flash(), pyb.SDCard())), then reverted the fix commit and verified the new test cases passed in this configuration (the bug doesn't trigger easily because most interrupts are still disabled during SD operations if the SD card is exposed via USB MSC).Trade-offs and Alternatives
m_malloc_dma()function, but there's still nothing stopping Python code from slicing a memoryview and passing this arbitrary slice to a DMA operation (as demonstrated in the new tests). So we would still need a fix like this (or the proposed future fix) to ensure unaligned DMA operations don't trigger corruption.Generative AI
I did not use generative AI tools when creating this PR.