stm32: Fix cache corruption on unaligned SDCard reads by projectgus · Pull Request #19140 · micropython/micropython · GitHub
Skip to content

stm32: Fix cache corruption on unaligned SDCard reads#19140

Open
projectgus wants to merge 3 commits intomicropython:masterfrom
projectgus:bugfix/stm32_sdcard_dma_prot
Open

stm32: Fix cache corruption on unaligned SDCard reads#19140
projectgus wants to merge 3 commits intomicropython:masterfrom
projectgus:bugfix/stm32_sdcard_dma_prot

Conversation

@projectgus
Copy link
Copy Markdown
Contributor

@projectgus projectgus commented Apr 23, 2026

Summary

This PR fixes an issue where SDCard read operations were prone to cache corruption on STM32s with D-Cache, if the read region was not cache-aligned. Should fix the issue reported in #19010 on stm32 - although more work is needed to fix a known corner case (see below).

Frequency of this corruption was very low because most interrupts were disabled during SDCard operations (to prevent USB interrupts, as the USB device may also expose the SDCard as a block device). This PR makes that interrupt change conditional on the SDCard being currently exposed over MSC. This makes the bug much easier to trigger (but it's also a useful change, to not unnecessarily starve interrupts while waiting for SDCard operations to complete).

Otherwise the new test case & fix are very similar to #13717, uses the dma_protect_rx_region() function added in that PR.

There is still some potential for corruption if the buffer is not 4-byte aligned and during an interrupt the CPU writes to another byte in the same 32-bit word. This seems a limitation of the bus not the D-Cache, and won't trigger except in very unusual cases (as buffers will almost always be 4-byte aligned.) EDIT: This is a race with existing logic in sdcard_read_blocks() that shuffles around data to handle word-unaligned reads. The tests are written to avoid this case.

Further work needed

The dma_protect_rx_region fixes applied for SPI and SDCard reads are effective provided that two "protected" DMA operations are not in progress at the same time. If two operations are in progress at the same time (i.e. a SPI read when an interrupt starts an SD read, or vice versa) then the second operation to start will clobber the MPU settings used to protect the first operation's unaligned regions.

If we extend dma_protect_rx_region to cover even more DMA operations then this kind of race will become even more likely, so we need to use a different approach (probably temporary DMA-aligned buffer(s) covering any unaligned parts of the operation.)

Additional work is also needed to apply this protection to other DMA reads on stm32 port.

Testing

  • New test cases test_interrupted_reads() and test_interrupted_read_write() fail immediately on PYBD_SF2 without the fix commit from this PR, do not fail after fix is applied.
  • New test case test_reads() depends on CPU speculative reads so is less predictable to trigger (the other two introduce deliberate reads and writes). This one triggers approx 1 in 300 repeats without this fix, am unable to trigger in >1500 repeats after fix (the submitted test doesn't run this many repeats by default, as this takes >7min to complete).
  • To verify the interrupt behaviour change: Edited boot.py and added pyb.usb_mode('VCP+MSC', msc=(pyb.Flash(), pyb.SDCard())), then reverted the fix commit and verified the new test cases passed in this configuration (the bug doesn't trigger easily because most interrupts are still disabled during SD operations if the SD card is exposed via USB MSC).

Trade-offs and Alternatives

  • The linked issue has some discussion about ensuring particular DMA reads are always 32-byte aligned to avoid these cache issues. MicroPython heap blocks are 16-byte aligned, so currently nothing in the heap can be guaranteed DMA-aligned. We could add an m_malloc_dma() function, but there's still nothing stopping Python code from slicing a memoryview and passing this arbitrary slice to a DMA operation (as demonstrated in the new tests). So we would still need a fix like this (or the proposed future fix) to ensure unaligned DMA operations don't trigger corruption.

Generative AI

I did not use generative AI tools when creating this PR.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

Code size report:

Reference:  extmod/moductypes: Be more defensive with uctypes_struct_agg_size args. [8a56be6]
Comparison: sdcard: Use dma_protect_rx_region on read operations. [merge of 799356e]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:   +56 +0.014% PYBV10
      esp32:    +0 +0.000% ESP32_GENERIC
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

- This is necessary if the USB MSC device is exposing the SD Card,
  as a USB operation might read/write SD.
- However, rather than disabling USB interrupts (and all lower interrupts)
  unconditionally, we can check if MSC is enabled & SDCard is configured
  as one of the USB MSC LUNs.

This happens to be necessary to reliably trigger SDCard DMA alignment bugs
from hard ISRs. However, it's good to do anyhow - allows interrupt
processing to continue while the CPU is blocked waiting for the SD
operation.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
@projectgus projectgus force-pushed the bugfix/stm32_sdcard_dma_prot branch from 63a390f to 3e43ed3 Compare April 23, 2026 05:07
@projectgus
Copy link
Copy Markdown
Contributor Author

@kwagyeman @peterzuger This PR should fix STM32 D-Cache corruption on SD reads.

@projectgus projectgus requested a review from dpgeorge April 23, 2026 05:16
@dpgeorge
Copy link
Copy Markdown
Member

Thanks for working on this! The fix (even if it's not the complete solution) looks relatively clean and simple, which is great.

There is still some potential for corruption if the buffer is not 4-byte aligned and the CPU writes to another byte in the same 32-bit word. This seems a limitation of the bus not the D-Cache, and won't trigger except in very unusual cases (as buffers will almost always be 4-byte aligned.) The tests are written to avoid this case.

sdcard_read_blocks() has a special case to handle destination buffers that are not 4-byte aligned. So this case you describe should never be hit.

Did you find issues with buffers that were not 4-byte aligned? If not, could the test be tweaked to test also cases of non-4-byte alignment?

@dpgeorge
Copy link
Copy Markdown
Member

New test cases test_interrupted_reads() and test_interrupted_read_write() fail immediately on PYBD_SF2 without this fix, do not fail after fix.

(I assume you mean that the tests fail with the first one/two commits from this PR, to keep IRQs on while SD card reads are active. The tests don't fail on master.)

I tested this PR on PYBD_SF2 and PYBD_SF6:

  • including the MSC IRQ improvement from this PR, the test included in this PR fails on both boards (with about 110 failures reported by unittest)
  • including the DMA fix in this PR, the test passes on both boards (I ran it a few times and it passed all times)

@projectgus
Copy link
Copy Markdown
Contributor Author

sdcard_read_blocks() has a special case to handle destination buffers that are not 4-byte aligned. So this case you describe should never be hit.

Oh, this case is the reason for what I described! That makes more sense, I thought it was a weird behaviour if the hardware was doing it.

What I was seeing is that if you (for example) have a DMA operation reading into a buffer starting at address 0x13 and during the operation you have an interrupt write the byte at offset 0x12 then the whole word starting at 0x10 is corrupted. I'm pretty sure it's a race between the interrupt and this fixup logic that's doing that.

The proposed "future work" change with temporary buffers will effectively align these buffers in another way, so that fixup logic can probably come out then (meaning we'll also not have the potential race any more if some code does write into that word from an interrupt).

Did you find issues with buffers that were not 4-byte aligned? If not, could the test be tweaked to test also cases of non-4-byte alignment?

The test_interrupted_read_write works around it by having the interrupt not write into the same word as the start or end of the buffer. I'll update the comment in the test to explain the real reason that's necessary.

We could change the test so instead of testing every possible offset it only tests offsets which are a multiple of 4, but then we're actually reducing the test coverage so I think this is better.

@projectgus
Copy link
Copy Markdown
Contributor Author

projectgus commented Apr 23, 2026

(I assume you mean that the tests fail with the first one/two commits from this PR, to keep IRQs on while SD card reads are active. The tests don't fail on master.)

Oh yes, my mistake. Those tests do fail on master but very infrequently, the same as test_reads(), because without the other changes there's no timer interrupts firing during the DMA operation so they're basically testing the same thing.

Rename the existing dma_alignment test to spi_dma_align, as this one uses
the SPI driver.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Will avoid corruption for unaligned reads, unless a SPI DMA operation is in
progress at the same time (to be fixed separately).

This fixes the unit test added in the parent commit.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
@projectgus projectgus force-pushed the bugfix/stm32_sdcard_dma_prot branch from 3e43ed3 to 799356e Compare April 23, 2026 07:13
@projectgus
Copy link
Copy Markdown
Contributor Author

projectgus commented Apr 23, 2026

we need to use a different approach (probably temporary DMA-aligned buffer(s) covering any unaligned parts of the operation.)

Ah.. I'd been assuming we could use scatter/gather chained transfers, so we're only needing temporary buffers for the head & tail of any unaligned operation (this is how we did it when I was at Espressif). The newer ST GPDMA peripheral supports these, but the older ST DMA peripheral does not. Meaning H5,H7,U5,N6 all have GPDMA so we can have a generic solution with very little performance overhead, but the F7-based boards will be caught out (I think F7 is the only STM32 family we support that has D-Cache but doesn't have GPDMA).

@andrewleech
Copy link
Copy Markdown
Contributor

There might be a related underlying issue I ran into on #18451 - the tests @robert-hh shared here did a great job of reproducing the intermittent issue. I gave up trying to fix it there but I'll take a look at what you've done here @projectgus to see if it's transferable.

@dpgeorge
Copy link
Copy Markdown
Member

Ah.. I'd been assuming we could use scatter/gather chained transfers, so we're only needing temporary buffers for the head & tail of any unaligned operation (this is how we did it when I was at Espressif). The newer ST GPDMA peripheral supports these, but the older ST DMA peripheral does not.

Maybe for the older parts (F7) we can just keep the current dma protect/unprotect and just have enough MPU regions for all possible concurrent DMA uses? Which might be just SD card and SPI.

@kwagyeman
Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants