rp2: RP235: PSRAM support by Gadgetoid · Pull Request #15620 · micropython/micropython · GitHub
Skip to content

rp2: RP235: PSRAM support#15620

Closed
Gadgetoid wants to merge 4 commits into
micropython:masterfrom
pimoroni:feature/psram
Closed

rp2: RP235: PSRAM support#15620
Gadgetoid wants to merge 4 commits into
micropython:masterfrom
pimoroni:feature/psram

Conversation

@Gadgetoid

@Gadgetoid Gadgetoid commented Aug 8, 2024

Copy link
Copy Markdown
Contributor

Depends upon RP2350 support: #15619

This is a rough draft gathering our PSRAM support code - with a dash of SparkFun's - and attempting to get it into the right shape for mainline.

TODO:

  • Potential: Remove init from main.c and add bindings to modrp2 so PSRAM can be enabled (or not) in boot.py
  • Potential: Support for PSRAM as either extending the GC heap (needs split heap) or storage/tmpfs!?
  • Clean up auto detect code and dispel magic numbers (Note: Have asked for SDK support)
  • Test test test test!

Known issues:

  • Possibly crashy, need a good test to reliably test the extent of the PSRAM
  • Needs to be reconfigured when machine.freq() is called, since a clock change breaks PSRAM timings
  • Currently no way to specify where MicroPython should store a particular object
  • Might break USB on startup or otherwise fail to start up
  • Support for PSRAM opens up RP2 to the class of bugs and pitfalls addressed here: ports/stm32: Add memory map with attributes for different MCU families. #16644

Improvements:

We're (Pimoroni) currently using yet more customisations on top of this PSRAM code for Presto, which stores the entire display buffer in SRAM and leaves basically none for others buffers. See: pimoroni@888f52f

This is a bit rough for inclusion into MicroPython - I always wince at a near duplicate linker script - but incredibly useful nonetheless.

Some kind of bindings for PSRAM rp2.PSRAM() in modrp2 might be nice, too. This would allow us to - theoretically - add a portion of PSRAM to the gc_heap and reserve the remainder for RAMFS, long-lived buffers, custom memory pools and other such nonsense. I did contemplate adding MICROPY_HW_PSRAM_RESERVE_BYTES for this, which would reserve N bytes from the detected PSRAM size for tomfoolery and shenanigans.

@codecov

codecov Bot commented Aug 8, 2024

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Aug 8, 2024

Copy link
Copy Markdown

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:  +224 +0.024% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@Gadgetoid Gadgetoid changed the title rp2: Add PSRAM support for new RP2350 MCU rp2: RP235: PSRAM support Aug 9, 2024
@Gadgetoid

Copy link
Copy Markdown
Contributor Author

I've retitled this PR to avoid it being confused with the RP2350 MCU support one 😆

@Gadgetoid

Copy link
Copy Markdown
Contributor Author

I seem to be having trouble with PSRAM when using split heap, a crude test which uses RAM until a garbage collection is forced will crash/hang on that collection when using split heap, but seems to work fine using exclusively PSRAM.

I'm going to refactor this code slightly to make split heap an option, rather than a requirement.

@MichaelBell MichaelBell left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of small remarks (about my code!) while I'm on a train.

Comment thread ports/rp2/rp2_psram.c Outdated
Comment thread ports/rp2/rp2_psram.c Outdated
// - Max select assumes a sys clock speed >= 120MHz
// - Min deselect assumes a sys clock speed <= 138MHz
// - Clkdiv of 1 is OK up to 133MHz.
qmi_hw->m[1].timing = 1 << QMI_M1_TIMING_COOLDOWN_LSB |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else branch could use a lower max select time, but given we're reprogramming this on every clock frequency change, let's just get it right for whatever the frequency is - I'll send a PR for that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we ever correct this? I've lost the plot a little!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gadgetoid Gadgetoid force-pushed the feature/psram branch 4 times, most recently from a732af7 to 7f1fcc7 Compare August 15, 2024 16:02
@Gadgetoid Gadgetoid force-pushed the feature/psram branch 2 times, most recently from 56abd12 to 870d858 Compare August 22, 2024 16:07
@Gadgetoid Gadgetoid force-pushed the feature/psram branch 3 times, most recently from 2fae3a9 to 3cd6781 Compare September 5, 2024 08:24
@Gadgetoid Gadgetoid force-pushed the feature/psram branch 5 times, most recently from e3ba430 to c98c139 Compare September 27, 2024 15:07
@Gadgetoid Gadgetoid force-pushed the feature/psram branch 3 times, most recently from 55880bf to 1585553 Compare October 23, 2024 10:01
Comment thread ports/rp2/rp2_flash.c
@projectgus

Copy link
Copy Markdown
Contributor

(Note to self: a graph would be nice)

Thanks @Gadgetoid. I dropped these into text files and ran the comparison pass (which isn't as nice as a graph, but it's something):

❯ ./run-perfbench.py -s ./pico2w.txt ./pico2wp_psram.txt 
diff of scores (higher is better)
N=133 M=100                ./pico2w.txt -> ./pico2w_psram.txt         diff      diff% (error%)
bm_chaos.py                    324.72 ->     213.81 :    -110.91 = -34.156% (+/-0.05%)
bm_fannkuch.py                  58.27 ->      72.60 :     +14.33 = +24.592% (+/-0.10%)
bm_fft.py                     2356.17 ->    3056.05 :    +699.88 = +29.704% (+/-0.03%)
bm_float.py                   4833.98 ->    4464.70 :    -369.28 =  -7.639% (+/-0.12%)
bm_hexiom.py                    43.40 ->      41.09 :      -2.31 =  -5.323% (+/-0.07%)
bm_nqueens.py                 2704.03 ->    2994.93 :    +290.90 = +10.758% (+/-0.06%)
bm_pidigits.py                 808.87 ->     572.44 :    -236.43 = -29.230% (+/-0.04%)
bm_wordcount.py                 72.39 ->      47.99 :     -24.40 = -33.706% (+/-0.03%)
core_import_mpy_multi.py       420.04 ->     376.57 :     -43.47 = -10.349% (+/-0.09%)
core_import_mpy_single.py       82.98 ->      75.29 :      -7.69 =  -9.267% (+/-0.49%)
core_locals.py                  59.06 ->      62.11 :      +3.05 =  +5.164% (+/-0.03%)
core_qstr.py                   152.09 ->     177.41 :     +25.32 = +16.648% (+/-0.12%)
core_str.py                     28.64 ->      26.87 :      -1.77 =  -6.180% (+/-0.06%)
core_yield_from.py             402.00 ->     385.85 :     -16.15 =  -4.017% (+/-0.04%)
misc_aes.py                    437.18 ->     548.87 :    +111.69 = +25.548% (+/-0.07%)
misc_mandel.py                2435.39 ->    3365.37 :    +929.98 = +38.186% (+/-0.07%)
misc_pystone.py               2069.15 ->    1932.38 :    -136.77 =  -6.610% (+/-0.05%)
misc_raytrace.py               345.46 ->     278.90 :     -66.56 = -19.267% (+/-0.04%)
viper_call0.py                 574.36 ->     569.95 :      -4.41 =  -0.768% (+/-0.03%)
viper_call1a.py                560.19 ->     556.04 :      -4.15 =  -0.741% (+/-0.02%)
viper_call1b.py                457.61 ->     454.76 :      -2.85 =  -0.623% (+/-0.03%)
viper_call1c.py                464.69 ->     461.78 :      -2.91 =  -0.626% (+/-0.02%)
viper_call2a.py                549.96 ->     545.87 :      -4.09 =  -0.744% (+/-0.02%)
viper_call2b.py                405.63 ->     403.41 :      -2.22 =  -0.547% (+/-0.02%)

The numbers are all over the place, which is to be expected with flash caching differences - just a different binary layout can be more or less cache-friendly for a particular build, and the cache behaviour is probably totally different with PSRAM. However, I think it's encouraging that nothing is massively slower with PSRAM enabled and there is a reasonable mix of faster and slower.

@projectgus

Copy link
Copy Markdown
Contributor

I seem to have hit some code formatting edge case where I vehemently disagree with its changes.

This is definitely one of the weirder set of uncrustify formatting choices that I've seen it make. 😆

I've pushed a fixup commit here that seems to keep it happy:
pimoroni/micropython@feature/psram...projectgus:micropython:feature/psram

Feel free to cherry-pick and squash into this PR.

Comment thread ports/rp2/rp2_psram.c Outdated
@scruss

scruss commented Mar 25, 2025

Copy link
Copy Markdown
Contributor

I think we should probably build one of the PSRAM boards in CI to avoid surprise bitrot, but we can add that in a follow-up PR.

It's also in @mattytrentini's board def for the WeAct Studio RP2350B in the main repo:
micropython/ports/rp2/boards/WEACTSTUDIO_RP2350B_CORE/mpconfigboard.h:

// TODO: Split PSRAM option off as a variant
#define MICROPY_HW_PSRAM_CS_PIN (0)
#define MICROPY_HW_ENABLE_PSRAM (0)

@Gadgetoid Gadgetoid force-pushed the feature/psram branch 2 times, most recently from c8eb6c5 to eb9e0b3 Compare March 26, 2025 09:32
@Gadgetoid

Copy link
Copy Markdown
Contributor Author

WeAct Studio RP2350B

This could be a good candidate for covering a couple of bases at once. (B variant and PSRAM)

Feel free to cherry-pick and squash into this PR.

Done, thank you!

I have added COPY_BUFFER_SIZE_BYTES and set it to FLASH_PAGE_SIZE for now, making the copy buffer code a little more explicit about what it's doing and a little easier to change.

Comment thread ports/rp2/rp2_psram.h Outdated
Comment thread ports/rp2/rp2_flash.c Outdated
Comment thread ports/rp2/CMakeLists.txt Outdated
@dpgeorge

Copy link
Copy Markdown
Member
  • Potential: Support for PSRAM as either extending the GC heap (needs split heap) or storage/tmpfs!?
    ...
    Some kind of bindings for PSRAM rp2.PSRAM() in modrp2 might be nice, too. This would allow us to - theoretically - add a portion of PSRAM to the gc_heap and reserve the remainder for RAMFS, long-lived buffers, custom memory pools and other such nonsense. I did contemplate adding MICROPY_HW_PSRAM_RESERVE_BYTES for this, which would reserve N bytes from the detected PSRAM size for tomfoolery and shenanigans.

Yes, I agree that it would be good to give more control to the user as to how they use PSRAM. There are lots of possibilities here about how the API could look and what functionality it would have. And that could also be implemented on esp32 and other ports that may have large external (and slower) RAM.

But in the interests of getting this PR across the line and merged, I think what you have here is pretty good: just allowing a board to enable PSRAM or not, and using SRAM+PSRAM or just PSRAM for the GC heap. That's nice and simple, for now.

The only downside of doing it that way now is that if we change it in the future, it'll be a backwards incompatible change (because most likely we will change it so there is less RAM available by default on start up).

@Gadgetoid

Gadgetoid commented Mar 26, 2025

Copy link
Copy Markdown
Contributor Author

The only downside of doing it that way now is that if we change it in the future, it'll be a backwards incompatible change (because most likely we will change it so there is less RAM available by default on start up).

As someone relying heavily on this code in shipping products- that ship has already sailed 😆 at least for some of us, anyway. I think it could be reasoned that it'll be mostly vendors- not end users- who will see the fallout from such a compatibility break? Gotta keep us on our toes!

@Gadgetoid Gadgetoid force-pushed the feature/psram branch 6 times, most recently from 7441621 to bc2fb20 Compare March 28, 2025 12:57
@dpgeorge

dpgeorge commented Apr 7, 2025

Copy link
Copy Markdown
Member

I tested this on a Pimoroni Pico Plus2 RP2350 (thanks @Gadgetoid for it!), which has 8MiByte PSRAM.

I ran the standard MP test suite. And also an extensive RAM test that I have which does 8-, 16- and 32-bit write and then verification (using a PRNG), in sequential forward and sequential reverse mode (the entire RAM) and also in random access mode.

All tests passed.

Comment thread ports/rp2/rp2_psram.c Outdated
Comment thread ports/rp2/rp2_psram.c
Gadgetoid and others added 4 commits April 7, 2025 15:27
Add PSRAM support with auto detection.

Performs a best-effort attempt to detect attached PSRAM,
configure it and *add* it to the MicroPython heap.

If PSRAM is not present, should fall back to use internal
RAM.

Introduce two new port/board defines:

* MICROPY_HW_ENABLE_PSRAM to enable PSRAM.
* MICROPY_HW_PSRAM_CS_PIN to define the chip-select pin.

Changes:

ports/rp2/rp2_psram.c/h: Add new PSRAM module.
ports/rp2/main.c: Add optional PSRAM support.
ports/rp2/CMakeLists.txt: Include rp2_psram.c.
ports/rp2/mpconfigport.h: Add MICROPY_HW_ENABLE_PSRAM.
ports/rp2/modmachine.c: Reconfigure PSRAM on freq change.

Co-authored-by: Kirk Benell <kirk.benell@sparkfun.com>
Co-authored-by: Mike Bell <mike@mercuna.com>
Signed-off-by: Phil Howard <phil@gadgetoid.com>
PSRAM will be used exclusively if MICROPY_GC_SPLIT_HEAP == 0,
it will be added to RAM if MICROPY_GC_SPLIT_HEAP == 1,
and the system will fall back to RAM only if it's not detected.

Due to the size of PSRAM, GC stack was overflowing and causing
the GC to scan through the entire memory pool.

This caused noticable slowdowns during GC. Increase the stack
from 256 to 4096 bytes to avoid overflow and increase the
stack entry type size to accomodate 8MB+ PSRAM.

Changes:

ports/rp2/mpconfigport.h: Make split-heap optional and enable by default.
			  Increase GC stack entry type to uint32_t.
			  Raise GC stack size.

Co-authored-by: Kirk Benell <kirk.benell@sparkfun.com>
Signed-off-by: Phil Howard <github@gadgetoid.com>
Add a 256 byte (FLASH_PAGE_SIZE) SRAM copy buffer to allow copies
from PSRAM to flash.

This would otherwise hardfault since PSRAM is disabled to write flash.

Changes:

ports/rp2/rp2_flash.c: Add 256 byte (flash page size) SRAM copy buffer
                       for PSRAM to flash copies.
                       Invalidate the XIP cache to purge any PSRAM
                       data before critical flash operations.

Co-authored-by: Phil Howard <github@gadgetoid.com>
Co-authored-by: Angus Gratton <angus@redyak.com.au>
Signed-off-by: Phil Howard <github@gadgetoid.com>
Configure flash timings dynamically to match the system clock.

Reconfigure timings after flash writes.

Changes:

ports/rp2/main.c: Set default flash timings.
ports/rp2/modmachine.c: Configure optimal flash timings on freq change.
ports/rp2/rp2_flash.c: Reconfigure flash when leaving critical section.

Signed-off-by: Phil Howard <github@gadgetoid.com>
@dpgeorge

dpgeorge commented Apr 8, 2025

Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants