General • Re: RP2350 PIO DMA performance question

I mostly got the idea from the PIO itself and from the PlayStation 2 Vector Units, which issue two instructions at once, in practice on two separate execution units, only one of which can execute a branch.
I noticed some odd behavior on the RP2040 PIO, but I didn't determine it conclusively, so I didn't want to write about it.
When the PIO runs with a clock divisor of 1, the IRQ WAIT completes in the cycle after the one in which the IRQ CLEAR happened. That is all fine. But with a much higher divisor, the IRQ WAIT completes in the same cycle as the IRQ CLEAR. This makes it harder to use the PIO divisor as a way to control the clock of the implemented interface, though of course it can be compensated for with a simple one-cycle delay added to the instruction (if necessary at all). I thought it was odd anyway. Of course, I won't complain if it really completes faster at slower divisors, as that is better. My guess is that the signal routing still runs at sysclk, and only the instruction advancing runs at the PIO clock after the divisor.

As for the maximum clock an RP2040 / RP2350 can use: I was a bit surprised that, although the RP2040 (and even the RP2350) can run so much faster than their specified maximum frequencies, the RP2350 is only specified up to 150MHz rather than, for example, 200MHz. Of course, that may be because the rating has to cover all devices and systems on the chip.
But still, if the PIO 'word/bit' cycle time can be halved for a given interface, that means twice the speed. For example, if the PIO needs 4 instructions in the inner loop at 200MHz, that means a 50MHz interface clock, while with two instructions it would be 100MHz. Of course, at some point such frequencies might be beyond what the IO ports can take, and then being able to run the interface clock faster would be useless. But if, for example, a 100MHz clock is needed, then running the RP at 200MHz or at 400MHz is a big difference. So halving the number of instructions needed in the inner loop really enables very fast interfaces, and most of what is necessary for that is already available.
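
To make the arithmetic above concrete, here is a rough sketch in Python. It assumes every PIO instruction takes exactly one PIO cycle (no delay cycles, no stalls) and that one pass through the inner loop produces one interface clock period. The function names are made up for illustration and are not part of any SDK.

```python
def interface_clock_mhz(sys_clock_mhz, instructions_per_loop, clock_divisor=1.0):
    """Interface clock in MHz, assuming one loop pass per interface clock
    period and one PIO cycle per instruction (no delays or stalls)."""
    if instructions_per_loop < 1:
        raise ValueError("the inner loop needs at least one instruction")
    if clock_divisor < 1:
        raise ValueError("the clock divisor cannot be below 1")
    return sys_clock_mhz / (clock_divisor * instructions_per_loop)


def required_sys_clock_mhz(target_interface_mhz, instructions_per_loop, clock_divisor=1.0):
    """System clock in MHz needed to reach a target interface clock,
    under the same assumptions as interface_clock_mhz()."""
    if instructions_per_loop < 1:
        raise ValueError("the inner loop needs at least one instruction")
    if clock_divisor < 1:
        raise ValueError("the clock divisor cannot be below 1")
    return target_interface_mhz * instructions_per_loop * clock_divisor


if __name__ == "__main__":
    # The two cases from the post: a 4-instruction and a 2-instruction inner loop.
    for n in (4, 2):
        print(f"{n} instructions at 200MHz sysclk -> "
              f"{interface_clock_mhz(200, n):g}MHz interface clock")
        print(f"{n} instructions for a 100MHz interface -> "
              f"{required_sys_clock_mhz(100, n):g}MHz sysclk needed")
```

With these assumptions, a 100MHz interface needs a 400MHz sysclk with a 4-instruction loop, but only 200MHz with a 2-instruction loop.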

I haven't tried using the RP2350 yet, and I expect some good surprises from it. :) Testing the CPU performance instead of the PIO would also be interesting. The problem with the CPU is usually not that it is slow, but that its pipeline has many stages, so load/store instructions take a lot of time, and so on. One thing I didn't test on the RP2040 was disabling the input synchronizer flip-flops on the SIO when testing with the CPU instead of the PIO (or at least I don't remember doing so). That might have made it even faster. One serious problem with using the RP2040 CPU for fast addr->data responses, though, was that the CPU does not always respond with the same delay from when it sees the input addr, which can cause hard-to-debug issues later.

Statistics: Posted by wisi — Tue Oct 08, 2024 9:02 am


