Stubbornness got me this far…
This is part 5 of this series of posts on programming the i.MX SDMA Engine. The full series is as follows:
Ok! If you’ve made it this far, you should have a pretty good idea of the inner workings of the SDMA and the ROM and RAM scripts which it uses. Now, we have all the info we need to complete our task. We’re going to rewrite the
mcu_2_ecspi script with two important changes:
Also, we’ll make the effort to preserve the bugfix included in the
The SDMA doesn’t have a relative jump instruction. The only jump instructions it has all go to absolute addresses. This isn’t great for us, as we don’t know where in memory our script will be loaded at compile time. However, the SDMA does have relative branch instructions (BF, BT, etc). We’ll use these to fake a relative jump instruction.
So, when we’re looking at a line from
mcu_2_ecspi like this:
jmp 6185 # 6180 # goto byte combiner
We can translate this line into our new script as:
cmpeq r1, r1 # always true bt byte_combiner
Eli’s assembler will find the relative offset of the
byte_combiner label, and fix the bt instruction to move PC to the correct spot. Incidentally, this pattern will destroy the state of the T flag, which you may find is important to the code, and wants to be preserved across the jump. In this case, maybe a better pattern is:
bt byte_combiner # one way... bf byte_combiner # ...or the other
This gets you a relative jump-like pattern without affecting you processor flags. It does change the length of your program, though, so use caution!
Ok, so let’s start out our code by assuming that the registers and buffer descriptors are set up exactly as for mcu_2_shp.
We should begin by making use of that nifty shared peripheral address rewriter, and then call the standard setup subroutine:
zeros_2_ecspi: jsr 473 # rewrite shp_addr from AP address into SPBA address jsr 485 # load BD, r5 = buf remaining main_loop: ld r7, (r3, 0x1b) # r7 = "burst length" ld r2, (r3, 0x1e) # r2 = "BD buffer addr"
Then, we need to reproduce the bugfix contained in the
mcu_2_ecspi script. However, instead of using the Peripheral DMA module to read the TESTREG, let’s used the shared peripheral bus, like so:
xfer_loop: ld r2, (r3, 0x1f) # r2 = scratch1f (shp_addr) ld r2, (r2, 0x1c) r2 = *(ECSPIx_TXDATA + 0x1c == ECSPIx_TESTREG) revblo r2 # get RXCNT in low order bits andi r2, 0xff # mask off other stuff ldi r0, 0x30 # r0 = 48 cmplt r2, r0 # if RXCNT >= 48 bf yield_for_now # then jump to yield and wait
Let’s hope that does the trick! Now we are free to proceed with the burst handling code.
mov r0, r7 # r0 = r7 (burst length) bclri r0, 0x1f # clear high bit of burst length (??) cmplt r0, r5 # if burst_length < buf_remaining: bt 1 # skip next line mov r0, r5 # else r0 = burst_length sub r5, r0 # buf_remaining -= burst_length st r5, (r3, 0x1d) # scratch1d = r5 (buf_remaining) btsti r4, 0x18 # ** Truth table for mode bits bt 3 # | b24 b25 btsti r4, 0x19 # | n bt mode_16b # | n y, goto mode_16b bf mode_32b # | n n, goto mode_32b btsti r4, 0x19 # | y bf mode_8b # | y n, goto mode_8b # | y y, 24b mode. fall through... ld r6, (r6, 0x1d) # r6 = scratch1d (buf_remaining)
My annotation of the original code had a note here saying how 24-bit mode is tricky. But if we don’t need to read from DRAM, we don’t need to concern ourselves with alignment! So this code becomes simpler.
ldi r2, 0x0 # r2 = 0 (dummy data) ld r5, (r3, 0x1f) # r5 = shp_addr loop_24b: st r2, (r5, 0x0) # write zero to shp bdf do_fault subi r0, 0x3 # burst remaining -= 3 bytes cmpeqi r0, 0x0 # test if done bf loop_24b # if not done, do loop again bt cleanup # otherwise, clean up.
The loops for the other 3 data widths become very simple indeed!
mode_32b: lsr1 r0 mode_16b: lsr1 r0 mode_8b: ldi r2, 0x0 # r2 = 0 (dummy data) ld r5, (r3, 0x1f) # r5 = shp_addr loop 1, 0 st r2, (r5, 0x0) # write zero to shp bf do_fault bt cleanup
Then we have the cleanup routine, which reproduces another part of the bugfix from mcu_2_ecspi:
cleanup: btsti r7, 0x1f # test high bit of burst length bt yield_for_now # if true (when??) goto 6222 ld r2, (r3, 0x1f) # r2 = shp_addr ld r5, (r2, 0x4) # r5 = *(shp_addr+4) (aka ECSPIx_CONREG) ori r5, 0x4 # r5 |= XCH st r5, (r2, 0x4) # write CONREG
And then finally the
yield_for_now handler referenced above, and the
load_next_bd handlers as well. You’ll notice a few instances of the relative jump-like pattern in here.
yield_for_now: done 0 # yield, no interrupt ld r5, (r3, 0x1d) # r5 = buf_remaining cmpeqi r5, 0x0 # if buf_remaining == 0 bt load_next_bd # goto load next bd jsr 508 # else call 508 ld r7, (r3, 0x1b) # restore r7 from proc 508 cmpeq r0, r0 bt xfer_loop # goto xfer_loop do_fault: clrf 0 jsr 533 load_next_bd: jsr 524 # buf_remaining == 0, load next BD cmpeq r0, r0 bt main_loop # goto main loop.
And that does it! This should result in much better utilization of the SDMA core during RX-only SPI transfers. In my testing, I can verify that this increases the maximum throughput of incoming SPI transactions to beyond the capabilities of my SPI slave devices, which is good enough for my needs.
I hope you’ve found this series useful! Thanks much for reading.