20 hours ago, ZeroByte said:
And as @BruceMcF points out - having to switch RAM banks during a block copy operation is of negligible impact. You'd just need to build the DMA controller to know about the banking structure and issue the appropriate bank swap writes and update its src/dst pointers accordingly.
Since I am "imagining doing" the REU, I certainly was NOT having the DMA controller know ANYTHING about banking structure ... I was having the CPU handle that.
So, one register has the chunk size control, maybe the control register, maybe another. The control register has whatever it has, but the readable bit 0 is "start": it is set to 1 to begin, remains 1 for the operation (which doesn't matter because the CPU is asleep), and goes to 0 when completed. Point the REU address at the source and the target address at $A000. The bank is in A and the number of chunks is in X. Y is transient. "0" in X implies a count of 256.
PASTEBANK: LDY $0 : PHY : STA $0 : LDA REUCONTROL : ORA #1 : - STA REUCONTROL : DEX : BNE - : PLA : STA $0 : RTS
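To make the division of labour concrete, here is a rough Python model of the routine above. It is only a sketch of the idea: `ToyREU`, `pastebank`, `bank_reg` and the dictionary of banks are all made-up names, and the REU is assumed to auto-increment both of its pointers after every byte.

```python
# Toy model of PASTEBANK. Every name here is hypothetical: this is a thought
# experiment about hardware that does not exist.

BANK_WINDOW = 0xA000   # start of the CX16 banked RAM window
BANK_SIZE = 0x2000     # 8K per RAM bank


class ToyREU:
    """Copies one fixed-size chunk per 'start' write; knows nothing of banks."""

    def __init__(self, sram, chunk_size=32):
        self.sram = sram              # contents of the expansion RAM
        self.chunk_size = chunk_size  # bytes moved per start write
        self.reu_addr = 0             # source pointer inside the REU
        self.target = BANK_WINDOW     # destination pointer in CPU space

    def start(self, visible_bank):
        """Model STA REUCONTROL with bit 0 set: paste one chunk."""
        for _ in range(self.chunk_size):
            visible_bank[self.target - BANK_WINDOW] = self.sram[self.reu_addr]
            self.reu_addr += 1
            self.target += 1


def pastebank(reu, banks, bank_reg, bank, chunks):
    """CPU side: pick the bank, fire the REU once per chunk, restore the bank."""
    saved = bank_reg[0]            # LDY $0 : PHY
    bank_reg[0] = bank             # STA $0
    count = chunks or 256          # X = 0 wraps, so the loop runs 256 times
    for _ in range(count):         # - STA REUCONTROL : DEX : BNE -
        reu.start(banks[bank_reg[0]])
    bank_reg[0] = saved            # PLA : STA $0


# Usage: fill bank 5 from the start of the REU while bank 1 is current.
sram = bytes(i % 251 for i in range(BANK_SIZE))
banks = {1: bytearray(BANK_SIZE), 5: bytearray(BANK_SIZE)}
bank_reg = [1]
pastebank(ToyREU(sram), banks, bank_reg, bank=5, chunks=0)
print(bytes(banks[5]) == sram, bank_reg[0])
```

With 32 byte chunks, the "0 means 256" case moves exactly one 8K bank, and the REU itself never looks at the bank register.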
My interrupt code must not change the bank without saving and restoring it, but the longest an interrupt has to wait is 35 cycles, because in effect STA REUCONTROL is a 35 clock cycle instruction that copies 32 bytes.
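The trade-off between chunk size, interrupt wait and loop overhead is simple arithmetic, so here is a small sketch that tabulates it. The function names are invented for illustration. The defaults of 3 cycles for the store and 8 cycles for the whole STA / DEX / BNE loop are this post's own figures; a plain 65C02 STA to an absolute address takes 4 cycles, which would make them 4 and 9, so both are left as parameters.

```python
BANK_SIZE = 8 * 1024  # bytes in one CX16 RAM bank


def loop_overhead(chunk_bytes, loop_cycles=8):
    """Loop cycles per chunk as a fraction of the cycles spent copying."""
    return loop_cycles / chunk_bytes


def worst_case_wait(chunk_bytes, store_cycles=3):
    """Cycles an interrupt may have to wait while one chunk is in flight."""
    return store_cycles + chunk_bytes


def chunks_per_bank(chunk_bytes):
    """Value to load into X so that one inner loop fills one 8K bank."""
    return BANK_SIZE // chunk_bytes


for size in (32, 128, 256):
    print(f"{size:3d} bytes: overhead {loop_overhead(size):.3%}, "
          f"wait {worst_case_wait(size)} cycles, "
          f"{chunks_per_bank(size)} chunks per bank")
```

Under those assumptions the overhead works out to 25%, 6.25% and 3.125% for 32, 128 and 256 byte chunks, and a 128 byte chunk needs 64 chunks to fill one bank.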
The overhead is 8 clocks on top of 32, so 25% overhead. If you want lower overhead, make the chunk bigger. At 128 byte chunks, it's 6.3% overhead. At 256 byte chunks, it's 3.1% overhead. So I don't see any particular reason why it would ever need to be bigger than 128 byte chunks ... making that a loop adds 3 bytes to multiply the block moved by up to 256, so a maximum 128 byte chunk covers 32K, where the maximum bank move is 8K before it's time to increment the bank register.
Now suppose interrupts can touch the transient zero page API space, and this is rommable code, so I can't just store the old bank after this routine. If I am not worried about interrupt lagginess, I can have 128 byte chunks, X will have 64 in it, and Y can say how many blocks:
PASTEBANKS: SEI : STA $20 : LDA $0 : PHA : LDA $20 : CLI : STA $0 : LDA REUCONTROL : ORA #1 : PHX : -- PLX : PHX : - STA REUCONTROL : DEX : BNE - : INC $0 : DEY : BNE -- : PLX : PLA : STA $0 : RTS
(Unless I have the meaning of SEI and CLI reversed ... it's been over 40 years) ... A power of two chunk size is 0 for 1 byte through to 7 for 128 bytes, so three bits in the REUCONTROL register for chunk size. 1 bit for increment target versus stable target, 1 bit for direction (copy into REU, paste into CX16) ... we still have two bits spare in the REU control register. Two bytes for the CX16 address, three bytes for the REU address if we have a 512K SRAM in the REU, which leaves plenty of room for expansion if people decide they want bigger ones.
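As a sanity check on the bit budget, here is one way the control register could be packed, written as a small Python sketch. Only bit 0 as "start" comes from the description above; putting the chunk size in bits 1-3, the stable-target flag in bit 4 and the direction in bit 5 is purely an assumption for illustration, as are all the names.

```python
# Hypothetical REUCONTROL layout. Only bit 0 ("start") is fixed by the post;
# the positions of the other fields are assumptions for illustration.
START = 0x01           # bit 0: write 1 to start, reads 0 when done
CHUNK_SHIFT = 1        # bits 1-3: log2 of the chunk size (0..7)
CHUNK_MASK = 0x07 << CHUNK_SHIFT
STABLE_TARGET = 0x10   # bit 4: 0 = increment target, 1 = stable target
PASTE = 0x20           # bit 5: 0 = copy into REU, 1 = paste into CX16
                       # bits 6-7: spare


def encode_control(chunk_bytes, stable_target=False, paste=False, start=False):
    """Pack the fields into one control byte."""
    if chunk_bytes not in (1, 2, 4, 8, 16, 32, 64, 128):
        raise ValueError("chunk size must be a power of two from 1 to 128")
    value = (chunk_bytes.bit_length() - 1) << CHUNK_SHIFT
    if stable_target:
        value |= STABLE_TARGET
    if paste:
        value |= PASTE
    if start:
        value |= START
    return value


def chunk_bytes_of(value):
    """Recover the chunk size in bytes from a control byte."""
    return 1 << ((value & CHUNK_MASK) >> CHUNK_SHIFT)


# Address widths: 512K needs 19 bits, three bytes can address 16M.
REU_ADDRESS_BITS_NEEDED = (512 * 1024 - 1).bit_length()
REU_ADDRESS_SPACE = 1 << 24

print(hex(encode_control(128, paste=True)), REU_ADDRESS_BITS_NEEDED)
```

Six bits are used and two stay free, and a three byte REU address covers 32 times the 512K part.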