1 hour ago, Scott Robison said:
Just to be clear: It's not necessarily that the FPGA needs a lot of logic cells to allow an external source (such as the X16) to interact with SPI. It's that in order to add the flexibility to VERA to allow it to directly store bytes from SPI to video RAM without passing through the CPU first would require more logic cells than are currently allocated to it. Even if there are enough logic cells left to support a "fire and forget" strategy for the next X bytes, it's not as though we're dealing with a full blown multitasking friendly CPU or OS. Typically (or so it seems to me) if you have an ability to tell the hardware "transfer the next X bytes without the use of the CPU" you would generally want to signal the main system when that process is complete so that it can set up the next transfer. Given the typical implementation of the kernal, it would wind up sitting in a busy loop waiting for the signal that the transfer is done.
I see several possibilities:
1. There aren't enough logic cells available to add the functionality to support both CPU and VRAM delivery options.
2. There are enough logic cells available but it increases the complexity meaning there is another thing that could go wrong, and it doesn't really improve CPU performance because it still has to wait for the delivery notification.
3. There are enough logic cells available and the kernal becomes more complex due to dealing with an interrupt driven SPI interface so that the CPU can go on about other business while waiting for the background VRAM transfer to complete.
In a perfect world, sure, it would be nice to support this mode. I think the general purpose approach is more than adequate for most tasks, even if it isn't optimal for loading into VRAM. It's not like sales of the C=64 were too negatively impacted by its slow IEC bus protocol.
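To put rough numbers on the difference between possibilities 2 and 3 above, here is a minimal cycle-accounting sketch in Python. It is not based on any actual VERA or kernal code. The function names, the per-byte cost, and the interrupt overhead are all hypothetical values chosen for illustration.

```python
def transfer_cycles(n_bytes, cycles_per_byte):
    """CPU cycles that elapse while the hardware moves n_bytes by itself."""
    return n_bytes * cycles_per_byte


def free_cycles_busy_wait(n_bytes, cycles_per_byte):
    """Possibility 2: the CPU spins on a 'done' flag, so nothing is left over."""
    return 0


def free_cycles_interrupt(n_bytes, cycles_per_byte, irq_overhead):
    """Possibility 3: the CPU does other work and pays only the IRQ overhead."""
    return max(0, transfer_cycles(n_bytes, cycles_per_byte) - irq_overhead)


if __name__ == "__main__":
    # Hypothetical values, not measured on real hardware.
    n_bytes, per_byte, irq = 512, 6, 60
    print("busy wait :", free_cycles_busy_wait(n_bytes, per_byte))
    print("interrupt :", free_cycles_interrupt(n_bytes, per_byte, irq))
```

Under these made-up numbers the interrupt-driven version only pays off when the transfer is long compared with the interrupt overhead, which is the complexity trade-off described above.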
The bigger challenge with such functionality may be VRAM contention. Based on the way the sprite composer works, I suspect that the VRAM is 32 bits wide and clocked at 25 MHz. It isn't clear what timing guarantees VERA provides. A random read requires at least two bus operations (write ADDRx, then read the data back), which the CPU cannot do in fewer than 6 or 8 cycles; I forget exactly. A bus-mastering expansion card, however, could do those two operations in exactly two bus cycles. That leaves roughly 4 cycles @ 25 MHz from writing ADDRx until the data needs to be on the data bus without violating tDSR (maybe only 3, depending on the latency of the bus transceivers), unless VERA can bus master and assert RDY#. The bus mastering story with the X16 isn't quite clear, so I suspect that VERA doesn't do that.
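For anyone who wants to play with the timing window, here is a back-of-the-envelope sketch in Python. The 25 MHz figure is the guess from this post. The 8 MHz system bus clock, the tDSR value, and the transceiver latency are assumptions I have plugged in as placeholders, so check them against the real datasheets. The model also assumes the address is latched right at the end of the write cycle, which is the pessimistic case.

```python
def vera_cycles_available(cpu_hz, vera_hz, t_dsr_ns, xcvr_ns):
    """VERA clocks between the ADDRx write completing and the read data deadline.

    Pessimistic model: the window is one bus period, minus the CPU's read
    data setup time (tDSR) and the bus transceiver latency.
    """
    bus_period_ns = 1e9 / cpu_hz
    vera_period_ns = 1e9 / vera_hz
    window_ns = bus_period_ns - t_dsr_ns - xcvr_ns
    return window_ns / vera_period_ns


if __name__ == "__main__":
    CPU_HZ = 8_000_000     # assumption: system bus clock
    VERA_HZ = 25_000_000   # from the post: suspected VRAM clock
    T_DSR_NS = 10.0        # placeholder, not taken from a datasheet
    XCVR_NS = 5.0          # placeholder transceiver latency
    cycles = vera_cycles_available(CPU_HZ, VERA_HZ, T_DSR_NS, XCVR_NS)
    print(f"about {cycles:.2f} VERA clocks available")
```

If VERA sees the address earlier in the write cycle, the window grows, which is where the difference between 3 and 4 cycles would come from.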
This is primarily a problem for handling random reads from the system bus; sequential reads can probably be anticipated. Writes could be posted to alleviate VRAM pressure, but that's only meaningful if the posted-write FIFO can drain during HBLANK.
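A quick sanity check of the HBLANK drain condition can be sketched like this. Everything here is an assumption on my part: standard 640x480 line timing (800 clocks per line, 160 of them blanking), an 8 MHz CPU issuing one write every 4 cycles at best, one VRAM clock per drained write, and the worst case where no VRAM slots are free during active display.

```python
def cpu_writes_per_line(line_clocks, vera_hz, cpu_hz, cpu_cycles_per_write):
    """Upper bound on CPU writes that can be posted during one scanline."""
    cpu_cycles = line_clocks * cpu_hz // vera_hz
    return cpu_cycles // cpu_cycles_per_write


def drains_in_hblank(pending_writes, hblank_clocks, clocks_per_write=1):
    """True if the FIFO can be emptied using only the blanking interval."""
    return pending_writes * clocks_per_write <= hblank_clocks


if __name__ == "__main__":
    # Assumed figures, see the paragraph above.
    writes = cpu_writes_per_line(800, 25_000_000, 8_000_000, 4)
    print("worst-case writes per line:", writes)
    print("drains in HBLANK:", drains_in_hblank(writes, 160))
```

With those assumptions the FIFO keeps up, but a bus master that writes every bus cycle changes the picture quickly.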
I could see a SPI-to-VRAM mechanism improving performance of some specific/niche activities; e.g. you could stream frames from the SD card directly to VRAM for video playback. Not sure that would be enough to justify the complexity though.
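As a rough feel for the video playback case, this Python sketch computes the best-case frame rate from the SPI clock alone. The SPI clock and frame format below are hypothetical example values, and the estimate ignores SD card command overhead, block boundaries, and any VRAM contention, so real numbers would be lower.

```python
def max_fps(spi_hz, width, height, bits_per_pixel):
    """Best-case frames per second if every SPI bit carries pixel data."""
    bytes_per_second = spi_hz / 8
    frame_bytes = width * height * bits_per_pixel / 8
    return bytes_per_second / frame_bytes


if __name__ == "__main__":
    # Hypothetical example: 12.5 MHz SPI clock, 320x240 at 8 bits per pixel.
    print(f"{max_fps(12_500_000, 320, 240, 8):.1f} fps upper bound")
```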