20 hours ago, StephenHorn said:
Well, it's much faster to have the assets in himem because you can retrieve them more quickly than SDCard data. The SDCard requires banging away at the protocol for data transfer, which might be assisted somewhat with hardware but might also involve literally byte-banging single bytes through what effectively amounts to a serial port on an I/O address with commands and replies to retrieve blocks of data... or worse, bit-banging a similar interface to the same job but 8X slower. However the kernal ends up implementing it, it will be way slower than accessing himem.
We do know that if it's coming from the built-in SD card port, Vera is reading the data in from the SD card over an SPI interface with a 12.5MHz serial clock. So in the middle of a block read, the process of writing the output byte to $9F3E, reading $9F3F until bit 7 is clear, then reading the input byte from $9F3E is going to be about half the speed of a memory-to-memory copy, if it is properly interleaved. Add the file system overhead to set up which block is being read, and it is perhaps a third of the speed of a memory-to-memory copy.
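As a rough illustration of that comparison, here is a sketch of the two inner loops side by side. The labels SRC, DST and BUF are hypothetical, the register addresses are the ones given above, and the cycle counts are standard 65C02 timings that ignore page-crossing penalties and any extra busy-wait iterations:

```asm
; Plain memory-to-memory copy of one page: 14 cycles per byte
        LDY #0
copy:   LDA SRC,Y       ; 4
        STA DST,Y       ; 5
        INY             ; 2
        BNE copy        ; 3 (taken)

; Interleaved SPI read of one page into RAM:
; 24 cycles per byte when the busy flag is already clear
        LDX #0          ; dummy value written to start each transfer
        LDY #0
        STX $9F3E       ; start the first transfer
read:   LDA $9F3F       ; 4  poll busy flag (bit 7)
        BMI read        ; 2  (not taken)
        LDA $9F3E       ; 4  fetch the received byte
        STX $9F3E       ; 4  start the next transfer
        STA BUF,Y       ; 5
        INY             ; 2
        BNE read        ; 3  (taken)
```

On those counts the SPI loop takes a bit under twice as long per byte as the plain copy, before any file system overhead is added.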
That seems likely to be part of why Vera was redesigned to bring out the registers which were previously accessed through the data ports ... direct copy from the SD drive to the VRAM works much faster if the SPI port can be accessed directly and then written to the data port where the auto-increment writes it to the correct location.
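; Block read ready, VRAM port A destination & increment already set up
; SPI_DATA = $9F3E, SPI_CTRL = $9F3F (bit 7 = busy), DATA0 = Vera data port A
; Y is the byte countdown (0 wraps to 256 bytes), X holds the dummy value to write to trigger the SPI transfer
  LDX #0
  LDY #0          ; TXY is 65816-only, so load Y directly on the 65C02
  STX SPI_DATA    ; start the first SPI transfer
- LDA SPI_CTRL    ; poll the busy flag in bit 7
  BMI -
  LDA SPI_DATA    ; fetch the received byte
  STX SPI_DATA    ; immediately start the next transfer
  STA DATA0       ; store to VRAM, the port auto-increments
  DEY
  BNE -
; (Next I assumed a 'DEC N' on a memory location is going to test whether more pages need transferring)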
... note that this would certainly busy-wait on the first byte, because a maximum byte transfer frequency of 1.5625MHz can't keep up with a 4-clock instruction, which has an effective instruction frequency of 2MHz. But storing the dummy to start the next SPI cycle immediately after reading the previously received SPI data means that normally the busy flag will be cleared by the time the loop restarts at SPI_CTRL: the 13-clock 65C02 sequence between that store and the next read of SPI_CTRL (STA DATA0, DEY, BNE, LDA SPI_CTRL) has an effective frequency of 615kHz, so the SPI port is plenty fast to keep up with that.
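For reference, the arithmetic behind those figures can be written out as below. This assumes the 8MHz CPU clock implied by the 2MHz figure, and counts 4 cycles for each absolute load or store, 2 for DEY and 3 for a taken BNE:

```latex
\begin{align*}
f_{\text{SPI byte}} &= \frac{12.5\,\text{MHz}}{8\ \text{bits}} = 1.5625\,\text{MHz} \\
f_{\text{4-clock}}  &= \frac{8\,\text{MHz}}{4} = 2\,\text{MHz} \\
f_{\text{13-clock}} &= \frac{8\,\text{MHz}}{4 + 2 + 3 + 4} \approx 615\,\text{kHz}
\end{align*}
```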
But this is a block transfer that has already been set up. One choice that is likely going to be needed in a number of games is whether to load video assets straight into VRAM from the SD card in a double-buffered arrangement and feed audio from assets stored in High RAM, or to load audio, in particular PCM assets, straight from the SD card into Vera's PCM ports and load video assets from High RAM.