Commander X16

Posted: **Mon May 17, 2021 12:31 am**

The process of reading a particular file in a FAT file system:

1. Find the master boot record to identify the partition table type.
   1. Is it truly MBR or is it really a GPT? These have different ways of identifying partitions, so a robust system has to handle both, though I suspect we only care about the "true MBR" case and not the GPT.
2. Is there more than one partition on the media of a compatible type? Pick one.
3. What is the size of the partition? This is the "official" way to identify whether it is FAT12/FAT16/FAT32.
4. Get the boot sector of the partition. Use the values in it to find the FAT, root directory, and cluster size.
5. Use a directory to find a file of interest, which will tell us the size of the file and the location of the first cluster of the file.
6. Read the cluster.
7. Use the FAT to find the next cluster. If not end of file, go to 6.

Most of the items have variables in them that you do not know how to process until after you've read them, and this is just the high level overview.
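To make the final loop of that overview concrete (read a cluster, then ask the FAT for the next one), here is a minimal sketch. It assumes the FAT has already been read into memory as a list of 32-bit entries; the function name `cluster_chain` and the in-memory `fat` list are illustrative only and not taken from any X16 code. The 28-bit mask and the end-of-chain threshold `0x0FFFFFF8` are the usual FAT32 values.

```python
FAT32_ENTRY_MASK = 0x0FFFFFFF  # only the low 28 bits of an entry are the cluster number
FAT32_EOC_MIN = 0x0FFFFFF8     # entries at or above this value mean "end of chain"


def cluster_chain(fat, first_cluster):
    """Return the list of clusters that make up one file, in order.

    `fat` is a hypothetical in-memory copy of the FAT (one int per entry).
    """
    chain = []
    seen = set()
    cluster = first_cluster & FAT32_ENTRY_MASK
    while cluster < FAT32_EOC_MIN:
        # Data clusters start at 2; anything else here is a damaged chain.
        if cluster < 2 or cluster >= len(fat):
            raise ValueError(f"cluster {cluster} is outside the FAT")
        if cluster in seen:
            raise ValueError(f"cluster {cluster} appears twice (loop in chain)")
        seen.add(cluster)
        chain.append(cluster)                      # "read the cluster" would happen here
        cluster = fat[cluster] & FAT32_ENTRY_MASK  # look up the next cluster
    return chain


# Tiny made-up FAT: a file starting at cluster 2 occupies clusters 2, 3 and 5.
example_fat = [0x0FFFFFF8, 0x0FFFFFFF, 3, 5, 0, 0x0FFFFFFF]
print(cluster_chain(example_fat, 2))
```

Even this stripped-down version needs a loop, a table lookup, and two error checks, which is the point being made: following a file is data processing even when the bytes themselves are not changed.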

Can an FPGA be tasked with handling this? Sure. It doesn't require a full blown general purpose CPU. But it is certainly far more than just one or even a few logic gates.

Could it be possible to implement this so that the CPU sends the commands and tells the FPGA the eventual destination of the next X bytes in advance so that it doesn't have to handle the final delivery when VRAM is the destination? Yes, and that would be less complicated than the co-processor (where a co-processor is less than a full CPU but more than a few logic gates). There likely is not enough space left in the FPGA for even that, though I do not know. And it would have difficulty dealing with error conditions.

Then of course we have the question "what if the bytes in the file must be processed in some way before spewing them into VRAM?" Many / most file formats include at minimum a header of some sort to identify the contents of the file. Compression is often used to make files smaller. Some define a program in some virtual machine. For anything more complex than "raw sequence of bytes already in the format you want for VRAM" you'd have to have the CPU read the data so that some processing can be done (decompression for PNG files, as an example, or running a VM over it for true type font byte codes, processing the header to know where to seek to so that substreams of data can be processed correctly, etc).

All these things require logic. You are correct in what is theoretically possible, though I think your estimates of how many resources would be required are on the low side given the flexibility built into the FAT32 format. One can mandate a lot of the variables be set to specific constant values to limit the complexity for particular use cases, but you can never get rid of it completely based on my experience.

General purpose systems provide flexibility, though they cannot offer an optimal solution to every problem.

Posted: **Mon May 17, 2021 4:46 am**

On 5/17/2021 at 8:31 AM, Scott Robison said:

Could it be possible to implement this so that the CPU sends the commands and tells the FPGA the eventual destination of the next X bytes in advance so that it doesn't have to handle the final delivery when VRAM is the destination? Yes, and that would be less complicated than the co-processor (where a co-processor is less than a full CPU but more than a few logic gates). There likely is not enough space left in the FPGA for even that, though I do not know. And it would have difficulty dealing with error conditions.

The simplest IS "when the SPI completes the next byte, store it where port A or port B is pointing, and autoincrement that port". That allows all of the Filesystem management and sector write code to be shared, and only the sector read code has two versions, one for RAM targets and the other for Vera targets.

Indeed, with 5 control bits unallocated in SPI, there is LOGICAL space for that in the control register ... one bit for "Vera write mode on", one bit for "target A/B" ... though since the control register is just the bits allocated, even that is two extra bits of logical storage, and not "for free, because there's already a register allocated for that", so it's an open question whether the logical resources exist for that.

Each bit requires 2 25MHz cycles, so that is 16 for a byte, and I assume the trigger requires a clock or two, so make that 18, and then a three cycle state machine that is triggered when the eighth bit is transferred, then copy to the addressed port location, then increment the port address, so make it at least 23 ticks on the 25MHz clock, which is 8-9 ticks on the CPU clock. Even if an interrupt routine is triggered, it likely won't store an existing register to the same Vera Port in its first operation, so THAT one probably DOESN'T need to worry about contention. But two consecutive bytes MIGHT, and 16 bytes DEFINITELY would have to deal with contention for writes to / reads from the Vera embedded Static RAM.

Posted: **Tue May 18, 2021 10:32 pm**

I think what RJE is hoping for is something like separate pins for video data meant to go directly to VERA, while main CPU data goes to the 65C02.

That's how most game consoles handled their pinouts (or at least the only systems made in the cartridge era that I have data for that didn't handle it this way were the Atari 5200 and Commodore 64 Game System, which were based on 8-bit home computer architectures). There were also several computer designs that worked this way too, most especially the TI 99/4 and MSX machines (or at least cartridge slots 1 and 2 on the latter).

The problem is that this isn't how the SD card system seems to work. To make it work like that wouldn't necessarily require much addition to the FPGA softcore, but it would definitely need tracing the SD card interface for specific I/O pin lanes (and if the system only has one I/O line, or even only one input and one Output, forget about it), a complete revamp of the SD card firmware requiring two physically different file systems, a near complete rewrite of Kernal and the back end of much of the rest of the ROM, and a significant rewiring of the motherboard and the VERA daughterboard.

And, when you consider that the Geometry Synthesis and PCM channels are completely physically separate from YM2151, and aren't even accessed from the same memory banks, things get even messier...

Posted: **Tue May 18, 2021 11:09 pm**

26 minutes ago, Kalvan said:

I think what RJE is hoping for is something like separate pins for video data meant to go directly to VERA, while main CPU data goes to the 65C02.

I think rather what he originally said assumes that the FPGA has something like the IEC2SD built into it, when the reality is that the 65C02, flash ROM and RAM replace the processor, RAM and flash ROM in the IEC2SD, and what the Vera adds is SIMPLY the hardware SPI, which is built into the microcontroller used in the IEC2SD.

A separate SD card SPI Port for loading data into the Vera embedded Static RAM would cost more logic than adding the circuitry for the contents of the SPI data register to automatically be copied to the Port A or Port B target when the byte completes.

Posted: **Tue May 25, 2021 12:10 am**

On 5/16/2021 at 5:46 PM, BruceMcF said:

What do you mean "you are not doing any processing on the data"?

Even doing it A BYTE AT A TIME, you are setting up the SPI register to transfer a byte with the SD card. You are moving it from the SPI register to SOMEWHERE in the Static RAM attached to the FPGA.

The "single logic gate" you are talking about is if a SINGLE byte is moved, and it's much more than a single logic gate: you actually have to wire up the circuitry to DO THE MOVING. The "other part of the FPGA that is expecting input to store in VRAM" is WIRED to react to an action on the CX16 bus.

And you are talking about zooming through until "End of File" ... then you have to get those bytes a segment at a time. And then you have to find out which segment is the next segment in the file, and instruct the SD card that you want the next segment. All of that is data processing, even if you don't change the contents of the segments at all.

Getting an entire file requires a CPU of some sort, so if the 6502 is not doing that, you have to build a coprocessor into the FPGA.

Yes, wired to react to an action on the CX16 bus. Instead of a straight wire, that's where the gate would go. If active low, an AND gate. The existing wire from the CX16 would be one input. The SPI output that signifies buffer full (one full byte) is the other input. Same output as before. It would just have to be set up differently initially.

In order for the data to get from the SD card into the X16, VERA has to put that data on one of two channels or the CPU couldn't see it at all. That same byte then gets transferred right back into VERA on the other channel. That step can be skipped by making it the same channel - the SPI provides the signal to the channel that data is ready to latch.

It might be possible to do this without any changes to VERA at all, which let's face it, VERA isn't going to change. But I might be able to roll my own VLOAD that does this. If it doesn't work, oh well. That just means I have to find another, better method to play video, maybe figure out some sort of MP4 player. And after all, isn't finding your way around limitations about half the fun of programming?

Posted: **Tue May 25, 2021 3:31 am**

10 hours ago, Ed Minchau said:

In order for the data to get from the SD card into the X16, VERA has to put that data on one of two channels or the CPU couldn't see it at all.

You are saying something that contradicts the Vera Programmer's Reference: the SPI data and control register are accessed at $9F3E for the data register and $9F3F for the control register (three control bits). Port A and B are accessed at $9F23 and $9F24. There is no need for Vera to "put that data on one of two channels". There is, in fact, no direct way to get that data through PortA or PortB ... the registers that are read directly are not in the Vera internal memory map. (They used to be, but not anymore.)

So ASSERTING the SPI data register ONTO the same internal data bus used by Port A and Port B would NOT BE "just a transistor". You've reached that conclusion based on a false premise.

So, setting that "just a transistor" hyperbole aside, suppose the logic is available so that the data port register can be asserted onto the internal bus used by Port A and Port B, then, yes, the CPU setting the port address to the desired Vera target minimizes the additional logic resources required. It's not done yet, though, since triggering the "phony CPU write" on the Data port addressing the Vera RAM when the SPI countdown reaches 0 is additional logic.

But that only saves 4 clocks per byte, since the CPU still has to check the finished bit and trigger the next byte transfer. It's hardly worth the extra work.

To make it worth the trouble, you need to define a standard chunk size and work in chunks, say 16bytes at a time, so you need a 4bit countdown circuit. If it is read-only, you wouldn't need a data direction bit, but you still need a "SD to Vera block write" selection bit. The original starting byte written on MOSI could always be #0 for each cycle.

You also need additional logic for the "ready" bit, which means byte finished in regular mode and chunk finished in write chunk mode: even if the block write bit itself works as the trigger, and its being reset signals that the transfer has completed, you need the circuit to reset it as part of the countdown underflow circuit.

So if we are minimizing additional logic, the target port is dedicated, say PortB, and there is one control bit added that when set puts the SPI system into "Vera Write" mode, and when that goes 0 again, the chunk move is finished.
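A toy behavioural model of that "Vera Write" chunk mode may help pin down what the extra logic has to do. Everything in it is hypothetical: the class name, the `vera_write` flag standing in for the proposed control bit, and the 16-byte countdown are illustrations of the idea in this post, not VERA's actual registers or gateware.

```python
class ChunkCopyModel:
    """Hypothetical model of an 'SD to Vera block write' mode (not real VERA logic)."""

    CHUNK = 16  # chunk size proposed above, i.e. a 4-bit countdown

    def __init__(self, vram_size=0x20000):
        self.vram = bytearray(vram_size)  # stand-in for Vera's video RAM
        self.port_addr = 0                # address the dedicated port points at
        self.vera_write = False           # proposed control bit: set = chunk in progress
        self.count = 0                    # bytes left in the current chunk

    def start_chunk(self, addr):
        """CPU side: point the port at the target and set the control bit."""
        self.port_addr = addr
        self.count = self.CHUNK
        self.vera_write = True

    def spi_byte_done(self, value):
        """Hardware side: called each time the SPI shift register has a full byte."""
        if not self.vera_write:
            return  # regular mode: the CPU reads the SPI data register itself
        self.vram[self.port_addr] = value & 0xFF                 # "phony CPU write"
        self.port_addr = (self.port_addr + 1) % len(self.vram)   # autoincrement
        self.count -= 1
        if self.count == 0:
            self.vera_write = False  # countdown underflow clears the bit: chunk done


model = ChunkCopyModel()
model.start_chunk(0x1000)
for b in range(20):          # more bytes than one chunk, to show the mode switching off
    model.spi_byte_done(b)
print(model.vera_write, hex(model.port_addr))
```

The CPU would poll `vera_write` (or its hardware equivalent) going back to 0 before setting up the next chunk.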

With the serial clock halved for the SPI (because 25MHz is faster than the SPI mode serial clock maximum of 20MHz), the write of the byte could happen in one Vera external clock, the autoincrement of the Port and the decrement of the chunk count in another, which taken together is 1 serial clock, so it would be 9 serial clocks per byte. That is 144 serial clock pulses on the 12.5MHz SCLK, or 93 CX16 clock cycles to move a 16 byte chunk, with the 65C02 doing the SPI traffic for getting the file, getting the first sector, setting up Vera, triggering the block moves, getting the next sector, and so on.
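The arithmetic behind those figures can be checked with a few lines. This is only a restatement of the numbers above (8 shift clocks plus 1 clock for the write and autoincrement per byte, 16 bytes per chunk, 12.5MHz SCLK); the 8MHz CPU clock is an assumption about the CX16 that is not stated in this post, and the names are made up for the example.

```python
SCLK_HZ = 12_500_000       # halved Vera clock used as the SPI serial clock
CPU_HZ = 8_000_000         # assumed CX16 CPU clock
CLOCKS_PER_BYTE = 8 + 1    # 8 bits shifted, plus 1 serial clock for write + autoincrement
CHUNK_BYTES = 16


def chunk_cost(chunk_bytes=CHUNK_BYTES):
    """Return (serial clocks, CPU cycles rounded up) for one chunk."""
    serial_clocks = chunk_bytes * CLOCKS_PER_BYTE
    # Integer ceiling division: a partly used CPU cycle still counts as a whole one.
    cpu_cycles = -(-serial_clocks * CPU_HZ // SCLK_HZ)
    return serial_clocks, cpu_cycles


print(chunk_cost())
```

Under those assumptions a chunk takes 144 serial clocks, which is 92.16 CPU cycles and so rounds up to 93.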

Note that while this approach maximizes reuse of existing logic resources, by the same token it locks out access by the CPU to either PortA or PortB, because it uses the same internal data bus that is connected to the motherboard data bus. So ONLY Vera control that can be managed through the direct memory mapped registers can be done while a section is transferring ... that's one reason for 16byte chunks. For instance, you can load the PCM while a "SD to Vera Write" is in process, but you cannot access the PSG registers: the PSG registers would have to be the target of the block move (which is one reason the chunk is more flexible if it is not bigger than 16).

Posted: **Wed May 26, 2021 4:24 am**

SPI is just a glorified shift register. The FPGA only needs to use a few latches to hold a byte, have them be read/write on the data bus pins as a byte, and be able to shift them in/out of the MISO/MOSI pins while holding an SS pin active and pulsing the clock pin.

It does this as a dumb, unthinking reflex. It doesn’t know what those bits mean any more than the shift register in an SNES joypad does.

You could implement SPI in the VIA chips just as well, but the VERA shifts bits at 25 MHz so you have a byte ready by the time the next CPU instruction is ready, whereas the VIA would be 3 times slower.

Posted: **Wed May 26, 2021 5:51 am**

3 hours ago, ZeroByte said:

SPI is just a glorified shift register. The FPGA only needs to use a few latches to hold a byte, have them be read/write on the data bus pins as a byte, and be able to shift them in/out of the MISO/MOSI pins while holding an SS pin active and pulsing the clock pin.

It does this as a dumb, unthinking reflex. It doesn’t know what those bits mean any more than the shift register in an SNES joypad does.

You could implement SPI in the VIA chips just as well, but the VERA shifts bits at 25 MHz so you have a byte ready by the time the next CPU instruction is ready, whereas the VIA would be 3 times slower.

Exactly, SPI for any given mode is a parallel read/write shift register with serial input and output, designed for a specific clock and latch polarity. All the SPI in Vera adds to that is 3 register bits and a countdown to shift eight bits then stop sending the clock signal into SCLK.

In Mode 0, the serial shift register samples on the leading, rising edge of the clock and shifts out on the falling, trailing edge of the clock. Make the shift register work that way and you need no "mode" control.
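The Mode 0 behaviour described here can be sketched in software as two 8-bit shift registers, one at each end of the wire. This is a bit-level illustration only, assuming MSB-first transfer; the function name is made up and nothing here reflects how Vera's gateware is actually written.

```python
def spi_mode0_exchange(master_byte, slave_byte):
    """Exchange one byte in SPI Mode 0, MSB first.

    Returns (byte now held by the master, byte now held by the slave).
    """
    master = master_byte & 0xFF
    slave = slave_byte & 0xFF
    for _ in range(8):               # the countdown: eight clocks, then stop
        # Before the rising edge each side is already driving its top bit.
        mosi = (master >> 7) & 1
        miso = (slave >> 7) & 1
        # Rising (leading) edge: each side samples the other's line.
        master_in, slave_in = miso, mosi
        # Falling (trailing) edge: each side shifts, exposing its next bit.
        master = ((master << 1) & 0xFF) | master_in
        slave = ((slave << 1) & 0xFF) | slave_in
    return master, slave


print([hex(b) for b in spi_mode0_exchange(0xA5, 0x3C)])
```

After eight clocks the two registers have simply swapped contents, which is all the hardware knows about the transfer.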

For the details, SPI in VIA would be worse than 1/3 the speed ... (1) the Vera shifts bits at 12.5MHz. It could easily shift bits faster, but the SPI mode spec only requires cards to cope with serial clocks up to 20MHz in SPI mode, and 12.5MHz is the simplest rational fraction of the Vera clock less than or equal to 20MHz ... but (2) the VIA serial shift register shifts up to half of PHI2 clock (the process is the count down plus one clock to flip the serial clock on underflow, so 2 hardware clock cycles per serial clock cycle is the fastest possible) ... so ideally it would be 1/3 as fast but (3) the VIA hardware serial shift register doesn't work well with the phase of the SD SPI mode.

The problem is that the VIA shift register is designed to shift first and sample after, so it works best with SPI Modes 1 and 3. Using it with modes 0 and 2 in hardware requires additional work and some glue logic, and the additional work adds overhead to the transfer. Or you can bit bang with no extra hardware required, but a BIG slowdown.

Even adding the circuit to load 16bytes in a row straight to Vera is substantially MORE complexity than the current SPI register itself.

Posted: **Sun May 30, 2021 7:58 pm**

The FPGA used by VERA has a SPI hard IP block (see TN 2010 iCE40 I2C and SPI Hardened IP Usage Guide) so the number of logic cells that need to be dedicated to SPI functionality should be pretty minimal. The bus timing of SD cards in SPI mode is identical to the timing in SD mode (see Section 7.8, Physical Layer Simplified Spec v8.00, SD Association). There is a TRAN_SPEED register on the card you can read to determine the maximum supported speed but only 8'h32 and 8'h5A are permitted (Section 5.3.2 of spec mentioned) specifying either 25MHz in standard mode or 50MHz in high speed mode.
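For reference, the two permitted TRAN_SPEED values decode as follows. This sketch assumes the usual layout of the field in the card's CSD register (low three bits select a rate unit, the next four bits select a multiplier); the function name and the scaled-integer tables are choices made for the example, so check them against the spec section cited above before relying on them.

```python
# Rate unit selected by bits 2:0, in kbit/s (values 4-7 are reserved).
RATE_UNIT_KBIT = [100, 1_000, 10_000, 100_000]
# Multiplier selected by bits 6:3, scaled by 10 to stay in integers (0 is reserved).
TIME_VALUE_X10 = [0, 10, 12, 13, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80]


def tran_speed_kbit(tran_speed):
    """Decode a TRAN_SPEED byte into a maximum transfer rate in kbit/s."""
    unit_index = tran_speed & 0x07
    value_index = (tran_speed >> 3) & 0x0F
    if unit_index >= len(RATE_UNIT_KBIT) or value_index == 0:
        raise ValueError(f"reserved TRAN_SPEED encoding: {tran_speed:#04x}")
    return RATE_UNIT_KBIT[unit_index] * TIME_VALUE_X10[value_index] // 10


print(tran_speed_kbit(0x32), tran_speed_kbit(0x5A))
```

With these tables 0x32 decodes to 25,000 kbit/s and 0x5A to 50,000 kbit/s, matching the 25MHz and 50MHz figures in the post.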

Posted: **Sun May 30, 2021 8:30 pm**

9 minutes ago, Wavicle said:

The FPGA used by VERA has a SPI hard IP block (see TN 2010 iCE40 I2C and SPI Hardened IP Usage Guide) so the number of logic cells that need to be dedicated to SPI functionality should be pretty minimal. The bus timing of SD cards in SPI mode is identical to the timing in SD mode (see Section 7.8, Physical Layer Simplified Spec v8.00, SD Association). There is a TRAN_SPEED register on the card you can read to determine the maximum supported speed but only 8'h32 and 8'h5A are permitted (Section 5.3.2 of spec mentioned) specifying either 25MHz in standard mode or 50MHz in high speed mode.

Just to be clear: It's not necessarily that the FPGA needs a lot of logic cells to allow an external source (such as the X16) to interact with SPI. It's that in order to add the flexibility to VERA to allow it to directly store bytes from SPI to video RAM without passing through the CPU first would require more logic cells than are currently allocated to it. Even if there are enough logic cells left to support a "fire and forget" strategy for the next X bytes, it's not as though we're dealing with a full blown multitasking friendly CPU or OS. Typically (or so it seems to me) if you have an ability to tell the hardware "transfer the next X bytes without the use of the CPU" you would generally want to signal the main system when that process is complete so that it can set up the next transfer. Given the typical implementation of the kernal, it would wind up sitting in a busy loop waiting for the signal that the transfer is done.

I see several possibilities:

1. There aren't enough logic cells available to add the functionality to support both CPU and VRAM delivery options.

2. There are enough logic cells available but it increases the complexity meaning there is another thing that could go wrong, and it doesn't really improve CPU performance because it still has to wait for the delivery notification.

3. There are enough logic cells available and the kernal becomes more complex due to dealing with an interrupt driven SPI interface so that the CPU can go on about other business while waiting for the background VRAM transfer to complete.

In a perfect world, sure, it would be nice to support this mode. I think the general purpose approach is more than adequate for most tasks, even if it isn't optimal for loading into VRAM. It's not like sales of the C=64 were too negatively impacted by its slow IEC bus protocol.

Getting data to/from the X16