I've downloaded the 65C02 datasheet and will give it a read sometime soon™
I have also considered ordering one of Ben Eater's 6502 Hello World kits as a way of testing hardware until the X16 is released.
I'm just not sure how useful it would be when you don't have the IO Select lines etc.
Still, I would be able to test things with the data bus and RWB.
Interested in developing a simple DMA controller
Re: Interested in developing a simple DMA controller
Yes - be sure that it is the W65C02S datasheet, as that is the version of the 6502 that is still in production.
> I have also considered ordering one of Ben Eater's 6502 Hello World kits as a way of testing hardware until the X16 is released.
> I'm just not sure how useful it would be when you don't have the IO Select lines etc. ...
You can fake the I/O select lines by adding a 3 to 8 decoder.
You also might want to modify the circuit to give 32KB of RAM and 16KB of ROM, with the I/O in the $8000-$BFFF space ... with the 3 to 8 decoder set up correctly, it can mirror eight 32 byte register address spaces in every page of that range, so you could just go ahead and ignore the other ones and write to the $9Fxx page.
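The memory map suggested above can be sketched as a small address classifier. This is only an illustration: the function name `decode` is hypothetical, and it assumes the 3 to 8 decoder is enabled for $8000-$BFFF and takes A5-A7 as its select inputs, which is one way to get eight 32 byte spaces repeating in every page.

```python
def decode(addr):
    """Classify a 16-bit address under the suggested 32KB RAM / 16KB I/O / 16KB ROM split.

    Returns (region, io_select). io_select is the 3-to-8 decoder output
    number (0-7) for I/O addresses, and None otherwise.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if addr < 0x8000:
        return ("RAM", None)
    if addr < 0xC000:
        # Assumption: A5-A7 drive the decoder and A8-A13 are ignored,
        # so the same eight 32-byte windows mirror in every page.
        return ("IO", (addr >> 5) & 0x07)
    return ("ROM", None)


for a in (0x7FFF, 0x8023, 0x9F23, 0x9FE0, 0xC000):
    print(f"${a:04X} -> {decode(a)}")
```

Under these assumptions $9F23 and $8023 hit the same select line, which is why the mirrors outside the $9Fxx page can simply be ignored.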
Re: Interested in developing a simple DMA controller
You should read it sooner rather than later. X16 expansion slots are bare-metal direct connections to the 65C02. The datasheet is the expansion bus spec, and any DMA controller needs to meet (or exceed) the timing specified in the datasheet.
At a minimum you need to understand operation and timing of PHI2, A0-A15, BE, D0-D7, RDY and RWB. You might also need SYNC.
Even a simple DMA controller is not all that simple to design. If you find the datasheet hard or impossible to understand you are not ready yet. Start with some easier hardware to get your bearings.
The PHI discussion was unnecessarily complicated. For CX16 there is PHI2, period. The datasheet explains how every other signal relates to PHI2.
Re: Interested in developing a simple DMA controller
picosecond wrote: ↑Wed Mar 15, 2023 1:03 pm
> ... The PHI discussion was unnecessarily complicated. For CX16 there is PHI2, period. The datasheet explains how every other signal relates to PHI2.
Yes -- this is the "tl;dr:" of what I was saying about Phi1O and Phi2O ... they are obsolete features for modern 65C02 systems, retained in the original 65C02 CMOS chip design for compatibility with boards designed for the "classic" NMOS 6502.
An NMOS 6502 running at, say, 250kHz in a desktop calculator may be able to get away with connecting a sinusoidal oscillator directly to Phi2, and then use Phi1O and Phi2O with pull-up resistors to give a squarer wave to the rest of the circuit, but that doesn't really fly when running at 8MHz, which is why the CMOS datasheet doesn't give specs for Phi1O and Phi2O timings anymore.
__________________
picosecond wrote: ↑Wed Mar 15, 2023 1:03 pm
> ... You should read it sooner rather than later. X16 expansion slots are bare-metal direct connections to the 65C02. The datasheet is the expansion bus spec, and any DMA controller needs to meet (or exceed) the timing specified in the datasheet.
> At a minimum you need to understand operation and timing of PHI2, A0-A15, BE, D0-D7, RDY and RWB. You might also need SYNC.
Note that while the 65C02 CPU timings are the foundation, some of the expansion port timings will be due to other parts of the design ... specifically for a bus mastering card, you'd need to know the ROM and RAM speeds and the logic family & number of gates between asserting addresses and generating the ROM and RAM chip selects.
For instance, I don't think a Phi1=0 "half cycle stealing" approach as discussed somewhere in this thread is workable given the tight timing that has been discussed over the past two or three years between the clock speed, the ROM/RAM chip selects, and the ROM and RAM minimum required read (and for RAM, write) cycle time.
- StephenHorn
Re: Interested in developing a simple DMA controller
BruceRMcF wrote: ↑Wed Mar 15, 2023 3:00 pm
> Note that while the 65C02 CPU timings are the foundation, some of the expansion port timings will be due to other parts of the design ... specifically for a bus mastering card, you'd need to know the ROM and RAM speeds and the logic family & number of gates between asserting addresses and generating the ROM and RAM chip selects.
> For instance, I don't think a Phi1=0 "half cycle stealing" approach as discussed somewhere in this thread is workable given the tight timing that has been discussed over the past two or three years between the clock speed, the ROM/RAM chip selects, and the ROM and RAM minimum required read (and for RAM, write) cycle time.
If someone wanders in wondering, that was post #26204 that linked to the half-cycle shadow access concept.
Yes, looking at the 65C02's timing, I don't think that could be made to work easily. Much easier to, instead, catch when the 65C02 performs a write to an appropriate register on the "DMA controller" (in quotes with respect to some conversation on Discord) and then just hold the CPU in limbo in its entirety with RDY.
I happen to know the ROM chip is SST39SF040-70-4C-PHE (datasheet via Mouser), but I don't know the RAM chips offhand and I don't know the specific timing requirements of the VERA, to build a more complete picture of the timing requirements of the other bus devices for reads and writes. I can check when I get home. (Edit: See below.)
The RAM is AS6C4008-55PCN (datasheet via Mouser).
The 6522s are, unsurprisingly, W65C22S6TPG-14 (datasheet via Mouser).
I'm sure there's some delay caused by the logic chips as well.
Developer for Box16, the other X16 emulator. (Box16 on GitHub)
I also accept pull requests for x16emu, the official X16 emulator. (x16-emulator on GitHub)
Re: Interested in developing a simple DMA controller
I did a whole bunch of math involving the VERA's clock and the CPU clock for something unrelated.
The VERA is clocked at 25MHz and the CPU at 8MHz, so there are 3.125 VERA clock cycles for every CPU clock cycle, give or take a little, because I have to assume there are two separate crystal oscillators and that the two clocks are not synchronized (please correct me if I'm wrong). (Edit, forgot to say why I mentioned this: since the VERA's clock is so much quicker than the CPU's, I'd expect that it could handle at least one memory access per CPU cycle. I'd need to check the Verilog to see if there are any gotchas with that assumption, though.)
The absolute best possible case scenario for the program copying data to the VERA is if it happens in one gigantic unrolled loop of self-modifying code:
Code:
LDA #byte1 ; 2 cycles
STA $9F23 ; 4 cycles
LDA #byte2
STA $9F23
LDA #byte3
STA $9F23
...
By this benchmark, the DMA could go all the way to 3 CPU cycles per memory fetch, and it'd be on par with this (which would already be super helpful). A more common scenario would be just a plain loop, or Duff's device if you want to be fancier, which would be around 9-10 cycles per read+write (LDA ($nn),Y), plus loop overhead.
Of course, 1 cycle per DMA fetch would be ideal, but there's wiggle room.
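For a rough feel of what these cycle counts mean in bytes per second, here is a small sketch. It uses only the figures from this post (8MHz CPU, 25MHz VERA, 2+4 cycles for the unrolled pair, 9-10 cycles for the plain loop before loop overhead); the helper name `bytes_per_second` is hypothetical, and the DMA rows simply assume a given number of CPU cycles per byte transferred.

```python
CPU_HZ = 8_000_000    # CPU clock, per the post
VERA_HZ = 25_000_000  # VERA clock, per the post


def bytes_per_second(cycles_per_byte, cpu_hz=CPU_HZ):
    """Peak copy rate if every byte costs `cycles_per_byte` CPU cycles."""
    return cpu_hz / cycles_per_byte


vera_clocks_per_cpu_cycle = VERA_HZ / CPU_HZ  # nominal ratio, clocks assumed unsynchronized

unrolled_cycles = 2 + 4  # LDA #imm (2) + STA abs (4)

scenarios = {
    "unrolled LDA #/STA": unrolled_cycles,
    "plain loop, best case (no loop overhead)": 9,
    "plain loop, worst case (no loop overhead)": 10,
    "hypothetical DMA at 1 cycle per byte": 1,
}

print(f"VERA clocks per CPU cycle: {vera_clocks_per_cpu_cycle}")
for name, cycles in scenarios.items():
    print(f"{name}: {bytes_per_second(cycles) / 1000:.0f} KB/s")
```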
Re: Interested in developing a simple DMA controller
BruceRMcF wrote: ↑Wed Mar 15, 2023 3:00 pm
> Note that while the 65C02 CPU timings are the foundation, some of the expansion port timings will be due to other parts of the design ... specifically for a bus mastering card, you'd need to know the ROM and RAM speeds and the logic family & number of gates between asserting addresses and generating the ROM and RAM chip selects.
Not so. These already work with the 65C02. Expansion card slots drive/receive the exact same wires as the CPU. Address decode/RAM/ROM have no idea and do not care if the 65C02 or a slot is in control. This is why meeting or exceeding 65C02 timing is all you need.
This assumes of course that CX16 is well-designed and is already meeting worst-case datasheet requirements for all components. It is a different story if CX16 is relying on typical components being better than their specs. In that case, an expansion card designer who wants a reliable worst-case design needs to know all of the above-mentioned details.
Re: Interested in developing a simple DMA controller
StephenHorn wrote: ↑Wed Mar 15, 2023 10:15 pm
> If someone wanders in wondering, that was post #26204 that linked to the half-cycle shadow access concept.
> Yes, looking at the 65C02's timing, I don't think that could be made to work easily. Much easier to, instead, catch when the 65C02 performs a write to an appropriate register on the "DMA controller" (in quotes with respect to some conversation on Discord) and then just hold the CPU in limbo in its entirety with RDY.
Thanks for that -- from looking at SRAM & EEPROM datasheets for the CP/M SBC sketches, I expected it would be 70ns FlashROM and 55ns SRAM.
One option to bear in mind, if you don't want to kill the ability to process interrupts while the DMA is in progress, is to do the DMA in batches (say, 8 bytes in 16 clocks), triggered by the SYNC line, so that instructions continue to execute, just with an extra overhead of 8 clocks per instruction. Then a "pause" capability could be used by a timing-sensitive interrupt routine to pause any ongoing DMA operation until the timing-sensitive part of the interrupt routine is completed.
__________________________
picosecond wrote: ↑Thu Mar 16, 2023 12:59 pm
> BruceRMcF wrote: ↑Wed Mar 15, 2023 3:00 pm
> > Note that while the 65C02 CPU timings are the foundation, some of the expansion port timings will be due to other parts of the design ... specifically for a bus mastering card, you'd need to know the ROM and RAM speeds and the logic family & number of gates between asserting addresses and generating the ROM and RAM chip selects.
> Not so. These already work with the 65C02. Expansion card slots drive/receive the exact same wires as the CPU. Address decode/RAM/ROM have no idea and do not care if the 65C02 or a slot is in control. This is why meeting or exceeding 65C02 timing is all you need.
To be sure, if you generate the address on or before the address set-up time for the 65C02, you can't be in trouble, but that doesn't tell you the minimum set-up time for the RAM or the ROM. If the RAM is qualified by Phi1=1, then so long as the address is asserted before the end of the first phase, you are OK for RAM (which is to say, any DMA Write cycle), but if ROM is not qualified by Phi1=1 (which is typical practice above 4MHz clocks, to cope with the longer set-up time requirements of EEPROMs), then the actual minimum set-up time to read from ROM is the ROM set-up time plus the X16 gate delay.
Which was my point above. The DMA will be a state machine. If the internal clock runs at 4 times the system clock, then the start of a read or write cycle can be triggered by the drop of the main system clock, and the address asserted on the next drop of the internal clock. That asserts the address in the middle of the first phase of the system clock.
The 65C02 address set-up at 5V, TA=0-70 degrees C is 30ns. A four beat internal clock places the start of the second beat at +31.25ns plus whatever internal gate delays are happening in the FPGA. So to hit the 65C02 timing, you'd need to crank up the clock to 6 times the system clock (to have an integer number of internal bus cycles per 65C02 clock phase).
But it may be that the ROM's minimum address set-up time, plus the maximum gate delay of generating the /ROM_CS (per the datasheet specification of the gates used to generate it), is shorter than the 95ns that can be inferred from the 65C02 datasheet and the 8MHz clock frequency. If it is short enough, then generating the address in the middle of the first clock phase works for both ROM and RAM.
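The internal-clock arithmetic in this post can be checked with a short sketch. It assumes the 8MHz system clock and the 30ns address set-up figure quoted above, and it ignores the FPGA's internal gate delays, so a real design would need extra margin. The names `beat_ns` and `smallest_multiplier` are hypothetical.

```python
SYSTEM_HZ = 8_000_000  # system clock from the thread
T_ADS_NS = 30.0        # 65C02 address set-up time quoted in the post


def beat_ns(multiplier, system_hz=SYSTEM_HZ):
    """Length in ns of one internal clock beat at `multiplier` x the system clock."""
    return 1e9 / (system_hz * multiplier)


def smallest_multiplier(t_ads_ns=T_ADS_NS, system_hz=SYSTEM_HZ, limit=32):
    """Smallest even multiplier whose first beat ends within the set-up time.

    Only even multipliers are tried so that each clock phase holds a
    whole number of internal beats. Internal FPGA delays are ignored.
    """
    for m in range(2, limit + 1, 2):
        if beat_ns(m, system_hz) <= t_ads_ns:
            return m
    return None


period_ns = 1e9 / SYSTEM_HZ
print(f"4x beat: {beat_ns(4)} ns, 6x beat: {beat_ns(6):.2f} ns")
print(f"smallest workable multiplier: {smallest_multiplier()}")
print(f"time left after address set-up: {period_ns - T_ADS_NS} ns")
```

The 4x beat lands just past the 30ns limit, which is why the post jumps to 6x rather than 5x.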
Re: Interested in developing a simple DMA controller
I had worked out a timing diagram for RAM some time ago. The RAM access is very tight, needing every bit of that 30ns. The ROM timing is slightly less tight, with about 5ns of breathing room - this is because ROM chip select decode is a single NAND gate. In either case, they have their respective data ready by the required 10ns setup time before the PHI2 negative edge.
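As a rough cross-check of the read budget, the sketch below works out how much time is left for address decode once the memory's access time is subtracted. It is not the timing diagram mentioned above. It assumes the chip select depends on the address alone (no PHI2 qualification) and uses figures from this thread: 8MHz clock, 30ns address set-up, 10ns read data set-up, 70ns flash and 55ns SRAM. The name `decode_budget_ns` is hypothetical.

```python
def decode_budget_ns(access_ns, clock_hz=8_000_000, t_ads_ns=30.0, t_dsr_ns=10.0):
    """Time left for address decode and gate delays in one read cycle.

    period - address set-up - read data set-up - memory access time.
    A negative result would mean the part cannot meet the cycle at all.
    """
    period_ns = 1e9 / clock_hz
    return period_ns - t_ads_ns - t_dsr_ns - access_ns


rom_budget = decode_budget_ns(70.0)  # 70ns flash ROM
ram_budget = decode_budget_ns(55.0)  # 55ns SRAM

print(f"ROM decode budget: {rom_budget} ns")
print(f"RAM decode budget: {ram_budget} ns")
```

Whatever the actual decode logic consumes comes out of these budgets, so the real slack depends on the gate delays in each chip select path.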
Re: Interested in developing a simple DMA controller
StephenHorn wrote: ↑Tue Mar 07, 2023 7:37 pm
> I think that this will need 29 pins:
> - 16 address pins
> - 8 data pins
> - VCC
> - GND
> - RDY
> - PHI2
> - RWB
> I'm uncertain if I need a 30th pin for CE ("chip enable"), or if I can safely infer that. I think I might need it and rely on external logic to trigger it, because I don't necessarily want the design to be "stuck" on an exact set of I/O addresses (though I figure the module would take up one of the I/O expansion slots' address ranges).
Chip enable can be inferred, but you will need at least one more pin for Bus Enable. It is required for the 6502 to tri-state the bus so that you can bus master freely. Just asserting RDY will not do it.
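The pin budget discussed in this post can be tallied in a few lines. This only counts the signals named here (the quoted 29 pins plus Bus Enable); the dictionary name `pins` is hypothetical, and a real design may need further signals.

```python
# Signals from the quoted list, with the number of pins each needs.
pins = {
    "address": 16,
    "data": 8,
    "VCC": 1,
    "GND": 1,
    "RDY": 1,
    "PHI2": 1,
    "RWB": 1,
}

base_count = sum(pins.values())  # the quoted list
with_bus_enable = base_count + 1  # adding BE so the CPU can tri-state the bus

print(f"quoted list: {base_count} pins, with BE: {with_bus_enable} pins")
```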