DragWx wrote: ↑Thu Feb 15, 2024 4:51 pm ... Programming wise:
The 65816 gives you actual 16-bit operations, which means one opcode can handle loading two bytes of memory into a 16-bit accumulator, and you can do actual 16-bit math on that accumulator.
Indexed memory access (e.g., LDA $nnnn,X) can use a 16-bit index so you can have more than 256 bytes in a single table (for example).
The stack pointer is a full 16 bits, which means the CPU stack can be located
anywhere in memory, not just $01xx, and I'm assuming that means the stack can hold more than 256 bytes.
Instead of zeropage addressing, you have "direct" addressing, which is just like zeropage except you can move around to anywhere you want. ...
Yes. Add all of these up, and a VM written for the 65816 (p-code or m-code Pascal, BASIC itself, Wozniak's SWEET16) just runs faster.
Forth is similar ... a Forth written for the 65816 will typically execute around twice as fast as a Forth with the same underlying model on the 65C02.
However, if you are familiar with 6502 assembly language or the original 65C02 extensions, you can set the chip into emulation ("E") mode and just run it as a 65C02 system.
That said, you then only get the original WDC 65C02 extensions. The four Rockwell additions (which WDC later added to their own 65C02 so they could be a second source for modems designed around the R65C02) are not included.
One of the biggest advantages of the 65816, sadly unavailable on the X16, is its ability to address up to 16 MB of memory. It works through two bank registers: the Program Bank Register (PBR), used when the CPU fetches opcodes and their operands, and the Data Bank Register (DBR), used when the program accesses data. So instead of writing to a memory-mapped register to switch banks, you load PBR or DBR with the desired bank number, and the program can run in one bank while manipulating memory in a different bank at the same time.
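As a rough sketch of how banking works (plain Python, with a helper name of my own invention), the 24-bit effective address is just the 8-bit bank register glued on top of a 16-bit address:

```python
def effective_address(bank: int, addr16: int) -> int:
    """Combine an 8-bit bank register (PBR or DBR) with a 16-bit
    address to form the 65816's 24-bit effective address."""
    assert 0 <= bank <= 0xFF and 0 <= addr16 <= 0xFFFF
    return (bank << 16) | addr16

# Program running in bank $02 while touching data in bank $05:
pc   = effective_address(0x02, 0x8000)   # opcode fetch uses PBR
data = effective_address(0x05, 0x1234)   # LDA $1234 uses DBR
print(hex(pc), hex(data))   # 0x28000 0x51234
```

Because code fetches go through PBR and data accesses through DBR, the two can point at different banks simultaneously, with no bank-switch write in between.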
But the flip side is that the processor might have to be slowed down for the Dev Board: you have to wait until the bank address (which the 65816 multiplexes onto the data bus) has been latched before address decoding can start, and the DIP-package 512KB SRAMs aren't the fastest parts. I know that the Foenix F256/F256Jr, which support being upgraded to a 65816, run at 6.29MHz rather than 8MHz.
On the X16, the A16..A23 output from the 65816 is ignored, so you still see the same memory map that a 6502 sees (regardless of PBR and DBR), complete with the bankswitching regions and the memory-mapped bank select registers at $0000 and $0001.
So it's an upgrade to the new instructions and the new 16-bit addressing modes, but the bank registers are simplified away and the 24-bit addressing modes are redundant.
Edit: There's also a "block move" instruction, but I haven't looked into how it works just yet. I imagine it copies a chunk of memory from one location to another in a single opcode.
There are two. 16-bit X points to the source data, 16-bit Y points to the destination, and 16-bit A holds the number of bytes to move minus 1 (A is post-decremented, and when it underflows the instruction is finished). It takes 7 clocks per byte plus 3 clocks of overhead. It is a three-byte instruction, because it includes the source bank and the destination bank as immediate operands.
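A small sketch of that three-byte encoding (Python, helper name mine; note the machine-code operand order is destination bank first, the reverse of the usual `MVN src,dst` assembler syntax):

```python
def encode_mvn(src_bank: int, dst_bank: int) -> bytes:
    """Encode MVN ($54): opcode, then destination bank,
    then source bank, as the 65816 expects in the instruction stream."""
    return bytes([0x54, dst_bank & 0xFF, src_bank & 0xFF])

# MVN moving from bank $01 to bank $02 assembles to $54 $02 $01:
print(encode_mvn(0x01, 0x02).hex())   # '540201'
```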
The reason there are two is what happens when the source and destination blocks overlap:
[low address] ... [Source-Start] ... [Dest-Start] ... [Source-End] ... [Dest-end] ... [high address]
This is a "Move-uP" or "Move-Positive". You have to start at the end and work down to the beginning, so that the source data at the end has been moved before that space is needed for destination data. So you set up the MVP with X and Y pointing to the last address of both blocks and it decrements X and Y as it goes.
[low address] ... [Dest-Start] ... [Source-Start] ... [Dest-End] ... [Source-end] ... [high address]
This is a "Move-dowN" or "Move-Negative". You have to start at the beginning and work up to the end, so that the source data at the beginning has been moved before that space is needed for destination data. So you set up the MVN with X and Y pointing to the beginning addresses of both blocks and it increments X and Y as it goes.
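A quick way to see why the direction matters is to model the two moves in Python (a sketch of the copy semantics only, not cycle-accurate; function names are mine):

```python
def mvp(mem, src_end, dst_end, count):
    """MVP-style move: X/Y point at the LAST byte of each block and
    walk downward, so an overlapping destination above the source is
    overwritten only after those bytes have already been read."""
    for i in range(count):
        mem[dst_end - i] = mem[src_end - i]

def mvn(mem, src_start, dst_start, count):
    """MVN-style move: X/Y point at the FIRST byte of each block and
    walk upward, for a destination below the source."""
    for i in range(count):
        mem[dst_start + i] = mem[src_start + i]

mem = list(range(16))
mvp(mem, src_end=7, dst_end=9, count=6)   # move bytes 2..7 up to 4..9
print(mem[4:10])   # [2, 3, 4, 5, 6, 7]
```

Copying the overlapping example above in ascending order instead would clobber bytes 4..7 of the source before they were read; picking MVP vs MVN based on which way the blocks overlap is exactly the distinction a C `memmove` makes internally.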
Assuming whole pages are moved, a 6502 equivalent of MVN looks like this (MVN assumes A, X and Y are already set up, so I likewise assume zero-page pointers SRC and DST and a one-byte page count N have been set up):
Code: Select all
LDY #$FF        ; start at the last byte of the page
- LDA (SRC),Y   ; copy bytes $FF down to $01
STA (DST),Y
DEY
BNE -
LDA (SRC),Y     ; copy byte $00
STA (DST),Y
DEC N           ; one fewer page to go
BEQ +
DEC SRC+1       ; step both pointers to the previous page
DEC DST+1
DEY             ; Y wraps from $00 back to $FF
BRA -
+ ...
... the inner loop (LDA (SRC),Y / STA (DST),Y / DEY / BNE) costs 5+6+2+3 = 16 cycles per byte, so the 6502 block move takes roughly 130% more clock cycles than the 65816's 7 per byte.
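Plugging in the per-byte figures from this thread (and ignoring the small per-page bookkeeping and setup overheads) gives that ratio directly:

```python
# Per-byte costs as discussed above (assumptions, not measurements):
#   65816 MVN/MVP: 7 cycles per byte, plus 3 cycles of overhead
#   6502 loop:     LDA (zp),Y=5, STA (zp),Y=6, DEY=2, BNE taken=3 -> 16/byte
def cycles_65816(n): return 7 * n + 3
def cycles_6502(n):  return 16 * n   # per-page bookkeeping ignored

n = 4096  # a 16-page move
ratio = cycles_6502(n) / cycles_65816(n)
print(f"{ratio:.2f}x")   # prints "2.29x", i.e. about 130% more cycles
```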