DragWx wrote: ↑Thu Feb 15, 2024 4:51 pm ... Programming wise:
The 65816 gives you actual 16-bit operations, which means one opcode can handle loading two bytes of memory into a 16-bit accumulator, and you can do actual 16-bit math on that accumulator.
Indexed memory access (e.g., LDA $nnnn,X) can use a 16-bit index so you can have more than 256 bytes in a single table (for example).
The stack pointer is a full 16 bits, which means the CPU stack can be located
anywhere in memory, not just $01xx, and I'm assuming that means the stack can hold more than 256 bytes.
Instead of zeropage addressing, you have "direct" addressing, which is just like zeropage except you can move around to anywhere you want. ...
Yes. Add all of these up, and a VM written for the 65816 (p-code or m-code Pascal, BASIC itself, Wozniak's SWEET16) just runs faster.
Forth is similar ... a Forth written for the 65816 will typically execute around twice as fast as a Forth with the same underlying model on the 65C02.
However, if you are familiar with 6502 assembly language or the original 65C02 extensions, you can set the chip into emulation ("E") mode and just run it as a 65C02 system.
That said, you then only get the original WDC 65C02 extensions. The four Rockwell additions (which WDC later added to their own 65C02 so they could be a second source for modems designed around the R65C02) are not included.
One of the biggest advantages of the 65816, sadly unavailable on the X16, is its ability to address up to 16 MB of memory. It works through two bank registers: the Program Bank Register (PBR), used when the CPU fetches opcodes and their operands, and the Data Bank Register (DBR), used when the program accesses data. So instead of writing to a memory-mapped register to switch banks, you load PBR or DBR with the desired bank number, and the program can run in one bank while manipulating memory in a different bank at the same time.
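As a rough sketch of how banking works (plain Python, with a helper name of my own invention), the 24-bit effective address is just the 8-bit bank register glued on top of a 16-bit address:

```python
def effective_address(bank: int, addr16: int) -> int:
    """Combine an 8-bit bank register (PBR or DBR) with a 16-bit
    address to form the 65816's 24-bit effective address."""
    assert 0 <= bank <= 0xFF and 0 <= addr16 <= 0xFFFF
    return (bank << 16) | addr16

# Program running in bank $02 while touching data in bank $05:
pc   = effective_address(0x02, 0x8000)   # opcode fetch uses PBR
data = effective_address(0x05, 0x1234)   # LDA $1234 uses DBR
print(hex(pc), hex(data))   # 0x28000 0x51234
```

Because code fetches go through PBR and data accesses through DBR, the two can point at different banks simultaneously, with no bank-switch write in between.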
But the flip side is that the processor might have to be slowed down for the Dev Board: you have to wait until the bank address (which the 65816 multiplexes onto the data bus) has been latched before address decoding can start, and the DIP-package 512KB SRAMs aren't the fastest parts. I know that the Foenix F256/F256Jr, which support being upgraded to a 65816, run at 6.29MHz rather than 8MHz.
On the X16, the A16..A23 output from the 65816 is ignored, so you still see the same memory map that a 6502 sees (regardless of PBR and DBR), complete with the bankswitching regions and the memory-mapped bank select registers at $0000 and $0001.
So it's an upgrade to the new instructions and the new 16-bit addressing modes, but the bank registers are simplified away and the 24-bit addressing modes are redundant.
Edit: There's also a "block move" instruction, but I haven't looked into how it works just yet. I imagine it copies a chunk of memory from one location to another in a single opcode.
There are two. 16-bit X points to the source data, 16-bit Y points to the destination, and 16-bit A holds the number of bytes to move minus 1 (A is post-decremented, and when it underflows the instruction is finished). It takes 7 clocks per byte plus 3 clocks of overhead. It is a three-byte instruction, because it includes the source bank and the destination bank as immediate operands.
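A small sketch of that three-byte encoding (Python, helper name mine; note the machine-code operand order is destination bank first, the reverse of the usual `MVN src,dst` assembler syntax):

```python
def encode_mvn(src_bank: int, dst_bank: int) -> bytes:
    """Encode MVN ($54): opcode, then destination bank,
    then source bank, as the 65816 expects in the instruction stream."""
    return bytes([0x54, dst_bank & 0xFF, src_bank & 0xFF])

# MVN moving from bank $01 to bank $02 assembles to $54 $02 $01:
print(encode_mvn(0x01, 0x02).hex())   # '540201'
```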
The reason there are two is what happens when the source and destination blocks overlap:
[low address] ... [Source-Start] ... [Dest-Start] ... [Source-End] ... [Dest-end] ... [high address]
This is a "Move-uP" or "Move-Positive". You have to start at the end and work down to the beginning, so that the source data at the end has been moved before that space is needed for destination data. So you set up the MVP with X and Y pointing to the last address of both blocks and it decrements X and Y as it goes.
[low address] ... [Dest-Start] ... [Source-Start] ... [Dest-End] ... [Source-end] ... [high address]
This is a "Move-dowN" or "Move-Negative". You have to start at the beginning and work up to the end, so that the source data at the beginning has been moved before that space is needed for destination data. So you set up the MVN with X and Y pointing to the beginning addresses of both blocks and it increments X and Y as it goes.
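A quick way to see why the direction matters is to model the two moves in Python (a sketch of the copy semantics only, not cycle-accurate; function names are mine):

```python
def mvp(mem, src_end, dst_end, count):
    """MVP-style move: X/Y point at the LAST byte of each block and
    walk downward, so an overlapping destination above the source is
    overwritten only after those bytes have already been read."""
    for i in range(count):
        mem[dst_end - i] = mem[src_end - i]

def mvn(mem, src_start, dst_start, count):
    """MVN-style move: X/Y point at the FIRST byte of each block and
    walk upward, for a destination below the source."""
    for i in range(count):
        mem[dst_start + i] = mem[src_start + i]

mem = list(range(16))
mvp(mem, src_end=7, dst_end=9, count=6)   # move bytes 2..7 up to 4..9
print(mem[4:10])   # [2, 3, 4, 5, 6, 7]
```

Copying the overlapping example above in ascending order instead would clobber bytes 4..7 of the source before they were read; picking MVP vs MVN based on which way the blocks overlap is exactly the distinction a C `memmove` makes internally.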
Assuming whole pages are moved, a 6502 equivalent of MVN looks like this (MVN assumes A, X and Y are already set up, so I likewise assume zero-page pointers SRC and DST and a one-byte page count N have been set up):
Code: Select all
LDY #$FF        ; start at the last byte of the page
- LDA (SRC),Y   ; copy bytes $FF down to $01
STA (DST),Y
DEY
BNE -
LDA (SRC),Y     ; copy byte $00
STA (DST),Y
DEC N           ; one fewer page to go
BEQ +
DEC SRC+1       ; step both pointers to the previous page
DEC DST+1
DEY             ; Y wraps from $00 back to $FF
BRA -
+ ...
... the inner loop (LDA (SRC),Y / STA (DST),Y / DEY / BNE) costs 5+6+2+3 = 16 cycles per byte, so the 6502 block move takes roughly 130% more clock cycles than the 65816's 7 per byte.
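Plugging in the per-byte figures from this thread (and ignoring the small per-page bookkeeping and setup overheads) gives that ratio directly:

```python
# Per-byte costs as discussed above (assumptions, not measurements):
#   65816 MVN/MVP: 7 cycles per byte, plus 3 cycles of overhead
#   6502 loop:     LDA (zp),Y=5, STA (zp),Y=6, DEY=2, BNE taken=3 -> 16/byte
def cycles_65816(n): return 7 * n + 3
def cycles_6502(n):  return 16 * n   # per-page bookkeeping ignored

n = 4096  # a 16-page move
ratio = cycles_6502(n) / cycles_65816(n)
print(f"{ratio:.2f}x")   # prints "2.29x", i.e. about 130% more cycles
```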