Using the WAI instruction correctly

StephenHorn
Posts: 565
Joined: Tue Apr 28, 2020 12:00 am

Re: Using the WAI instruction correctly

Post by StephenHorn »

BruceRMcF wrote: Tue Mar 21, 2023 6:23 pm I would very much like this for any ROM based Forth system, as my approach for defining and combining modules for additional ROM banks is looking toward a "smallest power of two" package with manual fitting together of pieces -- 2^(7+index) where index runs from 0 to 6 for RAM segment modules, 0 to 7 for ROM segment modules ... 128, 256, 512, 1KB, 2KB, 4KB, 8KB, [16KB, ROM only] ... and not having to fit IRQ support into the top page of the ROM segment would substantially simplify module packaging.
Indeed, one of the first reasons I'd heard for this change was so cartridges wouldn't have to reserve space in each of their banks for IRQ handlers and/or vectors. Technically, you can still avoid this requirement by making sure your code performs sei before switching the ROM bank to a cartridge bank, but I think most people would prefer not to think about it at all and simply pay the 16-clock penalty (or however much it was) for the kernal in bank 0 to handle it all on their behalf. That lets them gleefully pretend that interrupts don't exist except within the confines of whatever handler CINV points to.
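
As a rough illustration of the "sei before switching banks" approach, here is a minimal sketch in ca65 syntax. It assumes the ROM bank register lives at $01, that bank 32 is a cartridge bank, and that this code runs from RAM; cart_routine is a hypothetical entry point, not something from the KERNAL.

```asm
rom_bank     = $01      ; ROM bank select (assumed zero-page location)
BANK_KERNAL  = 0
CART_BANK    = 32       ; assumption: a cartridge bank without IRQ vectors
cart_routine = $C000    ; hypothetical entry point inside that bank

; Must run from RAM, since the ROM window is swapped underneath it.
call_cart:
	php                 ; remember the caller's interrupt-disable state
	sei                 ; no IRQ may be taken while the vectorless bank is mapped
	lda #CART_BANK
	sta rom_bank
	jsr cart_routine    ; runs with IRQs masked (sei does not mask NMI)
	lda #BANK_KERNAL
	sta rom_bank        ; valid vectors are visible again
	plp                 ; restore the I flag; a pending IRQ is taken from here on
	rts
```

The cost of this pattern is that every IRQ is delayed for as long as the cartridge bank stays mapped, which is the trade-off the post describes.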

What's unfortunate is that this feature appears very unlikely to be included with retail gen-1 hardware, which means cartridge developers for gen-2 will have to choose whether to support gen-1 or not. Worse, I'm not certain how we could detect the difference between gen-1 and gen-2 in order to prevent incompatible cartridges from executing in the wrong context, unless gen-2 implements some kind of hardware marker, such as a hardware revision byte somewhere in the reserved ($9F42-$9F5F) chunk of I/O memory. I'll ask about that and see if there's a solution planned.
Developer for Box16, the other X16 emulator. (Box16 on GitHub)
I also accept pull requests for x16emu, the official X16 emulator. (x16-emulator on GitHub)
TomXP411
Posts: 1806
Joined: Tue May 19, 2020 8:49 pm

Re: Using the WAI instruction correctly

Post by TomXP411 »

Yazwho wrote: Tue Mar 21, 2023 6:34 pm That's really disappointing. Having the ability to remove the kernel completely and construct your own is one of the most exciting features of the 'cartridge'.

Constricting devs to whatever happens in the Kernel's handler feels awful. It hamstrings projects from the get-go. If you're writing things for a cartridge you should understand how the system works, so what is needed is better documentation and tooling. Then we don't need to coddle developers.
Except you can hook the ISR to do whatever you want. The ROM vector doesn't point to a ROM routine; it's simply an address that points somewhere else.

I believe that is currently $0314, which is in RAM. So a cartridge can set up whatever ISR they want there.
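
For anyone who hasn't hooked the vector before, a minimal sketch of installing a handler through the RAM vector might look like this (ca65 syntax). It assumes the vector is the two bytes at $0314/$0315 as stated above and that the code and old_irq live in RAM; install_irq, my_irq and old_irq are made-up names.

```asm
CINV = $0314            ; IRQ vector in RAM (address as given in the post)

install_irq:
	sei                 ; don't let an IRQ fire through a half-written vector
	lda CINV
	sta old_irq         ; keep the previous handler so we can chain to it
	lda CINV+1
	sta old_irq+1
	lda #<my_irq
	sta CINV
	lda #>my_irq
	sta CINV+1
	cli
	rts

my_irq:
	; ... custom interrupt work goes here (hypothetical) ...
	jmp (old_irq)       ; chain to the previous handler, which finishes the IRQ

old_irq:
	.word 0             ; must be in RAM so it can be written
```

Chaining to the old handler keeps the regular KERNAL housekeeping running; a handler that wants to skip it would have to restore the registers and rti itself.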

Re: Using the WAI instruction correctly

Post by StephenHorn »

TomXP411 wrote: Tue Mar 21, 2023 9:42 pm Except you can hook the ISR to do whatever you want. The ROM vector doesn't point to a ROM routine; it's simply an address that points somewhere else.

I believe that is currently $0314, which is in RAM. So a cartridge can set up whatever ISR they want there.
Well, I think Yaz is lamenting that a cartridge would not be able to have the 65C02 jump directly into a wholly bespoke interrupt handler that would, for instance, bypass the check against the break flag and the additional 6 cycles of an indirect jmp, in order to accomplish something extremely quickly, say within a single scanline. Or I'm guessing their hope would be to accomplish something within h-blank, except I have significant doubts that anything could be done that quickly. That said, for as long as the capability lasts within the emulators, I welcome anyone to prove me wrong and do something cool. I'm still decidedly out of practice in my high-performance 6502 assembly programming and open to the possibility of something being managed.

For reference, the overhead on the current interrupt handling looks like this:
 
	pha
	lda rom_bank
	beq :+
	pla
	jmp banked_irq
:	phx
	phy
	tsx
	lda $104,x      ;get old p status
	and #$10        ;break flag?
	beq puls1       ;...no
	jmp (cbinv)     ;...yes...break instr
puls1:
	jmp (cinv)      ;...irq
 
 
And the overhead could, instead, look more like this:
 
	pha ; And my interrupt handler will not touch x or y for any reason.
 

Re: Using the WAI instruction correctly

Post by StephenHorn »

... I should add that this is the overhead when the ROM bank happens to already be 0 (plus a pla/jmp that would be skipped), so it's pretty small. Handling an IRQ on a non-zero ROM or cartridge bank would be much more expensive:
 
puls:
	pha
	lda rom_bank
	beq :+          ; branch not taken
	pla
	jmp banked_irq
banked_irq:
	pha
	phx
	lda rom_bank    ;save ROM bank
	pha
	lda #BANK_KERNAL
	sta rom_bank
	lda #>@l1       ;put RTI-style
	pha             ;return-address
	lda #<@l1       ;onto the
	pha             ;stack
	tsx
	lda $0106,x     ;fetch status
	pha             ;put it on the stack at the right location
	jmp ($fffe)     ;execute other bank's IRQ handler
; --- execution continues in the kernal bank, whose vector at $fffe leads back to puls ---
puls:
	pha
	lda rom_bank
	beq :+          ; branch taken
:	phx
	phy
	tsx
	lda $104,x      ;get old p status
	and #$10        ;break flag?
	beq puls1       ;...no
	jmp (cbinv)     ;...yes...break instr
puls1:
	jmp (cinv)      ;...irq
 
 
I'm not sure we need this to be so expensive, though. For instance, why are we re-pushing the interrupted processor status onto the stack? Shouldn't that just be a php instead, saving us both the work of looking up the stack value and the phx?
 
banked_irq:
	pha
	lda rom_bank    ;save ROM bank
	pha
	lda #BANK_KERNAL
	sta rom_bank
	lda #>@l1       ;put RTI-style
	pha             ;return-address
	lda #<@l1       ;onto the
	pha             ;stack
	php
	jmp ($fffe)     ;execute other bank's IRQ handler
 
 
Either way, it's still a fairly large preamble to handle IRQs while the ROM bank is set to anything but 0.
Last edited by StephenHorn on Wed Mar 22, 2023 3:05 am, edited 2 times in total.
DragWx
Posts: 363
Joined: Tue Mar 07, 2023 9:07 pm

Re: Using the WAI instruction correctly

Post by DragWx »

You know, the fastest you could service an IRQ on the 65c02 is to do something like this:

Code:

; Code code code
; Now I want to specifically wait for an IRQ
SEI
WAI
; Immediately handle the IRQ here
CLI
Any IRQ will terminate the WAI instruction, even when interrupts are disabled via SEI. In that case the CPU simply continues with the next instruction; it jumps to the IRQ vector only after the following CLI, and only if the IRQ line is still asserted at that time. If you were looking to have really tight IRQs packed together, this is how you'd do them.

Regarding the CPU vectors being in bankswitched ROM space, it was very common on the NES, when using a memory mapper, to repeat the 6502 vectors and a short code stub (e.g., to temporarily switch to a specific bank) at the end of every ROM bank, so that no matter which random bank was loaded during boot, there was always a copy of valid vectors.



EDIT:
Here, I traced out some timings:

On r42, on IRQ with ROM bank != 0, the preamble takes 89 CPU cycles to complete (i.e., the PC has reached the handler that CINV points to).
VERA Line IRQ occurs as the first pixel of the line is about to be rendered.
In VGA mode: there are 256 CPU cycles per scanline, and 204.8 CPU cycles until the VERA reaches hblank.
In NTSC mode: there are 508.16 CPU cycles per scanline, and 409.6 CPU cycles until the VERA reaches hblank.

NOTE: I don't have schematics, so I'm assuming the 8 MHz CPU crystal and the 25 MHz VERA crystal are two separate crystals, so all CPU-cycles-per-VERA-feature timing figures here are approximate and will vary slightly from machine to machine.
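
For reference, the figures above can be reproduced with simple clock-ratio arithmetic. This is only a sketch: it assumes exactly 8 MHz and 25 MHz clocks, the standard 640x480 VGA totals of 800 pixel clocks per line with 640 visible, and NTSC counts of 1588 and 1280 VERA clocks that are back-computed from the numbers in this post rather than taken from VERA documentation.

```latex
\begin{align*}
N_{\text{CPU}} &= N_{\text{VERA}} \cdot \frac{f_{\text{CPU}}}{f_{\text{VERA}}}
                = N_{\text{VERA}} \cdot \frac{8}{25} = 0.32\,N_{\text{VERA}} \\[4pt]
\text{VGA, full line:}      \quad 800  \cdot 0.32 &= 256    \\
\text{VGA, until hblank:}   \quad 640  \cdot 0.32 &= 204.8  \\
\text{NTSC, full line:}     \quad 1588 \cdot 0.32 &= 508.16 \\
\text{NTSC, until hblank:}  \quad 1280 \cdot 0.32 &= 409.6  \\[4pt]
\text{VGA, left after the 89-cycle preamble:}  \quad 204.8 - 89 &= 115.8 \\
\text{NTSC, left after the 89-cycle preamble:} \quad 409.6 - 89 &= 320.6 \\
\text{hblank length (VGA, NTSC):} \quad 256 - 204.8 = 51.2, \quad 508.16 - 409.6 &= 98.56
\end{align*}
```

These budgets ignore the few cycles the CPU needs to finish the current instruction and enter the interrupt sequence, so treat them as upper bounds.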

Regardless, this seems like plenty of leeway for doing hblank stuff even while allowing the kernal to do its IRQ preamble. If you want to do stuff on consecutive hblanks, you'd need to then ACK the Line IRQ, set it to the next line (remember, it's +2 lines in NTSC mode!), do your HBLANK stuff for this scanline, then WAI to align with the start of the next scanline (the "I" flag is already set, so no SEI necessary), setup/calculate/delay to HBLANK again, etc etc.

Seems ok so far, but I'm sure there's a gotcha in here somewhere. :P

Edit: The gotcha is, you want to set the line IRQ to the line just before the one you want, then ack/update/WAI to the one you actually want, to remove any variable timing from different ROM revisions, branches, etc.
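
The ack/update/WAI loop described above could be sketched roughly as follows (ca65 syntax). This is an untested outline, not a drop-in routine: it assumes VERA's interrupt status register at $9F27 with the line IRQ on bit 1, the low byte of the line compare value at $9F28, and only lines below 256. irq_line and lines_left are hypothetical zero-page variables.

```asm
VERA_ISR       = $9F27      ; interrupt status; writing a 1 bit acknowledges it
VERA_IRQLINE_L = $9F28      ; low 8 bits of the line-IRQ compare value
LINE_IRQ       = %00000010  ; line interrupt bit
LINE_STEP      = 1          ; use 2 in NTSC mode, per the note above
irq_line       = $22        ; hypothetical: copy of the current compare line
lines_left     = $23        ; hypothetical: number of lines still to process

; Entered from the line-IRQ handler, so the I flag is already set.
; The first IRQ is assumed to be on the line just before the first one we want.
each_line:
	lda #LINE_IRQ
	sta VERA_ISR            ; ack the line IRQ that got us here
	clc
	lda irq_line
	adc #LINE_STEP
	sta irq_line
	sta VERA_IRQLINE_L      ; arm the next line (bit 8 of the compare value ignored)
	wai                     ; sleep until that line's IRQ asserts, then fall through
	; ... delay to hblank and do this line's work here (hypothetical) ...
	dec lines_left
	bne each_line
	lda #LINE_IRQ
	sta VERA_ISR            ; ack the last line IRQ before leaving
	; ... restore the usual compare line and exit through the normal IRQ path ...
```

Because the shadow copy in irq_line is what gets incremented, the sketch never needs to read the compare register back from VERA.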
Yazwho
Posts: 172
Joined: Fri Feb 19, 2021 2:59 pm

Re: Using the WAI instruction correctly

Post by Yazwho »

TomXP411 wrote: Tue Mar 21, 2023 9:42 pm I believe that is currently $0314, which is in RAM. So a cartridge can set up whatever ISR they want there.
Sadly this is not the case.
[Attached image: rom.png]

Re: Using the WAI instruction correctly

Post by StephenHorn »

Your arrow points to $E6F6, except that jmp instruction doesn't execute under normal IRQs. The beq before it branches to $E6F9, and the system performs that jmp instead.

Re: Using the WAI instruction correctly

Post by Yazwho »

Yeah, sorry, the one below.

Edit: And that branch should probably be flipped around, to save 1 cycle on every interrupt.
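
If I read the suggestion correctly, the idea is to make the common IRQ case the untaken branch, since an untaken branch costs 2 cycles and a taken one costs 3. A sketch of the flipped test, reusing the names from the listing earlier in the thread (the is_brk label and the vector addresses are my assumptions):

```asm
cinv  = $0314           ; IRQ vector in RAM
cbinv = $0316           ; BRK vector in RAM (assumed to follow cinv)

	lda $104,x      ;get old p status
	and #$10        ;break flag?
	bne is_brk      ;...yes (rare case): taken branch, 3 cycles
	jmp (cinv)      ;...no, plain irq: untaken branch, only 2 cycles
is_brk:
	jmp (cbinv)     ;...break instr
```

The BRK path becomes one cycle slower in exchange, which should rarely matter.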