SweetCX16
Posted: Tue Jan 18, 2022 3:01 pm
by BruceMcF
On 1/16/2022 at 9:01 AM, BruceMcF said:
Since he organized the instruction set for ease of hand-assembly, with only CPR having the opcode it has for functional reasons, I do think that saving odd/even in a zero page byte, and cutting the size of the two vector tables in half is the most useful decode.
However, after drafting several approaches, the game is not worth the candle ... the smallest I can come up with, without going in and copy and pasting from Woz's code, gets down to 416 bytes from the 496 bytes of the smaller version of the "pure" JUMP (optable,X) version that jumps directly to each OP. With a drop down to 394 bytes available from just adopting Woz's code, (including save/restore register code that Woz's version gets from the Apple II ROM), it's not worth it.
Not, that is, unless someone could find space savings IN Woz's version by doing some decoding, but as spaghetti coded as the original Sweet16 is, that someone would not be me.
If either version of my Sweet16 and Woz's original are assembled to be at the END of GoldenRAM, they each would have a different start point.
However, after translating a copy of Woz's code to acme assembler, with "SAVE" and "RESTORE" in front, I find there are six bytes at the end free before the end of the page. Then I could assemble versions of all three with a two routine jump table at the TOP of golden RAM ($07FA and $CFFA for CX16 and C64 respectively), one for entering Sweet16, the other for entering either SAVE or RESTORE (based on carry set or carry clear). Then the starting point of the routine is flexible, C64 code could enter Sweet16 with JSR $CFFE and CX16 code with JSR $07FE.
That would make it possible to assemble Sweet16 code independent of the choice of Sweet16 VM.
To fit into that, I'm going to shrink the size of my "two page" version by using INC Register and DEC Register subroutines, which will free up as much space as it frees up, and leave my "3 page" version as the full fat speed optimized version.
Edit: What I get is that the "full fat" Sweet16c would occupy $0500-$07FF of Golden RAM, leaving one page (256 bytes) free at $0400. The "two page" Sweet16c2 would occupy $061C-$07FF, leaving 530 bytes (two pages plus 18 bytes) of Golden RAM available at $0400. And the adapted "Sweet 16 original" with SAVE/RESTORE code included and the jump table would occupy $066F-$07FF, leaving 623 bytes (two pages plus 111 bytes) of Golden RAM free.
TBC, none of those are tested code, so the final numbers may vary following bug fixes, but those should be the right ball park.
Posted: Sun Apr 24, 2022 6:51 pm
by BruceMcF
I've had a rethink on the three "unused ops" in Woz's Sweet16, and what I've decided is to use them as embedded calls to Machine Language routines. My first idea was to make it possible to call Kernal routines directly, but I've since realized that the ML code being called can be a bridge routine, so it is not necessary to build register loading and retrieval into the Sweet16 operation ... the routine that is called can handle that as appropriate.
The first thing to do is to wedge it into Woz's original code base. What I've already done is insert "SAVE" and "RESTORE" in between his OPTBL/BRTBL data and the "SET" operation, which must sit at the 2nd address (or later) of the page holding the Sweet16 ops themselves. That is because he dispatches by pushing the page address onto the stack, then the table entry, which is (opaddress-1), and then executing RTS to dispatch the operation.
I also have the codebase END with "JMP Sweet16", so that Sweet16 VM's of different sizes can be placed at the END of GoldenRAM and be called with a stable entry point.
This leaves 3 bytes leeway, in which I put JMP SYSOP, which calls the common routine that executes one of the SYS operations.
I have three "SYS" calls. All SYS calls jump through an indirect call via register 13, the register used by CPR to store the results of a comparison operation. "SYSR n" uses the contents of the register pointed to by the status byte, which is most often Register 0, the Sweet16 accumulator. The current status of CARRY is in the carry flag when executing the call. "SYS13" uses the current contents of register 13 (and it is the user's responsibility to make sure there hasn't been a CPR operation since it was loaded), and the carry flag is clear. For both SYSR and SYS13, the value of "n" is simply available for any use the called routine may wish to make of it.
"SYSZ n" loads R13 with the 16 bit value it finds at the zero page address "n". This is DESIGNED to allow the Sweet16 register with the target address to be specified with "SYSZ Reg0" through "SYSZ Reg14" (using the PC at R15 would not work, as it will contain "n" rather than a ML routine) ... but it CAN be used to execute ANY address in the zero page.
At one and the same time, these SYS operations allow writing bridge routines that call Kernal routines, as well as routines that extend Sweet16 with any desired operation. Indexed calls are available by simply using Sweet16 ADD operations and "SYSR" on the result.
Note that "SYSZ" takes a zero page address, not a register number like the Sweet16 instruction codes do. So I will also note that a convenient way to include Sweet16 code in your assembly code is to define the opcodes and registers as named byte symbols and use your byte data pseudo-op to include the code. Bytewise OR ("|" in ACME) can be used for the 15 operations that embed their target register in their bytecode, with the register number given rather than the register address. An advantage of this is that "extended" Sweet16 code using SYSZ can be ported between Apple systems, with 16 pseudo-registers at $00-$1F, and the C64/CX16, with 16 pseudo-registers at $02-$21, by simply re-assembling with the register symbols set correctly.
Placing $0416 in Register 10 would be done with
!byte ..., SET|10, $16, $04,...
Then using that register to call the routine at the Golden RAM location $0416 would be done with:
!byte ..., SYSZ, Reg10, ...
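A minimal sketch of what such symbol definitions might look like in ACME, assuming the C64/CX16 register layout at $02-$21 described above. The opcode value for SYSZ is an assumption (it depends on which of the three unused opcodes a given VM build assigns to it), and the Reg symbols are illustrative names rather than anything from Woz's source:

```asm
; Sketch only: opcode and register symbols for hand-assembled Sweet16.
; Assumes registers at $02-$21 (C64/CX16); set R0L = $00 for an Apple II.
R0L     = $02
Reg0    = R0L           ; zero page address of R0's low byte
Reg10   = R0L + 2*10    ; = $16 here, $14 with the Apple II layout

SET     = $10           ; register op: register number goes in the low nibble
RTN     = $00           ; return to 6502 mode
SYSZ    = $0F           ; ASSUMED value: one of the three unused opcodes

        * = $0400       ; arbitrary origin for the example
        !byte SET|10, $16, $04  ; R10 <- $0416 (data, low byte first)
        !byte SYSZ, Reg10       ; call through the vector held in R10
        !byte RTN
```

Only the value of Reg10 changes when R0L is redefined; the SET|10 byte uses the register number and stays the same on both platforms. That the data byte $16 equals Reg10 under the C64/CX16 layout is a coincidence of the example address.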
Posted: Mon Apr 25, 2022 2:30 pm
by rje
That's a clever use of the assembler to write target-agnostic Sw*t16.
Posted: Mon Apr 25, 2022 10:02 pm
by BruceMcF
On 4/24/2022 at 2:51 PM, BruceMcF said:
"SYSZ n" loads R13 with the 16 bit value it finds at the zero page address "n". This is DESIGNED to allow the Sweet16 register with the target address to be specified with "SYSZ Reg0" through "SYSZ Reg14" (using the PC at R15 would not work, as it will contain "n" rather than a ML routine) ... but it CAN be used to execute ANY address in the zero page. ...
The wedge into Woz's Sweet16 is something like (if the registers are not at $00-$1F ... in the original AppleII registers, "SEC : SBC #R0L" can be omitted):
SYSOP:
        CPX #$1C        ; X = #$1C = 2*SYSR?
        BEQ SYS1        ; If so, test register index is in A
        BMI SYS2        ; X = #$1A = 2*SYS13, no loading needed
        LDY #0          ; Else X = #$1E = 2*SYSZ
        LDA (R15L),Y    ; ZP address is at (Reg15)
        SEC             ; Adjust to use R0L,X indexing
        SBC #R0L
        CLC
SYS1:
        JSR SYS3        ; Fetch vector into Reg13, then use
        RTS
SYS2:
        CLC             ; Vector already in Reg13, just use
        JSR SYS4
        RTS
SYS3:
        TAX             ; Load Reg13 if needed, ...
        LDA R0L,X
        STA R13L
        LDA R0H,X
        STA R13H
SYS4:
        JMP (R13L)      ; Vectored jump based on (Reg13)
Swift16 will be similar, but will be able to jump directly to the SYSR, SYS13 and SYSZ operations, since Swift16 operations do not have to start executing in the same page.
Posted: Mon Apr 25, 2022 11:27 pm
by BruceMcF
On 4/25/2022 at 10:30 AM, rje said:
That's a clever use of the assembler to write target-agnostic Sw*t16.
One thing to be careful of is that the code using the Sweet16 VM cannot be in the same namespace as the code implementing the Sweet16 VM, because the namespace uses the "plaintext" names of the operations as addresses of the implementation of the operation, while the code using the Sweet16 VM would have those defined as symbols for the opcode of those operations.
Posted: Wed Apr 27, 2022 10:24 pm
by BruceMcF
On 4/25/2022 at 6:02 PM, BruceMcF said:
The wedge into Woz's Sweet16 is something like (if the registers are not at $00-$1F ... in the original AppleII registers, "SEC : SBC #R0L" can be omitted):
"..."
Swift16 will be similar, but will be able to jump directly to the SYSR, SYS13 and SYSZ operations, since Swift16 operations do not have to start executing in the same page.
Waitaminute! I just realized that the Sweet16 "status" register is the HIGH byte of Register 14 ... if Register 13 is being "reused" as the temporary store for the SYS jump vector ... so can the low byte of Register 14, allowing a complete JMP() instruction to be built IN the Sweet16 register space. Instead of SYS13, I can have a SYSZ call with a one byte zero page address, and a SYSM call with a two byte absolute address, which can increment Reg15 and use it to grab the high byte of the address.
SYSOP:
        CLC
        LDY #0          ; not used in 65C02
        CPX #$1C        ; 2*$0E = 2*SYSZ = $1C
        BMI SYS2        ; 2*$0D = 2*SYSR = $1A -- contents of A is a zero page address
        BEQ SYS1        ; If NE, then, 2*$0F = 2*SYSM = SEC to fetch high byte
        SEC
SYS1:
        LDA (R15L),Y    ; fetch zero page address, "LDA (R15L)" in 65C02
SYS2:
        STA R13H        ; low byte of JMP() operand
        LDA #$6C        ; JMP() opcode
        STA R13L
        TYA             ; for zero page addressing
        BCC SYS4
        INC R15L
        BNE SYS3
        INC R15H
SYS3:
        LDA (R15L),Y
        CLC
SYS4:
        STA R14L
        JMP R13L
I'm thinking the Swift16 version would be basically the same, but with three entry points because of no need to "wedge" the call into the common Sweet16 VM opcode page:
SYSR:
        CLC
        BRA SYS2
SYSZ:
        CLC
        BRA SYS1
SYSM:
        SEC
SYS1:
        LDA (R15L)      ; fetch first byte of operand
SYS2:
        STZ R14L        ; High byte of operand for zero page addressing
        STA R13H        ; store first byte of operand
        LDA #$6C        ; JMP() opcode
        STA R13L        ; JMP() instruction is now built
        BCC SYS4
        INC R15L
        BNE SYS3
        INC R15H
SYS3:
        CLC
        LDA (R15L)
        STA R14L
SYS4:
        JMP R13L        ; returns to Sweet16 VM executive loop
Posted: Thu Apr 28, 2022 11:15 pm
by BruceMcF
I've been thinking on this, and think that while I was getting closer, I was fighting Sweet16 too much, rather than going along with it.
Given that the calls are going to be machine language routines providing operations IN the Sweet16 source code -- whether all new operations or bridge calls to Kernel calls -- they can be packaged into a jump table or vector table format for access, so what is really needed is an INDEXED machine language call. That fits well with the single byte operand of the Branch operations.
Also, the contents of Reg13 are purely transitory, since they are overwritten by each CPR operation. But if a "JMP addr" or "JMP (addr)" instruction is written into R13L, R13H and R14L, then these operations can take advantage of the fact that R14L is a "free" single byte register (the "high" byte, R14H, is constantly over-written to point to the register that Zero/Nonzero and Minus1/NotMinus1 refer to after load, arithmetic and comparison operations). So unlike the JMP opcode and the low byte of the operand, the high byte of the operand in R14L can stay resident.
Which leaves me at TWO TYPES of operation, making up the "Tabled System Calls" opcodes: "TBL page", which sets R14L to the desired page (high byte address), and the "SYS n" and "SYSI n" operations, which perform either a jump TO the nth byte of the table page or a jump using the VECTOR at the nth byte of the table page.
In the "Sweet16 wedge", included with the block of SAVE and RESTORE code between the opcode tables and the "opcode page", this would be something like:
; $0D -- TBL n -- set binary page (high address byte) used for SYS calls
; $0E -- SYS n -- Jump to indexed address of table page
; $0F -- SYSI n -- Jump using indexed vector of table page
SYSOP:  LDA #$4C
        CPX #$1C
        BMI SYS2
        BEQ SYS1
        LDA #$6C
SYS1:   STA R13L
        LDY #0
        LDA (R15L),Y
        STA R13H
        JMP R13L
SYS2:   LDY #0
        LDA (R15L),Y
        STA R14L
        RTS
where the "Swift16" version would be something like:
; $0D -- TBL n -- set high page of SYS calls
; $0E -- SYS n -- Jump to indexed address of table page
; $0F -- SYSI n -- Jump using indexed vector of table page
SYS:    LDA #$4C
        BRA +
SYSI:   LDA #$6C
+       STA R13L
        LDA (R15L)
        STA R13H
        JMP R13L
TBL:    LDY #0
        LDA (R15L),Y
        STA R14L
        RTS
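As a hypothetical illustration of how the three opcodes above might be used from the Sweet16 side, here is a sketch of a one-entry jump table page and a few bytes of Sweet16 code that call into it. It assumes the C64/CX16 register layout (R0 at $02) and the opcode numbering listed above; SYSTAB, PUTCH and CODE are made-up names, and the whole thing is untested:

```asm
; Sketch only: a one-entry jump table page plus Sweet16 bytecode using it.
R0L     = $02           ; C64/CX16 register base (assumed)
CHROUT  = $FFD2         ; KERNAL character output
RTN     = $00
TBL     = $0D
SYS     = $0E
SET     = $10

        * = $0500       ; table starts on a page boundary
SYSTAB: JMP PUTCH       ; offset $00 within the table page

PUTCH:  LDA R0L         ; low byte of the Sweet16 accumulator
        JMP CHROUT      ; tail call: its RTS stands in for the op's RTS

; Sweet16 bytecode, executed after entering the VM:
CODE:   !byte SET|0, $41, $00   ; R0 <- $0041 ("A")
        !byte TBL, >SYSTAB      ; R14L <- $05, the table page
        !byte SYS, <SYSTAB      ; VM builds and runs "JMP $0500"
        !byte RTN
```

With "SYSI" in place of "SYS", the table would hold two-byte vectors (for example "!word PUTCH") instead of JMP instructions, at the cost of one extra indirection per call.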
Posted: Fri Apr 29, 2022 7:54 pm
by rje
I appreciate you geeking out on this... My brain is too tired to follow it, but I like to see this. And hope to try things out with it.
Posted: Sun May 01, 2022 1:12 am
by BruceMcF
After my effort to "crunch" the JMP (abs,X) approach to a Sweet16 VM couldn't beat Woz's code for compactness, I've evolved toward a slightly extended version of Woz's Sweet16 as the "compact" VM, Sweet16c as the "faster, though larger" 65C02 version, and a 65816 version of the VM that can execute mixed 6502/Sweet16 code, with the 6502 code running in emulation mode and the Sweet16 VM implemented in native 65816 mode.
Now, aside from porting the VM independent of Woz's code, I have two "new" things: the three new System Jump Table opcodes, and the jump table at the end allowing the same code on a system to be able to be used with a variety of Sweet16 VM implementations.
However, assembling the Woz code with my SYSOP "wedge", the page with the opcodes didn't have space for the jump table -- it came up three bytes short.
The first opcode has to be at address $01 or higher in the page, because first ">SET" is loaded and pushed onto the stack, and then the bottom byte of the subroutine return vector is defined with, e.g., "<SET-1". But if SET is at (e.g.) $0700, then ">SET" is $07 and "<SET-1" is $FF, because SET-1 is $06FF. But then the subroutine return vector on the stack is, effectively, $07FF, which returns to $0800 ... oops!
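A minimal sketch of that dispatch, in ACME syntax, may make the constraint easier to see. It is a simplification written for illustration, not a copy of Woz's listing; it assumes X already holds the table index, and the addresses are arbitrary:

```asm
; Sketch only: RTS-based dispatch through a one-byte-per-op table.
        * = $0600
DISPATCH:
        LDA #>SET       ; high byte shared by every op routine
        PHA
        LDA OPTBL,X     ; low byte of (routine address - 1)
        PHA
        RTS             ; pulls the pushed address, adds 1, continues there

OPTBL:  !byte <(SET-1)  ; only the low byte is kept in the table
        ; ... one entry per op ...

        * = $0701       ; first op one byte past the page boundary
SET:    RTS             ; placeholder body

; With SET = $0701: pushed $07, $00 -> RTS resumes at $0701 (correct).
; With SET = $0700: pushed $07, $FF -> RTS resumes at $0800 (wrong page).
```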
To be clear, the idea is to tuck the VM up "high" in a memory space ... the top of "Golden RAM", or the top of a HighRAM segment, or etc. The "high entry point" when added to Woz's original VM really has to fit into the end of the same page that has the opcodes.
But when the first opcode routine is placed one past the page boundary, my precious two-operation jump table spills three bytes out of Golden RAM!
The first trick is following Woz's lead with "BPL SETZ" being an effective "BRA SETZ", because branch ops are called after loading A with the offset from Register0 of the register that the status is based on, so the sign flag should always be clear when starting execution of a "Branch Op".
I had already done that with "BPL SYSOP" ... but Woz placed "RTN: JMP RTNZ" at the end of his code. Replacing that with a "RTN: BPL RTNZ" in front of "SET: BPL SETZ" saves one byte.
And then the second trick was a design simplification, winnowing the jump table to just the single "JMP SWEET16". The idea of the second routine in the table was to export the Save/Restore register routines, but it is possible to set things up so that their addresses can be inferred, so I've settled for that.
Now it all JUST fits. And ... with a single byte to spare!
NOTE: The idea I have been toying with, which makes direct access to register restore an issue for interspersed Sweet16 and 6502 code, is to make the state of carry significant when entering Sweet16: with carry clear, state is stored on entry and restored on exit; with carry set, the previously stored state is left alone. So if Sweet16 is originally called with carry clear, then returns to 6502 code for some task before going back to Sweet16 code with carry set, the ORIGINAL state stored when first entering Sweet16 is still there. At the end of the WHOLE process, Sweet16 can return to 65C02 code which can end with a JUMP to restore the state, where the restore state subroutine returns to the caller. And of course, say, fetching the call address at the end of the Sweet16 VM, subtracting two from it and fetching the word at that address into a Sweet16 register (one that the process won't be using) is a very short routine in Sweet16 code. If it was Reg11, the terminating 6502 code might end with JMP (Reg11) to restore the register state from when the whole combined routine was first called.
Posted: Thu May 05, 2022 11:05 pm
by BruceMcF
OK, cracked it. Since I have exactly one byte leeway in my "augmented version", what I am doing is this:
SWEET16:                ; START POINT: doesn't have to be first byte of VM, but often is
        JSR PUTSTATE
        ...
GETSTATE:               ; restore A, X, Y and P from their save locations
        LDA REGP
        PHA
        LDA REGA
        LDX REGX
        LDY REGY
        PLP
        RTS
PUTSTATE:               ; save A, X, Y and P
        PHP
        STA REGA
        STX REGX
        STY REGY
        PLA
        STA REGP
        RTS
        ...
GS_OFFSET: !byte (PUTSTATE - GETSTATE)
; ENTRY POINT
        JMP SWEET16
... In other words, the final word of the VM is implicitly a handle for SAVE ... it contains a pointer to one less than the location of the pointer to the SAVE routine. So if I know how far away RESTORE, aka GETSTATE, is located (within 255 bytes), I can build my own jump table or vector table. That offset is contained in the byte before the entry point.
The limitations on ANY Sweet16 VM using this system would be that the SAVE routine must FOLLOW the RESTORE routine, and be within 255 bytes of it.
It is arbitrary which one must be first, so this follows the Apple2 ROM addresses of register "SAVE" at $FF4A and register "RESTORE" at $FF3F, so a RAM based "augmented Sweet16" for an Apple II could re-use the Apple II ROM SAVE and RESTORE.
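To make the lookup concrete, here is a sketch in plain 6502 code of how a caller might recover both addresses from the fixed entry point. It assumes a CX16 build where "JMP SWEET16" fills the last three bytes of Golden RAM ($07FD-$07FF), so that GS_OFFSET sits at $07FC, and that the JMP target is the "JSR PUTSTATE" instruction itself. PTR, PUTVEC, GETVEC and FINDSTATE are hypothetical names, and the code is untested:

```asm
; Sketch only: recover PUTSTATE and GETSTATE from the fixed entry point.
VM_ENTRY = $07FD        ; ASSUMED location of "JMP SWEET16"
PTR      = $22          ; any free zero page word (assumed free here)

        * = $0400
FINDSTATE:
        LDA VM_ENTRY+1  ; operand of JMP SWEET16, low byte
        STA PTR
        LDA VM_ENTRY+2  ; operand of JMP SWEET16, high byte
        STA PTR+1
        LDY #1
        LDA (PTR),Y     ; operand of JSR PUTSTATE, low byte
        STA PUTVEC
        SEC
        SBC VM_ENTRY-1  ; subtract GS_OFFSET
        STA GETVEC
        INY
        LDA (PTR),Y     ; operand of JSR PUTSTATE, high byte
        STA PUTVEC+1
        SBC #0          ; propagate the borrow from the low byte
        STA GETVEC+1
        RTS

PUTVEC: !word 0         ; afterwards usable as JMP (PUTVEC)
GETVEC: !word 0         ; afterwards usable as JMP (GETVEC)
```

Because GETSTATE precedes PUTSTATE by at most 255 bytes, a single-byte subtraction with borrow is enough to get from one address to the other.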
For the direct additions to Woz's original Sweet16 source code, I don't have an open source licensed copy (even if clearly Woz won't mind!), but I can distribute additions to the source available at 6502.org, so those must follow the naming in the original. For my own implementation, I avoid calling them "SAVE" and "RESTORE" to avoid confusion with C64 KERNAL / CX16 Kernal routines.