A Tokenizer in RAM Bank 1

Post by **rje** » Thu Sep 02, 2021 5:45 pm

This spring I wrote a tokenizer, plus pieces of a compiler and pieces of an interpreter, in C on the X16. I was getting to the hard parts, and the binary was 25K with no sign of letting up, when I put it on the shelf for a while.

The source and token stream live in Bank 1, but maybe I've got it backwards... Now I'm wondering whether the tokenizer itself can live in Bank 1.

Anything to free up main RAM...

Post by **BruceMcF** » Fri Sep 03, 2021 1:26 am

An advantage of a tokenizer in Bank 1 is that if it outgrows the bank, you can either take "longwinded" routines and put them in Bank 2 or modularize and split modules between Bank 1 and Bank 2 ...

Cross-module call, with the dispatch vectors and CALLXB both in page $BFxx, operation index in X, Y left free:

... : LDX #operation : JSR CALLXB ...

; Bank 1 version

CALLXB: LDA #2 : STA $0000 : JSR + : LDA #2 : STA $0000 : RTS : + JMP ($BF00,X)

; Bank 2 version

CALLXB: LDA #1 : STA $0000 : JSR + : LDA #1 : STA $0000 : RTS : + JMP ($BF00,X)

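The magic is, of course, that at "STA $0000" you bounce to the other bank's version. Both copies of CALLXB sit at the same address, so the instruction after the store is fetched from the bank you just switched to.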

That is a lot more overhead than a plain subroutine call, but less than a general-purpose cross-bank call. So a program unit can be either 8KB (one bank) or 16KB (a pair of banks), and calls between distinct units can carry the additional overhead of a more general-purpose cross-bank call.
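For anyone who finds the one-liners hard to follow, here is a rough host-side model of the paired call in C. It is only a sketch of the control flow, not X16 code: `ram_bank` stands in for the register at $0000, the `vectors` table stands in for the per-bank vectors at $BF00, and the routine names and the `trace` array are made up for illustration. The operation is a plain table index here, whereas in the assembly X would be a byte offset into the vector table.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model only. All names below are hypothetical. */
typedef void (*Routine)(void);
enum { N_OPS = 1 };

static uint8_t ram_bank = 1;   /* stands in for the bank register at $0000 */
static int trace[8];           /* records the bank each routine ran in */
static int trace_len = 0;

static void callxb(uint8_t op);

/* A routine that "lives" in bank 1. */
static void bank1_helper(void) { trace[trace_len++] = ram_bank; }

/* A routine that "lives" in bank 2 and calls back into bank 1. */
static void bank2_longwinded(void) {
    trace[trace_len++] = ram_bank;
    callxb(0);
}

/* One dispatch table per bank (index 0 unused), like the vectors at $BF00. */
static Routine vectors[3][N_OPS] = {
    { 0 }, { bank1_helper }, { bank2_longwinded }
};

/* Paired call: the partner bank is fixed, so no bank number is passed. */
static void callxb(uint8_t op) {
    uint8_t home  = ram_bank;
    uint8_t other = (home == 1) ? 2 : 1;
    assert(op < N_OPS);
    ram_bank = other;        /* LDA #other : STA $0000 */
    vectors[other][op]();    /* JSR + ... JMP ($BF00,X) in the other bank */
    ram_bank = home;         /* LDA #home : STA $0000, then RTS */
}
```

Starting in bank 1, `callxb(0)` runs `bank2_longwinded` with bank 2 selected, which in turn reaches `bank1_helper` with bank 1 selected, and each return restores the caller's bank. The saving over a general-purpose call is that the target bank is hard-wired into each copy rather than passed and saved at run time.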

Post by **rje** » Fri Sep 03, 2021 3:51 am

Well I can have a 1K orchestration program in main RAM that knows how to toggle between banked routines, when it comes to that.

The tokenizer is 5.5 k. Plenty of room.

Tokens are 5-byte structures (pointers are 2 bytes on the 6502):


typedef struct {
   uint8_t   type;
   uint8_t   length;
   char     *start_position;
   uint8_t   line;
} Token;

Scripts are limited to 256 lines. The parsed source is diced up into strings and used in situ.
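As a rough illustration of the "in situ" idea, here is a minimal sketch of a whitespace splitter that fills `Token` records pointing back into the source buffer instead of copying text. This is not the 8-Shell tokenizer: the `tokenize` function, the `TOK_WORD`/`TOK_NUMBER` type codes, and the split-on-whitespace rule are all assumptions made for the example. On a desktop compiler the struct will be larger than 5 bytes because of wider pointers and padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t   type;
    uint8_t   length;
    char     *start_position;
    uint8_t   line;
} Token;

/* Hypothetical type codes, for illustration only. */
enum { TOK_WORD = 1, TOK_NUMBER = 2 };

/* Splits src on spaces, tabs, and newlines. Tokens point into src;
   nothing is copied. Returns the number of tokens written (at most max). */
static size_t tokenize(char *src, Token *out, size_t max) {
    size_t n = 0;
    uint8_t line = 0;            /* wraps past 255, hence the 256-line limit */
    char *p = src;
    assert(src != NULL && out != NULL);
    while (*p != '\0' && n < max) {
        if (*p == '\n') { ++line; ++p; continue; }
        if (*p == ' ' || *p == '\t') { ++p; continue; }
        char *start = p;
        /* Stop at 255 characters so the length fits in a uint8_t;
           a longer run is simply split into several tokens. */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n'
               && (p - start) < 255) {
            ++p;
        }
        out[n].type = (*start >= '0' && *start <= '9') ? TOK_NUMBER : TOK_WORD;
        out[n].length = (uint8_t)(p - start);
        out[n].start_position = start;
        out[n].line = line;
        ++n;
    }
    return n;
}
```

Because each token is only a pointer plus a length into the original text, the token stream costs a fixed 5 bytes per token on the X16 no matter how long the lexemes are, which matters when source and tokens share one 8KB bank.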

8-Shell, my current 25K attempt at the tokenizer + compiler + interpreter, is a mess, but the intro screen is pretty:

and it can evaluate SOME expressions:

and the logout screen reads a random entry from a FORTUNE file (8K, stored in yet another bank):

Post by **BruceMcF** » Fri Sep 03, 2021 11:05 am

7 hours ago, rje said:

Well I can have a 1K orchestration program in main RAM that knows how to toggle between banked routines, when it comes to that.

The tokenizer is 5.5 k. Plenty of room. ...

The point there is not general-purpose toggling between banked routines but minimizing the overhead of doing so in the cases where you know the call goes from one specific bank to one other specific bank.

So in this case ... something that fits within 8KB would not use it, of course!

But you can still hold onto that 2.5KB of spare space in case one of the other units (interpreter or compiler) runs up against the 8KB limit of a single bank. Splitting out one or two longer subroutines, or a distinct submodule that is not in a tight inner loop (e.g., an initialization module), to a side bank could then help it fit comfortably.