3 minutes ago, paulscottrobson said:
It's worthwhile, but not enough to get out of writing assembler. If it was 4-5 times faster you could write 80s style arcade games pretty much entirely in BASIC.
Please note I'm not trying to denigrate your work at all. I'm just trying to think of ways to:
1. Write a native interpreter;
2. That does a more sophisticated tokenization / crunching process than v2 BASIC;
3. That still keeps around the original form of the source code so that it can be listed and edited.

What you've done is great from the perspective of having better tooling to emit BASIC compatible with the platform. My thoughts are more of an intermediary between v2 BASIC and assembly code: something that could still give the programmer an interactive feeling of immediacy by typing and running their code, but that spends more time optimizing.
At this point it is just a thought exercise that I might never have the time to work on, but it is similar in spirit to what I did with PCBoard Programming Language. The source code was similar to BASIC without line numbers, and it went through a "compilation" phase to create a tokenized form. So if you wrote a statement like:
PRINTLN "2+3*4-6", 2+3*4-6
It would generate a tokenized form that looked like:
PRINTLN 2 "2+3*4-6" NUL 2 3 4 * + 6 - NUL
Where the first token indicated which statement it was, followed by a count of expressions, followed by the postfix expressions, each terminated with a NUL marker. Each of the tokens was just a reference into a variable table (even constants were stored as variables, because I was young and inexperienced and it was the first compiler I ever wrote). Then the BBS had a runtime system / VM in it that knew how to walk the token stream.
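To make the shape of that runtime concrete, here is a minimal sketch of how a VM might walk such a stream: read the statement token, read the expression count, then evaluate each NUL-terminated postfix expression with a stack. The function names and the use of Python values in place of variable-table references are my own invention for illustration; PCBoard's actual internals surely differed.

```python
# Hypothetical sketch of a runtime walking a "PRINTLN count expr NUL expr NUL"
# token stream. Values stand in for variable-table references.

NUL = object()  # expression terminator marker


def eval_postfix(tokens):
    """Evaluate one NUL-terminated postfix expression from a token iterator."""
    stack = []
    for tok in tokens:
        if tok is NUL:
            return stack.pop()  # final value of the expression
        if tok in ("+", "-", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a - b if tok == "-" else a * b)
        else:
            stack.append(tok)  # constant (or variable lookup, in a real VM)


def run_println(stream):
    """Interpret one PRINTLN statement: a count, then that many expressions."""
    it = iter(stream)
    assert next(it) == "PRINTLN"
    count = next(it)
    values = [eval_postfix(it) for _ in range(count)]
    print(*values, sep=", ")


# PRINTLN 2 "2+3*4-6" NUL 2 3 4 * + 6 - NUL
run_println(["PRINTLN", 2, "2+3*4-6", NUL, 2, 3, 4, "*", "+", 6, "-", NUL])
# prints: 2+3*4-6, 8
```

The appeal of this layout is that the expensive work (precedence, parsing) happened once at tokenization time; at runtime the VM only pushes, pops, and applies operators.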
My first thought is that this theoretical BASIC interpreter would parse each line into compact tokens, then store both a copy of the human-readable form of the token stream and a copy of the "optimized" (not the best word) sequence of those tokens. So using the example above, a serialized form of the line might look like this (labels are for convenient reference; it really is more or less just an array of tokens in three sub-sections):
TOKEN_COUNT = 11
TOKEN_0 = SPACE CHARACTER
TOKEN_1 = PRINTLN
TOKEN_2 = "2+3*4-6"
TOKEN_3 = ,
TOKEN_4 = 2
TOKEN_5 = +
TOKEN_6 = 3
TOKEN_7 = *
TOKEN_8 = 4
TOKEN_9 = -
TOKEN_10 = 6
PRE_COUNT = 12
PRE_TOKENS = T1 T0 T2 T3 T0 T4 T5 T6 T7 T8 T9 T10
POST_COUNT = 12
POST_TOKENS = T1 T4 T2 NUL T4 T6 T8 T7 T5 T10 T9 NUL
This isn't extremely well thought through yet, just stream-of-consciousness ideas, but it could give one an interactive environment that allows listing and editing of existing statements while eliminating a significant portion of the runtime cost. The PRE data preserves the niceties of the original: its spacing and the intuitive infix expression order with operator precedence. The POST data has already been processed further than v2 BASIC ever went, so the runtime can execute the tokenized form more efficiently.
This can never be as good as a real compiler or assembler that discards the original program text after doing the same transformations, but maybe it would be enough of an enhancement to justify a larger tokenized program in exchange for faster execution. Or maybe not.