3 minutes ago, paulscottrobson said:
It's worthwhile, but not enough to get out of writing assembler. If it was 4-5 times faster you could write 80s style arcade games pretty much entirely in BASIC.
Please note I'm not trying to denigrate your work at all. I'm just trying to think of ways to:
1. Write a native interpreter;
2. That does a more sophisticated tokenization / crunching process than v2 BASIC;
3. That still keeps around the original form of the source code so that it can be listed and edited.

What you've done is great from the perspective of having better tooling to emit BASIC compatible with the platform. My thoughts are more of an intermediary between v2 BASIC and assembly code: something that could still give the programmer an interactive feeling of immediacy by typing and running their code, but that spends more time optimizing.
At this point it is just a thought exercise that I might never have the time to work on, but it is similar in spirit to what I did with PCBoard Programming Language. The source code was similar to BASIC without line numbers, and it went through a "compilation" phase to create a tokenized form. So if you wrote a statement like:
PRINTLN "2+3*4-6", 2+3*4-6
It would generate a tokenized form that looked like:
PRINTLN 2 "2+3*4-6" NUL 2 3 4 * + 6 - NUL
Where the first token indicated which statement it was, followed by a count of expressions, followed by the postfix expressions, each terminated with a NUL marker. Each of the tokens was just a reference into a variable table (even constants were stored as variables, because I was young and inexperienced and it was the first compiler I ever wrote). Then the BBS had a runtime system / VM in it that knew how to walk the token stream.
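To make the shape of that runtime concrete, here is a minimal sketch of how a VM might walk such a stream: read the statement token, read the expression count, then evaluate each NUL-terminated postfix expression with a stack. The function names and the use of Python values in place of variable-table references are my own invention for illustration; PCBoard's actual internals surely differed.

```python
# Hypothetical sketch of a runtime walking a "PRINTLN count expr NUL expr NUL"
# token stream. Values stand in for variable-table references.

NUL = object()  # expression terminator marker


def eval_postfix(tokens):
    """Evaluate one NUL-terminated postfix expression from a token iterator."""
    stack = []
    for tok in tokens:
        if tok is NUL:
            return stack.pop()  # final value of the expression
        if tok in ("+", "-", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a - b if tok == "-" else a * b)
        else:
            stack.append(tok)  # constant (or variable lookup, in a real VM)


def run_println(stream):
    """Interpret one PRINTLN statement: a count, then that many expressions."""
    it = iter(stream)
    assert next(it) == "PRINTLN"
    count = next(it)
    values = [eval_postfix(it) for _ in range(count)]
    print(*values, sep=", ")


# PRINTLN 2 "2+3*4-6" NUL 2 3 4 * + 6 - NUL
run_println(["PRINTLN", 2, "2+3*4-6", NUL, 2, 3, 4, "*", "+", 6, "-", NUL])
# prints: 2+3*4-6, 8
```

The appeal of this layout is that the expensive work (precedence, parsing) happened once at tokenization time; at runtime the VM only pushes, pops, and applies operators.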
My first thought is that this theoretical BASIC interpreter would parse each line into compact tokens, then store both a copy of the human-readable form of the token stream and a copy of the "optimized" (not the best word) sequence of those tokens. So using the example above, a serialized form of the line might look like this (labels are for convenient reference; it really is more or less just an array of tokens in three sub-sections):
TOKEN_COUNT = 11
TOKEN_0 = SPACE CHARACTER
TOKEN_1 = PRINTLN
TOKEN_2 = "2+3*4-6"
TOKEN_3 = ,
TOKEN_4 = 2
TOKEN_5 = +
TOKEN_6 = 3
TOKEN_7 = *
TOKEN_8 = 4
TOKEN_9 = -
TOKEN_10 = 6
PRE_COUNT = 12
PRE_TOKENS = T1 T0 T2 T3 T0 T4 T5 T6 T7 T8 T9 T10
POST_COUNT = 12
POST_TOKENS = T1 T4 T2 NUL T4 T6 T8 T7 T5 T10 T9 NUL
This isn't extremely well thought through yet, just stream-of-consciousness ideas, but it could give one an interactive environment that allows listing and editing of existing statements while eliminating a significant portion of the runtime cost. The PRE data preserves the niceties of the original: its spacing and the intuitive infix expression order with operator precedence. The POST data has already been processed further than v2 BASIC ever went, so the runtime can execute the tokenized form more efficiently.
This can never be as good as a real compiler or assembler that discards the original program text after doing the same transformations, but maybe it would be enough of an enhancement to justify a larger tokenized program in exchange for faster execution. Or maybe not.