CX et TRSE et PROG8 et COMAL (et al) for X16

rje · Post by **rje** » Fri Feb 19, 2021 4:06 pm

From the Facebook chat, @TomXP411 was (idly?) thinking:

Quote

I've been pondering this a lot, and I'm thinking about a syntax that combines the best features of BASIC and C. The stack issue is fairly easily solved by creating a soft stack at the end of low RAM ($9EFF).

And if we designed this around PETSCII, we could avoid symbols like { and \ that aren't part of the PETSCII character set.

He gave it a tentative name of "CX", that is, something C-like for the X16.

His subsequent thought would be to alter Prog8, mainly to use PETSCII and get away from { } \.

Péteri András then suggested TRSE, which is a Pascal environment for small machines.

TRSE uses BEGIN and END to define its blocks, which are statements in their own right. So, for example, the IF THEN ELSE takes statements which are either single-line expressions or blocks.

I glanced at the source on GitHub, and it looks like a large language... but the original intended target is the Commodore 64, so.

***

I note that languages like TRSE, COMAL, and S-BASIC all tend to resemble each other in capability, varying typically in syntactic sugar. That's probably because of the reality of programming on the Commodore: compact architecture, light memory footprint, and PETSCII dominance.

TomXP411 · Post by **TomXP411** » Fri Feb 19, 2021 9:42 pm

Heh. I didn't expect anyone would actually be interested in this, which is why I didn't post a topic here.

I've been pondering a syntax that's easy to write, yet also easy to parse. Since this would run on the Commander, we would want to keep the syntax as simple as possible. Ideally, the compiler would be self hosted (the CX compiler would be written in CX), which would probably require starting the job using cc65 or KickC.

So here are some ideas:

No line numbers.

Line endings: CR terminates a line of code. Use ← to continue a line when necessary (this is _ on a PC)

Comments: REM, or // for line comments. This keeps consistency with C and BASIC.

No maximum line length. Lines are CR terminated and use a continuation character.
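To illustrate the line rules above, here is a minimal Python sketch of how a front end might fold continued lines into logical lines before parsing. It is not part of the proposal: the function name `join_continued_lines` is hypothetical, the ASCII `_` stands in for the PETSCII ←, and it assumes the continuation character is the last character on the physical line.

```python
CONTINUATION = "_"  # ASCII stand-in for the PETSCII left arrow (assumption)


def join_continued_lines(source: str) -> list[str]:
    """Split CR-terminated source into logical lines, folding continuations."""
    logical = []
    pending = ""
    for raw in source.split("\r"):
        line = raw.rstrip()
        if line.endswith(CONTINUATION):
            # Drop the continuation character and keep accumulating.
            pending += line[:-1]
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        # Source ended on a continued line; keep what was collected.
        logical.append(pending)
    return logical


lines = join_continued_lines("D=DISTANCE(X1,Y1,_\rX2,Y2)\rPRINT(D)")
```

Because there is no maximum line length, the parser only ever sees the joined logical lines.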

Use BASIC keywords for the built-in commands and control structures.

No GOTO or line labels. This prevents a whole host of problems, including unrolling the stack if the programmer tries to GOTO out of a loop.

Embedded assembly is allowed, and will be needed for the most basic library functions.

Increment and decrement operators: ++ and --

Add-and-assign and subtract-and-assign: += and -=

Support proper local, global, and static scope
- Global variables are available to all procedures. These are created in the data segment.
- Local variables are available only within the procedure being run. These are created on the stack.
- Static variables are local to a procedure and remembered from one call to the next. These are created in the data segment.
- There is no heap. Dynamic memory allocation must be managed by the program.

Variable assignments would use =. Comparisons would use ==

All subroutine and function calls would use parentheses for the argument list:
- PRINT("HELLO WORLD")
- D=DISTANCE(X1,Y1,X2,Y2)

Array references use [ ]

Variables would be explicitly declared, with atomic types covering 1-, 2-, and 4-byte integers, floats, and arrays. Variable names may include A-Z, 0-9, shifted A-Z, and all C= characters. The PETSCII underscore is 164, which is allowed, but the ← symbol, which is _ in ASCII, is not (because that's the line continuation character).

Parameter passing would be done with a soft stack.
- The stack pointer would be handled by the compiler and stored in a zero page address.
- Calling functions would push arguments onto the stack, last argument first, and called functions would use the arguments directly from the stack location.
- Return value is pushed before the arguments, so the function can set the value while the arguments and locals are still in place.
- Local variables get created on the stack and are destroyed when the function terminates.
- Example: LEN = LENGTH(X,Y)
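
The calling convention above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the stack grows downward from $9EFF, all values are one byte, the caller removes the arguments after the call, and the body of `LENGTH` (here just X + Y) is a placeholder, since the thread does not define it. The names `SoftStack`, `length_body`, and `call_length` are hypothetical.

```python
STACK_TOP = 0x9EFF  # end of low RAM, as given in the thread


class SoftStack:
    def __init__(self):
        self.ram = bytearray(0x10000)
        self.sp = STACK_TOP  # next free byte; downward growth is an assumption

    def push(self, value):
        self.ram[self.sp] = value & 0xFF
        self.sp -= 1

    def pop(self):
        self.sp += 1
        return self.ram[self.sp]

    def peek(self, offset):
        # offset 1 is the most recently pushed byte
        return self.ram[self.sp + offset]

    def poke(self, offset, value):
        self.ram[self.sp + offset] = value & 0xFF


def length_body(stack):
    """Callee: uses the arguments in place on the stack."""
    stack.push(0)  # one-byte local, created on the stack
    # Offsets now: 1 = local, 2 = X, 3 = Y, 4 = return slot
    stack.poke(1, stack.peek(2) + stack.peek(3))  # placeholder computation
    stack.poke(4, stack.peek(1))  # set the return value while args and locals exist
    stack.pop()  # local is destroyed when the function terminates


def call_length(stack, x, y):
    """Caller side of LEN = LENGTH(X,Y)."""
    stack.push(0)  # return slot is pushed before the arguments
    stack.push(y)  # last argument first
    stack.push(x)
    length_body(stack)
    stack.pop()  # discard X (caller cleanup is an assumption)
    stack.pop()  # discard Y
    return stack.pop()  # what remains on top is the return value


stack = SoftStack()
result = call_length(stack, 3, 4)
```

Pushing the return slot first means that once the arguments are dropped, the result sits on top of the stack with nothing left to move.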

I have a more detailed list at https://docs.google.com/document/d/1fsc-8MQa8AL5_tJyGRbTXq8KV0mEFJZMgS51GVRBMgE/edit?usp=sharing

kelli217 · Post by **kelli217** » Fri Feb 19, 2021 11:35 pm

1 hour ago, TomXP411 said:

DIM TEXT AS STRING[80] // all strings are fixed length. A zero byte terminates the string and is implied in the declaration. (So STRING[80] actually reserves 81 bytes.)

...

DIM STUFF[20] AS INTEGER // arrays are C-style. STUFF[20] means you get indexes 0-19.

I note that you've used two different ways of declaring array types. What's the motivation behind that, and not DIM STUFF AS INTEGER[20] or else DIM TEXT[80] AS STRING instead? Is it to reinforce that the integer array will not have the same indexing?

Also, isn't #P1 only 4? It's a 4-byte array.

TomXP411 · Post by **TomXP411** » Sat Feb 20, 2021 12:34 am

1 hour ago, kelli217 said:

I note that you've used two different ways of declaring array types. What's the motivation behind that, and not DIM STUFF AS INTEGER[20] or else DIM TEXT[80] AS STRING instead? Is it to reinforce that the integer array will not have the same indexing?

Because right after point 13, I changed my mind, but forgot to remove point 13. Strings actually will need their own type, and so the length specifier on STRING is not an array dimension; it's an initializer to the STRING handler. So DIM S STRING(80) is a completely different thing than DIM S[80] STRING(n).

Strings will actually contain a length byte and a null terminator. So STRING(80) actually allocates 83 bytes of RAM. Byte 0 is the length (0-255 bytes) and the last byte of the structure is a 0. String manipulation functions will always honor the maximum length of the output string, to ensure that there are no buffer overruns. I expect most string functions will expect strings to be passed using the pointer, or @name, like this:

STRCAT(@DEST, @SRC)

So a string will actually be an implicit structure, consisting of:

string.MAX = the maximum length of the string.

string.LENGTH = the actual length

string.TEXT = the data itself

string.NULL = terminator byte.
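
As a rough illustration of that structure, here is a Python sketch of the layout and a bounded STRCAT. The field order (MAX at byte 0, LENGTH at byte 1, then the text, then the terminator) is an assumption chosen so that STRING(80) comes to 83 bytes, and the helper names `make_string`, `set_text`, `get_text`, and `strcat` are hypothetical.

```python
def make_string(max_len: int) -> bytearray:
    """Allocate MAX byte + LENGTH byte + text area + NULL terminator."""
    if not 0 <= max_len <= 255:
        raise ValueError("maximum length must fit in one byte")
    buf = bytearray(max_len + 3)
    buf[0] = max_len  # string.MAX
    return buf  # string.LENGTH is 0 and every remaining byte is 0


def set_text(s: bytearray, text: bytes) -> None:
    n = min(len(text), s[0])  # never exceed string.MAX
    s[2:2 + n] = text[:n]
    s[1] = n  # string.LENGTH
    s[2 + n] = 0  # keep the text null terminated


def get_text(s: bytearray) -> bytes:
    return bytes(s[2:2 + s[1]])


def strcat(dest: bytearray, src: bytearray) -> None:
    """Append src to dest, truncating at dest's maximum length."""
    room = dest[0] - dest[1]
    n = min(src[1], room)
    start = 2 + dest[1]
    dest[start:start + n] = src[2:2 + n]
    dest[1] += n
    dest[2 + dest[1]] = 0


dest = make_string(8)
src = make_string(80)
set_text(dest, b"HELLO")
set_text(src, b" WORLD")
strcat(dest, src)  # only three more bytes fit in dest
```

Since the destination carries its own maximum, the concatenation truncates instead of writing past the end of the structure.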

I'm still hoping to create a dynamic string type later, but that will require adding a heap, garbage collection, and a few other things. That would probably come in a 2.0 version that includes pointers and indirection.

BruceMcF · Post by **BruceMcF** » Sat Feb 20, 2021 1:29 pm

I can see Basic keywords for operations, but Basic keywords for control structures are killing a lot of opportunity for optimization.

IF: ... :ELSE: ... :THEN

BEGIN: ... :UNTIL()

BEGIN: ... :WHILE(): ... :REPEAT

FOR(): ... :NEXT()

desertfish · Post by **desertfish** » Sat Feb 20, 2021 3:37 pm

Pascal-like syntax is extremely verbose and uses long keywords, which will eat up precious memory and disk space on the X16. Just something to consider if you want to edit/process these files on the machine itself.

BruceMcF · Post by **BruceMcF** » Sun Feb 21, 2021 5:42 am

13 hours ago, desertfish said:

Pascal-like syntax is extremely verbose and uses long keywords, which will eat up precious memory and disk space on the X16. Just something to consider if you want to edit/process these files on the machine itself.

Disk space will certainly not be precious at the scale of keywords of eight bytes or less. And of course, once they've been tokenized, the original size of the text string won't matter any more.

Also note that in the above, the ":", " (" and ")" are implied in the parsing, so the space overhead over obscure pairs of punctuation characters is a net 20 bytes.
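
The tokenization point can be illustrated with a small Python sketch. The keyword list and token values below are hypothetical (Commodore BASIC does store keywords as single bytes of $80 and above, but these particular assignments are made up for the example), and string literals are not treated specially.

```python
import re

# Hypothetical keyword table; token values are illustrative only.
KEYWORDS = ["BEGIN", "UNTIL", "WHILE", "REPEAT", "FOR", "NEXT", "IF", "ELSE", "THEN"]
TOKENS = {kw: 0x80 + i for i, kw in enumerate(KEYWORDS)}


def tokenize(line: str) -> bytes:
    """Replace each keyword with a one-byte token; copy everything else as-is."""
    out = bytearray()
    # Split into runs of letters and single non-letter characters.
    for part in re.findall(r"[A-Z]+|[^A-Z]", line):
        if part in TOKENS:
            out.append(TOKENS[part])
        else:
            out.extend(part.encode("ascii"))
    return bytes(out)


short = tokenize("IF")
long = tokenize("REPEAT")
```

Both `short` and `long` come out as a single byte, so in this sketch the stored size of a keyword does not depend on how it is spelled.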

TomXP411 · Post by **TomXP411** » Sun Feb 21, 2021 10:31 am

20 hours ago, BruceMcF said:

I can see Basic keywords for operations, but Basic keywords for control structures are killing a lot of opportunity for optimization.

IF: ... :ELSE: ... :THEN

BEGIN: ... :UNTIL()

BEGIN: ... :WHILE(): ... :REPEAT

FOR(): ... :NEXT()

I don't like it. My intent was to write compiler-friendly BASIC. This... is not BASIC.

BruceMcF · Post by **BruceMcF** » Sun Feb 21, 2021 11:07 am

36 minutes ago, TomXP411 said:

I don't like it. My intent was to write compiler-friendly BASIC. This... is not BASIC.

But as described, your only control structures are IF/THEN and floating point FOR ... NEXT, since in Basic V2, there ARE no keywords for any other control structures, they are all built with IF/THEN and GOTO, and you say there are no GOTO.

If you want it more Basickey,

DO: ... :UNTIL(): ... :LOOP

DO: ... :WHILE(): ... :LOOP

TomXP411 · Post by **TomXP411** » Sun Feb 21, 2021 6:30 pm

7 hours ago, BruceMcF said:

But as described, your only control structures are IF/THEN and floating point FOR ... NEXT, since in Basic V2, there ARE no keywords for any other control structures, they are all built with IF/THEN and GOTO, and you say there are no GOTO.

If you want it more Basickey,

DO: ... :UNTIL(): ... :LOOP

DO: ... :WHILE(): ... :LOOP

You didn’t read the full document, did you? There are three different loop structures, and all are taken straight from BBC BASIC.