Prog8 language and compiler topic

desertfish · Post by **desertfish** » Fri Oct 09, 2020 10:35 pm

I don't have a preference for one or the other simply due to the fact that I am not knowledgeable enough about parser generators to know the differences ?

How would Prog8's grammar look like in a LALR 1 definition? (as compared to the current prog8.g4 spec) Why would it be preferable?

I've chosen Antlr because that looked to be a high performance, well supported, and broadly used parser generator for Java (and thus, Kotlin). I'm open for alternative better solutions though. The parser is in its own project, so you could make another one next to it. The biggest effort would be the mapping into the Ast nodes that the rest of the compiler uses. And perhaps dealing with parser errors.

desertfish · Post by **desertfish** » Fri Oct 09, 2020 11:57 pm

Here's a sneak preview of my conversion of text-elite so far. While building it I've been adapting the prog8 compiler itself too to fix bugs and add features to make it possible to implement all this. Seems like the procedural galaxy generation is all working -- next up the actual market and trading simulation.

BruceMcF · Post by **BruceMcF** » Sat Oct 10, 2020 3:07 pm

On 9/30/2020 at 10:07 PM, desertfish said:

Maybe the subroutine calling convention can be improved. The return value is now always processed via the evaluation stack which is not optimal and subroutines cannot be reentrant because they have static allocation of their parameters rather than a new stack frame for each call , but I don't consider the latter a real problem on 6502 machines. Just write a loop instead of recursion.

Though not ever use for a reentrant function can be handled by a loop.

However, if there is a specific "reentrant" call option, it could push the current parameters onto a frame stack and pop them when done, so calls that don't need re-entrance don't have to pay the overhead.

geek504 · Post by **geek504** » Sat Oct 10, 2020 8:31 pm

22 hours ago, desertfish said:

I don't have a preference for one or the other simply due to the fact that I am not knowledgeable enough about parser generators to know the differences ?

How would Prog8's grammar look like in a LALR 1 definition? (as compared to the current prog8.g4 spec) Why would it be preferable?

I've chosen Antlr because that looked to be a high performance, well supported, and broadly used parser generator for Java (and thus, Kotlin). I'm open for alternative better solutions though. The parser is in its own project, so you could make another one next to it. The biggest effort would be the mapping into the Ast nodes that the rest of the compiler uses. And perhaps dealing with parser errors.

LR produces a much more complex state-machine for the parser but is easier to write grammars because we don't have to worry about left recursion and left factoring. LR is also a faster parser than LL but if you were to write the parser yourself, LL parser is preferable and much easier to write, but who does that nowadays?

While there are a few LR parsers for Java, I could not find something specifically for Kotlin, and I didn't spend too much time trying to figure out the Java/Kotlin integration. SINCE your grammar is already done I wouldn't change it and ANTLR is a fine product.

Quote

LL or LR?

This question has already been answered much better by someone else, so I'm just quoting his news message in full here:

I hope this doesn't start a war...

First - - Frank, if you see this, don't shoot me. (My boss is Frank DeRemer, the creator of LALR parsing...)

(I borrowed this summary from Fischer&LeBlanc's "Crafting a Compiler")

Simplicity - - LL

Generality - - LALR

Actions - - LL

Error repair - - LL

Table sizes - - LL

Parsing speed - - comparable (me: and tool-dependent)

Simplicity - - LL wins

==========

The workings of an LL parser are much simpler. And, if you have to

debug a parser, looking at a recursive-descent parser (a common way to

program an LL parser) is much simpler than the tables of a LALR parser.

Generality - - LALR wins

==========

For ease of specification, LALR wins hands down. The big

difference here between LL and (LA)LR is that in an LL grammar you must

left-factor rules and remove left recursion.

Left factoring is necessary because LL parsing requires selecting an

alternative based on a fixed number of input tokens.

Left recursion is problematic because a lookahead token of a rule is

always in the lookahead token on that same rule. (Everything in set A

is in set A...) This causes the rule to recurse forever and ever and

ever and ever...

To see ways to convert LALR grammars to LL grammars, take a look at my

page on it:

http://www.jguru.com/thetick/articles/lalrtoll.html

Many languages already have LALR grammars available, so you'd have to

translate. If the language _doesn't_ have a grammar available, then I'd

say it's not really any harder to write a LL grammar from scratch. (You

just have to be in the right "LL" mindset, which usually involves

watching 8 hours of Dr. Who before writing the grammar... I actually

prefer LL if you didn't know...)

Actions - - LL wins

=======

In an LL parser you can place actions anywhere you want without

introducing a conflict

Error repair - - LL wins

============

LL parsers have much better context information (they are top-down

parsers) and therefore can help much more in repairing an error, not to

mention reporting errors.

Table sizes - - LL

===========

Assuming you write a table-driven LL parser, its tables are nearly half

the size. (To be fair, there are ways to optimize LALR tables to make

them smaller, so I think this one washes...)

Parsing speed - comparable (me: and tool-dependent)

--Scott Stanchfield in article <33C1BDB9.FC6D86D3@scruz.net> on comp.lang.java.softwaretools Mon, 07 Jul 1997.

Here are two BASIC grammars to compare, "jvm" is from ANTLR's repository.

c64_basic_bnf.txt jvmBasic.g4

desertfish · Post by **desertfish** » Sat Oct 10, 2020 10:53 pm

2 hours ago, geek504 said:

I could not find something specifically for Kotlin, and I didn't spend too much time trying to figure out the Java/Kotlin integration.

While there seems to be somewhat of a Kotlin target for Antlr, I went with the default Java target and mapped that to Kotlin classes myself. The translation of Antlr's generated AST classes into my own Kotlin AST classes is done mostly via the extension methods principle taken from https://tomassetti.me/parse-tree-abstract-syntax-tree/

In the end, I need my own custom AST Node classes anyway, so I *think* the current solution is adequate and having a native kotlin output from the parser generator doesn't give much extra benefits.

desertfish · Post by **desertfish** » Sun Oct 11, 2020 8:16 pm

I've just released Prog8 4.5 with a bunch of improvements and bugfixes.

This version is required to compile the TextElite game correctly. Yeah version 1.0 of my text-elite space trader sim conversion is done! It can be found in the Download section under Games! ?

@SerErris all planets, their descriptions and properties, and trade markets are procedurally generated like in the original

geek504 · Post by **geek504** » Wed Oct 14, 2020 10:33 pm

On 10/9/2020 at 6:57 PM, geek504 said:

After checking out most lexers/parsers it seems to me that Lark is the sweetest thing since apple pie!

So sorry @desertfish for my being severely side-tracked on helping you out with Prog8!

I just had to explore Lark a little bit... which ended up being a lot... I just wrote an Integer BASIC (aka Woz's Game BASIC) parser using Python and Lark in a single file with only 160 lines (36 lines for the BASIC program and 111 lines for the grammar) and it parses without error Woz's BASIC Breakout game! Beautiful syntax tree automatically generated... dare I go on and actually write the 65C02 assembly compiler for it?

LOL! I'm so tempted to do it and check the benchmark against Prog8 et al. ?

desertfish · Post by **desertfish** » Wed Oct 14, 2020 10:49 pm

No worries! There's nobody waiting for us ?

I'm quite curious how that would turn out to be. !

desertfish · Post by **desertfish** » Tue Oct 20, 2020 8:07 pm

I've just released Prog8 4.6 with again a bunch of improvements and bugfixes. This version is required to compile the latest TextElite game correctly, as it provides the new diskio library module that takes care of loading and saving data.

desertfish · Post by **desertfish** » Tue Nov 10, 2020 10:00 pm

I've just released Prog8 5.0

I decided to bump to a new major version because this one fundamentally changed the way arguments are passed to subroutines, and how return values are returned. It no longer uses the software-eval-stack for that. This usually results in much smaller code size for programs calling a lot of subroutines (with arguments).

Also the obligatory bugfixes ofcourse.