New productivity upload: File based assembler

desertfish
Posts: 1098
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands


Post by desertfish »










File-based assembler. Requires r39 or newer.

Source code and a list of features are here: https://github.com/irmen/cx16assem

The instructions are fairly self-explanatory; a simple manual will come later.



 






 
Stefan
Posts: 456
Joined: Thu Aug 20, 2020 8:59 am


Post by Stefan »


Nice progress @desertfish!

It mostly worked as expected when I wrote my own simple "hello world".

I also put the string at the end of the code as in your own hello world test. I tried to load each character in the loop with lda message,x and lda message,y, but that failed. Apparently the assembler did not recognize a label defined later in the code. It worked fine when I changed "message" to a fixed hexadecimal address.

If 32-character symbols turn out to be too heavy, you could always do what's been done in other languages, for instance Forth.


  • Store only the first few characters of a symbol in the symbol table


  • Also store some other metadata in the symbol table, for example the length and/or a checksum, thereby minimizing false duplicates


  • This could save space and speed up assembly
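
A minimal Python sketch of the idea in the list above, not code taken from cx16assem: each symbol is reduced to a short key made of its first few characters, its length, and a one-byte checksum. The function name `symbol_key`, the prefix length of 4, and the additive checksum are all assumptions chosen for illustration.

```python
def symbol_key(name: str, prefix_len: int = 4) -> tuple:
    """Reduce a symbol name to (prefix, length, 8-bit checksum)."""
    data = name.encode("ascii")
    checksum = sum(data) & 0xFF  # simple additive checksum (an assumption)
    return (name[:prefix_len], len(data), checksum)


# With prefix and length alone these two names would share one entry;
# the checksum tells them apart.
print(symbol_key("name1"))
print(symbol_key("name2"))

# An additive checksum ignores character order, so false matches
# are reduced but not eliminated.
print(symbol_key("name12") == symbol_key("name21"))
```

The key takes a fixed, small amount of space per symbol regardless of the name's length, at the cost of occasional false matches such as the last pair.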


desertfish
Posts: 1098
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands


Post by desertfish »


Hi, thanks for trying it out. Can you post the failing program? It should be able to deal with symbols that are not yet defined.

Stefan
Posts: 456
Joined: Thu Aug 20, 2020 8:59 am


Post by Stefan »



* = $8000

CHROUT = $FFD2

    LDY #0
LOOP:
    LDA MSG,Y
    BEQ EXIT
    JSR CHROUT
    INY
    BRA LOOP
EXIT:
    RTS

MSG:
    .STR "HELLO, WORLD"
    .BYTE 0


desertfish
Posts: 1098
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands


Post by desertfish »


I've fixed the issues in a new upload. It turned out it wasn't correctly handling absolute-indexed addressing with symbols.
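
For readers wondering how a label such as MSG can be used before it is defined, here is a rough two-pass sketch in Python. It is not how cx16assem is implemented; the tuple format, the mode names, and the `assemble` function are hypothetical, the opcode table covers only the instructions in the hello-world listing, and the string is assumed to be emitted byte for byte.

```python
# Opcodes for just the instructions used in the hello-world listing (65C02).
OPCODES = {
    ("LDY", "imm"): 0xA0, ("LDA", "abs,y"): 0xB9, ("BEQ", "rel"): 0xF0,
    ("JSR", "abs"): 0x20, ("INY", "imp"): 0xC8, ("BRA", "rel"): 0x80,
    ("RTS", "imp"): 0x60,
}
SIZES = {"imp": 1, "imm": 2, "rel": 2, "abs": 3, "abs,y": 3}

# (label, mnemonic, addressing mode, operand); this format is hypothetical.
PROGRAM = [
    (None, "LDY", "imm", 0),
    ("LOOP", "LDA", "abs,y", "MSG"),   # MSG is defined further down
    (None, "BEQ", "rel", "EXIT"),
    (None, "JSR", "abs", "CHROUT"),
    (None, "INY", "imp", None),
    (None, "BRA", "rel", "LOOP"),
    ("EXIT", "RTS", "imp", None),
    ("MSG", ".bytes", None, b"HELLO, WORLD\x00"),
]


def assemble(program, origin, predefined):
    symbols = dict(predefined)
    # Pass 1: compute sizes only and record the address of every label.
    pc = origin
    for label, op, mode, arg in program:
        if label:
            symbols[label] = pc
        pc += len(arg) if op == ".bytes" else SIZES[mode]
    # Pass 2: every label is known now, so forward references resolve.
    out = bytearray()
    for label, op, mode, arg in program:
        if op == ".bytes":
            out += arg
            continue
        pc = origin + len(out)
        value = symbols[arg] if isinstance(arg, str) else arg
        out.append(OPCODES[(op, mode)])
        if mode == "rel":
            offset = value - (pc + 2)  # relative to the next instruction
            if not -128 <= offset <= 127:
                raise ValueError(f"branch to {arg} out of range")
            out.append(offset & 0xFF)
        elif mode == "imm":
            out.append(value & 0xFF)
        elif mode in ("abs", "abs,y"):
            out += bytes([value & 0xFF, value >> 8])  # little-endian address
    return bytes(out), symbols


code, symbols = assemble(PROGRAM, 0x8000, {"CHROUT": 0xFFD2})
print(hex(symbols["MSG"]))  # address of the forward-referenced label
print(code[2:5].hex())      # encoding of LDA MSG,Y
```

Under these assumptions MSG ends up at $800E, so LDA MSG,Y is encoded as B9 0E 80. The first pass only works because the sketch treats every symbolic operand as a 16-bit absolute address, so the instruction size is known before the label is.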

About the symbol table: how does Forth deal with collisions? 

prefix+length is way too simple ("name1" and "name2" will be the same entry), and adding a "checksum" will only work to a certain extent, if you mean "hash", I suppose. Still, there is no guarantee that we don't have collisions.

Stefan
Posts: 456
Joined: Thu Aug 20, 2020 8:59 am


Post by Stefan »


According to the book Starting Forth, the Forth-79 standard allowed symbol names of up to 31 characters. But some variants of Forth only stored three characters + the length of the symbol. "name1" and "name2" would then be the same, which is not ideal.

I don't suggest that you copy that approach as is. It's more an inspiration.

There's always the risk of collisions. Checksums/hashes are probably better than symbol length. Only storing the first three characters is probably too few.

desertfish
Posts: 1098
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands


Post by desertfish »


Well, a correct hashtable implementation will deal with collisions (collision lists or some other solution).

Not dealing with possible collisions will result in bugs in your resulting machine code that are extremely hard to track down. Without any warning, certain symbols will suddenly pick up the value of others... I can't imagine that's acceptable in Forth either. I assume it uses a trick to deal with this as well.
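
To illustrate what "collision lists" means here, a small Python sketch of a hash table with separate chaining, where each bucket keeps the full name so two symbols with the same hash stay distinct. The class name `ChainedSymbolTable`, the bucket count, and the deliberately weak hash are assumptions for the example, not the assembler's actual data structure.

```python
class ChainedSymbolTable:
    """Hash table that stores the full name in every bucket entry."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, name: str) -> list:
        # Deliberately weak hash (sum of character codes) so that
        # collisions are easy to provoke in this example.
        return self.buckets[sum(name.encode("ascii")) % len(self.buckets)]

    def define(self, name: str, value: int) -> None:
        bucket = self._bucket(name)
        for i, (existing, _) in enumerate(bucket):
            if existing == name:
                bucket[i] = (name, value)  # same symbol: update in place
                return
        bucket.append((name, value))       # new symbol: extend the chain

    def lookup(self, name: str) -> int:
        for existing, value in self._bucket(name):
            if existing == name:  # full-name comparison resolves collisions
                return value
        raise KeyError(name)


table = ChainedSymbolTable()
table.define("ab", 0x1000)
table.define("ba", 0x2000)  # same hash as "ab", different name
print(hex(table.lookup("ab")), hex(table.lookup("ba")))
```

The price of correctness is that the whole name has to be stored, which is exactly the memory cost the truncation idea tries to avoid.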

BruceMcF
Posts: 1336
Joined: Fri Jul 03, 2020 4:27 am


Post by BruceMcF »



6 hours ago, Stefan said:

According to the book Starting Forth, the Forth-79 standard allowed symbol names of up to 31 characters. But some variants of Forth only stored three characters + the length of the symbol. "name1" and "name2" would then be the same, which is not ideal.

I don't suggest that you copy that approach as is. It's more an inspiration.

There's always the risk of collisions. Checksums/hashes are probably better than symbol length. Only storing the first three characters is probably too few.



In the Forths that stored three characters (e.g., the original FIG Forths), collisions were handled by keeping the first definition in the dictionary: "good grief, keep track of what you are doing, you idjit". That is similar to CBM BASIC variable names, except that names of different lengths with the same first three letters were also distinct. In ANS Forth and its successors, some implementations use hashing to speed up dictionary searches, but the entire name is stored.

Stefan
Posts: 456
Joined: Thu Aug 20, 2020 8:59 am


Post by Stefan »



5 hours ago, desertfish said:

Well, a correct hashtable implementation will deal with collisions (collision lists or some other solution).

Not dealing with possible collisions will result in bugs in your resulting machine code that are extremely hard to track down. Without any warning, certain symbols will suddenly pick up the value of others... I can't imagine that's acceptable in Forth either. I assume it uses a trick to deal with this as well.



Unless you plan to allow redefined symbols, the assembler could throw an error when it encounters a duplicate definition, whether an actual duplicate or a false match. 
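
A short Python sketch of that suggestion, assuming a table keyed only on the first three characters plus the length. The names `short_key`, `TruncatingSymbolTable`, and `DuplicateSymbolError` are made up for the example. Because the full name is not stored, the table cannot tell a real duplicate from a false match, so it rejects both.

```python
class DuplicateSymbolError(Exception):
    """Raised when a new symbol maps to a key that is already taken."""


def short_key(name: str, prefix_len: int = 3) -> tuple:
    # Only the first few characters and the length are kept.
    return (name[:prefix_len], len(name))


class TruncatingSymbolTable:
    def __init__(self):
        self._values = {}

    def define(self, name: str, value: int) -> None:
        key = short_key(name)
        if key in self._values:
            # Real duplicate or false match: indistinguishable, so refuse.
            raise DuplicateSymbolError(
                f"symbol {name!r} clashes with an earlier definition")
        self._values[key] = value

    def lookup(self, name: str) -> int:
        return self._values[short_key(name)]


table = TruncatingSymbolTable()
table.define("name1", 0x8000)
try:
    table.define("name2", 0x8010)  # same prefix and length as "name1"
except DuplicateSymbolError as err:
    print("error:", err)
```

The programmer then has to pick a different name, but no symbol can silently take on the value of another.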

desertfish
Posts: 1098
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands


Post by desertfish »


That is an interesting idea. Sometimes "good enough" is good enough and we can think of a different symbol name to satisfy the assembler.
