Page 1 of 1

A cheap list of bank key:value pairs (cc65)

Posted: Tue Apr 05, 2022 6:25 pm
by rje

Here's the fuzzy problem:  I'm slowly writing an interpreter for the X16, and I therefore need a symbol table.  HOWEVER, I can't really guarantee that I'll have much system RAM for neat data structures.  So I'm looking at the banked RAM to store my hashtable.  ALL of it if possible.

THAT has implications ... for example I don't think malloc() and free() are really going to work in banked RAM.

SO:

***

I'm trying a "cheap" banked string map implementation using a list.  The idea is to have a small, slow, symbol table (less than 1000 entries, spanning several RAM banks) with a minimum of management code (and a minimum of system RAM).  It doesn't even hash.

I understand that it will be slow and linear.  That's OK.

My thoughts go like this:

RAM bank memory can be chopped up into 256 segments like this:


typedef struct {

 


      unsigned char header;



      char data[31];

 


} BANK_LIST_SEGMENT;

 


BANK_LIST_SEGMENT* bank_list[256]; // = 8192 bytes = 1 bank.



 


Key:Value entries are segment-aligned:  the shortest mapping is 32 bytes, and the increment is 32 bytes, to a max of 288 bytes.  I'll try ways to optimize this -- for example, reducing the segment length to 24 bytes. Or scrapping it for dynamic memory management, since this is a solved problem... especially if code size doesn't really change....

I can run through the list and check the first byte of each segment.

If it's a zero, then that hunk of 32 bytes is unallocated.

If it's a printable PETSCII, then it's allocated and part of the current pair.

If it's 1 thru 31, then it's a new pair, and the key length is given (1-31 characters), so I know where the value string starts. (Or I can limit it to 30 chars, with a "31" indicating the start of the value string of the current pair for convenience.)

Keeping the key length known AND under 32 characters lets me do strncmp's safely (?).  I may have a problem with values if they don't end in zeros -- so I have to make sure then end in zero.  This may have length implications, especially if the zero intersects with byte zero of the next segment.  That means I'll have to store the length of the value string, which OK FINE.

 

 

Maybe this has already been done in a better way already.  I know hashes have been done-to-death in various ways.  Thoughts?


A cheap list of bank key:value pairs (cc65)

Posted: Tue Apr 05, 2022 6:43 pm
by SlithyMatt

Is there any particular reason why you aren't hashing the key strings? You don't have to do something as expensive as MD5 to make it useful for building an associative array.


A cheap list of bank key:value pairs (cc65)

Posted: Tue Apr 05, 2022 7:07 pm
by rje

I'm just being lazy.  I've got two other algorithms -- one stolen from Perl -- that create bona fide linked and hashed ... uh, hashes.  I just get the feeling that I can do this in a more primitive way in order to leverage banked RAM and totally minimize system RAM.  Maybe - ha!

 

 


A cheap list of bank key:value pairs (cc65)

Posted: Tue Apr 05, 2022 7:27 pm
by rje

I had thought about a true hashed structure, with pointer-like things.

A "pointer" could be broken down into two parts like this:


typedef struct {

 


int offset : 13; // inside bank (13 bits = 8192 bytes)



int bank : 3; // eight banks, max 64K RAM.

 


} BANKED_HASHMAP_POINTER;



 


So then, navigating through a BANKED linked list of hash entries is a matter of (a) setting the bank and then (b) heading to the offset (0xa000 + offset). 

Then two more bytes to hold a cheap hash value.

Four bytes overhead per pair isn't bad.  A thousand entries = half a bank is devoted to overhead.  Hmm well that doesn't sound great but meh.

The memory management code might be a little more involved.


A cheap list of bank key:value pairs (cc65)

Posted: Tue Apr 05, 2022 10:28 pm
by BruceMcF


On 4/5/2022 at 2:25 PM, rje said:




I'm trying a "cheap" banked string map implementation using a list.  The idea is to have a small, slow, symbol table (less than 1000 entries, spanning several RAM banks) with a minimum of management code (and a minimum of system RAM).  It doesn't even hash.



If less than 1000 entries, with a max of 30 character long keys, you can have linked lists of symbols with the same length (Bank & Segment in Bank pointer) with the first block having the first two segments dedicated to the heads of each list. That leaves four free bytes, so the first pair of bytes in the bank is the pointer to the first free segment and the second pair of bytes is the pointer to the last available segment. You know from the length of the symbol you are looking up which linked list to use.

Note that that if you have a page buffer for transitory things, you could have a byte length for a value field of up to 224 bytes, and even if it spans HighRAM segment boundaries you could still fetch the matching key and value into the buffer. The redundant two byte length field in the key could be used to store the length of the key and the length of the value.


A cheap list of bank key:value pairs (cc65)

Posted: Tue Apr 05, 2022 11:16 pm
by rje


On 4/5/2022 at 5:28 PM, BruceMcF said:




If less than 1000 entries, with a max of 30 character long keys, you can have linked lists of symbols with the same length (Bank & Segment in Bank pointer) with the first block having the first two segments dedicated to the heads of each list. That leaves four free bytes, so the first pair of bytes in the bank is the pointer to the first free segment and the second pair of bytes is the pointer to the last available segment. You know from the length of the symbol you are looking up which linked list to use.



Sounds a lot like how I managed virtual Commodore disk images.  Thanks for the good suggestion!