Page 1 of 2

KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Thu May 11, 2023 6:10 am
by svenvandevelde
So as you've seen in the previous articles, developing in C using pointers have the risk to produce code that is not well tuned for the 65C02 processor. The 65C02 is an 8 bit processor. Where possible, try to write code that produces code using absolute indexed addressing. To achieve this, use structures applying the method Structures of Arrays (SoA). Stick to the byte size boundaries.

So in this article we will show a concrete example of C code that produces a double linked list capability for the CX16, following the above design principles. You can find the code and executable attached, but let's walk through the code and with a few screenshots of debug output we can have a look how this works.

The double linked list has a concrete use case in my equinoxe game. I needed a capability to administer various lists for the various types of objects flying around on the screen, this frame per frame. And this frame aspect requires me to squeeze every cycle out of the CPU where I can. The equinoxe game has various types of objects, like a player space ship and later player controlled objects, various enemy ships, scenery objects, ground objects, bullets, explosions, engine exhausts, etc ... Each of these objects have code that models the moving and AI behaviour and have data properties attached. The generated code to administer all this logic, is deployed in banked memory, so each type of object code is located at a different RAM bank between A000 and BFFF on the CX16. Using kernal routine JSRFAR I'm calling for each type of object an "update" routine that calculates for each frame the new positions and actions of the object type.

By using the RAM banking, I have lot's of code space to model the behaviour of these objects and i'm not bound to the $0800 till $9F00 address range in low memory. In fact, I started my game like this and quickly realized that the low memory address space is rather limited. Unlike the C64 and C128, the upper memory address spaces cannot be "deactivated", especially the ROMs between $C000 and $FFFF. There is no shadow RAM underneath ROM address space, so hence RAM banking was the solution! Together with Jesper we developed a nice RAM banking capability in kickc, that allows you to compile your code banked and deploy it banked, but that explanation will come later how to use it.

The consequence of the RAM banking is that FAR calls need to be made to administer the positions of each object type. Imagine I have 1 player ship, but 32 enemies and 25 bullets in the air. The equinoxe game logic calls a dedicated update routine first for the player ship, then it calls the update routine for the enemy ships and then it calls the update routine for the bullets using JSRFAR kernal routine. So, instead of doing for each object a JSRFAR routine, i'm doing for each object type a JSRFAR call that then in encapsulation mode performs its logic. This saves me a lot of CPU expensive JSRFAR calls, however, that requires some mechanism to quickly be able to loop through the entries that describe the object types, and ONLY those object types.

Simply said, the CX16 has 128 sprites, of which 127 are usable for equinoxe (the sprite 0 is the mouse). Imagine I would use a simple array (been there....), that would be used by each update routine for each object type, which would simply loop through the array, validating if the object underneath is of the same type, and then perform the update logic. That means, that for players i would need to loop 127 times, for enemy ships i would need to loop 127 times and for bullets i would need to loop 127 times. Add to that equation object types for ground installations, explostions, scenary and the like, and you see quickly that the looping would cause a significant overhead to the game logic. It really counts, loop overheads are killing the frame per frame performance of the game engine. So it is important to have a design that minimizes such loop overheads!

I reworked the engine to use a simple double linked lists (plural), mapped on a SoA, with multiple root positions for each object type, having it's own linked list within the SoA. That allows me for each object type to simply iterate through the list of each object type, avoiding the problematic loop overhead!

Remember, we declare a structure that contains multiple arrays of maximum one byte length! We use where possible for each field a data type that is one byte large! This will result in absolute indexed addressing and will produce minimal code size and fast and functional code!

So below you find an example how such a list can be designed. I'm not showing the actual equinoxe game engine solution because that would really bring us too far. Let's first talk about the setup of the data structures and types.

We declare ELEMENTS, which denote a maximum of 64 elements within the SoA. Note that I've decided to use #defines instead of const as the const keyword is not the same as #define in C.

Code: Select all

#define ELEMENTS 64
We have 2 different object types, in this example we use ALPHA and GAMMA. This could also be modeled as an enumeration, but I've kept it simple. The number behind the object type name, depicts the index of the field that holds the root position within the linked list.

Code: Select all

#define ALPHA 0
#define GAMMA 1
We define a couple of data types, which model the one byte long index of each entry within the SoA, and a data type type, which models the object type ALPHA or GAMMA.

Code: Select all

typedef unsigned char data_index_t;
typedef unsigned char data_type_t;
So now that we have all constants and data types defined, below you find the actual SoA for the example. We declare a field called used, which models for each element within the SoA whether the element is in use or not. There is no malloc or free (more on that in sequent articles however ...), so we just define a SoA and use that memory space for the allocation of the object data! The field symbol will contain a one byte long char that we use for the demo to show the different allocated elements within the SoA. The field next and prev model the index to the next or previous element. The last element of next and prev would be NULL. We have multiple root types, which are administered by the field root. Each root type is indexed by a value of type data_type_t. The last field, the index, is used to add new elements to the linked list. It simply holds the last position where a new element was allocated and is used to loop through the SoA search for free (non-used) elements.

Code: Select all

typedef struct {
    char used[ELEMENTS];
    char symbol[ELEMENTS];
    char next[ELEMENTS];
    char prev[ELEMENTS];
    char root[4];
    char index;
} data_t;

Next, we allocate the memory space for the SoA, so we define a variable called data, which is globally defined in this example. Note that for absolute indexed addressing to be generated within the list manipulation functions, we have built the logic to use this one global variable.

Code: Select all

data_t data;
Before we jump into the explanation of each code, let us have a look at which functions have been designed to add and remove elements from the list ...

The function data_add will simply allocate a new element in the SoA of the given object type ALPHA or GAMMA, and will assign the char to the symbol field of the element allocated. It will adjust the next and prev fields accordingly and will adjust the root entries within the SoA if needed. Each new element will cause the root to be updated for the given object type!

The function data_remove will delete an element from the SoA for the given object type. This function will put symbol to a '-', adjust the next and prev indexes accordingly and adjusts the root position when needed for the given object type ALPHA or GAMMA.

Code: Select all

void main() {

    clrscr();
    cx16_k_screen_set_charset(3, (char*)0);

    char a = data_add(ALPHA, 'A');
    char b = data_add(ALPHA, 'B');
    char x = data_add(GAMMA, 'X');
    char y = data_add(GAMMA, 'Y');
    char c = data_add(ALPHA, 'C');
    char z = data_add(GAMMA, 'Z');

    data_remove(ALPHA, b);
    data_remove(ALPHA, c);
    data_remove(ALPHA, a);
....
}
Let's have a look at the add function, so you can see how easy it is to build such SoA driven lists...
Each new element is added at a free index within the SoA. The field used outlines whether a field us used or not. There is however one caveat! The value NULL! So are not using element 0 in the SoA, as indexes pointing to 0 should depict the end of the list. I've kept the NULL value as an actual value, in order to have code performance. 0 allows assembler instructions like BNE and BEQ to be used and avoids CMP instructions to be generated. And yes, this results in a little unused data overhead ...

So the function data_add accepts a data_type_t value ALPHA or GAMMA, which contains the index to the root field to be used to administer the root of each list within the SoA. The symbol will be assigned to the newly allocated element. The function returns the index pointing to the newly allocated element within the SoA.

data_index_t data_add(data_type_t type, char symbol) {

We iterate through the SoA fields, ensuring that we don't trespass the 64 index boundary!
The logic uses data.index to search for a free element, skipping the first element 0.

Code: Select all

    data.index %= ELEMENTS;

    while (!data.index || data.used[data.index]) {
        data.index = (data.index + 1) % ELEMENTS;
    }
    unsigned char i = data.index;
Once a new free element position has been found within the SoA, we adjust the indexes to build the double linked list. We always add the new element to the root position of the object type, and then adjust the root. Note that we can only set the prev index if there is a root. The value of root is NULL, if there is no element yet allocated within the list. But once a new element has been added, root will be updated with the index pointing to the newly added element within the SoA. This method allows to build "sub" linked lists, for each element type within the 64 elements available within the SoA, and does not require a great deal of code to do such!

Code: Select all

    data_index_t root = data.root[type];
    data.next[index] = root;
    data.prev[index] = NULL;
    if (root)
        data.prev[root] = index;
    data.root[type] = index;

Once we have the relevant indexes updated within the SoA, we update the symbol and we indicate the element is used.

Code: Select all

    data.used[index] = 1;
    data.symbol[index] = symbol;

     return index;
}
Now let us look at removing elements, using the function data_remove.
It will remove an element of the given object type ALPHA or GAMMA, by the given index.

Code: Select all

void data_remove(data_type_t type, data_index_t index) {
If the data element in the SoA is used, we flag it as not used.

Code: Select all

    if (data.used[index]) {
        data.used[index] = 0;
If the next element of root is NULL, we know this is the last element (root) to be removed!
So we set the root of the object type to NULL!

Code: Select all

        data_index_t root = data.root[type];
        if(!data.next[root]) {
            data.root[type] = NULL;
Else, we reposition the next and prev pointers (like any double linked list logic would do).
If we remove the root node, we must adjust the root field index in the SoA to the next index of the element being removed! This next index can be NULL, but that does not matter.

Code: Select all

        } else {
            data_index_t next = data.next[index];
            data_index_t prev = data.prev[index];
            if (next) {
                data.prev[next] = prev;
            }
            if (prev) {
                data.next[prev] = next;
            }
            if (root == index) {
                data.root[type] = next;
            }
        }
Once we have adjusted the indices of the next and prev elements, we clear the value and next and prev index of the removed element.

Code: Select all

        data.next[index] = NULL;
        data.prev[index] = NULL;
        data.symbol[index] = '-';
    }
}
The logic has no decent validations implemented for erroneous situations for performance reasons, this for both the add and remove functions.

The attached PRG contains a working logic, which would provide you with a view like this:

Image

The left side shows the situation before the list action and the right side the situation after the action has been processed. Making such small example programs allows me to quickly build a working logic that can be cut/pasted into my equinoxe game, and test it with all the use cases required.

Please have a look at the scenario unfolding as we run the attached prg file on the cx16.

We add 'A' to the list, element of ALPHA. We can see in the right columns that the indexes next and prev of A are NULL. Root ALPHA points to index 1. Also note that index 0 is not used in the list.

Image

We add 'B' to the list, element of ALPHA. Now you can see on the right columns that B has been added as index 2, and the next index of B points to 1, while the prev index of A points to 2. The prev index of B is NULL and the next index of A is NULL. The root ALPHA is set to 2, so the newly added element now becomes root!

Image

We add 'X' to the list, element of GAMMA. X has been added as index 3, and the next and prev indexes of X point to NULL. The root GAMMA is set to 3, and is the first element of GAMMA in the list.

Image


We add 'Y' to the list, element of GAMMA. Y has been added as index 4, and the next index of Y points to 3, while the prev index of X points to 4. The prev index of Y is NULL and the next index of X is NULL. The root GAMMA is set to 4, so the newly added element now becomes root! The ALPHA list is untouched!

Image

We add 'C' to the list, element of ALPHA. C has been added as index 5, and the next index of C points to 2 (B), while the prev index of B points to 5. The A element is untouched. The prev index of C is NULL and the next index of A is NULL. So now we have a chain of 3 elements C, B, A, following the next indexes. The root ALPHA is set to 5, so the newly added element now becomes root!

Image

We add 'Z' to the list, element of GAMMA. Z has been added as index 6, and the next index of Z points to 4 (Y), while the prev index of Y points to 6. The X element is untouched. The prev index of Z is NULL and the next index of X is NULL. The root GAMMA is set to 6, so the newly added element now becomes root! So now we have a chain of 3 elements Z, Y, X, following the next indexes of the GAMMA list.

Image

We remove 'B' from the list, element of ALPHA. The index 2 becomes free, the next index of C is adjusted to 1 (A), while the prev index of A is adjusted to 5 (C). For display reasons the symbol of B becomes '-'. Note that the root of ALPHA still remains 5, but the ALPHA list now has only 2 elements.

Image

Now we remove 'C' from the list, which is the root element of ALPHA. The index 5 (C) becomes free, the next and prev indexes of A are adjusted to NULL, as it is the only element left in the list. For display reasons the symbol of C becomes '-'. The root of ALPHA now becomes index 1.

Image

Now we remove 'A' from the list, which is the root element of ALPHA. The index 1 (A) becomes free. The next and prev indexes of A are adjusted to NULL. For display reasons the symbol of A becomes '-'. The root of ALPHA now becomes NULL, as there are no more elements in the list.

Image

The program and the C-code contains further use cases, but as you can see, this is how to build a fast double linked list using a simple structure, and no usage of pointers, which results in fast assembler code that uses absolute indexed addressing. This is the assembler for the data_add function:

Code: Select all

data_add: {
    .label data_add__5 = $16
    .label data_add__6 = $16
    .label i = $3b
    .label r = $28
    .label kbhit1_return = $14
    .label symbol = $2b
    .label return = $3b
    .label return_1 = $41
    .label return_2 = $40
    .label return_3 = $44
    .label return_4 = $43
    .label return_5 = $3f
    .label return_6 = $42
    .label type = $35
    // data_debug(0)
    // [187] call data_debug
    // [286] phi from data_add to data_debug [phi:data_add->data_debug]
    // [286] phi data_debug::c#11 = 0 [phi:data_add->data_debug#0] -- call_phi_near 
    lda #0
    sta.z data_debug.c
    jsr data_debug
    // data_add::@7
    // data.index %= ELEMENTS
    // [188] *((char *)&data+$104) = *((char *)&data+$104) & $40-1 -- _deref_pbuc1=_deref_pbuc1_band_vbuc2 
    lda #$40-1
    and data+$104
    sta data+$104
    // data_add::@2
  __b2:
    // while (!data.index || data.used[data.index])
    // [189] if(0==*((char *)&data+$104)) goto data_add::@3 -- 0_eq__deref_pbuc1_then_la1 
    lda data+$104
    beq __b3
    // data_add::@9
    // [190] if(0!=((char *)&data)[*((char *)&data+$104)]) goto data_add::@3 -- 0_neq_pbuc1_derefidx_(_deref_pbuc2)_then_la1 
    tay
    lda data,y
    cmp #0
    bne __b3
    // data_add::@4
    // unsigned char i = data.index
    // [191] data_add::i#0 = *((char *)&data+$104) -- vbuz1=_deref_pbuc1 
    tya
    sta.z i
    // data_index_t r = data.root[type]
    // [192] data_add::r#0 = ((char *)&data+$100)[data_add::type#15] -- vbuz1=pbuc1_derefidx_vbuz2 
    // p.r = 3 => f[3].n = 2, f[2].n = 1, f[1].n = -
    //         => f[3].p = -, f[2].p = 3, f[1].p = 2
    // Add 4
    // p.r = 4 => f[4].n = 3, f[3].n = 2, f[2].n = 1, f[1].n = -
    // p.r = 4 => f[4].p = -, f[3].p = 4, f[2].p = 3, f[1].p = 2
    ldy.z type
    lda data+$100,y
    sta.z r
    // data.next[i] = r
    // [193] ((char *)&data+$80)[data_add::i#0] = data_add::r#0 -- pbuc1_derefidx_vbuz1=vbuz2 
    ldy.z i
    sta data+$80,y
    // data.prev[i] = NULL
    // [194] ((char *)&data+$c0)[data_add::i#0] = 0 -- pbuc1_derefidx_vbuz1=vbuc2 
    lda #0
    sta data+$c0,y
    // if (r)
    // [195] if(0==data_add::r#0) goto data_add::@1 -- 0_eq_vbuz1_then_la1 
    lda.z r
    beq __b1
    // data_add::@5
    // data.prev[r] = i
    // [196] ((char *)&data+$c0)[data_add::r#0] = data_add::i#0 -- pbuc1_derefidx_vbuz1=vbuz2 
    tya
    ldy.z r
    sta data+$c0,y
    // data_add::@1
  __b1:
    // data.root[type] = i
    // [197] ((char *)&data+$100)[data_add::type#15] = data_add::i#0 -- pbuc1_derefidx_vbuz1=vbuz2 
    lda.z i
    ldy.z type
    sta data+$100,y
    // data.used[i] = 1
    // [198] ((char *)&data)[data_add::i#0] = 1 -- pbuc1_derefidx_vbuz1=vbuc2 
    lda #1
    ldy.z i
    sta data,y
    // data.symbol[i] = symbol
    // [199] ((char *)&data+$40)[data_add::i#0] = data_add::symbol#15 -- pbuc1_derefidx_vbuz1=vbuz2 
    lda.z symbol
    sta data+$40,y
    // data_debug(40)
    // [200] call data_debug
    // [286] phi from data_add::@1 to data_debug [phi:data_add::@1->data_debug]
    // [286] phi data_debug::c#11 = $28 [phi:data_add::@1->data_debug#0] -- call_phi_near 
    lda #$28
    sta.z data_debug.c
    jsr data_debug
    // [201] phi from data_add::@1 data_add::@6 to data_add::kbhit1 [phi:data_add::@1/data_add::@6->data_add::kbhit1]
    // data_add::kbhit1
  kbhit1:
    // data_add::kbhit1_cbm_k_clrchn1
    // asm
    // asm { jsrCBM_CLRCHN  }
    jsr CBM_CLRCHN
    // [203] phi from data_add::kbhit1_cbm_k_clrchn1 to data_add::kbhit1_@2 [phi:data_add::kbhit1_cbm_k_clrchn1->data_add::kbhit1_@2]
    // data_add::kbhit1_@2
    // cbm_k_getin()
    // [204] call cbm_k_getin -- call_phi_near 
    jsr cbm_k_getin
    // [205] cbm_k_getin::return#2 = cbm_k_getin::return#1
    // data_add::@8
    // [206] data_add::kbhit1_return#0 = cbm_k_getin::return#2
    // data_add::@6
    // while(!kbhit())
    // [207] if(0==data_add::kbhit1_return#0) goto data_add::kbhit1 -- 0_eq_vbuz1_then_la1 
    lda.z kbhit1_return
    beq kbhit1
    // data_add::@return
    // }
    // [208] return 
    rts
    // data_add::@3
  __b3:
    // data.index + 1
    // [209] data_add::$5 = *((char *)&data+$104) + 1 -- vbuz1=_deref_pbuc1_plus_1 
    lda data+$104
    inc
    sta.z data_add__5
    // (data.index + 1) % ELEMENTS
    // [210] data_add::$6 = data_add::$5 & $40-1 -- vbuz1=vbuz1_band_vbuc1 
    lda #$40-1
    and.z data_add__6
    sta.z data_add__6
    // data.index = (data.index + 1) % ELEMENTS
    // [211] *((char *)&data+$104) = data_add::$6 -- _deref_pbuc1=vbuz1 
    sta data+$104
    jmp __b2
}
Feel free to verbose in the comments if there are improvements possible to the code.

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Sat May 13, 2023 7:51 pm
by Ed Minchau
It's stuff like this that makes me wish we still had the Like option.

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Thu May 18, 2023 2:33 am
by ch4rl3sb
I did something similar, but rather than iterating through the SoA to find a free index to add a new element, I added deleted elements to a separate list of deleted items. You can just grab a new index instantly from this list. If it's empty, you grab the next index from 'free', (unregulated or uncounted) indexes. I sort it, slowly, one step each frame by checking for items in the deleted list that are out of order and moving them one step. That way, I'm usually grabbing a low index for a new item.
I keep a Total count of all items, ie. # of Ships, plus # of bullets, plus # of deleted items equals Total. if an item in the deleted list is the last, i.e. equal to Total, I release it to the 'free', indexes. (Also a 'slow' process, checking only one index each frame.)

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Tue May 30, 2023 7:29 pm
by svenvandevelde
Very interesting. Would you mind to elaborate a bit further on the technique u used? Maybe with an example?

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Tue May 30, 2023 11:09 pm
by StephenHorn
The studio I worked at once implemented a similar concept we called "object pools", in the sense that we broke out the traditional "next" and "prev" pointers into parallel arrays and changed the type from pointers to indices, allowing us to decouple the boilerplate for doubly-linked lists from the list elements and shave some memory while we were at it.

Of course, we had C++ and templates and were on modern platforms, so we had some advantages that helped make the code reusable. But I'd imagine something similar-ish could be achieved with macros in C.

Assuming you have a valid linked list, removing an element from C-based doubly-linked list normally looks like this:

Code: Select all

void remove(struct foo *elem, struct foo **head)
{
	if(elem == *head) {
		if(elem->next == *head) {
			*head = NULL;
		} else {
			*head = elem->next;
		}
	}
	elem->next->prev = elem->prev;
	elem->prev->next = elem->next;
	elem->next = NULL;
	elem->prev = NULL;
}
While adding to a double-linked list normally looks like this:

Code: Select all

void add(struct foo *elem, struct foo **head)
{
	if(*head == NULL) {
		*head = elem;
		elem->next = elem;
		elem->prev = elem;
	} else {
		elem->prev = *head->prev;
		elem->next = *head;
		*head->prev->next = elem;
		*head->prev = elem;
	}
}
If we have arrays of indices instead of pointers, we can just perform a substitution elem->next to next[elem], and similarly for the prev pointer. In this case, I'll use "-1" to indicate NULL, since a negative index is nonsensical/irrational:

Code: Select all

void remove(int elem, int *head)
{
	if(elem == *head) {
		if(next[elem] == *head) {
			*head = -1;
		} else {
			*head = next[elem];
		}
	}
	prev[next[elem]] = prev[elem];
	next[prev[elem]] = next[elem];
	next[elem] = -1;
	prev[elem] = -1;
}

void add(int elem, int *head)
{
	if(*head == -1) {
		*head = elem;
		next[elem] = elem;
		prev[elem] = elem;
	} else {
		prev[elem] = prev[*head];
		next[elem] = *head;
		next[prev[*head]] = elem;
		prev[*head] = elem;
	}
}
One nice part is that the identity/type of our elements has just disappeared from our implementation. As long as there is some array of those elements with the same number of elements as our "prev" and "next" index arrays, the type of those elements doesn't matter. In fact, your array of structs could even be a bunch of parallel arrays of primitives like with the SoA described by Sven. As long as there is the same number of elements in each array involved, this approach totally decouples the list management overhead from the list element type(s).

Further, if you can start with an empty list, and you know what elements are available to be added to it, you can build the list by just adding things at will. What elements are available, though?

Well, at program initialization, we know the entire possible array of elements are "unused". And our "add" function just clobbers the "next" and "prev" links for the added element, so we could can maintain a list of "available" elements by keeping a second head for the list of available elements, which is initialized and built with an initialization function:

Code: Select all

void init_list(int *used_head, int *avail_head, int size)
{
	*used_head = -1;
	*avail_head = -1;
	for(int i=0; i<size; ++i) {
		add(i, avail_head);
	}
}
Now, if we want to allocate an item from the list, we just remove it from the list at "avail_head" and add it to the list at "used_head":

Code: Select all

int alloc_index(int *used_head, int *avail_head)
{
	int index;	
	if(avail_head == -1) {
		return -1;
	}	
	index = avail_head;
	remove(index, avail_head);
	add(index, used_head);	
	return index;
}
And conversely, freeing an allocated item means removing it from "used_head" and adding it to "avail_head":

Code: Select all

void free_index(int index, int *used_head, int *avail_head)
{
	remove(index, used_head);
	add(index, avail_head);	
}
The only trick is that you have to take care never to double-free the same index. Ideally, don't do this. But if you have more system memory than faith, this is solvable with an array to indicate whether a particular element is allocated or not that is initialized to 0 and then updated in our calls to "alloc_index" and "free_index".

Code: Select all

void init_list(int *used_head, int *avail_head, char *allocated, int size)
{
	*used_head = -1;
	*avail_head = -1;
	for(int i=0; i<size; ++i) {
		add(i, avail_head);
		allocated[i] = 0;
	}
}

int alloc_index(int *used_head, int *avail_head, char *allocated)
{
	int index;	
	if(avail_head == -1) {
		return -1;
	}	
	index = avail_head;
	remove(index, avail_head);
	add(index, used_head);
	allocated[index] = 1;
	return index;
}

void free_index(int index, int *used_head, int *avail_head, char *allocated)
{
	if(allocated[index] == 0) {
		return;
	}
	remove(index, used_head);
	add(index, avail_head);
	allocated[index] = 0;
}
I'll try to come back to this thread later to discuss ways to optimize this and shave some unneeded work, and macroize it so that we have reusable snippets of code without asking the C compiler to deal with pointers.

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Wed May 31, 2023 12:54 am
by ahenry3068
Damn... I'm so C averse, even though I kinda know how to read C..
It would be so ironic if targeting an 8 bit system got me writing C code.

from 1989 - 2012 my first reaction to useful C code was to
immediately translate it into Pascal.

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Wed May 31, 2023 4:15 am
by StephenHorn
So, optimizing and macroizing my approach. Let's start with chopping off some of the unneeded work. If we were to look at the initialized list of elements, we'd noticed an interesting pattern emerge:

Code: Select all

next[i] =    1   2   3   4   5   ...  96  97  98  99   0
prev[i] =   99   0   1   2   3   ...  94  95  96  97  98
Neat. And we can observe that our used_head gets initialized to 0. So we can probably make our initialization loop faster by using this:

Code: Select all

void init_list(int *used_head, int *avail_head, char *allocated, int size)
{
	*used_head = 0;
	*avail_head = -1;
	for(int i=0; i<size; ++i) {
		next[i] = (i+1) % size;
		prev[i] = (i+size-1) % size;
		allocated[i] = 0;
	}
}
Or if modulo operations are slow for some reason (cough 6502 cough)...

Code: Select all

void init_list(int *used_head, int *avail_head, char *allocated, int size)
{
	*used_head = -1;
	*avail_head = 0;
	next[0] = 1;
	prev[0] = size-1;
	allocated[0] = 0;
	for(int i=1; i<size-1; ++i) {
		next[i] = i+1;
		prev[i] = i-1;
		allocated[i] = 0;
	}
	next[size-1] = 0;
	prev[size-1] = size-2;
	allocated[size-1] = 0;
}
We can also observe that since our use of add() and remove() always come in a matched set, where a remove() at an index is always followed by clobbering the next and prev elements at that index with a matching add(), we can remove the pedantic next[elem] = -1; prev[elem] = -1; from those functions.

Now, let's get rid of those pointers. This poses us a small problem, because we have two use cases for add() and remove(): One for the used_head, and one for the avail_head. We don't really have much choice except to basically have two copies of those functions, one set adding or removing from used_head, and the other adding or removing from avail_head.

But finally, we can macroize to solve at least one problem: Multiple lists. If we have more than one list, we're going to need to disambiguate which next, prev, used_head, avail_head, allocated, etc. are for which list. And we're potentially stuck with many copies of the list manipulation functions, which we don't want to do by hand. Fortunately, C-style macros can append to symbol names with the ## operator in the macro, and you can insert line breaks into macro definitions by ending lines with the backslash `\` character. So we can declare and define a unique set of list functions using parameters of a given name by wrapping them in a great big macro. So we can essentially wrap everything in a macro and start post-fixing a unique name to everything:

Code: Select all

#define LIST_IMPLEMENT(NAME, SIZE) \
int avail_head_##NAME; \
int used_head_##NAME; \
int next_##NAME[SIZE]; \
int prev_##NAME[SIZE]; \
char allocated_##NAME[SIZE]; \
void list_remove_avail_##NAME(int elem) \
{ \
	if(elem == avail_head_##NAME) { \
		if(next_##NAME[elem] == avail_head_##NAME) { \
			avail_head_##NAME = -1; \
		} else { \
			avail_head_##NAME = next_##NAME[elem]; \
		} \
	} \
	prev_##NAME[next_##NAME[elem]] = prev_##NAME[elem]; \
	next_##NAME[prev_##NAME[elem]] = next_##NAME[elem]; \
} \
void list_remove_used_##NAME(int elem) \
{ \
	if(elem == avail_head_##NAME) { \
		if(next_##NAME[elem] == used_head_##NAME) { \
			used_head_##NAME = -1; \
		} else { \
			used_head_##NAME = next_##NAME[elem]; \
		} \
	} \
	prev_##NAME[next_##NAME[elem]] = prev_##NAME[elem]; \
	next_##NAME[prev_##NAME[elem]] = next_##NAME[elem]; \
} \
\
void list_add_avail_##NAME(int elem) \
{ \
	if(avail_head_##NAME == -1) { \
		avail_head_##NAME = elem; \
		next_##NAME[elem] = elem; \
		prev_##NAME[elem] = elem; \
	} else { \
		prev_##NAME[elem] = prev_##NAME[avail_head_##NAME]; \
		next_##NAME[elem] = avail_head_##NAME; \
		next_##NAME[prev_##NAME[avail_head_##NAME]] = elem; \
		prev_##NAME[avail_head_##NAME] = elem; \
	} \
} \
\
void list_add_used_##NAME(int elem) \
{ \
	if(used_head_##NAME == -1) { \
		used_head_##NAME = elem; \
		next_##NAME[elem] = elem; \
		prev_##NAME[elem] = elem; \
	} else { \
		prev_##NAME[elem] = prev_##NAME[used_head_##NAME]; \
		next_##NAME[elem] = used_head_##NAME; \
		next_##NAME[prev_##NAME[used_head_##NAME]] = elem; \
		prev_##NAME[used_head_##NAME] = elem; \
	} \
} \
\
void init_list_##NAME() \
{ \
	int i; \
	used_head_##NAME = -1; \
	avail_head_##NAME = 0; \
	next_##NAME[0] = 1; \
	prev_##NAME[0] = SIZE-1; \
	allocated_##NAME[0] = 0; \
	for(i=1; i<SIZE-1; ++i) { \
		next_##NAME[i] = i+1; \
		prev_##NAME[i] = i-1; \
		allocated_##NAME[i] = 0; \
	} \
	next_##NAME[SIZE-1] = 0; \
	prev_##NAME[SIZE-1] = SIZE-2; \
	allocated_##NAME[SIZE-1] = 0; \
} \
\
int alloc_index_##NAME() \
{ \
	int index; \
	if(avail_head_##NAME == -1) { \
		return -1; \
	} \
	index = avail_head_##NAME; \
	list_remove_avail_##NAME(index); \
	list_add_used_##NAME(index); \
	allocated_##NAME[index] = 1; \
	return index; \
} \
\
void free_index_##NAME(int index) \
{ \
	if(allocated_##NAME[index] == 0) { \
		return; \
	} \
	list_remove_used_##NAME(index); \
	list_add_avail_##NAME(index); \
	allocated_##NAME[index] = 0; \
}
Now all we need is a cleaner way to access the list, but macros can solve this for us as well. We can even add a few helpful ones as well:

Code: Select all

#define LIST_INIT(NAME) init_list_##NAME()
#define LIST_ALLOC(NAME) alloc_index_##NAME()
#define LIST_FREE(NAME, INDEX) free_index_##NAME(INDEX)

#define LIST_IS_ALLOCATED(NAME, INDEX) (allocated_##NAME[INDEX] != 0)
#define LIST_FOR_EACH(NAME, INDEX_NAME) for(INDEX_NAME=used_head_##NAME; INDEX_NAME != -1; INDEX_NAME = (next_##NAME[INDEX_NAME] != used_head_##NAME ? next_##NAME[INDEX_NAME] : -1))
Of course, I've been using the type "int" for convenience in the examples. To make this list macro produce relatively nice and clean (and fast) 6502 assembly, just substitute "signed char" for all those ints:

Code: Select all

#define LIST_IMPLEMENT(NAME, SIZE) \
signed char avail_head_##NAME; \
signed char used_head_##NAME; \
signed char next_##NAME[SIZE]; \
signed char prev_##NAME[SIZE]; \
signed char allocated_##NAME[SIZE]; \
void list_remove_avail_##NAME(signed char elem) \
{ \
	if(elem == avail_head_##NAME) { \
		if(next_##NAME[elem] == avail_head_##NAME) { \
			avail_head_##NAME = -1; \
		} else { \
			avail_head_##NAME = next_##NAME[elem]; \
		} \
	} \
	prev_##NAME[next_##NAME[elem]] = prev_##NAME[elem]; \
	next_##NAME[prev_##NAME[elem]] = next_##NAME[elem]; \
} \
void list_remove_used_##NAME(signed char elem) \
{ \
	if(elem == avail_head_##NAME) { \
		if(next_##NAME[elem] == used_head_##NAME) { \
			used_head_##NAME = -1; \
		} else { \
			used_head_##NAME = next_##NAME[elem]; \
		} \
	} \
	prev_##NAME[next_##NAME[elem]] = prev_##NAME[elem]; \
	next_##NAME[prev_##NAME[elem]] = next_##NAME[elem]; \
} \
\
void list_add_avail_##NAME(signed char elem) \
{ \
	if(avail_head_##NAME == -1) { \
		avail_head_##NAME = elem; \
		next_##NAME[elem] = elem; \
		prev_##NAME[elem] = elem; \
	} else { \
		prev_##NAME[elem] = prev_##NAME[avail_head_##NAME]; \
		next_##NAME[elem] = avail_head_##NAME; \
		next_##NAME[prev_##NAME[avail_head_##NAME]] = elem; \
		prev_##NAME[avail_head_##NAME] = elem; \
	} \
} \
\
void list_add_used_##NAME(signed char elem) \
{ \
	if(used_head_##NAME == -1) { \
		used_head_##NAME = elem; \
		next_##NAME[elem] = elem; \
		prev_##NAME[elem] = elem; \
	} else { \
		prev_##NAME[elem] = prev_##NAME[used_head_##NAME]; \
		next_##NAME[elem] = used_head_##NAME; \
		next_##NAME[prev_##NAME[used_head_##NAME]] = elem; \
		prev_##NAME[used_head_##NAME] = elem; \
	} \
} \
\
void init_list_##NAME() \
{ \
	signed char i; \
	used_head_##NAME = -1; \
	avail_head_##NAME = 0; \
	next_##NAME[0] = 1; \
	prev_##NAME[0] = SIZE-1; \
	allocated_##NAME[0] = 0; \
	for(i=1; i<SIZE-1; ++i) { \
		next_##NAME[i] = i+1; \
		prev_##NAME[i] = i-1; \
		allocated_##NAME[i] = 0; \
	} \
	next_##NAME[SIZE-1] = 0; \
	prev_##NAME[SIZE-1] = SIZE-2; \
	allocated_##NAME[SIZE-1] = 0; \
} \
\
signed char alloc_index_##NAME() \
{ \
	signed char index; \
	if(avail_head_##NAME == -1) { \
		return -1; \
	} \
	index = avail_head_##NAME; \
	list_remove_avail_##NAME(index); \
	list_add_used_##NAME(index); \
	allocated_##NAME[index] = 1; \
	return index; \
} \
\
void free_index_##NAME(signed char index) \
{ \
	if(allocated_##NAME[index] == 0) { \
		return; \
	} \
	list_remove_used_##NAME(index); \
	list_add_avail_##NAME(index); \
	allocated_##NAME[index] = 0; \
}

#define LIST_INIT(NAME) init_list_##NAME()
#define LIST_ALLOC(NAME) alloc_index_##NAME()
#define LIST_FREE(NAME, INDEX) free_index_##NAME(INDEX)

#define LIST_IS_ALLOCATED(NAME, INDEX) (allocated_##NAME[INDEX] != 0)
#define LIST_FOR_EACH(NAME, INDEX_NAME) for(INDEX_NAME=used_head_##NAME; INDEX_NAME != -1; INDEX_NAME = (next_##NAME[INDEX_NAME] != used_head_##NAME ? next_##NAME[INDEX_NAME] : -1))
And a brief example of how this method is actually used where the rubber meets the road:

Code: Select all

LIST_IMPLEMENT(foo, 100);

int main(int argc, char **argv) 
{
	signed char zero, one, two, three, four, i;

	LIST_INIT(foo);

	zero = LIST_ALLOC(foo);
	one = LIST_ALLOC(foo);
	two = LIST_ALLOC(foo);
	three = LIST_ALLOC(foo);
	four = LIST_ALLOC(foo);

	LIST_FREE(foo, three);

	printf("Used foo indices:\n");
	LIST_FOR_EACH(foo, i) {
		printf("%d\n", i);
	}
	return 0;
}
I can't speak for how/if KickC compiles it, but Godbolt.org shows this compiling and generating pretty clean 6502 under cc65. Well, clean by cc65's standards. I'm sure certain places could be sped up with hand-rolled assembly.

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Wed May 31, 2023 6:17 am
by svenvandevelde
Thank you very much! This is a very interesting technique to solve the problem and I will test this myself! Will also try this using kickc which for sure will compile this properly!

Amazing contribution to this thread! My hat off to you! It is such replies that really bring value to us all!

Sven

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Wed May 31, 2023 3:45 pm
by StephenHorn
I would be genuinely curious to know what KickC generates.

For reference, this is what Godbolt's trunk version of cc65 generates:

Code: Select all

.proc   _list_remove_avail_foo: near

        jsr     pusha
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0003
        dex
L0003:  jsr     pushax
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0004
        dex
L0004:  jsr     toseqax
        jeq     L0002
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0007
        inx
L0007:  ldy     #$00
        jsr     ldaidx
        jsr     pushax
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0008
        dex
L0008:  jsr     toseqax
        jeq     L0005
        ldx     #$FF
        lda     #$FF
        sta     _avail_head_foo
        jmp     L0002
L0005:  lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000B
        inx
L000B:  ldy     #$00
        jsr     ldaidx
        sta     _avail_head_foo
L0002:  lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000D
        inx
L000D:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_prev_foo)
        tay
        txa
        adc     #>(_prev_foo)
        tax
        tya
        jsr     pushax
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$02
        clc
        adc     (sp),y
        bcc     L000F
        inx
L000F:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0011
        inx
L0011:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_next_foo)
        tay
        txa
        adc     #>(_next_foo)
        tax
        tya
        jsr     pushax
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$02
        clc
        adc     (sp),y
        bcc     L0013
        inx
L0013:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        jsr     incsp1
        rts

.proc   _list_remove_used_foo: near

        jsr     pusha
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0003
        dex
L0003:  jsr     pushax
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0004
        dex
L0004:  jsr     toseqax
        jeq     L0002
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0007
        inx
L0007:  ldy     #$00
        jsr     ldaidx
        jsr     pushax
        ldx     #$00
        lda     _used_head_foo
        bpl     L0008
        dex
L0008:  jsr     toseqax
        jeq     L0005
        ldx     #$FF
        lda     #$FF
        sta     _used_head_foo
        jmp     L0002
L0005:  lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000B
        inx
L000B:  ldy     #$00
        jsr     ldaidx
        sta     _used_head_foo
L0002:  lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000D
        inx
L000D:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_prev_foo)
        tay
        txa
        adc     #>(_prev_foo)
        tax
        tya
        jsr     pushax
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$02
        clc
        adc     (sp),y
        bcc     L000F
        inx
L000F:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0011
        inx
L0011:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_next_foo)
        tay
        txa
        adc     #>(_next_foo)
        tax
        tya
        jsr     pushax
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$02
        clc
        adc     (sp),y
        bcc     L0013
        inx
L0013:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        jsr     incsp1
        rts

.proc   _list_add_avail_foo: near

        jsr     pusha
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0003
        dex
L0003:  cmp     #$FF
        jsr     booleq
        jeq     L0002
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0004
        dex
L0004:  sta     _avail_head_foo
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0006
        inx
L0006:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0007
        dex
L0007:  ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0009
        inx
L0009:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L000A
        dex
L000A:  ldy     #$00
        jsr     staspidx
        jmp     L000B
L0002:  lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000D
        inx
L000D:  jsr     pushax
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        clc
        adc     _avail_head_foo
        bcc     L000F
        inx
L000F:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0011
        inx
L0011:  jsr     pushax
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0012
        dex
L0012:  ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        clc
        adc     _avail_head_foo
        bcc     L0014
        inx
L0014:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_next_foo)
        tay
        txa
        adc     #>(_next_foo)
        tax
        tya
        jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0015
        dex
L0015:  ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        clc
        adc     _avail_head_foo
        bcc     L0017
        inx
L0017:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0018
        dex
L0018:  ldy     #$00
        jsr     staspidx
L000B:  jsr     incsp1
        rts

.proc   _list_add_used_foo: near

        jsr     pusha
        ldx     #$00
        lda     _used_head_foo
        bpl     L0003
        dex
L0003:  cmp     #$FF
        jsr     booleq
        jeq     L0002
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0004
        dex
L0004:  sta     _used_head_foo
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0006
        inx
L0006:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0007
        dex
L0007:  ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0009
        inx
L0009:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L000A
        dex
L000A:  ldy     #$00
        jsr     staspidx
        jmp     L000B
L0002:  lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000D
        inx
L000D:  jsr     pushax
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        clc
        adc     _used_head_foo
        bcc     L000F
        inx
L000F:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0011
        inx
L0011:  jsr     pushax
        ldx     #$00
        lda     _used_head_foo
        bpl     L0012
        dex
L0012:  ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        clc
        adc     _used_head_foo
        bcc     L0014
        inx
L0014:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_next_foo)
        tay
        txa
        adc     #>(_next_foo)
        tax
        tya
        jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0015
        dex
L0015:  ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        clc
        adc     _used_head_foo
        bcc     L0017
        inx
L0017:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0018
        dex
L0018:  ldy     #$00
        jsr     staspidx
L000B:  jsr     incsp1
        rts

.proc   _init_list_foo: near

        jsr     decsp1
        ldx     #$FF
        lda     #$FF
        sta     _used_head_foo
        ldx     #$00
        lda     #$00
        sta     _avail_head_foo
        ldx     #$00
        lda     #$01
        sta     _next_foo
        ldx     #$00
        lda     #$63
        sta     _prev_foo
        ldx     #$00
        lda     #$00
        sta     _allocated_foo
        ldx     #$00
        lda     #$01
        ldy     #$00
        sta     (sp),y
L0002:  ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0006
        dex
L0006:  sec
        sbc     #$63
        bvc     L0007
        eor     #$80
L0007:  asl     a
        lda     #$00
        ldx     #$00
        rol     a
        jne     L0005
        jmp     L0003
L0005:  lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000A
        inx
L000A:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L000B
        dex
L000B:  jsr     incax1
        ldx     #$00
        cmp     #$80
        bcc     L000C
        dex
L000C:  ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000E
        inx
L000E:  jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L000F
        dex
L000F:  jsr     decax1
        ldx     #$00
        cmp     #$80
        bcc     L0010
        dex
L0010:  ldy     #$00
        jsr     staspidx
        lda     #<(_allocated_foo)
        ldx     #>(_allocated_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0012
        inx
L0012:  jsr     pushax
        ldx     #$00
        lda     #$00
        ldy     #$00
        jsr     staspidx
        ldy     #$00
        ldx     #$00
        clc
        lda     #$01
        adc     (sp),y
        sta     (sp),y
        bpl     L0008
        dex
L0008:  jmp     L0002
L0003:  ldx     #$00
        lda     #$00
        sta     _next_foo+99
        ldx     #$00
        lda     #$62
        sta     _prev_foo+99
        ldx     #$00
        lda     #$00
        sta     _allocated_foo+99
        jsr     incsp1
        rts

.proc   _alloc_index_foo: near

        jsr     decsp1
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0003
        dex
L0003:  cmp     #$FF
        jsr     booleq
        jeq     L0002
        ldx     #$FF
        lda     #$FF
        jmp     L0001
L0002:  ldx     #$00
        lda     _avail_head_foo
        bpl     L0004
        dex
L0004:  ldy     #$00
        sta     (sp),y
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0005
        dex
L0005:  jsr     _list_remove_avail_foo
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0006
        dex
L0006:  jsr     _list_add_used_foo
        lda     #<(_allocated_foo)
        ldx     #>(_allocated_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0008
        inx
L0008:  jsr     pushax
        ldx     #$00
        lda     #$01
        ldy     #$00
        jsr     staspidx
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0009
        dex
L0009:  jmp     L0001
L0001:  jsr     incsp1
        rts

.proc   _free_index_foo: near

        jsr     pusha
        lda     #<(_allocated_foo)
        ldx     #>(_allocated_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0004
        inx
L0004:  ldy     #$00
        jsr     ldaidx
        cmp     #$00
        jsr     booleq
        jeq     L0002
        jmp     L0001
L0002:  ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0005
        dex
L0005:  jsr     _list_remove_used_foo
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0006
        dex
L0006:  jsr     _list_add_avail_foo
        lda     #<(_allocated_foo)
        ldx     #>(_allocated_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0008
        inx
L0008:  jsr     pushax
        ldx     #$00
        lda     #$00
        ldy     #$00
        jsr     staspidx
L0001:  jsr     incsp1
        rts

.proc   _main: near

        jsr     decsp6
        jsr     _init_list_foo
        jsr     _alloc_index_foo
        ldy     #$05
        sta     (sp),y
        jsr     _alloc_index_foo
        ldy     #$04
        sta     (sp),y
        jsr     _alloc_index_foo
        ldy     #$03
        sta     (sp),y
        jsr     _alloc_index_foo
        ldy     #$02
        sta     (sp),y
        jsr     _alloc_index_foo
        ldy     #$01
        sta     (sp),y
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0002
        dex
L0002:  jsr     _free_index_foo
        lda     #<(S0001)
        ldx     #>(S0001)
        jsr     pushax
        ldy     #$02
        jsr     _printf
        ldx     #$00
        lda     _used_head_foo
        bpl     L0007
        dex
L0007:  ldy     #$00
        sta     (sp),y
L0003:  ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0008
        dex
L0008:  cmp     #$FF
        jsr     boolne
        jne     L0006
        jmp     L0004
L0006:  lda     #<(S0002)
        ldx     #>(S0002)
        jsr     pushax
        ldy     #$02
        ldx     #$00
        lda     (sp),y
        bpl     L0011
        dex
L0011:  jsr     pushax
        ldy     #$04
        jsr     _printf
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000A
        inx
L000A:  ldy     #$00
        jsr     ldaidx
        jsr     pushax
        ldx     #$00
        lda     _used_head_foo
        bpl     L000B
        dex
L000B:  jsr     tosneax
        jeq     L000C
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000E
        inx
L000E:  ldy     #$00
        jsr     ldaidx
        jmp     L000F
L000C:  ldx     #$FF
        lda     #$FF
L000F:  ldx     #$00
        cmp     #$80
        bcc     L0010
        dex
L0010:  ldy     #$00
        sta     (sp),y
        jmp     L0003
L0004:  ldx     #$00
        lda     #$00
        jmp     L0001
L0001:  ldy     #$0A
        jsr     addysp
        rts
Granted, this is before enabling optimization flags, but even with optimizations enabled I think cc65 is way too eager to fall into 16-bit "pointer arithmetic" even though we just have a bunch of global arrays. I'll admit that I think a human could still do a lot better, since we're dealing with arrays of signed bytes and a list size of only 100 elements.

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Posted: Wed May 31, 2023 4:28 pm
by desertfish
that is... a LOT of code... what was the program being compiled?