KICK C and the 65C02 CPU - A double linked list without pointers.

All aspects of programming on the Commander X16.
User avatar
StephenHorn
Posts: 565
Joined: Tue Apr 28, 2020 12:00 am
Contact:

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Post by StephenHorn »

The vast majority of the code is the various list add, remove, allocate, and free functions. It semi-helpfully shows the start point of each procedure with a ".proc" label. At the bottom is a main() function that does exactly what was in my example use case. This is literally what cc65 generated on Godbolt.org, minus a few lines of non-relevant boilerplate assembly.
Developer for Box16, the other X16 emulator. (Box16 on GitHub)
I also accept pull requests for x16emu, the official X16 emulator. (x16-emulator on GitHub)
User avatar
StephenHorn
Posts: 565
Joined: Tue Apr 28, 2020 12:00 am
Contact:

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Post by StephenHorn »

But let's see how a human can do better. For "list_remove_avail_foo", the function to remove an index from the list started at avail_head for some linked list of indices named "foo", cc65 generated this:

Code: Select all

.proc   _list_remove_avail_foo: near

        jsr     pusha
        ldy     #$00
        ldx     #$00
        lda     (sp),y
        bpl     L0003
        dex
L0003:  jsr     pushax
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0004
        dex
L0004:  jsr     toseqax
        jeq     L0002
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0007
        inx
L0007:  ldy     #$00
        jsr     ldaidx
        jsr     pushax
        ldx     #$00
        lda     _avail_head_foo
        bpl     L0008
        dex
L0008:  jsr     toseqax
        jeq     L0005
        ldx     #$FF
        lda     #$FF
        sta     _avail_head_foo
        jmp     L0002
L0005:  lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000B
        inx
L000B:  ldy     #$00
        jsr     ldaidx
        sta     _avail_head_foo
L0002:  lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L000D
        inx
L000D:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_prev_foo)
        tay
        txa
        adc     #>(_prev_foo)
        tax
        tya
        jsr     pushax
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$02
        clc
        adc     (sp),y
        bcc     L000F
        inx
L000F:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        lda     #<(_prev_foo)
        ldx     #>(_prev_foo)
        ldy     #$00
        clc
        adc     (sp),y
        bcc     L0011
        inx
L0011:  ldy     #$00
        jsr     ldaidx
        clc
        adc     #<(_next_foo)
        tay
        txa
        adc     #>(_next_foo)
        tax
        tya
        jsr     pushax
        lda     #<(_next_foo)
        ldx     #>(_next_foo)
        ldy     #$02
        clc
        adc     (sp),y
        bcc     L0013
        inx
L0013:  ldy     #$00
        jsr     ldaidx
        ldy     #$00
        jsr     staspidx
        jsr     incsp1
        rts
When it started from this:

Code: Select all

void list_remove_avail_foo(signed char elem)
{
	if(elem == avail_head_foo) {
		if(next_foo[elem] == avail_head_foo) {
			avail_head_foo = -1;
		} else {
			avail_head_foo = next_foo[elem];
		}
	}
	prev_foo[next_foo[elem]] = prev_foo[elem];
	next_foo[prev_foo[elem]] = next_foo[elem];
}
In particular, we can see that cc65 was willing to lda a single-char global, but any time it was trying to grab a value from an array, it was falling back on 16-bit pointer arithmetic to calculate the address it should load from. Surely, it could put signed char elem into one of the index registers, such as Y, and then lda _next_foo,y rather than attempt a 16-bit add by loading #<(_next_foo) and #>(_next_foo) into registers A and X and then performing an adc/bcc/inx to perform a 16-bit add.

Sorry I'm not doing more to describe my thought process here, I'm "just translating the C into 65C02 ASM", line by line, and starting out with a fairly naive approach to show just how badly cc65 is handling this:

Code: Select all

.proc list_remove_avail_foo 		; list_remove_avail_foo(signed char elem) {
	tay				; We're going to preserve signed char elem in the Y register because we frequently need it for indexing.

;	if(elem == avail_head_foo) {
	cmp avail_head_foo		; signed char elem is already in A register.
	bne avail_head_handled

;		if(next_foo[elem] == avail_head_foo) {
	lda next_foo,y
	cmp avail_head_foo
	bne advance_avail_head
	
;			avail_head_foo = -1;
	lda #$FF
	sta avail_head_foo
	bra avail_head_handled
	
;		} else {

advance_avail_head:
;			avail_head_foo = next_foo[elem];
	sta avail_head_foo 		; next_foo[elem] is already in A register
					; We can fall-through to avail_head_handled.
;		}
;	}

avail_head_handled:
;	prev_foo[next_foo[elem]] = prev_foo[elem];
	lda prev_foo,y ; A now contains prev_foo[elem]
	ldx next_foo,y ; X now contains next_foo[elem]
	sta prev_foo,x
		
;	next_foo[prev_foo[elem]] = next_foo[elem];
	tay ; We no longer need signed char elem, so we'll put prev_foo[elem] in Y.
	txa ; We need A for storing to absolute address plus index, so we'll put next_foo[elem] in A
	sta next_foo,y
	rts
}
...sans most of the C-comments, it would look like this:

Code: Select all

.proc list_remove_avail_foo 		; list_remove_avail_foo(signed char elem)
	tay				; We're going to preserve signed char elem in the Y register because we frequently need it for indexing.
	cmp avail_head_foo		; signed char elem is already in A register.
	bne avail_head_handled
	lda next_foo,y
	cmp avail_head_foo
	bne advance_avail_head
	lda #$FF
	sta avail_head_foo
	bra avail_head_handled

advance_avail_head:
	sta avail_head_foo 		; next_foo[elem] is already in A register
					; We can fall-through to avail_head_handled.
avail_head_handled:
	lda prev_foo,y 			; A now contains prev_foo[elem]
	ldx next_foo,y 			; X now contains next_foo[elem]
	sta prev_foo,x
	tay 				; We no longer need signed char elem, so we'll put prev_foo[elem] in Y.
	txa 				; We need A for storing to absolute address plus index, so we'll put next_foo[elem] in A
	sta next_foo,y
	rts
}
Developer for Box16, the other X16 emulator. (Box16 on GitHub)
I also accept pull requests for x16emu, the official X16 emulator. (x16-emulator on GitHub)
User avatar
kliepatsch
Posts: 247
Joined: Thu Oct 08, 2020 9:54 pm

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Post by kliepatsch »

Wow, thanks guys for a lot of good ideas for dynamic data structures!

I wish I had known the "put free elements into a list" trick earlier. Now I have already implemented something similar using a bit field to mark free and used elements.

Also I am really liking the idea of using arrays of integers instead of pointers to other elements. What I am still unsure about, though, is whether this approach leads to advantageous performance when the list elements are scattered across different RAM banks and there can be more than 256 elements. I think that's when pointers and/or 16 bit arithmetic will become inevitable for the 65C02, right? The indexed absolute addressing won't work anymore here.

(I am talking about pointers in a broader sense here. My implementation, for example, stores the bank number and the high byte of the address, as all elements are aligned with the 256 byte "pages". The low byte of the address can be omitted)
User avatar
svenvandevelde
Posts: 488
Joined: Wed Dec 23, 2020 6:30 am
Location: Belgium, Antwerpen

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Post by svenvandevelde »

StephenHorn wrote: Wed May 31, 2023 3:45 pm I would be genuinely curious to know what KickC generates.


Granted, this is before enabling optimization flags, but even with optimizations enabled I think cc65 is way too eager to fall into 16-bit "pointer arithmetic" even though we just have a bunch of global arrays. I'll admit that I think a human could still do a lot better, since we're dealing with arrays of signed bytes and a list size of only 100 elements.
Have a look a the first post in the thread. I have added attachments with the example program .prg (= commander X16 executable) and the generated .asm code of the double linked list demo.
The .asm code contains the C source statements in comments. Note that this was without optimization and the conio library needs further optimization. But it gives you an idea. I would recommend you carefully check the add and delete functions.

Sven
KICKC home page by Jesper Gravgaard.
My KICKC alpha with Commander X16 extensions.
User avatar
StephenHorn
Posts: 565
Joined: Tue Apr 28, 2020 12:00 am
Contact:

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Post by StephenHorn »

If you need more than 255 elements*, then it makes less sense, especially with cc65, to avoid pointers and traditional linked lists. It's 16-bit math either way, and you can't so easily take advantage of instructions that use the CPU's index registers.

But I would reiterate that the dependency here is the number of elements on the list, not the size of those elements or their locations. If you want to retrieve a value within an object at an allocated index, then you still have to calculate the address of that value. That will vary in complexity depending on how your data is organized. If you can organize the data into parallel arrays of at most 255 elements, then you can stow the index in the Y register and reduce the cost of reading any given byte to `lda array_address,Y`.

* By switching from signed chars to unsigned chars, this is the hypothetical upper limit before we lose the ability to have a "NULL" special value to indicate an empty list and would need to swap to another strategy or a larger size type.
Developer for Box16, the other X16 emulator. (Box16 on GitHub)
I also accept pull requests for x16emu, the official X16 emulator. (x16-emulator on GitHub)
User avatar
svenvandevelde
Posts: 488
Joined: Wed Dec 23, 2020 6:30 am
Location: Belgium, Antwerpen

Re: KICK C and the 65C02 CPU - A double linked list without pointers.

Post by svenvandevelde »

I use the trick of "segmentation". So when there are more data elements than that can fit into a 255 element list, i define multiple of such lists identifiable by a segment id. In such way, absolute indexed addressing is used but this only works if you can clearly segregate the element types to be put in those different lists.
KICKC home page by Jesper Gravgaard.
My KICKC alpha with Commander X16 extensions.
Post Reply