Page 2 of 2
Re: KICK C and the 65C02 CPU - A double linked list without pointers.
Posted: Wed May 31, 2023 5:00 pm
by StephenHorn
The vast majority of the code is the various list add, remove, allocate, and free functions. It semi-helpfully shows the start point of each procedure with a ".proc" label. At the bottom is a main() function that does exactly what was in my example use case. This is literally what cc65 generated on Godbolt.org, minus a few lines of non-relevant boilerplate assembly.
Re: KICK C and the 65C02 CPU - A double linked list without pointers.
Posted: Wed May 31, 2023 6:08 pm
by StephenHorn
But let's see how a human can do better. For "list_remove_avail_foo", the function to remove an index from the list started at avail_head for some linked list of indices named "foo", cc65 generated this:
Code: Select all
.proc _list_remove_avail_foo: near
jsr pusha
ldy #$00
ldx #$00
lda (sp),y
bpl L0003
dex
L0003: jsr pushax
ldx #$00
lda _avail_head_foo
bpl L0004
dex
L0004: jsr toseqax
jeq L0002
lda #<(_next_foo)
ldx #>(_next_foo)
ldy #$00
clc
adc (sp),y
bcc L0007
inx
L0007: ldy #$00
jsr ldaidx
jsr pushax
ldx #$00
lda _avail_head_foo
bpl L0008
dex
L0008: jsr toseqax
jeq L0005
ldx #$FF
lda #$FF
sta _avail_head_foo
jmp L0002
L0005: lda #<(_next_foo)
ldx #>(_next_foo)
ldy #$00
clc
adc (sp),y
bcc L000B
inx
L000B: ldy #$00
jsr ldaidx
sta _avail_head_foo
L0002: lda #<(_next_foo)
ldx #>(_next_foo)
ldy #$00
clc
adc (sp),y
bcc L000D
inx
L000D: ldy #$00
jsr ldaidx
clc
adc #<(_prev_foo)
tay
txa
adc #>(_prev_foo)
tax
tya
jsr pushax
lda #<(_prev_foo)
ldx #>(_prev_foo)
ldy #$02
clc
adc (sp),y
bcc L000F
inx
L000F: ldy #$00
jsr ldaidx
ldy #$00
jsr staspidx
lda #<(_prev_foo)
ldx #>(_prev_foo)
ldy #$00
clc
adc (sp),y
bcc L0011
inx
L0011: ldy #$00
jsr ldaidx
clc
adc #<(_next_foo)
tay
txa
adc #>(_next_foo)
tax
tya
jsr pushax
lda #<(_next_foo)
ldx #>(_next_foo)
ldy #$02
clc
adc (sp),y
bcc L0013
inx
L0013: ldy #$00
jsr ldaidx
ldy #$00
jsr staspidx
jsr incsp1
rts
When it started from this:
Code: Select all
void list_remove_avail_foo(signed char elem)
{
if(elem == avail_head_foo) {
if(next_foo[elem] == avail_head_foo) {
avail_head_foo = -1;
} else {
avail_head_foo = next_foo[elem];
}
}
prev_foo[next_foo[elem]] = prev_foo[elem];
next_foo[prev_foo[elem]] = next_foo[elem];
}
In particular, we can see that cc65 was willing to lda a single-char global, but any time it was trying to grab a value from an array, it was falling back on 16-bit pointer arithmetic to calculate the address it should load from. Surely, it could put signed char elem into one of the index registers, such as Y, and then lda _next_foo,y rather than attempt a 16-bit add by loading #<(_next_foo) and #>(_next_foo) into registers A and X and then performing an adc/bcc/inx to perform a 16-bit add.
Sorry I'm not doing more to describe my thought process here, I'm "just translating the C into 65C02 ASM", line by line, and starting out with a fairly naive approach to show just how badly cc65 is handling this:
Code: Select all
.proc list_remove_avail_foo ; list_remove_avail_foo(signed char elem) {
tay ; We're going to preserve signed char elem in the Y register because we frequently need it for indexing.
; if(elem == avail_head_foo) {
cmp avail_head_foo ; signed char elem is already in A register.
bne avail_head_handled
; if(next_foo[elem] == avail_head_foo) {
lda next_foo,y
cmp avail_head_foo
bne advance_avail_head
; avail_head_foo = -1;
lda #$FF
sta avail_head_foo
bra avail_head_handled
; } else {
advance_avail_head:
; avail_head_foo = next_foo[elem];
sta avail_head_foo ; next_foo[elem] is already in A register
; We can fall-through to avail_head_handled.
; }
; }
avail_head_handled:
; prev_foo[next_foo[elem]] = prev_foo[elem];
lda prev_foo,y ; A now contains prev_foo[elem]
ldx next_foo,y ; X now contains next_foo[elem]
sta prev_foo,x
; next_foo[prev_foo[elem]] = next_foo[elem];
tay ; We no longer need signed char elem, so we'll put prev_foo[elem] in Y.
txa ; We need A for storing to absolute address plus index, so we'll put next_foo[elem] in A
sta next_foo,y
rts
}
...sans most of the C-comments, it would look like this:
Code: Select all
.proc list_remove_avail_foo ; list_remove_avail_foo(signed char elem)
tay ; We're going to preserve signed char elem in the Y register because we frequently need it for indexing.
cmp avail_head_foo ; signed char elem is already in A register.
bne avail_head_handled
lda next_foo,y
cmp avail_head_foo
bne advance_avail_head
lda #$FF
sta avail_head_foo
bra avail_head_handled
advance_avail_head:
sta avail_head_foo ; next_foo[elem] is already in A register
; We can fall-through to avail_head_handled.
avail_head_handled:
lda prev_foo,y ; A now contains prev_foo[elem]
ldx next_foo,y ; X now contains next_foo[elem]
sta prev_foo,x
tay ; We no longer need signed char elem, so we'll put prev_foo[elem] in Y.
txa ; We need A for storing to absolute address plus index, so we'll put next_foo[elem] in A
sta next_foo,y
rts
}
Re: KICK C and the 65C02 CPU - A double linked list without pointers.
Posted: Wed May 31, 2023 6:22 pm
by kliepatsch
Wow, thanks guys for a lot of good ideas for dynamic data structures!
I wish I had known the "put free elements into a list" trick earlier. Now I have already implemented something similar using a bit field to mark free and used elements.
Also I am really liking the idea of using arrays of integers instead of pointers to other elements. What I am still unsure about, though, is whether this approach leads to advantageous performance when the list elements are scattered across different RAM banks and there can be more than 256 elements. I think that's when pointers and/or 16 bit arithmetic will become inevitable for the 65C02, right? The indexed absolute addressing won't work anymore here.
(I am talking about pointers in a broader sense here. My implementation, for example, stores the bank number and the high byte of the address, as all elements are aligned with the 256 byte "pages". The low byte of the address can be omitted)
Re: KICK C and the 65C02 CPU - A double linked list without pointers.
Posted: Wed May 31, 2023 6:48 pm
by svenvandevelde
StephenHorn wrote: ↑Wed May 31, 2023 3:45 pm
I would be genuinely curious to know what KickC generates.
Granted, this is before enabling optimization flags, but even with optimizations enabled I think cc65 is way too eager to fall into 16-bit "pointer arithmetic" even though we just have a bunch of global arrays. I'll admit that I think a human could still do
a lot better, since we're dealing with arrays of signed bytes and a list size of only 100 elements.
Have a look a the first post in the thread. I have added attachments with the example program .prg (= commander X16 executable) and the generated .asm code of the double linked list demo.
The .asm code contains the C source statements in comments. Note that this was without optimization and the conio library needs further optimization. But it gives you an idea. I would recommend you carefully check the add and delete functions.
Sven
Re: KICK C and the 65C02 CPU - A double linked list without pointers.
Posted: Wed May 31, 2023 6:54 pm
by StephenHorn
If you need more than 255 elements*, then it makes less sense, especially with cc65, to avoid pointers and traditional linked lists. It's 16-bit math either way, and you can't so easily take advantage of instructions that use the CPU's index registers.
But I would reiterate that the dependency here is the number of elements on the list, not the size of those elements or their locations. If you want to retrieve a value within an object at an allocated index, then you still have to calculate the address of that value. That will vary in complexity depending on how your data is organized. If you can organize the data into parallel arrays of at most 255 elements, then you can stow the index in the Y register and reduce the cost of reading any given byte to `lda array_address,Y`.
* By switching from signed chars to unsigned chars, this is the hypothetical upper limit before we lose the ability to have a "NULL" special value to indicate an empty list and would need to swap to another strategy or a larger size type.
Re: KICK C and the 65C02 CPU - A double linked list without pointers.
Posted: Thu Jun 01, 2023 12:19 pm
by svenvandevelde
I use the trick of "segmentation". So when there are more data elements than that can fit into a 255 element list, i define multiple of such lists identifiable by a segment id. In such way, absolute indexed addressing is used but this only works if you can clearly segregate the element types to be put in those different lists.