
VERA and number 48 ...

Posted: Mon Oct 03, 2022 6:27 pm
by svenvandevelde


On 10/3/2022 at 8:20 PM, ZeroByte said:




I think the main issue at hand is that the engine uses a heap management approach to allocating VRAM, which comes with benefits and drawbacks, as with all design decisions. That's what makes this an engaging hobby.



You read my mind. This is an interesting hobby and you have understood my approach perfectly. Now ... since we are here ...

Look at the 48x48 images in the video. They sit in VRAM as 64x64 bitmaps. A waste of memory? That goes beyond the heap management approach :-).

I have also considered drawing multiple sprites with some sort of "sprite painter" algorithm that would paint the image onto 4 or 9 sprites or so. The issue then is the overhead of moving all those sprites, on top of all the other logic.
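
To make that trade-off concrete, here is a minimal sketch in C of the multi-sprite idea, assuming a 48x48 image is split into a 3x3 grid of 16x16 hardware sprites. The names tile_pos_t and place_tiles are made up for illustration and are not taken from the engine.

```c
#include <stdint.h>

#define TILE_SIZE      16  /* 16x16 is one of the VERA sprite sizes (8, 16, 32, 64) */
#define TILES_PER_SIDE 3   /* 3 * 16 = 48 pixels */
#define TILE_COUNT     (TILES_PER_SIDE * TILES_PER_SIDE)

/* Hypothetical type: screen position of one hardware sprite. */
typedef struct {
    int16_t x;
    int16_t y;
} tile_pos_t;

/* Compute the position of each 16x16 tile of a 48x48 object whose
   top-left corner is at (x, y). Tiles are ordered row by row. */
void place_tiles(int16_t x, int16_t y, tile_pos_t out[TILE_COUNT]) {
    for (uint8_t row = 0; row < TILES_PER_SIDE; row++) {
        for (uint8_t col = 0; col < TILES_PER_SIDE; col++) {
            tile_pos_t *t = &out[row * TILES_PER_SIDE + col];
            t->x = (int16_t)(x + col * TILE_SIZE);
            t->y = (int16_t)(y + row * TILE_SIZE);
        }
    }
}
```

In this layout no VRAM is wasted (9 tiles of 128 bytes at 4bpp is 1152 bytes, the same as a packed 48x48 image), but every move of the object means writing nine sets of position registers instead of one.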

I just don't know which is best. I am also thinking of a heap compactor/defragmenter, and of placing the 64x64 bitmaps from the top of the heap towards the bottom, so that fragmentation only occurs in the lower parts.

The issue I have is that I paint too many image animations to fit into VERA VRAM at the same time. The obvious solution is to limit the number of animation frames, which I am implementing now.

But if there were a 48x48 size, I would be able to fit almost double the number of animation frames in VRAM. Basically I use VRAM as a cache, which is dynamically updated on demand.
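
The "almost double" figure can be checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, 4bpp is assumed, and the 64 KB budget used in the note is an arbitrary example, not the engine's actual sprite segment size.

```c
#include <stdint.h>

/* Bytes needed for one sprite image of width x height pixels
   at the given bits per pixel (4 or 8). */
uint32_t sprite_bytes(uint16_t width, uint16_t height, uint8_t bpp) {
    return (uint32_t)width * height * bpp / 8;
}

/* How many images of the given size fit into a VRAM budget. */
uint32_t frames_that_fit(uint32_t budget_bytes, uint16_t width,
                         uint16_t height, uint8_t bpp) {
    return budget_bytes / sprite_bytes(width, height, bpp);
}
```

At 4bpp a 64x64 image takes 2048 bytes and a packed 48x48 image would take 1152 bytes, so an example budget of 65536 bytes holds 32 frames today and would hold 56 with a 48x48 size, a factor of about 1.78.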



Posted: Mon Oct 03, 2022 6:28 pm
by ZeroByte

Alpha channels would be cool.



Posted: Mon Oct 03, 2022 6:30 pm
by svenvandevelde

One more note: since the CX16 doesn't have DMA, every memory move is down to the processor. Moving memory is fast, but moving 4096 bytes every frame is not an option for animations (so I cannot use 64x64 in 8bpp), bottom line.
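
A rough back-of-the-envelope sketch of why that is, in C. The 8 MHz CPU clock is the CX16's; the frame rate of about 60 per second and the cost per copied byte are assumptions, and the function names are made up for this example.

```c
#include <stdint.h>

#define CPU_HZ       8000000UL /* CX16 CPU clock: 8 MHz */
#define FRAMES_PER_S 60UL      /* assumed: roughly 60 frames per second */

/* CPU cycles available in one video frame. */
uint32_t cycles_per_frame(void) {
    return (uint32_t)(CPU_HZ / FRAMES_PER_S);
}

/* Whole percent of one frame spent copying `bytes` bytes,
   given an assumed cost per byte in CPU cycles. */
uint32_t copy_cost_percent(uint32_t bytes, uint32_t cycles_per_byte) {
    return (uint32_t)((bytes * cycles_per_byte * 100UL) / cycles_per_frame());
}
```

With an assumed round figure of 10 cycles per byte (the real cost depends on how the copy loop is written), one 4096-byte image eats about 30% of a frame, and two of them more than half, before any game logic runs.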

Note that the video shows animations of two 64x64 sprites floating.



Posted: Mon Oct 03, 2022 6:35 pm
by svenvandevelde


On 10/3/2022 at 8:28 PM, ZeroByte said:




Alpha channels would be cool.



Yeah, but if you think about it, it's not really an option. Alpha channels in a 16-color palette are hard. I mean, it would only make sense in 8bpp, right?

And then you are at full sprite resolution with full bit depth, so a 64x64 sprite would be 4096 bytes.

That is one sprite image, to be clear, so 0x1000 bytes. If I want to animate 16 of those sprites, I would need 0x10000 or 65536 bytes in VRAM lol.

That being said, alpha channels would be great to have for fire effects like bullets, lighting, lasers etc. For those bitmaps, which are naturally smaller, this would be a great feature to have.

The CX16 indeed has its limitations and we live with them. It is just a matter of finding the limits of the machine and using them optimally. That is the challenge, and it has been a real journey so far for me ...



Posted: Mon Oct 03, 2022 6:46 pm
by ZeroByte

FWIW, using multiple sprites is pretty low overhead in the way I did it for Sonic Demo:

.struct SPRITEREG
    addr        .word    1
    xpos        .word    1
    ypos        .word    1
    orient      .byte    1
    attr        .byte    1
.endstruct

sonic_spriteregs:    ; shadow registers for Sonic sprites (initial values)
        ;sonic's body
        .byte $10, $08, <sonic_x, >sonic_x, <(sonic_y+8), >(sonic_y+8), $0c, $a0
        ;sonic's ears
        .byte $00, $08, <sonic_x, >sonic_x, <sonic_y, >sonic_y, $0c, $20

sonic_frames:    ; VRAM locations for frames of Sonic's animation (SPREG format)
    .word $0810, $0820, $0830, $0840    ; body frames
    .word $0800, $0804, $0800, $0804    ; ears frames

animate_sonic:
            lda sonicframe
            inc
            and #3  ; sonic frame = 0..3
            sta sonicframe
            asl  ; use frame as X index (*2 because data stored as words, not separate HiByte / LoByte tables)
            tax
            lda sonic_frames,x  ; sonic body address LoByte
            sta sonic_spriteregs + SPRITEREG::addr
            lda sonic_frames + 8,x  ; sonic ears address LoByte
            sta sonic_spriteregs + 8 + SPRITEREG::addr
            lda sonic_frames+1,x  ; sonic body address HiByte
            sta sonic_spriteregs + 1 + SPRITEREG::addr
            lda sonic_frames+9,x  ; sonic ears address HiByte
            sta sonic_spriteregs + 9 + SPRITEREG::addr
            lda dirty
            ora #DIRTY_SPRITE
            sta dirty    ; flag sprite address as dirty so VBLANK IRQ will update VERA
            rts

A similar approach in C would use something akin to this:

uint16_t sonic_frames[2][4] = { {0x810, 0x820, 0x830, 0x840} , {0x800, 0x804, 0x800, 0x804} };
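
Building on that table, a minimal C sketch of the same frame-advance logic might look like the following. It mirrors the assembly routine in this post; the struct layout follows SPRITEREG, while the DIRTY_SPRITE value and the variable declarations are assumptions, since they are not shown in the post.

```c
#include <stdint.h>

#define DIRTY_SPRITE 0x01  /* assumed flag value; the real one is not shown */

/* Row 0: body frames, row 1: ears frames (values from the post). */
const uint16_t sonic_frames[2][4] = {
    {0x810, 0x820, 0x830, 0x840},
    {0x800, 0x804, 0x800, 0x804}
};

/* Shadow copy of one VERA sprite attribute entry (same fields as SPRITEREG). */
typedef struct {
    uint16_t addr;
    uint16_t xpos;
    uint16_t ypos;
    uint8_t  orient;
    uint8_t  attr;
} spritereg_t;

spritereg_t sonic_spriteregs[2];  /* [0] = body, [1] = ears */
uint8_t sonicframe;
uint8_t dirty;

/* Advance to the next animation frame and point both shadow sprites
   at the matching image. Only two addresses change; no pixels move. */
void animate_sonic(void) {
    sonicframe = (uint8_t)((sonicframe + 1) & 3);   /* frame = 0..3 */
    sonic_spriteregs[0].addr = sonic_frames[0][sonicframe];
    sonic_spriteregs[1].addr = sonic_frames[1][sonicframe];
    dirty |= DIRTY_SPRITE;  /* VBLANK handler would copy the shadows to VERA */
}
```

The per-frame cost stays at a handful of stores regardless of how large the sprite images are, because all frames already sit in VRAM.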



Again, not saying "do this, n00b" - just sharing what I've done in case anyone else finds it useful or informative.



Posted: Mon Oct 03, 2022 6:50 pm
by Johan Kårlin


On 10/3/2022 at 2:37 PM, Wavicle said:




I will think on this. I think calculating row address with 48 pixels isn't too bad. E.g.:



row = y_counter - sprite_y_start;
if (mode) begin
    // 8bpp
    line_addr = sprite_offset + (row << 6) - (row << 4); // row * 64 - row * 16
end
else begin
    // 4bpp
    line_addr = sprite_offset + (row << 5) - (row << 3); // row * 32 - row * 8
end



Something along those lines should work. I probably need to wake up a bit more and reality check this against the Verilog. Another concern is breaking any existing software that uses 64x64 sprites.



Did you have a chance to look more into this after waking up? If the calculations aren't too complicated, I think 48-pixel sprites are a good suggestion. Breaking existing software is not a problem as I see it. We all know we write software for a prototype.
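
For what it's worth, the shift-and-subtract arithmetic in the quoted snippet is easy to sanity check in software. The C sketch below is only such a check, with made-up function names; it says nothing about timing or resource use in the actual Verilog.

```c
#include <stdint.h>

/* Byte offset of a sprite row for a hypothetical 48-pixel-wide sprite.
   mode_8bpp != 0: 48 bytes per row; otherwise 4bpp: 24 bytes per row. */
uint16_t row_offset_48(uint8_t row, uint8_t mode_8bpp) {
    uint16_t r = row;
    if (mode_8bpp) {
        return (uint16_t)((r << 6) - (r << 4)); /* row*64 - row*16 = row*48 */
    }
    return (uint16_t)((r << 5) - (r << 3));     /* row*32 - row*8  = row*24 */
}

/* Returns 1 if the shift form matches plain multiplication for all 48 rows. */
int row_offsets_match(void) {
    for (uint8_t row = 0; row < 48; row++) {
        if (row_offset_48(row, 1) != row * 48) return 0;
        if (row_offset_48(row, 0) != row * 24) return 0;
    }
    return 1;
}
```

So the row address needs two shifts and one subtraction instead of the single shift that a power-of-two width allows.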



Posted: Mon Oct 03, 2022 6:52 pm
by ZeroByte


On 10/3/2022 at 1:30 PM, svenvandevelde said:




One more note: since the CX16 doesn't have DMA, every memory move is down to the processor. Moving memory is fast, but moving 4096 bytes every frame is not an option for animations (so I cannot use 64x64 in 8bpp), bottom line.



Note that the video shows animations of two 64x64 sprites floating.



So you're not animating by having all frames in VRAM and updating indexes, but by giving each object a VRAM allocation and each one updates its pixel data directly? That definitely would consume lots of CPU at scale w/o a DMA chip. That's how Sega did Sonic's animation, but the Genesis has DMA....



Posted: Mon Oct 03, 2022 6:58 pm
by svenvandevelde

Just tested alpha channel in 16 colors in aseprite. It looks kinda ok like this ... the palette can deal with it.

[Attached image: Alpha.gif]



Posted: Mon Oct 03, 2022 7:00 pm
by svenvandevelde


On 10/3/2022 at 8:50 PM, Johan Kårlin said:




Did you have a chance to look more into this after waking up? If the calculations aren't too complicated, I think 48-pixel sprites are a good suggestion. Breaking existing software is not a problem as I see it. We all know we write software for a prototype.



 

Thank you.



Posted: Mon Oct 03, 2022 7:08 pm
by svenvandevelde


On 10/3/2022 at 8:52 PM, ZeroByte said:




So you're not animating by having all frames in VRAM and updating indexes, but by giving each object a VRAM allocation and each one updates its pixel data directly? That definitely would consume lots of CPU at scale w/o a DMA chip. That's how Sega did Sonic's animation, but the Genesis has DMA....



I have 2 things implemented in this algorithm: a "least recently used" (LRU) cache, which tracks which VERA image was least recently used, and a heap manager, which dynamically allocates the images in VRAM and hands out an index (handle) for each. So, when the drawing engine wants to draw an image of a sprite, it first checks whether this sprite image is already in the LRU cache.

If it is in the LRU cache, it just re-uses the image already in VRAM.

If it is not in the LRU cache, it loops until the required image can be put into VRAM. How?

Each pass takes the last entry in the LRU cache (the least recently used image), deletes it from the cache and then frees its image from VRAM.

Then it tries to best-fit the new image in VRAM (it checks if there is space for it). If that best-fit search fails (because the space freed by the evicted image is too small), it evicts the next least recently used image from the cache and frees its VRAM as well.

This continues until the heap manager can best-fit the image in VRAM. The image is then copied into VRAM and added to the LRU cache as the most recently used image.
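
The steps above can be tried out in isolation with a toy model. The sketch below is not the engine's code: every name in it is hypothetical, the cache has only 4 slots, and VRAM is reduced to a simple byte budget, so fragmentation and best-fit placement are deliberately not modelled.

```c
#include <stdint.h>

#define CACHE_SLOTS 4       /* assumed tiny cache, for illustration only */
#define VRAM_BUDGET 4096u   /* assumed byte budget standing in for the VRAM heap */
#define NO_IMAGE    0xFFFFu

typedef struct {
    uint16_t image;      /* image index, NO_IMAGE if the slot is empty */
    uint16_t size;       /* bytes this image occupies in "VRAM" */
    uint32_t last_used;  /* tick of the most recent use */
} slot_t;

slot_t   slots[CACHE_SLOTS];
uint32_t tick;
uint32_t vram_free;
uint16_t uploads;        /* counts simulated BRAM -> VRAM copies */

void cache_init(void) {
    for (int i = 0; i < CACHE_SLOTS; i++) slots[i].image = NO_IMAGE;
    tick = 0;
    vram_free = VRAM_BUDGET;
    uploads = 0;
}

static int find_image(uint16_t image) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (slots[i].image == image) return i;
    return -1;
}

static int find_free_slot(void) { return find_image(NO_IMAGE); }

/* Index of the occupied slot with the oldest use, or -1 if all are empty. */
static int find_lru(void) {
    int lru = -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (slots[i].image == NO_IMAGE) continue;
        if (lru < 0 || slots[i].last_used < slots[lru].last_used) lru = i;
    }
    return lru;
}

/* Returns the slot holding the image, or -1 if it can never fit. */
int cache_use(uint16_t image, uint16_t size) {
    int s = find_image(image);
    if (s >= 0) {                      /* cache hit: just refresh the age */
        slots[s].last_used = ++tick;
        return s;
    }
    if (size > VRAM_BUDGET) return -1;
    /* Cache miss: evict least recently used images until there is
       both a free cache slot and enough free space. */
    while (find_free_slot() < 0 || vram_free < size) {
        int victim = find_lru();
        vram_free += slots[victim].size;
        slots[victim].image = NO_IMAGE;
    }
    s = find_free_slot();
    slots[s].image = image;
    slots[s].size = size;
    slots[s].last_used = ++tick;
    vram_free -= size;
    uploads++;                         /* stands in for the copy into VRAM */
    return s;
}
```

For example, after cache_init(), using images 1 and 2 at 2048 bytes each fills the budget; using image 1 again is a hit, and using a third 2048-byte image then evicts image 2, because image 1 was touched more recently.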

Images are copied from BRAM into VRAM using, indeed, some sort of copy function, which I worked very hard on to get optimal. I still need to work on this copy module.

See below the core LRU cache logic for managing the images.


vera_sprite_image_offset sprite_image_cache_vram(fe_sprite_index_t fe_sprite_index, unsigned char fe_sprite_image_index) {

    // check if the image in vram is in use where the fe_sprite_vram_image_index is pointing to.
    // if this vram_image_used is false, that means that the image in vram is not in use anymore (not displayed or destroyed).

    unsigned int image_index = sprite_cache.offset[fe_sprite_index] + fe_sprite_image_index;

    // We retrieve the image from BRAM from the sprite_control bank.
    // TODO: what if there are more sprite control data than that can fit into one CX16 bank?
    bank_push_set_bram(fe.bram_sprite_control);
    heap_bram_fb_handle_t handle_bram = sprite_bram_handles[image_index];
    bank_pull_bram();

    // We declare temporary variables for the vram memory handles.
    lru_cache_data_t vram_handle;
    vram_bank_t vram_bank;
    vram_offset_t vram_offset;

    // We check if there is a cache hit?
    lru_cache_index_t vram_index = lru_cache_index(&sprite_cache_vram, image_index);
    lru_cache_data_t lru_cache_data;
    vera_sprite_image_offset sprite_offset;
    if (vram_index != 0xFF) {

        // So we have a cache hit, so we can re-use the same image from the cache and we win time!
        vram_handle = lru_cache_get(&sprite_cache_vram, vram_index);
        vram_bank = vera_heap_data_get_bank(VERA_HEAP_SEGMENT_SPRITES, vram_handle);
        vram_offset = vera_heap_data_get_offset(VERA_HEAP_SEGMENT_SPRITES, vram_handle);

        sprite_offset = vera_sprite_get_image_offset(vram_bank, vram_offset);
    } else {

        // The idea of this section is to free up lru_cache and/or vram memory until there is sufficient space available.
        // The size requested contains the required size to be allocated on vram.
        vera_heap_size_int_t vram_size_required = sprite_cache.size[fe_sprite_index];

        // We check if the vram heap has sufficient memory available for the size requested.
        // We also check if the lru cache has sufficient elements left to contain the new sprite image.
        bool vram_has_free = vera_heap_has_free(VERA_HEAP_SEGMENT_SPRITES, vram_size_required);
        bool lru_cache_not_free = lru_cache_max(&sprite_cache_vram);

        // Free up the lru_cache and vram memory until the requested size is available!
        // This ensures that vram has sufficient place to allocate the new sprite image.
        while(lru_cache_not_free || !vram_has_free) {

            // If the cache is at its maximum, before we can add a new element, we must remove the least used image.
            // We search for the least used image in vram.
            lru_cache_key_t vram_last = lru_cache_last(&sprite_cache_vram);

            // We delete the least used image from the vram cache, and this function returns the stored vram handle obtained by the vram heap manager.
            vram_handle = lru_cache_delete(&sprite_cache_vram, vram_last);
            if(vram_handle==0xFFFF) {
                gotoxy(0,59);
                printf("error! vram_handle is nothing!");
            }

            // And we free the vram heap with the vram handle that we received.
            // But before we can free the heap, we must first convert back from the sprite offset to the vram address.
            // And then to a valid vram handle :-).
            vera_heap_free(VERA_HEAP_SEGMENT_SPRITES, vram_handle);

            // Re-check both loop conditions after the eviction. Without refreshing lru_cache_not_free,
            // a cache that started out full would keep the loop running forever
            // (assuming lru_cache_max() reports whether the cache is full).
            vram_has_free = vera_heap_has_free(VERA_HEAP_SEGMENT_SPRITES, vram_size_required);
            lru_cache_not_free = lru_cache_max(&sprite_cache_vram);
        }

        // Now that we are sure that there is sufficient space in vram and on the cache, we allocate a new element.
        // Dynamic allocation of sprites in vera vram.
        vram_handle = vera_heap_alloc(VERA_HEAP_SEGMENT_SPRITES, (unsigned long)sprite_cache.size[fe_sprite_index]);
        vram_bank = vera_heap_data_get_bank(VERA_HEAP_SEGMENT_SPRITES, vram_handle);
        vram_offset = vera_heap_data_get_offset(VERA_HEAP_SEGMENT_SPRITES, vram_handle);

        memcpy_vram_bram(vram_bank, vram_offset, heap_bram_fb_bank_get(handle_bram), (bram_ptr_t)heap_bram_fb_ptr_get(handle_bram), sprite_cache.size[fe_sprite_index]);

        sprite_offset = vera_sprite_get_image_offset(vram_bank, vram_offset);
        lru_cache_insert(&sprite_cache_vram, image_index, vram_handle);
    }

    // We return the image offset in vram of the sprite to be drawn.
    // This offset is used by the vera image set offset function to directly change the image displayed of the sprite!
    return sprite_offset;
}