New demo uploaded: STNICCC Commander X16 Demo Remake

Jeffrey
Posts: 62
Joined: Fri Feb 19, 2021 9:46 am

This is the release of a STNICCC Demo Remake for the Commander X16!

I have been (silently) working on this for the last couple of weeks/months. It is time to release it :). Let's just say: the Commander X16 is far more powerful than I had thought!

Here is a video of it running:





Enjoy!

Regards,

Jeffrey

 

---

 

PS. There was an earlier attempt to remake this demo on the X16 (done by Oziphanto on YouTube). Oziphanto did a very nice comparison video of the X16 with several other machines of the 8-bit and 16-bit era:





He also re-created this demo, but (in my opinion) did not get everything out of the X16 that it can deliver: his demo ran in 2:32. The remake I made does it in 1:39!

His benchmark comparison should therefore be updated:

lap_times.png

Keep in mind the Commander X16 only has:

    - An 8-bit 6502 CPU (8 MHz)

    - No DMA

    - No Blitter

Yet it keeps up with 16-bit machines like the Amiga! (Actually, it's even faster right now.)

---

Extra notes:

- This only works on the x16 emulator with 2MB of RAM

- It uses the original data (but it's split into 8 KB blocks, so it can fit into banked RAM)

- Waaaayyy too much time was spent on the core loop to make it perform *this* fast!

- My estimate is that it can be improved by another 10-15 seconds (I have a design ready, but it requires a rewrite of the core loop)

- It uses a "stream" of audio-file data and produces 24 kHz mono sound (this will not work on the real X16, since loading the files that fast is a feature of the emulator only)

Here is a version without audio (so this should run on a real x16):





And it runs even faster (1:36.90)






 
DrTypo
Posts: 18
Joined: Sat Apr 10, 2021 9:13 pm

Very nice!

I have a few questions:

Which graphics mode do you use: 16c or 256c? The 16c mode requires less memory and bandwidth, but having 2 pixels packed per byte is a bit of a PITA to handle.

I guess the VERA auto-increment feature and the chunky pixels help in outperforming the Atari's bitplane mode?

mobluse
Posts: 175
Joined: Tue Aug 04, 2020 2:16 pm
Location: Lund, Sweden

I got it to run on my computer with dual core x86-64 and Windows 10 using: 

..\x16emu_win-r38\x16emu.exe -ram 2048 -run -prg STNICCC.PRG

 

Unfortunately my computer is not that fast, so it only ran at 50% speed.

 

It said the time was 1:39.96, but it was longer in reality.

Jeffrey
Posts: 62
Joined: Fri Feb 19, 2021 9:46 am

28 minutes ago, DrTypo said:

Very nice!

I have a few questions:

Which graphics mode do you use: 16c or 256c? The 16c mode requires less memory and bandwidth, but having 2 pixels packed per byte is a bit of a PITA to handle.

I guess the VERA auto-increment feature and the chunky pixels help in outperforming the Atari's bitplane mode?



This version uses 16 colors (4 bits per pixel). The 8-bit (256c) version is a little slower, but not by much: the cost of packing 2 pixels per byte is quite high, and in 256c mode you can "re-use" colors (by "rotating" the palette), so you don't have to clear the screen as often.
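To illustrate why packing 2 pixels per byte costs extra work, here is a minimal Python sketch (not the demo's 6502 code). It assumes the left pixel lives in the high nibble; `pack_pixels` and `set_pixel` are hypothetical helper names used only for this illustration.

```python
def pack_pixels(left: int, right: int) -> int:
    """Pack two 4-bit color indices into one byte.

    Assumption: the left pixel is stored in the high nibble.
    """
    if not (0 <= left <= 15 and 0 <= right <= 15):
        raise ValueError("4bpp color indices must be in 0..15")
    return (left << 4) | right


def set_pixel(row: bytearray, x: int, color: int) -> None:
    """Write one pixel into a packed row without disturbing its neighbour."""
    index = x >> 1                      # two pixels share each byte
    color &= 0x0F
    if x & 1:                           # odd x: replace the low nibble
        row[index] = (row[index] & 0xF0) | color
    else:                               # even x: replace the high nibble
        row[index] = (row[index] & 0x0F) | (color << 4)


row = bytearray(4)                      # 8 pixels, all color 0
set_pixel(row, 2, 0xA)
set_pixel(row, 3, 0x5)
```

Every single-pixel write is a read-modify-write, whereas in 256c mode a pixel is simply one byte store.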

Attached is my (technical) design for the version I just released. It probably requires extra explanation, but it might give you an idea about the structure of the polygon drawing routine.

BTW, as Oziphanto explained: this demo (the original and this one) is drawing 2D polygons. It's not really doing any real 3D math. So the demo shows how fast you can draw on a machine, not how fast you can do 3D math.

stniccc x16-Inner loop v1.png

Jeffrey
Posts: 62
Joined: Fri Feb 19, 2021 9:46 am

6 minutes ago, mobluse said:

I got it to run on my computer with dual core x86-64 and Windows 10 using:

..\x16emu_win-r38\x16emu.exe -ram 2048 -run -prg STNICCC.PRG

Unfortunately my computer is not that fast, so it only ran at 50% speed.

It said the time was 1:39.96, but it was longer in reality.



Yeah. That time was recorded by hand and hardcoded in the demo.

It might help if you run it again: the loading of the audio files can slow it down too. If you run it again, the files are probably in cache.

mobluse
Posts: 175
Joined: Tue Aug 04, 2020 2:16 pm
Location: Lund, Sweden

When do you start measuring the time? Is it from the start of the program or, e.g., from when OXYGENE is shown?

It is a bit faster on the second run: about 68% of full speed on my computer.

desertfish
Posts: 1096
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands

When not streaming audio, is the streaming of the polygon draw data realistic to run this fast on actual hardware?

Amazing results btw.

Jeffrey
Posts: 62
Joined: Fri Feb 19, 2021 9:46 am

1 hour ago, desertfish said:

When not streaming audio, is the streaming of the polygon draw data realistic to run this fast on actual hardware?

Amazing results btw.



Yes. When audio is turned off, there is no "streaming" going on during the drawing of the polygons. A time of 1:36.90 is achievable on real hardware. In fact (if I can find the time to implement the newer version) I believe a time of 1:30 (and probably 1:25) is possible. That's 20 fps!
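The frame-rate arithmetic can be checked with a small Python sketch. The frame count is not stated in this thread; it is inferred here from the claim that 1:30 corresponds to 20 fps, and the no-audio time is read as 96.90 seconds. `to_seconds` and `fps` are hypothetical helper names.

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'm:ss' or 'm:ss.hh' run time to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Assumption: inferred from "1:30 ... That's 20 fps", not quoted from the data files.
FRAMES = round(to_seconds("1:30") * 20)


def fps(stamp: str) -> float:
    """Average frames per second for a given total run time."""
    return FRAMES / to_seconds(stamp)


# Run times mentioned in this thread.
for stamp in ("2:32", "1:39", "1:36.90", "1:30", "1:25"):
    print(f"{stamp:>8}  {fps(stamp):5.2f} fps")
```

Under that assumption, the run times mentioned in this thread can be compared directly as average frame rates.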

In essence (when audio is off): the scene files are loaded at the beginning (into banked RAM). The loading right now still uses the KERNAL LOAD function and loads from the host (not from the simulated SD card), so that part is faster than on real hardware. This loader can, however, be replaced by an SD loader when you put it on a real X16. I didn't bother to do this (yet). But that's just initial loading time.

The playback is started when all polygon data (640 KB) is loaded into RAM. After that there is no loading going on. Also, I do not "touch" or "prep" the data before starting the playback (that's in the spirit of the competition).
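As a rough sketch of what holding the data in 8 KB banks implies, the Python below maps a byte offset in the scene data to a bank number and a CPU address inside the X16's banked-RAM window at $A000-$BFFF. The starting bank (`FIRST_DATA_BANK`) and treating "640kb" as exactly 640 * 1024 bytes are assumptions for illustration; the demo's actual layout is not described here.

```python
BANK_SIZE = 8 * 1024          # one RAM bank, as in the 8 KB blocks mentioned earlier
WINDOW_BASE = 0xA000          # banked RAM is visible to the CPU at $A000-$BFFF
FIRST_DATA_BANK = 1           # assumption: scene data starts in bank 1
SCENE_BYTES = 640 * 1024      # assumption: "640kb" taken as 640 * 1024 bytes


def locate(offset: int) -> tuple:
    """Return (bank, cpu_address) for a byte offset into the scene data."""
    bank, within = divmod(offset, BANK_SIZE)
    return FIRST_DATA_BANK + bank, WINDOW_BASE + within


# Round up so a partially filled last block still gets its own bank.
banks_needed = (SCENE_BYTES + BANK_SIZE - 1) // BANK_SIZE
```

With these numbers the data occupies 80 banks, which is more than a 512 KB machine offers and is consistent with the 2 MB requirement in the release notes.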

More info about the original scene files and competition can be found here: http://arsantica-online.com/st-niccc-competition/

It's actually pretty crazy how much work the 6502 can do when you really keep improving your design: my first version was around 4 minutes ;). Now it does it so much faster. I can probably make an (instructive) video about the process I went through.

Specifics:

The auto-incrementer from VERA helps quite a lot to speed up the process: it takes only 2 cycles per pixel (= 1 "STA VERA_data0" per 2 pixels) to blit a horizontal line to the buffer. This is where the X16 is faster than other platforms.

Of course: packing 2 pixels into 1 byte (and unsetting/setting the incrementer in the meantime) is slower than on other platforms (the setup for VERA takes quite some time). This is why (on the X16) there isn't much time difference between 8-bit pixels and 4-bit pixels.
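The trade-off can be sketched in Python (a model of the idea, not the demo's routine). A horizontal span in 4-bit mode splits into whole bytes, each of which would be one plain store, and at most two half-byte edges that need a read-modify-write. `fill_span` is a hypothetical name, and the left pixel is assumed to sit in the high nibble.

```python
def fill_span(row: bytearray, x0: int, x1: int, color: int) -> tuple:
    """Fill pixels x0..x1-1 of a packed 4bpp row.

    Returns (full_byte_stores, partial_writes) so the cheap and the
    expensive parts of the span can be compared.
    """
    color &= 0x0F
    both = (color << 4) | color         # same color in both nibbles
    full = partial = 0
    x = x0
    if x < x1 and x & 1:                # span starts on a right-hand pixel
        row[x >> 1] = (row[x >> 1] & 0xF0) | color
        partial += 1
        x += 1
    while x + 1 < x1:                   # whole bytes: one store per 2 pixels
        row[x >> 1] = both
        full += 1
        x += 2
    if x < x1:                          # one left-hand pixel remains
        row[x >> 1] = (row[x >> 1] & 0x0F) | (color << 4)
        partial += 1
    return full, partial


row = bytearray(8)                      # 16 pixels
counts = fill_span(row, 3, 12, 0x7)     # 9 pixels: 1 edge + 4 whole bytes
```

For long spans the whole-byte stores dominate and the 4-bit mode wins; for short spans the edge handling takes a larger share, which fits the observation that the two modes end up close in total time.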

As a sidenote: I "shrink/crop" the screen to 256x200 pixels. All polygon coordinates (x and y) are between 0 and 255 and fit nicely in a byte. This suits an 8-bit CPU very well. But VERA still uses a 320-pixel-wide screen buffer (even if you only see 256 pixels horizontally), so determining the VRAM address for a given x and y is not very "elegant". Lots of work is done to mitigate the problems that arose from that. In this version I use several lookup tables.
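A minimal Python sketch of the lookup-table idea follows. It assumes 4-bit pixels (so a 320-pixel row is 160 bytes) and a buffer starting at VRAM address 0; the demo's real tables are not shown in this thread, and `ROW_LO`, `ROW_HI` and `vram_address` are hypothetical names. Because 160 is not a power of two, the row start cannot be formed by just using y as the high byte of the address, so it is precomputed per row and split into low and high bytes the way an 8-bit CPU would index it.

```python
BUFFER_WIDTH = 320                      # pixels per row in the screen buffer
HEIGHT = 200                            # visible rows
BYTES_PER_ROW = BUFFER_WIDTH // 2       # assumption: 4bpp, 2 pixels per byte

# One table entry per row, split into low and high address bytes.
ROW_LO = bytes((y * BYTES_PER_ROW) & 0xFF for y in range(HEIGHT))
ROW_HI = bytes((y * BYTES_PER_ROW) >> 8 for y in range(HEIGHT))


def vram_address(x: int, y: int) -> int:
    """Byte address of pixel (x, y), with x in 0..255 and y in 0..199.

    Assumption: the buffer starts at VRAM address 0.
    """
    row_start = (ROW_HI[y] << 8) | ROW_LO[y]
    return row_start + (x >> 1)
```

The two 200-byte tables replace a multiplication by 160 with two indexed loads and an addition.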

Below is my new design of the core loop, btw. It's really nuts. It requires many variants of (slightly) different code, has very intricate jump tables (with 64k entries!), switches banks constantly, and uses both VERA data ports (in two different ways). But it should be quite a lot faster, using everything the X16 has got where it helps.

stniccc x16-Inner loop v2 (advanced).png

Edit: the forum degrades the diagram picture for me. I don't understand why it does that.

DrTypo
Posts: 18
Joined: Sat Apr 10, 2021 9:13 pm

On my computer (Core i7-8700K), on the second execution it does take 1:39 from OXYGENE to the end.

On the first execution there are slowdowns.

Yes, it would have been nice to have an actual 256-pixel-wide screen mode on VERA.

desertfish
Posts: 1096
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands

This is impressive, to say the least.