Ideas about a video player for X16
Posted: Fri Jan 15, 2021 2:44 pm
I had the XDC player (or '8088 Domination') in mind when writing this topic, in case that isn't clear. I'd like to discuss how good the X16/VERA could be at video playback. First, let's compare the X16 with an IBM PC: the 8088 needs more cycles to execute an instruction, but the 6502 does much less per instruction, and it's 8-bit. Therefore I don't quite agree with the idea that the X16 would totally outperform a PC in general tasks. However, when it comes down to video playback, it's all about clock speed and cycle count. An LDA-STA-DEX-BNE loop (which acts like the REP MOVSB instruction widely used in XDC) costs 11 to 13 clock cycles per byte, depending on which addressing mode the LDA uses (as explained below). At 8MHz this gives us a transfer rate of around 615KB/s (8,000,000 / 13) to 727KB/s (8,000,000 / 11). Of course, this is the ideal case; in reality we'll also be dealing with audio, resetting counters and so on. Still, I'd argue that we will be able to get over 480KB/s most of the time, which is double the limit of XDC. Since the author of XDC said that graphics I/O is the ultimate bottleneck of the player (the MFM disk could only do 90KB/s, but this is easily solved by using an XT-IDE card instead), we can expect the X16 to do much better, perhaps even twice as smooth.
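The cycle arithmetic above can be checked with a few lines of Python. This is only a sketch under stated assumptions: standard 6502/65C02 cycle counts, an 8MHz clock, a taken branch on every iteration, no page-crossing penalty, no time spent on audio or setup, and "KB" meaning 1,000 bytes to match the figures in the post. The instruction names are just dictionary keys, not an assembler.

```python
# Rough throughput estimate for the LDA-STA-DEX-BNE copy loop.
# Assumptions: 8 MHz CPU, branch always taken, no page-crossing penalties,
# no audio or loop-setup overhead. "KB" here means 1,000 bytes.
CPU_HZ = 8_000_000

# Cycle counts per instruction, keyed by addressing mode.
CYCLES = {
    "LDA #imm": 2,
    "LDA abs,X": 4,   # +1 if the index crosses a page boundary (ignored here)
    "STA abs": 4,     # store to the VERA data port
    "DEX": 2,
    "BNE taken": 3,
}

def loop_cycles(*instrs: str) -> int:
    """Total cycles for one iteration of a loop built from these instructions."""
    return sum(CYCLES[i] for i in instrs)

def bytes_per_second(cycles_per_byte: int, hz: int = CPU_HZ) -> float:
    """Bytes moved per second if each byte costs `cycles_per_byte` cycles."""
    return hz / cycles_per_byte

copy_abs = loop_cycles("LDA abs,X", "STA abs", "DEX", "BNE taken")  # 13
copy_imm = loop_cycles("LDA #imm", "STA abs", "DEX", "BNE taken")   # 11

for name, c in [("LDA abs,X copy", copy_abs), ("LDA #imm copy", copy_imm)]:
    print(f"{name}: {c} cycles/byte -> {bytes_per_second(c) / 1000:.0f} KB/s")
# LDA abs,X copy: 13 cycles/byte -> 615 KB/s
# LDA #imm copy: 11 cycles/byte -> 727 KB/s
```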
Second, the X16 can show 256 colors out of a 4096-color palette, which means we don't need dithering to achieve the same effect as XDC. Without dithering the image has chunkier data (long runs of the same color), which is good for run-length compression. Plus, for each run we only need an STA-DEX-BNE loop, similar to a REP STOSB instruction. It takes only 9 clock cycles per byte (4 + 2 + 3), which gives us close to 900KB/s (8,000,000 / 9 ≈ 889KB/s) of raw fill speed!
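To illustrate the run-length idea, here is a rough Python model that encodes an undithered row into (value, length) runs and estimates how long an STA-DEX-BNE fill would take to draw them. The run cap of 255 (so the length fits in the X register) and the RUN_OVERHEAD of 20 cycles per run are hypothetical placeholders, not measured figures; the real per-run cost depends on how the player reads the stream.

```python
from itertools import groupby

FILL_CYCLES_PER_BYTE = 9  # STA abs (4) + DEX (2) + BNE taken (3)
RUN_OVERHEAD = 20         # hypothetical: cycles to fetch a (value, length) pair and load A/X

def rle_encode(pixels, max_run=255):
    """Encode 8-bit pixel values as (value, length) runs.
    Runs longer than max_run are split so each length fits in the X register."""
    runs = []
    for value, group in groupby(pixels):
        n = sum(1 for _ in group)
        while n > 0:
            chunk = min(n, max_run)
            runs.append((value, chunk))
            n -= chunk
    return runs

def rle_decode(runs):
    """Expand (value, length) runs back into a flat list of pixels."""
    out = []
    for value, n in runs:
        out.extend([value] * n)
    return out

def fill_cycles(runs):
    """Rough cycle estimate for drawing the runs with an STA-DEX-BNE loop
    (ignores the one cycle saved when the last BNE falls through)."""
    return sum(RUN_OVERHEAD + n * FILL_CYCLES_PER_BYTE for _, n in runs)

# One 320-pixel row of an undithered frame: three flat color areas.
row = [3] * 100 + [7] * 60 + [3] * 160
runs = rle_encode(row)
print(runs)               # [(3, 100), (7, 60), (3, 160)]
print(fill_cycles(runs))  # 2940
assert rle_decode(runs) == row
```

Three runs cost 3 × 20 cycles of (assumed) overhead plus 320 × 9 cycles of filling, so for large flat areas the fill loop dominates and the per-run overhead matters little.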
What makes things even better? The VERA chip! Its address auto-increment feature lets us use dithering AND still have chunky data (keeping the benefits of run-length compression and the higher transfer rate). This works because ordered dithering turns a single color into a repeating pattern, typically 8 pixels wide (e.g. with an 8x8 Bayer matrix). We can then set the auto-increment value to 8 and fill pixels 0, 8, 16... in the first run, pixels 1, 9, 17... in the second run, and so on. Since every pixel in one of these passes sits at the same position in the dither pattern, a flat area of dithered color becomes a single run again. By doing so we can achieve good color, a high compression ratio and smooth playback at the same time.
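A small Python experiment shows why the stride trick helps. It assumes ordered dithering along a single row using the first row of the standard 8x8 Bayer matrix, and a flat area whose color lies 30% of the way between two palette entries (the indices 16 and 17 are arbitrary). Read in normal order, the dithered row changes value at every pixel; split into the eight stride-8 sequences that an auto-increment of 8 would write, each sequence is one run.

```python
from itertools import groupby

# First row of the standard 8x8 Bayer matrix; thresholds are value / 64.
BAYER8_ROW0 = [0, 32, 8, 40, 2, 34, 10, 42]

def dither_row(levels, dark, light):
    """Ordered-dither one row. Each level in [0, 1) says how far the wanted
    color lies between palette entries `dark` and `light`; a pixel becomes
    `light` when its level exceeds the threshold for column x mod 8."""
    return [light if lvl * 64 > BAYER8_ROW0[x % 8] else dark
            for x, lvl in enumerate(levels)]

def count_runs(seq):
    """Number of runs of identical values (what RLE would have to store)."""
    return sum(1 for _ in groupby(seq))

def stride_lanes(row, stride=8):
    """Split a row into the sequences written with auto-increment `stride`:
    lane k holds pixels k, k + stride, k + 2*stride, ..."""
    return [row[k::stride] for k in range(stride)]

row = dither_row([0.3] * 320, dark=16, light=17)
print(count_runs(row))                                      # 320
print(sum(count_runs(lane) for lane in stride_lanes(row)))  # 8
```

Only one row is modeled here; in a full 2D ordered dither the next screen row would use the next Bayer row, but the same stride split applies to each row.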
There are two ways to play the video. In the first, the player reads the video file and copies it to VRAM, which means using LDA addr,X (at least 4 cycles). In the second, the player simply JSRs into the video and 'executes' it, i.e. the video itself is machine code, so it can use LDA #value instead (a 2-cycle instruction). Despite being a bit slower, I believe the first is the better solution: users will probably share videos online (likely on this forum), and the second method has essentially no way to protect the machine from a 'Trojan video'. You definitely wouldn't want your movie to erase all the files on your SD card, would you?
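The safety argument for the first approach can be sketched as a data-only interpreter: whatever bytes a video file contains, the player only ever turns them into pixel writes inside a bounded VRAM area, and anything malformed is rejected. The Python below models that idea with a made-up three-opcode stream format (OP_FILL, OP_COPY and OP_END are hypothetical, not an existing X16 format); a real player would be 6502 code writing through VERA's data port.

```python
# Hypothetical stream format (illustration only):
#   0x01 n v        -> fill n pixels with value v
#   0x02 n b1..bn   -> copy n literal pixel bytes
#   0x00            -> end of frame
# n is 1..255 so it fits in an 8-bit register.
OP_END, OP_FILL, OP_COPY = 0x00, 0x01, 0x02

class BadStream(ValueError):
    """Raised when the stream is malformed or would write out of bounds."""

def play_frame(stream: bytes, vram: bytearray, start: int = 0) -> int:
    """Interpret one frame of `stream` into `vram` and return the number of
    stream bytes consumed. The stream can only write pixel bytes inside
    `vram`; anything else raises BadStream instead of running."""
    pos, addr = 0, start
    while True:
        if pos >= len(stream):
            raise BadStream("stream ended without an end-of-frame marker")
        op = stream[pos]
        if op == OP_END:
            return pos + 1
        if op not in (OP_FILL, OP_COPY) or pos + 1 >= len(stream):
            raise BadStream(f"bad opcode or truncated header at offset {pos}")
        n = stream[pos + 1]
        if n == 0 or addr + n > len(vram):
            raise BadStream(f"write of {n} bytes at {addr} is out of bounds")
        if op == OP_FILL:
            if pos + 2 >= len(stream):
                raise BadStream("truncated fill")
            vram[addr:addr + n] = bytes([stream[pos + 2]]) * n
            pos += 3
        else:
            data = stream[pos + 2:pos + 2 + n]
            if len(data) != n:
                raise BadStream("truncated copy")
            vram[addr:addr + n] = data
            pos += 2 + n
        addr += n

vram = bytearray(16)
used = play_frame(bytes([OP_FILL, 4, 9, OP_COPY, 2, 5, 6, OP_END]), vram)
print(used, list(vram[:6]))  # 8 [9, 9, 9, 9, 5, 6]
```

The worst a hostile file can do under this model is draw garbage or be rejected, which is the trade-off the post describes: a couple of extra cycles per byte in exchange for never executing code from the video file.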