7 hours ago, Jeffrey said:
I start measuring time when the 3D part starts (the word "Oxygene" in yellow).
I was looking at audio.asm and noticed that your IRQ handler saves the registers with PHA, TXA, PHA, TYA, PHA rather than the 65C02's direct PHA, PHX, PHY. Not that 4 cycles per frame is going to make a huge difference, but I think the ROM's IRQ entry code has already done PHA PHX PHY before it does the JMP ($0314), and the Kernal pulls them back before returning. So as long as you chain to the Kernal's handler at the end, you don't need to spend cycles saving the CPU registers at all, unless I'm missing something.
Also, if you don't need all of the Kernal's per-VBLANK routines, you could JSR just the ones you need from your own handler and not JMP to the Kernal's handler at all. Skipping the keyboard/joystick polling, for instance, would get a decent number of clock cycles back. Your handler would then have to acknowledge the VERA VSYNC interrupt, restore the registers and RTI itself, since that's normally done by the Kernal code you'd be skipping.
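For a rough idea of the numbers, here's a quick Python tally using the standard 65C02 cycle counts. It assumes audio.asm restores the registers with the mirror-image PLA, TAY, PLA, TAX, PLA sequence; I haven't checked the actual exit code, so treat it as a sketch.

```python
# Standard 65C02 cycle counts for the stack/transfer instructions involved.
CYCLES = {
    "PHA": 3, "PHX": 3, "PHY": 3,
    "PLA": 4, "PLX": 4, "PLY": 4,
    "TXA": 2, "TYA": 2, "TAX": 2, "TAY": 2,
}

def cost(seq):
    """Total cycles for a straight-line sequence of instructions."""
    return sum(CYCLES[op] for op in seq)

# NMOS-6502-style save as in audio.asm; the restore order is an assumption.
nmos_save = ["PHA", "TXA", "PHA", "TYA", "PHA"]
nmos_restore = ["PLA", "TAY", "PLA", "TAX", "PLA"]

# 65C02 direct pushes and pulls.
cmos_save = ["PHA", "PHX", "PHY"]
cmos_restore = ["PLY", "PLX", "PLA"]

nmos_total = cost(nmos_save) + cost(nmos_restore)
cmos_total = cost(cmos_save) + cost(cmos_restore)

print(f"NMOS-style save/restore: {nmos_total} cycles")  # 29
print(f"65C02 save/restore:      {cmos_total} cycles")  # 21
print(f"Saved with PHX/PHY/PLX/PLY: {nmos_total - cmos_total} cycles per IRQ")  # 8
print(f"Saved by relying on the Kernal's save/restore: {nmos_total} cycles per IRQ")  # 29
```

So the 4 cycles on the pushes become 8 once the pulls are counted, and leaning on the Kernal's own save/restore would drop all 29.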
Not sure how these savings would stack up in the grand scheme, but over the course of the entire demo run they might add up to several frames' worth of time.
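Here's a back-of-the-envelope sketch of what per-IRQ savings add up to over a whole run. The 8 MHz clock, the ~60 Hz VSYNC rate (one IRQ per frame), the 180-second run length and the 300-cycle figure for skipped Kernal work are all placeholders picked for illustration, not measurements from the demo.

```python
# Assumptions (not measured): 8 MHz CPU, ~60 Hz VSYNC IRQ, one IRQ per frame.
CPU_HZ = 8_000_000
FRAME_HZ = 60
CYCLES_PER_FRAME = CPU_HZ / FRAME_HZ  # about 133,333 cycles per frame

def frames_saved(cycles_saved_per_irq: float, run_seconds: float) -> float:
    """Cycles saved over the whole run, expressed in frames of CPU time."""
    total_irqs = FRAME_HZ * run_seconds
    return cycles_saved_per_irq * total_irqs / CYCLES_PER_FRAME

RUN_SECONDS = 180  # hypothetical run length, not the demo's actual length

for label, saved in [
    ("PHX/PHY instead of TXA/TYA (push + pull)", 8),
    ("no register save at all", 29),
    ("hypothetical: skip ~300 cycles of Kernal work", 300),
]:
    print(f"{label}: {frames_saved(saved, RUN_SECONDS):.2f} frames")
```

The result is just cycles saved per IRQ × number of frames in the run ÷ cycles per frame, so with these placeholder numbers the register changes alone come to a couple of frames, while skipping a few hundred cycles of Kernal work per frame gets into the tens.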
Edit: Then again, I don't know if giving the main program a few hundred extra cycles would make any difference. I just thought I'd mention some potential "free" savings from disabling Kernal routines you don't need, in case it helps. Awesome job, man!