Does this seem egregious to anyone besides me?

ZeroByte · Post by **ZeroByte** » Sat Nov 20, 2021 8:13 am

Okay - so in relation to a recent question regarding my zsound format and the time resolutions it supports, I stated that I was working out how to divide a 16-bit number by 60 into a 16.8 fixed point value, so that I could make the files specify a tick rate in Hz. The typical default should be 60 Hz, but if someone wants to make a high-time-resolution file for some reason, the format supports it... But here's the deal: my divide-by-60 routine is huge. It added almost 200 bytes to the program, and I presume that it takes a while to execute. Fortunately, it only executes once, when you tell the library to play a song, so it's not like it slows the player down... but somehow it makes me cringe to have all of this code in there just for x/60.

Welcome to the world of 6502?

I'm sure Woz could do it in 18 lines of code that execute in -14 clock cycles, but I'm not Woz....

Here's the routine - any thoughts? Is this ridiculous and I should re-think my life, or is this just normal?

p.s. - if anyone can tell me how to tell the forum software to make this use highlighting and fixed-width font, I'd appreciate it (and fix this post)

.proc calculate_tick_rate: near

           ; X/Y = tick rate (Hz) - divide by 60 and store to zsm_steps

           ; use the ZP variable as tmp space



           value := r0           ; use kernal ZP r0 tmp space

           frac := r1           ; but with meaningful names here

           stx value

           sty value+1

           stz frac

            ; value >> 6 (value = Hz/64)

           ldx #6

           jsr rshift3

           ; set step = value.

           lda value

           sta step

           lda value+1

           sta step+1

           lda frac

           sta fracstep

           ; currently, step = value = Hz/64.

           ; To make step = Hz/60, It should now get value/15 added to it.

           ; We do value/15 by successively dividing value by 16 and adding

           ; the result to step, which approaches the correct value with

           ; each pass. After 4 passes, we will have exceeded the precision

           ; of 16.8 fixed point, so that's when to stop.

           ldx #4 ; 4 means >> 4

           jsr rshift3 ; still need to rotate all 3 bytes

           jsr add_step

           ; 1 pass complete.

           jsr rshift2 ; falls through to add_step

           jsr rshift2

           ; 3 passes complete. The last pass is shortest (only frac remains)

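           lda frac ; only frac is still non-zero, so do the shift in A

           lsr a

           lsr a

           lsr a

           lsr a ; A = frac >> 4, carry = last bit shifted out

           adc fracstep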

           ; when doing the carry bit adds, save the results in zsm_steps.zsm_fracsteps

           ; because we're finally done.

           sta zsm_fracsteps

           lda step

           adc #0

           sta zsm_steps

           lda step+1

           adc #0

           sta zsm_steps+1

           rts



rshift3:   ; rshift 3 byte value (X = number of shifts)

           lsr value+1

           ror value

           ror frac

           dex

           bne rshift3

           rts



rshift2:

           ldx #4   ; we're always >>4 in this phase

:           lsr value

           ror frac

           dex

           bne :-

           ; fall through to add_step to save a little CPU time

           ; i.e. no RTS followed by JSR add_step.



add_step:

           ; note that the carry flag is set by the rshift routine and is

           ; important, so no CLC statement as in the normal ADC usage...

           lda frac

           adc fracstep

           sta fracstep

           lda value

           adc step

           sta step

           lda step+1

           adc #0   ; high byte of value is now guaranteed to be 0.

           sta step+1

           rts

.endproc

p.s. : amusingly, this is post #487 for me, which would have been the model number of the math coprocessor of the 486 had it not been integrated into the CPU by default. How funny that this post would be about a math routine in assembly!

ZeroByte · Post by **ZeroByte** » Sat Nov 20, 2021 8:19 am

The obvious answer is: change the format such that the result of this computation is what's stored in the header so the player library doesn't have to go through this rigamarole - just copy from header and start playing. I guess that makes sense too - but just in case folks think it makes MORE sense to store the value as Hz, this is what kind of crap it takes (for me anyway) to translate the value into something useful on-system for playback. Looking forward to people's thoughts....

The reason I think raw Hz might make more sense is that it's less complicated to go from that to some other timing source, e.g. the VIA or something. Starting with "ticks-per-frame" means having to multiply by some strange ratio. But then again, if you want to drive the sound at 300 updates per second (still less than native VGM), then chances are good that such computations don't concern you much anyway...

kliepatsch · Post by **kliepatsch** » Sat Nov 20, 2021 9:17 am

My 2 cents, maybe looking at this from a different perspective: I remember seeing a video on YT where someone made a music player for the Commander that had serious timing issues. Now of course I think your player will be much more accurate. But I still think that forcing an arbitrary timing format onto a 60 Hz grid will introduce noticeable jitter. The error can be up to ± 8 milliseconds, so the jitter will range up to 16 milliseconds. Even when working with Concerto, I found the rounding errors of its time step (around 7 milliseconds, less than half the duration of VSync) annoying, but bearable. But I still preferred having the music be in sync with the grid rather than relying on rounding to the grid.
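To put rough numbers on that estimate, here is a minimal Python sketch (an illustration only, not code from any of the players discussed) that snaps arbitrary event times to a 60 Hz grid and measures the worst-case displacement. The function name and the 0.1 ms sampling of event times are assumptions made for the example.

```python
FRAME_MS = 1000 / 60            # one 60 Hz frame, about 16.67 ms


def snap_to_frame(t_ms):
    """Move an event time to the nearest 60 Hz frame boundary."""
    return round(t_ms / FRAME_MS) * FRAME_MS


# Try event times every 0.1 ms over one second and record the largest shift.
times = [i * 0.1 for i in range(10000)]
worst = max(abs(snap_to_frame(t) - t) for t in times)
print(f"worst-case shift: {worst:.2f} ms, half a frame: {FRAME_MS / 2:.2f} ms")
```

The shift never exceeds half a frame (about 8.3 ms) in either direction, which is where the ± 8 ms figure and the roughly 16 ms total range come from.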

So if you allow for arbitrary step durations but intend to play back at 60 Hz, prepare for timing artifacts that range from noticeable to annoying. If you still want to allow for that, then you would need to support it. However, if you don't want it, then this time step conversion is not necessary.

On the other hand, I like that you want to make it possible to play back at a different step, if I understood your post correctly. In that case, would you not need a more general conversion routine than dividing by 60?

desertfish · Post by **desertfish** » Sat Nov 20, 2021 10:43 am

Here's a 24 bit division routine from codebase64 that is an extended version of their 16 bit division routine (which I use in prog8), so I expect you can extend it once again to 32 bits.

It looks smaller than your code, but I didn't measure.

https://codebase64.org/doku.php?id=base:24bit_division_24-bit_result

Also here's another person with a different looking 32 bits division routine but arguing that it is useful for small code sizes https://atariage.com/forums/topic/237463-looking-for-32-bit-division-routines/?tab=comments#comment-3240032

I haven't tried either of them, but perhaps they're of some use to you.

But if it's always division by 60, perhaps you can cheat a bit? Start with division by 64 (which is a simple shift); maybe that is precise enough already? Otherwise perhaps there's a way to adjust the result somewhat to make it more precise, I don't know.

BruceMcF · Post by **BruceMcF** » Sat Nov 20, 2021 4:50 pm

On 11/20/2021 at 5:43 AM, desertfish said:

but if it's always division by 60, perhaps you can cheat a bit? Start with division by 64 (which is a simple shift) and maybe this is precise enough already? otherwise perhaps there's a way to adjust the result somewhat to make it more precise, I don't know

Surely there is ... since 60 is just UNDER 64, the true result will be equal to or greater than the result of divide by 64.

So multiply the result of the divide-by-64 by 60 and subtract that product from the original value to get the residual. Multiplying by 60 is (X*16-X)*4, so 6 shifts left and a 16bit subtract.

For a 16bit unsigned integer, the residual can't be as big as 4,200, so you can do the division by repeated subtraction of 960 (60*16, maximum 4 iterations), 240 (60*4, maximum 3 iterations) and 60 (maximum 3 iterations):

START: TUNE if the residual is less than 960

Subtract 960 from the residual and increment the trial result by 16

Back to START

TUNE: FINETUNE if the residual is less than 240

Subtract 240 from the residual and increment the trial result by 4

Back to TUNE

FINETUNE: FINISHED if the residual is less than 60

Subtract 60 from the residual and increment the trial result

Back to FINETUNE

FINISHED

That has at most 10 iterations. I don't know whether it would be smaller than the above, but it seems like it would test out as faster.
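The steps above can be sketched in Python to check the arithmetic. This is a model of the idea rather than 6502 code, the function name is made up for the example, and it produces only the integer quotient and remainder (a 16.8 result would need the fractional byte handled as well).

```python
def div60_residual(x):
    """Integer quotient and remainder of x / 60 for a 16-bit x."""
    assert 0 <= x <= 0xFFFF
    q = x >> 6                              # trial result: x / 64, never too large
    residual = x - (((q << 4) - q) << 2)    # x - 60*q, using (q*16 - q)*4
    # Repeated subtraction in three stages: 960 = 60*16, 240 = 60*4, then 60.
    for chunk, increment in ((960, 16), (240, 4), (60, 1)):
        while residual >= chunk:
            residual -= chunk
            q += increment
    return q, residual


print(div60_residual(31217))    # (520, 17)
```

Since the residual is 4*(x >> 6) + (x & 63), it tops out at 4*1023 + 63 = 4155, which is why the first loop runs at most 4 times.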

ZeroByte · Post by **ZeroByte** » Sat Nov 20, 2021 6:02 pm

I'll have to look at that. My routine does almost exactly that, except repeated addition.

My x/60 routine is essentially: X = X>>6 + X>>10 + X>>14 + X>>18 + X>>22
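The series works because 1/64 * (1 + 1/16 + 1/256 + ...) = 1/64 * 16/15 = 1/60. A minimal Python model of it follows. It assumes, as the carry handling in the assembly suggests, that the last bit shifted out of each >>4 is added back in as a rounding bit. The function name is hypothetical and this is a sketch for checking numbers, not a translation of the routine.

```python
def div60_shift_add(hz):
    """Approximate hz/60 as a 16.8 fixed-point integer (result / 256 = steps per frame)."""
    assert 0 <= hz <= 0xFFFF
    value = (hz << 8) >> 6          # hz/64 in 16.8 fixed point (exactly hz * 4)
    total = value
    for _ in range(4):              # adds hz>>10, hz>>14, hz>>18 and hz>>22
        carry = (value >> 3) & 1    # last bit shifted out by the >>4 below
        value >>= 4
        total += value + carry      # mirrors ADC without a preceding CLC
    return total


result = div60_shift_add(31217)
print(result / 256)                 # 520.28125 (the exact value is 520.2833...)
```

A fifth pass would add hz>>26, which is always zero in 16.8 for a 16-bit input, so four passes after the initial >>6 are enough.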

@kliepatsch The Hz/60 computation is strictly my own implementation of a method to do arbitrary rate playback in once-per-frame chunks. I was close to having it working last night... Just ran out of time (4am). The single-step routine is exposed as a direct call, so you could set up a VIA IRQ and call it at exactly the right rate if that's what you need to do.

My import script already downsamples time from 44100 Hz to 60 Hz, and I've never detected any jitter in the results. My Sonic demo used it. Other audio demos I've shared to YouTube do it. It just doesn't seem to make any perceptible difference. I DO see an issue in some cases though. For instance, in the audio I'm helping with on City Connection, the tunes have PSG volumes driven by ADSR routines which update the volume more often than once per 1/60 s, so the peak volumes get overwritten in-frame by the subsequent lower volumes. The net result is that the volume of the PSG is too low, because the "attacks" get overwritten by the first couple of decay adjustments. This is a problem in the "encoder" though. I could add cmdline switches to select different methods (such as "use max volume over the frame" or "use average volume over the frame" etc) to smooth out these kinds of issues.
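A hedged sketch of what such per-frame policies could look like, in Python. This is not the actual import script: the function name, the event format (sample time, volume) and the policy names are all assumptions made up for illustration.

```python
from collections import defaultdict


def downsample_volumes(events, policy="last", src_hz=44100, dst_hz=60):
    """Collapse (sample_time, volume) events into one volume per 60 Hz frame."""
    frames = defaultdict(list)
    for sample_time, volume in events:
        frames[sample_time * dst_hz // src_hz].append(volume)
    pick = {
        "last": lambda vols: vols[-1],               # later writes overwrite the attack
        "max": max,                                  # keep the peak of the frame
        "average": lambda vols: sum(vols) // len(vols),
    }[policy]
    return {frame: pick(vols) for frame, vols in sorted(frames.items())}


# An attack at volume 63 followed by two decay steps inside the same frame
# (one frame is 44100 / 60 = 735 samples), then one write in the next frame.
events = [(0, 63), (200, 50), (400, 40), (800, 30)]
print(downsample_volumes(events, "last"))      # {0: 40, 1: 30}
print(downsample_volumes(events, "max"))       # {0: 63, 1: 30}
print(downsample_volumes(events, "average"))   # {0: 51, 1: 30}
```

With the "last" policy the attack peak of 63 never reaches the output, which is the too-quiet effect described above.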

I'll soon see what it sounds like to play these 60hz ZSM files at varying speeds. That's what all this time stuff opens up as a feature: playing back the same tune faster or slower. So you can speed up the music like in Sonic when you get the speed sneakers powerup.

I've done some calcs to see what kind of error creeps in when the result isn't an integer (i.e. the file's rate isn't an even multiple of 60).

One of my test values is 31,217 Hz (just picked an arbitrary weird number). This would be 520.2833333 steps per frame. My /60 routine ends up with 520.28125, an error of about 0.0004%. This means the song will play back ever so slightly too slow. But since I'm using the timing equivalent of "subpixel scrolling", I expect the playback to remain smooth in such cases. Essentially, just a little more than 1 in 4 frames will play 521 steps instead of 520.
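The "subpixel scrolling" idea can be modelled as a fractional accumulator. Here is a minimal Python sketch, assuming the rate is held as a 16.8 integer (520.28125 * 256 = 133192); the function and variable names are hypothetical and not taken from the player.

```python
def steps_per_frame(rate_16_8, frames):
    """Return how many whole steps to play on each frame for a 16.8 rate."""
    acc = 0                     # holds the leftover fraction between frames (0..255)
    plan = []
    for _ in range(frames):
        acc += rate_16_8
        plan.append(acc >> 8)   # whole steps to play on this frame
        acc &= 0xFF             # carry the fraction into the next frame
    return plan


plan = steps_per_frame(133192, 256)
print(plan[:8])                             # [520, 520, 520, 521, 520, 520, 520, 521]
print(plan.count(521), "of", len(plan))     # 72 of 256
```

Over 256 frames the fraction 72/256 yields exactly 72 frames of 521 steps, so the long-run average matches the stored rate even though each frame plays a whole number of steps.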

We shall see.

Again, I believe this entire thing to be a bit of an edge case - 60 Hz resolution is strongly recommended. My "on-the-fly resampling" method is basically a way to have built-in support for tunes with non-60 Hz rates which my tools can't even generate at the moment, but as this is a "reference implementation" it would be dumb not to support your own features, right?

BruceMcF · Post by **BruceMcF** » Sat Nov 20, 2021 6:54 pm

It seems like, for size rather than speed ... couldn't you just not optimize it, and do a plain 24bit by 8bit divide of the 16bit dividend extended with a zero fractional byte?

I think it might be something like the following. If "N" is in the zero page, that makes it under 40 bytes. But I stress, this is looking at a 64bit/32bit routine and transposing, it is absolutely untested, so it would be totally unsurprising if there is some real obvious mistake below.

; Value passed in AX (A = low byte, X = high byte), result returned in N...N+2. Remainder is in N+3, but as the result is in 16.8 format, the remainder can be ignored unless it is desired to round up for remainders of 30 or more (half the divisor).

           STZ N

           STA N+1

           STX N+2

           STZ N+3

           ; the first five trial subtracts will always fail ... for space optimization, just let them

           LDX #$19 ; 24 = $18, plus 1 to shift the final result bit in.

           CLC

           ; Shift dividend one bit left

LOOP:

           ROL N

           ROL N+1

           ROL N+2

           DEX

           BEQ END

           ROL N+3

           LDA N+3

           SEC

           SBC #60

           BCC LOOP ; trial failed

           STA N+3

           BRA LOOP

END:

           ; optionally to round 16.8 to closest based on residual remainder

           ; (the remainder is always 0..59, so compare against 30)

           ; LDA N+3

           ; CMP #30

           ; BCC DONE

           ;   INC N

           ;   BNE DONE

           ;   INC N+1

           ;   BNE DONE

           ;   INC N+2

DONE:

           RTS
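For anyone who wants to check expected values before trying the routine on hardware, the same shift-and-trial-subtract scheme can be modelled in Python. This is a sketch of the technique only: the function name is made up, and it does not simulate the register or carry usage of the assembly.

```python
def div60_restoring(hz):
    """Divide (hz << 8) by 60 one bit at a time; returns (16.8 quotient, remainder)."""
    assert 0 <= hz <= 0xFFFF
    dividend = hz << 8              # 16-bit value plus a zero fractional byte
    quotient = 0
    remainder = 0
    for bit in range(23, -1, -1):   # 24 dividend bits, most significant first
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= 60:         # trial subtract succeeded
            remainder -= 60
            quotient |= 1
    return quotient, remainder


quotient, remainder = div60_restoring(31217)
print(quotient / 256, remainder)    # 520.28125 32
```

The remainder is always in 0..59, so a round-to-nearest step would bump the quotient when the remainder is 30 or more, as it is here.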

kliepatsch · Post by **kliepatsch** » Sat Nov 20, 2021 7:20 pm

On 11/20/2021 at 7:02 PM, ZeroByte said:

My import routine already down samples time from 44100hz to 60hz, and I've never detected any jitter in the results. My Sonic demo used it. Other audio demos I've shared to YouTube do it. It just doesn't seem to make any perceptible difference.

I just tried a couple of things with Concerto and indeed, the jitter is not as bad as I remembered. When making music with Concerto a couple of weeks ago, the jitter was bothering me a lot. It was barely noticeable, but because I KNEW the jitter was real, it kept distracting me all the time, so I had to adapt the song tempo to some integer tick count. And thinking that at 60 Hz, jitter could be twice as bad, it simply wouldn't be fun to make non-60 Hz music. Just playing it back would probably not be as bad.

ZeroByte · Post by **ZeroByte** » Sat Nov 20, 2021 7:48 pm

I've sat there in Deflemask waffling over a sub-step delay value on a note. You can definitely tell the difference when you're the source and hyper-focused on it. Just listening in full context though, meh. It's like agonizing over a particular pixel in Photoshop if you're clipping along an edge.

ZeroByte · Post by **ZeroByte** » Sat Nov 20, 2021 8:03 pm

@BruceMcF the subtraction test says lda N3 .. sta N3. Is that supposed to be N+3?

So n1-3 holds a 16.8? The .8 part isn't a modulo remainder is it?