honestly, the correct solution is to use multiple sprites for one object. This is how it was done on classic systems.
Take this sprite of Samus from Super Metroid:
It requires 36px to cover the full width of the character on screen. However, most of that extra 4px of width is blank, with only 10px height for back foot. This can be contained in an 8x16 sprite placed correctly in relation to the main 32x32 sprite. If I were to make a demo/game using artwork of this sort, I would simply make the actor/object control these sprites and define animation frames for each one. In cases where the "extra bits" (like Samus's foot) move around relative to the main sprite, the engine would also need an x/y offsets table. In the Samus animation, the back foot is near the bottom of the sprite for one frame, and closer to the top on the next frame. This "foot sprite" would just be drawn at different relative locations to the main sprite during each frame of the animation cycle.
![1268944635_SamusPieces.png.1570e1d91227178392831dbbcdd0efcd.png](<___base_url___>/applications/core/interface/js/spacer.png)
My Sonic demo faced a similar issue on the Sonic sprite, which is 40px tall. I used one 32x32 sprite for the main part of Sonic:
![sonicmain.png.49dda40f9f35b23ba2d684f18f6bd2b3.png](<___base_url___>/applications/core/interface/js/spacer.png)
and I use a separate sprite for the ears:
![sonicears.png.3b0017f32e522d7cdfa0a0980425ae95.png](<___base_url___>/applications/core/interface/js/spacer.png)
Interestingly, the ears only require 2 frames of animation (frames 3 and 4 are duplicates of 1 and 2). I didn't bother deleting frames 3 and 4 from the resource, as I've got plenty of VRAM for this simple demo, but the engine never uses those frames. I set up an animation frames table with the VRAM locations of the frames, and the engine cycles through these tables. So the main body's data is the VRAM address equivalents of: 0, 1, 2, 3 and the ear's data is 0, 1, 0, 1. This is also useful for "pingpong" animations where you only need pixel data for 3 frames of a walk/run cycle and you just display them as 0,1,2,1,0,1,2,1... no need to have frame 1 in VRAM twice.
This is part of the challenge of programming retro. It's nice when HW just does the work for you, but when it doesn't, that doesn't mean you're stuck. You just have to code a solution. This stuff I've explained here actually saves more memory too, as you only need exactly the pixel data for exactly the parts that go out-of-bounds. The lazy alternative would've been to use a 64x64 sprite, but yes, that would gobble up VRAM needlessly.
Hope this makes sense.