Well, it's easy enough to math out the best-case transfer rate:
- The fastest lda that's practical for large data transfers is 4 cycles, and the same goes for the only real sta option, so that's 8 cycles/byte at best, not counting loop overhead. You can sort of blit while the screen is active, but in practice that's going to produce weird visual effects on the VERA, because writes to memory share FPGA cycles with drawing. So while the VERA will let you blit at any time, we really should limit blitting to the VBLANK interval.
- The VERA technically draws an 800x525 display (the excess beyond its 640x480 resolution is HBLANK and VBLANK, respectively, which matter for signaling and timing but not for drawing). We could blit during HBLANK too, but the timing is a bit tight. If we only count VBLANK, that's 45 lines out of 525.
- The CPU runs at 8MHz. Divided by 60fps, that's 133,333.33 cycles/frame; times 45/525 is 11,428.57 cycles/VBLANK; divided by 8 cycles/byte is 1,428.57 bytes transferred, minus whatever you lose to copy setup and branching logic (which will be significant). And, of course, anything else that needs to be transferred during that interval comes out of the same budget.
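
For anyone who wants to poke at those numbers, here's a quick Python sketch of the same arithmetic. The function and constant names are mine, and the 13 cycles/byte case is an assumption about a simple indexed copy loop (4-cycle lda, 4-cycle sta, 2-cycle inx, 3-cycle taken bne), not a measured figure.

```python
# Back-of-the-envelope VBLANK blit budget, using the figures from the post.
CPU_HZ = 8_000_000      # X16 CPU clock
FPS = 60                # frame rate used in the post
TOTAL_LINES = 525       # full signal height, including blanking
VISIBLE_LINES = 480     # visible 640x480 area
CYCLES_PER_BYTE = 8     # 4-cycle lda + 4-cycle sta, no loop overhead


def vblank_cycles(cpu_hz=CPU_HZ, fps=FPS, total=TOTAL_LINES, visible=VISIBLE_LINES):
    """CPU cycles that fall inside VBLANK each frame."""
    cycles_per_frame = cpu_hz / fps
    return cycles_per_frame * (total - visible) / total


def bytes_per_vblank(cycles_per_byte=CYCLES_PER_BYTE):
    """Upper bound on bytes copied per VBLANK at a given per-byte cost."""
    return vblank_cycles() / cycles_per_byte


if __name__ == "__main__":
    print(f"cycles per VBLANK:       {vblank_cycles():,.2f}")
    print(f"bytes at 8 cycles/byte:  {bytes_per_vblank():,.2f}")
    # Assumed simple loop: lda abs,x (4) + sta abs (4) + inx (2) + bne taken (3)
    print(f"bytes at 13 cycles/byte: {bytes_per_vblank(13):,.2f}")
```

With that assumed 13-cycle loop the budget drops to somewhere around 880 bytes per VBLANK, which is why unrolling the copy matters so much here.
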
It seems like recently there have been a few suggestions to port several games from well into the 90s to the X16, without considering that it was envisaged more as a late-70s-era computer. It only coincidentally reaches beyond that because the VERA is basically a late-80s card, and the supercharged clock rate of the X16's CPU gives it the oomph to drive the VERA in ways that can recreate some 16-bit graphics.
But like the TurboGrafx-16 (or PC Engine, as it was known in Japan), the reality is that this is still more 8-bit than 16-bit. The lack of DMA as a standard component is crippling to the system's ability to blit graphics -- imagine if that 1,400 bytes figure were closer to matching the 11,400 cycles/VBLANK figure. Now that'd be some blitting power. But without DMA, I really think it's important to think of the VERA more as the place where you preload your assets, rather than somewhere you blit them in real time.
To use Streets of Rage as an example, I'm going to ballpark sprites at 48x64 in size, meaning you'll probably draw them as a column of two 32x32s plus a column of four 16x16s (plus or minus certain special cases where you can use fewer sprites and make more efficient use of that VRAM). Assuming those were authored at 4bpp, that's 48 x 64 / 2 = 1,536 bytes/sprite, which means you could fit 84 sprites into the VERA's roughly 128KB of VRAM, not counting memory for background tiles.
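Here's the sprite math as a small Python sketch, in case anyone wants to try other sizes or bit depths. The names are mine, and the usable-VRAM constant is an assumption: I'm taking everything below 0x1F9C0 as free for graphics, with the space above it reserved for PSG, palette and sprite attributes.

```python
# Sprite-size and VRAM-budget arithmetic for a 48x64, 4bpp sprite.
BITS_PER_PIXEL = 4            # 16-colour sprites, as assumed in the post
# Assumption: VRAM below 0x1F9C0 is available for graphics data.
USABLE_VRAM_BYTES = 0x1F9C0


def sprite_bytes(width, height, bpp=BITS_PER_PIXEL):
    """Bytes needed for one sprite image of the given size."""
    return width * height * bpp // 8


def composite_bytes():
    """Bytes for one 48x64 figure built from two 32x32s and four 16x16s."""
    return 2 * sprite_bytes(32, 32) + 4 * sprite_bytes(16, 16)


def max_figures(vram_bytes=USABLE_VRAM_BYTES):
    """How many whole 48x64 figures fit if VRAM held nothing else."""
    return vram_bytes // composite_bytes()


if __name__ == "__main__":
    print("bytes per 48x64 figure:", composite_bytes())
    print("figures that fit in VRAM:", max_figures())
```

The pieces add up to the same 1,536 bytes as a flat 48x64 image, so splitting into hardware sprites doesn't cost extra VRAM by itself.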
In practice that means you'll probably be stuck with 60-ish sprites in VRAM. I think that's "not horrible", as long as you're clever about your background tiling and assets. You'll definitely need to get creative about animation, because I'd probably shoot for 12-ish frames of animation per enemy, total, and maybe 24 frames total for the entire moveset of the player (walk, punch, jump, dive kick, grab/throw, ouch, fall over, flip, etc). You might even lose a few frames to make room for hit sparkles and other smaller VFX. But that would allow for 3 types of enemies on-screen at once and a robust moveset for the player.
As for cycling enemies around, that'd be done during moments where the big flashing "THAT WAY" arrow is happening and there are no enemies on-screen. Assuming you're able to hit half the theoretical bandwidth (a bit of a big "if", but I'm spitballing), that's roughly half a sprite per frame, so replacing all three enemies would run about 64 frames -- just over 1 second of mandatory downtime between encounters. If you plan for 5-6 seconds of walking/loading between encounters, you should be able to comfortably "stream in" assets even without DMA.
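To sanity-check the streaming estimate, here's a rough Python sketch. The helper names are mine, and the 50% efficiency figure is the same spitball as above. Without rounding to "half a sprite per frame", 32 to 36 enemy sprites come out a little above the 64-frame ballpark, but still well inside a 5-6 second walking window.

```python
import math

# Theoretical best case from earlier: 8MHz / 60fps * 45/525 / 8 cycles per byte.
THEORETICAL_BYTES_PER_VBLANK = 8_000_000 / 60 * 45 / 525 / 8
SPRITE_BYTES = 1536     # one 48x64 figure at 4bpp
FPS = 60


def frames_to_stream(sprite_count, efficiency=0.5):
    """Whole frames needed to copy sprite_count figures during VBLANK only.

    efficiency is the assumed fraction of the theoretical bandwidth
    that the copy loop actually achieves.
    """
    bytes_per_frame = THEORETICAL_BYTES_PER_VBLANK * efficiency
    return math.ceil(sprite_count * SPRITE_BYTES / bytes_per_frame)


def seconds_to_stream(sprite_count, efficiency=0.5):
    """Same estimate expressed in seconds at 60fps."""
    return frames_to_stream(sprite_count, efficiency) / FPS


if __name__ == "__main__":
    for count in (32, 36):   # e.g. 3 enemy types at roughly 11-12 frames each
        print(count, "sprites:", frames_to_stream(count), "frames,",
              f"{seconds_to_stream(count):.2f} s")
```

Dropping the efficiency argument to something like 0.25 is a quick way to see how much slack the 5-6 second window really leaves.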
So it'll take a lot of planning, but I think you could get there at a reasonable fidelity. It's not impossible. People should be impressed if someone goes and does it. Probably even more impressed than they likely will be.