6502 RISC instruction set running at 3.4ghz..
Posted: Mon Jul 19, 2021 8:15 am
by TomXP411
On 7/15/2021 at 3:36 AM, kelli217 said:
So, then, let's drive this discussion in another direction. Assuming that, clock-for-clock, the 65C02 wouldn't be 'competitive,' would a max-speed chip nonetheless still be... 'acceptable' for modern-day tasks? Could you watch a 144p YouTube video? Play a 128kbps MP3? Render a modern web page with scripts and CSS?
The question becomes — with what RAM? These things depend on huge datasets being able to be accessed and manipulated, often with SIMD instructions, and there's still only the 64K address space, even with the kind of things that the X16 does via paging.
With this sort of primitive MMU-style paging, though, a 65C816 suddenly becomes a much more attractive option. Windows 95 was able to run in the 16MB memory space that an '816 can access, and if the paging system used 4MB pages (scaled proportionally to the size of the X16's ROM pages [or 2MB if you'd prefer it scaled to the RAM page size]), then you've got a sort of supercharged version of the old DOS EMS. You can do some fairly complicated web pages with those kinds of resources.
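A quick sketch of the page-scaling arithmetic in that paragraph, in Python. It assumes the X16's 16KB ROM window and 8KB banked RAM window at $A000, and keeps each window's fraction of the 64KB space constant when scaling up to the '816's 16MB. `scaled_page` and `translate` are hypothetical helpers; the (bank, offset) split only illustrates the EMS-style idea and is not a real X16 or '816 API.

```python
from typing import Tuple

# Hypothetical sketch: scale X16-style bank windows from the 6502's 64KB
# address space to the 65C816's 16MB one, keeping each window's share fixed.
ADDR_SPACE_6502 = 64 * 1024           # 16-bit address space
ADDR_SPACE_65816 = 16 * 1024 * 1024   # 24-bit address space
X16_ROM_WINDOW = 16 * 1024            # ROM bank window, $C000-$FFFF
X16_RAM_WINDOW = 8 * 1024             # banked RAM window, $A000-$BFFF
RAM_WINDOW_BASE = 0xA000
MB = 1024 * 1024


def scaled_page(window_size: int) -> int:
    """Page size taking the same fraction of 16MB as window_size does of 64KB."""
    return window_size * ADDR_SPACE_65816 // ADDR_SPACE_6502


def translate(linear: int, window_size: int) -> Tuple[int, int]:
    """EMS-style split of a linear address into (bank to select, offset in window)."""
    return divmod(linear, window_size)


if __name__ == "__main__":
    print(scaled_page(X16_ROM_WINDOW) // MB, "MB")  # 4 MB (ROM-scaled)
    print(scaled_page(X16_RAM_WINDOW) // MB, "MB")  # 2 MB (RAM-scaled)
    bank, offset = translate(0x12345, X16_RAM_WINDOW)
    print(bank, hex(RAM_WINDOW_BASE + offset))      # 9 0xa345
```

The 16KB ROM window is a quarter of 64KB and the 8KB RAM window an eighth, which is where the 4MB and 2MB figures come from.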
But without a 32-bit data bus and a hardware multiplier at the bare minimum, you're just not going to be able to move enough data fast enough and get enough arithmetic done to play those compressed media files. Uncompressed (or 'trivially' compressed, like RLE or TMV) is no problem. The X16 can do that if you have a fast enough storage interface that you can keep the buffer full. A 3.4GHz '816-based system could push data fast enough to play 4K HD video at 60p if that's all it has to be doing.
No, because the instruction set simply doesn't have the needed operations.
As was mentioned above, there's no integer divide or multiply, let alone doing floating point math or the SSE instructions that operate on 4 integers at a time.
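For anyone who hasn't had to do it: without MUL or DIV opcodes, a 6502 multiplies and divides by looping over bits with shifts, adds and trial subtractions. Below is a rough Python model of the usual 8-bit shift-and-add multiply and restoring divide. The function names are hypothetical, and the comments only indicate which 6502 instructions would typically handle each step. Eight passes per operation, each several instructions long, is where the cost comes from.

```python
from typing import Tuple

# Hypothetical helpers modelling how 6502 code multiplies/divides in software.


def mul8(a: int, b: int) -> int:
    """8x8 -> 16-bit unsigned multiply using only shifts, adds and bit tests."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be unsigned 8-bit values")
    product = 0
    addend = a
    for _ in range(8):                             # one pass per multiplier bit
        if b & 1:                                  # LSR would shift this bit into carry
            product = (product + addend) & 0xFFFF  # two-byte ADC
        addend = (addend << 1) & 0xFFFF            # ASL/ROL across two bytes
        b >>= 1
    return product


def div8(dividend: int, divisor: int) -> Tuple[int, int]:
    """8-bit restoring division -> (quotient, remainder)."""
    if not (0 <= dividend <= 0xFF and 0 < divisor <= 0xFF):
        raise ValueError("need an 8-bit dividend and a non-zero 8-bit divisor")
    quotient, remainder = dividend, 0
    for _ in range(8):
        # shift the top bit of the quotient into the remainder (ASL/ROL)
        remainder = (remainder << 1) | (quotient >> 7)
        quotient = (quotient << 1) & 0xFF
        if remainder >= divisor:                   # trial subtract (SEC/SBC, test carry)
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    print(mul8(200, 123))  # 24600
    print(div8(200, 7))    # (28, 4)
```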
Scripts are a maybe, but again, note the performance numbers Scott pulled out above. A 3GHz 6502 would be running at the equivalent speed of a 100MHz x86 - but without the math coprocessor, or even multiplication or division.
If you think back to the 100MHz days - yes, you can absolutely run a web browser on a 100MHz computer, but it's going to be much slower than a modern PC, and anything involving fancy math (such as decompressing JPG and PNG graphics) is going to be slooooooooooooooooooow.
Posted: Mon Jul 19, 2021 5:03 pm
by m00dawg
The long but interesting interview with Jim Keller on AnandTech comes to mind, specifically this quote:
"JK: [Arguing about instruction sets] is a very sad story. It's not even a couple of dozen [op-codes] - 80% of core execution is only six instructions - you know, load, store, add, subtract, compare and branch. With those you have pretty much covered it. If you're writing in Perl or something, maybe call and return are more important than compare and branch. But instruction sets only matter a little bit - you can lose 10%, or 20%, [of performance] because you're missing instructions."
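To put Keller's point in 6502 terms, here is a hedged Python sketch that tallies the dynamic instruction trace of a simple byte-copy loop (LDY #0 / LDA (src),Y / STA (dst),Y / INY / CPY #n / BNE). The mapping of mnemonics onto his six classes (counting INY as an add and DEY as a subtract, for example) is my own rough assumption, and the trace is generated rather than captured from a real emulator.

```python
from collections import Counter
from typing import List

# Rough, hypothetical mapping of some 6502 mnemonics onto Keller's six classes.
CATEGORY = {
    "LDA": "load", "LDX": "load", "LDY": "load",
    "STA": "store", "STX": "store", "STY": "store",
    "ADC": "add", "INX": "add", "INY": "add",
    "SBC": "subtract", "DEX": "subtract", "DEY": "subtract",
    "CMP": "compare", "CPX": "compare", "CPY": "compare",
    "BNE": "branch", "BEQ": "branch", "BCC": "branch", "BCS": "branch",
}


def copy_loop_trace(n: int) -> List[str]:
    """Trace of: LDY #0 / loop: LDA (src),Y / STA (dst),Y / INY / CPY #n / BNE loop."""
    if not 1 <= n <= 255:
        raise ValueError("n must fit in CPY's 8-bit immediate (1..255)")
    trace = ["LDY"]
    for _ in range(n):
        trace += ["LDA", "STA", "INY", "CPY", "BNE"]
    return trace


def instruction_mix(trace: List[str]) -> Counter:
    """Count executed instructions per class; anything unmapped is 'other'."""
    return Counter(CATEGORY.get(op, "other") for op in trace)


if __name__ == "__main__":
    mix = instruction_mix(copy_loop_trace(100))
    total = sum(mix.values())
    for category, count in mix.most_common():
        print(f"{category:8} {count:4} {100 * count / total:5.1f}%")
```

Every instruction in this kernel lands in one of the six classes; real programs are messier, but bulk data movement on a 6502 looks a lot like this.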
Posted: Mon Jul 19, 2021 9:27 pm
by kelli217
13 hours ago, TomXP411 said:
No, because the instruction set simply doesn't have the needed operations.
I'm going to proceed under the impression that your first sentence responds directly to my last sentence. You are probably still correct, but I may not have made it clear that I was talking about completely uncompressed video, or something trivially compressed like Run-Length Encoded or Text Mode Video, where the CPU does not have to do any math to decompress lossy data, but is just pushing pixels or character data. And doing it on this theoretical 3.4 GHz 65C816, with no OS overhead.
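To illustrate the "no math" point about trivially compressed video, here is a minimal Python sketch of a byte-pair run-length codec. The (count, value) format is an assumption for illustration, not TMV or any specific real-world RLE variant. The decoder's inner loop is just a store, a pointer increment, a counter decrement and a branch per output byte.

```python
# Hypothetical (count, value) byte-pair RLE, for illustration only; this is not
# TMV or any particular real-world RLE format.


def rle_encode(raw: bytes) -> bytes:
    """Encode runs of identical bytes as (count, value) pairs, count 1..255."""
    out = bytearray()
    i = 0
    while i < len(raw):
        run = 1
        while i + run < len(raw) and raw[i + run] == raw[i] and run < 255:
            run += 1
        out += bytes((run, raw[i]))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Expand (count, value) pairs; no multiply or divide is involved."""
    if len(data) % 2:
        raise ValueError("stream must hold whole (count, value) pairs")
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        while count:             # DEX / BNE on a 6502
            out.append(value)    # STA (dst),Y / INY
            count -= 1
    return bytes(out)


if __name__ == "__main__":
    frame = b"\x00" * 300 + b"\x41\x41\x42"
    packed = rle_encode(frame)
    print(len(frame), "->", len(packed))  # 303 -> 8
    assert rle_decode(packed) == frame
```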
Posted: Tue Jul 20, 2021 3:36 am
by BruceMcF
5 hours ago, kelli217 said:
I'm going to proceed under the impression that your first sentence responds directly to my last sentence. You are probably still correct, but I may not have made it clear that I was talking about completely uncompressed video, or something trivially compressed like Run-Length Encoded or Text Mode Video, where the CPU does not have to do any math to decompress lossy data, but is just pushing pixels or character data. And doing it on this theoretical 3.4 GHz 65C816, with no OS overhead.
Actually this gets to the only point of taking a processor with such a small transistor footprint and speeding it up like that, which is that the bulk of the mask is taken up by some other specialized circuitry.
One problem with that approach is not so much technical as one of economies of scale ... making the specialized circuitry so that it sits on a bus driven by an external CPU means that it can be used with more than one CPU, just as the CPU is produced to work with more than one specialized circuit.
If you were going to do it anyway, a problem with using the 6502 for it is economies of scope ... since the 6502 instruction set is not well suited for compiled C software, a lot of things have to be built from scratch that wouldn't have to be built from scratch if you used a low-power ARM architecture, and accept the larger mask footprint as the tradeoff for the much greater toolchain support base.
And if you were going to do it ANYWAY, and build your own toolchain, then a stack machine would give you an even smaller transistor footprint with more MIPS than a 6502 of the same speed.
Posted: Tue Jul 20, 2021 9:22 pm
by TomXP411
23 hours ago, kelli217 said:
I'm going to proceed under the impression that your first sentence responds directly to my last sentence. You are probably still correct, but I may not have made it clear that I was talking about completely uncompressed video, or something trivially compressed like Run-Length Encoded or Text Mode Video, where the CPU does not have to do any math to decompress lossy data, but is just pushing pixels or character data. And doing it on this theoretical 3.4 GHz 65C816, with no OS overhead.
You already have all the information you need to do the math.
1080i60 is the current broadcast standard (although I suspect studios are producing video internally at 1080p60 or 2160p60).
1920 x 1080 is 2,073,600 pixels, or 6,220,800 bytes per frame at 3 bytes (24-bit color) per pixel. At 1080i60's 30 full frames per second (60 interlaced fields), that is 186,624,000 bytes per second. It was already stated above that a 3GHz 6502 would run roughly 1.4MIPS.
How are you going to transfer 186 megabytes per second when you can only process 1.4 million instructions?
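The bandwidth arithmetic above, as a small Python sketch. It assumes 24-bit color (3 bytes per pixel) and treats 1080i60 as 30 full frames per second; the MIPS rating is just a parameter, so the 1.4 MIPS figure quoted in the post can be swapped for any other estimate. The 3840x2160 at 60 frames per second case corresponds to the "4K at 60p" scenario raised earlier in the thread.

```python
# Back-of-the-envelope bandwidth check for uncompressed video. Assumes 3 bytes
# per pixel (24-bit color); the MIPS rating is just a parameter to plug in.


def raw_video_rate(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Bytes per second needed to move uncompressed frames."""
    return width * height * bytes_per_pixel * fps


def instructions_per_byte(mips: float, bytes_per_second: int) -> float:
    """Instruction budget available for each byte moved."""
    return mips * 1_000_000 / bytes_per_second


if __name__ == "__main__":
    hd = raw_video_rate(1920, 1080, 3, 30)   # 1080i60 = 30 full frames/s
    uhd = raw_video_rate(3840, 2160, 3, 60)  # "4K at 60p"
    print(f"{hd:,} bytes/s, {uhd:,} bytes/s")  # 186,624,000 bytes/s, 1,492,992,000 bytes/s
    print(f"{instructions_per_byte(1.4, hd):.4f} instructions per byte at 1.4 MIPS")
```

For comparison, a bare LDA/STA/INY/BNE copy loop already spends four instructions per byte before any pointer or bank bookkeeping, so whatever MIPS figure you plug in, the budget needs to come out well above 4 for raw video to be plausible.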
At this point, the questions you’re asking are making less and less sense, since it’s already been explained that 6502 architecture is decades out of date, and no amount of clock cycles will make it a practical processor for modern desktop computing demands.
Posted: Tue Jul 20, 2021 10:44 pm
by m00dawg
On 7/10/2021 at 1:28 AM, BruceMcF said:
Now, the version where there is 128K on cpu cache with the Low RAM all in on-cpu cache and the High RAM and ROM segments starting to be cached as soon as the banks are selected ... that version would let you crank the cpu speed a bit higher before it gets memory bound.
This has been on my mind for a few days now and is something I find super interesting. Essentially, placing the "main" memory onto the die as SRAM would let it run at the CPU clock speed. This would include the ZP (so I think it removes the need to treat the ZP as part of a register file; we can treat it the same as on, say, the X16) as well as the stack. We're well past the point where external SRAM could keep up at anything remotely close to 3.4GHz, but given that a 6-core Ryzen has 384KB of L1 cache, this seems within the realm of possibility.
A 6502 running that fast is still going to run into the limits of its 64KB address space: it would churn through data so fast that such a small amount of memory would be pretty limiting (it already is on the X16, hence the himem stuff). I'm less sure about how to access external data, and surely some level of caching or extra hardware to manage, say, modern DDR RAM would be needed. Fetching data from DDR RAM would take longer than from internal memory, so, as was alluded to in other comments, there's going to be waiting and, I suspect, a need for some additional caching. I/O would also have to be solved: on the X16 everything runs at bus speed and it's all good, but modern systems have far more complexity to manage, such as the PCI Express bus. Even interfacing with our lovely YM2151 or the VERA would have to be very, very different here.
Not to mention the design requirements from the physical hardware standpoint (trace lengths for the mainboard, etc. etc.) all become very important at these speeds.
At this point, it kinda breaks down on what the next step would be here. But it is kind of a fun exercise to think about, I thought anyway. I mean, there is surely a reason why the T-800 uses a 6502 after all.
Posted: Tue Jul 20, 2021 11:57 pm
by Scott Robison
1 hour ago, m00dawg said:
At this point, it kinda breaks down on what the next step would be here. But it is kind of a fun exercise to think about, I thought anyway. I mean, there is surely a reason why the T-800 uses a 6502 after all.
And Bender! Head canon: With all the issues found with advanced CPU techniques, bugs in CPUs, security issues due to inter-core spying, etc etc etc, they decided to go to the best CPU that would be known secure where any defects were well documented. Yeah, that's it.
Posted: Wed Jul 21, 2021 6:30 am
by BruceMcF
7 hours ago, m00dawg said:
A 6502 running that fast is still going to run into processing limits of 64kb space. In that, it'd churn through data so fast that having such a small amount of memory would be pretty limiting (it already is on the X16, hence the himem stuff). I'm less sure about how to access external data and surely some level of caching or extra hardware to manage, say, modern DDR RAM, would be needed. ...
Access to the 2MB of L2 RAM cache and 1MB of L2 instruction cache is straightforward ... you put a byte into $0000 in the address space to select an 8KB segment of L2 cache, or a 16KB segment of L2 instruction cache, and reading the associated bank window copies the L2 contents into the L1 cache, 64 bits at a time. Writing is more involved (which is why only one window is R/W, and it's the 8KB one), but each write sets a bit in a 1KB written-value register, which in turn sets a bit in a 128-byte written-64-bit-word register. When not reading the L2 R/W cache, it scans through the written-word register and writes back new contents, 64 bits at a time, under the control of the written-value register.
Each core has its own 64KB L1 cache, but the L2 cache is common to all cores, so only core 1 has the memory controller registers in its memory-mapped I/O space.
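A hedged Python model of the write-tracking scheme just described: an 8KB R/W window, a 1KB bitmap with one bit per byte written, and a 128-byte bitmap with one bit per 64-bit word. The sizes line up (8192 bytes need 8192 bits = 1KB; 1024 words need 1024 bits = 128 bytes). The class and method names are hypothetical, and the per-byte copy under a mask stands in for the masked 64-bit write-back the post describes.

```python
# Hypothetical model of the dirty-bit write-back scheme described in the post.
WINDOW = 8 * 1024            # 8KB read/write bank window
WORD = 8                     # write-back granularity: 64-bit words
N_WORDS = WINDOW // WORD     # 1024 words per window


class DirtyWindow:
    def __init__(self, l2: bytearray, base: int):
        self.l2 = l2                                 # shared L2 cache contents
        self.base = base                             # start of the selected 8KB segment
        self.l1 = bytearray(l2[base:base + WINDOW])  # read: copy segment into L1
        self.byte_dirty = bytearray(WINDOW // 8)     # 1KB: one bit per byte
        self.word_dirty = bytearray(N_WORDS // 8)    # 128 bytes: one bit per word

    def write(self, offset: int, value: int) -> None:
        """CPU store into the window: update L1 and set both dirty bits."""
        self.l1[offset] = value
        self.byte_dirty[offset >> 3] |= 1 << (offset & 7)
        word = offset // WORD
        self.word_dirty[word >> 3] |= 1 << (word & 7)

    def write_back(self) -> int:
        """Scan the word bitmap and flush only bytes flagged in the byte bitmap
        (standing in for a masked 64-bit write). Returns the words flushed."""
        flushed = 0
        for word in range(N_WORDS):
            if not self.word_dirty[word >> 3] & (1 << (word & 7)):
                continue
            # A 64-bit word spans 8 bytes, so its 8 byte-dirty bits are
            # exactly byte_dirty[word].
            mask = self.byte_dirty[word]
            start = word * WORD
            for i in range(WORD):
                if mask & (1 << i):
                    self.l2[self.base + start + i] = self.l1[start + i]
            self.byte_dirty[word] = 0
            self.word_dirty[word >> 3] &= ~(1 << (word & 7)) & 0xFF
            flushed += 1
        return flushed


if __name__ == "__main__":
    l2 = bytearray(2 * 1024 * 1024)      # the 2MB L2 R/W cache from the post
    win = DirtyWindow(l2, base=3 * WINDOW)
    for off, val in ((0, 0xAA), (9, 0xBB), (15, 0xCC)):
        win.write(off, val)
    print(win.write_back())              # 2 (words 0 and 1)
```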
Posted: Wed Jul 21, 2021 7:47 pm
by Brad
Well, if my Apple //c had a 3.4GHz 65C02, maybe it could actually run Flight Simulator 2 at a decent speed. That said, in retrospect the 6502 does things... strangely compared to some of the more modern designs I studied in college (the 68000 comes to mind), but it was "good enough" to power the most popular computers of all time. My goal for this year was to learn 6502 assembly, but it's slow going because I keep thinking in a modern context; the entire concept of zero page was alien to me until I realized the intent. And really, that sort of stuff is what engineers like to call "getting it done." A theoretical processor that does everything in one clock cycle with a billion megs of registers blah blah blah is fun to think about, but the 6502 was made by engineers with a problem to solve and they did it in a way that was not only effective, but super cheap to mass produce. I digress, but thinking about this sort of stuff really makes you appreciate the old-school design philosophy. Shoestring budget, looming deadlines, no such thing as soft patches or firmware updates... it's gotta work or you don't eat.
The paradigm of programming has shifted so much over 40 years, anyway. Assembly programmers are a rare breed anymore, which is honestly a shame, since it teaches you to understand what you're actually doing on a fundamental level and also allows you to fully exploit a system instead of just crossing your fingers that the compiler takes your crappy code and turns it into something quick enough for the job. It's almost like code optimization is a bad word anymore. That said, I don't think a modern compiler would be able to take advantage of the 6502 in a way that would scale linearly with more speed. Obviously I could be wrong, but honestly running GEOS on my Ultimate 64 at 20MHz or so seems faster than using my MacBook, so maybe 100MHz or so would be a reasonable limit of usefulness?
Posted: Wed Jul 21, 2021 8:13 pm
by TomXP411
26 minutes ago, Brad said:
the 6502 was made by engineers with a problem to solve and they did it in a way that was not only effective, but super cheap to mass produce.
This right here is why the 6502 was so popular. It was cheap, compared to the 8080, its best competition when it was first created. The 6502 cost $25 in 1975, compared to $360 for the 8080 at launch.
Even if the 8080 went down in price over the year and a half between its release and the 6502 launch, I still doubt it went down to $25. Not even close. So when Apple and Commodore both set out to release an inexpensive home computer, it's no surprise they went with the MOS 6502 rather than the Intel 8080.
Obviously, Intel's strategy won out, but that's mostly due to the success of PC clones, rather than the intrinsic merit of the processor. I actually do think the 8080 was a better CPU, but was it 7 times better? With the price of the two processors, I'd have made the same decision as Tramiel, Woz, and Jobs back in the 70s.