Rick's b.log - 2022/12/03
Still, it's better than minus five!
Anyway, woke at half one. Then at quarter past two. And again at three. At that point I thought "stuff it" and made a tea.
As we have an audit on Monday, I spent roughly the first half of the day scrubbing.... we have these trolley-like things for the big baking trays. They aren't dirty, but a few hundred passes through the automated cleaner and there's a sort of reddish gunk on them. It's the same sort of stuff you'll have on your sink if you neglect cleaning it.
I returned after a short break (250ml cola, Mars bar, Twix, and 2 White Oreo biscuits - sugar high for the win) and alternated between cleaning stuff as before, and helping the guy in the smaller plonge. And at twenty past noon he went to deal with the bin since the production team dropped off their bags of crap at the last possible minute (as usual) while I drained, stripped down, and cleaned the dishwasher. End of day, end of week, end of these Saturdays.
I feel alright right now. Well, I've had two teas since coming home so between that and the sugar....
At least I can adjust my time to wake at seven and no longer have to shift down to half three.
Okay, let's deep dive this and pull it apart ourselves. Just because. It'll help understand the sort of work the emulator is doing aside from the actual multiplication operation, which will melt your brain.
As you can imagine, this emulation may involve a couple of hundred ARM instructions. So it's not quick. It's better than nothing, and that's how it was for pretty much every machine Acorn made. The RiscPC doesn't even have a place for an FP chip.
I said on the ROOL forum that even ABC in VFP mode would hand Norcroft its arse. Speaking of arses, this was pulled out of mine. I just knew that no matter how good Norcroft is at making optimal code, it would be screwed by the use of FPE.
So... here's the core of the C code:
Yes, it's a little bizarre. This is because Norcroft is very good at optimising, so if I gave it a fixed calculation and told it to loop eight million times, there's a chance that it might simply work out the result and return that and not bother to loop anything at all. ☺
Here is the equivalent BASIC code:
Okay, so let's put it to the test. Everything will be listed in decreasing order of time spent. The values used are centiseconds. Running singletasking on a Pi 3B+ (1400MHz).
The takeaways:
I would imagine the same code compiled with GCC (that does VFP) would be faster still, but I don't have GCC here (no inclination to learn a new toolchain).
The point has been made. C is pretty fast. C with VFP would be even better.
And here is today's chocolatey goodness.
The final Saturday
Well, it's pretty damn chilly but it wasn't -5°C. AccuWeather said that for most of the first two thirds of the week. Météo France said it was going to be a low of 1°C and a high of 4°C. It was on track to be that, but then cloud cover rolled in. So it has been between 2½°C and 3°C.
Which is like the lower shelf of my fridge...
What my weather is as I'm writing this.
Half an hour later, the alarm went off.
Which meant it was time for more tea.
Anyway, got myself a scrubbing pad and set about making them nice and shiny all over again.
Once I had done all of them that were not otherwise in use, I went and did the same thing to the baking tins and mixing bowls. Makes me wonder if the same sort of gunk turns up on stuff that passes through a dishwasher? After all, these things went through a dishwasher/steriliser, just industrial scale.
I was getting into a rhythm, and then realised it was half ten. Better go on break!
....but it's tomorrow when it hits me.
Hardware FP?
Something I whinge about is the lousy floating point performance on RISC OS using the official DDE compiler. This is because the compiler and libraries are all set to use the FPE. This stands for "Floating point emulator" and is a software emulation of an old (early '80s) FP chip that was intended to be available as a co-processor in the Archimedes machines, the infamous FPA10. These chips were hideously expensive: in September 1989 one would cost you £599.99+VAT. So Acorn's approach was to write an emulator. The machine will attempt to execute CDP CP1, 1, C0, C1, C2, 4. This will abort, because there's no co-processor attached. The abort handler will pick up this abort, and FPEmulator will be offered the instruction &EE110182.
As co-processor 1 is known to FPEmulator, it will pick apart the instruction encoding to determine that it was the FP operation MUFD F0, F1, F2, or a double precision multiplication (F0 = F1 * F2).
EE110182 = 1110 1110 0001   0 001 0 000 0001 100 0 0 010
           cond CDP  opcode e Fn  j Fd  0001 fgh 0 i Fm
opcode = 0001 = MUF (multiply)
ef = 01 = double
gh = 00 = round to nearest (default)
i = 0 = Fm not a fixed constant
j = 0 = dyadic
Fn = 001 = F1
Fd = 000 = F0
Fm = 010 = F2
= multiply double precision, default rounding, calculate F1 × F2 storing result in F0
Therefore : CDP CP1, 1, C0, C1, C2, 4
is identical to : MUFD F0, F1, F2
But when there's hardware FP built into the current generation of devices, sticking with an emulation is asinine.
/* start is the value of TIME */
loop = start;
one = (double)start;
two = (double)start + (double)start;
three = one + (double)loop + (double)start;
// Churn for a while
for ( loop = 0; loop < 8000000; loop++ )
{
   one = (one * one) / two;
   two = three + one + loop;
   three = three / 2;
   if ( one > 123456 )
      one = one / 1024;
   if ( two > 123456 )
      two = two / 1024;
}
start% = TIME
loop% = start%
one = start%
two = start% + start%
three = one + loop% + start%
FOR loop% = 1 TO 8000000
  one = (one * one) / two
  two = three + one + loop%
  three = three / 2
  IF ( one > 123456 ) THEN one = one / 1024
  IF ( two > 123456 ) THEN two = two / 1024
NEXT
BASIC64 using FPE . . . 6,475 (64¾ seconds!)
ABC using FPE . . . . . 2,130
Standard BASIC V . . . 1,443 (~14½ seconds)
BASIC64 using VFP . . . 1,337
Norcroft C (FPE) . . . 838 (~8.4 seconds)
ABC using VFP . . . . . 146 (~1½ seconds!)
The calendars
Here's yesterday. I was too tired by the time it uploaded to make an entry for it. You'll get a daily video, but it might not appear here on time. Depends how I feel. ☺
Click that subscribe button, then you'll get notified.
Anon, 4th December 2022, 10:40 I managed to get a hardware FPA10 chip for a lot less than £600 back when I used an A5000. If memory serves I think it was closer to £20 when one of the big name suppliers in Acorn User (Beebug? Watfraud?) were selling them off.
I do recall that after fitting the chip, a lot of stuff ran quicker, although memory is hazy as to what.
Incidentally the lack of hardware FP is (again if memory serves) what made even a StrongARM RiscPC absolutely dire at encoding MP3 files. Something like 0.2x speed (a similarly-clocked Pentium could do nearly real-time). Tried it on a Pi a bit back (again I forget whether it was a Pi1 or a Pi Zero, I believe they use the same SoC) and got around 8x speed encoding. (Again, from memory, I can confirm that it was faster than real-time.)
One of the biggest mistakes ARM / Acorn made was not incorporating hardware FP into the ARM6 and above. (That's ARM6 as used in the RPC600, not ARMv6.) Possibly even into the ARM3 chip, meaning that the A5000 or better would have had proper hardware FP.
Rick, 4th December 2022, 13:10 I fully agree. Hardware FP was shockingly expensive when it was first a thing (the 80387, like the FPA10, could cost almost as much as the entire rest of the machine), but in the latter days of the 80486, it became pretty much a non-event due to hardware FP being included within the processor (the old SX vs DX difference).
That the RiscPC, with no FP socket, didn't include this in the processor is simply a mistake that made their ARM based machines more expensive and less competitive (in terms of power) compared to everybody else. Sure, there was a time when the Archimedes was the world's fastest home computer. The rest moved on, Acorn didn't.
Before the StrongARM, Acorn had a 40MHz ARM710 card. Elsewhere, 133MHz Pentiums with much faster (66MHz?) memory access... and, of course, hardware FP (even if it did screw up slightly).
David Pilling, 4th December 2022, 17:44 Makes me feel guilty for RMEnsuring the FPE so many times. If you used my software with speeded up FP it would go little faster, everything that was time critical was done using 32 bit integers.
I am surprised that the FPU hardware was available as early as you say. The tale I would tell is that Universities would have been happy to buy a UNIX (RiscIx) workstation around 1989 and compile their own software, but they would not stand for anything without hardware floating point operations. At least when I was sat in a sales demo from Acorn of the R140 at Uni, there was no FP available.
1987 was when PCs flooded into the University. It seems like 8087s were not long after. I am surprised to find (35 years later) that my Amstrad 1512 from early 1987, had an 8087 FP coprocessor socket.
Memory is promises promises about hardware FP, and it was delayed because it was not convenient for ARM to make it.
The takeaway then is that there needs to be some way of using VFP with Norcroft C - enhanced FPE (?)
Whilst writing this I found that there is some VFP support in RISC OS now:
https://www.riscosopen.org/news/articles/2021/07/10/going-round-in-circles-quickly
In brief Acorn stuffed FP up, everything was about making ARM a success.
Rick, 4th December 2022, 18:18 "I am surprised that the FPU hardware was available as early as you say"
It is listed in the September 1989 Retail Price List. Whether or not the units actually existed and could be shipped... different question. ;)
There is growing VFP support in RISC OS. The latest versions of the ABC compiler can choose whether to build for FPA or VFP. But, still, no news on whether or not the DDE will be changed to support it natively. It would, I imagine, require a completely new (and dual-compatible) CLib as, annoyingly, the word order for the FP values is back to front between VFP and FPA. Unless VFP or NEON or something can be persuaded to load with a backwards word order?
David Pilling, 4th December 2022, 18:46 Wikipedia:
"The FPU expansion card was delivered for the R140 workstation and 400 series in 1989, priced at £599 plus VAT, and was based on the WE32206,[300] with a "protocol converter chip" being used to translate between the ARM and the WE32206.[301] The WE32206 card was also offered for Acorn's Springboard expansion card for IBM PC compatibles.[302] Although Acorn had expected that interfacing the ARM to an existing FPU chip would be "a much quicker route" to delivering a hardware-based floating-point solution than developing a new co-processor, the complexity involved in developing the custom gate array responsible for interfacing to the WE32206 was apparently greater than anticipated, taking two years to deliver."
The Archimedes models based on the ARM3 processor supported a completely new "arithmetic co-processor" or "floating-point accelerator" known as the FPA. Released in 1993 for the R260 workstation and the A540 and A5000 machines, priced at £99 plus VAT, the FPA device—known specifically as the FPA10—was fitted in a dedicated socket on the processor card for the R260 and A540...
David Pilling, 4th December 2022, 18:49 Still at it after all these years
"SHOULD YOU PREORDER THE NEW ACORN GPU MINING ACCELERATOR NOW?"
"I purchased Acorn M.2 FPGA based mining accelerators when the first batches were made. Not only did Squirrels miss the delivery date by over 6 months"
David Pilling, 4th December 2022, 18:53 There's a RISC User review of the FPU from December 1989 - David Spencer is not impressed. So yes it did arrive in 1989, but only just.
https://archive.org/details/risc-user-vol-3-iss-02-dec-89/page/49/mode/1up
Rick, 4th December 2022, 19:44 Seems I misremembered. There's no FPA socket on the A310, and the FPA podule won't work as the A3xx doesn't wire the middle pins of the backplane socket like the A4xx machines do.
As far as I can tell, it's only the A5000 and the processor card of the A540 that ever had FPA sockets.
And, yeah, I can't help but agree with the RISC User article. A *lot* of money for not so much actual gain.
Anon, 5th December 2022, 19:27 Still have to wonder why Acorn didn't just bang the FPA10 chip into the A5000 as standard.
I stand by what I said in the first comment above. Acorn / ARM should have put hardware FP into the ARM610 chip when the RiscPC was launched. PeeCees (remember that?) by this point had hardware FP on the 486DX and Pentiums...
Yes, Acorn were miles ahead back in the late 80s-early 90s. But then they stagnated.
© 2022 Rick Murray
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted. RIPA notice: No consent is given for interception of page transmission.