It is the 1706th of March 2020 (aka the 31st of October 2024)
You are 3.17.162.32,
pleased to meet you!
mailto:blog-at-heyrick-dot-eu
Poking a carcass with a stick for the lulz, and musings on task switching
Last Samhain (Halloween), I took apart a badly written demonstration program made by a trollmy nemesis somebody who is very pro-assembler and writes "clever" code.
That is not a compliment.
Well, on his site (now that he's no longer cluttering up the ROOL forum with this sort of thing), he talks about priority-based task switching and an implementation in C used to demonstrate how an implementation in assembler is so much better.
Don't use assembly
Let's get two things straight right from the outset. Firstly, I am the guy that wrote the well known ARM assembler tutorials, back in the nineties.
And secondly, if you advocate writing anything more than necessary glue code in assembler, or VFP/SIMD for speed-critical things, you're nuts. Plain and simple batshit.
Lest we compare RISC OS...
RISC OS is only available for eight flavours of ARM board: XScale (Iyonix), OMAP3 (Beagle etc), OMAP4 (Panda), the Pi family, iMX6, OMAP5, TI AM5728 (Titanium), and legacy Acorn ARM6/7/SA (RiscPC/A7000).
...with Linux...
Linux is ubiquitous. It is available on pretty much anything with a powerful enough processor and some form of memory management. All of these ARM boards can run Linux. Many people's home broadband routers will be running Linux (often on a Realtek MIPS chip, or nowadays some sort of ARM). Smart IP cameras and PVRs will be running Linux (again, often MIPS or ARM). And we all know Ubuntu and Debian et al on x86 hardware.
There is a worry, that I have discussed in the past, that RISC OS will eventually die out because there are fewer 32 bit ARM-A family devices, as there's really nothing that a 32 bit world can do that a 64 bit world can't do better.
The 64 bit ARM code looks spectacularly different to 32 bit ARM. Gone is the lovely multiple load/save, gone is most vestiges of the conditional execution that really set ARM aside from other processors. And gone is Rx for the registers. There are now twice as many, with a necessarily different ABI, and they're accessed as a 64 bit register (Xx) or a 32 bit register (Wx).
What this means in reality is that moving RISC OS to a 64 bit version of the same basic processor family will be about as complicated as porting it to an entirely different processor. Yup, it would be essentially a ground-up rewrite of hundreds of thousands of lines of assembler.
Linux, on the other hand, is already happily running on 64 bit devices. Chances are, if your derivation of Linux is called Android, it's probably already running in 64-bit ARMv8-A mode.
What it involved was to write 64 bit appropriate glue code for the startup and low level interrupt things, and then to recompile everything with a compiler capable of generating the 64 bit code.
That's why we have 64 bit Linux (or Android) today, and will likely never have a 64 bit native RISC OS.
Now, don't get me wrong here - I have no issue with people writing stuff in assembler if they can and they enjoy it. My issue is purely in telling everybody, in 2022, that assembler is better. It very absolutely definitely is NOT.
But compiled code is rubbish!
Before anybody says that compilers generate worse code than humans... well, that was largely true in the '90s when some compilers were really rather poor; in large part because of restrictions in the hardware itself (there's only so much you can fit into relatively small amounts of memory before compromises must be made). These days compiler technology is vastly better and while a person highly skilled in assembler might be able to write better code than any compiler, any average coder will be able to get a lot more done in the same time frame simply writing in a higher level language, and they will have fewer weird bugs because the compiler takes care of all the tedious low-level crap like stack handling, when to use writeback, and register assignments.
Because what really really matters 99.999% of the time is not the internal beauty of the code, but that you actually have a product. Something to ship out the door. Something to justify your salary.
Seriously, don't use assembler
Plus, take a look at a little something I threw together two winters ago - Rarthur. It is an unfinished attempt at rolling my own UI from the ground up, partly to prove a point and partly for the sake of doing it (and to keep me away from Netflix&Chocolate which is always a bad combination).
Note that it is available as two executables. One runs on RISC OS, the other on any basic DOS machine (or DOSbox). Apart from some minor colour palette differences and not quite using the same font (though it's a pretty close match), they look and feel and behave the same way on both platforms.
I would not have bothered starting this project if I was writing it in assembler, and it certainly wouldn't have been possible to target two essentially alien architectures at the same time.
I think I've pretty much made the point that being a pro-assembler fanboy in 2022 is not far off being an anti-vaxxer... or a Republican/Tory... a delusional soul who failed to see the ship sailed decades ago, and thus is a danger to others due to spreading easily disproven ideas from another epoch.
A simple task switcher
So to the task switcher in case here. The full code is available here. It's only the one source file, but it's more detailled than the short version quoted in the article.
What is going on here is implementing a very simple four level process priority system. It works by simply counting up, and making use of the fact that only one bit will be changing from zero to one with each count. Which means, as the comments put it:
* - Priority 0 : Bit 0 flips from 0 to 1, every other call. *
* - Priority 1 : Bit 1 flips from 0 to 1, every 4th call. *
* - Priority 2 : Bit 2 flips from 0 to 1, every 8th call. *
* - Priority 3 : Bit 3 flips from 0 to 1, every 16th call. *
Which if we write some simple BASIC to perform the same logic, it more or less results in exactly that happening:
Tick
Task
1
1
page in next priority zero task
2
2
page in next priority one task
3
1
page in next priority zero task
4
4
page in next priority two task
5
1
page in next priority zero task
6
2
page in next priority one task
7
1
page in next priority zero task
8
8
page in next priority three task
9
1
page in next priority zero task
10
2
page in next priority one task
11
1
page in next priority zero task
12
4
page in next priority two task
13
1
page in next priority zero task
14
2
page in next priority one task
15
1
page in next priority zero task
16
0
undefined
17
1
page in next priority zero task
18
2
page in next priority one task
19
1
page in next priority zero task
20
4
page in next priority two task
21
1
page in next priority zero task
22
2
page in next priority one task
23
1
page in next priority zero task
24
8
page in next priority three task
There is a slight anomaly where the 16th call is zero instead of four due to the reset back to zero.
Task states
While it's not a bad concept for a really simple task switcher, the problem arises when one considers that a task actually has four possible states, not two.
The two that everybody knows is that a process is either running, or it isn't.
Generally speaking, a processor is only capable of doing one thing at a time, and as this example task switcher demonstrates, the illusion of multitasking is maintained by giving each task a little bit of time. If you have five tasks and they each get a fiftieth of a second, they will average to a twentieth of a second each, every second, and the switching increment will be small enough that you mightn't even notice a tiny delay as the task is switched out and something else happens.
This is not strictly true these days with multi-core processors that actually can run multiple things at the same time, but it is still a valid generalisation as there are far more tasks on an average system than there are processor cores.
Task switcher code
Now, we know we should take a huge pinch of salt when we see David writing this:
* The code provided does work, if correctly called from the timer inturrupt.*
Here's the core of the code. There are two things that you should spot right away.
void SysTaskTick(void)
{
int prevcnt, t;
prevcnt=PriCount;
PriCount=(++PriCount & 0x0F);
switch (PriCount & ~prevCnt)
{
case 1: if (t = SysGetTsk(0)) if (t != CurTsk)
{ SysSwitchTo(t); CurPri = 0; CurTsk = t;} break;
case 2: if (t = SysGetTsk(1)) if (t != CurTsk)
{ SysSwitchTo(t); CurPri = 1; CurTsk = t;} break;
case 4: if (t = SysGetTsk(2)) if (t != CurTsk)
{ SysSwitchTo(t); CurPri = 2; CurTsk = t;} break;
case 8: if (t = SysGetTsk(3)) if (t != CurTsk)
{ SysSwitchTo(t); CurPri = 3; CurTsk = t;} break;
}
return; /*This returns to the current task, even if different from one
* inturrupted. */
}
Did you spot them?
The first problem means that I know this hasn't been passed through a compiler, as no compiler is going to accept mixing up prevcnt and prevCnt.
The second one is more interesting. The compiler will likely tell you that we are using "=" in a condition context. That is to say, the part if (t = SysGetTsk(0)) is expected to be if (t == SysGetTsk(0))... but if we do that, the code completely fails to switch tasks.
We actually want the assignment. As for why it is stuck in an if, well this is using a side effect of the fact that we are evaluating the clause. If the task handle returned is zero, then the rest of that code won't be executed. So the code is doing what is wanted, it's just a rather horrible way of writing it.
It would be much better to write each part of the switch structure like this:
case 1: t = SysGetTsk(0);
if (t && t != CurTsk)
{
SysSwitchTo(t);
CurPri = 0;
CurTsk = t;
}
break;
Or, if you prefer the compact styling, like this:
case 1: t = SysGetTsk(0);
if (t && t != CurTsk) { SysSwitchTo(t); CurPri = 0; CurTsk = t; } break;
A good coder does not rely upon side effects but sticks with what is obvious. After all, it may not be you that next looks at this code.
Which is why there is a third problem that the compiler may choke on. It's this:
PriCount=(++PriCount & 0x0F);
The Norcroft compiler will report a warning that this is undefined behaviour, that PriCount; is written twice without an intervening sequence point.
That's essentially nerd-speak for saying that "PriCount equals pre-increment PriCount anded with fifteen" is invalid. This is because at certain points in the code (the so-called sequence points), all side effects of previous calculations should be resolved.
The actual problem here isn't the AND, it's the fact of writing PriCount the value of the pre-incremented value of PriCount. For the record, post-increment (PriCount++) will trigger the same issue.
We can't write ++PriCount & 15; as that's not valid syntax. So the only reasonable option is:
PriCount=((PriCount + 1) & 0x0F);
This works because we have changed the PriCount into an explicit addition, so the compiler knows that we are taking this, incrementing it, and then ANDing it, before writing it back.
As it happens, the Norcroft compiler is smart and will generate the same code for the explicit addition as well as the pre-increment version, namely:
LDRB v4,[v1]
ADD a3,v4,#1
AND a2,a3,#&f
STRB a2,[v1]
However, again, one should not rely upon side effects. Not all compilers are equal, and "undefined behaviour" means the compiler is free to do what it thinks you meant, which may not be what you actually meant.
Task states revisited
Something that might be handled by David's system (he hasn't posted any source for SysGetTsk()) is that I mentioned above that there are actually four states a task can be in. Running, pending, sleeping, and blocked.
A task that is sleeping is one that knows that it has nothing to do. It might say "wake me up once a second".
A task that is blocked is one that is waiting for a resource. This is pretty much unknown to RISC OS as everything is very deterministic. If you want to read a hundred kilobytes of data from a file, everything will halt while that data is loaded in. It's not a big deal these days, but it took a good few seconds in the days of floppy discs. On other systems, that load might get pushed to the background and the task marked as "blocked" while the data loads, but other tasks can continue running.
One also needs a system idle process. This should be invoked in two cases - in the case that a task is entered which then calls yield, and in the case that the above task switcher does not have a task to switch to.
RISC OS is a co-operative scheduling system. The next task is called in, and it runs until it chooses to yield back to the system. There are a few tweaks regarding high priority poll words and idling tasks, and one might jump the queue if the user clicks on one of its windows, but essentially when a task has control it keeps it until it determines otherwise.
David's scheduler is pre-emptive. That is to say, the above code is called upon a clock tick to determine if there's a new task to invoke, and if so, to forcibly evict the current task. This behaviour would not work on RISC OS without a lot of additional baggage (imagine being swapped out in the middle of writing a VDU sequence, or data to an open file handle, or whilst VDU output is redirected to a sprite).
Overheads
An interesting aspect is to consider how much administrative time would be spent on task switching. Whilst the assembler "equivalent" might be smaller and tighter code, we haven't looked at the overheads of entering a privileged mode, of suspending the current task, of messing with memory management (or however swapping is actually handled) and of restarting a different task and then entering it. These context switches will take time. If your timer interval is short enough, an appreciable amount of the "quanta" (the time between two clock ticks) may be involved in actually switching tasks.
Furthermore, one could ask why a high priority task gets the same timeslice as a low priority one. Okay, granted, it gets to run a lot more frequently (every other tick as opposed to every sixteenth), but given the switching overheads, maybe it should have a higher quanta, that is to say, that a low priority task should run for one clock tick, while a high priority task should run for four (easily calculated as ticks = 4 - priority;).
Suddenly, writing this sort of thing in C will seem a lot more attractive simply because you can read it so much more easily. A couple of nanoseconds lost in code that isn't 100% optimal would be more than made up by not losing microseconds performing a lot of additional task switching due to making an algorithm that is smarter.
The 'better' assembler version
I'm not going to bother breaking this down. The comments do the job.
;The register saves are taken care of by the timer inturupt code.
.SysTaskTick
LDR R0, PriCount ;Get PriCount and,
MVN R1, R0 ;Negate into R1
ADD R0, R0, #1 ;Increment PriCount
AND R0, R0, #&0F ;Stay in range.
AND R1, R1, R0 ;Leave only bit that went 0 to one.
STR R0, PriCount ;Update mem copy.
CLZ R0, R1
RSBS R0, R0, #32 ;Calc bit num of 0 to 1 bit.
STRNE R0, CurPri
MOVEQ R15,R14 ;Nothing to do if answer 0.
SUB R0,R0,#1 ;If still here, dec bit num.
BL SysGetTsk ;Get the next tsk at priority.
TST R0, #&FF ;Valid tsk hndl range 1 to 255.
STRNE R0, CurTsk
BLNE SysSwitchTo ;If valid task, switch to it.
MOV R15,R14
I've pasted in some code to get the task switcher running, and added some reporting on what is going on. The SysGetTask() function simply returns the value of the input priority level, that is to say priority zero returns task 0, priority 1 returns task 1, and so on.
Looks to me like both methods are equivalent. But, you know, as I look at this I spot a potential flaw.
Since I'm simply returning the value of priority as the task handle, it appears that there are no high priority (0) tasks because SysGetTsk returns zero.
I've quoted a lot more, as this is interesting. In tick 23, we run the tick routine and look for a priority three task (lowest priority). We find one, and we switch to it.
Then, the next time through as tick 24, we cycle back to looking for a priority zero (highest) priority task. Since we don't have one, we drop out.
Coming in again at tick 25, we're now looking for a priority one task, but the lowest priority task is still running, having effectively been given two timeslices.
What would be more logical is that if a priority level should ever not have any tasks, drop down to the next priority level, so that there's always a task switch happening.
Alternatively, detect that there was no valid task, and hand over to an idle process for power saving (though this option protentially suboptimal, such as in the case of no highest priority tasks).
Lessons here?
Pretty easy. The first is that task switching is hard. One can make a very simplistic priority-based task switcher with ease, but when one starts to consider race conditions, resource blocking, sleeping, and simply not having a task to switch to... it suddenly gets a lot more complicated. Entire chapters of books have been written in this subject.
This task switcher will do the job, and it's fairly simple in implementation. However it's rather eccentric to claim that assembler saves a few instructions over C, when the task switcher itself has some obvious inefficiencies. The purpose of a task switcher, from a linear batch based system to an RTOS, is to find something for the processor to do. Now while falling through to continue running a task in the absence of anything to switch to is a valid "something to do", it does rather make a mess of the concept of priority levels, so it's arguably not the correct "something to do".
The second lesson is... don't depend upon clever code or side effects. It is far preferable to have code that can be understood than code that relies upon tricks (like the "if" nonsense).
And, finally, don't use assembler if you want any hope of your code being available to other machines. I mean, where's the DOS version? Oh, I'm sorry, you wrote it in assembler, so there isn't one.
Meanwhile, the C version...
Ramming home the Don't Use Assembler point by dropping an anvil from the Exosphere.
Your comments:
Please note that while I check this page every so often, I am not able to control what users write; therefore I disclaim all liability for unpleasant and/or infringing and/or defamatory material. Undesired content will be removed as soon as it is noticed. By leaving a comment, you agree not to post material that is illegal or in bad taste, and you should be aware that the time and your IP address are both recorded, should it be necessary to find out who you are. Oh, and don't bother trying to inline HTML. I'm not that stupid! ☺ ADDING COMMENTS DOES NOT WORK IF READING TRANSLATED VERSIONS.
You can now follow comment additions with the comment RSS feed. This is distinct from the b.log RSS feed, so you can subscribe to one or both as you wish.
J.G.Harston, 16th January 2022, 23:25
I started writing the basics of an ARM64 assembler in a collaberation to get the basic outline, er, outlined. It's such a different processor under the hood that I can't really think of it as an ARM. It's like going from the 6809 to the 68000 and thinking it's "just a bigger 6800". It's not.
J.G.Harston, 16th January 2022, 23:37
RE: RArthur: I taught myself the Visual Basic IDE by writing a simple RISC OS-y desktop for Windows 3 back in 1993. Unfortunately, there's a head crash in the middle of the disk it's on.
druck, 17th January 2022, 10:10
I got in to assembler very early as I only had an Acorn Electron which was less than half the speed of the BBC Micro in high res modes, so I had to use 6502 to run at the same speed as BASIC on the Beeb. It code was small and fun back then, and I naturally took up 26 bit ARM assembler on RISC OS to make 8MHz processors fly.
But as time as gone on projects have got bigger and bigger and I've become more and more high level. These days I wouldn't even advocate writing in plain C, I don't know how I ever managed to get on with handling strings and fixed length buffers, it's horrible. I need at least C++ with a proper string class to function.
I'm almost exclusively Python these days (with a little C# and C++ if performance requires), because the more the language does, the less I have to worry about, and the bigger the project can grow.
If you can write massively distributed video conferencing systems in Python (ok, with a bit of C GStreamer for the actual audio and video processing), there isn't a lot it can't do.
Rick, 17th January 2022, 20:42
I was keeping this largely RISC OS centric. Because, come on, we can't sensibly play video, never mind such fantasies as video conferencing!
Strings in C aren't bad since proper internationalisation is such a mess that one can still think of "twenty characters will need twenty one bytes". Maybe someday it'll be sensible to switch to UTF-8. I won't hold my breath.
David Pilling, 20th January 2022, 02:15
Assembler made sense in 1987 - fitting RISC OS into ROMs, the need for efficiency, and the ARM was designed to be nice for assembler programmers. It was also fair to trumpet the virtues of software written in assembler, like Impression. But assembler is not the way now, and probably not for the last 25 years. The people writing UNIX knew all this pre 1987. If Acorn had been serious and if they'd remained in business when would they have re-coded everything in C. I'll be interested to hear if 64 bit RISC OS is going to happen or not - by the sound of it, it would be able to run on any processor. A bit like the RISC OS emulators. Being an optimist as programmers are, I can't see why converting RISC OS to C is such a big deal.
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted.
RIPA notice: No consent is given for interception of page transmission.