Tutorial 03 - Branches

From ARMwiki
Jump to: navigation, search



In this tutorial, we shall look at the concept of branching and how to branch in your program.

A branch is a fundamental part of decision making. For us, it could be "if it is cold, make a cup of tea (else just watch TV)". For the computer, if this then do that or more complicated if so then do this otherwise do that, and so on.

GOTO considered harmful?

Use and abuse of GOTO was a terrible thing; and some exceedingly poor implementations of BASIC on home computers back in the Eighties offered little more, leading to code that was a horrendous mess that would make a plate of tagliatelle seem to resemble the program's flowchart. This has put a stain on the GOTO instruction, where it is now viewed with fear and distrust, like it will eat random variables when you aren't looking. Some (people who probably shouldn't be) teachers would disqualify your homework unconditionally if you dared to use a GOTO (true story!).

You should not use GOTO in modern programming (BBC BASIC, C, VisualBasic (except some error traps), etc) because you have all manner of functions and control logic at your disposal. WHILE loops as different from FOR loops as different to REPEAT loops, not to mention nested multiline IFs and SELECT/CASE/SWITCH decision-making structures. In a high level language, the only reason to use a GOTO is because of limitations in the language itself; and those tend to imply that it is time to find a better language.

As you might have noticed, assembler is very much not a high level language. The processor, whatever flavour it is, does not shirk away from banging new addresses into the program counter in order to jump from location to location. In effect, lame teachers and frightened students run in horror from GOTO; the processor sees nothing else.

This is not to say that jumping around the place isn't without its difficulties. Modern processors have long pipelines. The processor does not fetch an instruction, execute it, and do something with the result. What usually happens is that while the "current" instruction is being executed, the next one is being decoded and the one after that is being fetched. Some modern processors have freakishly long and complicated pipelines, but they all suffer from the same inherent problem. When a branch is encountered, the pipeline is useless. The data needs to be thrown away, and the pipeline restarted at the new address.

There are two ways of dealing with this situation. The most typical way is something called branch prediction where the processor will attempt to identify what is going to happen, including following branches. You might find this an interesting read, although it is x86-centric: http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/ The second way to approach the problem, and The Way Of The ARM is to use conditional execution. It costs little to load a few instructions to then treat them as NOPs. It is certainly quicker and less hassle than branches.

The moral here, guys, is that the ARM's conditional execution is a lovely thing; so make lots of use of it!

A world without conditional execution

The ARM's conditional execution is something of a marvel. You see, there are many conditions which may result in a branch being taken. The most basic form of handling this is to check the condition, and branch if the condition matches. Two instructions to take a branch? Yuck! So processor manufacturers built the checking into the branch instruction...

...which leads to an unholy assortment of branch operations. The 6502 offers us: BCC, BCS (carry clear/set); BEQ, BNE (zero set/clear); BMI, BPL (negative set/clear); BVC, BVS (overflow clear/set). And that's just a straight branch.

But it gets better. x86 (IA-32) offers: JO, JNO (overflow/not); JS, JNS (sign/not); JE or JZ (equal); JNE or JNZ (not equal); JB or JNAE or JC (below, not-above-or-equal, carry); JBE or JNA (below-or-equal, not-above); JA or JNBE (above, not-below-or-equal); JL or JNGE (less, not-greater-or-equal); JGE or JNL (greater-or-equal, not-less); JLE or JNG (less-or-equal, not-greater); JG or JNLE (greater, not-less-or-equal); JP or JPE (parity, parity-even); JNP or JPO (not-parity, parity-odd); JCXZ or JECXZ (CX or ECX is zero) ... and this is just from the 80386 instruction set! And yes, it would seem that the same branch may go under two or three different names, as if there wasn't already enough to remember!

The ARM? The ARM has B. Nothing more is necessary as the conditional execution, when applied to the instruction, can do everything necessary.


A useful trick is the subroutine. Consider, you are sitting at your computer reading this when the doorbell rings. You glance over and see it's a cute Brownie selling cookies. You get up, answer the door, buy some cookies, come back, carry on reading.

Congratulations. You have just used subroutines.

Your main activity was reading. Upon sensing a noise, you entered the hearing subroutine to recognise the noise as the doorbell. This triggered the identify subroutine which dragged your eyes from these incredibly exciting words to look at the door, plus a bunch of other unimportant subroutines involved in identifying who it was. Proto-girl-guide, cookies. You called the tummy subroutine to see if you have a cookie shaped hole that needs filled. Finally, after making the decision to purchase cookies, you get up and go to the door which involves dozens and dozens of subroutines for the act of walking and such. I won't bore you with details.

Never before has buying cookies seemed like such a complicated task. Fortunately computers are very good at doing mindless work repetitively. Look at all of the dots on your screen, try to imagine how many there are. Each dot has a story. Each single pixel in each letter of this line is there thanks to the result of tens of thousands of millions of decisions, and since you got this from my Wiki, a fair few of those millions of decisions took place in machines scattered around the planet. It's pretty epic when you think about it like that.

But I've still not explained the difference between a jump and a subroutine - so I'll get right do it. A jump (or a regular branch) is a branch in the normal flow of events. A subroutine (or a branch-with-link in ARM parlance) is a branch to do something else to which you eventually expect to return from.

Here is a pseudocode example:

 is buffer empty?
 if yes, call subroutine fill_buffer
 read a character from the buffer
 is it zero?
 if yes, branch to start
 print character to display
 branch to start
 read some data from hardware device
 shove it into a conveniently placed buffer

The flow if the buffer is empty is as follows:

 1. is buffer empty?
 2. if yes, call subroutine fill_buffer
 3. read some data from hardware device
 4. shove it into a conveniently placed buffer
 5. return
 6. read a character from the buffer
 7. is it zero?
 8. print character to display
 9. branch to start
10. is buffer empty?
11. and so on

If the buffer is not empty, then the flow would look like this:

 1. is buffer empty?
 2. read a character from the buffer
 3. is it zero?
 4. print character to display
 5. branch to start
 6. is buffer empty?
 7. and so on


This is a regular straight branch. When the ARM encounters a B, the processor will jump immediately to the specified address and execution will resume from there.

B(ranch with)L(ink)

This is a subroutine. When the ARM encounters a BL, the processor will place the return address into R14 and then jump to the specified address and resume execution.

Unlike the 6502 (JSR then RTS), there is no specific instruction to return from a subroutine. Since R14 was set to the return address (and you'll be familiar with this already, as the OS has a return address in R14 by default; as does BASIC if you are using the BASIC inline assembler; as does anything that might call your code), all that is required of you is to just push R14 back into PC. Normally this will do it:

 MOV     PC, R14

There _is a caveat, however. One R14 will overwrite another. So either stash your return addresses in registers (short program) or on the stack (the normal way!). In this case, something like this would return:

 LDMFD   R13!, {PC}  ; assuming only R14 is stacked

Putting this into practice

Here is a simple little program that shows the two different sorts of branching in practice. When you press a key, it is echoed back to you, with the exception of control codes and high-ASCII which appear as a '.'. When you're bored of that, press ESC to quit.

 DIM code% 1024
 FOR loop% = 0 TO 2 STEP 2
   P% = code%
   [ OPT    loop%
     MOV    R12, R14    ; preserve return address in R12
     SWI    "XOS_ReadC" ; read a keypress
     MOVCS  PC, R12     ; abort if ESC pressed
     BL     print_char  ; print character
     B      start       ; loop
     CMP    R0, #32     ; strip out control codes
     MOVLT  R0, #46     ; replace with '.'
     CMP    R0, #127    ; and high-ASCII
     MOVGE  R0, #46     ; ditto with the dottage
     SWI    "OS_WriteC"
     MOV    PC, R14
 PRINT "Characters typed will be echoed, press Esc to exit:"
 CALL code%

We saw here a taste of conditional execution, so that'll be our topic for the next tutorial!

Personal tools