Tips and Tricks

Count down, rather than up

When you count from zero to n, you will need code like:

  MOV    R1, #0
.loop
  ...do something...
  ADD    R1, R1, #1
  CMP    R1, #255
  BNE    loop

That is brain-dead code to do something 255 times. You could replace that code with:

  MOV    R1, #255
.loop
  ...do something...
  SUBS   R1, R1, #1
  BNE    loop

What, no comparison - but a conditional?
Yup, that's right. The SUB operation has the S suffix, so it affects the flags. An EQ condition is when the Z bit is set, and an NE condition is when the Z bit is unset. And the Z bit being set means... zero!
So this loop looks after itself, really. When the countdown ends, the result is zero, the Z bit is set, the NE condition is no longer true, so the branching ceases.
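
If the loop body also needs the counter as an index that reaches zero, one variation (a sketch, not taken from the code above) is to start at n-1 and keep branching while the result is positive or zero:

  MOV    R1, #254          ; n - 1, so the body sees 254 down to 0
.loop
  ...do something with R1...
  SUBS   R1, R1, #1        ; the N bit is set once R1 drops below zero
  BPL    loop              ; loop while the result is positive or zero

This still runs the body 255 times, and still needs no CMP.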

Don't abuse the stack

If you are programming APCS code, you will know that the APCS rules state that only registers a1 to a4 (R0-R3) are corruptible. That doesn't mean you should start your code with a

  STMFD  R13!, {v1-v6, sl, fp, ip, sp, lr}

unless you actually need to store those registers. If you can work your code in the first four registers with no branching, you don't need to store anything.
I was unable to find a reference to which registers should be saved in BASIC, so my code has erred on the generous side. I would suggest that R0-R7 are corruptible, the remainder should be preserved. But again, if you only use R0 to R3 (or so), you don't need to preserve anything.
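
As a sketch of storing only what you need, suppose a routine wants R4 and R5 as extra workspace (an assumption made purely for illustration). It would stack just those two and the link register, then return by loading the saved link value straight into PC:

  STMFD  R13!, {R4-R5, R14}   ; save only the registers we will corrupt
  ...do stuff using R0-R5...
  LDMFD  R13!, {R4-R5, PC}    ; restore them and return in one go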

Say you use R0, R1 and R2, and you do a few BLs. Should you be pushing the link register to the stack?

  STMFD  R13!, {R14}
  ...doing stuff...
  LDMFD  R13!, {PC}

The code above is terribly wasteful. Not only do you use a multiple load/store instruction to preserve one register, but you waste stack space.
Even under APCS, this might be better:

  MOV    R3, R14
  ...doing stuff...
  MOVS   PC, R3

The S suffix on the final MOVS causes the PSR to be restored along with the PC.

An optimisation, when it is required to store R14 (and only R14) to the stack, is to do something like:

  STR    R14, [R13, #-4]!
  ...do stuff...
  LDR    PC, [R13], #4

If you also need to restore the state of the PSR on a 26bit system, you would instead do:

  STR    R14, [R13, #-4]!
  ...do stuff...
  LDR    R14, [R13], #4
  MOVS   PC, R14

Large numbers

There are two ways to load a register with a large number (say, &FFFF00FF).

1. Synthesise it

  MOV    R0, #&FF000000
  ADD    R0, R0, #&00FF0000
  ADD    R0, R0, #&000000FF

2. Load it

  ADR    R1, big_word
  LDR    R0, [R1]
  ...

.big_word
  EQUD   &FFFF00FF

I prefer to use the Load method, as it tends to make the code clearer, especially when generating large numbers involves trickery with BIC, EOR, and MVN.

Loading may save program space - it depends upon how you generate your large number - but causes the processor to jump around to load the word.
Execution speed depends upon the processor. For example, it has been reported that a load is faster on a StrongARM, where an ARM 6 gets more speed out of generating the large value from three instructions.
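
For this particular value there is also a shortcut worth knowing. &FFFF00FF is the bitwise inverse of &0000FF00, which is a valid immediate constant, so as a sketch the whole value can be synthesised in one instruction:

  MVN    R0, #&FF00          ; R0 = NOT &0000FF00 = &FFFF00FF

This only works when the inverted value fits the immediate format (an 8 bit value rotated by an even number of places), so it is not a general replacement for either method.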

Multiple IFs

IF x% = 4 OR x% = 1 THEN ...

can be implemented as something like:

  [we assume x% has been loaded into R0]
  CMP    R0, #4
  CMPNE  R0, #1
  BEQ    ...the code to call when x% = 4 OR 1
  ...the ELSE code
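
The same idea extends to AND. As a sketch, assuming a second variable y% has been loaded into R1 (y% is invented here for illustration), IF x% = 4 AND y% = 1 THEN ... could be written as:

  CMP    R0, #4
  CMPEQ  R1, #1              ; only compared if x% = 4
  BEQ    ...the code to call when x% = 4 AND y% = 1
  ...the ELSE code

If the first comparison fails, the CMPEQ is skipped, the Z bit stays clear, and the branch is not taken.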

Branch tables

You can implement code such as:

  CASE something% OF
    WHEN  0 : PROCzero
    WHEN  1 : PROCone
    WHEN  2 : PROCtwo
  OTHERWISE : PROCinvalid
  ENDCASE

in assembler by code such as (assuming something% is in R0):

  CMP    R0, #2               ; The immediate value is the highest valid index
  LDRLS  PC, [PC, R0, LSL#2]  ; PC reads as the address of the first EQUD
  B      invalid              ; We come here if R0 > 2
  EQUD   zero
  EQUD   one
  EQUD   two
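
No extra offset is needed because the PC reads as the address of the current instruction plus 8. As a worked sketch, assuming the code happens to be assembled at &8000 (an arbitrary address chosen for illustration), the layout is:

  &8000  CMP    R0, #2
  &8004  LDRLS  PC, [PC, R0, LSL#2]  ; PC reads as &8004 + 8 = &800C
  &8008  B      invalid
  &800C  EQUD   zero                 ; loaded when R0 = 0 (&800C + 0)
  &8010  EQUD   one                  ; loaded when R0 = 1 (&800C + 4)
  &8014  EQUD   two                  ; loaded when R0 = 2 (&800C + 8)

Each table entry is one word, which is why the index is shifted left by 2 (multiplied by 4).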

Or an alternative method:

.entry
  CMP    R0, #((endoftable - table) / 4)
  ADDCC  PC, PC, R0, LSL#2
  B      invalid

.table
  B      zero
  B      one
  B      two
.endoftable

My personal favourite method is the latter, but both will have the desired effect.
