Writing assembler in BASIC

Introduction

BASIC is demeaned by many as being a rather useless language. Okay, it has the obvious limitation that it is an interpreted language (so each line must be 'compiled' on the fly), and it has no structured data types. Also, it is a boon for sit-and-type programmers; the sort that Pascal fanatics have nightmares about (then again, Real Programmers have nightmares about Pascal fanatics).

Going a stage further, I read a document - if I remember correctly it was a 'White Paper' within a "Mastering Visual Basic" course that slagged off BASICs for having PEEK and POKE and how system-unsafe those things are. The idiot that wrote the White Paper obviously has no clue about the heritage of BASIC or why it has such operations (I won't talk about this here, I could go on for many screenfuls!).
Anyway, people often malign BASIC as being a 'silly' language, a mere 'toy'...

But BASIC, for all its apparent limitations, is very powerful. It is an absolute doddle to drop into assembler, and back to BASIC. You can, with cunning, intermix BASIC and assembler and even call BASIC from assembler called from BASIC. Uh, confused yet?
Actually, it is nothing that you can't do from C, with the right tools. But, unlike C, you don't need a "development environment" costing hundreds of pounds, nor one that is huge to download. In fact, all you need is provided right there in your computer. BASIC, in ROM, and !Edit, also in ROM (or maybe on an Apps disc).

BASIC offers you quite a powerful assembler. It is notably missing floating point and ADRL, however Darren Salt has created a useful module that provides these facilities.
Darren Salt's programs also include a vastly extended debugger module.

Maybe you've asked BASIC for a little bit of help on its assembler.
BASIC provides detailed help on its commands, COLOUR for example...

>HELP COLOUR
COLOUR a [TINT t]: set text foreground colour [and tint] (background 128+a).
COLOUR a,p: set palette entry for logical colour a to physical colour p.
COLOUR r,g,b: set colour to r, g, b.
COLOUR a,r,g,b: set palette entry for a to r, g, b physical colour.

So it is a bit of a disappointment that the assembler is described correctly, but in a very terse (and barely readable) form...

>HELP [
Assembly language is contained in [] and assembled at P%. Labels follow '.'.
Syntax:
SWI[<cond>] <expr>
ADC|ADD|AND|BIC|EOR|ORR|RSB|RSC|SBC|SUB[<cond>][S] <reg>,<reg>,<shift>
MOV|MVN[<cond>][S] <reg>,<shift>
CMN|CMP|TEQ|TST[<cond>][S|P] <reg>,<shift>
MLA[<cond>][S] <reg>,<reg>,<reg>,<reg>
MUL[<cond>][S] <reg>,<reg>,<reg>
LDR|STR[<cond>][B] <reg>, '[ <reg>[,<shift>] '] [,<shift>][!]
LDM|STM[<cond>]DA|DB|EA|ED|FA|FD|IA|IB <reg>[!],{<reg list>}[^]
B[L][<cond>] <label>
OPT|=|DCB|EQUB|DCW|EQUW|DCD|EQUD|EQUS <expr>
ADR[<cond>] <reg>,<label>
ALIGN
where <shift>=<reg>|#<expr>|<reg>,ASL|LSL|LSR|ASR|ROR <reg>|#<expr>|RRX
and <cond>=AL|CC|CS|EQ|GE|GT|HI|HS|LE|LS|LT|LO|MI|NE|NV|PL|VC|VS
and <reg>=R0 to 15 or PC or <expr>

But don't worry. This assembler area describes the ARM assembler, and this document tells you how to get going.
Here, we discuss the basics. Other documents expand upon these themes.

The first step

The first step is to reserve yourself a chunk of memory. All code must be assembled somewhere. You can choose to poke around in memory, or claim a chunk of RMA and assemble into that. Such things are regarded as hacky. The good way to program is to DIM a block of memory and assemble your (position independent?) code into it.

So to reserve ourselves some memory, we use DIM.

  DIM code% 4096

You should reserve enough memory to hold your code. If you overshoot, the assembler will not stop unless you are using range checking. Without range checking, bizarre things may happen.
If you start getting errors that don't make sense, try doubling your memory allocation.

A useful trick, at the end of your assembly, is to use:

  PRINT P% - code%

P% points to where the next instruction would be assembled, so subtracting the start of your code memory from it gives the amount of space used by your code.
Such a command helps you to optimise your use of memory. You don't need to claim 4096 bytes if your code only uses 124 bytes!

This leads us to the FOR...NEXT loop.
Typically, assembler must be assembled twice. The first time, you want to whizz blindly through and ignore all the errors. This may sound weird, but it will make a note of the labels in your program, and where they are. So the second time around you can assemble and all the labels referred to will be known. Et voila!

The document opt.html describes all the available options; these are the values that should be used in your loop.
Typically, you will see...

  FOR loop% = 0 TO 2 STEP 2
The de facto way to assemble code, using two-pass assembly.

  FOR loop% = 0 TO 3 STEP 3
The standard two-pass assembly, but outputs a listing. Useful for debugging, but otherwise it just slows down the assembly.

  FOR loop% = 4 TO 6 STEP 2 or FOR loop% = 4 TO 7 STEP 3
Offset two-pass assembly (without and with listing). The code is assembled in your memory block, to be executed at some other location. This is usually used for things like modules. Refer to example four to see how this is used.
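
The loop values above are not arbitrary; each one is a small set of flag bits. As a rough sketch, assuming the usual BBC BASIC meaning of the bits (1 for listing, 2 for error reporting, 4 for offset assembly, 8 for range checking), a value can be decoded from BASIC like this. The variable name opt% is only for illustration.

  REM decode an OPT value into its flag bits (illustrative only)
  opt% = 10
  IF opt% AND 1 THEN PRINT "listing on"
  IF opt% AND 2 THEN PRINT "errors reported"
  IF opt% AND 4 THEN PRINT "offset assembly"
  IF opt% AND 8 THEN PRINT "range check against L%"

With opt% set to 10, only the error reporting and range check lines are printed. The first pass of each loop leaves bit 1 clear, which is what lets it whizz past the unknown labels.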

We are going to use 8 and 10.

  FOR loop% = 8 TO 10 STEP 2

This is just like the usual two-pass assembly, but it also uses range checking. It only needs one more bit of code to use, and the benefits are tremendous. You are assured that your code fits into the allocated space. If you should overshoot, you will receive a message such as:

  Assembler limit reached at line 100

Following your FOR loop, you need to tell the assembler where to put your code. This is done by setting P%.

    P% = code%

Do NOT forget this. The results of forgetting it can range from amusing to tragic, depending on what P% pointed to. It is worth noting that judicious use of P% can allow you to patch other applications while they are running. But I'm guessing only a geek like me would bother to try to do such a thing!

The next step is to set up your range check. To do this, you set L% to the end of your allocated memory.

    L% = code% + 4096

It is useful to hold your allocation amount (the value 4096) in a variable, so that you only need to change it in one place.
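
As a minimal sketch of that idea, assuming a variable called size% (the name is arbitrary), the set-up so far might read:

  size% = 4096 : REM size% is a made-up name for the allocation amount
  DIM code% size%
  FOR loop% = 8 TO 10 STEP 2
    P% = code%
    L% = code% + size%

The DIM and the range check limit now both follow from the one value. The assembler body and the closing NEXT still have to follow.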

With your locations set up, you can enter the assembler. This is done using the left square bracket.

    [ OPT     loop%

Following your square bracket, you need to set the OPTion, as shown.
OPT is not an opcode understood by the processor. It is simply a pseudo-opcode designed to tell the assembler how to behave. It is not provided in APCS assemblers.

Okay.
You are in the assembler. Set up, ready to roll.
Let's stick in some code.

      ; example code
           ADR     R0, message
           SWI     "OS_PrettyPrint"
           MOV     PC, R14

         .message
           EQUS    "Wheeee! Isn't this fun?"
           EQUB    13
           EQUB    0
           ALIGN

Finally, we need to close our assembler code and end the loop.

    ]
  NEXT

What comes next depends upon what you plan to do with the code. You can CALL it, USR it, save it to disc, or simply leave it to CALL/USR it at a later time in your program.
Here, we shall CALL it.

Your completed program should look like this.

  DIM code% 40
  FOR loop% = 8 TO 10 STEP 2
    P% = code%
    L% = code% + 40
    [ OPT     loop%

      ; example code
      ADR     R0, message
      SWI     "OS_PrettyPrint"
      MOV     PC, R14

    .message
      EQUS    "Wheeee! Isn't this fun?"
      EQUB    13
      EQUB    0
      ALIGN
    ]
  NEXT

  IF INKEY(-1) THEN PRINT P%-code%

  CALL code%

Immediately you should see two changes. The most obvious is the addition of the INKEY line. This will report the size of the code if you hold down Shift when you run the program. Because of this I was able to reduce the memory required to 40 bytes instead of four thousand.
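
Picking up the earlier remark that the code can be saved to disc rather than CALLed straight away, here is a minimal sketch of one way to do it. It assumes the filename MyCode (a made-up name) and uses the *Save command, which takes its start and end addresses in hex. After the loop has finished, P% points just past the last byte assembled.

  REM place after NEXT; MyCode is a hypothetical filename
  OSCLI "Save MyCode " + STR$~code% + " " + STR$~P%

STR$~ converts a number to a hex string, which is what *Save expects.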

Let's look at some other things.
Did you notice the comment? You can use ; or \ to start comments, though it is convention to use the semi-colon. Anything following the semi-colon is a comment, until a colon is reached.
Note - this is different to BASIC

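The following will not work, because the colon in the smiley ends the comment and the assembler then tries to assemble the -) that follows it:

  ; this is a comment :-)

By the same rule, the following will work: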

  MOV R0, R1, ASL #4 ; load R1<<4 into R0, now store it : STR R0, [R2]

That example isn't good code; you shouldn't have code following comments unless it is VERY clear what is going on. But it will work. The assembler treats the colon like BASIC treats a newline, and it'll keep on assembling.

Also notice the ALIGN. The string is 25 bytes ("Wheeee! Isn't this fun?" plus newline plus terminator). You cannot follow this with anything that must be word-aligned, such as an instruction, until P% is word-aligned again. The ALIGN command will skip forward until P% is word-aligned. If you do not take care to word-align your strings, you will need to use ALIGN after them. It is recommended to use ALIGN anyway instead of lots of EQUB 0's, because hardwiring the string lengths removes flexibility. But it's your preference. I used to hardwire, as the ALIGN built into BASIC does not null the padding bytes, making your executable untidy. Such things bother me, but hey, I've already said I'm a geek! [Darren Salt's ExtBASasm uses NULL padding bytes]
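
The arithmetic behind ALIGN can be checked from BASIC itself. This is only a sketch, and it assumes the string starts on a word boundary, as it does in the example program (three 4-byte instructions precede it). The names len% and pad% are made up for illustration.

  len% = LEN("Wheeee! Isn't this fun?") + 2 : REM string plus the two EQUBs
  pad% = (4 - (len% AND 3)) AND 3 : REM bytes ALIGN has to skip
  PRINT len%, pad%, len% + pad%

This should print 25, 3 and 28. Add the 12 bytes of instructions and you arrive at the 40 bytes reserved by the DIM.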

A quick note - don't be tempted to save space by using:

   ...
 .label_one   = "label one" + CHR$(0)
 .label_two   = "label two" + CHR$(0)
 .label_three = "label three" + CHR$(0)
   ALIGN
   ...

This will work, however you should be aware that ADR can only reach a limited range of offsets, and that range is much smaller for non-word-aligned data, so an ADRL is likely to be required to reach it. Don't worry if this doesn't make much sense - just stick to ALIGNing after strings...

Well. It might not be much, but it is your first simple assembler program. There is a hell of a lot more that you can do - ChangeFSI is testament to BASIC and assembler working together. This is only a start.

Variations on a theme

If you have no forward references, you can dispose of the FOR loop.

DIM code% 16
P% = code%
[ OPT   2
  MOV   R0, #0
  CMP   PC, PC
  MOVEQ R0, #&FF
  MOV   PC, R14
]
PRINT USR(code%)

This snippet of code is a processor-independent way of determining if the system is in 26bit mode or 32bit mode.
In 26bit mode, when R15 is the first operand, only the PC (Program Counter) part is available. The PSR (Processor Status Register) is stripped.
As the second operand, all 32bits are available. Thus, on a 26bit system the comparison fails.
In 32bit mode, R15 is the PC and only the PC. There is no PSR. Thus, the comparison passes and R0 is updated to be 255.

The 32bit test above has 'issues'. A better version would be:

  ...
  TEQ   PC, #0
  TEQ   PC, PC
  ; is EQ if 32bit, NE if 26bit
  ...

Your homework is to figure out why this one works better.
