Special facilities available from BASIC

 

We all know that BASIC uses...

But there is more. Much more, some of which I didn't know existed for nearly a decade (such are the disadvantages of self-taught programming!).

When you CALL or USR an assembly language segment, BASIC sets the registers as follows:
R0 A%
R1 B%
R2 C%
R3 D%
R4 E%
R5 F%
R6 G%
R7 H%
R8 Pointer to BASIC's workspace (ARGP)
R9 Pointer to list of l-values for the parameters
R10 Number of parameters
R11 Pointer to BASIC's string accumulator (STRACC)
R12 BASIC's LINE pointer (points to current statement)
R13 Pointer to BASIC's stack (full, descending - as RISC OS uses)
R14 Return address, and environment information pointer
Of those, two hide a lot of detail...
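Before looking at those two, the A%-H% mapping in the table above can be tried with a minimal sketch like the following. It assumes nothing beyond the table: A% arrives in R0, B% in R1, and USR hands back whatever is left in R0. The name code% and the values 40 and 2 are only illustrative.

```bbcbasic
REM Minimal sketch: add A% and B% in assembler, return the sum via USR
DIM code% 16
P% = code%
[ OPT 2
  ADD    R0, R0, R1           ; R0 holds A%, R1 holds B%
  MOV    PC, R14              ; return to BASIC, result in R0
]
A% = 40
B% = 2
PRINT USR(code%)
```

With the values shown, this should print 42. No CALL parameters are involved, so R9 and R10 are not used here.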

 

 

R9 (and R10)

R9 points to a list that provides information about each variable passed as a parameter to CALL.
This is, for example:
  CALL mycode, pointer%, string$, real
and is different to just setting A%-H%.
Parameters to CALL are not often used, as they are not as easy to implement as setting registers.

For each variable given, two words (word aligned) are used. The first is known as the l-value: in plain English, the address where the value of this variable is stored. The second word is a descriptor giving the variable's type.
The list is in reverse order: the last variable passed to CALL is the first entry in the list, so R9 always points to the entry for the last parameter. The pointer is always valid, even when the parameter count (R10) is zero.
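The layout just described can be explored with a rough sketch that walks the list and prints each descriptor word in hex. The labels show_types, next_entry and hex_buffer, and the variables passed in, are made up for illustration. R9 and R10 are copied into R4 and R5 purely as a precaution, so the registers BASIC supplied are left untouched.

```bbcbasic
REM Sketch: print the descriptor of every parameter passed to CALL
DIM code% 100
FOR pass% = 0 TO 2 STEP 2
  P% = code%
  [ OPT    pass%
  .show_types
    MOV    R4, R9               ; working copy of the list pointer
    MOVS   R5, R10              ; working copy of the count, Z set if zero
    MOVEQ  PC, R14              ; no parameters, nothing to show
  .next_entry
    LDR    R0, [R4, #4]         ; second word of the entry is the descriptor
    ADR    R1, hex_buffer
    MOV    R2, #8               ; size of the buffer
    SWI    "OS_ConvertHex4"     ; R0 comes back pointing at the text
    SWI    "OS_Write0"
    SWI    "OS_NewLine"
    ADD    R4, R4, #8           ; each entry is two words long
    SUBS   R5, R5, #1
    BNE    next_entry
    MOV    PC, R14
  .hex_buffer
    EQUD   0
    EQUD   0
  ]
NEXT
count% = 1
name$ = "x"
ratio = 1.5
CALL show_types, count%, name$, ratio
```

Because the list is reversed, the descriptor for ratio should be printed first and the one for count% last.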

The possible variable types are:
Type                      BASIC        l-value points to                               In English...
&00                       ?factor      byte-aligned byte                               Pointer to actual byte
&04                       !factor      byte-aligned word                               Pointer to four-byte integer
                          integer%     word-aligned word                               (may not be word-aligned)
                          integer%(n)  word-aligned word
&05                       |factor      byte-aligned FP value (5 bytes)                 Pointer to five-byte floating point value
                          real
                          real(n)
&08                                    word-aligned FP value (8 bytes)                 Pointer to 8-byte floating point value (BASIC VI)
&80                       string$      byte-aligned SIB (5 bytes)                      Pointer to string information block
                          string$(n)
&81                       $factor      byte-aligned byte-string (CR terminated)        Pointer to string
&100 + &04                integer%()   word-aligned array pointer                      Pointer to word-aligned word (see below)
&100 + &05, &100 + &08    real()       word-aligned array pointer                      Pointer to word-aligned word (see below)
&100 + &80                string$()    word-aligned array pointer                      Pointer to word-aligned word (see below)

For the array types (&100 + <something>): if the array is unallocated, or LOCAL but not yet DIMed, the word pointed to is less than 16. Otherwise, this word points to the array structure.

The String Information Block (SIB) consists of four bytes giving the address of the string, followed by one byte giving the length of the string.

The word array structure (for types &100 + <something>) is a word aligned list of integer subscript sizes (the values in the DIM, plus one) terminated by a zero word, followed by a word which contains the total number of elements in the array, followed by the entries in the array.
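As a hedged illustration of that structure, the sketch below skips the subscript sizes up to the zero word and then reads the total number of elements. It takes the description above at face value and does no type checking. The names array_total, grid%() and result% are invented for the example. Because the parameter list is reversed, the entry for result% comes first and the entry for grid%() is eight bytes further on.

```bbcbasic
REM Sketch: fetch the element count of an integer array passed to CALL
DIM code% 64
FOR pass% = 0 TO 2 STEP 2
  P% = code%
  [ OPT    pass%
  .array_total
    CMP    R10, #2              ; expect exactly two parameters
    MOVNE  PC, R14
    LDR    R2, [R9]             ; l-value of result% (last parameter)
    LDR    R0, [R9, #8]         ; l-value of grid%() (first parameter)
    LDR    R0, [R0]             ; the word holding the array pointer
    CMP    R0, #16
    MOVCC  R0, #0               ; less than 16 means not allocated
    BCC    store
  .skip
    LDR    R1, [R0], #4         ; step over the subscript sizes...
    CMP    R1, #0               ; ...until the zero terminator
    BNE    skip
    LDR    R0, [R0]             ; next word is the total element count
  .store
    STR    R0, [R2]             ; integer% l-values are word aligned
    MOV    PC, R14
  ]
NEXT
DIM grid%(3, 4)
result% = -1
CALL array_total, grid%(), result%
PRINT result%
```

If the structure is laid out as described, DIM grid%(3, 4) gives subscript sizes of 4 and 5, so this should print 20.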

Here is an example of passing a string to CALL, and printing it in assembler. Notice how we need to use LDRB to load the string pointer, as the address is byte aligned.

REM >stringprnt
REM
REM String passing (via CALL) demonstration
REM
REM By Richard Murray
REM Downloaded from http://www.heyrick.co.uk/assembler/
REM

codesize% = 180 : REM The code is 180 bytes
DIM code% codesize%
PROCbuild_code

INPUT "Please enter some text: "my_string$

PRINT '"Entering assembler..."
CALL print_string, my_string$
PRINT "...returned from assembler"'

END

:

DEFPROCbuild_code
  FOR loop% = 8 TO 10 STEP 2
    P% = code%
    L% = code% + codesize%
    [ OPT    loop%

      \ Note... This is coded for CLARITY, not speed!

    .print_string
      CMP    R10, #1              ; Check one parameter was given
      BNE    wrong_parameters

      LDR    R0, [R9]             ; Load pointer
      LDR    R1, [R9, #4]         ; Load type

      CMP    R1, #&80             ; Is it a string?
      BNE    wrong_var_type

      \ Word giving string pointer may not be word aligned, so we
      \ cannot use an LDR as results for non-aligned addresses
      \ are unpredictable...
      LDRB   R1, [R0], #1
      LDRB   R2, [R0], #1
      ADD    R1, R1, R2, LSL#8
      LDRB   R2, [R0], #1
      ADD    R1, R1, R2, LSL#16
      LDRB   R2, [R0], #1
      ADD    R1, R1, R2, LSL#24

      LDRB   R2, [R0], #4         ; Length

      CMP    R2, #0               ; Check length
      BEQ    exit

      \ Now, R1 is string pointer and R2 is string length.
    .loop
      LDRB   R0, [R1], #1
      SWI    "OS_WriteC"
      SUBS   R2, R2, #1           ; SUBS sets Z bit when = 0
      BNE    loop

    .exit
      SWI    "OS_NewLine"         ; So a blank string prints just that...
      MOV    PC, R14

    .wrong_parameters
      SWI    "OS_WriteS"
      EQUS   "Incorrect number of parameters"+CHR$0
      ALIGN
      B      passed_to_call

    .wrong_var_type
      SWI    "OS_WriteS"
      EQUS   "Wrong variable type"+CHR$0
      ALIGN
      ; B      passed_to_call      ; Not required, will fall through

    .passed_to_call
      \ rudimentary optimisation! <g>
      SWI    "OS_WriteS"
      EQUS   " passed to call"+CHR$13+CHR$10+CHR$0
      ALIGN
      MOV    PC, R14
    ]
  NEXT
ENDPROC

 

 

R14

Traditionally, R14 is the Link Register to return from your code to BASIC.

Following the return address is a list of words which are offsets from the ARGP (in R8).
For example, you might want to know the current value of PAGE without tying up a register to pass that value to your code. PAGE is available at offset &08, so your code would be something like:
  LDR R0, [R14, #8]
  LDR R0, [R8, R0]
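To try this out, the two instructions can be wrapped in a short USR routine and the result compared with BASIC's own PAGE. This is only a sketch built around the fragment above; code% is an arbitrary name.

```bbcbasic
REM Sketch: read PAGE through the environment information after R14
DIM code% 16
P% = code%
[ OPT 2
  LDR    R0, [R14, #8]        ; offset of PAGE from ARGP
  LDR    R0, [R8, R0]         ; R8 is ARGP, so this loads PAGE itself
  MOV    PC, R14
]
PRINT ~USR(code%)
PRINT ~PAGE
```

Both lines should show the same hexadecimal value.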

Offset Name Meaning
&00 RETURN Return address to BASIC
&04 STRACC String accumulator (256 bytes long)
&08 PAGE The current value of PAGE
&0C TOP The current value of TOP
&10 LOMEM The current start of variable storage
&14 HIMEM The current stack end
&18 MEMLIMIT Limit of available memory
&1C FSA Free space start (end of variables/stack limit)
&20 TALLY Value of COUNT
&24 TRACEF TRACE FILE handle, or 0 if no file being TRACEd to
Four words follow the trace file handle, that may be of use to you:
+ 4  LOCALARLIST - a pointer to the list of local arrays
+ 8  INSTALLLIST - a pointer to the list of installed libraries
+12  LIBRARYLIST - a pointer to the list of transient libraries
+16  OVERPTR     - a pointer to the overlay structure
 
Libraries are stored as a word which is the pointer to the next library, or 0 to end. This word is followed by the BASIC program which is the library.
 
Before OVERLAY has been executed, OVERPTR is zero. Afterwards, OVERPTR contains a pointer to the following structure:
OVERPTR+&00 Pointer to base of OVERLAY array (such as 'lib$(0)')
OVERPTR+&04 Index of current OVERLAY file (or -1 if none loaded)
OVERPTR+&08 Total allowed size of OVERLAY area
OVERPTR+&0C Start of current OVERLAY file in memory
&28 ESCWORD Exception flag word (contains escflg, trcflg)
&2C WIDTHLOC The value of WIDTH - 1
 
Then come branches to internal BASIC routines...
 
&30 VARIND Get value of l-value
 
On entry:
 R0  = Address to load variable from
 R9  = Type of variable (as in CALL parameter block)
 R12 = LINE
 
Returns with R0 - R3 as the value (or F0 in BASIC VI), R9 the type.
R9 =         0 - String; STRACC is start, R2 is end, [R2]-STRACC is the length
R9 = &40000000 - Integer; in R0
R9 = &80000000 - Float; in R0...R3
Registers preserved.
&34 STOREA Store a variable, optionally converting between formats
 
On entry:
 R0...R3 = Value (or F0 if float in BASIC VI)
 R4  = Address to store at
 R5  = Type of variable (as in CALL parameter block)
 R8  = ARGP
 R9  = Type of value
 R12 = LINE
 R13 = Stack pointer
 
Returns with R0 - R7 corrupted.
&38 STSTORE Store a string into a string variable
 
On entry:
 R2  = Length (address of byte beyond the last one)
 R3  = Address of start of string
 R4  = Address of l-value (ie, where to store it)
 R8  = ARGP
 R9  = Type of value
 R12 = LINE
 R13 = Stack pointer
 
Corrupts R0, R1, R5, R6 and R7.
String must start on a word boundary, and length must be 255 or less.
&3C LVBLNK Looks up a variable by name
 
On entry:
 R8  = ARGP
 R11 = Pointer to start of name
 R12 = LINE
 R13 = Stack pointer
 
May use the stack. Uses all registers.
 
If variable (more precisely, l-value) was found, returns with:
Z flag = 0, R0 = address of l-value, and R9 = Type of l-value
 
If not found, returns with:
Z flag = 1
C flag = 1 if no way string could be a variable (such as "%value")
C flag = 0 if could be a variable, but no such variable exists at present
 
If not found, and could be a variable, the registers are set up ready for a call to CREATE.
 
BASIC's documentation does not explicitly say so, but by prefixing a name with the token for PROC (&F2) or FN (&A4), it may be possible to look up function/procedure names.
&40 CREATE Create a new variable. The input is the register state left by a failed LVBLNK.
 
It is recommended that you only call CREATE after a failed LVBLNK, with code such as:
  STMFD   R13!, {R14}
  BL      LVBLNK      ; look up name
  LDMNEFD R13!, {PC}  ; return if found
  LDMCSFD R13!, {PC}  ; return if invalid name
  BL      CREATE      ; create new variable
  LDMFD   R13!, {PC}

 
Returns same result as LVBLNK when l-value found.
Uses all registers.
&44 EXPR Evaluates an expression pointed to by R11.
 
On entry:
 R8  = ARGP
 R11 = Pointer to start of string
 R12 = LINE
 R13 = Stack pointer
 
EXPR stops after reading one expression (like those in a PRINT statement).
 
Returns with R0 - R3 as the value (or F0 in BASIC VI), R9 the type.
R9  =         0 - String; STRACC is start, R2 is end, [R2]-STRACC is the length
R9  = &40000000 - Integer; in R0
R9  = &80000000 - Float; in R0...R3
R10 = First character of the expression
R11 = Pointer to next character after R10.  
Additionally:
Z set means expression was a string, else expression was a number
If Z clear, then N set means expression was a floating point number, else expression was an integer.
 
A useful thing about EXPR is that it can call BASIC functions. You do this as you would in an EVAL statement, by evaluating a string containing the name of a user-defined function, for example "FNget_next_directory_entry". This allows you to call routines which perform a task that would be tedious in assembler, such as inputting a floating point number from the user.
 
Unfortunately, there is a complication. The string to be evaluated should be tokenised. So you can either call MATCH, or (possibly preferably) store the string pre-tokenised. The token for FN is &A4.
&48 MATCH Takes a text string and tokenises it to another string.
 
On entry:
 R1  = Points to source string (ASCII 10 or 13 terminated)
 R2  = Points to destination string
 R3  = MODE
 R4  = CONSTA
 R13 = Stack pointer
 
MODE is 0 for left mode, which is for a statement at the start of a line, or before an equals; and 1 for right-mode, in an expression.
This is important, consider the following:
  var = TIME
  TIME = var

It's the same word - TIME - but there are two different tokens for TIME, one for reading time and one for writing it.
 
CONSTA is 0 if you do not want BASIC to convert numbers which could be line numbers (0 to 65279) to internal format; and 1 if you do.
Internal format is the token &8D followed by three bytes containing the encoded line number. The advantage of the encoded numbering is the bytes lie in the range 64-127, so do not contain any tokens or control codes. These tokens are used after GOTO, GOSUB, RESTORE, THEN and ELSE. They are fixed length, so the program can be RENUMBERed without shuffling lines around.
 
Both MODE and CONSTA may be updated during the use of this function. For example, PRINT will change MODE to 1 to read an expression.
 
Corrupts R0-R5.
On exit, R1 and R2 are left pointing one byte beyond the terminating CR codes of the strings.
Additionally, R5 contains status information. Typically, values larger than &1000 imply mismatched brackets; and ( (R5 AND 255)=1 ) means mismatched quotes.
&4C TOKENADDR This converts a token value to a pointer to the text string that represents it.
 
On entry:
 R0  = Token value (ie, &A4 for FN)
 R12 = Pointer to next byte of token
 
Returns in R1 a pointer to the first character of the string, terminated by a value &7F or greater. R0 is updated to point to the base of the token table.
 
The value of R12 is only used when matching a two-byte token.
No other registers are used or required.
 
If you are using BASIC V, additional floating point operations are available. R0...R3 contain an expanded floating point value, and R9 points to a packed floating point value (as accessed with the | operator).
 
&54 9 This is a word giving the number of additional routines that are available.
&58 FSTA Store a four-word FP value into a five-byte variable.
 
On entry:
 R0...R3 = Source FP value
 R9      = Pointer to destination value
 
On exit, R2 may be altered, but this doesn't affect the FP value.
&5C FLDA Load a five-byte variable into a four-word FP value.
 
On entry:
 R9      = Pointer to source value
 
On exit, R0...R3 contain the loaded value.
&60 FADD Add the four-word FP value in R0...R3 to the variable pointed to by R9.
Notionally: (R0...R3) + [R9]
On entry:
 R0...R3 = Source FP value
 R9      = Pointer to five-byte value
 
On exit, R0...R3 is the result, and R4...R7 are corrupted.
Overflow errors are possible.
&64 FSUB Subtract R0...R3 from value pointed to by R9.
Notionally: [R9] - (R0...R3)
On entry:
 R0...R3 = FP value
 R9      = Pointer to five-byte value
 
On exit, R0...R3 is the result, and R4...R7 are corrupted.
Overflow errors possible.
&68 FMUL Multiply the four-word FP value in R0...R3 by the variable pointed to by R9.
Notionally: (R0...R3) * [R9]
On entry:
 R0...R3 = Source FP value
 R9      = Pointer to five-byte value
 
On exit, R0...R3 is the result, and R4...R7 are corrupted.
Overflow errors possible.
&6C FDIV Divide the variable pointed to by R9 by the four-word FP value in R0...R3.
Notionally: [R9] / (R0...R3)
On entry:
 R0...R3 = Source FP value
 R9      = Pointer to five-byte value
 
On exit, R0...R3 is the result, and R4...R7 are corrupted.
Overflow errors and divide by zero are possible.
&70 FLOAT Convert an integer to a four-word floating point value.  
On entry:
 R9      = Integer
 
On exit, R0...R3 is the floated version, and R9 is &80000000 (float type code).
&74 FIX Convert a four-word FP value into an integer.
 
On entry:
 R0...R3 = Floating point value
 
On exit, R0 is the fixed version (rounded towards zero), and R9 is &40000000 (integer type code).
&78 FSQRT Take the square root of the floating point number in R0...R3.  
On entry:
 R0...R3 = Floating point value
 
On exit, R0...R3 is the result, and R4...R7 are corrupted.
Negative root error possible.

The floating point values in R0...R3 are given as follows:

R0 = 32 bit mantissa, normalised (so bit 31 = 1)
R1 = Exponent in excess-128 form
R2 = Undefined
R3 = Sign, 0 is positive and &80000000 is negative
This is informational only, and the developers reserve the right to change the format. You are asked to treat R0...R3 as a single item, without worrying about the constituent parts.

 

Here is an example program which will list all of the tokens recognised by BASIC. It is completely written in assembler, so could be saved as a utility. Note, however, that it must be loaded and executed from within BASIC as the extended environment is only available from BASIC.

Note also that passing invalid token values returns junk. You can see this for yourself if you alter the second token set to end at a number higher than 183.

REM >listtokens
REM
REM Lists the tokens recognised by BASIC

DIM code% 396

FOR pass = 8 TO 10 STEP 2
  P% = code%
  L% = code% + 396
  [ OPT    pass

  .begin
    STMFD  R13!, {R14}
    MOV    R5, R14           ; for token print routine

    ADR    R0, starttitle
    SWI    "OS_Write0"

    BL     firstset
    SWI    "OS_NewLine"
    BL     secondset

    ADR    R0, endtitle
    SWI    "OS_Write0"

    LDMFD  R13!, {PC}

  .token
    \ This prints a token using BASIC's internal routine.
    \ Call with R0 set to the token number.

    STMFD  R13!, {R14}
    CMP    R0, #255
    ADRHI  R12, tokenbuffer
    SUBHI  R0, R0, #256
    STRHI  R0, tokenbuffer
    MOVHI  R0, #200
    ADR    R14, back
    ADD    PC, R5, #&4C
  .back
  .tokenloop
    LDRB   R0, [R1], #1
    CMP    R0, #&7F
    SWICC  "OS_WriteC"
    BCC    tokenloop
    LDMFD  R13!, {PC}

  .tokenbuffer
    EQUD   0

  .firstset
    \ These are the first tokens, 127 to 255 (but not 200)

    STMFD  R13!, {R14}
    MOV    R10, #127
  .firstloop
    SWI    "OS_WriteS"
    EQUS   "Token     "+CHR$0
    ALIGN
    MOV    R0, R10
    BL     print_number
    SWI    "OS_WriteS"
    EQUS   " is "+CHR$0
    ALIGN

    MOV    R0, R10
    CMP    R0, #200
    BLEQ   special
    BLNE   token
    SWI    "OS_NewLine"

    ADD    R10, R10, #1
    CMP    R10, #256
    BLT    firstloop
    LDMFD  R13!, {PC}

  .special
    SWI    "OS_WriteS"
    EQUS   "extension token"+CHR$0
    MOV    PC, R14

  .secondset
    \ These are the second tokens, 127 to 183

    STMFD  R13!, {R14}
    MOV    R10, #127
  .secondloop
    SWI    "OS_WriteS"
    EQUS   "Token 200+"+CHR$0
    ALIGN
    MOV    R0, R10
    BL     print_number
    SWI    "OS_WriteS"
    EQUS   " is "+CHR$0
    ALIGN

    MOV    R0, R10
    ADD    R0, R0, #256
    BL     token
    SWI    "OS_NewLine"

    ADD    R10, R10, #1
    CMP    R10, #184
    BLT    secondloop
    LDMFD  R13!, {PC}

  .print_number
    ADR    R1, number_buffer
    MOV    R2, #8
    SWI    "OS_BinaryToDecimal"
    ADR    R0, number_buffer
    SWI    "OS_Write0"
    MOV    PC, R14

  .number_buffer
    EQUD   0
    EQUD   0

  .starttitle
    EQUS   "BASIC: tokens and their keywords"+CHR$13+CHR$10
    EQUS   "--------------------------------"+CHR$13+CHR$10+CHR$13+CHR$10+CHR$0
    ALIGN

  .endtitle
    EQUS   CHR$13+CHR$10+"Finished."+CHR$13+CHR$10+CHR$13+CHR$10+CHR$0
    ALIGN
  ]
NEXT

CALL begin

END

 


Copyright © 2004 Richard Murray