Example 9
Rewriting lowercase()

It is imperative that you read this document alongside this example... Thank you.

Introduction

Converting a string to lower case is a common task, needed in many programs such as script parsers and interpreters.

In C, the traditional way is to create a routine along the lines of:

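#include <ctype.h>

void lowercase(char string[])
{
   int  i = 0;

   while ( string[i] )
   {
      /* Cast to unsigned char: tolower() is undefined for negative values */
      string[i] = tolower((unsigned char) string[i]);
      i++;
   }

   return;
}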
That isn't bad, but if your program converts a lot of strings to lower case, you can gain speed by rewriting a repetitive task such as this one in assembler.

As far as I'm aware, only three dialects of RISC OS exist: English, German, and Welsh. That aside, there is no excuse these days for a lowercase routine that cannot deal with foreign languages. It is simple. Very simple.
Like this:

; R0 = Pointer to string (set on entry)
; R1 = Byte read from string
; R2 = Pointer to lowercase table
lowercase
        STMFD    sp!, {v6, lr}

        MOV      R1, R0            ; Preserve string pointer
        SWI      &43040            ; "Territory_Number"
        SWI      &43057            ; "Territory_LowerCaseTable"
        MOV      R2, R0            ; Set lowercase table pointer
        MOV      R0, R1            ; Restore string pointer

lowercase_loop
        LDRB     R1, [R0]          ; Load character from R0
        CMP      R1, #0            ; Is it a null byte?
        [ {CONFIG} = 26
        LDMEQFD  sp!, {v6, pc}^    ; Return if null (26bit return, restores flags)
        |
        LDMEQFD  sp!, {v6, pc}     ; Return if null (32bit return)
        ]
        LDRB     R1, [R2, R1]      ; Convert to indexed lowercase character
        STRB     R1, [R0], #1      ; Store character, advance string pointer
        B        lowercase_loop    ; Mulberry bushes
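For readers more at home in C, the loop above is simply a table lookup: each byte of the string is used as an index into a 256-entry table (pointed to by R2) and replaced by the byte found there. The sketch below mirrors that loop. Because the Territory SWIs exist only on RISC OS, it assumes a table built portably from the C library's tolower(), which follows the current C locale rather than the RISC OS territory. The names build_lowercase_table and lowercase_with_table are hypothetical, not part of any library.

```c
#include <ctype.h>

/* Hypothetical helper: fill a 256-entry table so that table[c] is the
   lower case form of byte c. On RISC OS, Territory_LowerCaseTable hands
   back a ready-made table instead; here tolower() stands in for it. */
void build_lowercase_table(unsigned char table[256])
{
   int c;

   for (c = 0; c < 256; c++)
      table[c] = (unsigned char) tolower(c);
}

/* Hypothetical C equivalent of lowercase_loop: s plays the part of R0,
   table the part of R2. */
void lowercase_with_table(char *s, const unsigned char table[256])
{
   unsigned char *p = (unsigned char *) s;

   while (*p != 0)            /* CMP R1, #0 and return on null */
   {
      *p = table[*p];         /* LDRB R1, [R2, R1] */
      p++;                    /* STRB R1, [R0], #1 */
   }
}
```

The table only needs building once; after that every conversion is one load, one lookup and one store per character, which is exactly what the assembler loop does.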

Putting it together

The C test program (c.runit):


#include <stdio.h>
#include <ctype.h>
#include "kernel.h"
#include "swis.h"


extern void lowercase(char *);
void o_lowercase(char []);

int main(void)
{
   char s[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+¤ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+¤ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+¤ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+={[}]|\\:;\"\'<,>.?/ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890~`!@#$%^&*()_-+¤";

   int  loop = 0;
   int  start = 0;
   int  end = 0;
   int  diff = 0;
   _kernel_swi_regs r;

   printf("Testing old (C) method...\n");
   _kernel_swi(OS_ReadMonotonicTime, &r, &r);
   start = r.r[0];
   for (loop = 0; loop < 1000; loop++)
      o_lowercase(s);
   _kernel_swi(OS_ReadMonotonicTime, &r, &r);
   end = r.r[0];
   diff = end - start;
   printf("Time taken for 1000 iterations was %d centiseconds.\n\n", diff);

   printf("Testing new (assembler) method...\n");
   _kernel_swi(OS_ReadMonotonicTime, &r, &r);
   start = r.r[0];
   for (loop = 0; loop < 1000; loop++)
      lowercase(s);
   _kernel_swi(OS_ReadMonotonicTime, &r, &r);
   end = r.r[0];
   diff = end - start;
   printf("Time taken for 1000 iterations was %d centiseconds.\n\n", diff);

   return 0;
}




void o_lowercase(char string[])
{
   int  i = 0;
   while ( string[i] )
   {
      string[i] = tolower((unsigned char) string[i]);
      i++;
   }
   return;
}
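The test program relies on OS_ReadMonotonicTime, so it only runs on RISC OS. As a hedged, portable alternative, the sketch below times a conversion routine with the standard C clock() function. Note that clock() measures processor time rather than elapsed centiseconds, so its figures are not directly comparable with the results below. The names time_lowercase and lower_c are hypothetical; lower_c is just o_lowercase() written with a pointer.

```c
#include <ctype.h>
#include <time.h>

/* Hypothetical name for the plain C routine under test. */
void lower_c(char *s)
{
   for (; *s != 0; s++)
      *s = (char) tolower((unsigned char) *s);
}

/* Hypothetical helper: run convert(s) 'iterations' times and return the
   processor time used, in seconds (measured with clock(), not
   OS_ReadMonotonicTime). */
double time_lowercase(void (*convert)(char *), char *s, int iterations)
{
   clock_t start, end;
   int     loop;

   start = clock();
   for (loop = 0; loop < iterations; loop++)
      convert(s);
   end = clock();

   return (double) (end - start) / CLOCKS_PER_SEC;
}
```

On RISC OS, calling time_lowercase(o_lowercase, s, 1000) and then time_lowercase(lowercase, s, 1000) would compare the two routines in the same way as main() does, since both take a single char pointer.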

The assembler code (s.lower):


; lowercase() in assembler...
;
;
; This code is written for objasm 2.00 or later.
;
; Other assemblers may need modifications; alternatively, remove
; the HEAD macro definition and the references to it.


        AREA |C$$code|, CODE, READONLY



        ; This macro sets up the APCS backtrace 'name'
        MACRO
        HEAD     $name
        =        $name, 0
        ALIGN
        &        &FF000000 :OR: (:LEN: $name + 4 :AND: -4)
        MEND



; APCS registers
a1      RN      0
R0      RN      0
a2      RN      1
R1      RN      1
a3      RN      2
R2      RN      2
a4      RN      3
R3      RN      3
v1      RN      4
R4      RN      4
v2      RN      5
v3      RN      6
v4      RN      7
v5      RN      8
v6      RN      9
sl      RN      10
fp      RN      11
ip      RN      12
sp      RN      13
lr      RN      14
pc      RN      15




        EXPORT  |lowercase|

; R0 = Pointer to string (set on entry)
; R1 = Byte read from string
; R2 = Pointer to lowercase table
        HEAD     ("lowercase")
|lowercase|
        MOV      ip, sp
        STMFD    sp!, {a1, fp, ip, lr, pc}
        SUB      fp, ip, #4

        STMFD    sp!, {v6}

        MOV      R1, R0            ; Preserve string pointer
        SWI      &43040            ; "Territory_Number"
        SWI      &43057            ; "Territory_LowerCaseTable"
        MOV      R2, R0            ; Set lowercase table pointer
        MOV      R0, R1            ; Restore string pointer

lowercase_loop
        LDRB     R1, [R0]          ; Load character from R0
        CMP      R1, #0            ; Is it a null byte?
        [ {CONFIG} = 26
        LDMEQEA  fp, {fp, sp, pc}^ ; Return if null (end of string)
        |
        LDMEQEA  fp, {fp, sp, pc}
        ]
        LDRB     R1, [R2, R1]      ; Convert to indexed lowercase character
        STRB     R1, [R0], #1      ; Store character, increment offset pointer
        B        lowercase_loop    ; Mulberry bushes

        END
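The HEAD macro stores the function name, a terminating zero and padding up to the next word boundary, followed by a marker word whose top byte is &FF and whose low bits give the number of bytes used by the name and its padding. As I understand the APCS convention, a backtrace finds this marker in the word just before the function's first instruction and steps back that many bytes to reach the name. As a check on the arithmetic in (:LEN: $name + 4 :AND: -4), here is the same calculation in C; apcs_name_word is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper reproducing HEAD's marker word:
   &FF000000 :OR: ((:LEN: name + 4) :AND: -4).
   Adding 4 and clearing the bottom two bits rounds (length + 1 for the
   terminating zero) up to a whole number of words. */
uint32_t apcs_name_word(const char *name)
{
   uint32_t padded = ((uint32_t) strlen(name) + 4u) & ~3u;

   return 0xFF000000u | padded;
}
```

For "lowercase" (nine characters) this gives &FF00000C: ten bytes of name and terminator, padded to twelve.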

The Makefile:


# Project: lowertest 

.SUFFIXES:   .c .s .o
CCflags      = -c -depend !depend -I,C: -throwback -fa
Linkflags    = -aif -o $@
ObjAsmflags  = -depend !depend -Stamp -quit -CloseExec

c_files      = o.runit
asm_files    = o.lower
libraries    = C:o.stubs

@.lowertest: $(c_files) $(asm_files)
             link $(linkflags) $(asm_files) $(c_files) $(libraries)

.c.o:;       cc $(ccflags) $< -o $@
.s.o:;       objasm $(objasmflags) -from $< -to $@

In comparison

Worth it.
Okay, converting a 1K string 1000 times probably isn't very true to life, but the test does show the benefit of just a little bit of assembler.
I'm sure it could be optimised further; for example, the lowercase table could be read once and remembered, then simply referenced on each call.
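One way to "read and remember" the table is to set it up on the first call only and keep it in static storage. In the assembler version that would mean saving the table pointer in a writable data area, since the code AREA is READONLY. The C sketch below shows the idea under the same assumption as before: tolower() stands in for the Territory SWIs, and lowercase_cached is a hypothetical name. The cache is never refreshed, so a later change of territory (or C locale) would not be noticed.

```c
#include <ctype.h>

/* Hypothetical cached-table version: the table is built on the first call
   only and reused afterwards, so later calls skip the set-up cost. On
   RISC OS the set-up would be the Territory SWI calls; here tolower()
   stands in for them. Not refreshed on territory/locale changes and not
   thread-safe. */
void lowercase_cached(char *s)
{
   static unsigned char table[256];
   static int           ready = 0;
   unsigned char       *p = (unsigned char *) s;

   if (!ready)
   {
      int c;

      for (c = 0; c < 256; c++)
         table[c] = (unsigned char) tolower(c);
      ready = 1;
   }

   for (; *p != 0; p++)
      *p = table[*p];
}
```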

Anyway, now for the results of the international jury...

*lowertest
Testing old (C) method...
Time taken for 1000 iterations was 949 centiseconds.

Testing new (assembler) method...
Time taken for 1000 iterations was 284 centiseconds.
That was measured on my A3000, so both versions will run faster on later machines. Even so, the C method clearly takes over three times as long (949 centiseconds against 284).
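Working the figures out per character makes the gap more concrete. This assumes the test string is 1024 characters long (four 256-character blocks, counted from the literal in c.runit) and that the A3000's ARM2 runs at 8 MHz:

```latex
\begin{aligned}
\text{characters} &= 1000 \times 1024 = 1\,024\,000 \\
t_{\mathrm{C}} &= \frac{9.49\ \mathrm{s}}{1\,024\,000} \approx 9.3\ \mu\mathrm{s} \approx 74\ \text{cycles at } 8\ \mathrm{MHz} \\
t_{\mathrm{asm}} &= \frac{2.84\ \mathrm{s}}{1\,024\,000} \approx 2.8\ \mu\mathrm{s} \approx 22\ \text{cycles at } 8\ \mathrm{MHz} \\
\frac{t_{\mathrm{C}}}{t_{\mathrm{asm}}} &= \frac{949}{284} \approx 3.3
\end{aligned}
```

These are averages that fold in the per-call overhead (the Territory SWIs are called on every call of lowercase()) and memory access time, so they are rough estimates rather than instruction counts.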

IMPORTANT!

This isn't the entire story. Please read "Don't be over zealous" for more...

IMPORTANT! (December 2003)

This story continues...

Copyright © 2004 Richard Murray