32 bit operation


A lot of this information is taken from the ARM assembler manual. I didn't have a 32 bit processor at the time, so trusted the documentation...
As it happens, the documentation erroneously stated that UMULL and UMLAL could only be performed in 32 bit mode. That is incorrect: if your processor supports them (ie, StrongARM), they will work in 32 bit OR 26 bit mode...


The ARM2 and ARM3 have a 32 bit data bus and a 26 bit address bus. On later versions of the ARM, both the data bus and the address bus are a full 32 bits wide.
This explains how a "32 bit processor" can be referred to as 26 bit. The data width and instruction/word size is 32 bit, and always has been, but the address bus is only 24 bit.
Oh, whoops, I said 26 bit, didn't I?
:-) Well, as PC is always word aligned, the lower two bits will always be zero in an address, so on the ARM2/ARM3 processor these bits hold the processor mode setting. The width of PC is, effectively, 26 bit even though only 24 bits are actually used.
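
To make the combined PC/PSR layout concrete, here is a minimal Python sketch that splits a 26 bit style R15 value into its fields. It is an illustration only; the function name split_r15_26bit and the sample value are invented for this example, and the bit positions are the ones described on this page (flags at the top, I and F in bits 27 and 26, mode in the bottom two bits).

```python
# Sketch of how a 26 bit ARM packs the PC and the PSR into R15.
# Names here are hypothetical, chosen for the illustration.
MODE_NAMES_26 = {0: "USR", 1: "FIQ", 2: "IRQ", 3: "SVC"}

def split_r15_26bit(r15):
    """Split a 26 bit style R15 value into its PSR and PC fields."""
    return {
        "nzcv": (r15 >> 28) & 0xF,       # bits 31-28: N, Z, C, V flags
        "i":    (r15 >> 27) & 1,         # bit 27: IRQ disable
        "f":    (r15 >> 26) & 1,         # bit 26: FIQ disable
        "pc":   r15 & 0x03FFFFFC,        # bits 25-2: word-aligned address
        "mode": MODE_NAMES_26[r15 & 3],  # bits 1-0: processor mode
    }

# Z and C set, IRQ and FIQ disabled, address &8000, SVC mode.
fields = split_r15_26bit(0x6C008003)
print(fields)
```

Only 24 bits of R15 are stored for the address, but because the bottom two address bits are implied zero, the result spans a 26 bit range.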

This was not a problem on the older machines. 4Mb of memory was the norm. Some people upgraded to 8Mb, and 16Mb was the theoretical limit.
However a RiscPC with a 26 bit program counter would not have been possible, as 26 bits only allows you to address %11111111111111111111111100 (or 67108860 bytes, or 64Mb). The RiscPC allows for 258Mb of memory to be installed.
This, incidentally, explains the 28Mb size limit for application tasks; the system is expected to be compatible with the older RISC OS API.
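
The arithmetic behind the 64Mb ceiling can be checked with a few lines of Python. This is only a sanity check of the figures quoted above; the variable names are made up for the example.

```python
# Checking the 26 bit addressing limit quoted in the text.
PC_WIDTH = 26                 # effective width of the program counter
MB = 1024 * 1024

# Highest word-aligned address: all ones, with the bottom two bits clear.
highest_address = (1 << PC_WIDTH) - 4
binary_form = int("1" * 24 + "00", 2)   # 24 ones followed by two zeros

print(highest_address)                  # 67108860
print((1 << PC_WIDTH) // MB)            # 64
print(258 * MB > highest_address)       # True: a full RiscPC does not fit
```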

The majority of the assembler site has been written regarding 26 bit mode of operation, which is compatible with the versions of RISC OS currently available (ie, RISC OS 2 to RISC OS 4); though some parts cover 32 bit modes (one example briefly runs in SVC32!), and I have noted parts of the examples that are 32 bit unfriendly.

Those with a RiscPC, Mico, RiscStation, A7000 etc have the ability to run a fully 32 bit operating system; indeed ARMLinux is such an operating system. RISC OS is not, because RISC OS needs, for the moment, to remain compatible with existing versions. It is the old dichotomy. It is wonderful to have a nice shiny new fully 32 bit version of RISC OS, but not so good when you realise a lot of your must-have software won't so much as load!
RISC OS isn't totally 26 bit. Some of the handlers need to work in 32 bit mode. A full conversion, however, is limited by money (who is going to pay for RISC OS to be fully converted, and who is going to pay for new development tools so that people can rebuild their code? PD software is strong on RISC OS) and also by necessity (lots of people use Impression but CC is no longer with us; it is quite likely Impression won't work on an updated RISC OS, and people will not see a need to upgrade if the software they want won't work).


Why is this even an issue?
Newer ARM processors will not support 26 bit operation. Several hybrids were made (ARM6, ARM7, StrongARM), but the time has come to draw the line. You can either add the complexity of a 26/32 bit system, or you can go 32 bit only and have a simpler, smaller processor.
Either we go with the flow, or get left behind... So really, this is an issue, and we don't have a choice.


32 bit architecture

The ARM architecture changed significantly with the introduction of the ARM6 series. Below, we shall describe the differences in behaviour between 26 bit and 32 bit operation.

In the ARM 6, the program counter was extended to a full 32 bits. As a result:

A further change was the addition of extra privileged processor modes, allowed by the PSR now having a full 32 bits to use. These modes are used to handle Undefined instruction and Abort exceptions. Consequently:


When configured for a 32 bit program and data space, the ARM6 and ARM7 series support ten overlapping processor modes of operation:

When in a 26 bit processor mode, the programmer's model reverts to that of earlier 26 bit ARM processors. The behaviour is the same as that of the ARM2aS macrocell with the following alterations:

In all other respects, when operating in a 26 bit mode the ARM behaves like a 26 bit ARM. The relevant bits of the CPSR appear to be incorporated back into R15 to form the PC/PSR with the I and F bits in bits 27 and 26. The instruction set behaves like that of the ARM2aS macrocell, with the addition of the MRS and MSR instructions.


The registers available on the ARM 6 (and later) in 32 bit mode are:

User26   SVC26    IRQ26    FIQ26      User     SVC      IRQ      ABT      UND      FIQ

R0 ----- R0 ----- R0 ----- R0 --   -- R0 ----- R0 ----- R0 ----- R0 ----- R0 ----- R0
R1 ----- R1 ----- R1 ----- R1 --   -- R1 ----- R1 ----- R1 ----- R1 ----- R1 ----- R1
R2 ----- R2 ----- R2 ----- R2 --   -- R2 ----- R2 ----- R2 ----- R2 ----- R2 ----- R2
R3 ----- R3 ----- R3 ----- R3 --   -- R3 ----- R3 ----- R3 ----- R3 ----- R3 ----- R3
R4 ----- R4 ----- R4 ----- R4 --   -- R4 ----- R4 ----- R4 ----- R4 ----- R4 ----- R4
R5 ----- R5 ----- R5 ----- R5 --   -- R5 ----- R5 ----- R5 ----- R5 ----- R5 ----- R5
R6 ----- R6 ----- R6 ----- R6 --   -- R6 ----- R6 ----- R6 ----- R6 ----- R6 ----- R6
R7 ----- R7 ----- R7 ----- R7 --   -- R7 ----- R7 ----- R7 ----- R7 ----- R7 ----- R7
R8 ----- R8 ----- R8       R8_fiq     R8 ----- R8 ----- R8 ----- R8 ----- R8       R8_fiq
R9 ----- R9 ----- R9       R9_fiq     R9 ----- R9 ----- R9 ----- R9 ----- R9       R9_fiq
R10 ---- R10 ---- R10      R10_fiq    R10 ---- R10 ---- R10 ---- R10 ---- R10      R10_fiq
R11 ---- R11 ---- R11      R11_fiq    R11 ---- R11 ---- R11 ---- R11 ---- R11      R11_fiq
R12 ---- R12 ---- R12      R12_fiq    R12 ---- R12 ---- R12 ---- R12 ---- R12      R12_fiq
R13      R13_svc  R13_irq  R13_fiq    R13      R13_svc  R13_irq  R13_abt  R13_und  R13_fiq
R14      R14_svc  R14_irq  R14_fiq    R14      R14_svc  R14_irq  R14_abt  R14_und  R14_fiq
--------- R15 (PC / PSR) ---------    --------------------- R15 (PC) ---------------------
                                      ----------------------- CPSR -----------------------
                                               SPSR_svc SPSR_irq SPSR_abt SPSR_und SPSR_fiq

In short, the 32 bit differences are:


The CPSR and SPSR registers

The allocation of the bits within the CPSR (and the SPSR registers to which it is saved) is:
  31 30 29 28  ---   7   6   -   4   3   2   1   0
  N  Z  C  V         I   F       M4  M3  M2  M1  M0

                                 0   0   0   0   0     User26 mode
                                 0   0   0   0   1     FIQ26 mode
                                 0   0   0   1   0     IRQ26 mode
                                 0   0   0   1   1     SVC26 mode
                                 1   0   0   0   0     User mode
                                 1   0   0   0   1     FIQ mode
                                 1   0   0   1   0     IRQ mode
                                 1   0   0   1   1     SVC mode
                                 1   0   1   1   1     ABT mode
                                 1   1   0   1   1     UND mode
Please refer to the (26 bit) PSR for information on the N, Z, C, V flags and the I and F interrupt flags.
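
The table above translates directly into a lookup. The following Python sketch decodes a CPSR (or SPSR) value using exactly the bit positions and mode numbers listed; the names MODES and decode_cpsr are hypothetical, and any mode value not in the table is simply reported as "unknown".

```python
# Sketch: decoding a CPSR value using the bit allocation in the table above.
MODES = {
    0b00000: "User26", 0b00001: "FIQ26", 0b00010: "IRQ26", 0b00011: "SVC26",
    0b10000: "User",   0b10001: "FIQ",   0b10010: "IRQ",   0b10011: "SVC",
    0b10111: "ABT",    0b11011: "UND",
}

def decode_cpsr(cpsr):
    """Return the flags, interrupt disable bits and mode name of a CPSR value."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,    # IRQ disable
        "F": (cpsr >> 6) & 1,    # FIQ disable
        "mode": MODES.get(cpsr & 0x1F, "unknown"),  # M4-M0
    }

# Z and C set, both interrupts disabled, 32 bit SVC mode.
state = decode_cpsr(0x600000D3)
print(state)
```

Compare this with the 26 bit layout: the flags stay in bits 28 to 31, but I and F move from bits 27 and 26 down to bits 7 and 6, and the mode field grows from two bits to five.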


So what does it mean in practice?

Most ARM code will work correctly. The only things that will not work are any operations which fiddle with R15 to set the processor status. Unfortunately, this isn't as easy to fix as it seems.
I examined a 9K program (a MODE 7 teletext frame viewer, written in C) for potential problems, basically looking for:

About 64 instructions fell into one of these categories.

There are likely to be few ways to make the conversion process automatic. Basically...


It is NOT easy. Such a small change, but with such far-reaching consequences.


In comp.sys.acorn.programmer, Stewart Brodie answered my query with a hint that may be useful to people intending to work with 32 bit code:

> How is it possible, if 32 bit code uses MSR/MRS to transfer status and
> register, and older ARMs don't have those instructions?
> Are we into "black magic" code for this?

You take advantage of the fact that the encodings for MSR and MRS act as NOPs
on ARM2 and ARM3 ;-)  With some careful arrangement, you can write fairly
tight code.

To refer back to earlier postings, an example of when MOVS pc, lr in a
32-bit mode is useful (entered in SVC or IRQ mode, IRQs disabled):

        ADR     r14, CallBackRegs
        TEQ     PC,PC
        LDREQ   r0, [r14, #16*4]    ; The CPSR
        MSREQ   SPSR_cxsf, r0       ; put into SPSR_svc/SPSR_irq ready for MOVS
        LDMIA   r14, {r0-r14}^      ; Restore user registers
        LDR     r14, [r14, #15*4]   ; The pc
        MOVS    pc, r14             ; Back we go (32-bit safe - SPSR set up)

(CallBackRegs contains user mode registers: R0-R15, plus the CPSR if in a
32-bit mode)
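
The TEQ PC,PC test in that code relies on a quirk of 26 bit operation: R15 read as the first operand supplies the address only, while R15 read as the second operand supplies the address and the PSR bits together. The Python model below is a rough sketch of that behaviour, not a description of any real tool; the function name and arguments are invented, and the pipeline offset is ignored because it is the same on both sides of the comparison.

```python
# Rough model of why "TEQ PC,PC" gives EQ in a 32 bit mode and NE in a
# 26 bit mode. Hypothetical names; pipeline offset ignored.
def teq_pc_pc_sets_eq(pc, psr_bits, mode32):
    """Return True if TEQ PC,PC would leave the Z flag set (EQ)."""
    if mode32:
        op1 = op2 = pc            # R15 holds only the address
    else:
        op1 = pc                  # first operand: PSR bits read as zero
        op2 = pc | psr_bits       # second operand: address plus PSR bits
    return (op1 ^ op2) == 0

SVC26_IRQ_OFF = 0x08000003        # I bit (bit 27) set, mode bits = SVC

print(teq_pc_pc_sets_eq(0x8000, SVC26_IRQ_OFF, mode32=False))  # False (NE)
print(teq_pc_pc_sets_eq(0x8000, 0, mode32=True))               # True  (EQ)
print(teq_pc_pc_sets_eq(0x8000, 0, mode32=False))              # True  (!)
```

The last line shows the catch: in 26 bit User mode with every flag clear, the PSR bits are all zero and the test wrongly reports EQ. This is consistent with the quoted code stating that it is entered in SVC or IRQ mode with IRQs disabled, so the PSR bits can never all be zero.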



Download a 32 bit code scanner (12K)



Where is the example?

In the logical place, in the document describing the processor status register...


What about old stuff for which we don't have sources?

There are two options...

The first option is a one-time conversion. We can use an intelligent disassembler (such as D.Ruck's !ARMalyser) to provide us with a source of the software, with the 32bit unsafe parts identified. I used this method to cobble together a 32bit version of one of my modules.
For fairly short things, this will be okay. For large projects... I shudder to think! One thing to be especially aware of is that some older software uses tricks like popping flags into 'unused' bits of addresses. A good example here is software that uses bits 0-27 as an address and bits 28-31 as flags...

1 << 28 = 268435456
What this means, in essence, is that the software will work fine on all older machines - including the majority of RiscPCs for which 256Mb was the limit of installable memory.
If, though, we run this on a 512Mb Iyonix (which is no longer out of the realms of possibility), as soon as it is loaded to an address over 256Mb ... bit 28 will be set!
The code will need to be examined to ensure such things don't occur, and if they do, it'll need to be worked around.
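
Here is a small Python sketch of that failure. It assumes the scheme described above (bits 0-27 for the address, bits 28-31 for flags); the helper names pack and unpack are made up for the example.

```python
# Sketch: what goes wrong when flags are kept in the top four bits of
# an address word. Hypothetical helpers, for illustration only.
ADDR_MASK = (1 << 28) - 1         # bits 0-27 hold the address

def pack(address, flags):
    """Combine an address and four flag bits the way the old code would."""
    return (flags << 28) | address

def unpack(value):
    """Recover (address, flags) under the old code's assumptions."""
    return value & ADDR_MASK, value >> 28

# Below 256Mb everything round-trips.
print(unpack(pack(0x0FFFFFFC, 0b0101)))   # (268435452, 5)

# At 256Mb (1 << 28) bit 28 of the address lands in the flag field.
print(unpack(pack(1 << 28, 0)))           # (0, 1)
```

In the second case the address comes back as zero and a flag appears that nobody set, which is the sort of thing that has to be found and worked around by hand.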
As far as I'm aware, while APCS-R requires flags to be saved, I've yet to see my C compiler generate code that depends upon the saving of flags across function calls. The typical example is:
Note that the N, Z, C and V flags from lr at the instant of entry must be reinstated; it is not sufficient merely to preserve the PSR across the call. Consider, a function ProcA which tail continues to ProcB as follows:
        CMPS   a1, #0
        MOVLT  a2, #255
        MOVGE  a2, #0
        B      ProcB
If ProcB merely preserves the flags it sees on entry, rather than restoring those from lr, the wrong flags may be set when ProcB returns direct to ProcA's caller.
While it has not been my experience that the C compiler generates such code, humans can. And much worse. This, too, must be taken into account. And all those ORRing values into R14 to directly twiddle the processor flags (on return)...

The other method is to make a new computer. All we need to do is load up a few old modules, poke our application at 'troublesome' points, and force everything to be in an area of memory that we may consider 'safe'. Then we let our program loose with the same sort of critical care that you'd attend to a hungry cat in a room full of budgies... This, more or less, is what Aemulor does.
But, at a cost.

From the "Inside Aemulor" article on the Foundation RISC User (issue 11; January 2003) CD-ROM, we encounter a very important point:

From RISC OS's perspective, the Aemulor RMA is a normal dynamic area, but Aemulor remaps the memory at an address below 64Mb so that it becomes addressable within the 26-bit environment. Because this emulated RMA is visible to all applications, native 32-bit applications are also restricted to a maximum size of 28Mb each (as per RISC OS 4) whilst Aemulor is running. It is hoped that this limitation can be removed with a later version.

Or, as they say: There's no such thing as a free lunch.

Having said that, the use of Aemulor is essential for all those must-have programs that either cannot sensibly be modernised, or are unlikely to be modernised.
I have heard that somebody is 32bitting Impression Publisher. Well, you know, I heard once that somebody was porting Mozilla to RISC OS. Who knows, maybe I'm wrong... :-)



What API changes have there been?

The "Technical information on 26/32 bit RISC OS binary interfaces" (v0.2) states:
Many existing APIs do not actually require flag preservation, such as service call entries. In this case, simply changing MOVS PC... to MOV PC... and LDM {}^ to LDM {} is sufficient to achieve 32-bit compatibility.
This is possibly worse than useless as it doesn't specify exactly which APIs need it and which don't. Is it safe to assume that everything not otherwise described is safe?

The best thing to do is get hold of that document and browse through it. Please do not simply 'assume' that things will work if you simply don't save flags.
Generally, this is the case, but unless you have a RISC OS 3.10 machine to test it on...
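
Finding the instructions in question is at least mechanical, even if deciding what to do about each one is not. The Python sketch below flags two of the patterns mentioned above in a list of 32 bit instruction words: data processing instructions with the S bit set and R15 as the destination (MOVS PC,..., and also the TEQP family), and LDM with ^ and R15 in the register list. It is a toy under stated assumptions, not the scanner offered for download on this page; all names are invented, and it makes no attempt to tell code from data.

```python
# Toy scanner for two 26 bit idioms in ARM instruction words.
def is_dataproc_s_pc(word):
    """Data processing instruction with S set and Rd = R15 (e.g. MOVS PC,R14)."""
    if (word >> 26) & 0b11 != 0b00:
        return False
    # With bit 25 clear, bits 7 and 4 both set means multiply/swap, not this.
    if not (word >> 25) & 1 and (word & 0x90) == 0x90:
        return False
    return (word >> 20) & 1 == 1 and (word >> 12) & 0xF == 15

def is_ldm_pc_caret(word):
    """LDM with the ^ bit set and R15 in the register list."""
    return ((word >> 25) & 0b111 == 0b100   # block data transfer
            and (word >> 22) & 1 == 1       # ^ (S bit)
            and (word >> 20) & 1 == 1       # load, not store
            and (word >> 15) & 1 == 1)      # R15 in the list

def scan(words):
    """Return the byte offsets of words matching either pattern."""
    return [i * 4 for i, w in enumerate(words)
            if is_dataproc_s_pc(w) or is_ldm_pc_caret(w)]

code = [
    0xE1A00000,   # MOV   R0, R0
    0xE1B0F00E,   # MOVS  PC, R14
    0xE8FD8000,   # LDMFD R13!, {PC}^
    0xE1A0F00E,   # MOV   PC, R14
    0xE33FF000,   # TEQP  PC, #0
]
print(scan(code))   # [4, 8, 16]
```

Whether each hit can simply have its S bit or ^ dropped depends on whether the API concerned needs the flags preserved, which is exactly the question the document leaves open.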



Copyright © 2004 Richard Murray