APCS introduction (ARM Procedure Call Standard)
The APCS defines:
The APCS is not a single given standard, but is a collection of standards which are similar but
differ in certain situations. For example, APCS-R (used on 26 bit versions of RISC OS) says that
flags set on function entry should be reset on function exit. Under the 32 bit definition, it is
not always possible to know the entry flags (there is no USR_CPSR) so you do not need to restore
them. As you may expect, there is no compatibility between the versions. Code which expects the
flags to be restored is likely to misbehave if they are not restored...
The newest versions of SharedCLibrary (v5.43 etc) can recognise and work with APCS-R (26 bit,
restores flags), and APCS-3/32 (32 bit, doesn't restore flags). So long as the version of APCS
is the same within your entire application, it can differ between applications on older
machines.
For an example, David Pilling's OvationPro and my OvHTML are both compiled to be
26/32 neutral using the newer APCS specification. Meanwhile, Edit v1.54 sits on the
iconbar. It is compiled to APCS-R.
The situation is slightly different on the Iyonix. There, APCS-R programs simply won't
work (at all, no way José) because the processor does not support the required things. The
way we get older programs working on this system is to set up a 'fake' environment that looks a
lot like the older machines, patch the application so that certain things are done instead
of the flag restoring and mucking with R14 to set/clear flags (but, to the application, it looks
like it really happened), and then run the program in this safe environment. The software that
does this is called Aemulor.
If you are developing an ARM based system (from scratch), then there is no requirement to
implement APCS. It is recommended, as it is not difficult to implement, and it allows for a
variety of benefits.
But, the here and now. APCS must be used if you are writing assembler code to hook into
compiled C. The compiler will expect certain conditions, and these must be met in your
add-in code. A good example: APCS specifies that a1 to a4 may be corrupted, but v1 to v6 must
be preserved.
By now, I'm sure you are scratching your head and saying 'a-what? v-what?'. So here is the APCS-R
register definition...
Register names
Reg # | APCS | Meaning |
R0 | a1 | Working registers |
R1 | a2 | " |
R2 | a3 | " |
R3 | a4 | " |
R4 | v1 | Must be preserved |
R5 | v2 | " |
R6 | v3 | " |
R7 | v4 | " |
R8 | v5 | " |
R9 | v6 | " |
R10 | sl | Stack Limit |
R11 | fp | Frame Pointer |
R12 | ip | |
R13 | sp | Stack Pointer |
R14 | lr | Link Register |
R15 | pc | Program Counter |
These names are not predefined in Acorn's objasm (version 2.00), though later
versions of objasm, and other assemblers (such as Nick Roberts' ASM), define them for
you. Some assemblers may use a different directive; refer to your documentation.
To define a register name, you typically use the RN directive, at the very start of your program:

    a1      RN      0
    a2      RN      1
    a3      RN      2
       ...etc...
    r13     RN      13
    sp      RN      13
    r14     RN      14
    lr      RN      r14
    pc      RN      15
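If you are writing a tool in C (a disassembler or a backtrace printer, say), the same mapping can be kept in a lookup table. What follows is only a sketch: the names apcs_reg_name and apcs_is_v_reg are made up for illustration, and the table simply mirrors the APCS-R register names listed above.

```c
#include <assert.h>

/* APCS-R names for R0..R15, in register-number order (from the table above). */
static const char *const apcs_names[16] = {
    "a1", "a2", "a3", "a4",             /* R0-R3: working registers  */
    "v1", "v2", "v3", "v4", "v5", "v6", /* R4-R9: must be preserved  */
    "sl", "fp", "ip", "sp", "lr", "pc"  /* R10-R15                   */
};

/* Hypothetical helper: the APCS-R name of register number reg (0..15). */
const char *apcs_reg_name(unsigned reg)
{
    assert(reg < 16);
    return apcs_names[reg];
}

/* Hypothetical helper: non-zero for v1..v6 (R4..R9), the registers the
   table above marks as "Must be preserved". */
int apcs_is_v_reg(unsigned reg)
{
    return reg >= 4 && reg <= 9;
}
```

So apcs_reg_name(11) gives "fp", and apcs_is_v_reg(0) is zero because a1 is a working register.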
sp will always point to the lowest used address in the most recent frame. This fits in with the tradition of a fully descending stack. sl refers to a stack limit, below which you cannot decrement sp.

There may be multiple stack chunks. These may be located at any address in memory; there is no convention here. This, typically, would be used to provide multiple stacks for the same code which is executing in a re-entrant manner; an analogy here is FileCore, which provides its services to the currently available FileCore filing systems (ADFS, RAMFS, IDEFS, SCSIFS, etc) by simply setting up 'state' information and calling the same pieces of code as required.
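To make the 'fully descending' idea concrete, here is a toy model in C. It is a sketch under loud assumptions: the stack chunk is just an array, sp and sl are word indexes rather than real addresses, and every name (toy_stack, toy_push, toy_pop) is invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* A toy stack chunk. sp and sl are word indexes into chunk,
   not real addresses. */
typedef struct {
    uint32_t chunk[STACK_WORDS];
    unsigned sp;   /* index of the lowest used word (a "full" stack) */
    unsigned sl;   /* stack limit: sp must not be taken below this   */
} toy_stack;

void toy_init(toy_stack *s, unsigned limit)
{
    assert(limit <= STACK_WORDS);
    s->sp = STACK_WORDS;   /* empty: one past the highest word */
    s->sl = limit;
}

/* Push one word: decrement first, then store (descending, full).
   Returns 0 if the push would take sp below sl. */
int toy_push(toy_stack *s, uint32_t value)
{
    if (s->sp <= s->sl)
        return 0;
    s->sp -= 1;
    s->chunk[s->sp] = value;
    return 1;
}

/* Pop one word: load first, then increment. */
uint32_t toy_pop(toy_stack *s)
{
    assert(s->sp < STACK_WORDS);
    return s->chunk[s->sp++];
}
```

After each push, sp is the index of the word just stored, which is the 'lowest used address' property described above.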
fp (frame pointer) should be zero, or it should point to the last in a list of stack backtrace structures which will provide a means of 'unwinding' the program to trace backwards through the functions called.
The structure is:
    save code pointer      [fp]          fp points here
    return link value      [fp, #-4]
    return sp value        [fp, #-8]
    return fp value        [fp, #-12]    points to next structure
    [saved v7]
    [saved v6]
    [saved v5]
    [saved v4]
    [saved v3]
    [saved v2]
    [saved v1]
    [saved a4]
    [saved a3]
    [saved a2]
    [saved a1]
    [saved f7]                           three words
    [saved f6]                           three words
    [saved f5]                           three words
    [saved f4]                           three words

The structure contains between four and twenty-seven words, those in square brackets being optional values. The only thing that can be said is that if they do exist, then they exist in the given order (ie, saved f4 will be lower in memory than saved a3, but a2-f5 might not exist).
The fp register points to the stack backtrace structure for the currently executing function. The return fp value should be zero, or a pointer to a stack backtrace structure created by the function which called the current function. The return fp value in this structure is a pointer to the stack backtrace structure for the function that called the function that called the current function; and so on back until the first function.
The return link value, return sp value, and return fp value are reloaded into pc, sp, and fp when the function exits.
    #include <stdio.h>

    void one(void);
    void two(void);
    void zero(void);

    int main(void)
    {
       one();
       return 0;
    }

    void one(void)
    {
       zero();
       two();
       return;
    }

    void two(void)
    {
       printf("main...one...two\n");
       return;
    }

    void zero(void)
    {
       return;
    }
At the point of printing a message on the screen, our example APCS backtrace structure would be:

    fp ----> two_structure
             return link
             return sp
             return fp ----> one_structure
             ...             return link
                             return sp
                             return fp ----> main_structure
                             ...             return link
                                             return sp
                                             return fp ----> 0
                                             ...

Therefore, we can examine fp and see the structure for function 'two', which would point to the structure for function 'one', which would point to the structure for 'main', which points to zero to end. In this way, we can wind our way backward through the program and determine how we came to be at our current crash point.
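The walk from 'two' back to 'main' can be sketched in C. This is a simulation, not code for a real machine: memory is a small array, 'addresses' are byte offsets into it, and the names mem, make_frame and walk_backtrace are hypothetical. The only thing taken from the text is the layout, with the return fp value held at [fp, #-12].

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

/* Simulated memory: "addresses" are byte offsets into this array. */
static uint32_t mem[MEM_WORDS];

static uint32_t load_word(uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return mem[addr / 4];
}

/* Build a minimal four-word structure whose save code pointer is at fp. */
void make_frame(uint32_t fp, uint32_t save_pc, uint32_t ret_lr,
                uint32_t ret_sp, uint32_t ret_fp)
{
    assert(fp % 4 == 0 && fp >= 12 && fp / 4 < MEM_WORDS);
    mem[fp / 4]     = save_pc;  /* [fp]       save code pointer */
    mem[fp / 4 - 1] = ret_lr;   /* [fp, #-4]  return link value */
    mem[fp / 4 - 2] = ret_sp;   /* [fp, #-8]  return sp value   */
    mem[fp / 4 - 3] = ret_fp;   /* [fp, #-12] return fp value   */
}

/* Follow return fp values until one is zero. Each fp visited is stored
   in out (at most max entries); the number of frames found is returned. */
unsigned walk_backtrace(uint32_t fp, uint32_t *out, unsigned max)
{
    unsigned n = 0;
    while (fp != 0 && n < max) {
        out[n++] = fp;
        fp = load_word(fp - 12);
    }
    return n;
}
```

If three frames are built so that 'two' links to 'one' and 'one' links to 'main' (whose return fp is zero), walk_backtrace starting at the frame for 'two' visits them in that order and stops.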
It is also worth pointing out that an APCS structure as the above is unlikely ever to be
generated for the code given. The reason for this is that functions which do not call any other
functions do not require full APCS headers.
For your perusal, this is the code generated by Norcroft C v4.00 for the above code...
            AREA |C$$code|, CODE, READONLY

            IMPORT  |__main|
    |x$codeseg|
            B       |__main|

            DCB     &6d,&61,&69,&6e
            DCB     &00,&00,&00,&00
            DCD     &ff000008

            IMPORT  |x$stack_overflow|
            EXPORT  one
            EXPORT  main
    main
            MOV     ip, sp
            STMFD   sp!, {fp,ip,lr,pc}
            SUB     fp, ip, #4
            CMPS    sp, sl
            BLLT    |x$stack_overflow|
            BL      one
            MOV     a1, #0
            LDMEA   fp, {fp,sp,pc}^

            DCB     &6f,&6e,&65,&00
            DCD     &ff000004

            EXPORT  zero
            EXPORT  two
    one
            MOV     ip, sp
            STMFD   sp!, {fp,ip,lr,pc}
            SUB     fp, ip, #4
            CMPS    sp, sl
            BLLT    |x$stack_overflow|
            BL      zero
            LDMEA   fp, {fp,sp,lr}
            B       two

            IMPORT  |_printf|
    two
            ADD     a1, pc, #L000060-.-8
            B       |_printf|
    L000060
            DCB     &6d,&61,&69,&6e
            DCB     &2e,&2e,&2e,&6f
            DCB     &6e,&65,&2e,&2e
            DCB     &2e,&74,&77,&6f
            DCB     &0a,&00,&00,&00

    zero
            MOVS    pc, lr

            AREA |C$$data|
    |x$dataseg|

            END
The save code pointer points to a location twelve bytes beyond the start of the code which set up that backtrace structure. You can see this in the example. Remember, you will need to strip off the PSR for 26-bit code.
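As a rough illustration, a backtrace tool written in C might turn a save code pointer back into a code address like this. The helper name setup_code_from_save_pc is an assumption of mine; the twelve byte offset comes from the paragraph above, and the mask reflects the 26-bit layout in which R15 carries the PSR in bits 26-31 and 0-1.

```c
#include <assert.h>
#include <stdint.h>

/* In 26-bit code, R15 holds PSR bits in bits 26-31 and 0-1;
   the address itself occupies bits 2-25. */
#define PC26_ADDRESS_MASK 0x03FFFFFCu

/* Hypothetical helper: given the save code pointer read from [fp],
   return the start of the code which set up the backtrace structure. */
uint32_t setup_code_from_save_pc(uint32_t save_pc, int is_26bit)
{
    if (is_26bit)
        save_pc &= PC26_ADDRESS_MASK;   /* strip the PSR bits */
    assert(save_pc >= 12);
    return save_pc - 12;                /* twelve bytes back */
}
```

For instance, a 26-bit save code pointer of &8000800F (address &800C plus some PSR bits) comes back as &8000.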
So now we turn to our function, 'two'. As soon as execution enters 'two':
APCS-A
This is APCS-Arthur; and was defined in the dark days of Arthur. You may come across it
(unlikely, though), or references to it, so it is worth knowing it exists. It has been deprecated
and due to differing register definitions (that seem somehow alien to a seasoned RISC OS coder),
it should not be used.
It was for Arthur applications running in USR mode.
sl = R13, fp = R10, ip = R11, sp = R12, lr = R14, pc = R15.
The PRM (p4-411) says "Use of r12 as sp, rather than the architecturally more natural r13, is historical and predates both Arthur and RISC OS."
The stack is segmented and is extended on demand.
26-bit program counter.
No passing of floating point arguments in FP registers.
Non-reentrant.
Flags must be restored.
APCS-R
This is APCS-RISC OS. It is for (old) RISC OS applications operating in USR mode; or
modules/handlers in SVC mode.
sl = R10, fp = R11, ip = R12, sp = R13, lr = R14, pc = R15.
This is the single most common APCS version, as all (older) compiled C programs will have used
APCS-R.
Explicit stack limit checking
26-bit program counter.
No passing of floating point arguments in FP registers.
Non-reentrant.
Flags must be restored.
It is worth noting that I have seen 'cc' version 5 (26 bit) generate code to put an FP
value into an FP register - even though APCS-R says that FP values are not passed in FP
registers, so I'm not sure exactly which context caused this to occur.
APCS-U
This is APCS-Unix, used in Acorn's RISCiX. It is for RISCiX applications (USR mode) or the kernel
(SVC mode).
sl = R10, fp = R11, ip = R12, sp = R13, lr = R14, pc = R15.
Implicit stack limit checking (with sl)
26-bit program counter.
No passing of floating point arguments in FP registers.
Non-reentrant.
Flags must be restored.
APCS-32
This is an extension of APCS-2 (-R and -U) which allows for a 32bit program counter, and for
flags to not be restored on exit from a function executing in USR mode.
Other things as for APCS-R.
APCS variants
APCS | PC width | Stack-limit checking | FP arguments | Reentrancy | Notes |
APCS-U (RISCiX) | 26 bits | Implicit | Not in FP registers | Non-reentrant | |
APCS-R (older RISC OS software) | 26 bits | Explicit | Not in FP registers | Non-reentrant | Flags must be restored |
- | 26 bits | Implicit | Via FP registers | Non-reentrant | |
- | 26 bits | Explicit | Via FP registers | Non-reentrant | |
- | 26 bits | Implicit | Not in FP registers | Reentrant | |
- | 26 bits | Explicit | Not in FP registers | Reentrant | |
- | 26 bits | Implicit | Via FP registers | Reentrant | |
- | 26 bits | Explicit | Via FP registers | Reentrant | |
- | 32 bits | Implicit | Not in FP registers | Non-reentrant | |
APCS-32 (new RISC OS software) | 32 bits | Explicit | Not in FP registers | Non-reentrant | Flags cannot be restored |
- | 32 bits | Implicit | Via FP registers | Non-reentrant | |
- | 32 bits | Explicit | Via FP registers | Non-reentrant | |
- | 32 bits | Implicit | Not in FP registers | Reentrant | |
- | 32 bits | Explicit | Not in FP registers | Reentrant | |
- | 32 bits | Implicit | Via FP registers | Reentrant | |
- | 32 bits | Explicit | Via FP registers | Reentrant | |
    function_name_label
            MOV     ip, sp
            STMFD   sp!, {fp,ip,lr,pc}
            SUB     fp, ip, #4
If you need to stack other registers, add them to the STMFD command.
Your next task is to check the stack space. If you don't need much space (less than 256 bytes) then you can use:
            CMPS    sp, sl
            BLLT    |__rt_stkovf_split_small|
            SUB     sp, sp, #<size of local variables>

Some compilers (the Norcroft C v4.00 listing above, for example) call |x$stack_overflow| instead.

After a call to __rt_stkovf_split_small (or x$stack_overflow), sp may point to a different stack chunk, so you should access stacked arguments with offsets from fp, not offsets from sp.
Then you do your stuff...
Exiting (when no FP registers need to be restored) is performed by:
            LDMEA   fp, {fp,sp,pc}

(LDMDB is the same as LDMEA - you do not use LDMFD to exit an APCS function)
Again, if you stacked other registers, then reload them here.
The exit mechanism was chosen because it is easier and saner to simply LDM... to exit a function
than to branch to a special function exit handler.
For APCS-R (26 bit), suffix the LDM instruction with '^'.
An extension to the protocol, used in backtracing, is to embed the function name into the
code.
Immediately before the function (and the MOV ip, sp), you should have the following:

            DCD     &FF0000xx
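Judging by the compiler listings in this document, 'xx' looks to be the number of bytes taken up by the name string including its zero terminator and the padding up to a word boundary: "main" occupies eight bytes and is followed by &ff000008, while "one" occupies four and is followed by &ff000004. Taking that reading as an assumption, the value could be computed in C as below (name_field_bytes and name_marker_word are names invented for this sketch).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bytes occupied by the embedded name: the characters, a zero
   terminator, and zero padding up to the next word boundary.
   (Assumption inferred from the listings, see the text above.) */
uint32_t name_field_bytes(const char *name)
{
    size_t with_terminator = strlen(name) + 1;
    return (uint32_t)((with_terminator + 3) & ~(size_t)3);
}

/* The word that follows the name, in the &FF0000xx form. */
uint32_t name_marker_word(const char *name)
{
    uint32_t bytes = name_field_bytes(name);
    assert(bytes <= 0xFF);   /* this sketch only handles a one-byte length */
    return 0xFF000000u | bytes;
}
```

With this rule, "main" gives &FF000008 and "one" gives &FF000004, matching the Norcroft output shown earlier.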
So, your complete stack backtrace code (<256 bytes of stack required) would look like:
            DCB     "my_function_name", 0, 0, 0, 0
            DCD     &FF000014                   ; &14 = 20 bytes of name, terminator and padding
    my_function_name
            MOV     ip, sp
            STMFD   sp!, {fp, ip, lr, pc}
            SUB     fp, ip, #4

            CMPS    sp, sl                      ; this may be omitted if you
            BLLT    |__rt_stkovf_split_small|   ; won't be using stack...
            SUB     sp, sp, #<size of local variables>

            ...process...

            LDMEA   fp, {fp, sp, pc}            ; <-- append '^' for APCS-R
If you use no stack, and you don't need to save any registers, and you don't call anything, then
setting up an APCS block is unnecessary (but might be useful to track down problems during the
debug stage).
In this case, you could:
    my_simple_function
            ...process...
            MOV     pc, lr

Use MOVS pc, lr in APCS-R.
One thing to consider is the case when we require more than 256 bytes. In this case, our code is:
            ; create the stack backtrace structure
            MOV     ip, sp
            STMFD   sp!, {fp, ip, lr, pc}
            SUB     fp, ip, #4

            SUB     ip, sp, #<maximum frame size>
            CMPS    ip, sl
            BLLT    |__rt_stkovf_split_big|
            SUB     sp, sp, #<initial frame size>

            ...process...

            LDMEA   fp, {fp, sp, pc}            ; <-- append '^' for APCS-R
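The difference between the two checks is easier to see with numbers. Here is a small C model of both comparisons; it is a sketch only, the function names are invented, and it uses unsigned comparisons where the ARM code uses LT (signed), which I am assuming gives the same answer for the stack addresses involved.

```c
#include <assert.h>
#include <stdint.h>

/* Small frames: mirrors  CMPS sp, sl / BLLT ...
   Non-zero means the overflow routine would be called. */
int small_check_overflows(uint32_t sp, uint32_t sl)
{
    return sp < sl;
}

/* Large frames: mirrors  SUB ip, sp, #max / CMPS ip, sl / BLLT ...
   The comparison is made on where sp would end up, not where it is. */
int big_check_overflows(uint32_t sp, uint32_t sl, uint32_t max_frame)
{
    assert(max_frame <= sp);
    return (sp - max_frame) < sl;
}
```

With sp at &9000 and sl at &8000, the small check passes, and the large check passes for a &1000 byte frame but fails for one of &1004 bytes, because that would take sp to &7FFC.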
To finish up, we'll look at an example function, and the code that is generated.
    #include <ctype.h>

    void c_lowercase(char string[])
    {
       int i = 0;

       while ( string[i] )
       {
          string[i] = tolower(string[i]);
          i++;
       }

       return;
    }
            =       "c_lowercase", 0
            DCD     &FF00000C
    c_lowercase
            MOV     ip,sp
            STMDB   sp!,{a1,v1,v2,fp,ip,lr,pc}
            SUB     fp,ip,#4
            CMP     sp,sl
            BLLT    __rt_stkovf_split_small
            MOV     v1,a1
            MOV     v2,#0
            LDRB    a1,[a1,#0]
            CMP     a1,#0
            LDMEQDB fp,{v1,v2,fp,sp,pc}
    |L0002cc.J4.c_lowercase|
            LDRB    a1,[v1,v2]
            BL      tolower
            STRB    a1,[v1,v2]
            ADD     v2,v2,#1
            LDRB    a1,[v1,v2]
            CMP     a1,#0
            BNE     |L0002cc.J4.c_lowercase|
            LDMDB   fp,{v1,v2,fp,sp,pc}
Provided you avoid instructions that the older processors lack, such as UMULL and MRS, the same code should work across the entire range of RISC OS machines, save those poor sods who are still using RISC OS 2!
Many existing APIs don't actually require flags to be preserved. So in our 32bit version we can get away by changing MOVS PC,... to MOV PC,..., and LDM {...}^ to LDM {...}, and rebuilding.
The objasm assembler (v3.00 or later) has a {CONFIG} variable which will be either 26 or 32. Using this, it is possible to build macros...
    my_function_name
            MOV     ip, sp
            STMFD   sp!, {fp, ip, lr, pc}
            SUB     fp, ip, #4

            ...process...

            [ {CONFIG} = 26
              LDMEA   fp, {fp, sp, pc}^
            |
              LDMEA   fp, {fp, sp, pc}
            ]
Testing for 32bit?
If you require your code to be adaptive, there is a simple test to determine the processor PC
state. From this, you can determine:
            TEQ     PC, #0
            TEQ     PC, PC      ; EQ for 32bit; NE for 26bit
First case optimisation
Let's say we have a function like:
    int getbytefromcache(ptr)
    {
       /* ptr is pointer to cache value 0...xxxx */
       int __ptr = ptr;

       if (__ptr > __cachebase)
       {
          __ptr -= __cachebase;
          if (__ptr < __cachelimit)
             return (int)__cache[__ptr];
       }

       /* flush the cache, reload wanted block, return value */
       ...
    getbytefromcache
            LDR     a4, __cachebase
            CMP     a1, a4
            BLT     getbytefromcache_entry
            SUB     a2, a1, a4
            LDR     a4, __cachesize
            CMP     a2, a4
            LDRLTB  a1, [a2]
            MOVLT   pc, lr
            ; fall through if not LT

    getbytefromcache_entry
            MOV     ip, sp
            STMFD   sp!, {fp, ip, lr, pc}
            SUB     fp, ip, #4

            ... stuff ...

            LDRB    a1, [a#]
            [ {CONFIG} = 26
              LDMEA   fp, {fp, sp, pc}^
            |
              LDMEA   fp, {fp, sp, pc}
            ]