Floating point
Parts of this documentation have been taken from the ARM Assembler manual.
Please note that this describes the floating point implementation used with RISC OS and not the VFP provided with the more recent ARM processors (ARMv5 etc).
A standard ARM floating point instruction set has been defined, so that the code may be used across all RISC OS machines. If the actual hardware does not exist, then the instructions are trapped and executed by the floating point emulator module (FPEmulator). The program does not need to know whether or not the FP co-processor is present. The only real difference will be speed of execution.
If you are interested in the co-processor aspect, read the document on co-processor access.
Note: If a real FPU is attached, it will pick up the unrecognised instructions and do the work itself. However, on systems such as the ARM7500FE, the work of the floating point unit is shared between hardware (for instructions like MUF (multiply)) and software (for instructions like LGN (log. to base e)).
There is also an FPSR (floating point status register) which, like the ARM's own PSR, holds the status information that an application might require. Each exception flag has a corresponding 'trap enable' bit, which allows the application to enable or disable the trap associated with that error.
The FPSR also allows you to tell different implementations of the FP system apart.
There may also be an FPCR (floating point control register). This holds information that the
application should not access, such as flags to turn the FP unit on and off. Typically, hardware
will have an FPCR, software will not. Do not attempt to use the FPCR - some parts of it
are read-sensitive, so simply reading it can affect the rest of the system (such as the FPE!).
FP units can be software implementations such as the FPEmulator modules, hardware implementations
such as the FP chip (and support code), or a combination of both.
The "most original" example of a 'both' that I can think of is the Warm Silence
Software patch that will utilise the 80x87 chip on suitably equipped PC co-processor cards as a
floating point processor for ARM FP operations. Talk about resource sharing...!
The results are calculated as though to infinite precision, then rounded to the
length required. The rounding may be to nearest, to +infinity (P), to -infinity (M), or to zero (Z).
The default is round to nearest; ties are rounded to the nearest even value.
The working precision is 80 bits, comprising a 64-bit mantissa, a 15-bit exponent, and a sign
bit. Specific instructions that work with single precision may provide better performance in
some implementations - notably fully software-based ones.
The FPSR contains the necessary status for the FP system. The IEEE flags are always present, but the result flags are only available after an FP compare operation.
Floating point instructions should not be used from SVC mode.
The FPSR is laid out as follows:
Bits 31-24: System ID
Bits 23-16: Trap enable
Bits 15-8:  System control
Bits 7-0:   Exception flags
The defined system IDs are:
&00  Old FPE - FPE module prior to v4.00
&01  FPE 400 - FPE module v4.00 or later
&80  FPPC    - Interface between ARM and WE32206 (AT&T MAU)
&81  FPA     - ARM FPU
The Trap enable byte is:
Bits 23-21: Reserved
Bit 20: INX (inexact)
Bit 19: UFL (underflow)
Bit 18: OFL (overflow)
Bit 17: DVZ (division by zero)
Bit 16: IVO (invalid operation)
If an exception flag bit is set following an operation, and the corresponding trap enable bit is set, then the exception trap will be taken.
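The System control byte is (these bits have no meaning on Old FPE and FPPC systems):
Bits 15-13: Reserved
Bit 12: AC (alternative C flag for compares)
Bit 11: EP (extended packed decimal)
Bit 10: SO (select synchronous operation)
Bit 9: NE (NaN exception)
Bit 8: ND (no denormalised numbers)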
The Exception flags byte is:
Bits 7-5: Reserved
Bit 4: INX
Bit 3: UFL
Bit 2: OFL
Bit 1: DVZ
Bit 0: IVO
Whenever an exception condition arises, the appropriate cumulative exception flag in bits 0 to 4 will be set to 1. If the relevant trap enable bit is set, then an exception is also delivered to the user's program in a manner specific to the operating system. (Note that in the case of underflow, the state of the trap enable bit determines under which conditions the underflow flag will be set.) These flags can only be cleared by a WFS instruction.
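To make the byte layout concrete, here is a small sketch that splits an FPSR value into its four bytes and names the bits that are set. The function names and the example value are hypothetical, made up for illustration; on the machine itself you would read the value with RFS. The `clear_exceptions` helper mirrors the rule that the cumulative flags can only be cleared by writing the FPSR back (RFS, clear bits 0-4, WFS).

```python
# Hypothetical helper (not a RISC OS API): split a 32-bit FPSR value, as
# read with RFS, into the four bytes described above.

SYSTEM_IDS = {0x00: "Old FPE", 0x01: "FPE 400", 0x80: "FPPC", 0x81: "FPA"}

# Bits 0-4 of the trap enable and exception flag bytes, lowest bit first.
FLAG_NAMES = ["IVO", "DVZ", "OFL", "UFL", "INX"]
# Bits 8-12 of the system control byte, lowest bit first.
CONTROL_NAMES = ["ND", "NE", "SO", "EP", "AC"]


def set_bits(byte, names):
    """Return the names whose bit (bit 0 = first name) is set in byte."""
    return [name for i, name in enumerate(names) if byte & (1 << i)]


def decode_fpsr(fpsr):
    system_id = (fpsr >> 24) & 0xFF
    return {
        "system": SYSTEM_IDS.get(system_id, "unknown &%02X" % system_id),
        "traps": set_bits((fpsr >> 16) & 0xFF, FLAG_NAMES),
        "control": set_bits((fpsr >> 8) & 0xFF, CONTROL_NAMES),
        "exceptions": set_bits(fpsr & 0xFF, FLAG_NAMES),
    }


def clear_exceptions(fpsr):
    """Value to write back with WFS: cumulative flags (bits 0-4) cleared."""
    return fpsr & ~0x1F & 0xFFFFFFFF


# Made-up example value: FPA, IVO/DVZ/OFL traps enabled, IVO flag set.
example = 0x81070001
print(decode_fpsr(example))
print("&%08X" % clear_exceptions(example))
```

Only the low five bits are cleared, so the trap enables and system control settings survive the write back.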
Precision letters are:
S  - single
D  - double
E  - double extended
P  - packed decimal
EP - extended packed decimal (if enabled)
Rounding modes are:
(none) - nearest (no letter required)
P      - plus infinity
M      - minus infinity
Z      - zero
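As a rough illustration of what the four rounding modes do when a value is rounded to an integer (as FIX and RND do), here is a sketch in Python. The function `round_mode` is hypothetical; it relies on the fact that Python's built-in `round()` on a float rounds ties to even, which matches the default "nearest" mode described earlier.

```python
import math


def round_mode(x, mode=""):
    """Round x to an integer using one of the FP rounding letters."""
    if mode == "":      # nearest, ties to even (the default)
        return round(x)
    if mode == "P":     # towards plus infinity
        return math.ceil(x)
    if mode == "M":     # towards minus infinity
        return math.floor(x)
    if mode == "Z":     # towards zero
        return math.trunc(x)
    raise ValueError("unknown rounding mode: %r" % mode)


for value in (2.5, 3.5, -2.5, 2.7):
    print(value, [round_mode(value, m) for m in ("", "P", "M", "Z")])
```

Note how 2.5 rounds to 2 but 3.5 rounds to 4 in the default mode: both ties go to the even neighbour.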
Because the floating point system is a little complex, we shall quickly look at how floating
point typically operates.
As we all know, a computer can use bit patterns to represent numbers. Older machines could easily
handle between 0 and 255, or 0 and 65535. The ARM processor can easily handle between zero and
4294967295. A 64 bit processor can easily handle between 0 and 18446744073709551615.
Obviously, by using other techniques, any system can store all sorts of numbers - BBC BASIC
running on a 6502 didn't crash if you told it to count to 257, for example.
However large the data widths get, and however large the numbers that can be handled in one go,
an integer still cannot hold even the simplest non-integer value, such as PI...
But... there's a solution. An eight-bit processor can handle numbers larger than 255 by using
the simple formula ( (high_byte x 256) + low_byte ). Okay, things
are more complex, but you get the idea.
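The (high_byte x 256) + low_byte formula can be sketched in a couple of lines; the function names here are purely illustrative. Two 8-bit values are enough to count to 257 and beyond, up to 65535.

```python
def combine(high_byte, low_byte):
    """Join two 8-bit values into one 16-bit number."""
    return high_byte * 256 + low_byte


def split(value):
    """Split a 16-bit number back into (high_byte, low_byte)."""
    return divmod(value, 256)


print(combine(1, 1))      # 257
print(split(257))         # (1, 1)
print(combine(255, 255))  # 65535
```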
So, if we want to store PI (we'll take PI as being 3.14159265), why don't we simply store
the number 314159265, and alongside it also store a value saying that the 'real' decimal
point should shift eight places to the left?
PI = 314159265.0 [<-8] = 31415926.5 [<-7] = ... = 3.14159265
In the above example, I've shown it after the first shift of the decimal point, just to give you a visual idea of how it works.
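The "digits plus a decimal point shift" idea can be written out directly. The sketch below uses a hypothetical `from_scaled` function for the decimal version, then uses Python's `math.frexp()` to show the same split done in base 2, which is what the floating point formats actually store: a mantissa and a power of two.

```python
import math


def from_scaled(digits, shift):
    """Value of 'digits' with the decimal point moved 'shift' places left."""
    return digits / 10 ** shift


print(from_scaled(314159265, 8))  # 3.14159265

# Binary version: value = mantissa * 2**exponent, with the mantissa
# returned by frexp() in the range [0.5, 1).
mantissa, exponent = math.frexp(3.14159265)
print(mantissa, exponent)         # mantissa is about 0.785, exponent is 2
print(mantissa * 2 ** exponent)   # 3.14159265 again
```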
IEEE Single Precision
Bit 31:     Sign
Bits 30-23: Exponent
Bits 22-0:  (msb) Fraction (lsb)

IEEE Double Precision
First word:
  Bit 31:     Sign
  Bits 30-20: Exponent
  Bits 19-0:  (msb) Fraction
Second word:
  Bits 31-0:  Fraction (lsb)
To a non-mathematician (such as myself!), the system does not appear to make an awful lot of sense, for example:
DIM code% 64
FOR opt% = 0 TO 2 STEP 2
  P% = code%
  [ OPT opt%
  EXT 1
  MVFS F0, #0.5      ; or whatever other value...
  STFS F0, store
  MOV PC, R14
  .store
  DCD 0              ; only one word - we're using single precision
  ]
NEXT
CALL code%
PRINT ~!store : REM show the stored word in hex
Examining the memory location gives:
Value  Word       Sign  Exponent  Fraction
0      &00000000  0     0         0
0.5    &3F000000  0     126       0
1      &3F800000  0     127       0
2      &40000000  0     128       0
5      &40A00000  0     129       2097152
10     &41200000  0     130       2097152
The exponent is stored with a bias of 127, and for non-zero values the leading 1 of the mantissa is implied rather than stored: 5 is 1.25 x 2^2, so its exponent field is 2 + 127 = 129 and its fraction field is 0.25 x 2^23 = 2097152.
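If you want to check encodings like these without an ARM to hand, the standard Python `struct` module can do the bit twiddling. This is a sketch, and `single_fields`, `from_fields` and `double_words` are hypothetical helper names. It pulls out the sign, exponent and fraction fields of the single precision layout, rebuilds a normal value from them, and splits a double into the two words of the double precision layout (sign and exponent in the first word).

```python
import struct


def single_fields(value):
    """Return (word, sign, exponent, fraction) for an IEEE single."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31                 # bit 31
    exponent = (word >> 23) & 0xFF    # bits 30-23, biased by 127
    fraction = word & 0x7FFFFF        # bits 22-0, leading 1 not stored
    return word, sign, exponent, fraction


def from_fields(sign, exponent, fraction):
    """Rebuild a normal (non-zero, non-denormal) single from its fields."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


def double_words(value):
    """Split an IEEE double into (first word, second word), with the
    sign, exponent and top of the fraction in the first word."""
    return struct.unpack(">II", struct.pack(">d", value))


for v in (0.5, 1.0, 2.0, 5.0, 10.0):
    word, sign, exponent, fraction = single_fields(v)
    print(f"{v:>5} &{word:08X} sign={sign} exponent={exponent} fraction={fraction}")
```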
Continuing, working up some code with the 'printlib' example supplied with the C/assembler development software, we arrive at:
-0.123456 is -1.2346E-1
9.99996 is 1.0000E1
-0.0999998 is -1.0000E-1
0.999997 is 1.0000E0
-0.0 is 0
9.99999E99 is 1.0000E100
This apparently does make sense - only not to me - so I cannot explain it! :-)
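One possible explanation, offered as a guess rather than taken from printlib itself: the output looks like each value printed to five significant figures in d.dddd E n form. Rounding 9.99996 to five figures gives 10.000, which is then written as 1.0000E1. The sketch below (the `five_figures` name is hypothetical) reproduces every line of the output above under that assumption.

```python
def five_figures(x):
    """Format x to five significant figures as d.ddddEn (assumed printlib style)."""
    if x == 0:
        return "0"                    # covers -0.0 too
    mantissa, exponent = f"{x:.4E}".split("E")
    return f"{mantissa}E{int(exponent)}"


for x in (-0.123456, 9.99996, -0.0999998, 0.999997, -0.0, 9.99999e99):
    print(x, "is", five_figures(x))
```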
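LDF<condition><precision> <fp register>, <address>
Load floating point value.
STF<condition><precision> <fp register>, <address>
Store floating point value.
The address can be in the forms:
<label>              (PC-relative)
[Rn]
[Rn, #offset]{!}     (pre-indexed, with optional writeback)
[Rn], #offset        (post-indexed)
where the offset is a multiple of 4 in the range -1020 to +1020.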
LFM and SFM
These are similar in idea to LDM and STM, but they will not be described here because some versions
of FPEmulator do not support them. The FP module in RISC OS 3.1x (2.87) does, as do (I think)
later versions. If you know that your software will only run on a system that supports SFM,
then use it. Otherwise you'll need to 'fake' it with a sequence of STFs. Likewise, replace LFM with a sequence of LDFs.
FLT<condition><precision><rounding> <fp register>, <register>
FLT<condition><precision><rounding> <fp register>, #<value>
Convert an integer, held either in an ARM register or given as an immediate value, to floating point.
FIX<condition><rounding> <register>, <fp register>
Convert floating point to integer.
WFS<condition> <register>
Write floating point status register with the contents of the ARM register specified.
RFS<condition> <register>
Read floating point status register into the ARM register specified.
WFC<condition> <register>
Write floating point control register with the contents of the ARM register specified.
Supervisor mode only, and only on hardware that supports it.
RFC<condition> <register>
Read floating point control register into the ARM register specified.
Supervisor mode only, and only on hardware that supports it.
Floating point co-processor data operations:
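The formats of these instructions are:
<binary op><condition><precision><rounding> <fp dest>, <fp register 1>, <fp register 2 or #value>
<unary op><condition><precision><rounding> <fp dest>, <fp register or #value>
where #value is one of the constants 0, 1, 2, 3, 4, 5, 10 or 0.5.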
The binary operations are...
ADF - Add
DVF - Divide
FDV - Fast Divide - only defined to work with single precision
FML - Fast Multiply - only defined to work with single precision
FRD - Fast Reverse Divide - only defined to work with single precision
MUF - Multiply
POL - Polar Angle
POW - Power
RDF - Reverse Divide
RMF - Remainder
RPW - Reverse Power
RSF - Reverse Subtract
SUF - Subtract
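The "reverse" operations exist because only the last operand may be an immediate constant, so you need RSF to compute something like 1 - Fn. The sketch below shows the operand order, assuming the usual FPA convention that `<op> Fd, Fn, Fm` means Fd := f(Fn, Fm) and the reverse form swaps Fn and Fm. POL and RMF are left out because their exact definitions are not given here, and the `DYADIC` table is just an illustration, not an emulator.

```python
# Assumed semantics: <op> Fd, Fn, Fm  ->  Fd := DYADIC[op](Fn, Fm)
DYADIC = {
    "ADF": lambda fn, fm: fn + fm,
    "SUF": lambda fn, fm: fn - fm,
    "RSF": lambda fn, fm: fm - fn,    # reverse subtract
    "MUF": lambda fn, fm: fn * fm,
    "DVF": lambda fn, fm: fn / fm,
    "RDF": lambda fn, fm: fm / fn,    # reverse divide
    "POW": lambda fn, fm: fn ** fm,
    "RPW": lambda fn, fm: fm ** fn,   # reverse power
}

for op in ("SUF", "RSF", "DVF", "RDF", "POW", "RPW"):
    print(op, DYADIC[op](2.0, 8.0))
```

With Fn = 2 and Fm = 8, SUF gives -6 while RSF gives 6, and POW gives 256 while RPW gives 64.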
The unary operations are...
ABS - Absolute Value
ACS - Arc Cosine
ASN - Arc Sine
ATN - Arc Tangent
COS - Cosine
EXP - Exponential (e to the power of the operand)
LOG - Logarithm to base 10
LGN - Logarithm to base e
MVF - Move
MNF - Move Negated
NRM - Normalise
RND - Round to integral value
SIN - Sine
SQT - Square Root
TAN - Tangent
URD - Unnormalised Round
CMF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 1 with FP register 2.
The variant CMFE compares with exception.
CNF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 1 with the negative of FP register 2.
The variant CNFE compares with exception.
Compares are provided with and without the exception that could arise if the numbers are unordered (ie one or both of them is not-a-number). To comply with IEEE 754, the CMF instruction should be used to test for equality (ie when a BEQ or BNE is used afterwards) or to test for unorderedness (in the V flag). The CMFE instruction should be used for all other tests (BGT, BGE, BLT, BLE afterwards).
When the AC bit in the FPSR is clear, the ARM flags N, Z, C, V refer to the following after
compares:
N = Less than
Z = Equal
C = Greater than, or equal
V = Unordered
And when the AC bit is set, the flags refer to:
N = Less than
Z = Equal
C = Greater than, or equal, or unordered
V = Unordered
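The two flag tables can be modelled in a few lines. This is a sketch following the tables as written, with a hypothetical `compare_flags` function; "unordered" means at least one operand is a NaN. Python's comparison operators already return False whenever a NaN is involved, which is why a separate V flag is needed to tell "not equal" apart from "unordered".

```python
import math


def compare_flags(a, b, ac=False):
    """N, Z, C, V after comparing a with b, following the tables above."""
    unordered = math.isnan(a) or math.isnan(b)
    return {
        "N": a < b,                          # less than (False if unordered)
        "Z": a == b,                         # equal (False if unordered)
        "C": a >= b or (ac and unordered),   # AC set: C also set if unordered
        "V": unordered,                      # at least one operand is a NaN
    }


nan = float("nan")
print(compare_flags(1.0, 2.0))
print(compare_flags(nan, 1.0))
print(compare_flags(nan, 1.0, ac=True))
```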
In objasm source, floating point constants are defined with the DCF directive. You append 'S' for single precision (DCFS) and 'D' for double (DCFD).
REM >fpmul
REM
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...
DIM code% 20
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  .multiply
  FLTS F0, R0        ; F0 = first number (A% arrives in R0)
  FLTS F1, R1        ; F1 = second number (B% arrives in R1)
  FMLS F2, F0, F1    ; F2 = F0 * F1, fast single precision multiply
  FIXS R0, F2        ; R0 = result, returned by USR
  MOVS PC, R14
  ]
NEXT
INPUT "First number : "one%
INPUT "Second number : "two%
A% = one%
B% = two%
result% = USR(multiply)
PRINT "The result is "+STR$(result%)
END
There is no downloadable copy of this program, as the standard BASIC assembler won't accept the FP mnemonics. However, you can still include FP instructions if you 'build' the instruction words yourself.
This version will work in BASIC:
REM >fpmul
REM
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...
DIM code% 20
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  .multiply
  EQUD &EE000110     ; FLTS F0, R0
  EQUD &EE011110     ; FLTS F1, R1
  EQUD &EE902101     ; FMLS F2, F0, F1
  EQUD &EE100112     ; FIXS R0, F2
  MOVS PC, R14
  ]
NEXT
INPUT "First number : "one%
INPUT "Second number : "two%
A% = one%
B% = two%
result% = USR(multiply)
PRINT "The result is "+STR$(result%)
END
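"Building" the instructions by hand is error-prone, so here is a sketch of a tiny encoder that reproduces the four EQUD words in the listing above. The function names are hypothetical, and the field positions were inferred from those four words, so it only covers single precision with the default (nearest) rounding. In the opcode table, only FML = 9 is confirmed by the listing; the other numbers follow the assumed FPA ordering and should be checked against the ARM documentation before use.

```python
COND_AL = 0xE  # condition field for "always"

# Dyadic opcode numbers (bits 23-20). Only FML = 9 is confirmed by the
# listing; the rest are an assumption based on the FPA opcode ordering.
DYADIC_OPCODES = {"ADF": 0, "MUF": 1, "SUF": 2, "RSF": 3, "DVF": 4,
                  "RDF": 5, "POW": 6, "RPW": 7, "RMF": 8, "FML": 9,
                  "FDV": 10, "FRD": 11, "POL": 12}


def check(*regs):
    for r in regs:
        if not 0 <= r <= 7:
            raise ValueError("FP register numbers are 0-7")


def flt(fd, rd):
    """FLTS Fd, Rd (single precision, round to nearest)."""
    check(fd)
    return (COND_AL << 28) | 0x0E000110 | (fd << 16) | (rd << 12)


def fix(rd, fm):
    """FIX Rd, Fm (round to nearest)."""
    check(fm)
    return (COND_AL << 28) | 0x0E100110 | (rd << 12) | fm


def dyadic(op, fd, fn, fm):
    """<op>S Fd, Fn, Fm with register operands only."""
    check(fd, fn, fm)
    return ((COND_AL << 28) | 0x0E000100 | (DYADIC_OPCODES[op] << 20)
            | (fn << 16) | (fd << 12) | fm)


for word, text in ((flt(0, 0), "FLTS F0, R0"),
                   (flt(1, 1), "FLTS F1, R1"),
                   (dyadic("FML", 2, 0, 1), "FMLS F2, F0, F1"),
                   (fix(0, 2), "FIXS R0, F2")):
    print("EQUD &%08X ; %s" % (word, text))
```

The printed lines can be pasted straight into a BASIC listing. Encoding other operations, such as MUF, relies on the assumed opcode table.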
Remember to use the appropriate precision for what you are doing.
REM >precision
REM
REM Short example to show how data can be 'lost' due
REM to using incorrect precision.
ON ERROR PRINT REPORT$ + " at " + STR$(ERL/10) : END
DIM code% 64
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  EXT 1
  .single_precision
  FLTS F0, R0
  FIX R0, F0
  MOV PC, R14
  .double_precision
  FLTD F0, R0
  FIX R0, F0
  MOV PC, R14
  .doubleext_precision
  FLTE F0, R0
  FIX R0, F0
  MOV PC, R14
  ]
NEXT
A% = &1ffffff
PRINT "Original input is " + STR$~A%
PRINT "Single precision " + STR$~(USR(single_precision))
PRINT "Double precision " + STR$~(USR(double_precision))
PRINT "Double extended " + STR$~(USR(doubleext_precision))
PRINT
END
The result of this program is:
Original input is 1FFFFFF
Single precision 2000000
Double precision 1FFFFFF
Double extended 1FFFFFF
Single precision has only 24 bits of mantissa, so &1FFFFFF (25 significant bits) cannot be held exactly and is rounded to &2000000. You don't need to use double precision everywhere, though, as it can be slower. Simply keep this in mind if you are dealing with large numbers.
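The same loss can be reproduced on any machine with IEEE arithmetic by round-tripping the value through 32-bit and 64-bit storage with Python's `struct` module. This is a sketch that mirrors the FLTS/FLTD plus FIX program above; the helper names are hypothetical. Python has no standard 80-bit type, so the double extended case is not shown.

```python
import struct


def through_single(n):
    """Store n as an IEEE single and convert it back to an integer."""
    return int(struct.unpack("<f", struct.pack("<f", float(n)))[0])


def through_double(n):
    """Store n as an IEEE double and convert it back to an integer."""
    return int(struct.unpack("<d", struct.pack("<d", float(n)))[0])


n = 0x1FFFFFF
print("Original input is %X" % n)
print("Single precision %X" % through_single(n))
print("Double precision %X" % through_double(n))
```

&1FFFFFF lies exactly halfway between the two nearest singles, &1FFFFFE and &2000000, and the tie goes to the one with the even mantissa, &2000000.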
In order to test the actual speed differences, I wrote a test program:
DIM code% 64
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  MOV R0, #23        ; the value converted each time round
  MOV R1, #1<<16     ; loop count (65536)
  .timetest
  FLTD F0, R0
  FLTD F1, R0
  MUFD F2, F0, F1
  SUBS R1, R1, #1
  BNE timetest
  MOV PC, R14
  ]
NEXT
t% = TIME
CALL code%
PRINT "That took "+STR$(TIME - t%)+" centiseconds."
END
I tried various precisions, and also the fast multiply. It showed something interesting, so I also tried division and addition, all with the same data (input 23).
Here are my results for a million (roughly) convert-and-process operations. I've just timed my RiscPC and the times were MUCH faster - so I'm not entirely sure which system the timings below relate to - it did say "ARM710 processor, FPEmulator 4.14" but I doubt that...
Operation        Fast single  Single   Double   Double extended
Multiplication   1731cs       1755cs   1965cs   1712cs
Division         2169cs       2169cs   2618cs   2479cs
Addition         n/a          1684cs   1899cs   1646cs
This seems to show that double extended precision is the fastest on my machine for multiplication and addition (though not for division). Thus, it is incorrect to simply assume that more precision always takes longer. My personal suspicion is that the internal format is double extended, so working directly with it avoids the cost of converting the value to a different precision.
Why do I doubt the above experiments? Simple. Here are the results for an ARM710 RiscPC using FPEmulator 4.14 (1.07Mz):
Operation        Fast single  Single   Double   Double extended
Multiplication   112cs        112cs    110cs    111cs
Division         138cs        139cs    153cs    159cs
Addition         n/a          108cs    107cs    106cs
These results seem more consistent, so... :-)
The moral here? Don't be afraid to experiment...