From ARMwiki
Instruction LDR[B][T]
Function Load Register
Category Load and Store
ARM family All
Notes -


LDR[B][T] : Load Register

Take a deep breath - this is one of the most flexible, and complicated, of the ARM instructions.

LDR loads a 32 bit word (LDR) or an unsigned byte (LDRB) into a register. It supports a variety of addressing modes (offset, pre-indexed and post-indexed), plus optional address translation (LDRT/LDRBT), which makes a memory access from a privileged mode be treated as a User mode access. By using PC as the base register you can write position independent code, build jump tables (see the examples), access arrays in memory easily, and so on.

There are nine addressing modes in all:

  • Immediate offset
  • Register offset
  • Scaled register offset
  • Immediate pre-indexed
  • Register pre-indexed
  • Scaled register pre-indexed
  • Immediate post-indexed
  • Register post-indexed
  • Scaled register post-indexed

These are described in more detail below.

LDR[B][T] is available in all versions of the ARM architecture. Later versions offer, additionally, LDRH/LDRSH to load unsigned or signed 16 bit halfwords, plus LDRSB to load signed bytes. These instructions are available in architecture v4 or later (ARM8/StrongARM generation).


Syntax

There are many forms. Notice that the condition code comes before the B or T specifier, and that T is only available with post-indexed addressing.
Optional parts are shown in curly braces {}, because square brackets [] are part of the instruction syntax.

Immediate offset:

  LDR{cond}{B}     Rd, [Rn{, #{+|-}<12 bit offset>}]

The offset is optional, as LDR Rd, [Rn] is a valid instruction, which will be assembled with an offset of zero to mean, quite simply, "load Rd with the data at Rn".

Register offset:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]

Scaled register offset:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, <shift> #<shift immediate>]

Immediate pre-indexed:

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]!

Register pre-indexed:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]!

Scaled register pre-indexed:

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, <shift> #<shift immediate>]!

Immediate post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], #{+|-}<12 bit offset>

Register post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm

Scaled register post-indexed:

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, <shift> #<shift immediate>

Briefly, an ! suffix means pre-indexed, and having square brackets around only Rn specifies post-indexed.
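To make the three flavours concrete, here is a small C model of which address an LDR reads from and what is left in Rn afterwards. It is a sketch, not ARM reference code: IndexMode, AddrResult and ldr_address are names made up for illustration, and the offset is assumed to have been evaluated already (an immediate, Rm, or shifted Rm).

```c
#include <assert.h>
#include <stdint.h>

/* The three ways an LDR can use its base register (hypothetical names). */
typedef enum { OFFSET, PRE_INDEXED, POST_INDEXED } IndexMode;

typedef struct {
    uint32_t access_addr; /* address the data is read from */
    uint32_t new_rn;      /* value left in Rn afterwards   */
} AddrResult;

/* rn: base register; offset: immediate, Rm or shifted Rm, already evaluated;
   add: the U bit (1 = add the offset, 0 = subtract it). */
AddrResult ldr_address(IndexMode mode, uint32_t rn, uint32_t offset, int add)
{
    assert(mode == OFFSET || mode == PRE_INDEXED || mode == POST_INDEXED);
    /* uint32_t arithmetic wraps modulo 2^32, like the ARM's address adder */
    uint32_t calculated = add ? rn + offset : rn - offset;
    AddrResult r;
    if (mode == POST_INDEXED) {
        r.access_addr = rn;          /* [Rn], offset : use Rn as it is ...  */
        r.new_rn = calculated;       /* ... then always write back          */
    } else {
        r.access_addr = calculated;  /* [Rn, offset] and [Rn, offset]!     */
        r.new_rn = (mode == PRE_INDEXED) ? calculated : rn; /* ! = write back */
    }
    return r;
}
```

For example, ldr_address(POST_INDEXED, 0x1000, 4, 1) reads from 0x1000 and leaves 0x1004 in Rn, which is the LDR Rd, [Rn], #4 case.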


Function

Read words or bytes from memory, with optional write-back.

Addressing modes

Note that the first three forms are, strictly speaking, pre-indexed. However, the ARM Architecture Reference Manual uses the term "pre-indexed" only for the versions with writeback, so that is the parlance used here. You can, if you wish, think of the first three as "pre-indexed without writeback".

With offset addressing (the first three forms), the calculated address is the address used, and the base register is left unchanged.

Immediate offset

  LDR{cond}{B}     Rd, [Rn{, #{+|-}<12 bit offset>}]

The register Rd will contain the word or byte loaded from the address Rn plus or minus the specified offset.

This addressing mode is useful for accessing structures and data fields. For example, if R3 points to the structure base, you can load the third element with:

  LDR   R0, [R3, #8]   ; R0 = word at [R3 + 8]

It is worth taking a moment to understand a small source of potential confusion. ARM addresses memory in bytes (LDR and LDRB share the same addressing, and some ARMs can even load words that are not on a word boundary), so the offset specified, the #8, is in bytes, even though LDR loads words.
Therefore, when you are looking to load elements from an array of words, the first word is at offset #0. As a word is four bytes, the second is at offset #4, and the third, the one the example above is loading, is at #8.

A basic LDR with no offset is simply this form with an offset of zero. It looks like:

  LDR   R0, [R3]       ; R0 = word at R3
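The bytes-versus-words point can be checked with a little C. This sketch assumes a little-endian memory image held in a byte array and handles only word-aligned addresses; word_array_offset and load_word are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset an LDR needs to reach element `index` of a word array:
   memory is addressed in bytes and each word is four bytes long. */
uint32_t word_array_offset(uint32_t index)
{
    return index * 4u;
}

/* Read a little-endian word from a byte-addressed memory image, as
   LDR Rd, [Rn, #offset] would for a word-aligned address. */
uint32_t load_word(const uint8_t *mem, uint32_t rn, uint32_t offset)
{
    uint32_t a = rn + offset;
    assert(a % 4 == 0); /* this sketch only models aligned loads */
    return (uint32_t)mem[a] | ((uint32_t)mem[a + 1] << 8) |
           ((uint32_t)mem[a + 2] << 16) | ((uint32_t)mem[a + 3] << 24);
}
```

So the third element (index 2) is found at byte offset 8, matching LDR R0, [R3, #8].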

Register offset

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]

The register Rd will contain the word or byte loaded from the address Rn plus or minus the offset held in Rm.

This addressing mode is similar to immediate offset, except that the offset is held in a register, so it is easy to step through the elements of an array. Assuming R3 points to the base of a word array, and R4 holds the value 8, this will load its third element:

  LDR   R0, [R3, R4]   ; R0 = word at [R3+R4]

Scaled register offset

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSL #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ASR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ROR #<shift immediate>]
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, RRX]

This is useful when the index register Rm holds a counter (an element number) rather than a byte offset, as the shift turns the counter into an offset. Assuming R0 is a function number, we can shift it into an address offset and branch to handler code as follows:

  LDR   PC, [PC, R0, LSL #2]   ; PC = word at [PC + (R0 << 2)], PC reads as this instruction + 8

Refer to the jump table example to see this in full.
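The shift applied to Rm is the same immediate shift the barrel shifter performs elsewhere. As a hedged C sketch (ShiftType and scaled_offset are invented names), note the quirks: LSR and ASR accept #32, and RRX shifts the carry flag into bit 31.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SH_LSL, SH_LSR, SH_ASR, SH_ROR, SH_RRX } ShiftType;

/* Offset produced by "Rm, <shift> #amount". amount is as written in the
   source (LSL 0-31, LSR/ASR 1-32, ROR 1-31); carry is the C flag (RRX only). */
uint32_t scaled_offset(uint32_t rm, ShiftType type, unsigned amount, int carry)
{
    switch (type) {
    case SH_LSL:
        assert(amount <= 31);
        return rm << amount;
    case SH_LSR:
        assert(amount >= 1 && amount <= 32);
        return amount == 32 ? 0 : rm >> amount;
    case SH_ASR:
        assert(amount >= 1 && amount <= 32);
        if (amount == 32)
            return (rm & 0x80000000u) ? 0xFFFFFFFFu : 0;
        /* sign bits filled in by hand: >> on a negative signed int is
           implementation-defined in C */
        return (rm >> amount) |
               ((rm & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0);
    case SH_ROR:
        assert(amount >= 1 && amount <= 31);
        return (rm >> amount) | (rm << (32 - amount));
    default: /* SH_RRX: rotate right by one bit through the carry flag */
        return ((uint32_t)(carry != 0) << 31) | (rm >> 1);
    }
}
```

With R0 = 3, scaled_offset(3, SH_LSL, 2, 0) gives 12: function number 3 becomes the byte offset of the fourth table entry.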

With pre-indexed addressing, the calculated address is the address used, and it is also written back to the base register.

Immediate pre-indexed

  LDR{cond}{B}     Rd, [Rn, #{+|-}<12 bit offset>]!

This functions as for immediate offset, except that the calculated address is written back to the base register Rn. This permits pointer access to arrays with automatic update of the pointer. We can read each byte from a string, one by one, using something such as the following (R1 must start one byte before the string, since it is incremented before each load):

  LDRB  R0, [R1, #1]!   ; R0 = byte from [R1+1], then R1 updated
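A C model of that string walk shows why R1 has to start one byte early. Memory is modelled as a byte array indexed by address, and string_length_preindexed is a hypothetical name; treat it as a sketch rather than library code.

```c
#include <stdint.h>

/* mem is indexed by address; start is the address of the first character.
   Each iteration mirrors LDRB R0, [R1, #1]!: add 1 to R1, then load. */
uint32_t string_length_preindexed(const uint8_t *mem, uint32_t start)
{
    uint32_t r1 = start - 1;    /* one byte before the string (wraps safely) */
    uint32_t r0;
    do {
        r1 += 1;                /* pre-index: update the address first */
        r0 = mem[r1];           /* then load the byte, zero-extended   */
    } while (r0 != 0);
    return r1 - start;          /* R1 now points at the terminating 0  */
}
```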

This addressing mode, when used with STR, is useful for pushing single registers on to the RISC OS/ARMLinux full descending (FD) stack:

  STR   R14, [R13, #-4]!

Register pre-indexed

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm]!

This functions as for immediate pre-indexed, except that the offset comes from a register. This is useful for walking an array, perhaps with variable sized elements where the offsets can be written to the offset register.

  LDR   R0, [R1, R2]!   ; R0 = word from [R1+R2], then R1 updated

Scaled register pre-indexed

  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSL #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, LSR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ASR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, ROR #<shift immediate>]!
  LDR{cond}{B}     Rd, [Rn, {+|-}Rm, RRX]!

As for scaled register offset, with the difference that the calculated address is written back to Rn. The actual address is Rn plus the shifted value of Rm.

  LDR   R0, [R1, R2, LSL #2]!   ; R0 = [R1 + (R2 << 2)], then R1 updated

With post-indexed addressing, the address used is the address held in the base register (Rn). Once the data has been loaded, the new address is calculated from Rn and the offset, and written back to the base register.
Post-indexed addressing always writes back. From a privileged mode, you can use the T suffix to make the memory access be treated as a User mode access, so that User mode memory permissions apply.

Immediate post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], #{+|-}<12 bit offset>

This is used for access to arrays with automatic update of the base register. The word or byte is read from the address in Rn, and then Rn is updated to Rn plus (or minus) the offset.
This addressing mode can be used to pop single words off the RISC OS/ARMLinux (FD) stack:

  LDR   R14, [R13], #4
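The push (STR R14, [R13, #-4]!) and pop (LDR R14, [R13], #4) pair can be modelled in C to show that one writes back before the access and the other after. This sketch assumes a tiny 64-word memory and a Machine structure invented for the purpose; addresses are byte addresses, so the word at address a lives in mem[a / 4].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: a small word memory and R13 (the stack pointer). */
typedef struct {
    uint32_t mem[64];
    uint32_t r13;
} Machine;

/* STR Rd, [R13, #-4]! (pre-indexed): move SP down first, then store */
void push_word(Machine *m, uint32_t value)
{
    m->r13 -= 4;
    assert(m->r13 % 4 == 0 && m->r13 / 4 < 64);
    m->mem[m->r13 / 4] = value;
}

/* LDR Rd, [R13], #4 (post-indexed): load from SP first, then move SP up */
uint32_t pop_word(Machine *m)
{
    assert(m->r13 % 4 == 0 && m->r13 / 4 < 64);
    uint32_t value = m->mem[m->r13 / 4];
    m->r13 += 4;
    return value;
}
```

Starting with R13 just past the top of the memory (address 256), a push stores at 252, and the matching pop reads 252 and restores R13 to 256.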

Register post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm

As for immediate post-indexed, with the difference that the offset is taken from a register. The base register Rn is updated following the read (from the address held in Rn) to Rn plus (or minus) Rm.

Scaled register post-indexed

  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, LSL #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, LSR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, ASR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, ROR #<shift immediate>
  LDR{cond}{B}{T}  Rd, [Rn], {+|-}Rm, RRX

As for register post-indexed, with the addition that the contents of Rm are shifted.

Equivalence in C

This shows an equivalence between an ARM instruction and behave-alike C code. Assume that r0 and r2 are 32-bit unsigned integers, and that r1 is a pointer to an array of 32-bit unsigned integers; where r2 is used as a byte offset, assume it is a multiple of four.
The upper line in each example is ARM code, the lower line is equivalent C.

Immediate offset:

  LDR    R0, [R1, #4]
  r0 = r1[1];
Remember, again, the offset is in bytes, so #4 points to the second word in our array; C indexes whole elements, so the array index [1] provides the same data.

Register offset:

  LDR    R0, [R1, R2]
  r0 = r1[r2 / 4];

Scaled register offset:

  LDR    R0, [R1, R2, LSL #4]
  r0 = r1[r2 << 2];

Immediate pre-indexed:

  LDR    R0, [R1, #4]!
  r1 += 1; r0 = *r1;

Register pre-indexed:

  LDR    R0, [R1, R2]!
  r1 += r2 / 4; r0 = *r1;

Scaled register pre-indexed:

  LDR    R0, [R1, R2, LSL #2]!
  r1 += r2; r0 = *r1;

Immediate post-indexed:

  LDR    R0, [R1], #4
  r0 = *r1; r1 += 1;

Register post-indexed:

  LDR    R0, [R1], R2
  r0 = *r1; r1 += r2 / 4;

Scaled register post-indexed:

  LDR    R0, [R1], R2, LSL #2
  r0 = *r1; r1 += r2;
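Because ARM and C count offsets differently, it can help to write the register offset form out literally: step R2 bytes from the base, then read a word. This C sketch (ldr_register_offset is a made-up name) does exactly that through a byte pointer and memcpy, so it can be compared against plain array indexing; it assumes word-aligned byte offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* LDR R0, [R1, R2] with R1 = address of a word array, R2 = byte offset */
uint32_t ldr_register_offset(const uint32_t *r1, uint32_t r2)
{
    assert(r2 % 4 == 0); /* aligned words only in this sketch */
    uint32_t r0;
    /* step r2 *bytes* from the base, then copy out one word */
    memcpy(&r0, (const unsigned char *)r1 + r2, sizeof r0);
    return r0;
}
```

A byte offset of 8 gives the same word as r1[8 / 4], and an offset scaled with LSL #2 gives the same word as indexing with the unscaled counter.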


Examples

Simple I/O to memory transfer. In this example, R8 points to the I/O device base address, R9 points to the memory buffer where we will be writing our data, R10 holds the end address of the buffer (the loop stops once R9 reaches it), and R11 is used as workspace. These registers were chosen because they are banked in FIQ mode. Finally, the constant DataWord is the offset of the actual I/O register that we will be reading. The hardware clears the interrupt when the data is read.

  read
  LDR     R11, [R8, #DataWord]  ; read word (fixed address)
  STR     R11, [R9], #4         ; write to memory (address updates)
  CMP     R9, R10               ; done?
  BLO     read                  ; no, go back for more (unsigned compare of addresses)
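For comparison, roughly the same loop in C. This is a sketch only: DATA_WORD stands in for DataWord with an assumed value, copy_from_device is a hypothetical name, and the device is read through a volatile pointer so the compiler re-reads the fixed address every time round the loop, as the LDR does.

```c
#include <stdint.h>

#define DATA_WORD 0x10u /* assumed offset of the device data register */

void copy_from_device(const volatile uint8_t *device_base, /* R8  */
                      uint32_t *buffer,                     /* R9  */
                      const uint32_t *buffer_end)           /* R10 */
{
    /* The data register sits at one fixed address. */
    const volatile uint32_t *data =
        (const volatile uint32_t *)(device_base + DATA_WORD);
    do {
        uint32_t word = *data;      /* LDR R11, [R8, #DataWord] */
        *buffer++ = word;           /* STR R11, [R9], #4        */
    } while (buffer < buffer_end);  /* CMP R9, R10 / branch back */
}
```

Like the assembler, this always transfers at least one word before testing for the end of the buffer.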

Jump table. A jump table (also called a multi-way branch) is a useful concept for dispatching according to an input selector. By way of an example, imagine an emulation of the 6502 processor. The 6502 presents up to 256 possible instructions (not all are used) followed by zero, one, or two data bytes. We could sort out which instruction we are executing with code such as the following:

  ; 6502 opcode in R0
  CMP   R0, #0
  BEQ   opcode_brk
  CMP   R0, #1
  BEQ   opcode_ora
  CMP   R0, #2

However anybody that writes code like this is either a total newbie, or doesn't have a future in programming ... to execute the instruction INC <absolute>, X which is opcode &FE, you would need to work through some two hundred comparisons and skip over the corresponding number of branches (the exact number depends on how you handle invalid opcodes). It is possible that you might reach your INC handler branch after five hundred and six instructions! As you can imagine, such code would be tedious in every possible sense of the word.

A much better solution, which reduces the dispatch of any and every 6502 opcode to three instructions, is the jump table. Here it is:

  CMP    R0, #((dispatch_endoftable - dispatch_table) / 4)
  ADDCC  PC, PC, R0, LSL #2
  B      opcode_inv
  dispatch_table
  ; row 0
  B      opcode_brk
  B      opcode_ora
  B      opcode_err
  ...all of the rest of the instructions...
  dispatch_endoftable

The expression in the CMP calculates the number of entries in the table: (end of table - start of table) divided by four, as each branch instruction is four bytes long. For 1024 bytes holding 256 branch instructions it evaluates to 256, one entry for each possible 6502 opcode. The CC condition is an unsigned "lower than" test, so any R0 outside 0-255 (including what would be a negative number) fails it.
If R0 is in range, we calculate a relative address (PC plus the shifted R0 offset) and write it into PC, that address being the branch for our desired opcode. If the input value is out of range, the ADD is skipped and we fall through to the branch to the invalid opcode handler. This works because PC, when read, is ahead of the current instruction due to the ARM pipeline (it reads as the address of the instruction plus 8), which leaves exactly one instruction's worth of space for our fall-through branch. Nifty, huh?
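The address arithmetic can be checked with a small C model of the ADDCC. It is a sketch under stated assumptions: dispatch_target is a hypothetical name, add_addr is the address of the ADDCC instruction itself, and each table entry is one four-byte branch.

```c
#include <stdint.h>

/* Returns the address the processor ends up at after the ADDCC. */
uint32_t dispatch_target(uint32_t add_addr, uint32_t r0, uint32_t entries)
{
    uint32_t pc_read = add_addr + 8;  /* PC reads as instruction + 8      */
    if (r0 < entries)                 /* CC after CMP: unsigned "lower"   */
        return pc_read + (r0 << 2);   /* one 4-byte branch per entry      */
    return add_addr + 4;              /* ADD skipped: B opcode_inv runs   */
}
```

With the ADDCC at 0x8000, opcode 0 lands on 0x8008 (the first table branch), while 256 or 0xFFFFFFFF (a "negative" R0) falls through to 0x8004.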

Better jump table. You might be wondering: where was the LDR? The above example demonstrates how a jump table works. An alternative jump table can be created by loading the handler's address from a table directly into PC, instead of jumping to a branch. For example, if we assume that R0 holds the operation index, and MaxOp is a constant giving the number of supported operations, we can perform our branch as follows:

  CMP    R0, #MaxOp
  LDRLO  PC, [PC, R0, LSL #2]   ; LO is unsigned, so a negative R0 is rejected too
  B      BadIndexValue
  DCD    Op_0_Handler
  DCD    Op_1_Handler
  DCD    Op_2_Handler

This version, which uses a direct load instead of a branch to a branch, is even better, handling dispatch in only two instructions (compare, then load). It is useful for function wrappers, SWI handlers, and the like, where an input value selects the operation desired. For instance, instead of a dozen functions, there may (at API level) be a single "Misc Filesystem Function" where operation #0 is Read Size, #1 is Read Date, #2 is Read Permissions, and so on.
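In C the same idea is an array of function pointers indexed by the operation number, guarded by an unsigned bounds check. The handler names and return values below are placeholders invented for the example, not a real filesystem API.

```c
#include <stdint.h>

typedef int (*OpHandler)(int arg);

/* Placeholder handlers standing in for Read Size, Read Date, ... */
static int read_size(int arg)        { return arg + 100; }
static int read_date(int arg)        { return arg + 200; }
static int read_permissions(int arg) { return arg + 300; }

/* The equivalent of the DCD table: one handler address per operation. */
static const OpHandler op_table[] = { read_size, read_date, read_permissions };
#define MAX_OP (sizeof op_table / sizeof op_table[0])

/* Returns -1 for an out-of-range operation, like branching to BadIndexValue. */
int misc_fs_function(uint32_t op, int arg)
{
    if (op >= MAX_OP)        /* unsigned compare, as with LDRLO */
        return -1;
    return op_table[op](arg);
}
```

Deriving MAX_OP from the table size plays the same role as the dispatch_endoftable - dispatch_table calculation: the bound cannot drift out of step with the table.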


Technical notes

  • Immediate, Register, or Scaled register offset: specifying PC as Rn uses the address of the current instruction plus 8.
  • Pre-indexed (any) / Post-indexed (any): Specifying PC as either Rn or Rm is unpredictable.
  • Register or Scaled register (pre- or post- indexed): Using the same register as Rn and Rm is unpredictable.
  • Pre-indexed (any) / Post-indexed (any): Using the same register as Rd and Rn is unpredictable.
  • If translation is used (the T suffix, post-indexed only), the memory access is treated as a User mode access, regardless of the current processor mode.
  • If a word read is not word aligned, the data read is rotated so that the addressed byte is the least significant byte of the register.
  • For byte loads, the byte is zero-extended to 32 bits, so bits 8-31 of Rd are cleared.
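The last two notes can be modelled in C. This sketch assumes a little-endian memory image in a byte array and the rotating behaviour for non-aligned word loads described above; ldr_rotated and ldrb are illustrative names only.

```c
#include <stdint.h>

/* Non-aligned LDR: read the aligned word containing addr, then rotate it
   right by 8 * (addr & 3) bits so the addressed byte lands in bits 0-7. */
uint32_t ldr_rotated(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;
    uint32_t word = (uint32_t)mem[base] | ((uint32_t)mem[base + 1] << 8) |
                    ((uint32_t)mem[base + 2] << 16) |
                    ((uint32_t)mem[base + 3] << 24);
    unsigned rot = 8u * (addr & 3u);
    return rot == 0 ? word : (word >> rot) | (word << (32 - rot));
}

/* LDRB: zero-extended, so even 0xF4 stays 0x000000F4 (no sign extension). */
uint32_t ldrb(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr];
}
```

With memory bytes 11 22 33 F4, a word load from address 1 gives 0x11F43322: the addressed byte 0x22 is in the bottom, and the other bytes have wrapped round.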


The instruction bit patterns are as follows.

  • I - Register (set) or Immediate (unset)
  • P - Pre-indexed (set) or Post-indexed (unset)
  • U - Offset added to base (set) or subtracted from base (unset)
  • B - Unsigned byte (set) or word (unset) access
  • W - Depends on the P bit:
    • [P = 1] - the calculated address will be written back if W set.
    • [P = 0] - the access is treated as a User mode access if W set (has no effect if processor in User mode).
  • L - operation is a Load (set) or a Store (unset)

Immediate offset/index:

  31-28      27-26  25  24  23  22  21  20  19-16      15-12  11-0
  condition  0 1    I   P   U   B   W   1   Rn (base)  Rd     12 bit offset

Register offset/index:

  31-28      27-26  25  24  23  22  21  20  19-16      15-12  11-4             3-0
  condition  0 1    I   P   U   B   W   1   Rn (base)  Rd     0 0 0 0 0 0 0 0  Rm

Scaled Register offset/index:

  31-28      27-26  25  24  23  22  21  20  19-16      15-12  11-7             6-5    4  3-0
  condition  0 1    I   P   U   B   W   1   Rn (base)  Rd     shift immediate  shift  0  Rm

To help clarify, the bits 27-20 are as follows for each of the available options:

  Addressing mode                 27-26  25  24  23  22  21  20
  Immediate offset                0 1    0   1   U   B   0   L
  (Scaled) Register offset        0 1    1   1   U   B   0   L
  Immediate Pre-indexed           0 1    0   1   U   B   1   L
  (Scaled) Register Pre-indexed   0 1    1   1   U   B   1   L
  Immediate Post-indexed          0 1    0   0   U   B   0   L
  (Scaled) Register Post-indexed  0 1    1   0   U   B   0   L

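You can differentiate between Register and Scaled Register by looking at bits 11-4, which are all zero for Register, or hold the shift amount and type for Scaled Register (so a scaled register with LSL #0 encodes exactly like a plain register offset). For the post-indexed forms, bit 21 (W) is set for the T variants, as described for the W bit above.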
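As a check on the tables above, here is a C sketch that extracts each field from an instruction word. decode_ldr and LdrFields are names made up for this example; the test words were worked out by hand from the layout above for LDR R0, [R1, #4], LDR PC, [PC, R0, LSL #2] and LDR R14, [R13], #4, all with the AL condition.

```c
#include <stdint.h>

typedef struct {
    unsigned cond, i, p, u, b, w, l, rn, rd;
    unsigned offset12;             /* meaningful when I = 0 */
    unsigned shift_imm, shift, rm; /* meaningful when I = 1 */
} LdrFields;

LdrFields decode_ldr(uint32_t insn)
{
    LdrFields f;
    f.cond      = (insn >> 28) & 0xF;
    f.i         = (insn >> 25) & 1;  /* register (1) or immediate (0) */
    f.p         = (insn >> 24) & 1;  /* pre (1) or post (0) indexed   */
    f.u         = (insn >> 23) & 1;  /* add (1) or subtract (0)       */
    f.b         = (insn >> 22) & 1;  /* byte (1) or word (0)          */
    f.w         = (insn >> 21) & 1;  /* write back, or T if P = 0     */
    f.l         = (insn >> 20) & 1;  /* load (1) or store (0)         */
    f.rn        = (insn >> 16) & 0xF;
    f.rd        = (insn >> 12) & 0xF;
    f.offset12  = insn & 0xFFF;
    f.shift_imm = (insn >> 7) & 0x1F;
    f.shift     = (insn >> 5) & 0x3; /* 0 LSL, 1 LSR, 2 ASR, 3 ROR/RRX */
    f.rm        = insn & 0xF;
    return f;
}
```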
