The Pipeline
A conventional processor executes instructions one at a time, just as you expect it to when you
write your code. Each execution can be broken down into three stages; anybody who has learned
this stuff at college will have 'fetch, decode, execute' burned into their memory.
In English...
- Fetch
Retrieve the instruction from memory.
Don't get all techie - whether the instruction comes from system memory or from the processor's
cache is irrelevant; the instruction is not loaded 'into' the processor until it is
specifically requested. The cache simply serves to speed things up: by loading chunks of
system memory into the cache, the processor can satisfy many more of its instruction
fetches from the cache rather than from slow system memory. This is necessary because
processors are very fast (StrongARMs at 200MHz+; Pentiums up into the GHz range) and system
memory is not (33, 66, or 133MHz). To see the effect the cache has on your processor, try
*Cache Off.
- Decode
Figure out what the instruction is, and what is supposed to be done.
- Execute
Perform the requested operation.
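To put that into concrete terms, here is a single ARM instruction annotated with what (roughly)
happens at each stage. This is only a sketch; the exact details vary between processors.

    ADD R0, R1, R2     ; R0 = R1 + R2
                       ; Fetch   - the 32-bit word encoding this ADD is read from
                       ;           memory (or the cache) into the processor
                       ; Decode  - the processor works out that it is an ADD, that
                       ;           the operands are R1 and R2, and that the result
                       ;           is to go into R0
                       ; Execute - the ALU adds R1 to R2 and the result is written
                       ;           into R0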
Each of these operations is performed in step with the electronic 'heartbeat' of the processor,
the clock. Example clock rates for several microprocessors used in Acorn products:
BBC microcomputer     | 6502      | 2MHz
Acorn A310-A3000      | ARM 2     | 8MHz
Acorn A5000           | ARM 3     | 25MHz
Acorn A5000/I         | ARM 3     | 30MHz
RiscPC600             | ARM610    | 33MHz
RiscPC700             | ARM710    | 40MHz
Early PC co-processor | 486SXL-40 | 33MHz (not 40!)
RiscPC (StrongARM)    | SA110     | 202MHz - 278MHz+
As seen in the PC world, processors are now running at GHz speeds (1,000,000,000 ticks per
second), which necessitates a lot of speed tweaks (huge amounts of cache, a heavily optimised
pipeline) because there is no way the rest of the system can keep up. Indeed, the rest of the
system is likely to be operating at a quarter of the speed of the processor. The RiscPC was
designed to work, I believe, at 33MHz, which is why people thought the StrongARM wouldn't give
much of a speed boost. However, the small size of ARM programs, coupled with a rather large
cache, made the StrongARM a viable proposition in the RiscPC. It bottlenecked horribly on memory
access, but other factors meant that this wasn't so visible to the end user, so the result was a
system much faster than the ARM710. More recently there is the Kinetic StrongARM processor card,
which attempts to alleviate the bottleneck by installing a big wodge of memory directly on the
processor card and using that. It even goes so far as to copy the entirety of RISC OS into that
memory so you aren't kept waiting for the ROMs (which are slower even than RAM).
There is an obvious improvement. Executed strictly one at a time, each instruction takes three
clock ticks, and while one stage is busy the parts of the processor handling the other two stages
sit idle. Since these three stages (fetch, decode, execute) are fairly independent, would it not
be possible to:
fetch instruction #3
decode instruction #2
execute instruction #1
...then, on the next clock tick...
fetch instruction #4
decode instruction #3
execute instruction #2
...tick...
fetch instruction #5
decode instruction #4
execute instruction #3
In practice, the answer is yes, and this is exactly what a pipeline is. Simply by doing this
you have made your processor roughly three times faster: in the ideal case an instruction now
completes on every clock tick instead of on every third one.
Now, it isn't a perfect solution.
- When it comes to a branch, the pipeline is dumped, as the instructions fetched and decoded
  after the branch are no longer required and the pipeline must refill from the branch target.
  This is why it is preferable to use conditional execution rather than branching around a
  couple of instructions (see the first sketch after this list).
- Next, you have to keep in mind that the program counter is ahead of the instruction that is
  currently being executed (see the second sketch below). So if you see an error reported at
  'x', then the real error is quite possibly at 'x-8' (or 'x-12' for StrongARM).
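As a sketch of the first point, here are two ways of adding one to R1 only when R0 is non-zero,
written in a generic ARM assembler style; the label 'skip' is purely illustrative. The branch
version flushes the pipeline every time the branch is taken, while the conditional version lets
the pipeline keep flowing (the ADDNE simply does nothing when the condition fails):

    ; branch version
        CMP   R0, #0
        BEQ   skip          ; taken branch - the pipeline is dumped and refilled
        ADD   R1, R1, #1
    skip

    ; conditional version
        CMP   R0, #0
        ADDNE R1, R1, #1    ; executed only if R0 was not zero; no branch, no flush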
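And as a sketch of the second point, you can see the pipeline at work simply by reading the
program counter. On an ARM, reading R15 during execution gives the address of the current
instruction plus 8, because that is where the fetch stage has got to:

    here
        MOV   R0, PC        ; R0 now holds the address of 'here' plus 8 - the
                            ; address of the instruction currently being fetched,
                            ; two instructions further on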
Copyright © 2001 Richard Murray