The bad news first...
Due to
the heavy dependencies on conio and the use
of delay(), this code will only run under
16-bit DOS. I looked at both lcc and
OpenWatcom: one has a conio that crashed when used
in a 32-bit character-mode application, and the other didn't
have it.
So,
for now, unless somebody can point me to conio
that works comprehensively in a 32-bit .exe in text mode
(graphics not required), AmélieEm will remain
old-style-DOS.
This,
by the way, means that a RISC OS conversion will not be along
any time soon.
Introduction
As I do not,
currently, have an EPROM eraser I figured it might be better
to write and test Amélie's BIOS and application code
in the software domain. Besides, writing an emulator sounded like fun.
The picture above is what
AmélieEm says when it is initialising, in case you
ever wondered. This stage will take a split second if you are
loading AmélieEm from a hard disc. The next thing you
will see is Tracey:
Get to know
Tracey, she is very versatile. No, she isn't named
after my girlfriend - it is from "tracing mode". I'm a geek,
remember?
The screen is split into three sections:
- The top displays a disassembly of the current instructions on the left. Cursor down will go to the next instruction, but cursor up will back up one byte. Sorry, but the 6502 isn't a word-aligned processor. In the middle is the complete status of the processor registers, plus the addressing mode of the currently selected instruction. Ignore "mem" and "tmp"; these values are used internally in the emulation (refer to the source if you want to know what for). On the right are the final 16 bytes of the software stack (this is fixed) and the value of the stack pointer.
- The middle of the screen is reserved for I/O emulation status. At this point, perhaps the only useful part is the slightly inaccurate cycle counter (it does not add extra cycles for page boundaries being crossed).
- The bottom of the screen serves as a 64-byte dump of memory and the command line. Page Up and Page Down can be used to scroll through the memory. The dump will 'wrap around', and any unused or unallocated addresses will be seen as '00'.
Tracey is versatile. You
can alter most aspects of the system here - including poking
around in memory (including EPROM!). Pressing RETURN will
allow you to single-step instructions. Leaving Tracey
will let the emulation run at full speed. You can set up
"breakpoints" which will cause Tracey to reappear
just before a specific instruction is executed.
Let's set up a
breakpoint now. You would press B
(for Breakpoint). The command line changes
to:
We want to set
a breakpoint, so press S, and then type in the desired
address F817. The
command line will look like:
Press Return to set it. You can tell breakpoints by the
red highlight and the 'B' in the leftmost part of the
disassembly.
You can have up to 16 breakpoints active at any one
time.
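To make the 16-breakpoint limit concrete, here is a minimal sketch of a fixed-size breakpoint table in C. Every name in it (bp_set, bp_clear, bp_hit, MAX_BREAKPOINTS) is hypothetical and not taken from breakpt.c. The real breakpoint system works differently, as it relies on one of the undefined &xB opcodes, so treat this only as an illustration of the bookkeeping.

```c
#include <stddef.h>

#define MAX_BREAKPOINTS 16   /* the limit quoted in the text */

/* Hypothetical breakpoint table: one slot per breakpoint address. */
static unsigned int bp_addr[MAX_BREAKPOINTS];
static int          bp_used[MAX_BREAKPOINTS];

/* Return 1 if a breakpoint is set at addr, else 0. */
int bp_hit(unsigned int addr)
{
    size_t i;
    for (i = 0; i < MAX_BREAKPOINTS; i++)
        if (bp_used[i] && bp_addr[i] == (addr & 0xFFFFu))
            return 1;
    return 0;
}

/* Set a breakpoint; returns 1 on success, 0 if all 16 slots are taken. */
int bp_set(unsigned int addr)
{
    size_t i;
    if (bp_hit(addr))
        return 1;                        /* already set, nothing to do */
    for (i = 0; i < MAX_BREAKPOINTS; i++) {
        if (!bp_used[i]) {
            bp_addr[i] = addr & 0xFFFFu;
            bp_used[i] = 1;
            return 1;
        }
    }
    return 0;                            /* table is full */
}

/* Clear a breakpoint; returns 1 if one was removed, 0 if none was set. */
int bp_clear(unsigned int addr)
{
    size_t i;
    for (i = 0; i < MAX_BREAKPOINTS; i++) {
        if (bp_used[i] && bp_addr[i] == (addr & 0xFFFFu)) {
            bp_used[i] = 0;
            return 1;
        }
    }
    return 0;
}
```

With this sketch, bp_set(0xF817) would mirror the walkthrough above, and a seventeenth bp_set() call on a full table returns 0.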
Unlike several emulations I've seen, Tracey
bends over backwards to prompt you when necessary. You do not
have to remember arcane incantations just to change the Zero
flag...
Tracey prompts you all the
way...
Emulation principles
The main emulation
loop is within wrapper.c. The loop is as
follows:
{
    are we stepping? if so, call Tracey (Tracey doesn't return until complete)
    read byte from memory, this is the instruction opcode
    look up addressing mode and cycle count for this instruction
    dispatch the instruction (this means, 'execute' it)
    increment cycle count
    patch up after breakpoint call, if breakpoints active
    post-call Tracey (this method is not used at this time)
    poll the hardware devices
    if 10240 cycles have elapsed, check for a keypress (this isn't accurate as cycles are not incremented one by one; and anyway the kbhit() call is painfully SLOW)
}
loop
That is it in a
nutshell.
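The loop above can be sketched in C roughly as follows. This is not the code from wrapper.c: the struct, the function names and the stubbed helpers are all made up for illustration, every opcode is treated as a two-cycle, one-byte NOP, and the breakpoint patch-up and post-call steps are left out. The one detail worth showing is the keypress check: because the cycle count jumps by several cycles per instruction, the test has to be "at least 10240 cycles since the last check" rather than an exact match.

```c
#include <stddef.h>

#define KEY_POLL_CYCLES 10240UL   /* keypress check interval quoted in the text */

/* Hypothetical emulator state; names are illustrative only. */
struct emu {
    const unsigned char *mem;     /* 64K address space, read-only in this sketch */
    unsigned int  pc;             /* 6502 program counter */
    unsigned long cycles;         /* total cycles executed so far */
    unsigned long next_poll;      /* cycle count at which to check the keyboard */
    unsigned long key_polls;      /* how many keyboard checks have happened */
    int           stepping;       /* non-zero when single-stepping in Tracey */
};

/* Stubs standing in for the real modules (assumptions, not the real API). */
static void tracey(struct emu *e)       { (void)e; }
static void poll_devices(struct emu *e) { (void)e; }
static unsigned int cycles_for(unsigned char opcode) { (void)opcode; return 2; }
static void dispatch(struct emu *e, unsigned char opcode)
{
    (void)opcode;
    e->pc = (e->pc + 1) & 0xFFFFu;      /* treat everything as a 1-byte NOP */
}

/* Run a fixed number of instructions through the main loop. */
void run(struct emu *e, unsigned long instructions)
{
    while (instructions-- > 0) {
        unsigned char opcode;
        unsigned int  cost;

        if (e->stepping)
            tracey(e);                    /* returns when the user is done */

        opcode = e->mem[e->pc & 0xFFFFu]; /* fetch the opcode */
        cost   = cycles_for(opcode);      /* table lookup */
        dispatch(e, opcode);              /* execute it */
        e->cycles += cost;

        poll_devices(e);

        if (e->cycles >= e->next_poll) {
            e->key_polls++;               /* real code would call kbhit() here */
            e->next_poll += KEY_POLL_CYCLES;
        }
    }
}
```

A caller would point mem at a 64K buffer, set next_poll to KEY_POLL_CYCLES and call run(); with the two-cycle stub, the keyboard is then checked once every 5120 instructions.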
Address decoding
The address
decoding attempts to mimic the sort of logic that would be
used on Amélie. It would be simpler (and faster?) to
simply block it as "if between &A000 and &A0FF then it
is the VIA", but we want to be sure that our memory logic is
viable.
    A8  = ( (addr >> 8)  & 1 );
    A9  = ( (addr >> 9)  & 1 );
    A13 = ( (addr >> 13) & 1 );
    A14 = ( (addr >> 14) & 1 );
    A15 = ( (addr >> 15) & 1 );

    /* RAM or ROM? */
    wrk = A14 + A15;
    if (wrk == 0) return RAMSEL; /* !14 & !15 = RAM at &0000 */
    if (wrk == 2) return ROMSEL; /*  14 &  15 = ROM at &F000 */

    /* TEST TWO - I/O STUFF [A15 and A13 are SET, A8 and A9 determine device] */
    if ( !A13 || !A15 ) return 0;

    wrk = A8 + (A9 << 1);
    switch (wrk)
    {
      case 0 : /* !8 & !9 = VIA at &A000 */
               return VIASEL;
      case 1 : /*  8 & !9 = SER at &A100 */
               return SERSEL;
      case 2 : /* !8 &  9 = <unused> at &A200 */
               break; /* invalid device, it is an error... */
      case 3 : /*  8 &  9 = LAT at &A300 */
               return LATSEL;
    }
What you are
actually looking at here is an optimised software version of
the NAND and AND and 3-to-8 demux. Instead of asking "is (NOT
A14 AND NOT A15)" and then "is (A14 AND A15)", we can add
them, as both have value '1' if active. Therefore RAM (neither
A14 nor A15) will be zero and ROM (A14 and A15) will be
two.
Similar logic is
applied to the I/O selection, though note that this code only
implements a 2-to-4 decode.
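To try the decode logic on a few addresses, the fragment can be wrapped into a self-contained function like the sketch below. The device names come from the fragment itself, but their numeric values, the NOSEL name, the function name decode_address and the "return NOSEL" after the switch are assumptions, since the excerpt does not show what happens once the unused slot breaks out of the switch.

```c
/* Device IDs: names from the excerpt, numeric values assumed. */
enum { NOSEL = 0, RAMSEL, ROMSEL, VIASEL, SERSEL, LATSEL };

/* Decode a 16-bit address into a device ID using only A8, A9, A13, A14, A15. */
int decode_address(unsigned int addr)
{
    int A8  = (int)((addr >> 8)  & 1);
    int A9  = (int)((addr >> 9)  & 1);
    int A13 = (int)((addr >> 13) & 1);
    int A14 = (int)((addr >> 14) & 1);
    int A15 = (int)((addr >> 15) & 1);
    int wrk;

    /* RAM or ROM? Adding the two bits gives 0 for RAM and 2 for ROM. */
    wrk = A14 + A15;
    if (wrk == 0) return RAMSEL;
    if (wrk == 2) return ROMSEL;

    /* I/O needs both A15 and A13 set. */
    if (!A13 || !A15) return NOSEL;

    /* 2-to-4 decode on A8 and A9. */
    wrk = A8 + (A9 << 1);
    switch (wrk) {
    case 0: return VIASEL;
    case 1: return SERSEL;
    case 3: return LATSEL;
    default: break;              /* case 2: unused slot */
    }
    return NOSEL;                /* assumed: the excerpt does not show this */
}
```

Because only five address lines are examined, the devices are mirrored: in this sketch &A400 selects the VIA just as &A000 does, and anything from &C000 upwards selects the ROM. That is the behaviour of real partial decoding too, which is rather the point of doing it this way.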
Addressing mode lookup
Basically, this is two 256-byte
tables. The opcode is used as an offset into each table. It
can be expressed beautifully in ARM code:
    lookup_opcode
      ; ON ENTRY:
      ;   R0 = Opcode
      ;   R1 = Pointer to two-word block for opcode information
      ;   R2 = Offset pointer
      ;   R3 = Value read

      ADR     R2, datablock      ; set up pointer
      LDRB    R3, [R2, R0]       ; read addressing mode (via datablock + opcode)
      STR     R3, [R1, #0]
      ADD     R2, R2, #256       ; reposition to second table
      LDRB    R3, [R2, R0]       ; read cycle count
      STR     R3, [R1, #4]
      MOV     PC, R14
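For comparison, the same two-table lookup might look like this in C. It is a sketch, not the contents of lookup.c: the addressing mode codes, the struct and the helper names are invented, and only four opcodes are filled in (using the standard NMOS 6502 cycle counts) to keep it short. The two tables sit back to back in one 512-byte block, as in the ARM version.

```c
/* Hypothetical addressing mode codes; the real emulator's values are not shown. */
enum { AM_NONE = 0, AM_IMPLIED, AM_IMMEDIATE, AM_INDIRECT_X };

/* The "two-word block" filled in by the lookup. */
struct opinfo {
    unsigned int mode;     /* addressing mode code */
    unsigned int cycles;   /* base cycle count */
};

/* Bytes 0..255 hold addressing modes, bytes 256..511 hold cycle counts. */
static unsigned char datablock[512];

static void set_entry(unsigned char opcode, unsigned char mode, unsigned char cycles)
{
    datablock[opcode]       = mode;
    datablock[256 + opcode] = cycles;
}

/* Fill in a handful of entries; a real table would cover all 256 opcodes. */
void init_tables(void)
{
    set_entry(0x00, AM_IMPLIED,    7);   /* BRK */
    set_entry(0x01, AM_INDIRECT_X, 6);   /* ORA (zp,X) */
    set_entry(0xA9, AM_IMMEDIATE,  2);   /* LDA #imm */
    set_entry(0xEA, AM_IMPLIED,    2);   /* NOP */
}

/* Two indexed byte reads, exactly as in the ARM version. */
void lookup_opcode(unsigned char opcode, struct opinfo *out)
{
    out->mode   = datablock[opcode];
    out->cycles = datablock[256 + opcode];
}
```

After init_tables(), lookup_opcode(0xA9, &info) leaves AM_IMMEDIATE and 2 in the block; opcodes that were never filled in come back as AM_NONE with zero cycles.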
The &xB
instructions are undefined on the NMOS 6502, so have been used
to implement various emulator-specific instructions. If you
wish to remove this functionality (perhaps to add 65C(E)02
instructions), please be aware that the breakpoint system uses
one of these instructions!
Instruction dispatch
We have the
instruction opcode. So which instruction is this?
The dispatch has
been implemented as a big "select" structure (a switch statement) listing
all 256 possible opcodes, trusting that the compiler
can do a good job of making optimised code. The worst
case, with no optimisation at all, would be:
    if (opcode == 0) { opcode_brk(); return; }
    if (opcode == 1) { opcode_ora(); return; }
    [...]
    /* else */
    opcode_err(); return;
A better option
would be a jump table. Acorn C v5.51 and TurboC v2.01 and
TurboC++ v1.0 all do this as it is the sensible approach - you
don't need to perform 255 tests to reach the 256th
element.
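If you would rather not depend on the compiler's choice, a jump table can be written out by hand in C as an array of function pointers. The sketch below is an illustration of that technique and not the contents of dispatch.c: the handler names opcode_brk, opcode_ora and opcode_err are borrowed from the fragment above, while last_op, dispatch_init and the table itself are invented, and the handlers are stubs that only record which one ran.

```c
typedef void (*opcode_fn)(void);

static int last_op;   /* records which handler ran; for demonstration only */

/* Stub handlers; the real ones would emulate the instruction. */
static void opcode_brk(void) { last_op = 0x00; }
static void opcode_ora(void) { last_op = 0x01; }
static void opcode_err(void) { last_op = -1; }

/* One entry per possible opcode. */
static opcode_fn dispatch_table[256];

/* Point every slot at the error handler, then fill in the known opcodes. */
void dispatch_init(void)
{
    int i;
    for (i = 0; i < 256; i++)
        dispatch_table[i] = opcode_err;
    dispatch_table[0x00] = opcode_brk;
    dispatch_table[0x01] = opcode_ora;
    /* [...] remaining opcodes would be listed here */
}

/* A single bounds check and one indirect call, whatever the opcode. */
void dispatch(unsigned int opcode)
{
    if (opcode > 255) {
        opcode_err();
        return;
    }
    dispatch_table[opcode]();
}
```

Call dispatch_init() once at start-up; after that dispatch(0x01) reaches opcode_ora in the same number of steps as dispatch(0xFF) reaches its handler.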
Unfortunately,
there isn't much you can do about how crap the x86 processor
is, so here is an example of it. This code loads a
pre-computed address from an array, so it is a jump table in
the true sense of the word.
            push    bp
            mov     bp,sp
            mov     bx,word ptr [bp+4]
            cmp     bx,255
            jbe     @@0
            jmp     @1@3890
    @@0:
            shl     bx,1
            jmp     word ptr cs:@1@C15044[bx]
The jump table
itself looks like:
    @1@C14538 label word
            dw      @1@98
            dw      @1@122
and each branch
point looks like:
    @1@98:
            call    near ptr _opcode_brk
            jmp     @1@3914
    @1@122:
            call    near ptr _opcode_ora
            jmp     @1@3914
It is almost a
sexual event working with the ARM processor. The instruction
positionings are fixed at a "word" of four bytes. You can
randomly disassemble anything as a new word is a new
instruction. The side effect of this is we can dispense
with the actual jump table and use this knowledge to poke a
new value directly into the Program Counter, as
follows:
            CMP     a1,#&ff
            ADDLS   pc,pc,a1,LSL #2
            B       |L000818.J164.dispop|
            B       |L00081c.J163.dispop|
            [...]
    |L000818.J164.dispop|
            B       opcode_brk
    |L00081c.J163.dispop|
            B       opcode_ora
This is
oh-so-close. It would have been really great if the compiler
had realised that B ..J164.dispop
-> B opcode_brk is actually the same thing as
calling opcode_brk directly.
As a side effect, note that no registers
are corrupted for this to work.
Here is my
hand-crafted dispatch code:
            CMP     R0, #((dispatch_endoftable - dispatch_table) / 4)
            ADDCC   PC, PC, R0, LSL #2
            B       opcode_inv
    dispatch_table
            ; row 0
            B       opcode_brk
            B       opcode_ora
            [...]
    dispatch_endoftable
Processor 'internals'
To be described...
Device polling
To be described...
Breakpoints
To be described...
Known emulation faults
- 6502 CPU core
  - Minimal NMI support (Amélie doesn't use NMIs)
  - No "BCD" maths mode
  - No support for 'undocumented' side-effects in the NMOS version of the 6502
  - Basic cycle counting - does not include "additional" cycles
  - May or may not fully support all of the CPU bugs (these need to be enabled, then the core recompiled)
- 6522 VIA core
  - No Timer2
  - Timer1 only works in basic modes (single-shot and countdown, without PB7)
  - No support for serial shifting
  - No support for automatic handshaking
  - Only generates IRQs for Timer1, CAx and CBx events
  - unfinished
- 6551 ACIA core
- Latch
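On the "additional" cycles mentioned under basic cycle counting: on a real 6502, an indexed read costs one extra cycle when the index carries the address into the next 256-byte page, and a taken branch costs one extra cycle, plus another if its target lies in a different page from the following instruction. A minimal sketch of how that could be worked out is below. None of these function names exist in AmélieEm; they are invented here, and the emulator itself does not currently do this.

```c
/* 1 if base and base+index lie in different 256-byte pages, else 0. */
int page_crossed(unsigned int base, unsigned int index)
{
    unsigned int target = (base + index) & 0xFFFFu;
    return ((base ^ target) & 0xFF00u) != 0;
}

/* Cycles for an indexed read such as LDA abs,X (base_cycles would be 4). */
unsigned int indexed_read_cycles(unsigned int base_cycles,
                                 unsigned int base, unsigned int index)
{
    return base_cycles + (unsigned int)page_crossed(base, index);
}

/*
 * Cycles for a relative branch. next_pc is the address of the instruction
 * after the branch, offset is the signed displacement (-128..127).
 */
unsigned int branch_cycles(unsigned int next_pc, int offset, int taken)
{
    unsigned int target;

    if (!taken)
        return 2;                             /* not taken: base cost only */

    target = (next_pc + (unsigned int)offset) & 0xFFFFu;
    return 3 + (unsigned int)(((next_pc ^ target) & 0xFF00u) != 0);
}
```

For example, indexed_read_cycles(4, 0x10FF, 1) gives 5, because &10FF plus 1 lands in page &11, whereas indexed_read_cycles(4, 0x1000, 0xFF) stays at 4. Stores and read-modify-write instructions use a fixed count and would not go through this path.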
Just show me the code!
The code is
written in plain C, with C style comments.
While much of
AmélieEm is "portable", the user interface parts rely
heavily on conio.h and dos.h which means
that at this time only a 16-bit MS-DOS version
is available.
AmélieEm compiles on these
systems:
For various reasons, AmélieEm does not compile on these systems:
- TurboC v2.01: Tracey's source is larger than the inbuilt (~64K) limit on source file size.
- lcc-win32 v3.8: We have (mostly?) conio.h but I don't see any delay() function.
- OpenWatcom v1.2: The required parts appear to be present (more-or-less), but you can't use them in a 32-bit console application, so compiling to a 16-bit application is unlikely to offer anything over TurboC++.
- RISC OS, Unix, Mac, etc etc... Find conio.h and make a delay() routine, and you might be in with a chance... :-)
If you need any help with AmélieEm's code, feel free to contact me.
Modules
addrdeco.c simply decodes the
address given to be a device ID.
breakpt.c handles the
breakpoints.
dispatch.c is the processor
instruction dispatcher. You may find benefits if you replace
this with some optimised code; I have written a fast ARM
version. Sorry, I don't speak x86.
lookup.c is the part that looks
up cycle count and addressing mode for each instruction. As
with dispatch.c, you can probably write more optimal
code than your compiler in this instance...
memory.c handles all reading and
writing from memory. This is a candidate for assemblerisation,
but it may be quite involved.
opcode.c is the core of the 6502
processor emulation.
romram.c is a short
module that allocates memory for the RAM area and the
ROM area.
tracey.c contains all of
Tracey's code, which is why it
is huge!
via.c is the 6522
VIA emulation.
wrapper.c is the entry point. It
organises initialisation and then runs the main execution
loop.
Release notes
AmélieEm is not yet 'finished', nor
has it really been tested, so I have nothing
to add at this
time.