Rick's b.log - 2011/04/05
mailto:
blog -at- heyrick -dot- eu
Okay. I will not cover anything to do with reverse engineering. The reason is that another name for it is disassembly, and it is not much use trying to disassemble until you are familiar with how to assemble.
For finding out the opcodes, the best resource is ARM Ltd (registration required).
As for Joe's other point, we are greatly aided by the fact that we built our tiny executable "from the ground up". This means we can account for every single byte in the file. So when the GCC tools wrap the data in crap, we can do something about it.
Let's look at a (partial) dump of the GCC assembled file:
The ARM is beautiful. It is all nicely word aligned, so working through the code is pretty simple.
We shall start with something nice and obvious. The "Hello World" string:
Okay. Now to find the beginning. We have to count the number of words defined in the headers. You will see there are 21 ".word" lines. 21 words. Multiply by four, this gives us 84 bytes. Just count back, in whatever manner best suits you.
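The arithmetic can be checked mechanically. Here is a minimal Python sketch (Python is simply a convenient choice for illustration, it is not part of the original toolchain) that packs the 21 values from the ".word" lines as little-endian 32 bit words and confirms they occupy 84 bytes. The names HEADER_WORDS and header_bytes are made up for this example.

```python
import struct

# The 21 values from the ".word" lines of the assembler source, in order:
# 13 words of ELF header followed by 8 words of program header table.
HEADER_WORDS = [
    0x464C457F, 0x61010101, 0, 0,
    0x00280002, 0x00000001, 0x00008068, 0x00000034,
    0, 0x00000002, 0x00200034, 0x00000001, 0,
    0x00000001, 0, 0x00008000, 0x00008000,
    0x00000080, 0x00000080, 0x00000005, 0x00008000,
]

# "<I" packs one unsigned 32 bit word in little-endian byte order,
# which is why the words look "backwards" in a byte-by-byte hex dump.
header_bytes = b"".join(struct.pack("<I", w) for w in HEADER_WORDS)

print(len(HEADER_WORDS), "words =", len(header_bytes), "bytes")
print(header_bytes[:4])
```

The first four packed bytes come out as the familiar 7f 45 4c 46 ELF marker, which is the pattern to look for when counting back from the string.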
Here is the file with the start of the file marked:
Working the other way now, we shall look at the executable code. It is a really simple program. Only six instructions. Six multiplied by four is 24. So following the string, we want to count up either six words or 24 bytes. Here it is marked:
It is useful to know little tricks to verify that you're on track. For example, we know the last instruction in our program is a SWI call to tell Linux it can discard our task, we're done. The SWI number is
That bit in cyan? From the second ELF to the SWI 900001? That is the part we want to keep. Everything else is junk and can be deleted.
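The whole counting exercise can be sketched in a few lines of Python. This is only an illustration of the steps described above, not a general ELF tool: the bytes are copied from the hex dump in this post, and the names DUMP, marker, start, end and ours are invented for the example.

```python
# Bytes copied from the hex dump of the GCC assembled file (240 bytes).
DUMP = """
7f 45 4c 46 01 01 01 61 00 00 00 00 00 00 00 00
01 00 28 00 01 00 00 00 00 00 00 00 00 00 00 00
e0 00 00 00 00 00 00 00 34 00 00 00 00 00 28 00
07 00 04 00 7f 45 4c 46 01 01 01 61 00 00 00 00
00 00 00 00 02 00 28 00 01 00 00 00 68 80 00 00
34 00 00 00 00 00 00 00 02 00 00 00 34 00 20 00
01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
00 80 00 00 00 80 00 00 80 00 00 00 80 00 00 00
05 00 00 00 00 80 00 00 48 65 6c 6c 6f 20 57 6f
72 6c 64 21 20 3a 2d 29 0a 00 00 00 01 00 a0 e3
20 10 1f e5 0d 20 a0 e3 04 00 90 ef 00 00 a0 e3
01 00 90 ef 00 2e 73 79 6d 74 61 62 00 2e 73 74
72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00 2e
74 65 78 74 00 2e 64 61 74 61 00 2e 62 73 73 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""
data = bytes.fromhex("".join(DUMP.split()))

# Step 1: find the "home base" marker, the message string.
marker = data.find(b"Hello World")

# Step 2: count back 21 words (84 bytes) to the start of our own header.
start = marker - 21 * 4

# Step 3: skip the string (17 characters plus 3 null bytes of padding),
# then count forward six instructions (24 bytes) to the end of the code.
end = marker + (17 + 3) + 6 * 4

# Sanity checks: our header starts with the ELF marker, and the final
# word holds the literal 0x900001 in its low three bytes.
assert data[start:start + 4] == b"\x7fELF"
assert data[end - 4:end - 1] == bytes([0x01, 0x00, 0x90])

ours = data[start:end]
print(hex(start), hex(end), len(ours))
```

The slice comes out at 128 bytes, which agrees with the 0x80 "number of bytes to load" value in the program header table.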
Just for a laugh, I looked to see what would be the smallest possible RISC OS version that would be a real executable (and not a BASIC program, etc). With a valid APCS header, it is a little smaller than the Linux version, running in at exactly 100 bytes:
Here's the source:
But, wait. RISC OS is, historically, a rather simpler OS [and a lot nicer to program, nerr!]. This means that the program header, while it serves a purpose, is not strictly necessary. This may not be true of the Iyonix type RISC OS (which may reject unheadered code as being 26 bit PC+PSR), but for all of the original Acorn machines, there was really only one thing you could count on - when you were loaded, you were loaded at &8000. All "absolute" executables load at &8000; clever page switching makes this possible in a multitasking environment.
Here it is, all 36 bytes of it:
I have been asked why I mention that things are "backwards". Take a look at the words in the pictures above and you will see that they are the other way around.
Here, then, is the source to what may be the tiniest possible native ARM executable:
We're not done yet, mind you. We can lose a word. We can dispose of setting R0 to zero prior to OS_Exit. Sure, the system might get some weird return code but this doesn't matter much. It is supposed to be a possible pointer to an error block, but RISC OS sanitises this in case of dimwit coders blindly calling OS_Exit with any old rubbish in R0. So we could push this down to 32 bytes. But, wait, we can use a rather icky little system call called
Therefore, I present to you the absolute smallest valid ARM executable to display a Hello World message:
You can download the RISC OS sources (Zip, 5KiB).
Okay, it is nearly five AM and I know y'all think I have no life. That may be true, but hey... t'was fun. Until next time!
Geekery expanded
In the comments of the previous entry, Joe asks:
I've just started on this journey, I would appreciate some pointers, how to know, which bits you can delete from the binary file using hexeditor and where to find the opcodes and reverse engineering tips.
Many protection systems (and trust me, I get emails asking for help breaking protection on kit I've never heard of) work by testing for a condition. Perhaps a cypher or somesuch. The weak point is, frequently, a BLNE (branch with link if not equal) to the bomb-out routine. If you alter that to a NOP, then you may find the program works. That is all I shall say on the matter, for if you know enough ARM to work out what I'm talking about, you probably have enough knowledge to try this on your own.
Failing that, there is online documentation for the processor core used in the OSD, the ARM926EJ-S. Browse it here, or download a PDF version.
If all else fails, Google. ☺
00000000 7f 45 4c 46 01 01 01 61 00 00 00 00 00 00 00 00 |.ELF...a........|
00000010 01 00 28 00 01 00 00 00 00 00 00 00 00 00 00 00 |..(.............|
00000020 e0 00 00 00 00 00 00 00 34 00 00 00 00 00 28 00 |........4.....(.|
00000030 07 00 04 00 7f 45 4c 46 01 01 01 61 00 00 00 00 |.....ELF...a....|
00000040 00 00 00 00 02 00 28 00 01 00 00 00 68 80 00 00 |......(.....h...|
00000050 34 00 00 00 00 00 00 00 02 00 00 00 34 00 20 00 |4...........4. .|
00000060 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000070 00 80 00 00 00 80 00 00 80 00 00 00 80 00 00 00 |................|
00000080 05 00 00 00 00 80 00 00 48 65 6c 6c 6f 20 57 6f |........Hello Wo|
00000090 72 6c 64 21 20 3a 2d 29 0a 00 00 00 01 00 a0 e3 |rld! :-)........|
000000a0 20 10 1f e5 0d 20 a0 e3 04 00 90 ef 00 00 a0 e3 | .... ..........|
000000b0 01 00 90 ef 00 2e 73 79 6d 74 61 62 00 2e 73 74 |......symtab..st|
000000c0 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00 2e |rtab..shstrtab..|
000000d0 74 65 78 74 00 2e 64 61 74 61 00 2e 62 73 73 00 |text..data..bss.|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
And now let's look at our code:
@ Write a basic ELF header [ALL words are written backwards!]
.word 0x464C457F @ ELF "magic" value
.word 0x61010101 @ Type = 32 bit, word order LSB, ver 1
.word 0 @ padding
.word 0 @ padding
.word 0x00280002 @ executable file, ARM CPU
.word 0x00000001 @ version = 1 (current)
.word 0x00008068 @ entry point, start of execution
.word 0x00000034 @ program header table offset
.word 0 @ section table offset (there is none)
.word 0x00000002 @ processor specific flags (2=???)
.word 0x00200034 @ ELF header size, size of ptab entry
.word 0x00000001 @ num of ptab ents, size of sectab ents
.word 0 @ num sectab ents, ptr to string table
@ Now for a basic program header table
.word 0x00000001 @ type = PT_LOAD (loadable)
.word 0 @ offset (0 = load from start)
.word 0x00008000 @ virtual address to load to
.word 0x00008000 @ physical address to load to
.word 0x00000080 @ number of bytes to load
.word 0x00000080 @ size of memory image
.word 0x00000005 @ flags = Executable (1) and Read (4)
.word 0x00008000 @ alignment
@ Now for some really simple code to print the message to the terminal.
message:
.ascii "Hello World! :-)\n"
.byte 0
.byte 0
.byte 0
entry:
mov r0, #1 @ 1 = stdout
adr r1, message @ pointer to message
mov r2, #17 @ message length
swi 0x900004 @ swi call for Sys_Write
mov r0, #0 @ set return code
swi 0x900001 @ swi call for Sys_Exit
@ That's it! Done.
Logic would say we can assume that the second ELF marker is ours, but I shall talk about a more definitive way to deal with this.
[The same dump as above, this time with the string bytes at offsets 0x88 to 0x9b highlighted.]
This is marked out in green. This is our "home base". Our marker.
Note that I have included the newline code, and continued over the three null bytes padding to word-align the marker.
[The same dump again, with the start of our file, the second ELF marker at offset 0x34, marked.]
[The same dump again, with the six instruction words at offsets 0x9c to 0xb3 marked.]
0x900001. This is presented to the assembler as a literal, so it stands to reason that this number will be present in the final word of our file. If we observe that all the bytes are 'backwards', we will be looking for "01 00 90 ##", where the '##' is the SWI opcode, but we don't need to know that in order to find that the last highlighted word is indeed correct.
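To see where "01 00 90 ##" comes from, here is a small Python sketch. It assumes the usual ARM encoding of an unconditional SWI, where the top byte is 0xEF (the "always" condition in the top nibble, then the SWI opcode bits) and the low 24 bits carry the SWI number. The variable names are illustrative only.

```python
import struct

SWI_NUMBER = 0x900001               # the literal given to the assembler

# Unconditional SWI: 0xEF in the top byte, SWI number in the low 24 bits.
word = 0xEF000000 | SWI_NUMBER

# Little-endian storage puts the least significant byte first in the file.
encoded = struct.pack("<I", word)

print(" ".join(f"{b:02x}" for b in encoded))
```

The low three bytes are exactly the SWI number stored least significant byte first, so they can be searched for without knowing the value of the top byte.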
; A little hello world for RISC OS
; with valid APCS header. ;-)
; Include two personal macros
GET ^.h.equsza
GET ^.h.equd
; Define the two system calls used in this code
OS_Write0 * &02
OS_Exit * &11
; Specify the code area, type, addrmode, and entry point
AREA |asm$code|, CODE, A32bit
ENTRY
; RISC OS APCS header
MOV R0, R0 ; Decompression code call
MOV R0, R0 ; Self-relocation code call
MOV R0, R0 ; Zero initialisation code call
BL start ; Program entry call
SWI OS_Exit ; Fall-out trap to force exit
EQUD &40 ; Read-only area size (header)
EQUD &20 ; Read-write area size (code)
EQUD 0 ; Debug area size
EQUD 0 ; Zero initialisation size
EQUD 0 ; Debug type
EQUD &8000 ; Current base of absolute
EQUD 0 ; Workspace required
EQUD 32 ; Flag software as 32 bit PSR okay
EQUD 0 ; Data base address when linked
EQUD 0 ; Reserved header (should be zero)
EQUD 0 ; Reserved header (should be zero)
message
EQUSZA "Hello World! :-)\n"
start
ADR R0, message ; Pointer to message
SWI OS_Write0 ; OS call writes until null byte
MOV R0, #0 ; Define return code
SWI OS_Exit ; And exit.
END
This means we can dispense with a lot of the formalities and go for the purest, smallest program that is a valid executable.
But are they? Look also at the ending newline code of the message, and its place in the hex dump vs the ASCII.
It may seem weird (and, yeah, it probably is!), and the explanations are complicated, steeped in time and history with a dose of 6502 influence. Suffice to say that is how I'm used to seeing it.
DIM code% 36
FOR l% = 0 TO 2 STEP 2
P% = code%
[ OPT l%
ADR R0, message
SWI "OS_Write0"
MOV R0, #0
SWI "OS_Exit"
.message
EQUS "Hello World! :-)"
EQUB 10
EQUB 0
EQUB 0
EQUB 0
]
NEXT
OSCLI("%Save <Obey$Dir>.HelloTiny " + STR$~(code%) + " " + STR$~(P%))
OSCLI("%SetType <Obey$Dir>.HelloTiny &FF8")
Yes, BBC BASIC on RISC OS has a built-in ARM assembler, just like how the BBC BASIC on the BBC Micro had a built-in 6502 assembler. I wish I had that on the OSD's Linux setup!
OS_WriteS which will dick around with R14 to find a null-terminated string following the SWI, which would then be written.
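As a rough model of that behaviour, the Python sketch below works out which bytes get written and where execution would resume, on the understanding that the call continues at the next word boundary after the terminating null. It is a toy model over a byte string, not an emulator. The function name os_writes_model is invented here, and the SWI encodings assume OS_WriteS is SWI &01 and OS_Exit is SWI &11.

```python
import struct

def os_writes_model(code, swi_offset):
    """Return (text, resume_offset) for an OS_WriteS located at swi_offset."""
    text_start = swi_offset + 4                   # R14 points just past the SWI
    terminator = code.index(b"\x00", text_start)  # first null ends the string
    text = code[text_start:terminator]
    # Step past the null, then round up to the next word boundary.
    resume = (terminator + 1 + 3) & ~3
    return text, resume

program = (
    struct.pack("<I", 0xEF000001)       # SWI OS_WriteS (assumed to be &01)
    + b"Hello World! :-)\n\x00\x00\x00" # message, null, padding to a word
    + struct.pack("<I", 0xEF000011)     # SWI OS_Exit (&11)
)

text, resume = os_writes_model(program, 0)
print(len(program), repr(text), resume)
```

With the message laid out this way, the resume offset lands on the final word, which is where the exit call sits.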
DIM code% 28
FOR l% = 0 TO 2 STEP 2
P% = code%
[ OPT l%
SWI "OS_WriteS"
EQUS "Hello World! :-)"
EQUB 10
EQUB 0
EQUB 0
EQUB 0
SWI "OS_Exit"
]
NEXT
OSCLI("%Save <Obey$Dir>.HelloOMFG " + STR$~(code%) + " " + STR$~(P%))
OSCLI("%SetType <Obey$Dir>.HelloOMFG &FF8")
Twenty eight bytes. Count 'em and weep. [also in your face Linux! nerr! etc]
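For anybody who would rather not count by hand, the byte totals for the three RISC OS versions can be tallied with a short Python sketch. It assumes every instruction and EQUD is one 4 byte word and that the message is padded with nulls to a word boundary; the names here are illustrative.

```python
WORD = 4

def padded(length):
    """Round a byte count up to the next multiple of one word."""
    return (length + WORD - 1) // WORD * WORD

# Message text plus its terminating null, before padding.
MESSAGE = b"Hello World! :-)\n\x00"
msg = padded(len(MESSAGE))

sizes = {
    # Header of 5 instructions and 11 EQUDs, the message, 4 instructions.
    "HelloWorld": (5 + 11) * WORD + msg + 4 * WORD,
    # No header: 4 instructions and the message.
    "HelloTiny": 4 * WORD + msg,
    # OS_WriteS, the inline message, OS_Exit.
    "HelloOMFG": 1 * WORD + msg + 1 * WORD,
}

for name, size in sizes.items():
    print(name, size)
```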
HelloWorld is assembler source, to be built with amu, objasm, and link [so you'll need the RISC OS compiler suite].
HelloTiny is built using a simple BASIC program. Build it by running the BuildTiny Obey file - this is necessary to set up the correct path to write the output to.
BldOMGCode will build the itty-bitty one. Run the BASIC code directly after having run BuildTiny.
joe, 5th April 2011, 12:50 Thanks Rick,
this is a real treasure.
I have got win. CE 5.0 hacked GPS, with 600MHz CPU
and 128MB of RAM to test my future codes, if any.
Microsoft's Embedded C++ for win CE 5.0 "Hello World"
needs 7.5 KB to paint it on the screen.
The things are "backwards" because of endianness,
I am working on fully understanding this problem.
I have read somewhere, that the best way to learn, is to write very simple programs and look at them in hexeditor.
Maybe you should write a book, which explains it all,
something like "Machine language for new generation",
it would be the real best seller.
I want at least 2 copies, just in case.
joe
© 2011 Rick Murray
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted. RIPA notice: No consent is given for interception of page transmission.