I've just started on this journey, I would appreciate some pointers, how to know, which bits you can delete from the binary file using hexeditor and where to find the opcodes and reverse engineering tips.
Okay. I will not cover anything to do with reverse engineering. The reason is that reverse engineering is, in practice, disassembly, and there is little point trying to disassemble until you are familiar with how to assemble.
Many protection systems (and trust me, I get emails asking for help breaking protection on kit I've never heard of) work by testing for a condition. Perhaps a cypher or somesuch. The weak point is, frequently, a BLNE (branch with link if not equal) to the bomb-out routine. If you alter that to a NOP, then you may find the program works. That is all I shall say on the matter, for if you know enough ARM to work out what I'm talking about, you probably have enough knowledge to try this on your own.
As for Joe's other point, we are greatly aided by the fact that we built our tiny executable "from the ground up". This means we can account for every single byte in the file. So when the GCC tools wrap the data in crap, we can do something about it.
Let's look at a (partial) dump of the GCC assembled file:
This is marked out in green. This is our "home base". Our marker.
Note that I have included the newline code, and continued over the three null bytes padding to word-align the marker.
Okay. Now to find the beginning. We have to count the number of words defined in the headers. You will see there are 21 ".word" lines. 21 words. Multiply by four, this gives us 84 bytes. Just count back, in whatever manner best suits you.
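If you would rather let a script do the counting back, here is a minimal Python sketch. It is only an illustration: the marker text, the filler bytes, and the helper name `header_start` are my own stand-ins, not values taken from the real dump.

```python
WORD_SIZE = 4       # an ARM word is four bytes
HEADER_WORDS = 21   # the 21 ".word" lines counted above

def header_start(blob: bytes, marker: bytes) -> int:
    """Return the offset where the hand-built headers begin.

    The headers sit immediately before the marker string, so we
    find the marker and step back HEADER_WORDS * WORD_SIZE bytes.
    """
    marker_at = blob.find(marker)
    if marker_at < 0:
        raise ValueError("marker not found")
    start = marker_at - HEADER_WORDS * WORD_SIZE
    if start < 0:
        raise ValueError("not enough bytes before the marker")
    return start

# Hypothetical stand-in for the GCC output: 52 bytes of wrapper,
# 84 bytes of header, then an assumed marker string.
marker = b"Hello\n\x00\x00"
blob = b"\x00" * 52 + b"\x7f" * 84 + marker
print(header_start(blob, marker))   # 52
```

The only real information in there is the arithmetic: 21 words times four bytes is 84, subtracted from wherever the marker turns up.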
Here is the file with the start of the file marked:
Working the other way now, we shall look at the executable code. It is a really simple program. Only six instructions. Six multiplied by four is 24. So following the string, we want to count up either six words or 24 bytes. Here it is marked:
It is useful to know little tricks to verify that you're on track. For example, we know the last instruction in our program is a SWI call to tell Linux it can discard our task, we're done. The SWI number is 0x900001. This is presented to the assembler as a literal, so it stands to reason that this number will be present in the final word of our file. If we observe that all the bytes are 'backwards', we will be looking for "01 00 90 ##", where the '##' is the SWI opcode, but we don't need to know that in order to find that the last highlighted word is indeed correct.
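The same sanity check can be sketched in Python. The helper name `is_swi` is made up for this example, and the sample bytes assume the SWI is unconditional (top byte 0xEF), which is the usual case but is not something the check in the text depends on.

```python
import struct

def is_swi(word_bytes: bytes, swi_number: int) -> bool:
    """Check whether four bytes, read little-endian, encode SWI <swi_number>."""
    (word,) = struct.unpack("<I", word_bytes)
    # Bits 27..24 all set mark a SWI; the low 24 bits hold the literal.
    opcode_ok = ((word >> 24) & 0x0F) == 0x0F
    return opcode_ok and (word & 0x00FFFFFF) == swi_number

last_word = bytes.fromhex("010090ef")   # as it would appear in the dump
print(is_swi(last_word, 0x900001))      # True
```

Note that the function never looks at the condition bits, so it would accept a conditional SWI too.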
That bit in cyan? From the second ELF to the SWI 900001? That is the part we want to keep. Everything else is junk and can be deleted.
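A hex editor is fine for a one-off, but the carve-out can also be sketched in a few lines of Python. Everything specific here is an assumption for the sake of illustration: the marker text, the junk around it, and the helper name `carve`. Only the 84 and 24 byte counts come from the walkthrough above.

```python
def carve(blob: bytes, marker: bytes,
          header_bytes: int = 84, code_bytes: int = 24) -> bytes:
    """Keep header + marker string + code; drop everything else.

    `marker` must already include its newline and the null padding
    that word-aligns it, exactly as it appears in the file.
    """
    at = blob.find(marker)
    if at < 0:
        raise ValueError("marker not found")
    start = at - header_bytes
    end = at + len(marker) + code_bytes
    if start < 0 or end > len(blob):
        raise ValueError("file is too short for the expected layout")
    return blob[start:end]

# Synthetic stand-in for the GCC output (not the real file).
marker = b"Hello World!\n\x00\x00\x00"              # 13 bytes + 3 nulls
header = b"\xaa" * 84                                # 21 placeholder words
code = b"\xbb" * 20 + bytes.fromhex("010090ef")      # ends in SWI 0x900001
blob = b"junk-before!" + header + marker + code + b"junk-after"

kept = carve(blob, marker)
print(len(kept))   # 124
```

Checking that the result ends in "01 00 90 EF" is a cheap way to confirm the end offset landed on the final SWI.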
Just for a laugh, I looked to see what would be the smallest possible RISC OS version that would be a real executable (and not a BASIC program, etc). With a valid APCS header, it is a little smaller than the Linux version, running in at exactly 100 bytes:
Here's the source:
; A little hello world for RISC OS
; with valid APCS header. ;-)

; Include two personal macros
        GET     ^.h.equsza
        GET     ^.h.equd

; Define the two system calls used in this code
OS_Write0 * &02
OS_Exit   * &11

; Specify the code area, type, addrmode, and entry point
        AREA    |asm$code|, CODE, A32bit
        ENTRY

; RISC OS APCS header
        MOV     R0, R0          ; Decompression code call
        MOV     R0, R0          ; Self-relocation code call
        MOV     R0, R0          ; Zero initialisation code call
        BL      start           ; Program entry call
        SWI     OS_Exit         ; Fall-out trap to force exit
        EQUD    &40             ; Read-only area size (header)
        EQUD    &20             ; Read-write area size (code)
        EQUD    0               ; Debug area size
        EQUD    0               ; Zero initialisation size
        EQUD    0               ; Debug type
        EQUD    &8000           ; Current base of absolute
        EQUD    0               ; Workspace required
        EQUD    32              ; Flag software as 32 bit PCR okay
        EQUD    0               ; Data base address when linked
        EQUD    0               ; Reserved header (should be zero)
        EQUD    0               ; Reserved header (should be zero)

message
        EQUSZA  "Hello World! :-)\n"

start
        ADR     R0, message     ; Pointer to message
        SWI     OS_Write0       ; OS call writes until null byte
        MOV     R0, #0          ; Define return code
        SWI     OS_Exit         ; And exit.

        END
But, wait. RISC OS is, historically, a rather simpler OS [and a lot nicer to program, nerr!]. This means that the program header, while it serves a purpose, is not strictly necessary. This may not be true of the Iyonix type RISC OS (which may reject unheadered code as being 26 bit PC+PCR), but for all of the original Acorn machines, there was really only one thing you could count on: when you were loaded, you were loaded at &8000. All "absolute" executables load at &8000; clever page switching makes this possible in a multitasking environment.
This means we can dispense with a lot of the formalities and go for the purest, smallest program that is a valid executable.
Here it is, all 36 bytes of it:
I have been asked why I mention that things are "backwards". Take a look at the words in the pictures above and you will see that they are the other way around.
But are they? Look also at the ending newline code of the message, and its place in the hex dump vs the ASCII.
It may seem weird (and, yeah, it probably is!). There are complicated explanations steeped in time and history, with a dose of 6502 influence; suffice it to say that this is how I'm used to seeing it.
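If you want to poke at the "backwards" business yourself, a few lines of Python will do it. This is just a sketch; the four-character text is an arbitrary example of mine, and the word is the SWI from the Linux version with the usual unconditional opcode byte.

```python
import struct

# A 32-bit word as the assembler sees it...
word = 0xEF900001
# ...and as it is stored, least significant byte first (little-endian).
stored = struct.pack("<I", word)
print(stored.hex(" "))        # 01 00 90 ef

# Text is stored one byte after another, so the dump reads "forwards".
text = b"Hi!\n"
print(text.hex(" "))          # 48 69 21 0a

# Read those same four bytes as a word, and the newline ends up on top.
(as_word,) = struct.unpack("<I", text)
print(f"{as_word:08x}")       # 0a216948
```

So a byte-by-byte dump shows memory in order; it is only when you think in words that one view or the other looks reversed.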
Here, then, is the source to what may be the tiniest possible native ARM executable:
Yes, BBC BASIC on RISC OS has a built-in ARM assembler, just like how the BBC BASIC on the BBC Micro had a built-in 6502 assembler. I wish I had that on the OSD's Linux setup!
We're not done yet, mind you. We can lose a word. We can dispose of setting R0 to zero prior to OS_Exit. Sure, the system might get some weird return code but this doesn't matter much. It is supposed to be a possible pointer to an error block, but RISC OS sanitises this in case of dimwit coders blindly calling OS_Exit with any old rubbish in R0. So we could push this down to 32 bytes. But, wait, we can use a rather icky little system call called OS_WriteS which will dick around with R14 to find a null terminated string following the SWI, which would then be written.
Therefore, I present to you the absolute smallest valid ARM executable to display a Hello World message:
Twenty eight bytes. Count 'em and weep. [also in your face Linux! nerr! etc]
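For the doubters, the byte count can be checked with a small Python sketch. Some assumptions, stated loudly: I take the message to be the same one as in the headered version, with "\n" as a single byte; OS_WriteS is taken to be SWI &01 (it is not given in the listing above); and the helper names `swi` and `padded` are invented here. This checks the size of the layout and is not claimed to be byte-identical to the downloadable file.

```python
import struct

OS_WriteS = 0x01   # assumption: not in the listing above
OS_Exit = 0x11     # from the listing above

def swi(number: int) -> bytes:
    """Encode an unconditional SWI as a little-endian word."""
    return struct.pack("<I", 0xEF000000 | number)

def padded(text: bytes) -> bytes:
    """Null-terminate, then pad with nulls to the next word boundary."""
    data = text + b"\x00"
    return data + b"\x00" * (-len(data) % 4)

message = padded(b"Hello World! :-)\n")   # assumed message
image = swi(OS_WriteS) + message + swi(OS_Exit)

print(len(message))   # 20
print(len(image))     # 28
```

One word for the SWI, twenty bytes of string and padding, one word to exit. Dropping the ADR and the MOV is where the saving over the 36 byte version comes from.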
You can download the RISC OS sources (Zip, 5KiB).
The HelloWorld is assembler to be built with amu, objasm, and link [so you'll need the RISC OS compiler suite]. HelloTiny is built using a simple BASIC program. Build it by running the BuildTiny Obey file - this is necessary to set up the correct path to write the output to. BldOMGCode will build the itty-bitty one. Run the BASIC code directly after having run BuildTiny.
Okay, it is nearly five AM and I know y'all think I have no life. That may be true, but hey... t'was fun. Until next time!
Your comments:
joe, 5th April 2011, 12:50
Thanks Rick, this is a real treasure. I have got a hacked Win CE 5.0 GPS, with 600MHz CPU and 128MB of RAM, to test my future codes, if any. Microsoft's Embedded C++ for Win CE 5.0 "Hello World" needs 7.5 KB to paint it on the screen. The things are "backwards" because of endianness, I am working on fully understanding this problem. I have read somewhere that the best way to learn is to write very simple programs and look at them in a hex editor. Maybe you should write a book which explains it all, something like "Machine language for new generation", it would be a real best seller. I want at least 2 copies, just in case. joe
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted.
RIPA notice: No consent is given for interception of page transmission.