Rick's b.log - 2020/09/08 |
|
It is the 21st of November 2024 You are 18.222.182.249, pleased to meet you! |
|
mailto:
blog -at- heyrick -dot- eu
Manga would start, it would download the main manga index, as normal, and then the machine would stiff.
Infuriatingly, whatever the code was actually doing would not be replicated in the debugger builds, as programs built with debugging have all of the optimisation turned off, to do exactly what the code says, rather than taking shortcuts.
Also, infuriatingly, adding some of my own logging and debug code made the problem go away.
To try to understand what was happening, let's look at the code. I have removed all comments to keep things simpler.
Next we set up the environment (pop up a blank sliding-bar window, paste a message into it, set the slider to 0%), then we check and open the file. I don't need to bore you with the code for that.
Then we start to parse, as follows:
If the divider class "d43" is found, then this marks the beginning of a new section. Because there are over four thousand manga, we use the sections to organise the menu structure for choosing Manga. For example, Sakura Wars (I made that up, but it sounds like something a manga would be called) is listed under the menu entry 'S', and not after two or three thousand others.
If we find a new section, then the section name is easily read out as the 17th byte from the 'div' point. It's a single character, being either '#' or a number, or the letters from A to Z.
We then increment thissect in order that it points to a valid array entry. And then we sanity check it to check it isn't too big (SECTMAX is 40, the letters A-Z, numbers 0-9, and # equal 37; leaving space for three in case they add something else).
Assuming we continue, then we set up the status for the section. Each array consists of three words. The first is the character of the section. This is only a byte, but things on the ARM are word aligned, so might as well treat it as a word. The next word is a count of how many manga in the section (initialised to zero), and the final word is the number of the next manga to be added (as it is incremented after a manga has been read in).
Every time a new section is read, we update the sliding bar thingy and poll twice. This is a useful tradeoff between parsing speed (we don't want to update/poll too often) versus system responsiveness (we don't want to poll too infrequently). On my Pi1 and Pi2 (sort of average machines these days, I would imagine), you can clearly see it's doing something and it doesn't take too long.
The final snippet is to look for the beginning of an entry. It's a list containing a link. There follow two screenfuls of code to parse that and set up the manga array... but this isn't relevant to the discussion.
It was, as I mentioned, totally stiffing the machine.
By a process of trial and error, I located it down to the three lines that set up the section.
This failed:
This worked:
Now is where it starts to get messy. Fasten your seatbelts.
The function entry looks like this:
It's a pretty standard APCS entry. SP is copied to IP, a bunch of stuff is written to the stack (including the value of the stack pointer on entry, now in IP). Then we muck about to get the size of the stack that we require, and if we cannot, call a function to make it happen.
In looking at the code, it would appear that SP+0 is used for storing the value of the length of the manga title, ttllen, and it looks like SP+4 is used for the result of scan.
Now, the interesting thing is that I found the bug in an entirely incorrect roundabout way. As I said, removing the lines assigning
I was on the right track... for completely the wrong reason!
Looking at the File_ReadLine() routine, it had an error. It was supposed to read, at most, 1023 bytes. Instead, it read while the offset was <1024.
Now what actually happened here is that the address of that poll block structure was corrupted by overshooting the buffer, and we then use that messed up address with sprintf() to generate the "Reading section 'X', xxxx manga indexed" message. Where? Well...
I'm not entirely certain what bit of memory was being trampled on, as the machine froze, as I said. It was only in that specific build, too. Oh, sure, the problem would always have potentially happened, but until Cloudflare get involved, no lines came anywhere near the 1K string limit. Indeed, given that the fetcher breaks up the HTML (it's either that or completely rewrite the parser!), the only thing that remains "big" is that wodge of CSS at the start of each page. And why this version? For the sme reason that deleting the sections[] assignments or adding debug code made the problem seemingly go away... it simply shifted the position of event_lastevent in the executable, which gave it a different address, which meant that whatever was being trampled on this time wasn't something that was going to instantly stiff the machine. Who knows, given that the mouse pointer froze, I might have just crapped on the interrupt vector chain or something. Arguably one shouldn't be able to write that stuff from user mode code... but this is RISC OS we're talking about... 😔
Oh, one final thing. Remember that I said that code built in debugging mode was very different as optimisations were turned off?
Compare function entry start:
Now, finally, let's compare the sections[] assignment lines:
And finally... your dose of kitten...
Tracking down a buffer overflow
A discussion on the ROOL forum reminded me of a problem that I experienced with Manga recently.
Completely.
No mouse pointer movement, nothing.
static BOOL mangareader_parselist(void)
{
file_handle ifp;
char inputline[1024];
int thissect = -1;
int thismanga = 0;
char *scan = NULL;
char *poke = NULL;
int ttllen = 0;
int ttlcrc = 0;
[etc]
That is the start of the function. There's a word for the file handle, then 1024 characters to be used as an input buffer, then a pointer to which section (A, B, C, etc) and which manga. Because of how the code is written, thissect is initialised to -1 because it is incremented before use, so the section array counts from zero. scan and poke are for looking for things. scan is non-NULL if strstr() or strchr() finds something, and poke is used to further muck around with what scan is.
To demonstrate:
inputline is: "this is a test".
scan is the result of looking for "is", so points to "is a test".
poke is the result of looking for "this" in scan. It isn't there, so is NULL.
I hope that makes some sense... ☺
Next up comes the title length and CRC value, these are used to create the UIDs for each manga.
There's some more stuff, it isn't important.
while ( !File_EOF(ifp) )
{
File_ReadLine(ifp, inputline);
scan = strstr(inputline, "<div class=\"d43\">");
if (scan != NULL)
{
section = scan[17];
thissect++;
if (thissect == SECTMAX)
{
Error_ReportWithType( [error message here] );
File_Seek(ifp, File_ReadExtent(ifp));
goto skipout; // argh! argh! zombies! argh!
}
sections[thissect].sectext = section;
sections[thissect].objects = 0;
sections[thissect].arrayoffset = thismanga;
sprintf(event_lastevent.data.bytes, message_read("mread.lpsec"), section, thismanga);
percent = ( File_ReturnPos(ifp) * 100 ) / File_ReadExtent(ifp);
busy_setstatus(message_read("mread.lparse"), event_lastevent.data.bytes, TRUE, percent);
wimp_poll_now();
wimp_poll_now();
}
scan = strstr(inputline, "<li><a href=\"/");
if ( (scan != NULL) && (section != 0) )
{
Okay. What's going on here is we're entering into the loop that reads lines of text from the input. This will be lines of HTML, Javascript, and whatever other rubbish is in the HTML file that is mangareader.net/alphabetical.
If the section maximum has been reached, then we report a warning and force the end of the file to be reached, and goto (yes, a goto!) the end of the scanning routine. The logic here is to basically give up and roll with what we have.
The logic for this is that if we want to find the 20th manga in 'S', then we simply look for the arrayoffset of 'S' and add 19 to it (19, not 20, as we begin counting from zero).
sections[thissect].sectext = section;
sections[thissect].objects = 0;
sections[thissect].arrayoffset = thismanga;
printf("Work, you ****ing piece of ****!");
sections[thissect].sectext = section;
sections[thissect].objects = 0;
sections[thissect].arrayoffset = thismanga;
mangareader_parselist
MOV ip,sp
STMDB sp!,{v1-v6,fp,ip,lr,pc}
SUB fp,ip,#4
SUB ip,sp,#&410
CMP ip,sl
BLMI __rt_stkovf_split_big
SUB sp,sp,#&400
SUB sp,sp,#&c
MVN v4,#0
MOV v3,#0
MOV v6,#0
Then an interesting part. We add &40C to the stack pointer. In other words, 1024 bytes plus 12 bytes.
The next 1024 bytes are the input buffer to be used by the line reading routine. And finally, SP+408 looks to be a pointer to the data of the event_lastevent block. This is the poll block, and it is used as temporary workspace.
sections[thissect].whatever = something;
led me to suspect that it might have been thissect that was getting corrupted. The section array is set up elsewhere and it is a fixed size (not malloc), so "logically" if something was corrupting thissect, then it might end up writing to a nonsense array offset, like sections[-1389834]
or something.
This led me to look at what was defined prior to thisect. That being the big old inputbuffer string space.
This might look correct. Certainly, you're going until 1023 and then stopping, right? Wrong. Because you're counting from zero and stuffing a terminating NULL to the end.
In other words, what we want to do is read 1023 bytes and then add a terminator. What actually happened was 1024 bytes were read, and then the terminator was added... to make a total of 1025 bytes.
mangareader_parselist mangareader_parselist
MOV ip,sp MOV ip,sp
STMDB sp!,{v1-v6,fp,ip,lr,pc} STMDB sp!,{v1-v6,fp,ip,lr,pc}
SUB fp,ip,#4 SUB fp,ip,#4
SUB ip,sp,#&410 SUB ip,sp,#&410
CMP ip,sl CMP ip,sl
BLMI __rt_stkovf_split_big BLMI __rt_stkovf_split_big
SUB sp,sp,#&400 SUB sp,sp,#&400
SUB sp,sp,#&c SUB sp,sp,#&c
MVN v4,#0 MVN v2,#0
MOV v3,#0 MOV v3,#0
MOV v6,#0 MOV v4,#0
ADD a1,pc,#L00063c-.-8 MOV v5,#0
BL message_read MOV v6,#0
MOV a2,#0
STR a2,[sp,#8]
MOV a3,#0
STR a3,[sp,#4]
MOV a4,#0
STR a4,[sp]
ADD a1,pc,#L0006ec-.-8
BL message_read
A you can see, when I say "=NULL" or "=0", it actually does it.
Note also the register set to -1 using MVN
. It's allocating registers entirely differently, so good luck trying to compare debug code with non-debug code.
|L0003f4.J13.mangareader_parselist| |L000418.J28.mangareader_parselist|
LDR a3,[pc, #L000698-.-8] LDR a2,[pc, #L000744-.-8]
MOV a4,#0 STR v4,[a2,v2,LSL #4]
ADD a1,pc,#L00069c-.-8 MOV a4,#0
STR v6,[a3,v4,LSL #4] LDR a1,[pc, #L000744-.-8]
ADD a3,a3,v4,LSL #4 ADD a3,a1,v2,LSL #4
STR a4,[a3,#4]! STR a4,[a3,#4]
STR v3,[a3,#8]! LDR a1,[pc, #L000744-.-8]
BL message_read ADD a3,a1,v2,LSL #4
STR v3,[a3,#&c]
ADD a1,pc,#L000748-.-8
BL message_read
Dehumidifier unboxing
It arrived very quickly from Amazon. It was... a little smaller than I expected. Though, to be honest, I'm not sure what I was expecting!
Anna susses ladders
I had taken Anna up to the potager to play/hunt/roam as there's plenty of stuff to look at, poke around, and climb. I had gone to sit on the roof of the woodshed, basically so I wouldn't have flying kitten attacking soft parts of my anatomy every five seconds.
Yeah... about that...
David Pilling, 11th September 2020, 01:43 @25C and 100% humidity "one liter of air can hold 0.941mmol of water vapor, which is 0.017 mL of liquid water"
© 2020 Rick Murray |
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted. RIPA notice: No consent is given for interception of page transmission. |