mailto: blog -at- heyrick -dot- eu
Just before my holiday, I called the Aixam dealer up near Rennes, asking about having my car serviced. This wasn't possible, because I want my car collected (no way I'm driving up there by myself!) in order to have an oil change, new drive belt, etc and they are mostly on holiday (as would I be!).
Strictly speaking, I don't need this to be done right now, as I have another 3000 or so kilometres to go; however when I go back to work (as of next Monday), it's our busy period in the run up to Christmas, so I wish to know everything is good and that I have a good drive belt, recently changed oil, and so on.
I'd prefer to have the oil changed too soon rather than too late.
Somebody called this morning. They plan to pick up Felicity on Thursday afternoon, do the work on Friday morning, and bring her back on Friday afternoon. Talk about leaving it to the last minute!
A much improved search and replace
Yesterday, I complained about how bad Google Docs is, with emphasis on it's failure to handle a relatively simple search and replace, changing six asterisks into three.
I wrote a really rubbish example program that did it, shuffling memory around each time, to demonstrate how such a thing wasn't really such an onerous task. A version compiled with ABC took eighty seconds. I didn't try with raw BASIC as it would have had many overheads in interpreting the code itself (so I'd guess it might have been several minutes).
Unfortunately I had the idea of using two buffers, and to read from one and write to another, in order to minimise the amount of data moving required.
I say unfortunately as that's the sort of stupid rubbish that keeps me awake at night. Being alone? No problem. No close family any more? No problem. Might die of the plague? No problem. Stupid bloody text replace algorithm? Toss, turn, not sleeping.
So at about 3am I got up, fired up Zap, refused to have a cup of tea, and wrote this from scratch:
REM Search for something, replace it with another
REM Improved version, uses two buffers to avoid
REM endless memory shuffles.
REM To simplify memory handling, the replacement
REM string MUST be the same size or shorter.
REM What to look for
find$ = "******"
REM What to replace it with (this must be shorter)
repl$ = "***"
t% = TIME
REM Input the file
fp% = OPENIN("INPUT_FILE")
size% = EXT#fp%
DIM main% size% : REM Original buffer
DIM copy% size% : REM Copy buffer
REM Load the file
SYS "XOS_GBPB", 4, fp%, main%, size%
moff% = 0 : REM Offset into main buffer
coff% = 0 : REM Offset into copy buffer
byte% = 0 : REM Byte read
mtch% = 0 : REM First byte of match string
mtch% = ASC(LEFT$(find$, 1))
REM Read a byte from the main buffer
byte% = main%?moff%
REM Does it match the string we're looking for?
IF ( byte% = mtch% ) THEN
REM Look ahead to see if it matches the rest
good% = TRUE
FOR loop% = 1 TO LEN(find$)
this% = main%?(moff% + (loop% - 1)) : REM Byte from main
that% = ASC(MID$(find$, loop%, 1)) : REM Byte from string
IF ( this% <> that% ) THEN good% = FALSE
REM If good (a match), then insert the replacement
REM into the copy buffer.
IF ( good% = TRUE ) THEN
FOR loop% = 1 TO LEN(repl$)
copy%?coff% = ASC(MID$(repl$, loop%, 1))
coff% += 1
REM And skip over string in main buffer
moff% += LEN(find$)
REM String not a match, so treat this as a regular byte
copy%?coff% = byte%
moff% += 1
coff% += 1
REM Byte was not a match, so copy it over
copy%?coff% = byte%
REM And increment the pointers
moff% += 1
coff% += 1
UNTIL ( moff% = size% )
SYS "Hourglass_Percentage", 0
REM Now save the result so we can look at it
fp% = OPENOUT("OUTPUT_FILE")
SYS "XOS_GBPB", 2, fp%, copy%, coff%
PRINT "Search and replace took "+STR$(TIME-t%)+" centiseconds."
I ran it, got a pleasing result, and then went to bed. And slept until half nine. ☺
The pleasing result? The version built with ABC took nine centiseconds; in other words a fraction less than a tenth of a second.
And since it was so fast, I had a crack at running the pure BASIC version.
In other words, a second and a half.
There is, of course, the same main limitation as before - the replace string cannot be larger than the search string. This is purely because BASIC doesn't have the ability to resize existing memory claims.
Obviously a real search and replace needs to handle many more potential permutations - such as the ability to replace with larger strings (and/or empty strings), optional case insensitivity, optional ignoring of diacritics (that means to treat something like "âïmé" as "aime"), plus these days, potentially handling non-Latin text.
All of that being said, if a simple algorithm written in interpreted BASIC can manage it in a second and a half, remind me again why Google Docs cannot manage it in a few seconds at most on a device that is more powerful?
Okay, I've brought the operation time down to a second and a half in BASIC, or a tenth of a second when compiled. That basically takes the point, stamps it onto a cartoon anvil, and then drops it from a great height.
So... I can let this issue rest now. ☺
Please note that while I check this page every so often, I am not able to control what users write; therefore I disclaim all liability for unpleasant and/or infringing and/or defamatory material. Undesired content will be removed as soon as it is noticed. By leaving a comment, you agree not to post material that is illegal or in bad taste, and you should be aware that the time and your IP address are both recorded, should it be necessary to find out who you are. Oh, and don't bother trying to inline HTML. I'm not that stupid! ☺ ADDING COMMENTS DOES NOT WORK IF READING TRANSLATED VERSIONS.
You can now follow comment additions with the comment RSS feed. This is distinct from the b.log RSS feed, so you can subscribe to one or both as you wish.
|Steve Drain, 11th August 2020, 16:34|
It is too much of a challenge. I have put a very simple replace program at http://kappa.me.uk/Miscellaneous/swReplace.zip, which does replace with longer strings and no memory problems. I would be interested to know if it survives your test and what speed it makes.
|VinceH, 11th August 2020, 18:26|
My solution to the larger string problem in the original version(s) of WebChange was to provide enough space to double the size of the original file. Nobody would hit that as a limit, I thought.
Someone hit that as a limit.
The way I do it now is to not hold the replacement file in memory at all. As the search and replace is carried out, the replacement is written to Scrap as it's built up. When the task is done, it loads it and compares to the original - if they differ, the new file replaces the old one.
Probably not the most efficient approach, but it avoids that problem and could in theory produce a file that wouldn't even fit in memory.
|VinceH, 11th August 2020, 18:29|
(Apart from it not being possible to carry out that final test if the file is that large, of course - but I think I may have implemented a size test first. In fact, if I haven't, I should. That'd be a bit faster anyway, and then I'd only need to compare the contents if the size is the same.)
|Rick, 13th August 2020, 10:43|
And so it's not to be.
When the mechanic said "jeudi prochaine" he meant Thursday in a week not the actual next Thursday (today).
So I've cancelled that, and will need to think about alternatives.
The gearbox oil was changed fairly recently. The engine oil ought to be good for another 2000km or so. Most of when I wanted was a new drive belt and general check up. I've looked at the belt and it seems "okay" (in that there's no obvious signs of cracked rubber or anything).
So the poking and prodding isn't urgent... Which will give me time to think about options.
To put this into context, working an estimated 22 days a month, 25km journey (there and back), it'll be just a little under 2,500km between now and winter holiday if I don't go anywhere else.
|Rick, 13th August 2020, 15:01|
I cannot test the file version, it reports "Outside file" (line 260) both on the test supplied, and on my test (which should make a shorter file).
The in-memory version, which uses the cute trick of swapping bytes with carriage returns, to then use BASIC's string handling keywords to look for matches, does it in an impressive 26 centiseconds. In BASIC!
So, I can't test the file-to-file version, but the process has been reduced to a mere quarter of a second. In BASIC.
Google? Your excuse?
List all b.log entries
Return to the site index
PS: Don't try to be clever.
It's a simple substring match.
Last read at 14:57 on 2020/09/29.
© 2020 Rick Murray
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted.
RIPA notice: No consent is given for interception of page transmission.