File type detection
I was writing an article yesterday when I decided to browse the images that I had prepared to see what order to put them in.
File not found.
Huh?
I refreshed the browser and saw that the contents of my SD card had been scrambled, blended, and otherwise messed with.
Since nothing had been written to the card, I figured that the data ought to be recoverable. I went to DOS (well, NTVDM) and did chkdsk e: /f and let it get on with the job. Now chkdsk can often "recover" a damaged filesystem, but it does it with no intelligence whatsoever. Damaged things are not recovered, they are just hacked out.
So it created eighty billion .CHK files from the "lost fragments".
Most of these 'fragments' are valid files (and since nothing was written to the drive, they should not be corrupt). The only issue is that each one is rounded up to a whole number of filesystem blocks, so its length is a multiple of 32KiB on this card. This may or may not be a problem depending on the filetype.
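The rounding is simple arithmetic. As a rough sketch in Python (not part of the program further down), assuming 32KiB blocks as on this card; `padded_size` and `padding_bytes` are hypothetical helper names made up for the illustration:

```python
BLOCK_SIZE = 32 * 1024  # assumption: 32KiB blocks, as on the card described here


def padded_size(real_size: int, block: int = BLOCK_SIZE) -> int:
    """Length of the recovered fragment for a file of real_size bytes."""
    blocks = (real_size + block - 1) // block  # round up to whole blocks
    return blocks * block


def padding_bytes(real_size: int, block: int = BLOCK_SIZE) -> int:
    """Number of junk bytes tacked on the end of the recovered fragment."""
    return padded_size(real_size, block) - real_size
```

So a 100000 byte photo would come back as a 131072 byte fragment, with 31072 bytes of leftovers on the end. Formats that record their own length or have an end marker tend not to mind; plain text would show the junk.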
So far I've recovered some videos from my phone, all of the originals of the photos of the tram-train trip to Nantes with Mick, and some other random PDFs and such that were not backed up to harddisc. I'm less concerned about the animé as I had copied the important stuff to harddisc.
I'm not feeling inclined to examine what may be thousands of files to see what they are, especially given that there is no guarantee that every fragment is a valid file. Bits and pieces of leftover files will turn up, so don't panic if you see files containing gibberish or bits of files that you recognise.
Anyway, not wanting to wade through all the rubbish, I decided to throw together a program to do the grunt work.
Here it is. A wodge of Visual Basic to look at all the .CHK files and try to work out what sort of file each one is. The path is hardwired to E:\FOUND.000\ as that's what it was on my setup. Amend this as necessary. In hindsight, I probably should have used a "BasePath$" string or something. Oh well...
Option Explicit

Private Type FilesDef
    FileName As String ' the filename
    Exten As String    ' what to rename it as
End Type

Private Sub Form_Load()
    ' Allocate space for 10000 elements. chkdsk won't deal with more on one run.
    ' (if there are more files, you can rename and move but DO NOT WRITE as it
    '  risks overwriting files that COULD be recovered)
    Dim Files(10000) As FilesDef ' it's Windows, stupid memory claims are the norm! :-P
    Dim Entity As String
    Dim ThisCount As Integer
    Dim FileCount As Integer
    Dim YieldCount As Integer
    Dim WordA As Long
    Dim WordB As Long
    Dim WordC As Long
    Dim WordG As Long
    Dim MyFP As Integer
    Dim OldName As String
    Dim NewName As String

    ' Force window open
    Me.doing.Caption = "Initialising..."
    Me.Show
    DoEvents
    Me.Refresh
    DoEvents

    ' Scan loop
    FileCount = 1
    YieldCount = 1
    Entity = Dir("E:\FOUND.000\*.CHK") ' **HARDWIRED** to the path of my SD card
                                       ' Tweak as appropriate.
    Do While (Len(Entity) > 0)
        ' Strip OUT directories...
        If (GetAttr("E:\FOUND.000\" & Entity) And vbDirectory) <> vbDirectory Then
            ' If there is an Entity, write it to array...
            Files(FileCount).FileName = Entity

            ' Now open it up and read the first three words (3 x 4 bytes) and
            ' the word at +24.
            MyFP = FreeFile
            Open "E:\FOUND.000\" & Entity For Binary Access Read Lock Read As MyFP
            Get #MyFP, 1, WordA
            Get #MyFP, 5, WordB
            Get #MyFP, 9, WordC
            Get #MyFP, 25, WordG
            Close #MyFP

            Me.doing.Caption = "Examining file '" & Entity & "' (" & Str(FileCount) & ")..."

            ' Now attempt to guess its type - note the backwards byte order
            ' This list is NOT complete (no 7zip, no WMF, etc). These types are simply
            ' the ones that I expect to be present on my SD card...

            ' Image file types
            ' JPEG files begin "ÿØÿà" / "ÿØÿá" which is 0xFFD8FFE0 / 0xFFD8FFE1
            If (WordA = &HE0FFD8FF) Or (WordA = &HE1FFD8FF) Then _
                Files(FileCount).Exten = "jpeg"

            ' BMP images begin "BMxx x0xx" which is 0x424Dxxxx xx00xxxx
            If (((WordA And &HFFFF&) = &H4D42&) And ((WordB And &HFF00&) = &H0&)) Then _
                Files(FileCount).Exten = "bmp"

            ' GIF files begin "GIF8" which is 0x47494638
            If (WordA = &H38464947) Then Files(FileCount).Exten = "gif"

            ' PNG images begin with byte 0x89 then "PNG" which is 0x89504E47
            If (WordA = &H474E5089) Then Files(FileCount).Exten = "png"

            ' Video and audio file types
            ' MP4 files with "ftyp" in second word (0x66747970)
            If (WordB = &H70797466) Then Files(FileCount).Exten = "mp4"

            ' AVI files have "AVI " in third word (0x41564920)
            ' (can't rely upon "RIFF" in first word as WAV uses the same RIFF container)
            If (WordC = &H20495641) Then Files(FileCount).Exten = "avi"

            ' Matroska files have "matr" in seventh word (0x6D617472)
            ' (usually, I have seen some that differ)
            If (WordG = &H7274616D) Then Files(FileCount).Exten = "mkv"

            ' FLV files begin "FLV[" which is 0x464C5601
            If (WordA = &H1564C46) Then Files(FileCount).Exten = "flv"

            ' SRT subs begin "1<newline>0" which is 0x310D0A30
            If (WordA = &H300A0D31) Then Files(FileCount).Exten = "srt"

            ' MP3 files with ID3 data begin "ID3[" which is 0x49443303
            If (WordA = &H3334449) Then Files(FileCount).Exten = "mp3"

            ' MP3 files without ID3 tags *appear* to begin:
            '   FF F3 xx xx 00 00 xx xx
            '   FF FB xx xx 00 00 xx xx
            ' but since I don't know the meanings of these bytes, leave it.

            ' Archive file types
            ' Zip files begin "PK[]" which is 0x504B0304
            If (WordA = &H4034B50) Then Files(FileCount).Exten = "zip"

            ' RAR archives begin "Rar!" which is 0x52617221
            If (WordA = &H21726152) Then Files(FileCount).Exten = "rar"

            ' Other file types
            ' PDF documents begin "%PDF" which is 0x25504446
            If (WordA = &H46445025) Then Files(FileCount).Exten = "pdf"

            ' HTML begins "<htm" or "<HTM" which is 0x3C68746D / 0x3C48544D
            If (WordA = &H6D74683C) Or (WordA = &H4D54483C) Then _
                Files(FileCount).Exten = "html"
            ' there's also the doctype version, too...

            ' Executables begin "MZ[] " which is 0x4D5A9000
            If (WordA = &H905A4D) Then Files(FileCount).Exten = "exe"

            ' RISC OS DrawFiles begin "Draw" which is 0x44726177
            If (WordA = &H77617244) Then Files(FileCount).Exten = "aff"

            ' In case anything else needs to be trapped
            'If (Files(FileCount).Exten = "") Then
            '    Debug.Print Entity, Hex(WordA), Hex(WordB), Hex(WordC)
            'End If

            FileCount = FileCount + 1
            ' (the Error statement wants an error number, so raise with a description)
            If (FileCount > 10000) Then Err.Raise vbObjectError + 1, , "Too many files."
        End If ' is NOT a directory

        ' Yield?
        YieldCount = YieldCount + 1
        If (YieldCount > 150) Then
            DoEvents
            YieldCount = 0
        End If

        ' Get next entity
        Entity = Dir
    Loop

    ' Now we have enumerated all of the files, time to start renaming them.
    ' (FileCount is one more than the number of files found)
    ThisCount = 1
    Do While (ThisCount < FileCount)
        If (Files(ThisCount).Exten <> "") Then
            Me.doing.Caption = "Renaming file " & Str(ThisCount) & _
                               " of " & Str(FileCount - 1) & "..."
            OldName = "E:\FOUND.000\" & Files(ThisCount).FileName ' original name
            NewName = Left(OldName, Len(OldName) - 3)             ' hack off "CHK"
            NewName = NewName & Files(ThisCount).Exten            ' add new extension
            Name OldName As NewName
        End If

        ' Yield?
        YieldCount = YieldCount + 1
        If (YieldCount > 150) Then
            DoEvents
            YieldCount = 0
        End If

        ThisCount = ThisCount + 1
    Loop

    ' We're done!
    Beep
    End
End Sub
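A note on the "backwards byte order" mentioned in the listing: the program reads four bytes at a time into a Long, and on a PC the first byte in the file ends up as the lowest byte of the number. That is why "%PDF" (25 50 44 46 in the file) is tested as &H46445025. A small sketch in Python, purely to illustrate the point and not part of the VB program; `first_word` is a made-up helper name:

```python
import struct


def first_word(data: bytes) -> int:
    """Return the first four bytes as an unsigned little-endian 32 bit value."""
    return struct.unpack_from("<I", data, 0)[0]


jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])
print(hex(first_word(jpeg_header)))  # 0xe0ffd8ff
print(hex(first_word(b"%PDF")))      # 0x46445025
```

VB's Long is signed, so values with the top bit set (like the JPEG one) show up as negative numbers if printed in decimal. The comparisons still work, as the hex constants are treated as signed Longs too.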
The same principle applies if you want to do the same sort of thing in BBC BASIC.
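The idea carries over to pretty much any language. As a hedged sketch, here is roughly the same signature table in Python (not BBC BASIC, and not a translation of the listing above). The names `SIGNATURES`, `looks_like_mpeg_audio` and `guess_extension` are made up for the illustration. The test for MP3 files without ID3 tags is an addition that is not in the VB program: it assumes the file begins directly with an MPEG audio frame header (eleven set sync bits, then version and layer fields), which fits the FF F3 and FF FB bytes noted in the listing but will not catch every file.

```python
# (offset, magic bytes, extension): the same tests as the VB listing,
# written as byte strings so no byte swapping is needed.
SIGNATURES = [
    (0, b"\xFF\xD8\xFF\xE0", "jpeg"),
    (0, b"\xFF\xD8\xFF\xE1", "jpeg"),
    (0, b"GIF8", "gif"),
    (0, b"\x89PNG", "png"),
    (4, b"ftyp", "mp4"),
    (8, b"AVI ", "avi"),
    (24, b"matr", "mkv"),
    (0, b"FLV\x01", "flv"),
    (0, b"1\r\n0", "srt"),
    (0, b"ID3\x03", "mp3"),
    (0, b"PK\x03\x04", "zip"),
    (0, b"Rar!", "rar"),
    (0, b"%PDF", "pdf"),
    (0, b"<htm", "html"),
    (0, b"<HTM", "html"),
    (0, b"MZ\x90\x00", "exe"),
    (0, b"Draw", "aff"),
]


def looks_like_mpeg_audio(header: bytes) -> bool:
    """Assumption: the file starts with an MPEG audio frame header."""
    if len(header) < 2 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return False                      # the 11 sync bits are not all set
    version = (header[1] >> 3) & 0x03     # 01 is reserved
    layer = (header[1] >> 1) & 0x03       # 00 is reserved
    return version != 0x01 and layer != 0x00


def guess_extension(header: bytes) -> str:
    """Guess an extension from the first 28 bytes of a file, or return ""."""
    for offset, magic, ext in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return ext
    # BMP: "BM", and byte 5 zero (same test as the listing)
    if len(header) >= 6 and header[:2] == b"BM" and header[5] == 0:
        return "bmp"
    if looks_like_mpeg_audio(header):
        return "mp3"
    return ""
```

One difference from the VB version: here the first matching signature wins, whereas the listing runs every test and the last match sticks. For the signatures in this list that should make no practical difference.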