Finessing file types for scrape one

February 28, 2020

Summarized from whitefiles

An archive is a special file used to store one or more files or folders. The most common formats encountered on Mac OS are listed below.

This post is a reference for building the file-type filter, the part of the batch script's command string that selects which file types each Wayback Machine scrape downloads.

For example:

wayback_machine_downloader -s -d URL_IA -t2003 -c6 --only "/\.(hqx|sit|sitx|dd|pkg|abs|bin|sea|cpt|mcf|dmg)$/i" URL
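
To see what the --only filter in the command above would keep, the same expression can be tried against a few file names before running a scrape. This is only a minimal sketch: the sample paths are hypothetical, matches_filter is a name made up for this illustration, and it assumes the downloader applies the filter to each archived file's URL, with grep -E -i standing in for the /.../i regular expression.

```shell
#!/bin/sh
# Extension list copied from the example command above.
pattern='\.(hqx|sit|sitx|dd|pkg|abs|bin|sea|cpt|mcf|dmg)$'

# Read names on stdin and print only those the filter would keep.
# -E enables the (a|b) alternation; -i mirrors the trailing /i flag.
matches_filter() {
  grep -E -i -- "$pattern"
}

# Hypothetical sample paths, one per line.
printf '%s\n' \
  'downloads/Game.SIT' \
  'downloads/readme.txt' \
  'downloads/tool.sit.hqx' \
  'downloads/disk.dmg.html' | matches_filter
```

Only the Game.SIT and tool.sit.hqx paths are printed. The last sample is skipped because the $ anchor requires the listed extension to be the final one in the name.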

File types

(Mac OS X) Disk Image

.dmg • devi | file extension** - creator**

This file stores the contents of a ‘virtual disk’ as a disk image. The ‘disk’ containing the original material is mounted on the desktop whenever it is opened. These archives can be created using Disk Utility in Mac OS X or Disk Copy on Mac OS 9. These files are exclusive to Mac OS X and above and can’t be opened in the Classic Mac OS.

(Mac OS 9) Disk Image

.img .image • rohd/dlmg/dimg | file extension - creator

Similar to .dmg files, these older file formats are created by Disk Copy, ShrinkWrap and other Classic Mac OS applications. As these files contain resource forks, they must be encoded prior to transfer over the internet.

(Mac OS 9) Disk Image - Self Mounting

.smi • APPL | file extension - creator

Similar to .dmg files, these older file formats are created by Disk Copy, ShrinkWrap and other Classic Mac OS applications. As these files contain resource forks, they must be encoded prior to transfer over the internet.

(Mac OS 9) StuffIt version 1.5.1

.sit • SIT! | file extension - creator

v1.5.1 archives can be unstuffed using modern applications.

(Mac OS 9) StuffIt versions 2, 3 and 4

.sit • SITD | file extension - creator

v2/3/4 can be unstuffed by modern applications, although they can’t be created by StuffIt 5.x applications.

(Mac OS 9) StuffIt version 5.x

.sit • SIT5 | file extension - creator

v5.x can only be created with StuffIt 5.x or later applications. StuffIt 5.x software is also required for unstuffing.

(Mac OS 9/X) StuffIt version 7.x

.sitx • ???? | file extension - creator (ResEdit is unable to locate a signature, which may account for the unknown creator)

This format yields a higher rate of compression, supports files larger than 4 gigabytes and is compatible with StuffIt Deluxe 7.x or higher.

(Mac OS 9 and below) Compact Pro

.cpt • PACT | file extension - creator

A Classic Mac OS archive containing one or more files, complete with Finder information and resource forks. These archives were originally created and expanded using Compact Pro or the older Compactor application. Freeware utilities such as cptExpand and Extractor and other Aladdin products can be used for decompression.

(Mac OS 9 and below) DiskDoubler

.dd • DD01 | file extension - creator

This type of file was originally created by Symantec’s DiskDoubler but can also be expanded with Aladdin products or with the freeware DDExpand.

(Mac OS 9 and below) PackIt

.pit • PIT | file extension - creator

These ancient files were originally created using PackIt, although they can also be expanded with Aladdin products or with the freeware Unpit.

(Mac OS 9 and below) Self-Extracting Archive

.sea • APPL | file extension - creator

A Classic Mac OS application program created with a StuffIt application. This kind of file ‘self-extracts’ its contents when you double-click on it from within the Classic Mac OS environment.


Archives marked * will be downloaded in Phase 1

Type of Archive | Extensions
AppleLink * | .pkg
AppleSingle * | .as
BinHex * | .hqx
BZip | .bz2, .bz, .tbz
Compact Pro * | .cpt
GZip | .gz, .tgz
LHa | .lha, .lhz
MacBinary * | .bin
MIME/Base64 | .mime, .mim
Private File | .pf
RAR | .rar
Self-extracting archive (SEA) * | .sea
StuffIt * | .sit
StuffIt X * | .sitx
TAR | .tar
Unix Compress | .Z, .taz
UU | .uu, .uue
Zip | .zip
Classic Mac OS self-extracting archive (SEA) * | .sea
DiskDoubler * | .dd
PackIt * | .pit

Resulting command line

wayback_machine_downloader -s -d URL_IA -c6 --only "/\.(pkg|as|hqx|cpt|bin|sea|sit|sitx|dd|pit)$/i" URL
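
The filter string in the command above can also be assembled from a plain list of extensions, which may make it easier to adjust between scrapes. The following is a sketch under assumptions rather than the script actually used: build_only_filter is a hypothetical helper name, and the command is printed instead of executed so it can be checked first.

```shell
#!/bin/sh
# Join the given extensions into the regex literal passed to --only.
build_only_filter() {
  joined=$(printf '%s|' "$@")            # e.g. "pkg|as|hqx|"
  printf '/\\.(%s)$/i\n' "${joined%|}"   # strip the trailing "|" and wrap
}

# Extensions marked * in the table above.
filter=$(build_only_filter pkg as hqx cpt bin sea sit sitx dd pit)

# Print the command for review; URL_IA and URL are the post's placeholders.
printf '%s\n' "wayback_machine_downloader -s -d URL_IA -c6 --only \"$filter\" URL"
```

Calling the helper with a different list, for example pdf html htm txt, yields the matching filter for a later scrape.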

ResEdit, file extensions and creators**

Field Name Type | A file’s Type (‘APPL’ for applications, for example).
Field Name Creator | A file’s Creator. An application and its associated documents all have the same Creator.

Every file has a Type field and a Creator field, which the Finder uses to establish the relationship between a document and the application that created it. The Type and Creator are both four characters long (just like a resource type). The Creator field of a document file is the same as the Creator of the application that created it. Applications must have unique Creators so the Finder will know which document files belong to which application. The Type field of a document file distinguishes between different document types of the same application. The Type field of an application is always set to APPL so the Finder will know it’s an application.
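
The Type and Creator relationship described above can be sketched as a small lookup: the owning application is the entry whose Type is APPL and whose Creator equals the document's Creator. Everything in this example is an assumption made for illustration. The catalogue entries, the codes EXWR and EXPT, and the function names are invented, and it does not read real Finder data.

```shell
#!/bin/sh
# Hypothetical catalogue, one file per line: name, Type, Creator (tab-separated).
catalogue() {
  printf '%s\t%s\t%s\n' \
    'ExampleWriter' 'APPL' 'EXWR' \
    'ExamplePaint'  'APPL' 'EXPT' \
    'letter'        'TEXT' 'EXWR' \
    'sketch'        'PICT' 'EXPT'
}

# Print the application that shares its Creator with the named document.
owning_app() {
  catalogue | awk -F '\t' -v doc="$1" '
    $2 == "APPL" { app[$3] = $1 }   # index each application by its Creator
    $1 == doc    { creator = $3 }   # remember the Creator of the document
    END { if (creator in app) print app[creator] }
  '
}

owning_app letter
```

Here letter resolves to ExampleWriter because both carry the Creator EXWR, while its Type (TEXT) only says what kind of document it is.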


When editing the BNDL resource in ResEdit, the Signature field (Field Name Signature) holds the same value as the Creator. It applies to the application and its documents, and the Finder uses it to find a document’s Creator. ResEdit was a quintessential tool for Mac power users. For further reading, see The Genius of Mac: ResEdit and resources on The Eclectic Light Company’s site and ResEdit Reference: For ResEdit 2.1 in Apple’s archived document library.

Scrape two

Preliminarily, .pdf, .html, .htm and .txt file types are potential candidates.

Scrape three

Mac OS X file types.