TorrentZip

This specification is intended to define the implementation of zip as used by the TorrentZip standard.

Archive format

General format of a torrentzipped .zip file with n files:

Archive Start
[local file header 1]
[file data 1]
[local file header 2]
[file data 2]
[local file header n]
[file data n]
Central Directory - Start (SOCD file offset)
[central directory file 1]
[central directory file 2]
[central directory file n]
Central Directory - End (EOCD file offset)
[end of central directory record]
Archive End

Local file header x: (Showing torrentzipped default values):

Type | Attribute | Value | Description
UInt32 | Local file header signature | 0x04034b50 |
UInt16 | Version needed to extract | 20 or 45 | 20 = File is compressed using Deflate compression / 45 if this record contains zip64 information
UInt16 | General purpose bit flag | 2 | Maximum compression option was used, bit 11 (0x800) is set for unicode filename
UInt16 | Compression method | 8 | The file is Deflated
UInt16 | Last mod file time | 48128 | 11:32 PM
UInt16 | Last mod file date | 8600 | 12/24/1996
UInt32 | CRC32 | | File CRC32
UInt32 | Compressed size | | File Compressed Size
UInt32 | Uncompressed size | | File Uncompressed Size
UInt16 | Filename length | | Filename length
UInt16 | Extra field length | | Normally 0, Length of Extra field data if zip64 extra field information is included
Byte[] | Filename (variable size) | | Byte array of filename

The default values shown are required to produce consistent torrentzipped files. The default time and date, 11:32 PM on 12/24/1996, is the date of the first ever MAME release.
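As a minimal sketch of the table above, the following Python packs a local file header with the torrentzipped defaults and shows how the time and date values 48128 and 8600 come from the MS-DOS bit layout. The function names are hypothetical, and zip64 (version 45 with an extra field) is deliberately left out.

```python
import struct

def dos_time(hour, minute, second):
    # MS-DOS time: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds / 2
    return (hour << 11) | (minute << 5) | (second // 2)

def dos_date(year, month, day):
    # MS-DOS date: bits 15-9 years since 1980, bits 8-5 month, bits 4-0 day
    return ((year - 1980) << 9) | (month << 5) | day

TZ_TIME = dos_time(23, 32, 0)      # 11:32 PM
TZ_DATE = dos_date(1996, 12, 24)   # 12/24/1996

def local_file_header(name_bytes, crc32, compressed_size, uncompressed_size,
                      utf8=False):
    """Pack a non-zip64 local file header (hypothetical helper)."""
    flags = 2 | (0x800 if utf8 else 0)   # bit 11 marks a UTF-8 filename
    fixed = struct.pack(
        "<IHHHHHIIIHH",                  # little-endian, 30 bytes
        0x04034B50, 20, flags, 8, TZ_TIME, TZ_DATE,
        crc32, compressed_size, uncompressed_size,
        len(name_bytes), 0)              # extra field length 0
    return fixed + name_bytes
```

The fixed part of the header is 30 bytes, followed directly by the filename bytes.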

File data x:

The compressed data must be exactly what ZLib version 1.1.3 produces when using maximum compression (level 9).
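The sketch below shows the general shape of producing a raw Deflate stream at level 9 with Python's zlib module. It is an illustration only: Python links against whichever zlib version is installed, so the output is not guaranteed to be byte-identical to ZLib 1.1.3, and the helper name is hypothetical.

```python
import zlib

def deflate_raw(data: bytes) -> bytes:
    """Compress at level 9 as a raw Deflate stream (hypothetical helper)."""
    # wbits=-15 omits the zlib header and trailer; a zip entry stores
    # only the raw Deflate data.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

payload = b"TorrentZip" * 100
packed = deflate_raw(payload)

# The CRC32 stored in the headers is taken over the uncompressed data.
crc = zlib.crc32(payload) & 0xFFFFFFFF
```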

Central Directory file x: (Showing torrentzipped default values):

Type | Attribute | Value | Description
UInt32 | Central file header signature | 0x02014b50 |
UInt16 | Version made by | 0 | MS-DOS and OS/2 (FAT/FAT32 file systems)
UInt16 | Version needed to extract | 20 or 45 | 20 = File is compressed using Deflate compression / 45 if this record contains zip64 information
UInt16 | General purpose bit flag | 2 | Maximum compression option was used, bit 11 (0x800) is set for unicode filename
UInt16 | Compression method | 8 | The file is Deflated
UInt16 | Last mod file time | 48128 | 11:32 PM
UInt16 | Last mod file date | 8600 | 12/24/1996
UInt32 | CRC32 | | File CRC32
UInt32 | Compressed size | | File Compressed Size
UInt32 | Uncompressed size | | File Uncompressed Size
UInt16 | File name length | | Filename length
UInt16 | Extra field length | | Normally 0, Length of Extra field data if zip64 extra field information is included
UInt16 | File comment length | 0 | No file comment
UInt16 | Disk number start | 0 | Multi disk storage not used so set to disk 0
UInt16 | Internal file attributes | 0 | No internal attributes
UInt32 | External file attributes | 0 | No external attributes
UInt32 | Relative offset of local header | | File offset of this file's Local Header
Byte[] | File name (variable size) | | Byte array of filename
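A scanner could compare each central directory record against these defaults. The following is a rough sketch of such a check, not how any particular TorrentZip tool does it; the field names and the function are hypothetical.

```python
import struct

CD_FORMAT = "<IHHHHHHIIIHHHHHII"   # 46 bytes of fixed fields, little-endian
CD_FIELDS = ("signature", "version_made_by", "version_needed", "flags",
             "method", "mod_time", "mod_date", "crc32", "compressed_size",
             "uncompressed_size", "name_length", "extra_length",
             "comment_length", "disk_start", "internal_attrs",
             "external_attrs", "local_header_offset")

EXPECTED = {"signature": 0x02014B50, "version_made_by": 0, "method": 8,
            "mod_time": 48128, "mod_date": 8600, "comment_length": 0,
            "disk_start": 0, "internal_attrs": 0, "external_attrs": 0}

def check_central_record(record: bytes) -> list:
    """Return the names of fields that differ from the defaults."""
    values = dict(zip(CD_FIELDS, struct.unpack_from(CD_FORMAT, record)))
    problems = [name for name, want in EXPECTED.items()
                if values[name] != want]
    if values["version_needed"] not in (20, 45):
        problems.append("version_needed")
    if (values["flags"] & ~0x800) != 2:    # only bit 11 may vary
        problems.append("flags")
    return problems
```

An empty list means every fixed field matches the table. Variable fields such as the CRC32, sizes, and offsets are not checked here.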

End of Central Directory:

Type | Attribute | Value | Description
UInt32 | End of central dir signature | 0x06054b50 |
UInt16 | Number of this disk | 0 | Multi disk storage not used so set to disk 0
UInt16 | Number of the disk with the start of the central directory | 0 | Multi disk storage not used so set to disk 0
UInt16 | Total number of entries in the central directory on this disk | n | Total number of files
UInt16 | Total number of entries in the central directory | n | Total number of files
UInt32 | Size of the central directory | EOCD-SOCD | Length of the central directory
UInt32 | Offset of start of central directory with respect to the starting disk number | SOCD | Start of central directory
UInt16 | .ZIP file comment length | 22 | Torrentzipped comment
Byte[22] | .ZIP file comment | TORRENTZIPPED-XXXXXXXX |

See above 'General format of a torrentzipped .zip file with n files' for SOCD & EOCD

The TorrentZipped file comment

The .ZIP file comment in the End of Central Directory is used to check the validity of the torrentzipped file.

The comment must be exactly the 22 bytes TORRENTZIPPED-XXXXXXXX, where XXXXXXXX is the CRC32 of the central directory records (the bytes in the file between SOCD and EOCD) written as upper-case hexadecimal text.

Each file's CRC32 and sizes are recorded in its central directory record, so any change made to the files within the zip changes the central directory bytes, and this checksum will no longer match them. In this way the validity of a torrentzip file can be checked.
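The comment rule can be sketched as follows: compute the CRC32 over the central directory bytes, format it as eight upper-case hex digits, and append it to the end of central directory record. The check function reverses the process for an archive held in memory. All function names are hypothetical, and zip64 archives are not handled.

```python
import struct
import zlib

EOCD_FORMAT = "<IHHHHIIH"   # 22 bytes of fixed fields, comment follows

def torrentzip_comment(central_directory: bytes) -> bytes:
    """CRC32 of the central directory as TORRENTZIPPED-XXXXXXXX."""
    crc = zlib.crc32(central_directory) & 0xFFFFFFFF
    return b"TORRENTZIPPED-" + f"{crc:08X}".encode("ascii")

def end_of_central_directory(central_directory: bytes, socd: int,
                             entries: int) -> bytes:
    comment = torrentzip_comment(central_directory)
    fixed = struct.pack(EOCD_FORMAT, 0x06054B50, 0, 0, entries, entries,
                        len(central_directory), socd, len(comment))
    return fixed + comment

def has_valid_comment(archive: bytes) -> bool:
    """Check the comment of a non-zip64 archive (hypothetical helper)."""
    if len(archive) < 44:                  # 22-byte record + 22-byte comment
        return False
    record, comment = archive[-44:-22], archive[-22:]
    sig, _, _, _, _, size, socd, comment_len = struct.unpack(EOCD_FORMAT, record)
    if sig != 0x06054B50 or comment_len != 22:
        return False
    if socd + size != len(archive) - 44:   # central directory must end at EOCD
        return False
    return comment == torrentzip_comment(archive[socd:socd + size])
```

Because the record and its comment are always 44 bytes in the non-zip64 case, the check can read them from a fixed position at the end of the file instead of searching for the signature.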

Filename Encoding

The filenames of the compressed files in a zip file are stored in the local header and the central directory as byte arrays. Zip was originally built on early IBM PCs, and as such uses code page 437 to convert a string to a byte array to store the filenames. With the arrival of Unicode, several different methods were added to the official zip format to permit Unicode filenames to be stored in a zip file. The Trrntzip format uses the general purpose bit 11 method. To store a filename in a trrntzip zip file, first check whether the filename can be stored using code page 437. If it cannot, UTF-8 encoding should be used in the byte arrays, and this is indicated by setting bit 11 of the General Purpose Bit Flag in both the local header and the central directory.
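A minimal sketch of this rule in Python, assuming Python's built-in cp437 codec is an acceptable stand-in for the code page 437 mapping (the function name is hypothetical):

```python
def encode_filename(name: str):
    """Return (name_bytes, general_purpose_flags) for one entry."""
    try:
        # Prefer code page 437 whenever every character fits.
        return name.encode("cp437"), 2
    except UnicodeEncodeError:
        # Otherwise fall back to UTF-8 and set bit 11 (0x800).
        return name.encode("utf-8"), 2 | 0x800
```

The same bytes and flag value would be written to both the local header and the central directory record for the entry.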

File order within a TorrentZip

For the creation of consistent torrentzipped files, the file order is also very important. Files must be sorted by filename, comparing the names in lower case.

Directory separator character

As zips only store files (not directories), files in directories are represented by storing a relative path to the filename. For example file test1.rom in directory set1 would be stored with a filename of set1/test1.rom. Some zipping programs will store this as set1\test1.rom.

This leads to a possible naming inconsistency. The zip file format states: “All slashes should be forward slashes ‘/’ as opposed to backwards slashes ‘\’”. So TorrentZip changes all \ characters to /. This must be done before sorting, to ensure that the sort is performed correctly.
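The ordering rules can be sketched as below. This assumes that a "lower case sort" means comparing the lower-cased names code point by code point. The text does not say how names that differ only in case are ordered, so the sketch simply keeps their input order. The function name is hypothetical.

```python
def torrentzip_order(names):
    """Normalize separators, then sort by lower-cased name."""
    normalized = [name.replace("\\", "/") for name in names]
    return sorted(normalized, key=str.lower)

# Why the separator must be fixed first: "\" sorts after "0",
# but "/" sorts before it, so the order of these two names flips.
before = sorted(["set10", "set1\\x"], key=str.lower)
after = torrentzip_order(["set10", "set1\\x"])
```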

Directory entries and empty directories

A directory entry is stored in a zip by adding a file entry whose name ends in the directory separator character and whose size and CRC are zero. So directory set1 would be stored as a zero-length, zero-CRC file called set1/. Some zip programs, when adding the previously mentioned file set1/test1.rom, will also add the directory entry set1/, which creates an inconsistency. In this example the set1/ directory entry is unnecessary, as the filename set1/test1.rom implies the existence of the set1/ directory. To resolve this inconsistency, unneeded directory entries should be removed from the zip. The only directory entries needed are those for empty directories that are not implied by any file entry.

Example:

Filename | Size | CRC
set1/ | 0 | 00000000
set1/test1.rom | 1024 | 53AC4D00
set2/ | 0 | 00000000

The set1/ entry should be removed, as it is implied by the set1/test1.rom file. The set2/ entry should be kept to create the empty directory, as removing it would completely remove the set2 directory.
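The pruning rule from this example can be sketched as follows. It assumes names already use / as the separator and are compared case-sensitively, and it treats any entry beneath a directory (including a nested directory entry) as implying that directory. The function name is hypothetical.

```python
def prune_directories(names):
    """Drop directory entries that are implied by another entry."""
    kept = []
    for name in names:
        implied = name.endswith("/") and any(
            other != name and other.startswith(name) for other in names)
        if not implied:
            kept.append(name)   # files and genuinely empty directories
    return kept

entries = ["set1/", "set1/test1.rom", "set2/"]
result = prune_directories(entries)
```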

Repeat files

Another test that could be performed is checking for repeated file entries inside the zip. Most zip programs handle these poorly and silently ignore the repeat, giving the user no way of knowing there is a repeated filename problem. It would fix another possible inconsistency if torrentzip scanning at least warned when a repeated filename is found inside a zip.
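One possible shape for such a warning is sketched below. It compares names exactly after separator normalization. Whether names that differ only in case should also count as repeats is not stated here, so that is left out. The function name is hypothetical.

```python
from collections import Counter

def repeated_names(names):
    """Return the names that occur more than once, sorted."""
    counts = Counter(name.replace("\\", "/") for name in names)
    return sorted(name for name, count in counts.items() if count > 1)

for name in repeated_names(["a.rom", "set1/b.rom", "a.rom", "set1\\b.rom"]):
    print(f"warning: repeated filename in zip: {name}")
```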

torrentzip.1668971265.txt.gz · Last modified: 2022/11/20 11:07 by johnsanc