
RVZstd

This specification defines the implementation of ZIP archives as used by the RVZstd standard. This standard is RomVault's implementation of deterministic ZIP archives compressed with zstd.

Archive format

General format of an RVZstd .zip archive with n files:

Archive Start
[local file header 1]
[file data 1]
[local file header 2]
[file data 2]
[local file header n]
[file data n]
Central Directory - Start (SOCD file offset)
[central directory file 1]
[central directory file 2]
[central directory file n]
Central Directory - End (EOCD file offset)
[end of central directory record]
Archive End

Local file header x: (Showing RVZstd default values):

Type | Attribute | Value | Description
UInt32 | Local file header signature | 04034B50 | Static signature value
UInt16 | Version needed to extract | 3F00 | 63: Used for modern compression algorithms like LZMA and zstd
UInt16 | General purpose bit flag | | Bit 2: (0x200) Maximum compression option was used. Bit 11: (0x800) is set for unicode filename
UInt16 | Compression method | 5D00 | 93: The file is compressed with zstd
UInt16 | Last mod file time | 0000 | Zeroed time, 12:00:00 AM
UInt16 | Last mod file date | 0000 | Zeroed date, 1/1/0001
UInt32 | CRC32 | | File CRC32
UInt32 | Compressed size | | File compressed size
UInt32 | Uncompressed size | | File uncompressed size
UInt16 | Filename length | | Filename length
UInt16 | Extra field length | | Normally 0. Length of the extra field data if zip64 extra field information is included
Byte[] | Filename | (variable size) | Byte array of filename

The default values are required to ensure consistent RVZstd files. Unlike torrentzip, RVZstd uses zeroed values for date and time instead of the date/time of the first MAME release.
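
As an illustration of the layout in the table above, here is a minimal Python sketch that packs a local file header with the fixed RVZstd values. The function name `build_local_header` and its parameters are hypothetical, not part of RomVault. The general purpose bit flag is left to the caller, and placing the extra field directly after the filename follows the general ZIP layout rather than anything stated in the table.

```python
import struct

LOCAL_HEADER_SIGNATURE = 0x04034B50  # written little-endian as bytes 50 4B 03 04
VERSION_NEEDED = 63                  # written as bytes 3F 00
METHOD_ZSTD = 93                     # written as bytes 5D 00


def build_local_header(filename: bytes, flags: int, crc32: int,
                       compressed_size: int, uncompressed_size: int,
                       extra: bytes = b"") -> bytes:
    """Pack a local file header (hypothetical helper, assumptions as noted above)."""
    fixed = struct.pack(
        "<IHHHHHIIIHH",
        LOCAL_HEADER_SIGNATURE,
        VERSION_NEEDED,
        flags,              # general purpose bit flag, supplied by the caller
        METHOD_ZSTD,
        0,                  # last mod file time, zeroed
        0,                  # last mod file date, zeroed
        crc32,
        compressed_size,
        uncompressed_size,
        len(filename),
        len(extra),         # normally 0
    )
    return fixed + filename + extra
```

The fixed part is always 30 bytes, so the file data for an entry starts at the header offset plus 30 plus the filename and extra field lengths.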

File data x:

The data must be compressed with exactly zstd version 1.5.5 at level 19, without long distance matching, training, or any other parameters altered. All files and empty directories must be compressed with the zstd method, which is method 93.

Central Directory file x: (Showing RVZstd default values):

Type | Attribute | Value | Description
UInt32 | Central file header signature | 02014B50 | Static signature value
UInt16 | Version made by | 0000 | MS-DOS and OS/2 (FAT/FAT32 file systems)
UInt16 | Version needed to extract | 3F00 | 63: Used for modern compression algorithms like LZMA and zstd
UInt16 | General purpose bit flag | | Bit 2: (0x200) Maximum compression option was used. Bit 11: (0x800) is set for unicode filename
UInt16 | Compression method | 5D00 | 93: The file is compressed with zstd
UInt16 | Last mod file time | 0000 | Zeroed time, 12:00:00 AM
UInt16 | Last mod file date | 0000 | Zeroed date, 1/1/0001
UInt32 | CRC32 | | File CRC32
UInt32 | Compressed size | | File compressed size
UInt32 | Uncompressed size | | File uncompressed size
UInt16 | File name length | | Filename length
UInt16 | Extra field length | | Normally 0. Length of the extra field data if zip64 extra field information is included
UInt16 | File comment length | 0000 | No file comment
UInt16 | Disk number start | 0000 | Multi disk storage not used so set to disk 0
UInt16 | Internal file attributes | 0000 | No internal attributes
UInt32 | External file attributes | 0000 | No external attributes
UInt32 | Relative offset of local header | | File offset of this file's local header
Byte[] | File name | (variable size) | Byte array of filename
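
The central directory record can be packed the same way as any fixed-layout structure. The following Python sketch is only an illustration of the field order in the table above. `build_central_record` and its parameters are hypothetical names, the general purpose bit flag is supplied by the caller, and zip64 handling is not covered.

```python
import struct

CENTRAL_HEADER_SIGNATURE = 0x02014B50  # written little-endian as bytes 50 4B 01 02


def build_central_record(filename: bytes, flags: int, crc32: int,
                         compressed_size: int, uncompressed_size: int,
                         local_header_offset: int, extra: bytes = b"") -> bytes:
    """Pack one central directory record (hypothetical helper)."""
    fixed = struct.pack(
        "<IHHHHHHIIIHHHHHII",
        CENTRAL_HEADER_SIGNATURE,
        0,                    # version made by: MS-DOS and OS/2
        63,                   # version needed to extract
        flags,                # general purpose bit flag, supplied by the caller
        93,                   # compression method: zstd
        0,                    # last mod file time, zeroed
        0,                    # last mod file date, zeroed
        crc32,
        compressed_size,
        uncompressed_size,
        len(filename),
        len(extra),           # normally 0
        0,                    # file comment length
        0,                    # disk number start
        0,                    # internal file attributes
        0,                    # external file attributes
        local_header_offset,  # file offset of this file's local header
    )
    return fixed + filename + extra
```

The fixed part is 46 bytes. Concatenating the records for all n files, in archive order, gives the byte range between SOCD and EOCD.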

End of Central Directory:

Type | Attribute | Value | Description
UInt32 | End of central dir signature | 06054B50 | Static signature value
UInt16 | Number of this disk | 0000 | Multi disk storage not used so set to disk 0
UInt16 | Number of the disk with the start of the central directory | 0000 | Multi disk storage not used so set to disk 0
UInt16 | Total number of entries in the central directory on this disk | n | Total number of files
UInt16 | Total number of entries in the central directory | n | Total number of files
UInt32 | Size of the central directory | EOCD-SOCD | Length of the central directory
UInt32 | Offset of start of central directory with respect to the starting disk number | SOCD | Start of central directory
UInt16 | .ZIP file comment length | 0F00 | 15: RVZstd comment
Byte[15] | .ZIP file comment | 52565A5354442D… | RVZSTD-XXXXXXXX

See 'General format of an RVZstd .zip archive with n files' above for SOCD and EOCD.
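
To make the relationship between SOCD, EOCD and the record fields concrete, here is a small Python sketch that packs the end of central directory record. The name `build_eocd` and its parameters are hypothetical, and the comment bytes are passed in rather than computed here.

```python
import struct

EOCD_SIGNATURE = 0x06054B50  # written little-endian as bytes 50 4B 05 06


def build_eocd(entry_count: int, socd: int, eocd: int, comment: bytes) -> bytes:
    """Pack the end of central directory record (hypothetical helper).

    socd and eocd are the file offsets of the start and end of the
    central directory, as in the general archive layout.
    """
    fixed = struct.pack(
        "<IHHHHIIH",
        EOCD_SIGNATURE,
        0,             # number of this disk
        0,             # disk with the start of the central directory
        entry_count,   # entries in the central directory on this disk
        entry_count,   # entries in the central directory
        eocd - socd,   # size of the central directory
        socd,          # offset of the start of the central directory
        len(comment),  # 15 for an RVZstd comment
    )
    return fixed + comment
```

With a 15-byte comment the record is 22 + 15 = 37 bytes long, and it is the last thing in the archive.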

The RVZstd archive comment

The ZIP file comment in the End of Central Directory is used to check the validity of the RVZstd file.

The comment must be exactly the 15 bytes RVZSTD-XXXXXXXX, where XXXXXXXX is the CRC32 of the central directory records (the bytes in the file between SOCD and EOCD), stored as upper-case hexadecimal text.

This comment ensures that if any change is made to the files within the zip this checksum will no longer match the byte data in the central directory, and in this way we can check the validity of an RVZstd file.
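
A minimal Python sketch of this check is shown below. It assumes the whole archive is held in memory, that the end of central directory record with its 15-byte comment is the last 37 bytes of the file, and that no zip64 records are involved. The names `rvzstd_comment` and `has_valid_comment` are illustrative only.

```python
import struct
import zlib

EOCD_SIGNATURE = 0x06054B50
EOCD_FIXED_SIZE = 22  # fixed part of the end of central directory record
COMMENT_SIZE = 15     # length of "RVZSTD-XXXXXXXX"


def rvzstd_comment(central_directory: bytes) -> bytes:
    """Build the comment from the bytes between SOCD and EOCD."""
    crc = zlib.crc32(central_directory) & 0xFFFFFFFF
    return b"RVZSTD-" + f"{crc:08X}".encode("ascii")


def has_valid_comment(archive: bytes) -> bool:
    """Return True if the archive ends with a matching RVZstd comment."""
    tail_size = EOCD_FIXED_SIZE + COMMENT_SIZE
    if len(archive) < tail_size:
        return False
    tail = archive[-tail_size:]
    (signature, _disk, _cd_disk, _count_disk, _count_total,
     cd_size, cd_offset, comment_len) = struct.unpack(
        "<IHHHHIIH", tail[:EOCD_FIXED_SIZE])
    if signature != EOCD_SIGNATURE or comment_len != COMMENT_SIZE:
        return False
    central_directory = archive[cd_offset:cd_offset + cd_size]
    if len(central_directory) != cd_size:
        return False
    return tail[EOCD_FIXED_SIZE:] == rvzstd_comment(central_directory)
```

Because the central directory holds each file's name, CRC32, sizes and local header offset, a change to any of those values changes the CRC32 of the directory and the comparison fails.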

Filename Encoding

The filenames of the compressed files in a zip archive are stored in the local header and the central directory as byte arrays. Zip was originally built on early IBM PCs, and as such uses code page 437 to convert a string to a byte array when storing filenames. With the arrival of Unicode, several different methods were added to the official zip format to permit Unicode filenames to be stored in a zip file. The RVZstd format uses the general purpose bit 11 method. To store a filename in an RVZstd zip file, first check whether the filename can be stored using code page 437. If it cannot, encode the byte arrays as UTF-8 and indicate this by setting bit 11 of the General Purpose Bit Flag in both the local header and the central directory.
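
The encoding choice can be sketched in a few lines of Python. This is only an illustration: it assumes Python's built-in `cp437` codec matches the code page 437 mapping that RomVault uses, which the specification does not state, and `encode_filename` is a hypothetical name.

```python
UTF8_FLAG = 0x0800  # bit 11 of the general purpose bit flag


def encode_filename(name: str) -> tuple[bytes, int]:
    """Return the filename bytes and the flag bits to OR into the header.

    Code page 437 is tried first; UTF-8 with bit 11 set is the fallback.
    """
    try:
        return name.encode("cp437"), 0
    except UnicodeEncodeError:
        return name.encode("utf-8"), UTF8_FLAG
```

The same bytes and the same flag value would then be written to both the local header and the central directory record for that file.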

File order within an RVZstd archive

For the creation of consistent RVZstd files, the file order is also very important. Files must be sorted by filename using a lower case sort.

Directory separator character

As zips only store files (not directories), files in directories are represented by storing a relative path to the filename. For example file test1.rom in directory set1 would be stored with a filename of set1/test1.rom. Some zipping programs will store this as set1\test1.rom.

This leads to a possible naming inconsistency. The zip file format states: “All slashes should be forward slashes ‘/’ as opposed to backwards slashes ‘\’”. So RomVault will change all \ characters to /. This must be done before sorting, to ensure that the sort is performed correctly.
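
The separator normalization and the sort might be sketched as follows in Python. The specification only says "a lower case sort", so comparing `str.lower()` results by code point is an assumption here, as is leaving the relative order of names that differ only by case unchanged. `normalize_and_sort` is a hypothetical name.

```python
def normalize_and_sort(names: list[str]) -> list[str]:
    """Replace backslashes with forward slashes, then sort by lower-cased name."""
    # Normalize first, so that the sort sees the final stored filenames.
    normalized = [name.replace("\\", "/") for name in names]
    # sorted() is stable, so names with equal lower-case keys keep their order.
    return sorted(normalized, key=str.lower)
```

For example, `normalize_and_sort(["b.rom", "A.rom", "set1\\z.rom", "Set1/a.rom"])` gives `["A.rom", "b.rom", "Set1/a.rom", "set1/z.rom"]`.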

Directory entries and empty directories

A directory entry is stored in a zip by adding a file entry ending in a directory separator character, with a zero size and zero CRC. So directory set1 would be stored as a zero-length, zero-CRC file called set1/. Some zip programs, when adding the previously mentioned file set1/test1.rom, will also add the directory set1/, which creates an inconsistency problem. In this example the set1/ directory entry is unnecessary, as the filename set1/test1.rom implies the existence of the set1/ directory. To resolve this inconsistency, unneeded directory entries should be removed from the zip. The only needed directory entries are empty directories that are not implied by any file entries.

Example:

Filename | Size | CRC
set1/ | 0 | 00000000
set1/test1.rom | 1024 | 53AC4D00
set2/ | 0 | 00000000

The set1/ entry should be removed, as it is implied by the set1/test1.rom file. The set2/ entry should be kept to create the empty directory, as removing it would completely remove the set2 directory.
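
A small Python sketch of this rule follows. It assumes the names already use forward slashes, that matching is case sensitive, and that a directory entry counts as implied when any other entry (file or nested directory) starts with its name. These are assumptions, and `prune_directories` is a hypothetical name.

```python
def prune_directories(names: list[str]) -> list[str]:
    """Drop directory entries that are implied by another entry."""
    kept = []
    for name in names:
        is_directory = name.endswith("/")
        implied = is_directory and any(
            other != name and other.startswith(name) for other in names
        )
        if not implied:
            kept.append(name)
    return kept
```

Applied to the example above, `prune_directories(["set1/", "set1/test1.rom", "set2/"])` returns `["set1/test1.rom", "set2/"]`. The pairwise scan is quadratic in the number of entries, which keeps the sketch short but would be worth replacing for very large archives.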

Repeat files

Another test that could be performed is checking for repeat file entries inside the zip. Most zip programs have a hard time handling repeats and will silently ignore them, giving the user no way of knowing there is a repeat filename problem. It would fix another possible inconsistency if RVZstd scanning at least warned when a repeat filename is found inside a zip.
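
Such a warning could be driven by a check like the Python sketch below. It treats two entries as repeats only when their names match exactly, which is an assumption; whether names differing only by case should also count is not stated. `find_repeats` is a hypothetical name.

```python
from collections import Counter


def find_repeats(names: list[str]) -> list[str]:
    """Return the filenames that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

A scanner could run this over the central directory filenames and report each returned name to the user.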

rvzstd.1760239723.txt.gz · Last modified: 2025/10/12 03:28 by johnsanc