Parts
- Part 1 — Searching The Data → /posts/wows_depack_part1/
- Part 2 — Getting The Metadata → /posts/wows_depack_part2/
- Part 3 — Reading The Database → /posts/wows_depack_part3/
- Part 4 — Tidying-Up The Project → /posts/wows_depack_part4/
The Implementation
In the last part, we discovered and got a fairly good idea of the metadata/IDX format.
In this part, we will create a rough implementation to extract the content.
Data Structures
First, define C structures matching our reverse-engineered format:
// INDEX file header
typedef struct {
char magic[4];
uint32_t unknown_1;
uint32_t id;
uint32_t unknown_2;
uint32_t file_plus_dir_count;
uint32_t file_count;
uint64_t unknown_3;
uint64_t header_size;
uint64_t offset_idx_data_section;
uint64_t offset_idx_footer_section;
} WOWS_INDEX_HEADER;
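Since everything that follows relies on this layout, it can be worth checking it at compile time. The sketch below repeats the struct under a hypothetical name (EXAMPLE_INDEX_HEADER) so that it stands alone, and asserts its size and two offsets. The expected numbers are simply the sums of the field widths, assuming the compiler adds no padding, which should hold on common ABIs because every field here is already naturally aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as the header struct above; the name is hypothetical. */
typedef struct {
    char magic[4];
    uint32_t unknown_1;
    uint32_t id;
    uint32_t unknown_2;
    uint32_t file_plus_dir_count;
    uint32_t file_count;
    uint64_t unknown_3;
    uint64_t header_size;
    uint64_t offset_idx_data_section;
    uint64_t offset_idx_footer_section;
} EXAMPLE_INDEX_HEADER;

/* 6 x 4 bytes + 4 x 8 bytes = 56 bytes if no padding is inserted. */
_Static_assert(sizeof(EXAMPLE_INDEX_HEADER) == 56,
               "unexpected padding in the index header");
/* The first four 32-bit fields occupy bytes 0..15. */
_Static_assert(offsetof(EXAMPLE_INDEX_HEADER, file_plus_dir_count) == 16,
               "unexpected offset for file_plus_dir_count");
_Static_assert(offsetof(EXAMPLE_INDEX_HEADER, offset_idx_footer_section) == 48,
               "unexpected offset for offset_idx_footer_section");
```

If one of these assertions fails on a given compiler, the cast-based parsing in this post would silently read the wrong bytes, so failing the build is the friendlier outcome.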
Parser Implementation
Note: These examples omit error handling for clarity. Production code requires bounds checking and validation.
File Mapping
Memory map the index file:
// Open the index file
int fd = open(args.input, O_RDONLY);
// Recover the file size
struct stat s;
fstat(fd, &s);
size_t index_size = s.st_size;
// Map the whole content in memory
char *index_content = mmap(0, index_size, PROT_READ, MAP_PRIVATE, fd, 0);
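For reference, here is a minimal sketch of the same mapping step with the error handling put back in. The function name example_map_file is hypothetical (it is not part of the tool); the calls themselves are the standard POSIX open, fstat, mmap and close.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only.
 * Returns NULL on failure; on success, stores the file size in *size_out. */
char *example_map_file(const char *path, size_t *size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat s;
    /* Reject empty files too: mmap() fails on a zero length. */
    if (fstat(fd, &s) != 0 || s.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *map = mmap(NULL, (size_t)s.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return NULL;

    *size_out = (size_t)s.st_size;
    return map;
}
```

The caller is expected to release the mapping with munmap once parsing is done.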
The second step is to have an entry point to actually parse the thing:
WOWS_CONTEXT context;
context.debug = true;
// Start the parsing
return wows_parse_index(index_content, index_size, &context);
Here, I pass the memory-mapped content of the index, its size (which will be used later to avoid overflows), and a context, which will be used to pass parsing options and maintain “states” in the parsing if necessary.
Parsing the Header Section
int wows_parse_index(char *contents, size_t length, WOWS_CONTEXT *context) {
// header section
WOWS_INDEX_HEADER *header = (WOWS_INDEX_HEADER *)contents;
We can print it with a few printf calls:
int print_header(WOWS_INDEX_HEADER *header) {
printf("Index Header Content:\n");
printf("* magic: %.4s\n", (char *)&header->magic);
printf("* unknown_1: 0x%x\n", header->unknown_1);
[...]
return 0;
}
Output:
Index Header Content:
* magic: ISFP
* unknown_1: 0x2000000
* id: 0xb4399d91
* unknown_2: 0x40
* file_plus_dir_count: 311
* file_count: 284
* unknown_3: 1
* header_size: 40
* offset_idx_data_section: 0x3bf6
* offset_idx_footer_section: 0x7136
Metadata entries
Then, we can do a bunch of pointer arithmetic operations to extract the other sections of the index file:
// Recover the start of the metadata array
WOWS_INDEX_METADATA_ENTRY *metadatas;
metadatas =
(WOWS_INDEX_METADATA_ENTRY *)(contents + sizeof(WOWS_INDEX_HEADER));
Then, we can do something with these entries, for example:
// Parse & print each entry in the metadata section
for (uint32_t i = 0; i < header->file_plus_dir_count; i++) {
if (context->debug) {
print_metadata_entry(&metadatas[i], i);
}
}
With print_metadata_entry looking like this:
int print_metadata_entry(WOWS_INDEX_METADATA_ENTRY *entry, int index) {
printf("Metadata entry %d:\n", index);
printf("* file_type: %lu\n", entry->file_type_1);
printf("* offset_idx_file_name: 0x%lx\n", entry->offset_idx_file_name);
printf("* unknown_4: 0x%lx\n", entry->unknown_4);
printf("* file_type_2: 0x%lx\n", entry->file_type_2);
return 0;
}
Re-Evaluating the Meaning of Some Fields
Once done, it gives us a more comfortable read:
[...]
Metadata entry 0:
* file_type: 14
* offset_idx_file_name: 0x26e0
* unknown_4: 0x93b6404fba9a0c8f
* file_type_2: 0xdbb1a1d1b108b927
Metadata entry 1:
* file_type: 19
* offset_idx_file_name: 0x26ce
* unknown_4: 0xc7f7d0284a87ec8f
* file_type_2: 0x74d821503e1beba4
Metadata entry 2:
* file_type: 18
* offset_idx_file_name: 0x26c1
* unknown_4: 0x6b4f2cace7a270ad
* file_type_2: 0xdbb1a1d1b108b927
Metadata entry 3:
[...]
Metadata entry 310:
* file_type: 19
* offset_idx_file_name: 0x14fb
* unknown_4: 0xce7afff48d1bd174
* file_type_2: 0x74d821503e1beba4
This lets us review our earlier reverse engineering, and right away there are two interesting things to note:
- There was some uncertainty about the number of metadata chunks: was it file_count or file_plus_dir_count? Now we are more certain it’s file_plus_dir_count, as it’s the larger value. If it were not, we would be parsing a section past the metadata entries as metadata, with funky results (garbage or a crash). This is not the case.
- file_type in metadata is not a file type/enum. The values are small but quite varied; it’s more likely the length of the file name.
Let’s check with the last entry:
Metadata entry 310:
* file_type: 19
[...]
kakwa@linux 6775398/idx » hexdump -C system_data.idx | less
[...]
00003be0 61 72 69 61 74 69 6f 6e 5f 64 75 6d 6d 79 2e 64 |ariation_dummy.d|
00003bf0 64 73 00 77 61 76 65 73 5f 68 65 69 67 68 74 73 |ds.waves_heights|
00003c00 30 2e 64 64 73 00 8f 0c 9a ba 4f 40 b6 93 70 11 |0.dds.....O@..p.|
00003c10 03 07 0d 33 ed 77 00 00 00 00 00 00 00 00 05 00 |...3.w..........|
The last file name is waves_heights0.dds, 18 characters long. With the trailing \0, we get our value of 19.
So let’s rename this field.
Now that we have fixed that, we can recover the file names of each entry:
char *filename = (char *)entry;
filename += entry->offset_idx_file_name;
printf("* filename: %.*s\n", (int)entry->file_name_size, filename);
Nice:
Metadata entry 0:
* file_name_size: 14
* offset_idx_file_name: 0x26e0
* unknown_4: 0x93b6404fba9a0c8f
* file_type_2: 0xdbb1a1d1b108b927
* filename: KDStorage.bin
Metadata entry 1:
* file_name_size: 19
* offset_idx_file_name: 0x26ce
* unknown_4: 0xc7f7d0284a87ec8f
* file_type_2: 0x74d821503e1beba4
* filename: waves_heights1.dds
We have file names and directory names, for example:
[...]
Metadata entry 10:
* file_name_size: 11
* offset_idx_file_name: 0x2654
* unknown_4: 0x46c008ccf65395e0
* file_type_2: 0x46e29969bd85cf06
* filename: space_defs
Metadata entry 11:
* file_name_size: 13
* offset_idx_file_name: 0x263f
* unknown_4: 0x7213702d5e6899e0
* file_type_2: 0x13d93873302ed14c
* filename: aid_null.dds
Metadata entry 12:
* file_name_size: 16
* offset_idx_file_name: 0x262c
* unknown_4: 0xa1a829d8713f89e0
* file_type_2: 0xdbb1a1d1b108b927
* filename: camouflages.xml
[...]
Something should let us differentiate between the two, maybe one of the unknown fields.
Also, we still need to figure out how the directory system works:
- How directories & subdirectories are composed (to get <dir>/<sub dir>/<sub sub dir>/ paths)
- How the path goes back to the root (/)
We will not look at it here, but that’s something to keep in mind.
Footer Parsing
Extracting footer information:
WOWS_INDEX_FOOTER *footer = (WOWS_INDEX_FOOTER *)(contents + header->offset_idx_footer_section);
print_footer(footer);
The results were incorrect due to an offset miscalculation.
Index Footer Content:
* size_pkg_file_name: 50b0bd0300002d0b
* unknown_7: 0xe967
* unknown_6: 0x15
A file name size of 50b0bd0300002d0b? I don’t think so.
So let’s look at it more closely.
In the header, we have:
Index Header Content:
[...]
* offset_idx_footer_section: 0x7136
[...]
The hexdump gives:
kakwa@linux 6775398/idx » hexdump -s 6 -C system_data.idx| less
[...]
00007116 21 67 ac 70 22 ec ca b8 70 11 03 07 0d 33 ed 77 |!g.p"...p....3.w|
00007126 28 f9 15 0a 00 00 00 00 05 00 00 00 01 00 00 00 |(...............|
00007136 0b 2d 00 00 03 bd b0 50 67 e9 00 00 00 00 00 00 |.-.....Pg.......|
00007146 15 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 |................|
00007156 70 11 03 07 0d 33 ed 77 73 79 73 74 65 6d 5f 64 |p....3.wsystem_d|
00007166 61 74 61 5f 30 30 30 31 2e 70 6b 67 00 |ata_0001.pkg.|
If our previous interpretation were correct, with a plain offset from the start of the index file, the value should be 0x7146 (where the footer actually starts in the hexdump), not 0x7136.
Maybe we are missing some fields in the footer, but given that the 128 bits at offset 0x7136, just before it, really look like the end of a pkg metadata entry, I doubt it.
A more plausible explanation is that the offset is relative to position 0x10 in the file, not to its start.
Maybe the first 128 bits of the header (magic, unknown_1, id and unknown_2 in the struct above) are considered to be a separate section.
Anyway, let’s just offset by 128 bits.
#define MAGIC_SECTION_OFFSET (sizeof(uint32_t) * 4)
// Get the footer section
WOWS_INDEX_FOOTER *footer = (WOWS_INDEX_FOOTER *)(contents + header->offset_idx_footer_section + MAGIC_SECTION_OFFSET);
That’s better:
Index Footer Content:
* size_pkg_file_name: 23
* unknown_7: 0x18
* unknown_6: 0xb5a4fa9349d9fd0d
We can also recover the pkg file name as follows:
char *pkg_filename = (char *)footer;
pkg_filename += sizeof(WOWS_INDEX_FOOTER);
printf("* pkg filename: %.*s\n",
(int)footer->size_pkg_file_name, pkg_filename);
Safety Considerations
The current implementation lacks bounds checking and trusts all offsets; this needs fixing in production code.
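As a minimal sketch of what such a check could look like, the helper below validates an (offset, length) pair against the size of the mapped file before handing out a pointer. The name example_checked_ptr is hypothetical, and the comparisons are ordered so that they cannot overflow even with hostile offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to `len` bytes located at `offset` inside the mapped
 * buffer, or NULL if that range does not fit within `length` bytes. */
const char *example_checked_ptr(const char *contents, size_t length,
                                uint64_t offset, uint64_t len) {
    if (offset > length)
        return NULL;
    /* Compare against the remaining space instead of computing
     * offset + len, which could wrap around. */
    if (len > length - offset)
        return NULL;
    return contents + offset;
}
```

With such a helper, the footer lookup would become a call like example_checked_ptr(contents, length, header->offset_idx_footer_section + MAGIC_SECTION_OFFSET, sizeof(WOWS_INDEX_FOOTER)), followed by a NULL check before the cast.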
PKG Data Entries
The process is similar for the data section. Let’s again add the 128 bits from the start (hexdump gives us 0x3c06, which is again a 0x10 difference with 0x3bf6).
// Get pkg data pointer section
WOWS_INDEX_DATA_FILE_ENTRY *data_file_entry =
(WOWS_INDEX_DATA_FILE_ENTRY *)(contents +
header->offset_idx_data_section +
MAGIC_SECTION_OFFSET);
From there, we are not sure if we have header->file_plus_dir_count or header->file_count entries. The latter seems more likely as this section points to the pkg files, but that’s not a given.
Also, we are unsure how an entry there is paired with a metadata entry. Maybe the order is simply the same in both arrays, maybe the matching is done through one of the unknown fields.
But first, let’s dump the content with some printf calls:
Data file entry [0]:
* unknown_5: 0x93b6404fba9a0c8f
* unknown_6: 0x77ed330d07031170
* offset_pkg_data_chunk: 0x0
* type_1: 0x5
* type_2: 0x1
* size_pkg_data_chunk: 0x21f5
* id_pkg_data_chunk: 0x366c
* padding: 0x4a87ec8f
Data file entry [1]:
* unknown_5: 0x77ed330d07031170
* unknown_6: 0x5ef9b1e
* offset_pkg_data_chunk: 0x100000005
* type_1: 0x11515
* type_2: 0x97637703
* size_pkg_data_chunk: 0x2ab3e
* id_pkg_data_chunk: 0x6b4f2cace7a270ad
* padding: 0x7031170
[...]
Hmm, that doesn’t look right… Why is the padding not the expected 0x0? And why does the first entry look mostly OK, except for the padding, while the rest is garbage?
First, I double-checked the WOWS_INDEX_DATA_FILE_ENTRY field sizes, and they were OK.
Then, I remembered that compilers can add padding so that all fields are properly aligned in memory, which helps with performance.
To avoid that, we need to add:
#pragma pack(1)
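To make the effect visible, here is a small sketch comparing the two layouts. The struct names are hypothetical, and the field widths are an assumption deduced from the values printed in the dumps (three 64-bit fields, three 32-bit fields, a 64-bit id and a 32-bit padding), not a copy of the real WOWS_INDEX_DATA_FILE_ENTRY definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler is free to align id_pkg_data_chunk. */
typedef struct {
    uint64_t unknown_5;
    uint64_t unknown_6;
    uint64_t offset_pkg_data_chunk;
    uint32_t type_1;
    uint32_t type_2;
    uint32_t size_pkg_data_chunk;
    uint64_t id_pkg_data_chunk; /* 64-bit field after three 32-bit ones */
    uint32_t padding;
} EXAMPLE_ENTRY_NATURAL;

/* Packed layout: fields are laid out back to back, as in the file. */
#pragma pack(push, 1)
typedef struct {
    uint64_t unknown_5;
    uint64_t unknown_6;
    uint64_t offset_pkg_data_chunk;
    uint32_t type_1;
    uint32_t type_2;
    uint32_t size_pkg_data_chunk;
    uint64_t id_pkg_data_chunk;
    uint32_t padding;
} EXAMPLE_ENTRY_PACKED;
#pragma pack(pop)

/* 3 x 8 + 3 x 4 + 8 + 4 = 48 bytes, with the id at byte 36. */
_Static_assert(sizeof(EXAMPLE_ENTRY_PACKED) == 48,
               "packed entry should be 48 bytes");
_Static_assert(offsetof(EXAMPLE_ENTRY_PACKED, id_pkg_data_chunk) == 36,
               "packed id should start at byte 36");
```

On a typical x86-64 ABI, the natural variant places id_pkg_data_chunk at byte 40 instead of 36 and grows to 56 bytes. That would explain why the first entry only had a truncated id and a wrong padding, while every following entry was shifted and unreadable.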
Now the output looks like that:
* unknown_5: 0x93b6404fba9a0c8f
* unknown_6: 0x77ed330d07031170
* offset_pkg_data_chunk: 0x0
* type_1: 0x5
* type_2: 0x1
* size_pkg_data_chunk: 0x21f5
* id_pkg_data_chunk: 0x366c5c4500bf
* padding: 0x0
Data file entry [1]:
* unknown_5: 0xc7f7d0284a87ec8f
* unknown_6: 0x77ed330d07031170
* offset_pkg_data_chunk: 0x5ef9b1e
* type_1: 0x5
* type_2: 0x1
* size_pkg_data_chunk: 0x11515
* id_pkg_data_chunk: 0x2ab3e97637703
* padding: 0x0
Data file entry [2]:
* unknown_5: 0x6b4f2cace7a270ad
* unknown_6: 0x77ed330d07031170
* offset_pkg_data_chunk: 0x2205
* type_1: 0x5
* type_2: 0x1
* size_pkg_data_chunk: 0x1cb
* id_pkg_data_chunk: 0xcadc1deb96d
* padding: 0x0
That’s much better.
But this small problem highlights several weaknesses in my parsing method. Casting raw bytes to structs comes with numerous pitfalls, from out-of-bounds reads to alignment and endianness.
This is not that critical here since we are just building a rough prototype, but for more critical software, it’s not a good idea.
After this prototype, it might be a good idea to start learning Rust ^^.
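Staying in C for now, one common alternative to struct casting is to decode each field byte by byte. The sketch below shows hypothetical helpers (example_read_u32_le and example_read_u64_le) which assume the index stores integers in little-endian order, as the hexdumps above suggest. They work regardless of the host endianness, alignment or struct padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian integer from 4 raw bytes. */
uint32_t example_read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Decode a 64-bit little-endian integer from 8 raw bytes. */
uint64_t example_read_u64_le(const unsigned char *p) {
    return (uint64_t)example_read_u32_le(p) |
           ((uint64_t)example_read_u32_le(p + 4) << 32);
}
```

For example, the eight bytes 8f 0c 9a ba 4f 40 b6 93 found at 0x3c06 in the earlier hexdump decode to 0x93b6404fba9a0c8f, the unknown_5 value of the first data file entry. Combined with a bounds check before each read, this avoids most of the pitfalls listed above.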
Also, if we try to parse header->file_plus_dir_count entries, we get the following:
Data file entry [283]:
* unknown_5: 0xb8caec2270ac6721
* unknown_6: 0x77ed330d07031170
* offset_pkg_data_chunk: 0xa15f928
* type_1: 0x5
* type_2: 0x1
* size_pkg_data_chunk: 0x2d0b
* id_pkg_data_chunk: 0xe96750b0bd03
* padding: 0x0
Data file entry [284]:
* unknown_5: 0x15
* unknown_6: 0x18
* offset_pkg_data_chunk: 0x77ed330d07031170
* type_1: 0x74737973
* type_2: 0x645f6d65
* size_pkg_data_chunk: 0x5f617461
* id_pkg_data_chunk: 0x676b702e31303030
* padding: 0x0
Data file entry [285]:
* unknown_5: 0x0
* unknown_6: 0x0
* offset_pkg_data_chunk: 0x0
* type_1: 0x0
* type_2: 0x0
* size_pkg_data_chunk: 0x0
* id_pkg_data_chunk: 0x0
* padding: 0x0
Entry 283 is OK. This is the 284th entry since we start at 0, which is exactly header->file_count. The next one has weird values, and the rest are just 0.
So header->file_count is indeed the number of entries in this section.
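The two section offsets from the header give a way to cross-check this count. The sketch below assumes that the data section runs right up to the footer and that each entry is 48 bytes, the sum of the field widths seen in the dumps. The function name and the constant are hypothetical.

```c
#include <stdint.h>

/* Assumed size of one packed data file entry, in bytes. */
#define EXAMPLE_DATA_ENTRY_SIZE 48u

/* Number of entries between the data section and the footer section.
 * Returns 0 if the offsets are inconsistent or the span is not a
 * whole number of entries. */
uint64_t example_data_entry_count(uint64_t offset_data, uint64_t offset_footer) {
    if (offset_footer < offset_data)
        return 0;
    uint64_t span = offset_footer - offset_data;
    if (span % EXAMPLE_DATA_ENTRY_SIZE != 0)
        return 0;
    return span / EXAMPLE_DATA_ENTRY_SIZE;
}
```

With the values from our header, 0x7136 - 0x3bf6 = 0x3540 = 13632 bytes, and 13632 / 48 = 284, which matches file_count exactly.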
Entry Matching
The fact that we have header->file_count entries on one side and header->file_plus_dir_count on the other means it’s not a simple index matching.
Let’s investigate the unknown fields:
[...]
Data file entry [279]:
* unknown_5: 0xce7afff48d1bd174
* unknown_6: 0x77ed330d07031170
[...]
Data file entry [280]:
* unknown_5: 0x199e99feb0c986f8
* unknown_6: 0x77ed330d07031170
[...]
unknown_6 is always the same, so not really interesting.
unknown_5, on the contrary, is specific to each entry:
kakwa@linux GitHub/wows-depack (main) » ./wows-depack-cli \
-i ~/Games/World\ of\ Warships/bin/6775398/idx/system_data.idx | \
grep 'unknown_5' | sort | uniq -c
[...]
1 * unknown_5: 0x14b002d7c2835863
1 * unknown_5: 0x15a7b41a61f65f9c
1 * unknown_5: 0x15fcab5401f27f56
1 * unknown_5: 0x18a0d0dc4b05f8fa
1 * unknown_5: 0x192a05120f00553e
[...]
Each value, however, appears twice: once in a metadata entry, and once in a data file entry:
Metadata entry [72]:
[...]
* unknown_4: 0x1011b17d9304bb39
[...]
Data file entry [65]:
[...]
* unknown_5: 0x1011b17d9304bb39
[...]
So the link is established through these fields. They look like unique (possibly random) IDs for each entry.
In fact, unknown_4 and unknown_5 are not the only fields leveraging this.
Looking at the file_type_2 values, we get something like this:
kakwa@linux GitHub/wows-depack (main *) » ./wows-depack-cli \
-i ~/Games/World\ of\ Warships/bin/6775398/idx/system_data.idx | \
grep '0x937f155e4baaf562\|filename:' | grep -A 1 '0x937f155e4baaf562'
[...]
--
* file_type_2: 0x937f155e4baaf562
* filename: LowerAftTrans.dds
--
* file_type_2: 0x937f155e4baaf562
* filename: MidBarbette.dds
--
* file_type_2: 0x937f155e4baaf562
* filename: Bulkhead.dds
--
* file_type_2: 0x937f155e4baaf562
* filename: MidBelt.dds
--
[...]
--
* unknown_4: 0x937f155e4baaf562
* filename: armour
--
[...]
--
* file_type_2: 0x937f155e4baaf562
* filename: Bottom.dds
--
* file_type_2: 0x937f155e4baaf562
* filename: ConstrBig.dds
--
* file_type_2: 0x937f155e4baaf562
* filename: ConstrMid.dds
--
* file_type_2: 0x937f155e4baaf562
* filename: ConstrSm.dds
--
* file_type_2: 0x937f155e4baaf562
* filename: DoubleBottom.dds
--
[...]
OK, file_type_2 is not a file type at all; it’s the id (unknown_4 right now) of a single node that really looks like a directory.
file_type_2 should probably be renamed parent_id or something similar.
Also, unknown_6 follows the same logic: it’s the id of the footer entry (side note: maybe the format supports having one index for several files).
Small Tangent
By this point, I was a bit intrigued by the type_1 and type_2 fields in the pkg pointer section.
kakwa@linux GitHub/wows-depack (main) » for i in ~/Games/World\ of\ Warships/bin/6775398/idx/*;\
do
./wows-depack-cli -i "$i" | grep 'type_[12]:';
done | sort -n | uniq -c | sort -n
57265 * type_1: 0x0
57265 * type_2: 0x0
232089 * type_1: 0x5
232089 * type_2: 0x1
OK, it seems that (type_1, type_2) can either have the (0x0, 0x0) values or the (0x5, 0x1) values. In most cases, it’s the latter.
Looking at a few (0x0, 0x0) entries, a common file type with such values is .png.
It’s a bit of a wild guess, but these might be compression levels. (0x0, 0x0), i.e. no compression, would be logical for .png as these files are already compressed. Compressing them would only cost CPU resources with no space gains.
For now, let’s just keep that in mind; we will revisit it later.
Glueing the entries together
So, we have metadata entries which can be linked together, and pkg pointer entries which are linked to metadata entries and footer entries.
It’s time to link all that together.
To do that, the obvious choice is to feed these IDs into a hash map, which will make look-ups easier and quicker.
Once done, the output now looks like that:
File entry [259]:
* metadata_id: 0xc90b3d356989c551
* footer_id: 0x77ed330d07031170
* offset_pkg_data: 0x5d30db8
* type_1: 0x5
* type_2: 0x1
* size_pkg_data: 0x89667
* id_pkg_data: 0xaab740ef4f6a6
* padding: 0x0
* file_name_size: 18
* offset_idx_file_name: 0x164e
* id: 0xc90b3d356989c551
* parent_id: 0xeb7ddcfb5178376
* filename: snow_tiles_ah.dds
parent [1]:
* file_name_size: 8
* offset_idx_file_name: 0x19cf
* id: 0xeb7ddcfb5178376
* parent_id: 0xb220a7743c83e638
* filename: weather
parent [2]:
* file_name_size: 5
* offset_idx_file_name: 0x16a3
* id: 0xb220a7743c83e638
* parent_id: 0x3837637adc4586b1
* filename: maps
parent [3]:
* file_name_size: 7
* offset_idx_file_name: 0x1746
* id: 0x3837637adc4586b1
* parent_id: 0xdbb1a1d1b108b927
* filename: system
This entry is for the following path: /system/maps/weather/snow_tiles_ah.dds
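The path reconstruction itself boils down to following parent_id until no matching entry is found. Here is a minimal sketch of that walk, with hypothetical names (EXAMPLE_NODE, example_build_path) and a plain linear search instead of the hash map to keep it short. It assumes that an id with no matching metadata entry (0xdbb1a1d1b108b927 in the dumps) stands for the root.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EXAMPLE_MAX_DEPTH 32

typedef struct {
    uint64_t id;
    uint64_t parent_id;
    const char *name;
} EXAMPLE_NODE;

/* Linear lookup by id; a hash map would replace this in a real tool. */
static const EXAMPLE_NODE *example_find(const EXAMPLE_NODE *nodes,
                                        size_t count, uint64_t id) {
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].id == id)
            return &nodes[i];
    }
    return NULL;
}

/* Write the full path of node `id` into `out` ("/dir/sub/file").
 * Returns 0 on success, -1 if the id is unknown, the chain is too
 * deep (possible cycle) or the output buffer is too small. */
int example_build_path(const EXAMPLE_NODE *nodes, size_t count, uint64_t id,
                       char *out, size_t out_size) {
    const EXAMPLE_NODE *chain[EXAMPLE_MAX_DEPTH];
    size_t depth = 0;

    const EXAMPLE_NODE *node = example_find(nodes, count, id);
    if (node == NULL || out_size == 0)
        return -1;

    /* Walk up: file, then parent, then grandparent, until the root. */
    while (node != NULL) {
        if (depth == EXAMPLE_MAX_DEPTH)
            return -1;
        chain[depth++] = node;
        node = example_find(nodes, count, node->parent_id);
    }

    /* Emit the names in reverse order, from the root down to the file. */
    size_t used = 0;
    out[0] = '\0';
    while (depth > 0) {
        int n = snprintf(out + used, out_size - used, "/%s",
                         chain[--depth]->name);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Fed with the four entries of the dump above (snow_tiles_ah.dds, weather, maps and system, with their id and parent_id values), this produces /system/maps/weather/snow_tiles_ah.dds. The depth limit is there so that a corrupted index with a parent_id loop cannot hang the parser.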
This should be enough now to start thinking about the actual tool and how it will be implemented.
Another tangent
The goal in the end is to parse all the index files, so I got curious about the IDs across the different files.
Looking at the dumps, content is a fairly common directory name, present in a lot of the index files.
And if we look at these records, we get:
kakwa@linux GitHub/wows-depack (main *%) » for i in ~/Games/World\ of\ Warships/bin/6775398/idx/*;\
do \
echo $i; \
./wows-depack-cli -i "$i" | grep -B 5 '* filename: content$'; \
done | less
[...]
/home/kakwa/Games/World of Warships/bin/6775398/idx/camouflage.idx
parent [5]:
* file_name_size: 8
* offset_idx_file_name: 0xecae
* id: 0xa33046442d8327fc
* parent_id: 0xdbb1a1d1b108b927
* filename: content
--
parent [4]:
* file_name_size: 8
* offset_idx_file_name: 0xecae
* id: 0xa33046442d8327fc
* parent_id: 0xdbb1a1d1b108b927
* filename: content
--
parent [5]:
* file_name_size: 8
* offset_idx_file_name: 0xecae
* id: 0xa33046442d8327fc
* parent_id: 0xdbb1a1d1b108b927
* filename: content
[...]
Interestingly, id is always 0xa33046442d8327fc (and parent_id is always 0xdbb1a1d1b108b927). This will make the implementation a bit easier.
However, it raises an interesting question: how is this id generated? Is it completely random? Or is it derived from the path/name?
It’s not really critical to read files, but might be important to write content if we ever get to that.
File System Tree
Build a tree structure using:
- HashMap for fast ID lookups (hashmap.c)
- Inode types: files and directories
- Tree construction: Start with PKG data chunks (files), resolve names via metadata, build directory hierarchy using parent_id relationships
The result: a complete archive tree with root node and children.
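To illustrate the lookup side, here is a minimal sketch of a hash map keyed by 64-bit IDs, using open addressing with linear probing. It is a standalone illustration with hypothetical names (EXAMPLE_ID_MAP and friends), not the API of the hashmap.c library mentioned above, and it uses a fixed capacity to stay short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t key;
    void *value;
    int used;
} EXAMPLE_SLOT;

typedef struct {
    EXAMPLE_SLOT *slots;
    size_t capacity; /* must be a power of two, at least 2 */
    size_t count;
} EXAMPLE_ID_MAP;

/* Allocate the table. Returns 0 on success, -1 on bad capacity or OOM. */
int example_map_init(EXAMPLE_ID_MAP *map, size_t capacity) {
    if (capacity < 2 || (capacity & (capacity - 1)) != 0)
        return -1;
    map->slots = calloc(capacity, sizeof(EXAMPLE_SLOT));
    map->capacity = capacity;
    map->count = 0;
    return map->slots != NULL ? 0 : -1;
}

/* Mix the key bits, then reduce to a slot index with the capacity mask. */
static size_t example_slot_index(const EXAMPLE_ID_MAP *map, uint64_t key) {
    return (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 32) & (map->capacity - 1);
}

/* Insert or update a key. Returns -1 if the table is full
 * (one slot is always kept free so that probing terminates). */
int example_map_put(EXAMPLE_ID_MAP *map, uint64_t key, void *value) {
    size_t i = example_slot_index(map, key);
    while (map->slots[i].used && map->slots[i].key != key)
        i = (i + 1) & (map->capacity - 1);
    if (!map->slots[i].used) {
        if (map->count + 1 >= map->capacity)
            return -1;
        map->slots[i].used = 1;
        map->slots[i].key = key;
        map->count++;
    }
    map->slots[i].value = value;
    return 0;
}

/* Return the value stored for `key`, or NULL if it is absent. */
void *example_map_get(const EXAMPLE_ID_MAP *map, uint64_t key) {
    size_t i = example_slot_index(map, key);
    while (map->slots[i].used) {
        if (map->slots[i].key == key)
            return map->slots[i].value;
        i = (i + 1) & (map->capacity - 1);
    }
    return NULL;
}

void example_map_free(EXAMPLE_ID_MAP *map) {
    free(map->slots);
    map->slots = NULL;
    map->capacity = 0;
    map->count = 0;
}
```

In this scheme, each metadata entry would be stored under its id. Resolving a data file entry or a parent_id then becomes a single example_map_get call instead of a scan over every entry.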
With a bit more work, adding a path/tree printer function, I now get something like that:
/postfx_animations.xml
/settings/Default_v3.settings
/settings/Default_v1.settings
/settings/Default_v2.settings
/scripts/user_data_object_defs/Barge.def
/scripts/user_data_object_defs/SpatialUIDebugTool.def
/scripts/user_data_object_defs/FogPoint.def
/scripts/user_data_object_defs/StaticSoundEmitter.def
[...]
/helpers/maps/green_hemisphere.dds
/helpers/maps/lev_dirt01.dds
/helpers/maps/fat_disc_quarter.dds
/helpers/maps/lev_grass_01.dds
/helpers/maps/lev_grassflowers.dds
/helpers/maps/red_ruler.dds
/helpers/maps/hemisphere.dds
/helpers/maps/disc_quarter.dds
/helpers/maps/red_hemisphere_ring.dds
/server_stats.xml
Or that in (ugly) tree form:
-./
|-* postfx_animations.xml
|--settings/
| |-* Default_v3.settings
| |-* Default_v1.settings
| |-* Default_v2.settings
|--scripts/
| |--user_data_object_defs/
| | |-* Barge.def
| | |-* SpatialUIDebugTool.def
| | |-* FogPoint.def
| | |-* StaticSoundEmitter.def
| | |-* Minefield.def
| | |-* SoundedEffect.def
| | |-* SquadronReticleTool.def
| | |-* Trigger.def
| | |-* WayPoint.def
[...]
Recap (Part 3)
- We defined the C structs for the metadata/index
- We resolved a few loose ends: filename lengths, parent IDs, unique identifiers for linking
- We implemented a file tree using hashmaps and parent-child relationships
- We spotted a (type_1, type_2) pattern that might indicate compression
In the next and last part we will tie things together and wrap up the implementation.