Figured I would break this out of the Melee Hacks main thread to allow for better access to the available information, and hopefully to gather more hex editing and reverse engineering enthusiasts to help sort out the various data locations and containing structures.
---------------------------
Pokémon Stadium without transformation : http://www.megaupload.com/?d=7UMQJ727
@Milun: How can I find the collision data? Randomly, trying everywhere?..
I spent hours finding the position data of the part I want (too many position data O_o) and I think finding collision data is something like impossible for me...
hey revel...
I still don't exactly understand how root nodes work :/
but I've thought maybe root nodes are offset indexes
such as, if a root node was set at '80', then it initiates at offset[128] and so on,
until it hits another node index such as 'F0',
then node '80' is terminated and node 'F0' takes its place (reading from offset[240] and so on)
also, you never replied to my Q about the 16bit hex before the root node offsets...
it's only either '00 00' for normal costumes, or '00 01' for colored costumes...
what is that, and what does it do??
Well, collision data is always a bit of a scroll above the magnifier. You can tell you've found it when you see a relocation table, a.k.a. the thing I put a green X over on this page.
Hope that helps.
I guess I'll explain it again here and see if any others can pick it up and understand it further.
.dat, .usd, and certain other formats derived from the same format are all structured in a similar way. The file layout is:
File Header
Data Block
Relocation Table
Root Nodes (2)
String Table
The file starts with a 0x20 (32) byte header which gives information on how to access the general archive structure.
Code:
struct DAT_HEADER
{
    // 0x00
    uint32 fileSize0x00 <format = hex>;
    uint32 dataBlockSize0x04 <format = hex>; // size of main data block
    uint32 relocationTableCount0x08;
    uint32 rootCount0x0C;
    // 0x10
    uint32 rootCount0x10;
    uint32 unknown0x14; // '001B' in main Pl*.dat files
    uint32 unknown0x18;
    uint32 unknown0x1C;
    // 0x20
};
The header begins with the 32-bit (4-byte) size of the file upon creation, which can be used to verify that the file is the correct length.
The next value is 32-bit as well and gives the length of the data section that follows directly after the header. Since the main data section starts immediately after the header, its base offset is 0x20 bytes from the beginning of the file. All file offsets found later in the file are measured from the beginning of this data section, so each stored offset is 0x20 bytes less than the corresponding position in the whole file (you have to add the base data offset to get there). If this is too confusing, copy all data from directly after the header, for the size given in the header, into a separate file; all the offsets can then be used without modification.
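To make the 0x20 adjustment concrete, here is a minimal Python sketch of reading the header and converting a data-relative offset into a whole-file position. It assumes the values are stored big-endian (the GameCube's byte order); the function names `parse_header` and `to_file_offset` and the tiny `demo` buffer are made up for illustration and are not taken from any real game file.

```python
import struct

HEADER_SIZE = 0x20  # the data block starts right after the header

def parse_header(blob: bytes) -> dict:
    """Unpack the eight 32-bit header fields (big-endian is an assumption)."""
    names = ("fileSize", "dataBlockSize", "relocationTableCount",
             "rootCount0x0C", "rootCount0x10",
             "unknown0x14", "unknown0x18", "unknown0x1C")
    values = struct.unpack_from(">8I", blob, 0)
    return dict(zip(names, values))

def to_file_offset(data_offset: int) -> int:
    """Convert a data-relative offset into a position within the whole file."""
    return data_offset + HEADER_SIZE

# Synthetic example: a header followed by a 0x20-byte data block of zeros.
demo = struct.pack(">8I", 0x40, 0x20, 0, 0, 0, 0, 0, 0) + bytes(0x20)
header = parse_header(demo)

# The first field can be checked against the actual length of the file.
size_matches = header["fileSize"] == len(demo)
```

With this, a stored offset of 0x10 corresponds to byte 0x30 of the file, which is the same thing as copying the data block to its own file and reading at 0x10.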
Now that you have the header and data sections sorted out, they are not much use without knowing where various information begins within the data.
Starting directly after the data block is the relocation table, a list of file offsets whose count is given in the header by relocationTableCount. Each entry is 32-bit and is relative to the beginning of the data section. The purpose of the relocation table is not to give access to specific data structures, but to record the location of every other file offset stored within the data section. This is needed because when the game loads the file into memory, the address assigned by the operating system will not be zero-based, as it is when you simply open the file in a hex editor. The file offsets within the data are still relative to the beginning of the data, not absolute addresses in memory, and the relocation table lets all of them be converted from relative to absolute easily.
If you are not planning on doing any real coding with this format, or have no issue using the file offsets in a relative manner, this information can be ignored in most cases. For those not using it to convert offsets to absolute memory addresses, its main benefit is that it gives the location of every other file offset within the data. Given how structures work in code, it is highly unlikely for an offset/pointer to jump into the middle of a structure, so it can be assumed that any file offset points to the beginning of another data structure. This is very useful in determining the start and end of a particular data structure, as structures should not overlap. If you start at the beginning of a structure you do not know yet and cross an address that is the target of another file offset, you have a strong indicator of where the structure should stop. If there is a count somewhere, you can divide the size of the area by it to get individual structure lengths, and so on.
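Both uses of the relocation table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: values are taken to be big-endian, `data` is the data block copied out on its own (so offsets need no 0x20 adjustment), and the helper names (`read_relocation_table`, `pointer_targets`, `structure_end`, `relocate`) as well as the base address in the comment are hypothetical.

```python
import struct

def read_relocation_table(blob, data_size, count):
    """Read `count` 32-bit entries that follow the data block in the file."""
    reloc_start = 0x20 + data_size
    return list(struct.unpack_from(f">{count}I", blob, reloc_start))

def pointer_targets(data, reloc_entries):
    """Each entry locates a stored offset; collect the offsets stored there."""
    targets = set()
    for location in reloc_entries:
        (target,) = struct.unpack_from(">I", data, location)
        targets.add(target)
    return sorted(targets)

def structure_end(start, targets, data_size):
    """Guess where a structure ends: the next pointed-to offset after it."""
    for target in targets:
        if target > start:
            return target
    return data_size

def relocate(data, reloc_entries, base_address):
    """Turn every relative offset into an absolute address, in place."""
    for location in reloc_entries:
        (relative,) = struct.unpack_from(">I", data, location)
        struct.pack_into(">I", data, location,
                         (base_address + relative) & 0xFFFFFFFF)

# Synthetic file: two pointers at data offsets 0x00 and 0x04.
data = bytearray(struct.pack(">2I", 0x10, 0x18) + bytes(0x18))
blob = (struct.pack(">8I", 0x48, len(data), 2, 0, 0, 0, 0, 0)
        + bytes(data) + struct.pack(">2I", 0x00, 0x04))

entries = read_relocation_table(blob, len(data), 2)
targets = pointer_targets(data, entries)
guessed_end = structure_end(0x10, targets, len(data))
```

In this toy file the unknown structure at 0x10 is guessed to end at 0x18, because that is the next address something else points to. Calling `relocate(data, entries, 0x80001000)` would rewrite the two stored offsets to 0x80001010 and 0x80001018, which is roughly what a loader has to do once it knows where the data landed in memory.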
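After the relocation table there are two lists of 8-byte structures I have simply termed root nodes. These structures contain 2 values, a file offset followed by a string table offset. The number of structures in the two lists is given by the counts rootCount0x0C and rootCount0x10, the latter often being 0.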
Code:
struct ROOT_NODE
{
    uint32 rootOffset0x00 <format = hex>;
    uint32 stringTableOffset0x04 <format = hex>; // offset to name string
};
The rootOffset is relative to the beginning of the data section, while the stringTableOffset is relative to the beginning of the string table, which follows directly after the root node lists.
As stated, the string table is located directly after the root node lists. Its size is arbitrary and not specified in the file. You could subtract the combined size of the other sections from the total file size if desired, as the string table is the last thing in the file. The number of strings in the table is given by rootCount0x0C + rootCount0x10, though the main way of accessing data from this area is through the offsets given in the root nodes.
Some equations for the file-relative start offsets of the various sections, for clarification:
Code:
local int64 dataOffset = 0x20;
local int64 relocOffset = dataOffset + fileHeader.dataBlockSize0x04;
local int64 rootOffset0 = relocOffset + (fileHeader.relocationTableCount0x08 * 4);
local int64 rootOffset1 = rootOffset0 + (fileHeader.rootCount0x0C * 8);
local int64 tableOffset = rootOffset1 + (fileHeader.rootCount0x10 * 8);
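The same equations can be used to list the root nodes and their names. The Python sketch below assumes big-endian values and that each name in the string table is a NUL-terminated ASCII string (neither is stated above); `read_root_nodes` and the synthetic file it is run on are illustrative only.

```python
import struct

def read_root_nodes(blob):
    """Return (rootOffset, name) pairs for both root node lists."""
    _file_size, data_size, reloc_count, root_count0, root_count1 = \
        struct.unpack_from(">5I", blob, 0)

    # Same arithmetic as the template equations above.
    data_offset = 0x20
    reloc_offset = data_offset + data_size
    root_offset0 = reloc_offset + reloc_count * 4
    root_offset1 = root_offset0 + root_count0 * 8
    table_offset = root_offset1 + root_count1 * 8

    nodes = []
    # The two lists are contiguous, so they can be read in one pass.
    for i in range(root_count0 + root_count1):
        root_offset, name_offset = struct.unpack_from(
            ">2I", blob, root_offset0 + i * 8)
        start = table_offset + name_offset
        end = blob.index(b"\x00", start)  # assumed NUL terminator
        nodes.append((root_offset, blob[start:end].decode("ascii")))
    return nodes

# Synthetic file with two root nodes; the names are made up.
names = b"demo_joint\x00coll_data\x00"
data = bytes(0x10)
roots = struct.pack(">4I", 0x0, 0, 0x8, 11)
size = 0x20 + len(data) + len(roots) + len(names)
blob = struct.pack(">8I", size, len(data), 0, 2, 0, 0, 0, 0) + data + roots + names

nodes = read_root_nodes(blob)
```

Each returned rootOffset is still relative to the data block, so add 0x20 before seeking to it in the full file.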
Now, on to the interpretation of the data the root nodes point to. So far I have found no identifying information for the starting structures other than the naming scheme followed by the root node names, which are reached through their string table offsets.
For instance:
Pl*.dat - The main player data files, usually contain a root node whose name begins with "ftData".
Pl*Aj.dat - Player animation related files (which are actually containers for multiple sub dat files referenced through the main Pl*.dat file), usually contain a root node whose name contains "figatree".
Ty*.dat, Pl*Nr.dat - These and other player color related files can contain multiple nodes, usually ending with "_joint", with other possible substrings being "matanim" and "shapeanim" (presumably MATerial ANIMation and SHAPE ANIMation respectively).
Gr*.dat - Stage related files can contain a large number of nodes with sub names like "_image" (which, you may note, contain a format identifier as well), "_tlut", "_tlut_desc", "coll_data", "map_head", and a number of others.
As an example:
Code:
if (Strstr(nodeName, "_joint") != -1 &&
    Strstr(nodeName, "shapeanim") == -1 &&
    Strstr(nodeName, "matanim") == -1)
{
    // NOTE: Used for Ty*.dat and Pl*Nr.dat files
    struct
    {
        DumpJObj(rootOffset0x00);
    } jobjData;
}
else if (Strstr(nodeName, "ftData") != -1)
{
    // NOTE: Used for Pl*.dat files
    DumpFighter(rootOffset0x00);
}
else if (Strstr(nodeName, "map_head") != -1)
{
    // NOTE: Used for Gr*.dat files
    struct MAP_HEAD stageData;
}
else if (Strstr(nodeName, "figatree") != -1)
{
    // NOTE: Used for Pl*Aj.dat files
    struct FIGATREE_DATA animationData;
}
else if (Strstr(nodeName, "coll_data") != -1)
{
    // NOTE: Used for Gr*.dat files
    struct COLL_DATA collisionData;
}
Code:
struct JOBJ_DATA
{
    // 0x00
    uint32 unknown0x00 <format = hex>;
    uint32 flags <format = hex>;
    uint32 childOffset <format = hex>; // child jobj structure
    uint32 nextOffset <format = hex>; // next jobj structure
    // 0x10
    uint32 dobjOffset <format = hex>; // dobj structure - object information?
    float3 rotation; // rotation
    float3 scale; // scale
    float3 translation; // translation
    uint32 transformOffset <format = hex>; // inverse transform
    uint32 unknown0x3C;
    // 0x40
};
This is the main Joint object structure. It can usually be found on root nodes that end with "_joint" and do not contain "matanim" or "shapeanim" (they usually contain "TopN" or "Share" instead). Following this hierarchy of structures you can obtain all the information I am currently using to produce the screenshots I have posted.
As for the collision data mentioned, I had not looked into it until recently and have not deciphered any more than has already been posted. The data is found on the "coll_data" root node and the main structure follows this layout:
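As a rough illustration of following the joint hierarchy, the Python sketch below reads the 0x40-byte JOBJ_DATA layout and visits child joints before siblings. It assumes big-endian values, offsets relative to a data block that has been copied out on its own, and that an offset of 0 means "no child" or "no next joint"; that last point is a guess, and the names `read_jobj` and `walk_jobjs` are made up here.

```python
import struct

JOBJ_FORMAT = ">5I9f2I"  # 5 uint32, 9 floats, 2 uint32 = 0x40 bytes

def read_jobj(data, offset):
    """Unpack one JOBJ_DATA structure at a data-relative offset."""
    f = struct.unpack_from(JOBJ_FORMAT, data, offset)
    return {
        "flags": f[1], "child": f[2], "next": f[3], "dobj": f[4],
        "rotation": f[5:8], "scale": f[8:11], "translation": f[11:14],
        "transform": f[14],
    }

def walk_jobjs(data, offset, depth=0):
    """Yield (depth, offset, jobj) for a joint, its children, then siblings."""
    while True:
        jobj = read_jobj(data, offset)
        yield depth, offset, jobj
        if jobj["child"]:  # assumption: 0 means no child
            yield from walk_jobjs(data, jobj["child"], depth + 1)
        if not jobj["next"]:  # assumption: 0 means end of the sibling list
            break
        offset = jobj["next"]

def make_jobj(child, next_):
    """Build a synthetic joint with identity scale, for the demo only."""
    return struct.pack(JOBJ_FORMAT, 0, 0, child, next_, 0,
                       0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0, 0)

# Synthetic data block: root at 0x40 with two children at 0x80 and 0xC0.
data = (bytes(0x40) + make_jobj(0x80, 0)
        + make_jobj(0, 0xC0) + make_jobj(0, 0))

visited = [(depth, offset) for depth, offset, _ in walk_jobjs(data, 0x40)]
```

Printing `depth` as indentation gives a quick outline of the skeleton; each joint's `dobj` offset would be the next thing to follow for material and mesh data.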
Code:
struct COLL_DATA
{
    uint32 vertexOffset <format = hex>;
    uint32 vertexCount;
    uint32 indexOffset <format = hex>;
    uint32 indexCount;
    struct
    {
        uint16 indexStart;
        uint16 indexCount;
    } unknownData0x10[5];
    uint32 unknownOffset0x24 <format = hex>;
    uint32 unknownCount0x28;
};
The offset I have termed vertexOffset points to the data outlined in Milun's information (I have only given it that name, not actually investigated any further). The data is 2 float values per entry, which I am guessing, as outlined, are 2D position values.
The indexOffset points to the data crossed out in Milun's explanations. The other offset points to somewhat similar information, but I have not gone deeper than this yet.
Well, overall that is the main information you need to at least start accessing any data you want within the files. I have posted more information between here and the emutalk thread, as well as a template file that outlines this information. I will see if I can accompany this post with pictures and further explanations. Feel free to ask for any other specifics I may have already deciphered. You can get a good idea of what that is from what I have posted in my screenshots. As always, any help deciphering and/or placing more known information is greatly appreciated.
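For anyone who wants to poke at the collision block, here is a small Python sketch that reads the COLL_DATA header and then the vertex list. It assumes big-endian values, offsets relative to a data block copied out on its own, and that each vertex entry is two 32-bit floats (a guessed 2D position); `read_coll_data`, `read_vertices`, and the sample values are hypothetical.

```python
import struct

def read_coll_data(data, offset):
    """Unpack the 0x2C-byte COLL_DATA structure at a data-relative offset."""
    vertex_offset, vertex_count, index_offset, index_count = \
        struct.unpack_from(">4I", data, offset)
    # Five (indexStart, indexCount) pairs of unknown purpose at 0x10.
    groups = [struct.unpack_from(">2H", data, offset + 0x10 + i * 4)
              for i in range(5)]
    unknown_offset, unknown_count = struct.unpack_from(">2I", data, offset + 0x24)
    return {
        "vertexOffset": vertex_offset, "vertexCount": vertex_count,
        "indexOffset": index_offset, "indexCount": index_count,
        "groups": groups,
        "unknownOffset": unknown_offset, "unknownCount": unknown_count,
    }

def read_vertices(data, vertex_offset, vertex_count):
    """Read `vertex_count` entries of two floats each (assumed x, y)."""
    return [struct.unpack_from(">2f", data, vertex_offset + i * 8)
            for i in range(vertex_count)]

# Synthetic data block: COLL_DATA at 0x00, two vertices at 0x30.
coll_header = (struct.pack(">4I", 0x30, 2, 0, 0)
               + struct.pack(">10H", *([0] * 10))
               + struct.pack(">2I", 0, 0))
data = coll_header + bytes(4) + struct.pack(">4f", 0.0, 1.5, -2.0, 4.0)

coll = read_coll_data(data, 0x00)
vertices = read_vertices(data, coll["vertexOffset"], coll["vertexCount"])
```

Dumping the vertices of a real stage this way and plotting them as 2D points would be a quick way to check whether the position guess holds.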
Hope it helps.