I use a trie data structure because I found it the best fit for a disassembler's lookup mechanism.
You read a byte and decide whether it is enough or whether you should read more bytes, until you reach the instruction information.
It's really fast because you fetch the instruction info in at most 3 iterations over the DB, since an instruction can be formed from two opcode bytes plus the 3-bit reg field of the ModR/M byte.
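The lookup described above can be sketched roughly as follows. This is a toy illustration, not the real DB layout: `Node`, `NodeKind`, `trie_lookup` and the tiny tables are hypothetical names made up for this example, and only a handful of opcodes are filled in. It assumes a level is indexed either by a whole opcode byte or by the reg bits (bits 3..5) of the ModR/M byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node kinds; NODE_NONE is 0 so unlisted table slots are empty. */
typedef enum { NODE_NONE = 0, NODE_LEAF, NODE_BYTE, NODE_REG } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *mnemonic;     /* NODE_LEAF: the instruction info (just a name here). */
    const struct Node *kids;  /* NODE_BYTE: 256 children, NODE_REG: 8 children. */
} Node;

/* 0F 00 /reg group: the third level is indexed by the ModR/M reg field. */
static const Node grp_0f00[8] = {
    [0] = { NODE_LEAF, "SLDT", NULL },
    [1] = { NODE_LEAF, "STR",  NULL },
};

/* Second level, reached through the 0x0F escape byte. */
static const Node tbl_0f[256] = {
    [0x00] = { NODE_REG,  NULL,    grp_0f00 },
    [0xA2] = { NODE_LEAF, "CPUID", NULL },
};

/* FF /reg group: one opcode byte followed by the reg field. */
static const Node grp_ff[8] = {
    [0] = { NODE_LEAF, "INC",  NULL },
    [1] = { NODE_LEAF, "DEC",  NULL },
    [6] = { NODE_LEAF, "PUSH", NULL },
};

/* Root level, indexed by the first opcode byte. */
static const Node root[256] = {
    [0x0F] = { NODE_BYTE, NULL,  tbl_0f },
    [0x90] = { NODE_LEAF, "NOP", NULL },
    [0xFF] = { NODE_REG,  NULL,  grp_ff },
};

/* Walk the trie; returns the mnemonic, or NULL if unknown or out of bytes. */
static const char *trie_lookup(const uint8_t *code, size_t len)
{
    const Node *table = root;
    NodeKind level = NODE_BYTE; /* How the current table is indexed. */
    size_t i;

    for (i = 0; i < len; i++) {
        /* A reg-level table uses bits 3..5 of the byte, otherwise the whole byte. */
        unsigned idx = (level == NODE_REG) ? ((code[i] >> 3) & 7u) : code[i];
        const Node *n = &table[idx];

        if (n->kind == NODE_LEAF) return n->mnemonic;
        if (n->kind == NODE_NONE) return NULL;

        /* Inner node: descend one level and read another byte. */
        table = n->kids;
        level = n->kind;
    }
    return NULL;
}
```

With these tables, `90` resolves in one iteration, `0F A2` in two, and `0F 00 C8` (reg field = 1) in three, which is the worst case the text mentions.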
	/* Check for special 0x9b, WAIT instruction, which can be part of some instructions (x87). */
	if (tmpIndex0 == INST_WAIT_INDEX) {
		/* Only OCST_1dBYTES get a chance to include this byte as part of the opcode. */
		isWaitIncluded = TRUE;
		/* Ignore all prefixes, since they are useless and operate on the WAIT instruction itself. */
		prefixes_ignore_all(ps);
		/* Move to next code byte as a new whole instruction. */
		ci->code += 1;
		ci->codeLen -= 1;
		if (ci->codeLen < 0) return NULL; /* Faster to return NULL, it will be detected as WAIT later anyway. */
		/* Since we got a WAIT prefix, we re-read the first byte. */
		tmpIndex0 = *ci->code;
	}
	/* Walk first byte in InstructionsTree root. */
	in = InstructionsTree[tmpIndex0];
	if (in == INT_NOTEXISTS) return NULL;
	instType = INST_NODE_TYPE(in);
	/* Single byte instruction (OCST_1BYTE). */
	if ((instType < INT_INFOS) && (!isWaitIncluded)) {
		/* Some single byte instructions need extra treatment. */
		switch (tmpIndex0)
		{
			case INST_ARPL_INDEX:
				/*
				 * ARPL/MOVSXD share the same opcode, and both have different operands and mnemonics, of course.
				 * Practically, I couldn't come up with a comfortable way to merge the operands' types of ARPL/MOVSXD.
				 * And since the DB can't be patched dynamically, because the DB has to be multi-threaded compliant,
				 * I have no choice but to check for ARPL/MOVSXD right here - "right about now, the funk soul brother, check it out now, the funk soul brother...", Fatboy Slim
				 */
				if (ci->dt == Decode64Bits) {
					return &II_MOVSXD;
				} /* else ARPL will be returned because it's defined in the DB already. */
			break;
			case INST_NOP_INDEX: /* Nopnopnop */
				/* Check for Pause, since it's prefixed with 0xf3, which is not a real mandatory prefix. */
				if (ps->decodedPrefixes & INST_PRE_REP) {
					/* Flag this prefix as used. */
					ps->usedPrefixes |= INST_PRE_REP;
					return &II_PAUSE;
				}
				/*
				 * Treat NOP/XCHG specially.
				 * If we're not in 64bits restore XCHG to NOP, since in the DB it's XCHG.
				 * Else if we're in 64bits examine REX, if exists, and decide which instruction should go to output.
				 * 48 90 XCHG RAX, RAX is a true NOP (eat REX in this case because it's valid).
				 * 90 XCHG EAX, EAX is a true NOP (and not high dword of RAX = 0 although it should be a 32 bits operation).
				 * Note that if the REX.B is used, then the register is not RAX anymore but R8, which means it's not a NOP.