/*!
* Hex-Rays Decompiler project
* Copyright (c) 1990-2019 Hex-Rays
* ALL RIGHTS RESERVED.
* \mainpage
* There are 2 representations of the binary code in the decompiler:
* - microcode: processor instructions are translated into it and then
* the decompiler optimizes and transforms it
* - ctree: ctree is built from the optimized microcode and represents
* AST-like tree with C statements and expressions. It can
* be printed as C code.
*
* Microcode is represented by the following classes:
* - mbl_array_t keeps general info about the decompiled code and
* array of basic blocks. usually mbl_array_t is named 'mba'
* - mblock_t a basic block. includes list of instructions
* - minsn_t an instruction. contains 3 operands: left, right, and
* destination
* - mop_t an operand. depending on its type may hold various info
* like a number, register, stack variable, etc.
* - mlist_t list of memory or register locations; can hold vast areas
* of memory and multiple registers. this class is used
* very extensively in the decompiler. it may represent
* list of locations accessed by an instruction or even
* an entire basic block. it is also used as argument of
* many functions. for example, there is a function
* that searches for an instruction that refers to a mlist_t.
* See http://www.hexblog.com/?p=1232 for some pictures.
*
* Ctree is represented by:
* - cfunc_t keeps general info about the decompiled code, including a
* pointer to mbl_array_t. deleting cfunc_t will delete
* mbl_array_t too (however, decompiler returns cfuncptr_t,
* which is a reference counting object and deletes the
* underlying function as soon as all references to it go
* out of scope). cfunc_t has 'body', which represents the
* decompiled function body as cinsn_t.
* - cinsn_t a C statement. can be a compound statement or any other
* legal C statements (like if, for, while, return,
* expression-statement, etc). depending on the statement
* type has pointers to additional info. for example, the
* 'if' statement has a pointer to cif_t, which holds the
* 'if' condition, 'then' branch, and optionally 'else'
* branch. Please note that despite the name cinsn_t
* we say "statements", not "instructions". For us
* instructions are part of microcode, not ctree.
* - cexpr_t a C expression. is used as part of a C statement, when
* necessary. cexpr_t has 'type' field, which keeps the
* expression type.
* - citem_t a base class for cinsn_t and cexpr_t, holds common info
* like the address, label, and opcode.
* - cnumber_t a constant 64-bit number. in addition to its value also
* holds information how to represent it: decimal, hex, or
* as a symbolic constant (enum member). please note that
* numbers are represented by another class (mnumber_t)
* in microcode.
* See http://www.hexblog.com/?p=107 for some pictures and more details.
*
* Both microcode and ctree use the following class:
* - lvar_t a local variable. may represent a stack or register
* variable. a variable has a name, type, location, etc.
* the list of variables is stored in mba->vars.
* - lvar_locator_t holds a variable location (vdloc_t) and its definition
* address.
* - vdloc_t describes a variable location, like a register number,
* a stack offset, or, in complex cases, can be a mix of
* register and stack locations. very similar to argloc_t,
* which is used in ida. the differences between argloc_t
* and vdloc_t are:
* - vdloc_t never uses ARGLOC_REG2
* - vdloc_t uses micro register numbers instead of
* processor register numbers
* - the stack offsets are never negative in vdloc_t, while
* in argloc_t there can be negative offsets
*
* The above are the most important classes in this header file. There are
* many auxiliary classes, please see their definitions in the header file.
*
* See also the description of \ref vmpage.
*
*/
#ifndef __HEXRAYS_HPP
#define __HEXRAYS_HPP
#include <pro.h>
#include <fpro.h>
#include <ida.hpp>
#include <idp.hpp>
#include <gdl.hpp>
#include <ieee.h>
#include <loader.hpp>
#include <kernwin.hpp>
#include <typeinf.hpp>
#include <set>
#include <map>
#include <deque>
#include <queue>
#include <algorithm>
/*!
* \page vmpage Virtual Machine used by Microcode
*
* We can imagine a virtual micro machine that executes microcode.
* This virtual micro machine has many registers.
* Each register is 8 bits wide. During translation of processor
* instructions into microcode, multibyte processor registers are mapped
* to adjacent microregisters. Processor condition codes are also
* represented by microregisters. The microregisters are grouped
* into the following groups:
* - 0..7: condition codes
* - 8..n: all processor registers (including fpu registers, if necessary)
* this range may also include temporary registers used during
* the initial microcode generation
* - n.. : so called kernel registers; they are used during optimization
* see is_kreg()
*
* Each micro-instruction (minsn_t) has zero to three operands.
* Some of the possible operands types are:
* - immediate value
* - register
* - memory reference
* - result of another micro-instruction
*
* The operands (mop_t) are l (left), r (right), d (destination).
* An example of a microinstruction:
*
* add r0.4, #8.4, r2.4
*
* which means 'add constant 8 to r0 and place the result into r2'.
* where
* - the left operand is 'r0', its size is 4 bytes (r0.4)
* - the right operand is a constant '8', its size is 4 bytes (#8.4)
* - the destination operand is 'r2', its size is 4 bytes (r2.4)
* Note that 'd' is almost always the destination but there are exceptions.
* See mcode_modifies_d(). For example, stx does not modify 'd'.
* See the opcode map below for the list of microinstructions and their
* operands. Most instructions are very simple and do not need
* detailed explanations. There are no side effects in microinstructions.
*
* Each operand has a size specifier. The following sizes can be used in
* practically all contexts: 1, 2, 4, 8, 16 bytes. Floating types may have
* other sizes. Functions may return objects of arbitrary size, as well as
* operations upon UDTs (user-defined types, i.e. structs and unions).
*
* Memory is considered to consist of several segments.
* A memory reference is made using a (selector, offset) pair.
* A selector is always 2 bytes long. An offset can be 4 or 8 bytes long,
* depending on the bitness of the target processor.
* Currently the selectors are not used very much. The decompiler tries to
* resolve (selector, offset) pairs into direct memory references at each
* opportunity and then operates on mop_v operands. In other words,
* while the decompiler can handle segmented memory models, internally
* it still uses simple linear addresses.
*
* The following memory regions are recognized:
* - GLBLOW global memory: low part, everything below the stack
* - LVARS stack: local variables
* - RETADDR stack: return address
* - SHADOW stack: shadow arguments
* - ARGS stack: regular stack arguments
* - GLBHIGH global memory: high part, everything above the stack
* Any stack region may be empty. Objects residing in one memory region
* are considered to be completely distinct from objects in other regions.
* We allocate the stack frame in some memory region, which is not
* allocated for any purposes in IDA. This permits us to use linear addresses
* for all memory references, including the stack frame.
*
* If the operand size is bigger than 1 then the register
* operand references a block of registers. For example:
*
* ldc #1.4, r8.4
*
* loads the constant 1 to registers 8, 9, 10, 11:
*
* #1 -> r8
* #0 -> r9
* #0 -> r10
* #0 -> r11
*
* This example uses little-endian byte ordering.
* Big-endian byte ordering is supported too. Registers are always little-
* endian, regardless of the memory endianness.
*
* Each instruction has 'next' and 'prev' fields that are used to form
* a doubly linked list. Such lists are present for each basic block (mblock_t).
* Basic blocks have other attributes, including:
* - dead_at_start: list of dead locations at the block start
* - maybuse: list of locations the block may use
* - maybdef: list of locations the block may define (or spoil)
* - mustbuse: list of locations the block will certainly use
* - mustbdef: list of locations the block will certainly define
* - dnu: list of locations the block will certainly define
* but will not use (registers or non-aliasable stack vars)
*
* These lists are represented by the mlist_t class. It consists of 2 parts:
* - rlist_t: list of microregisters (possibly including virtual stack locations)
* - ivlset_t: list of memory locations represented as intervals
* we use linear addresses in this list.
* The mlist_t class is used quite often. For example, to find what an operand
* can spoil, we build its 'maybe-use' list. Then we can find out if this list
* is accessed using the is_accessed() or is_accessed_globally() functions.
*
* All basic blocks of the decompiled function constitute an array called
* mbl_array_t (array of microblocks). This is a huge class that has too
* many fields to describe here (some of the fields are not visible in the sdk).
* The most important ones are:
* - stack frame: frregs, stacksize, etc
* - memory: aliased, restricted, and other ranges
* - type: type of the current function, its arguments (argidx) and
* local variables (vars)
* - natural: array of pointers to basic blocks. the basic blocks
* are also accessible as a doubly linked list starting from 'blocks'.
* - bg: control flow graph. the graph gives access to the use-def
* chains that describe data dependencies between basic blocks
*
*/
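// Illustration only (not part of the SDK): the 'ldc #1.4, r8.4' example above
// can be sketched as a byte-wise store into adjacent 8-bit microregisters.
// The namespace vm_sketch and the helper mreg_byte() are hypothetical names
// introduced for this sketch; the decompiler itself does not expose them.

```cpp
#include <cstdint>

namespace vm_sketch
{
  // Byte that lands in microregister (base + idx) when VALUE is stored into
  // a block of adjacent microregisters starting at BASE.
  // Registers are little-endian, so idx 0 receives the low byte.
  // idx must be in the range 0..7 (a 64-bit value spans at most 8 registers).
  constexpr std::uint8_t mreg_byte(std::uint64_t value, int idx)
  {
    return static_cast<std::uint8_t>((value >> (8 * idx)) & 0xFF);
  }
}
```

// For 'ldc #1.4, r8.4' the four bytes for r8..r11 are mreg_byte(1, 0) through
// mreg_byte(1, 3), i.e. 1, 0, 0, 0.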
#ifdef __NT__
#pragma warning(push)
#pragma warning(disable:4062) // enumerator 'x' in switch of enum 'y' is not handled
#pragma warning(disable:4265) // virtual functions without virtual destructor
#endif
#define hexapi ///< Public functions are marked with this keyword
// Warning suppressions for PVS Studio:
//-V:2:654 The condition '2' of loop is always true.
//-V::719 The switch statement does not cover all values
//-V:verify:678
//-V:chain_keeper_t:690 copy ctr will be generated
//-V:add_block:656 call to the same function
//-V:add:792 The 'add' function located to the right of the operator '|' will be called regardless of the value of the left operand
//-V:sub:792 The 'sub' function located to the right of the operator '|' will be called regardless of the value of the left operand
//-V:intersect:792 The 'intersect' function located to the right of the operator '|' will be called regardless of the value of the left operand
// Lint suppressions:
//lint -sem(mop_t::_make_cases, custodial(1))
//lint -sem(mop_t::_make_pair, custodial(1))
//lint -sem(mop_t::_make_callinfo, custodial(1))
//lint -sem(mop_t::_make_insn, custodial(1))
//lint -sem(mop_t::make_insn, custodial(1))
// Microcode level forward definitions:
class mop_t; // microinstruction operand
class mop_pair_t; // pair of operands. example, :(edx.4,eax.4).8
class mop_addr_t; // address of an operand. example: &global_var
class mcallinfo_t; // function call info. example: <cdecl:"int x" #10.4>.8
class mcases_t; // jump table cases. example: {0 => 12, 1 => 13}
class minsn_t; // microinstruction
class mblock_t; // basic block
class mbl_array_t; // array of blocks, represents microcode for a function
class codegen_t; // helper class to generate the initial microcode
class mbl_graph_t; // control graph of microcode
struct vdui_t; // widget representing the pseudocode window
struct hexrays_failure_t; // decompilation failure object, is thrown by exceptions
struct mba_stats_t; // statistics about decompilation of a function
struct mlist_t; // list of memory and register locations
struct voff_t; // value offset (microregister number or stack offset)
typedef std::set<voff_t> voff_set_t;
struct vivl_t; // value interval (register or stack range)
typedef int mreg_t; ///< Micro register
// Ctree level forward definitions:
struct cfunc_t; // result of decompilation, the highest level object
struct citem_t; // base class for cexpr_t and cinsn_t
struct cexpr_t; // C expression
struct cinsn_t; // C statement
struct cblock_t; // C statement block (sequence of statements)
struct cswitch_t; // C switch statement
struct carg_t; // call argument
struct carglist_t; // vector of call arguments
typedef std::set<ea_t> easet_t;
typedef std::set<minsn_t*> minsn_ptr_set_t;
typedef std::set<qstring> strings_t;
typedef qvector<minsn_t*> minsnptrs_t;
typedef qvector<mop_t*> mopptrs_t;
typedef qvector<mop_t> mopvec_t;
typedef qvector<uint64> uint64vec_t;
typedef qvector<mreg_t> mregvec_t;
// Function frames must be smaller than this value, otherwise
// the decompiler will bail out with MERR_HUGESTACK
#define MAX_SUPPORTED_STACK_SIZE 0x100000 // 1MB
//-------------------------------------------------------------------------
// Original version of macro DEFINE_MEMORY_ALLOCATION_FUNCS
// (uses decompiler-specific memory allocation functions)
#if defined(SWIG)
#define HEXRAYS_MEMORY_ALLOCATION_FUNCS()
#elif defined(SWIGPYTHON)
#define HEXRAYS_MEMORY_ALLOCATION_FUNCS DEFINE_MEMORY_ALLOCATION_FUNCS
#else
#define HEXRAYS_PLACEMENT_DELETE void operator delete(void *, void *) {}
#define HEXRAYS_MEMORY_ALLOCATION_FUNCS() \
void *operator new (size_t _s) { return hexrays_alloc(_s); } \
void *operator new[](size_t _s) { return hexrays_alloc(_s); } \
void *operator new(size_t /*size*/, void *_v) { return _v; } \
void operator delete (void *_blk) { hexrays_free(_blk); } \
void operator delete[](void *_blk) { hexrays_free(_blk); } \
HEXRAYS_PLACEMENT_DELETE
#endif
void* hexapi hexrays_alloc(size_t size);
void hexapi hexrays_free(void* ptr);
typedef uint64 uvlr_t;
typedef int64 svlr_t;
enum { MAX_VLR_SIZE = sizeof(uvlr_t) };
const uvlr_t MAX_VALUE = uvlr_t(-1);
const svlr_t MAX_SVALUE = svlr_t(uvlr_t(-1) >> 1);
const svlr_t MIN_SVALUE = ~MAX_SVALUE;
enum cmpop_t
{ // the order of comparisons is the same as in microcode opcodes
CMP_NZ,
CMP_Z,
CMP_AE,
CMP_B,
CMP_A,
CMP_BE,
CMP_GT,
CMP_GE,
CMP_LT,
CMP_LE,
};
//-------------------------------------------------------------------------
// value-range class to keep possible operand value(s).
class valrng_t
{
protected:
int flags;
#define VLR_TYPE 0x0F // valrng_t type
#define VLR_NONE 0x00 // no values
#define VLR_ALL 0x01 // all values
#define VLR_IVLS 0x02 // union of disjoint intervals
#define VLR_RANGE 0x03 // strided range
#define VLR_SRANGE 0x04 // strided range with signed bound
#define VLR_BITS 0x05 // known bits
#define VLR_SECT 0x06 // intersection of sub-ranges
// each sub-range should be simple or union
#define VLR_UNION 0x07 // union of sub-ranges
// each sub-range should be simple or
// intersection
#define VLR_UNK 0x08 // unknown value (like 'null' in SQL)
int size; // operand size: 1..8 bytes
// all values must fall within the size
union
{
struct // VLR_RANGE/VLR_SRANGE
{ // values that are between VALUE and LIMIT
// and conform to: value+stride*N
uvlr_t value; // initial value
uvlr_t limit; // final value
// we adjust LIMIT to be on the STRIDE lattice
svlr_t stride; // stride between values
};
struct // VLR_BITS
{
uvlr_t zeroes; // bits known to be clear
uvlr_t ones; // bits known to be set
};
char reserved[sizeof(qvector<int>)];
// VLR_IVLS/VLR_SECT/VLR_UNION
};
void hexapi clear(void);
void hexapi copy(const valrng_t& r);
valrng_t& hexapi assign(const valrng_t& r);
public:
explicit valrng_t(int size_ = MAX_VLR_SIZE)
: flags(VLR_NONE), size(size_), value(0), limit(0), stride(0) {}
valrng_t(const valrng_t& r) { copy(r); }
~valrng_t(void) { clear(); }
valrng_t& operator=(const valrng_t& r) { return assign(r); }
void swap(valrng_t& r) { qswap(*this, r); }
DECLARE_COMPARISONS(valrng_t);
DEFINE_MEMORY_ALLOCATION_FUNCS()
void set_none(void) { clear(); }
void set_all(void) { clear(); flags = VLR_ALL; }
void set_unk(void) { clear(); flags = VLR_UNK; }
void hexapi set_eq(uvlr_t v);
void hexapi set_cmp(cmpop_t cmp, uvlr_t _value);
// reduce size
// it takes the low part of size NEW_SIZE
// it returns "true" if size is changed successfully.
// e.g.: valrng_t vr(2); vr.set_eq(0x1234);
// vr.reduce_size(1);
// uvlr_t v; vr.cvt_to_single_value(&v);
// assert(v == 0x34);
bool hexapi reduce_size(int new_size);
// Perform intersection or union or inversion.
// \return did we change something in THIS?
bool hexapi intersect_with(const valrng_t& r);
bool hexapi unite_with(const valrng_t& r);
void hexapi inverse(); // works for VLR_IVLS only
bool empty(void) const { return flags == VLR_NONE; }
bool all_values(void) const { return flags == VLR_ALL; }
bool is_unknown(void) const { return flags == VLR_UNK; }
bool hexapi has(uvlr_t v) const;
void hexapi print(qstring* vout) const;
const char* hexapi dstr(void) const;
bool hexapi cvt_to_single_value(uvlr_t* v) const;
bool hexapi cvt_to_cmp(cmpop_t* cmp, uvlr_t* val, bool strict) const;
int get_size() const { return size; }
static uvlr_t max_value(int size_)
{
return size_ == MAX_VLR_SIZE
? MAX_VALUE
: (uvlr_t(1) << (size_ * 8)) - 1;
}
static uvlr_t min_svalue(int size_)
{
return size_ == MAX_VLR_SIZE
? MIN_SVALUE
: (uvlr_t(1) << (size_ * 8 - 1));
}
static uvlr_t max_svalue(int size_)
{
return size_ == MAX_VLR_SIZE
? MAX_SVALUE
: (uvlr_t(1) << (size_ * 8 - 1)) - 1;
}
uvlr_t max_value() const { return max_value(size); }
uvlr_t min_svalue() const { return min_svalue(size); }
uvlr_t max_svalue() const { return max_svalue(size); }
};
DECLARE_TYPE_AS_MOVABLE(valrng_t);
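// Illustration only: a standalone sketch of the size-dependent bounds that
// valrng_t::max_value(), min_svalue() and max_svalue() compute above.
// The namespace vlr_sketch and its contents are hypothetical stand-ins that
// need no SDK headers; max_svalue() is derived here as max_value() >> 1,
// which is assumed to be equivalent to the formula used in the class.

```cpp
#include <cstdint>

namespace vlr_sketch
{
  typedef std::uint64_t uvlr_t;
  constexpr int MAX_SIZE = sizeof(uvlr_t);   // 8 bytes, like MAX_VLR_SIZE

  // Largest unsigned value that fits into SIZE bytes.
  // The full-width case is handled separately because shifting a 64-bit
  // value by 64 bits is undefined.
  constexpr uvlr_t max_value(int size)
  {
    return size == MAX_SIZE ? ~uvlr_t(0) : (uvlr_t(1) << (size * 8)) - 1;
  }

  // Bit pattern of the smallest signed value: only the sign bit is set.
  constexpr uvlr_t min_svalue(int size)
  {
    return uvlr_t(1) << (size * 8 - 1);
  }

  // Largest signed value: all bits below the sign bit are set.
  constexpr uvlr_t max_svalue(int size)
  {
    return max_value(size) >> 1;
  }
}
```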
//-------------------------------------------------------------------------
// possible memory and register access types.
/*
enum access_type_t
{
NO_ACCESS = 0,
WRITE_ACCESS = 1,
READ_ACCESS = 2,
RW_ACCESS = WRITE_ACCESS | READ_ACCESS,
};
*/
// Are we looking for 'must access' or 'may access' information?
// 'must access' means that the code will always access the specified location(s)
// 'may access' means that the code may in some cases access the specified location(s)
// Example: ldx cs.2, r0.4, r1.4
// MUST_ACCESS: r0.4 and r1.4, usually displayed as r0.8 because r0 and r1 are adjacent
// MAY_ACCESS: r0.4 and r1.4, and all aliasable memory, because
// ldx may access any part of the aliasable memory
typedef int maymust_t;
const maymust_t
// One of the following two bits should be specified:
MUST_ACCESS = 0x00, // access information we can count on
MAY_ACCESS = 0x01, // access information we should take into account
// Optionally combined with the following bits:
MAYMUST_ACCESS_MASK = 0x01,
ONE_ACCESS_TYPE = 0x20, // for find_first_use():
// use only the specified maymust access type
// (by default it inverts the access type for def-lists)
INCLUDE_SPOILED_REGS = 0x40, // for build_def_list() with MUST_ACCESS:
// include spoiled registers in the list
EXCLUDE_PASS_REGS = 0x80, // for build_def_list() with MAY_ACCESS:
// exclude pass_regs from the list
FULL_XDSU = 0x100, // for build_def_list():
// if xds/xdu source and targets are the same
// treat it as if xdsu redefines the entire destination
WITH_ASSERTS = 0x200, // for find_first_use():
// do not ignore assertions
EXCLUDE_VOLATILE = 0x400, // for build_def_list():
// exclude volatile memory from the list
INCLUDE_UNUSED_SRC = 0x800, // for build_use_list():
// do not exclude unused source bytes for m_and/m_or insns
INCLUDE_DEAD_RETREGS = 0x1000, // for build_def_list():
// include dead returned registers in the list
INCLUDE_RESTRICTED = 0x2000,// for MAY_ACCESS: include restricted memory
CALL_SPOILS_ONLY_ARGS = 0x4000;// for build_def_list() & MAY_ACCESS:
// do not include global memory into the
// spoiled list of a call
inline THREAD_SAFE bool is_may_access(maymust_t maymust)
{
return (maymust & MAYMUST_ACCESS_MASK) != MUST_ACCESS;
}
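// Illustration only: how the low bit of a maymust_t value selects between
// 'must' and 'may' access while the higher bits carry optional modifiers.
// The namespace maymust_sketch duplicates a few of the constants above so
// that the sketch compiles without the SDK; it is not part of the API.

```cpp
namespace maymust_sketch
{
  typedef int maymust_t;
  constexpr maymust_t MUST_ACCESS          = 0x00;
  constexpr maymust_t MAY_ACCESS           = 0x01;
  constexpr maymust_t MAYMUST_ACCESS_MASK  = 0x01;
  constexpr maymust_t INCLUDE_SPOILED_REGS = 0x40;
  constexpr maymust_t EXCLUDE_PASS_REGS    = 0x80;

  // Only the masked bit decides the access kind; modifier bits are ignored.
  constexpr bool is_may_access(maymust_t maymust)
  {
    return (maymust & MAYMUST_ACCESS_MASK) != MUST_ACCESS;
  }
}
```

// A caller would typically OR a modifier into the base value, for example
// MAY_ACCESS | EXCLUDE_PASS_REGS, and the predicate still reports 'may'.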
//-------------------------------------------------------------------------
/// \defgroup MERR_ Microcode error codes
//@{
enum merror_t
{
MERR_OK = 0, ///< ok
MERR_BLOCK = 1, ///< no error, switch to new block
MERR_INTERR = -1, ///< internal error
MERR_INSN = -2, ///< cannot convert to microcode
MERR_MEM = -3, ///< not enough memory
MERR_BADBLK = -4, ///< bad block found
MERR_BADSP = -5, ///< positive sp value has been found
MERR_PROLOG = -6, ///< prolog analysis failed
MERR_SWITCH = -7, ///< wrong switch idiom
MERR_EXCEPTION = -8, ///< exception analysis failed
MERR_HUGESTACK = -9, ///< stack frame is too big
MERR_LVARS = -10, ///< local variable allocation failed
MERR_BITNESS = -11, ///< only 32/16bit functions can be decompiled
MERR_BADCALL = -12, ///< could not determine call arguments
MERR_BADFRAME = -13, ///< function frame is wrong
MERR_UNKTYPE = -14, ///< undefined type %s (currently unused error code)
MERR_BADIDB = -15, ///< inconsistent database information
MERR_SIZEOF = -16, ///< wrong basic type sizes in compiler settings
MERR_REDO = -17, ///< redecompilation has been requested
MERR_CANCELED = -18, ///< decompilation has been cancelled
MERR_RECDEPTH = -19, ///< max recursion depth reached during lvar allocation
MERR_OVERLAP = -20, ///< variables would overlap: %s
MERR_PARTINIT = -21, ///< partially initialized variable %s
MERR_COMPLEX = -22, ///< too complex function
MERR_LICENSE = -23, ///< no license available
MERR_ONLY32 = -24, ///< only 32-bit functions can be decompiled for the current database
MERR_ONLY64 = -25, ///< only 64-bit functions can be decompiled for the current database
MERR_BUSY = -26, ///< already decompiling a function
MERR_FARPTR = -27, ///< far memory model is supported only for pc
MERR_EXTERN = -28, ///< special segments cannot be decompiled
MERR_FUNCSIZE = -29, ///< too big function
MERR_BADRANGES = -30, ///< bad input ranges
MERR_STOP = -31, ///< no error, stop the analysis
MERR_MAX_ERR = 31,
MERR_LOOP = -32, ///< internal code: redo last loop (never reported)
};
//@}
/// Get textual description of an error code
/// \param out the output buffer for the error description
/// \param code \ref MERR_
/// \param mba the microcode array
/// \return the error address
ea_t hexapi get_merror_desc(qstring* out, merror_t code, mbl_array_t* mba);
//-------------------------------------------------------------------------
// List of microinstruction opcodes.
// The order of setX and jX insns is important, it is used in the code.
// Instructions marked with *F may have the FPINSN bit set and operate on fp values
// Instructions marked with +F must have the FPINSN bit set. They always operate on fp values
// Other instructions do not operate on fp values.
enum mcode_t
{
m_nop = 0x00, // nop // no operation
m_stx = 0x01, // stx l, {r=sel, d=off} // store register to memory *F
m_ldx = 0x02, // ldx {l=sel,r=off}, d // load register from memory *F
m_ldc = 0x03, // ldc l=const, d // load constant
m_mov = 0x04, // mov l, d // move *F
m_neg = 0x05, // neg l, d // negate
m_lnot = 0x06, // lnot l, d // logical not
m_bnot = 0x07, // bnot l, d // bitwise not
m_xds = 0x08, // xds l, d // extend (signed)
m_xdu = 0x09, // xdu l, d // extend (unsigned)
m_low = 0x0A, // low l, d // take low part
m_high = 0x0B, // high l, d // take high part
m_add = 0x0C, // add l, r, d // l + r -> dst
m_sub = 0x0D, // sub l, r, d // l - r -> dst
m_mul = 0x0E, // mul l, r, d // l * r -> dst
m_udiv = 0x0F, // udiv l, r, d // l / r -> dst
m_sdiv = 0x10, // sdiv l, r, d // l / r -> dst
m_umod = 0x11, // umod l, r, d // l % r -> dst
m_smod = 0x12, // smod l, r, d // l % r -> dst
m_or = 0x13, // or l, r, d // bitwise or
m_and = 0x14, // and l, r, d // bitwise and
m_xor = 0x15, // xor l, r, d // bitwise xor
m_shl = 0x16, // shl l, r, d // shift logical left
m_shr = 0x17, // shr l, r, d // shift logical right
m_sar = 0x18, // sar l, r, d // shift arithmetic right
m_cfadd = 0x19, // cfadd l, r, d=carry // calculate carry bit of (l+r)
m_ofadd = 0x1A, // ofadd l, r, d=overf // calculate overflow bit of (l+r)
m_cfshl = 0x1B, // cfshl l, r, d=carry // calculate carry bit of (l<<r)
m_cfshr = 0x1C, // cfshr l, r, d=carry // calculate carry bit of (l>>r)
m_sets = 0x1D, // sets l, d=byte SF=1 Sign
m_seto = 0x1E, // seto l, r, d=byte OF=1 Overflow of (l-r)
m_setp = 0x1F, // setp l, r, d=byte PF=1 Unordered/Parity *F
m_setnz = 0x20, // setnz l, r, d=byte ZF=0 Not Equal *F
m_setz = 0x21, // setz l, r, d=byte ZF=1 Equal *F
m_setae = 0x22, // setae l, r, d=byte CF=0 Above or Equal *F
m_setb = 0x23, // setb l, r, d=byte CF=1 Below *F
m_seta = 0x24, // seta l, r, d=byte CF=0 & ZF=0 Above *F
m_setbe = 0x25, // setbe l, r, d=byte CF=1 | ZF=1 Below or Equal *F
m_setg = 0x26, // setg l, r, d=byte SF=OF & ZF=0 Greater
m_setge = 0x27, // setge l, r, d=byte SF=OF Greater or Equal
m_setl = 0x28, // setl l, r, d=byte SF!=OF Less
m_setle = 0x29, // setle l, r, d=byte SF!=OF | ZF=1 Less or Equal
m_jcnd = 0x2A, // jcnd l, d // d is mop_v or mop_b
m_jnz = 0x2B, // jnz l, r, d // ZF=0 Not Equal *F
m_jz = 0x2C, // jz l, r, d // ZF=1 Equal *F
m_jae = 0x2D, // jae l, r, d // CF=0 Above or Equal *F
m_jb = 0x2E, // jb l, r, d // CF=1 Below *F
m_ja = 0x2F, // ja l, r, d // CF=0 & ZF=0 Above *F
m_jbe = 0x30, // jbe l, r, d // CF=1 | ZF=1 Below or Equal *F
m_jg = 0x31, // jg l, r, d // SF=OF & ZF=0 Greater
m_jge = 0x32, // jge l, r, d // SF=OF Greater or Equal
m_jl = 0x33, // jl l, r, d // SF!=OF Less
m_jle = 0x34, // jle l, r, d // SF!=OF | ZF=1 Less or Equal
m_jtbl = 0x35, // jtbl l, r=mcases // Table jump
m_ijmp = 0x36, // ijmp {r=sel, d=off} // indirect unconditional jump
m_goto = 0x37, // goto l // l is mop_v or mop_b
m_call = 0x38, // call l d // l is mop_v or mop_b or mop_h
m_icall = 0x39, // icall {l=sel, r=off} d // indirect call
m_ret = 0x3A, // ret
m_push = 0x3B, // push l
m_pop = 0x3C, // pop d
m_und = 0x3D, // und d // undefine
m_ext = 0x3E, // ext in1, in2, out1 // external insn, not microcode *F
m_f2i = 0x3F, // f2i l, d int(l) => d; convert fp -> integer +F
m_f2u = 0x40, // f2u l, d uint(l)=> d; convert fp -> uinteger +F
m_i2f = 0x41, // i2f l, d fp(l) => d; convert integer -> fp +F
m_u2f = 0x42, // u2f l, d fp(l) => d; convert uinteger -> fp +F
m_f2f = 0x43, // f2f l, d l => d; change fp precision +F
m_fneg = 0x44, // fneg l, d -l => d; change sign +F
m_fadd = 0x45, // fadd l, r, d l + r => d; add +F
m_fsub = 0x46, // fsub l, r, d l - r => d; subtract +F
m_fmul = 0x47, // fmul l, r, d l * r => d; multiply +F
m_fdiv = 0x48, // fdiv l, r, d l / r => d; divide +F
#define m_max 0x49 // first unused opcode
};
/// Must an instruction with the given opcode be the last one in a block?
/// Such opcodes are called closing opcodes.
/// \param mcode instruction opcode
/// \param including_calls should m_call/m_icall be considered as the closing opcodes?
/// If this function returns true, the opcode cannot appear in the middle
/// of a block. Calls are a special case because before MMAT_CALLS they are
/// closing opcodes. After MMAT_CALLS they are not considered closing opcodes.
THREAD_SAFE bool hexapi must_mcode_close_block(mcode_t mcode, bool including_calls);
/// May opcode be propagated?
/// Such opcodes can be used in sub-instructions (nested instructions)
/// There is a handful of non-propagatable opcodes, like jumps, ret, nop, etc
/// All other regular opcodes are propagatable and may appear in a nested
/// instruction.
THREAD_SAFE bool hexapi is_mcode_propagatable(mcode_t mcode);
// Is add or sub instruction?
inline THREAD_SAFE bool is_mcode_addsub(mcode_t mcode) { return mcode == m_add || mcode == m_sub; }
// Is xds or xdu instruction? We use 'xdsu' as a shortcut for 'xds or xdu'
inline THREAD_SAFE bool is_mcode_xdsu(mcode_t mcode) { return mcode == m_xds || mcode == m_xdu; }
// Is a 'set' instruction? (an instruction that sets a condition code)
inline THREAD_SAFE bool is_mcode_set(mcode_t mcode) { return mcode >= m_sets && mcode <= m_setle; }
// Is a 1-operand 'set' instruction? Only 'sets' is in this group
inline THREAD_SAFE bool is_mcode_set1(mcode_t mcode) { return mcode == m_sets; }
// Is a 1-operand conditional jump instruction? Only 'jcnd' is in this group
inline THREAD_SAFE bool is_mcode_j1(mcode_t mcode) { return mcode == m_jcnd; }
// Is a conditional jump?
inline THREAD_SAFE bool is_mcode_jcond(mcode_t mcode) { return mcode >= m_jcnd && mcode <= m_jle; }
// Is a 'set' instruction that can be converted into a conditional jump?
inline THREAD_SAFE bool is_mcode_convertible_to_jmp(mcode_t mcode) { return mcode >= m_setnz && mcode <= m_setle; }
// Is a conditional jump instruction that can be converted into a 'set'?
inline THREAD_SAFE bool is_mcode_convertible_to_set(mcode_t mcode) { return mcode >= m_jnz && mcode <= m_jle; }
// Is a call instruction? (direct or indirect)
inline THREAD_SAFE bool is_mcode_call(mcode_t mcode) { return mcode == m_call || mcode == m_icall; }
// Must be an FPU instruction?
inline THREAD_SAFE bool is_mcode_fpu(mcode_t mcode) { return mcode >= m_f2i; }
// Is a commutative instruction?
inline THREAD_SAFE bool is_mcode_commutative(mcode_t mcode)
{
return mcode == m_add
|| mcode == m_mul
|| mcode == m_or
|| mcode == m_and
|| mcode == m_xor
|| mcode == m_setz
|| mcode == m_setnz
|| mcode == m_cfadd
|| mcode == m_ofadd;
}
// Is a shift instruction?
inline THREAD_SAFE bool is_mcode_shift(mcode_t mcode)
{
return mcode == m_shl
|| mcode == m_shr
|| mcode == m_sar;
}
// Is a kind of div or mod instruction?
inline THREAD_SAFE bool is_mcode_divmod(mcode_t op)
{
return op == m_udiv || op == m_sdiv || op == m_umod || op == m_smod;
}
// Convert setX opcode into corresponding jX opcode
// This function relies on the order of setX and jX opcodes!
inline THREAD_SAFE mcode_t set2jcnd(mcode_t code)
{
return mcode_t(code - m_setnz + m_jnz);
}
// Convert jX opcode into corresponding setX opcode
// This function relies on the order of setX and jX opcodes!
inline THREAD_SAFE mcode_t jcnd2set(mcode_t code)
{
return mcode_t(code + m_setnz - m_jnz);
}
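/**
set2jcnd() and jcnd2set() only shift an opcode by a constant offset, which
works because the setX and jX opcodes are declared in the same relative order.
The following is a minimal standalone sketch of that idea, not part of this
API: the sk_ names and the reduced enum are hypothetical stand-ins, and the
numeric values are illustrative only.

```cpp
// Hypothetical reduced opcode list. What matters is that the setX group and
// the jX group list their conditions in the same order.
enum sk_code_t
{
  SK_SETNZ = 0x20, SK_SETZ, SK_SETAE, SK_SETB, SK_SETA, SK_SETBE,
  SK_SETG, SK_SETGE, SK_SETL, SK_SETLE,
  SK_JCND,                       // 1-operand jump, outside the convertible range
  SK_JNZ, SK_JZ, SK_JAE, SK_JB, SK_JA, SK_JBE,
  SK_JG, SK_JGE, SK_JL, SK_JLE,
};

// setX -> jX: subtract the base of the set group, add the base of the jump group
constexpr sk_code_t sk_set2jcnd(sk_code_t c) { return sk_code_t(c - SK_SETNZ + SK_JNZ); }

// jX -> setX: the inverse shift
constexpr sk_code_t sk_jcnd2set(sk_code_t c) { return sk_code_t(c + SK_SETNZ - SK_JNZ); }

// Compile-time checks of the mapping and of the round trip
static_assert(sk_set2jcnd(SK_SETLE) == SK_JLE, "setle maps to jle");
static_assert(sk_jcnd2set(SK_JB) == SK_SETB, "jb maps to setb");
static_assert(sk_jcnd2set(sk_set2jcnd(SK_SETA)) == SK_SETA, "round trip");
```

Callers of the real functions would normally guard the conversion with
is_mcode_convertible_to_jmp() or is_mcode_convertible_to_set(), since the
arithmetic is meaningless outside those ranges.
*/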
// Negate a conditional opcode.
// Conditional jumps can be negated, example: jle -> jg
// 'Set' instructions can be negated, example: seta -> setbe
// If the opcode cannot be negated, return m_nop
THREAD_SAFE mcode_t hexapi negate_mcode_relation(mcode_t code);
// Swap a conditional opcode.
// Only conditional jumps and set instructions can be swapped.
// The returned opcode is the one required for swapped operands.
// Example "x > y" is the same as "y < x", therefore swap(m_jg) is m_jl.
// If the opcode cannot be swapped, return m_nop
THREAD_SAFE mcode_t hexapi swap_mcode_relation(mcode_t code);
// Return the opcode that performs signed operation.
// Examples: jae -> jge; udiv -> sdiv
// If the opcode cannot be transformed into signed form, simply return it.
THREAD_SAFE mcode_t hexapi get_signed_mcode(mcode_t code);
// Return the opcode that performs unsigned operation.
// Examples: jl -> jb; xds -> xdu
// If the opcode cannot be transformed into unsigned form, simply return it.
THREAD_SAFE mcode_t hexapi get_unsigned_mcode(mcode_t code);
// Does the opcode perform a signed operation?
inline THREAD_SAFE bool is_signed_mcode(mcode_t code) { return get_unsigned_mcode(code) != code; }
// Does the opcode perform an unsigned operation?
inline THREAD_SAFE bool is_unsigned_mcode(mcode_t code) { return get_signed_mcode(code) != code; }
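/**
is_signed_mcode() and is_unsigned_mcode() are defined indirectly: an opcode
counts as signed when converting it to the unsigned form changes it, and vice
versa. One consequence is that sign-neutral opcodes answer false to both
questions. The following is a minimal standalone sketch of that pattern; the
sk_ names and the tiny opcode set are hypothetical and are not the real
conversion tables.

```cpp
// Hypothetical opcode subset: one sign-neutral opcode and two signed/unsigned pairs
enum sk_op_t { SK_ADD, SK_UDIV, SK_SDIV, SK_JB, SK_JL };

// Map a signed opcode to its unsigned twin; anything else is returned unchanged
constexpr sk_op_t sk_get_unsigned(sk_op_t c)
{
  return c == SK_SDIV ? SK_UDIV : c == SK_JL ? SK_JB : c;
}

// Map an unsigned opcode to its signed twin; anything else is returned unchanged
constexpr sk_op_t sk_get_signed(sk_op_t c)
{
  return c == SK_UDIV ? SK_SDIV : c == SK_JB ? SK_JL : c;
}

// "Signed" means: the unsigned conversion produced a different opcode
constexpr bool sk_is_signed(sk_op_t c) { return sk_get_unsigned(c) != c; }

// "Unsigned" means: the signed conversion produced a different opcode
constexpr bool sk_is_unsigned(sk_op_t c) { return sk_get_signed(c) != c; }

static_assert(sk_is_signed(SK_SDIV) && !sk_is_unsigned(SK_SDIV), "sdiv is signed");
static_assert(sk_is_unsigned(SK_JB) && !sk_is_signed(SK_JB), "jb is unsigned");
static_assert(!sk_is_signed(SK_ADD) && !sk_is_unsigned(SK_ADD), "add is neither");
```
*/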
// Does the 'd' operand get modified by the instruction?
// Example: "add l,r,d" modifies d, while instructions
// like jcnd, ijmp, stx do not modify it.
// Note: this function returns 'true' for m_ext but it may be wrong.
// Use minsn_t::modifies_d() if you have minsn_t.
THREAD_SAFE bool hexapi mcode_modifies_d(mcode_t mcode);
// Processor condition codes are mapped to the first microregisters
// The order is important, see mop_t::is_cc()
const mreg_t mr_none = mreg_t(-1);
const mreg_t mr_cf = mreg_t(0); // carry bit
const mreg_t mr_zf = mreg_t(1); // zero bit
const mreg_t mr_sf = mreg_t(2); // sign bit
const mreg_t mr_of = mreg_t(3); // overflow bit
const mreg_t mr_pf = mreg_t(4); // parity bit
const int cc_count = mr_pf - mr_cf + 1; // number of condition code registers
const mreg_t mr_cc = mreg_t(5); // synthetic condition code, used internally
const mreg_t mr_first = mreg_t(8); // the first processor specific register
//-------------------------------------------------------------------------
/// Operand locator.
/// It is used to denote a particular operand in the ctree, for example,
/// when the user right clicks on a constant and requests to represent it, say,
/// as a hexadecimal number.
struct operand_locator_t
{
private:
// forbid the default constructor, force the user to initialize objects of this class.
operand_locator_t(void) {}
public:
ea_t ea; ///< address of the original processor instruction
int opnum; ///< operand number in the instruction
operand_locator_t(ea_t _ea, int _opnum) : ea(_ea), opnum(_opnum) {}
DECLARE_COMPARISONS(operand_locator_t);
HEXRAYS_MEMORY_ALLOCATION_FUNCS()
};
//-------------------------------------------------------------------------
/// Number representation.
/// This structure holds information about a number format.
struct number_format_t
{
HEXRAYS_MEMORY_ALLOCATION_FUNCS()
flags_t flags; ///< ida flags, which describe number radix, enum, etc
char opnum; ///< operand number: 0..UA_MAXOP
char props; ///< properties: combination of NF_ bits (\ref NF_)
/// \defgroup NF_ Number format property bits
/// Used in number_format_t::props
//@{
#define NF_FIXED 0x01 ///< number format has been defined by the user
#define NF_NEGDONE 0x02 ///< temporary internal bit: negation has been performed
#define NF_BINVDONE 0x04 ///< temporary internal bit: inverting bits is done
#define NF_NEGATE 0x08 ///< The user asked to negate the constant
#define NF_BITNOT 0x10 ///< The user asked to invert bits of the constant
#define NF_STROFF 0x20 ///< internal bit: used as stroff, valid iff is_stroff()
//@}
uchar serial; ///< for enums: constant serial number
char org_nbytes; ///< original number size in bytes
qstring type_name; ///< for stroffs: structure for offsetof()\n
///< for enums: enum name
/// Constructor
number_format_t(int _opnum = 0)
: flags(0), opnum(char(_opnum)), props(0), serial(0), org_nbytes(0) {}
/// Get number radix
/// \return 2,8,10, or 16
int get_radix(void) const { return ::get_radix(flags, opnum); }
/// Is number representation fixed?
/// Fixed representation cannot be modified by the decompiler
bool is_fixed(void) const { return props != 0; }
/// Is a hexadecimal number?
bool is_hex(void) const { return ::is_numop(flags, opnum) && get_radix() == 16; }
/// Is a decimal number?
bool is_dec(void) const { return ::is_numop(flags, opnum) && get_radix() == 10; }
/// Is an octal number?
bool is_oct(void) const { return ::is_numop(flags, opnum) && get_radix() == 8; }
/// Is a symbolic constant?
bool is_enum(void) const { return ::is_enum(flags, opnum); }
/// Is a character constant?
bool is_char(void) const { return ::is_char(flags, opnum); }
/// Is a structure field offset?
bool is_stroff(void) const { return ::is_stroff(flags, opnum); }
/// Is a number?
bool is_numop(void) const { return !is_enum() && !is_char() && !is_stroff(); }
/// Does the number need to be negated or bitwise negated?
/// Returns true if the user requested a negation but it is not done yet
bool needs_to_be_inverted(void) const
{
return (props & (NF_NEGATE | NF_BITNOT)) != 0 // the user requested it
&& (props & (NF_NEGDONE | NF_BINVDONE)) == 0; // not done yet
}
};
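/**
needs_to_be_inverted() combines two bit tests on number_format_t::props: at
least one "requested" bit must be set and none of the "done" bits may be set.
The following is a minimal standalone sketch of that test. The SK_ constants
copy the NF_ values defined above; the function name is hypothetical.

```cpp
// Local copies of the NF_ property bits, used only by this sketch
constexpr int SK_NF_FIXED    = 0x01;
constexpr int SK_NF_NEGDONE  = 0x02;
constexpr int SK_NF_BINVDONE = 0x04;
constexpr int SK_NF_NEGATE   = 0x08;
constexpr int SK_NF_BITNOT   = 0x10;

// True if an inversion was requested and no "done" bit has been set yet
constexpr bool sk_needs_inversion(int props)
{
  return (props & (SK_NF_NEGATE | SK_NF_BITNOT)) != 0        // requested
      && (props & (SK_NF_NEGDONE | SK_NF_BINVDONE)) == 0;    // not done yet
}

static_assert(sk_needs_inversion(SK_NF_NEGATE), "negation requested and pending");
static_assert(!sk_needs_inversion(SK_NF_NEGATE | SK_NF_NEGDONE), "already performed");
static_assert(!sk_needs_inversion(SK_NF_FIXED), "nothing requested");
```

Note that the test is coarse: either "done" bit clears the result, regardless
of which of the two operations was requested.
*/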
// Number formats are attached to (ea,opnum) pairs
typedef std::map<operand_locator_t, number_format_t> user_numforms_t;
//-------------------------------------------------------------------------
/// Base helper class to convert binary data structures into text.
/// Other classes are derived from this class.
struct vd_printer_t
{
qstring tmpbuf;
int hdrlines; ///< number of header lines (prototype+typedef+lvars)
///< valid at the end of print process
/// Print.
/// This function is called to generate a portion of the output text.
/// The output text may contain color codes.
/// \param indent number of spaces to generate as prefix
/// \param format printf-style format specifier
/// \return length of the printed string (number of printed characters)
AS_PRINTF(3, 4) virtual int hexapi print(int indent, const char* format, ...);
HEXRAYS_MEMORY_ALLOCATION_FUNCS()
};
/// Helper class to convert cfunc_t into text.
struct vc_printer_t : public vd_printer_t
{
const cfunc_t* func; ///< cfunc_t to generate text for
char lastchar; ///< internal: last printed character
/// Constructor
vc_printer_t(const cfunc_t* f) : func(f), lastchar(0) {}
/// Are we generating one-line text representation?
/// \return \c true if the output will occupy one line without line breaks
virtual bool idaapi oneliner(void) const { return false; }
};
/// Helper class to convert binary data structures into text and put into a file.
struct file_printer_t : public vd_printer_t
{
FILE* fp; ///< Output file pointer
/// Print.
/// This function is called to generate a portion of the output text.
/// The output text may contain color codes.
/// \param indent number of spaces to generate as prefix
/// \param format printf-style format specifier
/// \return length of the printed string (number of printed characters)
AS_PRINTF(3, 4) int hexapi print(int indent, const char* format, ...);
/// Constructor
file_printer_t(FILE* _fp) : fp(_fp) {}
};
/// Helper class to convert cfunc_t into a text string
struct qstring_printer_t : public vc_printer_t
{
bool with_tags; ///< Generate output with color tags
qstring& s; ///< Reference to the output string
/// Constructor
qstring_printer_t(const cfunc_t* f, qstring& _s, bool tags)
: vc_printer_t(f), with_tags(tags), s(_s) {}
/// Print.
/// This function is called to generate a portion of the output text.
/// The output text may contain color codes.
/// \param indent number of spaces to generate as prefix
/// \param format printf-style format specifier
/// \return length of the printed string (number of printed characters)
AS_PRINTF(3, 4) int hexapi print(int indent, const char* format, ...);
};
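/**
The printer classes above share one contract: print() receives an indent and a
printf-style format, emits the text, and returns its length. The following is
a minimal standalone sketch of a string-backed printer in that spirit. It uses
std::string instead of qstring, the sk_ name is hypothetical, and counting the
indent in the return value is an assumption, not documented behavior.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a printer that appends to a caller-owned string
struct sk_string_printer_t
{
  std::string &out;                       // reference to the output string

  explicit sk_string_printer_t(std::string &s) : out(s) {}

  // Append 'indent' spaces followed by the formatted text.
  // Returns the number of characters appended (indent included, by assumption).
  int print(int indent, const char *format, ...)
  {
    va_list va;
    va_start(va, format);
    va_list va2;
    va_copy(va2, va);                     // the arguments are consumed twice
    int n = vsnprintf(nullptr, 0, format, va);   // first pass: measure
    va_end(va);
    if ( n < 0 )
    {
      va_end(va2);
      return 0;                           // formatting error: append nothing
    }
    std::string body(size_t(n) + 1, '\0');
    vsnprintf(&body[0], body.size(), format, va2); // second pass: format
    va_end(va2);
    body.resize(size_t(n));               // drop the terminating zero
    int pad = indent > 0 ? indent : 0;
    out.append(size_t(pad), ' ');
    out += body;
    return pad + n;
  }
};
```

A caller would construct the printer over an existing string and call print()
repeatedly; each call appends to the same buffer.
*/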
//-------------------------------------------------------------------------
/// \defgroup type Type string related declarations
/// Type related functions and class.
//@{
/// Print the specified type info.
/// This function can be used from a debugger by typing "tif->dstr()"
const char* hexapi dstr(const tinfo_t* tif);
/// Verify a type string.
/// \return true if type string is correct
bool hexapi is_type_correct(const type_t* ptr);
/// Is a small structure or union?
/// \return true if the type is a small UDT (user defined type).
/// Small UDTs fit into a register (or pair of registers) as a rule.
bool hexapi is_small_udt(const tinfo_t& tif);
/// Is definitely a non-boolean type?
/// \return true if the type is a non-boolean type (non bool and well defined)
bool hexapi is_nonbool_type(const tinfo_t& type);
/// Is a boolean type?
/// \return true if the type is a boolean type
bool hexapi is_bool_type(const tinfo_t& type);
/// Is a pointer or array type?
inline THREAD_SAFE bool is_ptr_or_array(type_t t)
{
return is_type_ptr(t) || is_type_array(t);
}
/// Is a pointer, array, or function type?
inline THREAD_SAFE bool is_paf(type_t t)
{
return is_ptr_or_array(t) || is_type_func(t);
}
/// Is struct/union/enum definition (not declaration)?
inline THREAD_SAFE bool is_inplace_def(const tinfo_t& type)
{
return type.is_decl_complex() && !type.is_typeref();
}
/// Calculate number of partial subtypes.
/// \return number of partial subtypes. The bigger this number, the uglier the type.
int hexapi partial_type_num(const tinfo_t& type);
/// Get a type of a floating point value with the specified width
/// \returns type info object
/// \param width width of the desired type
tinfo_t hexapi get_float_type(int width);
/// Create a type info by width and sign.
/// Returns a simple type (examples: int, short) with the given width and sign.
/// \param srcwidth size of the type in bytes
/// \param sign sign of the type
tinfo_t hexapi get_int_type_by_width_and_sign(int srcwidth, type_sign_t sign);
/// Create a partial type info by width.
/// Returns a partially defined type (examples: _DWORD, _BYTE) with the given width.
/// \param size size of the type in bytes
tinfo_t hexapi get_unk_type(int size);
/// Generate a dummy pointer type
/// \param ptrsize size of pointed object
/// \param isfp is floating point object?
tinfo_t hexapi dummy_ptrtype(int ptrsize, bool isfp);
/// Get type of a structure field.
/// This function performs validity checks of the field type. Wrong types are rejected.
/// \param mptr structure field
/// \param type pointer to the variable where the type is returned. This parameter can be NULL.
/// \return false if failed