i#6662 public traces, part 1: synthetic ISA (#6691)
This PR adds a synthetic ISA whose purpose is to preserve register dependencies
and to give hints about the type of operation an instruction performs. It
implements the encoding and decoding functionality for this new ISA, which we
call #DR_ISA_REGDEPS.

Note that, because this is a synthetic ISA, some routines that work on instructions
from an actual ISA (such as #DR_ISA_AMD64) are not supported (e.g.,
decode_sizeof()).

Currently we support:
- instr_convert_to_isa_regdeps(): to convert an #instr_t of an actual ISA to a
   #DR_ISA_REGDEPS #instr_t.
- instr_encode() and instr_encode_to_copy(): to encode a #DR_ISA_REGDEPS
   #instr_t into a sequence of contiguous bytes.
- decode() and decode_from_copy(): to decode an encoded #DR_ISA_REGDEPS
   instruction into an #instr_t.
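
The encode and decode steps listed above form a round trip. The toy model below illustrates that round trip with an invented byte layout. It does not call DynamoRIO, and the real #DR_ISA_REGDEPS layout (implemented in ir/isa_regdeps/encode.c and decode.c) is not reproduced here. All names (toy_regdeps_instr_t, toy_encode(), toy_decode(), TOY_MAX_REGS) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_REGS 8

/* Hypothetical reduced instruction: only virtual register ids are kept. */
typedef struct {
    uint8_t num_dsts;
    uint8_t num_srcs;
    uint8_t dsts[TOY_MAX_REGS];
    uint8_t srcs[TOY_MAX_REGS];
} toy_regdeps_instr_t;

/* Invented layout (an assumption, not the real one):
 * [num_dsts][num_srcs][dst ids...][src ids...].
 * Returns the pointer just past the encoding, or NULL if it does not fit.
 */
static uint8_t *
toy_encode(const toy_regdeps_instr_t *instr, uint8_t *buf, size_t buf_size)
{
    assert(instr != NULL && buf != NULL);
    if (instr->num_dsts > TOY_MAX_REGS || instr->num_srcs > TOY_MAX_REGS)
        return NULL;
    size_t needed = 2u + instr->num_dsts + instr->num_srcs;
    if (needed > buf_size)
        return NULL;
    buf[0] = instr->num_dsts;
    buf[1] = instr->num_srcs;
    memcpy(buf + 2, instr->dsts, instr->num_dsts);
    memcpy(buf + 2 + instr->num_dsts, instr->srcs, instr->num_srcs);
    return buf + needed;
}

/* Inverse of toy_encode(). Returns the pointer to the next encoded
 * instruction, or NULL if the input is truncated or malformed.
 */
static const uint8_t *
toy_decode(const uint8_t *buf, size_t buf_size, toy_regdeps_instr_t *instr)
{
    assert(buf != NULL && instr != NULL);
    if (buf_size < 2 || buf[0] > TOY_MAX_REGS || buf[1] > TOY_MAX_REGS)
        return NULL;
    size_t needed = 2u + buf[0] + buf[1];
    if (needed > buf_size)
        return NULL;
    memset(instr, 0, sizeof(*instr));
    instr->num_dsts = buf[0];
    instr->num_srcs = buf[1];
    memcpy(instr->dsts, buf + 2, instr->num_dsts);
    memcpy(instr->srcs, buf + 2 + instr->num_dsts, instr->num_srcs);
    return buf + needed;
}
```

Both functions return the position after the instruction they processed, which mirrors how instr_encode() and decode() return a pointer that callers can feed into the next call.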

A #DR_ISA_REGDEPS #instr_t contains the following information:
- categories: composed of #dr_instr_category_t values, they indicate the type of
   operation performed (e.g., a load, a store, a floating point math operation, a
   branch, etc.). Categories are composable, so more than one category can be
   set. This information can be obtained using instr_get_category().
- arithmetic flags: we don't distinguish between individual flags; we only report
   whether at least one arithmetic flag was read (in which case all arithmetic flags
   are set to read) and/or written (in which case all arithmetic flags are set to
   written). This information can be obtained using instr_get_arith_flags().
- number of source and destination operands: we only consider register operands.
   This information can be obtained using instr_num_srcs() and instr_num_dsts().
- source operation size: the size of the largest source operand the instruction
   operates on. This information can be obtained by accessing the #instr_t
   operation_size field.
- list of register operand identifiers: they are contained in #opnd_t lists,
   separated into sources and destinations. Note that these #reg_id_t identifiers are
   virtual and it should not be assumed that they belong to any DR_REG_ enum value
   of any specific architecture. These identifiers are meant for tracking register
   dependencies with respect to other #DR_ISA_REGDEPS instructions only.
   These lists can be obtained by walking the #instr_t operands with instr_get_dst() and
   instr_get_src().
- ISA mode: always #DR_ISA_REGDEPS. This information can be obtained using
   instr_get_isa_mode().
- encoding bytes: an array of bytes containing the #DR_ISA_REGDEPS #instr_t
   encoding. Note that this information is present only for decoded instructions
   (i.e., #instr_t generated by decode() or decode_from_copy()). This information
   can be obtained using instr_get_raw_bits().
- length: the length of the encoded instruction in bytes. Note that this information
   is present only for decoded instructions (i.e., #instr_t generated by decode() or
   decode_from_copy()). This information can be obtained by accessing
   the #instr_t length field.
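
Three of the reductions in the list above (composable categories, collapsed arithmetic flags, and the source operation size) can be illustrated with a self-contained toy model. This is a sketch under stated assumptions, not DynamoRIO code: the names TOY_CAT_*, TOY_FLAGS_*, toy_collapse_arith_flags() and toy_operation_size() are hypothetical, the flag bit positions are invented, and sizes are modeled as byte counts rather than as opnd_size_t values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category bits. Categories compose by OR-ing them together. */
enum {
    TOY_CAT_LOAD = 1u << 0,
    TOY_CAT_STORE = 1u << 1,
    TOY_CAT_FP_MATH = 1u << 2,
    TOY_CAT_BRANCH = 1u << 3,
};

/* Invented flag layout: low nibble holds per-flag "read" bits,
 * high nibble holds per-flag "written" bits.
 */
#define TOY_FLAGS_READ_ALL 0x0fu
#define TOY_FLAGS_WRITE_ALL 0xf0u

/* Collapses individual flags: if any flag is read, all are reported as
 * read; if any flag is written, all are reported as written.
 */
static uint32_t
toy_collapse_arith_flags(uint32_t real_flags)
{
    uint32_t out = 0;
    if ((real_flags & TOY_FLAGS_READ_ALL) != 0)
        out |= TOY_FLAGS_READ_ALL;
    if ((real_flags & TOY_FLAGS_WRITE_ALL) != 0)
        out |= TOY_FLAGS_WRITE_ALL;
    return out;
}

/* Returns the largest source operand size in bytes, or 0 if there are
 * no source operands.
 */
static uint8_t
toy_operation_size(const uint8_t *src_sizes, size_t num_srcs)
{
    uint8_t max_size = 0;
    assert(src_sizes != NULL || num_srcs == 0);
    for (size_t i = 0; i < num_srcs; ++i) {
        if (src_sizes[i] > max_size)
            max_size = src_sizes[i];
    }
    return max_size;
}
```

For example, an instruction that reads one flag and writes another collapses to "all read, all written", and an instruction with source operands of 4, 8 and 2 bytes has an operation size of 8 in this model.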

Note that all routines that operate on #instr_t and #opnd_t are also supported for
#DR_ISA_REGDEPS instructions. However, querying information other than that
described above (e.g., the instruction opcode with instr_get_opcode()) will return
the zeroed value set by instr_create() or instr_init() when the #instr_t was
created (e.g., instr_get_opcode() would return OP_INVALID).
edeiana authored Apr 10, 2024
1 parent 59a2c38 commit 0343305
Showing 24 changed files with 1,184 additions and 26 deletions.
4 changes: 4 additions & 0 deletions api/docs/release.dox
@@ -219,6 +219,10 @@ Further non-compatibility-affecting changes include:
is set to true by default to match the existing behavior of the invariant checker.
- Added a new instr API instr_is_xrstor() that tells whether an instruction is any
variant of the x86 xrstor opcode.
+ - Added a new #dr_isa_mode_t: #DR_ISA_REGDEPS, which is a synthetic ISA with the main
+ purpose of preserving register dependencies.
+ - Added instr_convert_to_isa_regdeps() API that converts an #instr_t from a real ISA
+ (e.g., #DR_ISA_AMD64) to the #DR_ISA_REGDEPS synthetic ISA.


**************************************************
2 changes: 2 additions & 0 deletions core/CMakeLists.txt
@@ -279,6 +279,8 @@ set(DECODER_SRCS
ir/${ARCH_NAME}/decode.c
ir/encode_shared.c
ir/${ARCH_NAME}/encode.c
+ ir/isa_regdeps/encode.c
+ ir/isa_regdeps/decode.c
ir/disassemble_shared.c
ir/${ARCH_NAME}/disassemble.c
ir/ir_utils_shared.c
10 changes: 10 additions & 0 deletions core/ir/aarch64/codec.c
@@ -40,9 +40,11 @@

#include <stdint.h>
#include "../globals.h"
+ #include "../isa_regdeps/decode.h"
#include "arch.h"
#include "decode.h"
#include "disassemble.h"
+ #include "encode_api.h"
#include "instr.h"
#include "instr_create_shared.h"

@@ -9721,6 +9723,14 @@ decode_category(uint encoding, instr_t *instr)
byte *
decode_common(dcontext_t *dcontext, byte *pc, byte *orig_pc, instr_t *instr)
{
+ /* #DR_ISA_REGDEPS synthetic ISA has its own decoder.
+ * XXX i#1684: when DR can be built with full dynamic architecture selection we won't
+ * need to pollute the decoding of other architectures with this synthetic ISA special
+ * case.
+ */
+ if (dr_get_isa_mode(dcontext) == DR_ISA_REGDEPS)
+ return decode_isa_regdeps(dcontext, pc, instr);
+
byte *next_pc = pc + 4;
uint enc = *(uint *)pc;
uint eflags = 0;
3 changes: 2 additions & 1 deletion core/ir/aarch64/decode.c
@@ -32,6 +32,7 @@
*/

#include "../globals.h"
+ #include "encode_api.h"
#include "instr.h"
#include "decode.h"
#include "decode_fast.h" /* ensure we export decode_next_pc, decode_sizeof */
@@ -41,7 +42,7 @@
bool
is_isa_mode_legal(dr_isa_mode_t mode)
{
- return (mode == DR_ISA_ARM_A64);
+ return (mode == DR_ISA_ARM_A64 || mode == DR_ISA_REGDEPS);
}

app_pc
4 changes: 2 additions & 2 deletions core/ir/aarch64/instr.c
@@ -47,9 +47,9 @@
bool
instr_set_isa_mode(instr_t *instr, dr_isa_mode_t mode)
{
- if (mode != DR_ISA_ARM_A64)
+ if (mode != DR_ISA_ARM_A64 && mode != DR_ISA_REGDEPS)
return false;
- instr->isa_mode = DR_ISA_ARM_A64;
+ instr->isa_mode = mode;
return true;
}

12 changes: 11 additions & 1 deletion core/ir/arm/decode.c
@@ -31,6 +31,8 @@
*/

#include "../globals.h"
+ #include "../isa_regdeps/decode.h"
+ #include "encode_api.h"
#include "instr.h"
#include "decode.h"
#include "decode_private.h"
@@ -172,7 +174,7 @@ decode_in_it_block(decode_state_t *state, app_pc pc, decode_info_t *di)
bool
is_isa_mode_legal(dr_isa_mode_t mode)
{
- return (mode == DR_ISA_ARM_THUMB || DR_ISA_ARM_A32);
+ return (mode == DR_ISA_ARM_THUMB || mode == DR_ISA_ARM_A32 || mode == DR_ISA_REGDEPS);
}

/* We need to call canonicalize_pc_target() on all next_tag-writing
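
Besides accepting #DR_ISA_REGDEPS, the is_isa_mode_legal() change in this hunk also replaces the operand `DR_ISA_ARM_A32` with the comparison `mode == DR_ISA_ARM_A32`. In the old expression the bare enumerator is nonzero, so the `||` was true for every mode. The sketch below reproduces that behavior with a stand-alone copy of the enumerator order shown in encode_api.h; the TOY_ names and the two helper functions are hypothetical, and a compiler may warn about the constant operand in the old form.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy copy of the enumerator order from encode_api.h (TOY_ prefix marks it
 * as illustrative). TOY_ISA_ARM_A32 has the value 2, i.e., nonzero.
 */
typedef enum {
    TOY_ISA_IA32,
    TOY_ISA_X86 = TOY_ISA_IA32,
    TOY_ISA_AMD64,
    TOY_ISA_ARM_A32,
    TOY_ISA_ARM_THUMB,
    TOY_ISA_ARM_A64,
    TOY_ISA_RV64IMAFDC,
    TOY_ISA_REGDEPS,
} toy_isa_mode_t;

/* Old form: the right operand of || is a nonzero constant, so the
 * result is true regardless of mode.
 */
static bool
toy_is_legal_old(toy_isa_mode_t mode)
{
    return (mode == TOY_ISA_ARM_THUMB || TOY_ISA_ARM_A32);
}

/* New form: every alternative is an explicit comparison against mode. */
static bool
toy_is_legal_new(toy_isa_mode_t mode)
{
    return (mode == TOY_ISA_ARM_THUMB || mode == TOY_ISA_ARM_A32 ||
            mode == TOY_ISA_REGDEPS);
}
```

With these definitions, toy_is_legal_old(TOY_ISA_AMD64) is true while toy_is_legal_new(TOY_ISA_AMD64) is false.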
@@ -2428,6 +2430,14 @@ decode_opcode(dcontext_t *dcontext, byte *pc, instr_t *instr)
static byte *
decode_common(dcontext_t *dcontext, byte *pc, byte *orig_pc, instr_t *instr)
{
+ /* #DR_ISA_REGDEPS synthetic ISA has its own decoder.
+ * XXX i#1684: when DR can be built with full dynamic architecture selection we won't
+ * need to pollute the decoding of other architectures with this synthetic ISA special
+ * case.
+ */
+ if (dr_get_isa_mode(dcontext) == DR_ISA_REGDEPS)
+ return decode_isa_regdeps(dcontext, pc, instr);
+
const instr_info_t *info = &invalid_instr;
decode_info_t di;
byte *next_pc;
2 changes: 1 addition & 1 deletion core/ir/arm/instr.c
@@ -43,7 +43,7 @@
bool
instr_set_isa_mode(instr_t *instr, dr_isa_mode_t mode)
{
- if (mode != DR_ISA_ARM_THUMB && mode != DR_ISA_ARM_A32) {
+ if (mode != DR_ISA_ARM_THUMB && mode != DR_ISA_ARM_A32 && mode != DR_ISA_REGDEPS) {
return false;
}
instr->isa_mode = mode;
88 changes: 81 additions & 7 deletions core/ir/encode_api.h
@@ -44,13 +44,87 @@

/** Specifies which processor mode to use when decoding or encoding. */
typedef enum _dr_isa_mode_t {
- DR_ISA_IA32, /**< IA-32 (Intel/AMD 32-bit mode). */
- DR_ISA_X86 = DR_ISA_IA32, /**< Alias for DR_ISA_IA32. */
- DR_ISA_AMD64, /**< AMD64 (Intel/AMD 64-bit mode). */
- DR_ISA_ARM_A32, /**< ARM A32 (AArch32 ARM). */
- DR_ISA_ARM_THUMB, /**< Thumb (ARM T32). */
- DR_ISA_ARM_A64, /**< ARM A64 (AArch64). */
- DR_ISA_RV64IMAFDC, /**< RISC-V (rv64imafdc). */
+ /**
+ * IA-32 (Intel/AMD 32-bit mode).
+ */
+ DR_ISA_IA32,
+ /**
+ * Alias for DR_ISA_IA32.
+ */
+ DR_ISA_X86 = DR_ISA_IA32,
+ /**
+ * AMD64 (Intel/AMD 64-bit mode).
+ */
+ DR_ISA_AMD64,
+ /**
+ * ARM A32 (AArch32 ARM).
+ */
+ DR_ISA_ARM_A32,
+ /**
+ * Thumb (ARM T32).
+ */
+ DR_ISA_ARM_THUMB,
+ /**
+ * ARM A64 (AArch64).
+ */
+ DR_ISA_ARM_A64,
+ /**
+ * RISC-V (rv64imafdc).
+ */
+ DR_ISA_RV64IMAFDC,
+ /**
+ * A synthetic ISA that has the purpose of preserving register dependencies and giving
+ * hints on the type of operation an instruction performs.
+ *
+ * Being a synthetic ISA, some routines that work on instructions coming from an
+ * actual ISA (such as #DR_ISA_AMD64) are not supported (e.g., decode_sizeof()).
+ *
+ * Currently we support:
+ * - instr_convert_to_isa_regdeps(): to convert an #instr_t of an actual ISA to a
+ * #DR_ISA_REGDEPS #instr_t.
+ * - instr_encode() and instr_encode_to_copy(): to encode a #DR_ISA_REGDEPS #instr_t
+ * into a sequence of contiguous bytes.
+ * - decode() and decode_from_copy(): to decode an encoded #DR_ISA_REGDEPS instruction
+ * into an #instr_t.
+ *
+ * A #DR_ISA_REGDEPS #instr_t contains the following information:
+ * - categories: composed by #dr_instr_category_t values, they indicate the type of
+ * operation performed (e.g., a load, a store, a floating point math operation, a
+ * branch, etc.). Note that categories are composable, hence more than one category
+ * can be set. This information can be obtained using instr_get_category().
+ * - arithmetic flags: we don't distinguish between different flags, we only report if
+ * at least one arithmetic flag was read (all arithmetic flags will be set to read)
+ * and/or written (all arithmetic flags will be set to written). This information
+ * can be obtained using instr_get_arith_flags().
+ * - number of source and destination operands: we only consider register operands.
+ * This information can be obtained using instr_num_srcs() and instr_num_dsts().
+ * - source operation size: is the largest source operand the instruction operates on.
+ * This information can be obtained by accessing the #instr_t operation_size field.
+ * - list of register operand identifiers: they are contained in #opnd_t lists,
+ * separated in source and destination. Note that these #reg_id_t identifiers are
+ * virtual and it should not be assumed that they belong to any DR_REG_ enum value
+ * of any specific architecture. These identifiers are meant for tracking register
+ * dependencies with respect to other #DR_ISA_REGDEPS instructions only. These
+ * lists can be obtained by walking the #instr_t operands with instr_get_dst() and
+ * instr_get_src().
+ * - ISA mode: is always #DR_ISA_REGDEPS. This information can be obtained using
+ * instr_get_isa_mode().
+ * - encoding bytes: an array of bytes containing the #DR_ISA_REGDEPS #instr_t
+ * encoding. Note that this information is present only for decoded instructions
+ * (i.e., #instr_t generated by decode() or decode_from_copy()). This information
+ * can be obtained using instr_get_raw_bits().
+ * - length: the length of the encoded instruction in bytes. Note that this
+ * information is present only for decoded instructions (i.e., #instr_t generated by
+ * decode() or decode_from_copy()). This information can be obtained by accessing
+ * the #instr_t length field.
+ *
+ * Note that all routines that operate on #instr_t and #opnd_t are also supported for
+ * #DR_ISA_REGDEPS instructions. However, querying information outside of those
+ * described above (e.g., the instruction opcode with instr_get_opcode()) will return
+ * the zeroed value set by instr_create() or instr_init() when the #instr_t was
+ * created (e.g., instr_get_opcode() would return OP_INVALID).
+ */
+ DR_ISA_REGDEPS,
} dr_isa_mode_t;

DR_API
29 changes: 24 additions & 5 deletions core/ir/encode_shared.c
@@ -38,6 +38,7 @@
/* encode_shared.c -- cross-platform encodingn routines */

#include "../globals.h"
+ #include "isa_regdeps/encode.h"
#include "arch.h"
#include "instr.h"
#include "decode.h"
@@ -111,28 +112,46 @@ get_encoding_info(instr_t *instr)
return info;
}

+ static byte *
+ instr_encode_common(dcontext_t *dcontext, instr_t *instr, byte *copy_pc, byte *final_pc,
+ bool check_reachable,
+ bool *has_instr_opnds /*OUT OPTIONAL*/
+ _IF_DEBUG(bool assert_reachable))
+ {
+ /* #DR_ISA_REGDEPS synthetic ISA has its own encoder.
+ * XXX i#1684: when DR can be built with full dynamic architecture selection we won't
+ * need to pollute the encoding of other architectures with this synthetic ISA special
+ * case.
+ */
+ if (instr_get_isa_mode(instr) == DR_ISA_REGDEPS)
+ return encode_isa_regdeps(dcontext, instr, copy_pc);
+
+ return instr_encode_arch(dcontext, instr, copy_pc, final_pc, check_reachable,
+ has_instr_opnds _IF_DEBUG(assert_reachable));
+ }
+
/* completely ignores reachability and predication failures */
byte *
instr_encode_ignore_reachability(dcontext_t *dcontext, instr_t *instr, byte *pc)
{
- return instr_encode_arch(dcontext, instr, pc, pc, false, NULL _IF_DEBUG(false));
+ return instr_encode_common(dcontext, instr, pc, pc, false, NULL _IF_DEBUG(false));
}

/* just like instr_encode but doesn't assert on reachability or predication failures */
byte *
instr_encode_check_reachability(dcontext_t *dcontext, instr_t *instr, byte *pc,
bool *has_instr_opnds /*OUT OPTIONAL*/)
{
- return instr_encode_arch(dcontext, instr, pc, pc, true,
- has_instr_opnds _IF_DEBUG(false));
+ return instr_encode_common(dcontext, instr, pc, pc, true,
+ has_instr_opnds _IF_DEBUG(false));
}

byte *
instr_encode_to_copy(void *drcontext, instr_t *instr, byte *copy_pc, byte *final_pc)
{
dcontext_t *dcontext = (dcontext_t *)drcontext;
- return instr_encode_arch(dcontext, instr, copy_pc, final_pc, true,
- NULL _IF_DEBUG(true));
+ return instr_encode_common(dcontext, instr, copy_pc, final_pc, true,
+ NULL _IF_DEBUG(true));
}

byte *
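
The instr_encode_common() wrapper in this file gives every public encoding entry point a single place where the synthetic ISA is intercepted. Note that the encoder dispatches on the instruction's own ISA mode (instr_get_isa_mode(instr)), whereas the decoders in codec.c and arm/decode.c dispatch on the context's mode (dr_get_isa_mode(dcontext)). The sketch below models only the encoder-side dispatch. It is a stand-alone illustration: the demo_ names, the byte values 0xA0 and 0xD0, and the 4-byte and 2-byte lengths are all invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_ISA_NATIVE, DEMO_ISA_REGDEPS } demo_isa_mode_t;

/* Hypothetical instruction: just an ISA mode and one payload byte. */
typedef struct {
    demo_isa_mode_t isa_mode;
    uint8_t payload;
} demo_instr_t;

/* Stand-in for the architecture-specific encoder: writes 4 bytes. */
static uint8_t *
demo_encode_arch(const demo_instr_t *instr, uint8_t *pc)
{
    pc[0] = 0xA0;
    pc[1] = instr->payload;
    pc[2] = 0;
    pc[3] = 0;
    return pc + 4;
}

/* Stand-in for the synthetic-ISA encoder: writes 2 bytes. */
static uint8_t *
demo_encode_regdeps(const demo_instr_t *instr, uint8_t *pc)
{
    pc[0] = 0xD0;
    pc[1] = instr->payload;
    return pc + 2;
}

/* Single choke point: the synthetic ISA is checked first, and everything
 * else falls through to the architecture-specific encoder.
 */
static uint8_t *
demo_encode_common(const demo_instr_t *instr, uint8_t *pc)
{
    assert(instr != NULL && pc != NULL);
    if (instr->isa_mode == DEMO_ISA_REGDEPS)
        return demo_encode_regdeps(instr, pc);
    return demo_encode_arch(instr, pc);
}
```

Routing all entry points through one wrapper means the special case is written once instead of being repeated in each public function, which is the shape the diff above adopts.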
28 changes: 26 additions & 2 deletions core/ir/instr_api.h
@@ -298,10 +298,19 @@ struct _instr_t {

uint opcode;

+ union {
# ifdef X86
- /* PR 251479: offset into instr's raw bytes of rip-relative 4-byte displacement */
- byte rip_rel_pos;
+ /* Offset into instr's raw bytes of rip-relative 4-byte displacement.
+ * This field is valid when instr_t isa_mode is DR_ISA_X86.
+ */
+ byte rip_rel_pos;
# endif
+ /* Size of source data (i.e., read) a DR_ISA_REGDEPS instruction operates on.
+ * This field is valid when instr_t isa_mode is DR_ISA_REGDEPS.
+ * Note that opnd_size_t is an alias of byte.
+ */
+ opnd_size_t operation_size;
+ };

/* we dynamically allocate dst and src arrays b/c x86 instrs can have
* up to 8 of each of them, but most have <=2 dsts and <=3 srcs, and we
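
The union in this hunk lets operation_size reuse the byte that rip_rel_pos occupies, since each field is documented as valid under a different ISA mode. The stand-alone sketch below illustrates the idea with an anonymous union (a C11 feature). The toy_ types, the reduced field set, and the accessor are hypothetical and much smaller than the real struct _instr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t toy_opnd_size_t; /* The diff notes opnd_size_t aliases byte. */

enum { TOY_ISA_X86 = 0, TOY_ISA_REGDEPS = 1 };

/* Before: a dedicated byte for the x86-only field. */
typedef struct {
    uint32_t opcode;
    uint8_t isa_mode;
    uint8_t rip_rel_pos;
} toy_instr_before_t;

/* After: the same byte is shared by two fields, and the ISA mode tells
 * which of the two is meaningful.
 */
typedef struct {
    uint32_t opcode;
    uint8_t isa_mode;
    union {
        uint8_t rip_rel_pos;            /* Valid when isa_mode is TOY_ISA_X86. */
        toy_opnd_size_t operation_size; /* Valid when isa_mode is TOY_ISA_REGDEPS. */
    };
} toy_instr_after_t;

/* Reads the shared byte only when the mode says it holds a size. */
static toy_opnd_size_t
toy_get_operation_size(const toy_instr_after_t *instr)
{
    assert(instr != NULL && instr->isa_mode == TOY_ISA_REGDEPS);
    return instr->operation_size;
}
```

On common ABIs the two toy structs have the same size, which is the benefit of this design: the new field adds no bytes to the instruction structure.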
@@ -2096,6 +2105,21 @@ DR_API
instr_t *
instr_convert_short_meta_jmp_to_long(void *drcontext, instrlist_t *ilist, instr_t *instr);

+ DR_API
+ /**
+ * Converts a real ISA (e.g., #DR_ISA_AMD64) instruction \p instr_real_isa into a
+ * #DR_ISA_REGDEPS instruction and stores it into \p instr_regdeps_isa.
+ * Assumes \p instr_regdeps_isa has been allocated by the caller (e.g., using
+ * instr_create()).
+ * Assumes \p instr_real_isa is a fully-decoded or synthesized instruction of a real ISA
+ * with valid operand information.
+ * \note \p instr_regdeps_isa will contain only the information of a #DR_ISA_REGDEPS
+ * synthetic instruction.
+ */
+ void
+ instr_convert_to_isa_regdeps(void *drcontext, instr_t *instr_real_isa,
+ instr_t *instr_regdeps_isa);
+
DR_API
/**
* Given \p eflags, returns whether or not the conditional branch, \p