The following notation is used throughout the book. Occasionally we also follow other notation that is defined locally.
| English Letters | Description |
| --- | --- |
| | advantage |
| | action |
| | action space |
| | baseline in policy gradient; numerical belief in partially observable tasks; (lower case only) bonus; behavior policy in off-policy learning |
| | belief in partially observable tasks |
| 𝔅 | Bellman expectation operator of a policy (see the example after this table) |
| 𝔅 | Bellman optimal operator (the upper-case letter is used only in distributional RL) |
| | a batch of transitions generated by experience replay; belief space in partially observable tasks |
| | belief space with terminal belief in partially observable tasks |
| | count; coefficients in linear programming |
| | metric |
| | KL divergence |
| | JS divergence |
| | total variation |
| | indicator of episode end |
| | set of experiences |
| | eligibility trace |
| | expectation |
| 𝔣 | a mapping |
| | Fisher information matrix |
| | return |
| | gradient vector |
| | action preference |
| | entropy |
| | index of iteration |
| | loss |
| | probability; dynamics |
| | transition matrix |
| | observation probability in partially observable tasks; infinitesimal in asymptotic notation |
| | infinity in asymptotic notation |
| | observation |
| | probability |
| | action value |
| | action value of a policy |
| | optimal action value (the upper-case letter is used only in distributional RL) |
| | vector representation of action values |
| | reward |
| | reward space |
| | state |
| | state space |
| | state space with terminal state |
| | steps in an episode |
| 𝔲 | belief update operator in partially observable tasks |
| | TD target; (lower case only) upper bound |
| | state value |
| | state value of a policy |
| | optimal state value (the upper-case letter is used only in distributional RL) |
| | vector representation of state values |
| | variance |
| | parameters of the value function estimate |
| | an event |
| | event space |
| | parameters for the eligibility trace |
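As a hedged illustration of the Bellman operator entries above, the sketch below writes both operators on state values using the conventional symbols v, π, r, p, and γ, which may differ from the letters chosen in the book, and assumes an expected-reward formulation:

```latex
% Bellman expectation operator of a policy \pi on state values
% (a common form; symbols are the conventional ones, not necessarily the book's):
(\mathfrak{b}_\pi v)(s) = \sum_{a} \pi(a \mid s)
    \Bigl[ r(s, a) + \gamma \sum_{s'} p(s' \mid s, a)\, v(s') \Bigr]

% Bellman optimal operator on state values:
(\mathfrak{b}_* v)(s) = \max_{a}
    \Bigl[ r(s, a) + \gamma \sum_{s'} p(s' \mid s, a)\, v(s') \Bigr]
```

Both operators are γ-contractions under the supremum norm, which is what makes iterative policy evaluation and value iteration converge.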
| Greek Letters | Description |
| --- | --- |
| | learning rate |
| | reinforcement strength in eligibility traces; distortion function in distributional RL |
| | discount factor |
| | TD error (see the example after this table) |
| | parameters for exploration |
| | decay strength of eligibility traces |
| | policy |
| | optimal policy |
| | expert policy in imitation learning |
| | parameters of the policy function estimate |
| | threshold for value iteration |
| | visitation frequency; importance sampling ratio in off-policy learning |
| | vector representation of visitation frequency |
| | sojourn time of an SMDP |
| | trajectory |
| | accumulated probability in distributional RL; (lower case only) conditional probability in partially observable tasks |
| | Generalized Advantage Estimate (GAE) |
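As a hedged illustration of three Greek-letter entries (TD error, importance sampling ratio, and GAE), the sketch below uses the conventional symbols δ, ρ, γ, and λ, which may differ from the letters chosen in the book:

```latex
% One-step TD error for state values:
\delta_t = R_{t+1} + \gamma\, v(S_{t+1}) - v(S_t)

% Importance sampling ratio in off-policy learning
% (target policy \pi, behavior policy b):
\rho_t = \frac{\pi(A_t \mid S_t)}{b(A_t \mid S_t)}

% Generalized Advantage Estimate (GAE), a (\gamma\lambda)-weighted sum of TD errors:
\hat{A}_t = \sum_{l=0}^{\infty} (\gamma \lambda)^{l}\, \delta_{t+l}
```

Setting λ = 0 recovers the one-step TD advantage estimate, while λ = 1 recovers the Monte Carlo advantage estimate.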
| Other Notations | Description |
| --- | --- |
| | share the same distribution |
| | equal almost everywhere |
| | compare numbers; element-wise comparison |
| | partial order over policies (see the example after this table) |
| | absolutely continuous |
| | empty set |
| | gradient |
| | obeys a distribution |
| | absolute value of a real number; element-wise absolute values of a vector or a matrix; the number of elements in a set |
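As a hedged illustration of two entries above (the partial order over policies and the distributional relations), the sketch below uses the conventional symbols ⪯, ∼, and an overset d, which may differ from the notation chosen in the book:

```latex
% Partial order over policies: \pi' is at least as good as \pi
% if its state values are no smaller in every state:
\pi \preceq \pi' \iff v_\pi(s) \le v_{\pi'}(s) \quad \text{for all } s \in \mathcal{S}

% A random variable obeying a distribution, and two variables sharing the same distribution:
X \sim p, \qquad X \stackrel{d}{=} Y
```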