HBP1969 edited this page Aug 21, 2024 · 1 revision

In multi-objective reinforcement learning (MORL), an agent must optimize multiple objectives simultaneously. These objectives can conflict, so a method is needed to balance them. Two common criteria for handling multiple objectives are Expected Scalarised Returns (ESR) and Scalarised Expected Returns (SER).

1. Expected Scalarised Returns (ESR)

ESR involves first scalarizing the vector of rewards at each time step (i.e., converting the multi-objective reward into a single scalar reward) and then calculating the expected return over the scalarized rewards.

Example:

Suppose an agent has two objectives: $r_1$ (maximize energy efficiency) and $r_2$ (maximize speed). A simple scalarization function is a weighted sum of the two objectives:

$$ r_\text{scalar} = w_1 \cdot r_1 + w_2 \cdot r_2 $$

where $w_1$ and $w_2$ are the weights assigned to each objective.

In ESR, the agent first computes the scalar reward at each step:

$$ r_\text{scalar}(t) = w_1 \cdot r_1(t) + w_2 \cdot r_2(t) $$

Then, the agent computes the expected return based on these scalarized rewards:

$$ \text{ESR} = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t \cdot r_\text{scalar}(t)\right] $$

Here, the expectation $\mathbb{E}$ is taken over the distribution of possible trajectories, and $\gamma$ is the discount factor.
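The ESR computation above can be sketched as a simple Monte Carlo estimate. Here `sample_trajectory` is a hypothetical stand-in for rolling out the agent's policy in an environment with a two-dimensional reward; it is not part of any particular library.

```python
import numpy as np

def esr_estimate(sample_trajectory, w1, w2, gamma=0.99, n_episodes=1000):
    """Monte Carlo estimate of ESR with a linear weighted-sum scalarization.

    sample_trajectory() is assumed to return one episode as a list of
    (r1, r2) reward pairs, one pair per time step.
    """
    returns = []
    for _ in range(n_episodes):
        g = 0.0
        for t, (r1, r2) in enumerate(sample_trajectory()):
            r_scalar = w1 * r1 + w2 * r2      # scalarize at each step first ...
            g += gamma**t * r_scalar
        returns.append(g)
    return float(np.mean(returns))            # ... then take the expectation
```

Note the order of operations: the reward vector is collapsed to a scalar inside each trajectory, and only then is the average over trajectories taken.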

2. Scalarised Expected Returns (SER)

SER flips the order of operations: first, the expected return is calculated independently for each objective, and then these expected returns are scalarized.

Example:

Using the same two objectives $r_1$ and $r_2$:

First, calculate the expected return for each objective separately:

$$ \mathbb{E}\left[ R_1 \right] = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t \cdot r_1(t)\right] $$

$$ \mathbb{E}\left[ R_2 \right] = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t \cdot r_2(t)\right] $$

Then, scalarize these expected returns:

$$ \text{SER} = w_1 \cdot \mathbb{E}\left[ R_1 \right] + w_2 \cdot \mathbb{E}\left[ R_2 \right] $$
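A matching Monte Carlo sketch for SER, using the same hypothetical `sample_trajectory` convention (one episode as a list of `(r1, r2)` pairs):

```python
import numpy as np

def ser_estimate(sample_trajectory, w1, w2, gamma=0.99, n_episodes=1000):
    """Monte Carlo estimate of SER: expectation per objective, then scalarize."""
    returns = []                              # one (G1, G2) pair per episode
    for _ in range(n_episodes):
        g1 = g2 = 0.0
        for t, (r1, r2) in enumerate(sample_trajectory()):
            g1 += gamma**t * r1
            g2 += gamma**t * r2
        returns.append((g1, g2))
    e_r1, e_r2 = np.mean(returns, axis=0)     # expected return per objective ...
    return float(w1 * e_r1 + w2 * e_r2)       # ... then scalarize
```

The only structural change from the ESR sketch is where the averaging happens: here each objective's return is averaged across episodes before the weights are applied.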

Key Difference:

- **ESR** first combines the objectives at each time step (via scalarization) and then takes the expectation of the scalarized return.
- **SER** first computes the expected return for each objective individually and then combines these expected returns.

Note that with a linear scalarization function such as the weighted sum used above, ESR and SER coincide, because expectation is linear. The two criteria diverge when the scalarization (utility) function is nonlinear, or when the agent's preferences depend on the distribution of returns rather than only their mean. In such settings, especially in environments with high variability, optimizing ESR and optimizing SER can lead to different policies.
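A small numerical example makes the divergence concrete. The lottery over return vectors and the worst-objective utility below are illustrative choices, not taken from any specific benchmark; any nonlinear utility would do.

```python
import numpy as np

# Hypothetical two-outcome lottery over return vectors (R1, R2):
# with probability 0.5 the episode yields (10, 0), otherwise (0, 10).
outcomes = np.array([[10.0, 0.0], [0.0, 10.0]])
probs = np.array([0.5, 0.5])

def utility(v):
    # Nonlinear utility: value of the worst objective (not a weighted sum)
    return float(np.min(v))

esr = float(sum(p * utility(v) for p, v in zip(probs, outcomes)))  # E[u(R)]
ser = utility(probs @ outcomes)                                    # u(E[R])
print(esr, ser)  # 0.0 5.0
```

Under ESR every individual outcome has utility 0 (one objective is always sacrificed), while under SER the averaged return vector (5, 5) has utility 5. An ESR-optimizing agent and a SER-optimizing agent would therefore rank this lottery very differently.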
