Sampling from replay buffer with bias? #33

tangwentlw · 2024-04-23T05:16:01Z

Hi,

Thank you for the excellent implementation. I have a quick question about sampling from the replay buffer. In the following code from the LazyMemory class, you add a bias term to each randomly generated index when the replay buffer is full (bias = -self._p if self._n == self.capacity else 0). Is there a particular reason for doing this? My understanding is that, since the indices are drawn uniformly over the whole buffer, shifting them by a constant modulo the capacity should not change the sampling distribution compared to not adding the bias. Am I right, or is there something I missed?

    def sample(self, batch_size):
        indices = np.random.randint(low=0, high=len(self), size=batch_size)
        return self._sample(indices, batch_size)

    def _sample(self, indices, batch_size):
        bias = -self._p if self._n == self.capacity else 0

        states = np.empty(
            (batch_size, *self.state_shape), dtype=np.uint8)
        next_states = np.empty(
            (batch_size, *self.state_shape), dtype=np.uint8)

        for i, index in enumerate(indices):
            _index = np.mod(index+bias, self.capacity)
            states[i, ...] = self['state'][_index]
            next_states[i, ...] = self['next_state'][_index]
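
To make the question concrete, here is a minimal sketch of the reasoning, not the repository's code. The names `shifted_indices`, `capacity`, and `p` are hypothetical stand-ins (for the `np.mod(index+bias, self.capacity)` step, `self.capacity`, and `self._p`), and the values are chosen arbitrarily. It checks that, for a full buffer, shifting every index by `-p` modulo the capacity only permutes the slots, so uniformly drawn indices should stay uniform after the shift.

```python
import numpy as np


def shifted_indices(indices, p, capacity):
    """Apply the full-buffer bias from the excerpt: (index - p) mod capacity."""
    return np.mod(indices - p, capacity)


capacity = 8  # hypothetical buffer size (stand-in for self.capacity)
p = 3         # hypothetical write pointer (stand-in for self._p)

# Shift every possible index once and see where each one lands.
all_indices = np.arange(capacity)
shifted = shifted_indices(all_indices, p, capacity)

# If the shift is a bijection on {0, ..., capacity - 1}, each slot appears
# exactly once, so a uniform draw remains uniform after the shift.
is_permutation = sorted(shifted.tolist()) == list(range(capacity))
print(shifted.tolist(), is_permutation)
```

If that holds in general, the bias would only change which logical index maps to which physical slot, which is why I am wondering whether it serves another purpose.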
