SampleBatch
In reinforcement learning, a SampleBatch is a data structure that holds a batch of experience samples collected as agents interact with their environment. It serves as a lightweight container for the information needed to train policy and value networks. A typical sample batch includes fields such as observations, actions, rewards, next observations, and done (episode-termination) flags, along with optional auxiliary data such as action probabilities, infos, or, in multi-agent settings, agent identifiers.
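The fields listed above can be illustrated with a minimal sketch. The class name SimpleSampleBatch, its add method, and the field names obs, action, reward, next_obs, and done are hypothetical choices made for this example; they are not the API of any particular library, and a practical implementation would likely store arrays rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SimpleSampleBatch:
    """Hypothetical container mapping field names to equal-length columns."""

    data: Dict[str, List[Any]] = field(default_factory=dict)

    def add(self, **step: Any) -> None:
        # Append one transition; every call must supply the same field names
        # so that all columns stay the same length.
        if self.data and set(step) != set(self.data):
            raise ValueError("step fields do not match existing columns")
        for key, value in step.items():
            self.data.setdefault(key, []).append(value)

    def __len__(self) -> int:
        # Number of stored transitions (0 for an empty batch).
        return len(next(iter(self.data.values()), []))

    def __getitem__(self, key: str) -> List[Any]:
        # Return the whole column for one field, e.g. all rewards.
        return self.data[key]


# Two transitions from an imagined episode (values are made up).
batch = SimpleSampleBatch()
batch.add(obs=[0.0, 1.0], action=1, reward=0.5, next_obs=[0.1, 0.9], done=False)
batch.add(obs=[0.1, 0.9], action=0, reward=1.0, next_obs=[0.2, 0.8], done=True)
```

Storing one column per field, rather than one record per step, lets a training routine pull out all rewards or all observations in a single lookup, which is the access pattern most learning updates need.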
In practice, a sample batch is implemented as a dictionary-like object or a small class that stores
Design considerations for sample batches emphasize flexibility and efficiency. They must accommodate on-policy and off-policy algorithms,