In reinforcement learning, generating a trajectory of model actions and outcomes used to compute training rewards.
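
As a minimal sketch of this definition, the Python below runs a policy in a toy environment, records each (state, action, reward) step as a trajectory, and sums the rewards. The environment, the `step` and `rollout` functions, and the `always_right` policy are all hypothetical and chosen only for illustration; real training setups differ considerably.

```python
from typing import Callable, List, Tuple

Step = Tuple[int, int, float]  # (state, action, reward)


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical toy environment: walk on a number line.

    The episode ends when the position reaches +3 or -3.
    Reaching +3 yields reward 1.0; every other step yields 0.0.
    """
    next_state = state + action
    done = abs(next_state) >= 3
    reward = 1.0 if next_state >= 3 else 0.0
    return next_state, reward, done


def rollout(policy: Callable[[int], int],
            start_state: int = 0,
            max_steps: int = 20) -> List[Step]:
    """Run the policy until the episode ends, recording the trajectory."""
    trajectory: List[Step] = []
    state = start_state
    for _ in range(max_steps):
        action = policy(state)                      # model chooses an action
        next_state, reward, done = step(state, action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state: int) -> int:
    """Hypothetical policy that always moves one step to the right."""
    return 1


traj = rollout(always_right)
total_reward = sum(reward for _, _, reward in traj)  # signal used for training
```

Here the trajectory is the list of recorded steps, and `total_reward` stands in for the quantity a training algorithm would compute from it.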