2025-10-11

Where we talk about world models and robotics.

Papers

World models are machine learning algorithms that learn a representation of a “world”: they summarize an agent’s experience into a compact model that supports learning complex behaviors. They belong to the model-based branch of reinforcement learning; unlike actor-critic algorithms that learn online or from experience replay, world models can interpolate between past experiences. A key paper in the domain is “Dream to Control: Learning Behaviors by Latent Imagination” (Hafner et al.).

In Dreamer, behavior is learned by predicting possible trajectories in the compact latent space of the world model. The world model is learned similarly to a non-linear Kalman filter: a transition model predicts the next latent state, a representation model encodes observations and actions into continuous vectors, and a reward model predicts rewards from the model states. Working in this latent space means the model never has to observe or render the corresponding future images. Latent imagination therefore requires three components: a representation model, a transition model, and a reward model (sketched below).
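Here is a minimal sketch of those three components and a purely latent rollout. The module names, sizes, and the simple MLP/GRU parameterizations are my own simplifying assumptions; Dreamer’s actual recurrent state-space model mixes stochastic and deterministic states.

```python
import torch
import torch.nn as nn

LATENT, ACTION, OBS = 32, 4, 64  # illustrative sizes

class RepresentationModel(nn.Module):
    """Encodes an observation (plus previous state and action) into a latent state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS + LATENT + ACTION, 128),
                                 nn.ELU(), nn.Linear(128, LATENT))
    def forward(self, obs, prev_state, prev_action):
        return self.net(torch.cat([obs, prev_state, prev_action], -1))

class TransitionModel(nn.Module):
    """Predicts the next latent state from state and action, without seeing observations."""
    def __init__(self):
        super().__init__()
        self.cell = nn.GRUCell(ACTION, LATENT)
    def forward(self, state, action):
        return self.cell(action, state)

class RewardModel(nn.Module):
    """Predicts the reward from a latent state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 128), nn.ELU(), nn.Linear(128, 1))
    def forward(self, state):
        return self.net(state)

def imagine(transition, reward_model, policy, start_state, horizon=15):
    """Roll out a trajectory entirely in latent space: no images are decoded."""
    state, states, rewards = start_state, [], []
    for _ in range(horizon):
        action = policy(state)
        state = transition(state, action)
        states.append(state)
        rewards.append(reward_model(state))
    return torch.stack(states), torch.stack(rewards)

if __name__ == "__main__":
    transition, reward_model = TransitionModel(), RewardModel()
    policy = nn.Sequential(nn.Linear(LATENT, ACTION), nn.Tanh())  # stand-in actor
    states, rewards = imagine(transition, reward_model, policy, torch.zeros(1, LATENT))
    print(states.shape, rewards.shape)  # (15, 1, 32), (15, 1, 1)
```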

Three possible methods for learning a world model include: reward prediction (learning to predict future rewards given actions and past observations), image reconstruction (learning by reconstructing the observed images), and contrastive estimation (learning by predicting the state from the images instead of predicting the images themselves).
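As a rough illustration of the difference between the three signals, here are hypothetical loss terms, assuming an image encoder `enc`, a decoder `dec`, and a reward head `reward_head` already exist; these are not the exact objectives from the papers.

```python
import torch
import torch.nn.functional as F

def reward_prediction_loss(reward_head, states, rewards):
    # Shape the latent states so they are predictive of future rewards.
    return F.mse_loss(reward_head(states), rewards)

def reconstruction_loss(dec, states, obs):
    # Shape the latent states by reconstructing the observed images.
    return F.mse_loss(dec(states), obs)

def contrastive_loss(enc, states, obs_batch):
    # Predict which image in the batch corresponds to each state (InfoNCE-style)
    # instead of generating the images themselves.
    logits = states @ enc(obs_batch).T           # (B, B) similarity matrix
    targets = torch.arange(states.shape[0])      # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)
```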

In robotics, world models can be used to bypass the need for a physics simulator (Wu et al.). Here, the world model is learned first as a “general” representation; the behavior is then learned from rollouts performed entirely in latent space, which avoids costly decoding of observations and makes parallelization easier.
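A minimal sketch of learning behavior from latent rollouts only, pairing with the `imagine()` sketch above: the actor and critic are trained on imagined states and rewards, never on decoded images. The plain discounted returns here are a simplification (Dreamer uses lambda-returns), and the names are illustrative assumptions.

```python
import torch

def behavior_update(actor_opt, critic_opt, critic, states, rewards, gamma=0.99):
    # states: (H, B, latent) and rewards: (H, B, 1), e.g. from imagine() above.
    values = critic(states.detach())  # value estimates; no gradients back into the world model

    # Discounted returns over the imagined horizon.
    returns, running = [], torch.zeros_like(rewards[0])
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running
        returns.append(running)
    returns = torch.stack(returns[::-1])

    # Critic regresses toward the returns; the actor maximizes them by
    # backpropagating through the differentiable latent dynamics.
    critic_loss = ((values - returns.detach()) ** 2).mean()
    actor_loss = -returns.mean()

    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    return actor_loss.item(), critic_loss.item()
```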

In this setup, a learner thread trains the world model and the actor-critic behavior while a separate actor thread predicts actions for the environment interactions, so learning and interaction proceed in parallel without blocking each other.
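A rough sketch of that asynchronous setup, with a shared replay buffer between the two threads. The toy environment, placeholder policy, and `train_step` callable are stand-ins I introduce for illustration, not the actual interfaces from the paper.

```python
import threading
import random
import time

replay_buffer, buffer_lock = [], threading.Lock()
stop = threading.Event()

class ToyEnv:
    """Stand-in environment returning random observations and rewards."""
    def reset(self): return random.random()
    def step(self, action): return random.random(), random.random()

def actor_thread(policy, env):
    obs = env.reset()
    while not stop.is_set():
        action = policy(obs)                         # act with the latest policy
        next_obs, reward = env.step(action)
        with buffer_lock:
            replay_buffer.append((obs, action, reward, next_obs))
        obs = next_obs
        time.sleep(0.01)

def learner_thread(train_step):
    while not stop.is_set():
        with buffer_lock:
            batch = random.sample(replay_buffer, min(len(replay_buffer), 16))
        if batch:
            train_step(batch)                        # world model + actor-critic update
        time.sleep(0.01)

if __name__ == "__main__":
    policy = lambda obs: 0.0                         # placeholder policy
    train_step = lambda batch: None                  # placeholder learner update
    threads = [threading.Thread(target=actor_thread, args=(policy, ToyEnv())),
               threading.Thread(target=learner_thread, args=(train_step,))]
    for t in threads: t.start()
    time.sleep(1.0); stop.set()
    for t in threads: t.join()
```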