Learning Machines: Robobo Robotics

Optimizing Robobo robot behavior for obstacle avoidance, item foraging, and item retrieval using the CMA-ES evolution strategy, in simulation and on hardware.

April 1, 2024

Robotics, Evolutionary Algorithms, CMA-ES, Python

Overview

Course project for Learning Machines at Vrije Universiteit Amsterdam (M.Sc. AI). Investigated the effectiveness of the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm for optimizing robot behavior across three tasks of increasing complexity. Training was performed in CoppeliaSim simulation, then transferred to physical Robobo hardware.

The Robot

The Robobo robot is equipped with eight infrared sensors (five front, three rear) and a phone-mounted camera. Its action space is limited to setting the left and right wheel speeds and the movement duration, with no pre-programmed action sequences. Because every behavior has to be composed from these low-level commands, the optimization problem is challenging.
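As a rough illustration of this action space, the sketch below maps three controller outputs to one wheel command. The names `WheelCommand` and `to_command`, the speed limit, and the duration bounds are all assumptions made for the example; they are not taken from the project or from the Robobo API.

```python
from dataclasses import dataclass

MAX_SPEED = 100.0        # assumed speed limit, not a value from the project
MIN_DURATION_MS = 100    # assumed lower bound on movement duration
MAX_DURATION_MS = 1000   # assumed upper bound on movement duration


@dataclass(frozen=True)
class WheelCommand:
    """One low-level action: two wheel speeds and how long to apply them."""
    left_speed: float
    right_speed: float
    duration_ms: int


def to_command(outputs):
    """Map three controller outputs, nominally in [-1, 1], to a wheel command."""
    # Clamp first so out-of-range outputs cannot exceed the assumed limits.
    left, right, dur = (max(-1.0, min(1.0, o)) for o in outputs)
    span = MAX_DURATION_MS - MIN_DURATION_MS
    duration = MIN_DURATION_MS + round((dur + 1.0) / 2.0 * span)
    return WheelCommand(left * MAX_SPEED, right * MAX_SPEED, duration)
```

Turning, reversing, and pausing all have to be expressed through these three numbers, which is what makes the search space hard for the optimizer.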

Three Tasks

Task 1: Obstacle Avoidance (left), Task 2: Item Foraging (middle), Task 3: Item Retrieval (right)

Task 2: Foraging Arena with Food Sources

Task 3: Single Target Retrieval

Task 1 — Obstacle Avoidance: Navigate an environment filled with obstacles, maximizing exploration while minimizing collisions. Used only the eight IR sensors as input to a small neural network optimized by CMA-ES.
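A minimal sketch of such a controller, assuming one hidden layer of four tanh units and two outputs (one per wheel). The layer sizes and the function names `mlp_forward` and `n_weights` are illustrative choices, not the project's actual architecture. The point is that the whole network is described by one flat list of numbers, which is the form an evolution strategy optimizes.

```python
import math


def n_weights(n_in=8, n_hidden=4, n_out=2):
    """Number of parameters (weights plus biases) in the flat vector."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def mlp_forward(weights, inputs, n_hidden=4, n_out=2):
    """Run a one-hidden-layer tanh network stored as a flat list of weights."""
    n_in = len(inputs)
    expected = n_weights(n_in, n_hidden, n_out)
    if len(weights) != expected:
        raise ValueError(f"expected {expected} weights, got {len(weights)}")
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        row = weights[pos:pos + n_in]
        bias = weights[pos + n_in]
        pos += n_in + 1
        hidden.append(math.tanh(sum(w * x for w, x in zip(row, inputs)) + bias))
    outputs = []
    for _ in range(n_out):
        row = weights[pos:pos + n_hidden]
        bias = weights[pos + n_hidden]
        pos += n_hidden + 1
        outputs.append(math.tanh(sum(w * h for w, h in zip(row, hidden)) + bias))
    return outputs  # each value lies in [-1, 1]
```

With eight IR readings as input, this assumed layout has 46 parameters, small enough for a population-based search to handle.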

Task 2 — Item Foraging: Locate and collect multiple food items (green blocks) scattered across the arena. Extended the input to include camera data, segmented into 3 or 9 zones to test how visual resolution affects learning speed and accuracy.
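One way to picture the zone segmentation is the sketch below. It assumes the camera frame has already been thresholded into a binary mask (1 for a green pixel), that 3 zones means three vertical strips, and that 9 zones means a 3 x 3 grid. The write-up does not specify the layout, so both are assumptions, as is the function name `zone_features`.

```python
def zone_features(mask, rows, cols):
    """Return the fraction of detected pixels in each cell of a rows x cols grid.

    `mask` is a list of equal-length rows holding 0/1 values (1 = green pixel).
    Cells are returned row by row, left to right.
    """
    height, width = len(mask), len(mask[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            hits = sum(sum(row[left:right]) for row in mask[top:bottom])
            area = (bottom - top) * (right - left)
            features.append(hits / area if area else 0.0)
    return features


# Assumed layouts: three vertical strips versus a 3 x 3 grid.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
coarse = zone_features(mask, rows=1, cols=3)   # 3 network inputs
fine = zone_features(mask, rows=3, cols=3)     # 9 network inputs
```

The finer grid tells the controller where an item sits vertically as well as horizontally, but it adds six inputs and therefore more weights to evolve, which is consistent with the longer training times reported for the 9-zone variant.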

Task 3 — Item Retrieval: Find and push a single target item to a specific location. Tested with both static and randomized object positions to evaluate generalization of learned behaviors.
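Evaluating with randomized positions can be sketched as averaging a controller's score over several episodes, each with a fresh item placement. Everything here is hypothetical: `run_episode` stands in for a simulator rollout, `retrieval_score` is an illustrative distance-based score rather than the project's fitness function, and the square arena is an assumption.

```python
import math
import random


def retrieval_score(item_xy, goal_xy):
    """Illustrative score: higher (closer to zero) when the item ends near the goal."""
    return -math.dist(item_xy, goal_xy)


def average_fitness(run_episode, controller, n_episodes=5, arena_size=1.0, seed=0):
    """Average a controller's score over episodes with random item placement.

    `run_episode(controller, item_start)` is a hypothetical callback that runs
    one simulated episode and returns a score (higher is better).
    """
    rng = random.Random(seed)  # seeded so every candidate sees the same layouts
    total = 0.0
    for _ in range(n_episodes):
        item_start = (rng.uniform(0.0, arena_size), rng.uniform(0.0, arena_size))
        total += run_episode(controller, item_start)
    return total / n_episodes
```

Reusing the same seed for every candidate in a generation keeps the comparison fair, while averaging over several placements rewards controllers that do not rely on one memorized layout.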

Approach

CMA-ES was chosen over deep reinforcement learning because it doesn't require gradient information and handles noisy sensor-action relationships well.
Each task had a custom fitness/reward function guiding the evolutionary optimization.
Neural network controllers of varying sizes were evolved, with the network weights as the parameters being optimized.
Sim-to-real transfer was tested by deploying trained controllers on physical Robobo robots.
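The overall loop (sample candidate weight vectors, score each with the fitness function, move the search distribution toward the best ones) can be sketched as follows. This is deliberately not CMA-ES: it is a simplified (mu, lambda) evolution strategy with an isotropic Gaussian and a fixed step-size decay, so it omits the covariance matrix and step-size adaptation that give CMA-ES its name. The function `evolve` and all its default values are assumptions for illustration. A library such as pycma exposes a comparable ask/tell loop for the full algorithm.

```python
import random


def evolve(fitness, n_params, generations=60, pop_size=20, n_parents=5,
           sigma=0.5, sigma_decay=0.97, seed=0):
    """Maximize `fitness` with a simplified (mu, lambda) evolution strategy."""
    rng = random.Random(seed)
    mean = [0.0] * n_params
    best, best_fit = list(mean), fitness(mean)
    for _ in range(generations):
        # Sample candidates around the current mean (isotropic Gaussian).
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(pop_size)]
        # Score every candidate and rank from best to worst.
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1]
        # Recombine: the new mean is the average of the top n_parents.
        parents = [ind for _, ind in scored[:n_parents]]
        mean = [sum(p[i] for p in parents) / n_parents for i in range(n_params)]
        sigma *= sigma_decay  # fixed decay in place of CMA-ES step-size control
    return best, best_fit


def toy_fitness(weights):
    """Stand-in for a simulator rollout: peak fitness when every weight is 1."""
    return -sum((w - 1.0) ** 2 for w in weights)


best_weights, best_score = evolve(toy_fitness, n_params=5)
```

In the project setting, `toy_fitness` would be replaced by a function that loads the candidate weights into the controller, runs an episode in simulation, and returns the task-specific fitness. No gradients are needed at any point, which matches the motivation given above.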

Key Findings

CMA-ES can solve all three tasks given enough training time. However, under the project's time constraints, reinforcement learning approaches would likely converge faster. Increasing camera segmentation from 3 to 9 zones improved item detection accuracy but significantly increased training time.

Technologies

Python, CMA-ES, CoppeliaSim, ROS, Docker, Neural Networks