Improving Bipedal Robot Motion via Reinforcement Learning and Tailored Rewards

Document Type

Article

Publication Date

Winter 10-29-2024

Abstract

This study proposes a novel reward function for deep deterministic policy gradient (DDPG)-based bipedal robot walking control. The reward combines a target-reaching term with a new term that promotes stability and natural gaits via the body orientation angles. This design encourages the desired behaviors while adapting to diverse robot morphologies. Additionally, action-space noise (an Ornstein-Uhlenbeck process) and parameter-space noise (Gaussian noise on stiffness, damping, and friction) are introduced to improve DDPG's exploration efficiency and yield better policy learning. This combined noise strategy facilitates exploration of diverse terrains and promotes adaptive behavior. The reward function's impact on gait patterns and leg loading is analyzed, examining how it influences human-like walking and load distribution. Simulations demonstrate the robot's learning capability, achieving a coordinated gait, balance, and successful termination. A torque analysis across the leg joints and movement axes is also conducted. The proposed approach, combining the modified reward with action- and parameter-space noise, offers a promising way to mitigate local-minima issues in DDPG. The MATLAB®/Simulink Reinforcement Learning Toolbox is employed.
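To make the reward structure described in the abstract concrete, the following is a minimal Python sketch of a per-step reward that combines target-reaching progress with a body-orientation penalty. The weights, the quadratic penalty form, and the specific angle terms (roll, pitch, yaw error) are illustrative assumptions, not the paper's actual formulation, which is implemented in MATLAB®/Simulink.

```python
import numpy as np

def step_reward(dist_prev, dist_curr, roll, pitch, yaw_error,
                w_progress=10.0, w_orient=1.0):
    """Illustrative per-step reward (assumed form, not the paper's exact terms).

    - Target reaching: reward the reduction in distance to the goal
      since the previous control step.
    - Stability / natural gait: penalize deviation of the body orientation
      angles from an upright, forward-facing posture.
    """
    progress = dist_prev - dist_curr                      # > 0 when moving toward target
    orientation_penalty = roll**2 + pitch**2 + yaw_error**2
    return w_progress * progress - w_orient * orientation_penalty

# Example: small forward progress with a slight body tilt.
r = step_reward(dist_prev=2.00, dist_curr=1.98,
                roll=0.05, pitch=0.02, yaw_error=0.01)
```

Penalizing orientation angles rather than raw joint positions is what lets such a term transfer across different robot morphologies, since every biped has a body frame even when joint layouts differ.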
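Similarly, the combined exploration strategy can be sketched as follows: a discretized Ornstein-Uhlenbeck process for temporally correlated action-space noise, plus Gaussian perturbation of physical parameters (stiffness, damping, friction) at episode start. The class, helper name, and all numeric values are assumptions for illustration; the study itself uses the MATLAB®/Simulink Reinforcement Learning Toolbox.

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Action-space noise for DDPG exploration.

    Euler-Maruyama discretization of the OU process:
    x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1).
    theta, sigma, and dt are common illustrative defaults, not the paper's settings.
    """
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=0.01):
        self.mu = mu * np.ones(action_dim)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.state = self.mu.copy()

    def reset(self):
        self.state = self.mu.copy()

    def sample(self):
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state

def perturb_parameters(params, rel_std=0.05, rng=None):
    """Parameter-space noise: relative Gaussian perturbation of physical
    properties (e.g., joint stiffness, damping, contact friction).
    The 5% relative std is an assumed value for illustration."""
    if rng is None:
        rng = np.random.default_rng()
    return {k: v * (1.0 + rel_std * rng.standard_normal()) for k, v in params.items()}

# Usage sketch: perturb physics once per episode, add OU noise at every step.
noise = OrnsteinUhlenbeckNoise(action_dim=6)
physics = perturb_parameters({"stiffness": 500.0, "damping": 10.0, "friction": 0.8})
policy_action = np.zeros(6)                   # stand-in for the actor network's output
action = np.clip(policy_action + noise.sample(), -1.0, 1.0)
```

The temporal correlation of OU noise suits momentum-dominated locomotion tasks, while the parameter perturbation exposes the policy to varied dynamics, supporting the adaptive behavior the abstract describes.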
