Introduction to Quadruped Robotics


Quadruped robots are legged robots with four legs, known for the mobility and stability of their locomotion. They have been developed and studied extensively in recent years, with various design and development approaches being explored. Quadruped robots have the potential to navigate complex terrains and perform tasks that are difficult for wheeled or bipedal robots.

The dynamics of a quadruped robot are second order in nature and can be modelled as affine in the control input, so the robot can be described by the following relation

$$ \ddot{q} = f_{1}(q, \dot{q}, t) + f_{2}(q, \dot{q}, t)u $$

where $q$ represents the state of the robot (position and orientation in space, and the motor angles), $\dot{q}$ and $\ddot{q}$ are its first and second time derivatives, $u$ the controls (input angles to the motors), and $t$ the time.
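As a concrete illustration, the standard manipulator equation $M(q)\ddot{q} + C(q, \dot{q})\dot{q} + g(q) = Bu$ can be rearranged into exactly this control-affine form. The sketch below assumes hypothetical callables `M`, `C`, `g` and an actuation matrix `B`; it is not tied to any specific robot.

```python
import numpy as np

def forward_dynamics(q, qd, u, M, C, g, B):
    """Control-affine form of the manipulator equation
    M(q) qdd + C(q, qd) qd + g(q) = B u, rearranged as
    qdd = f1(q, qd) + f2(q, qd) u with
        f1 = -M(q)^{-1} (C(q, qd) qd + g(q)),   f2 = M(q)^{-1} B.
    M, C, g are assumed callables and B a constant actuation matrix."""
    M_inv = np.linalg.inv(M(q))
    f1 = -M_inv @ (C(q, qd) @ qd + g(q))  # drift term (Coriolis + gravity)
    f2 = M_inv @ B                        # control effectiveness matrix
    return f1 + f2 @ u
```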

For quadruped robots we almost always have $rank[f_{2}(q, \dot{q}, t)] < dim[q]$; such systems are called underactuated systems, and otherwise they are called fully actuated. An intuitive way to think about this is that when $rank[f_{2}(q, \dot{q}, t)] = dim[q]$ there always exists a $u$ that produces any desired $\ddot{q}$, i.e. the resulting map from controls to accelerations is surjective, which is why the system is called fully actuated.
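A quick way to check this numerically is to compare the rank of the control matrix with the dimension of $q$; the toy example below uses made-up matrices purely to illustrate the definition.

```python
import numpy as np

def is_fully_actuated(f2_matrix, q_dim):
    """Fully actuated iff rank[f2] equals dim[q], i.e. any desired
    acceleration can be produced by some input u."""
    return np.linalg.matrix_rank(f2_matrix) == q_dim

# Toy example: 3 generalized coordinates driven by only 2 actuators.
f2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
print(is_fully_actuated(f2, q_dim=3))  # False -> underactuated
```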

Controlling a highly dynamic quadruped robot is a very challenging problem, both because the body is underactuated during many locomotion gaits and because of constraints placed on the ground reaction forces, which must stay inside the friction cone to avoid slipping.
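With a Coulomb friction model, the constraint is that the tangential component of each contact force must not exceed the friction coefficient times the normal component. The following sketch (with an assumed friction coefficient `mu`) checks that condition for a single contact.

```python
import numpy as np

def inside_friction_cone(force, normal, mu):
    """True if a ground reaction force lies inside the Coulomb friction cone:
    tangential magnitude <= mu * normal component (mu is assumed)."""
    normal = normal / np.linalg.norm(normal)
    f_n = force @ normal                        # normal component
    f_t = np.linalg.norm(force - f_n * normal)  # tangential magnitude
    return f_n > 0 and f_t <= mu * f_n

# A mostly-vertical contact force on flat ground stays inside the cone.
print(inside_friction_cone(np.array([1.0, 0.0, 5.0]),
                           np.array([0.0, 0.0, 1.0]), mu=0.6))  # True
```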

Introduction to Model-Based Reinforcement Learning (MBRL)

What is RL?

In a nutshell, RL is the study of agents and how they learn by trial and error. It formalizes the idea that rewarding or punishing an agent for its behavior makes it more likely to repeat or forego that behavior in the future.

The agent interacts with the environment by taking some action $a_{t}$ at time $t$, which puts the agent in a new state $s_{t+1}$ of the environment and earns it a reward $r_{t+1}$. The goal of the agent is to take actions that maximize the cumulative reward it receives over time. Policies are rules or stochastic functions that tell the agent which action to take at a given state: $a_{t} \sim \pi(\cdot|s_{t})$
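This interaction loop can be written in a few lines. The sketch below uses the Gymnasium API and its Pendulum-v1 environment purely as an example, with a random policy standing in for $\pi$.

```python
import gymnasium as gym

# Assumes the Gymnasium package; Pendulum-v1 is just a stand-in environment.
env = gym.make("Pendulum-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
for t in range(200):
    action = env.action_space.sample()  # a_t ~ pi(.|s_t), here a random policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # accumulate r_{t+1}
    if terminated or truncated:
        obs, info = env.reset()
env.close()
print(total_reward)
```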

The central problem of RL is to learn an optimal policy $\pi^{*}$ that maximizes the expected cumulative reward the agent receives over time:

$$ \pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{[a_{0}, a_{1}, \dots, a_{n}] \sim \pi} \big[ R([a_{0}, a_{1}, \dots, a_{n}]) \big] \\ R([a_{0}, a_{1}, \dots, a_{n}]) = \sum_{t=0}^{\infty} \gamma^{t} r(a_{t}) $$
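For a finite trajectory, the discounted return above can be computed directly; the small helper below is just a truncated version of the infinite sum.

```python
def discounted_return(rewards, gamma=0.99):
    """Truncated version of R = sum_t gamma^t * r_t for a finite episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # 1 + 0.9 + 0.81 = 2.71
```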

Model-Based Reinforcement Learning

In MBRL the agent has a model of the environment (or learns the model with experience). By a model of the environment, we mean a function which predicts state transitions and rewards. This has the advantage that it allows the agent to plan by thinking ahead, seeing what would happen for a range of possible choices, and explicitly deciding between its options.
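One simple way to plan with a learned model is random-shooting model predictive control: sample many candidate action sequences, roll each through the model, and execute the first action of the best sequence. The sketch below assumes `model(s, a)` returns a predicted next state and `reward_fn(s, a)` a predicted reward; both names are placeholders rather than any particular library's API.

```python
import numpy as np

def plan_with_model(model, reward_fn, s0, horizon=10, n_candidates=100,
                    action_dim=2, rng=None):
    """Random-shooting planner: sample candidate action sequences, roll each
    through the learned model, and return the first action of the best one
    (MPC-style). `model(s, a)` and `reward_fn(s, a)` are assumed interfaces."""
    rng = rng or np.random.default_rng(0)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    best_seq, best_ret = None, -np.inf
    for seq in candidates:
        s, ret = s0, 0.0
        for a in seq:
            ret += reward_fn(s, a)   # predicted reward under the model
            s = model(s, a)          # predicted next state
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0]               # execute only the first action, then replan
```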

The disadvantage is that the model is almost never readily available to the agent, and learning it from experience alone is very hard, requiring a lot of time and compute. The biggest challenge is that bias in the model can be exploited by the agent, resulting in an agent which performs well with respect to the learned model, but behaves sub-optimally (or super terribly) in the real environment.


The famous AlphaGo agent, which defeated Lee Sedol 4-1 in a five-game Go match, is a model-based agent.

Learning the Model