I turned a thermodynamics principle into a learning algorithm – and it lands a moonlander

GitHub project and demo videos (please use a web browser if possible, as the GitHub mobile app does not render some of the videos properly)

What my project does

Physics ensures that particles usually settle in low-energy states; electrons stay near an atom’s nucleus, and air molecules don’t just fly off into space. I’ve applied an analogy of this principle to a completely different problem: teaching a neural network to safely land a lunar lander.

I did this by assigning low "energy" to good landing attempts (e.g. no crash, low fuel use) and high "energy" to poor ones. Then, using standard neural network training techniques, I enforced equations derived from thermodynamics. As a result, the lander learns to land successfully with high probability.
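To make the idea concrete, here is a minimal sketch of the energy-weighting step. The energy function, its penalty values, and the temperature are all hypothetical illustrations, not the project's actual formulas; the Boltzmann weighting p_i ∝ exp(-E_i / T) is the standard thermodynamics relation the post alludes to.

```python
import numpy as np

def energy(crashed: bool, fuel_used: float) -> float:
    # Hypothetical energy: crashes cost a lot, fuel use adds a little.
    return (100.0 if crashed else 0.0) + fuel_used

def boltzmann_weights(energies, temperature=1.0):
    # Boltzmann distribution: p_i ∝ exp(-E_i / T).
    # Subtract the max before exponentiating for numerical stability.
    z = -np.asarray(energies, dtype=float) / temperature
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

# Three imagined attempts: soft landing, wasteful landing, crash.
energies = [energy(False, 5.0), energy(False, 40.0), energy(True, 10.0)]
weights = boltzmann_weights(energies, temperature=10.0)
# The lowest-energy (best) attempt receives the largest weight.
```

Training can then upweight low-energy (good) trajectories by these probabilities, so the network preferentially imitates successful landings.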

Target audience

This is primarily a fun project for anyone interested in physics, AI, or Reinforcement Learning (RL) in general.

Comparison to Existing Alternatives

While most of the algorithm variants I tested aren't competitive with the current industry standard, one approach does look promising. When the derived equations are written as a regularization term, the algorithm exhibits better stability than popular methods like the entropy bonus.
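For contrast, here is a sketch of the two regularizers being compared. The entropy bonus is the standard RL technique; the free-energy penalty is my hedged guess at what a "thermodynamic" regularization term could look like (F = ⟨E⟩ − T·H, the Helmholtz free energy), not necessarily the project's exact formulation.

```python
import numpy as np

def entropy_bonus(probs, beta=0.01):
    # Standard entropy regularizer: beta * H(pi), added to the objective
    # to discourage the policy from collapsing to a single action.
    probs = np.asarray(probs, dtype=float)
    return beta * -np.sum(probs * np.log(probs + 1e-12))

def free_energy_penalty(probs, energies, temperature=1.0):
    # Hypothetical thermodynamics-style regularizer: the policy's free
    # energy F = <E> - T * H. Minimizing F trades off preferring
    # low-energy actions against keeping the policy spread out.
    probs = np.asarray(probs, dtype=float)
    expected_energy = np.sum(probs * np.asarray(energies, dtype=float))
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return expected_energy - temperature * entropy

# A uniform policy vs. one peaked on the lowest-energy action.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
action_energies = [1.0, 2.0, 3.0, 4.0]
```

At temperature 1, the policy peaked on the low-energy action achieves a lower free energy than the uniform one, so minimizing F pulls the policy toward good actions while the entropy term keeps it from collapsing entirely.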

Given that stability is a major challenge in the heavily regularized RL used to train today's LLMs, this seems worth investigating further.

submitted by /u/kongaskristjan to r/Python

