PPO algorithms on Grid2OP

Why

Power grid operation is becoming increasingly complex because the output of renewable energy sources (wind, solar, wave, ...) is hard to predict. Reinforcement learning algorithms can support operators and help them avoid blackouts.

What

I have developed several variants of the PPO (Proximal Policy Optimization) algorithm on the Grid2OP platform. I re-ran the benchmarks provided by the L2RPN (Learning to Run a Power Network) challenge and extended them with different reward functions, embeddings based on graph neural networks, and a maskable action mechanism.
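The maskable action mechanism is not described in detail here. One common way to mask actions in a policy-gradient method such as PPO is to set the logits of invalid actions to negative infinity before the softmax, so that those actions receive zero probability and are never sampled. The following is a minimal sketch of that idea in plain Python, not the project's actual implementation; the function name `masked_action_probs` and the example values are hypothetical.

```python
import math


def masked_action_probs(logits, mask):
    """Return softmax probabilities restricted to the actions allowed by `mask`.

    `logits` is a list of raw policy scores, one per action.
    `mask` is a list of booleans, True where the action is currently valid.
    (Hypothetical helper for illustration only.)
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")

    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = [score if ok else -math.inf for score, ok in zip(logits, mask)]

    # Subtract the largest valid logit for numerical stability.
    peak = max(masked)
    exps = [math.exp(score - peak) for score in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Example: three actions, the second one is not allowed in the current state.
probs = masked_action_probs([2.0, 1.0, 0.5], [True, False, True])
# The masked action ends up with probability 0.0; the rest sum to 1.
```

In a real agent the same masking would be applied to the policy network's output tensor before building the action distribution, and the mask would come from the environment's rules about which actions are legal in the current grid state.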

How

I have implemented the algorithms with PyTorch, Stable-Baselines, PyTorch Geometric, and the Grid2OP environment utilities. I monitored the training runs with TensorBoard. Finally, I prepared the result data with Polars and plotted it with Plotly.