PDL for parametric AC-OPF: paper-faithful implementation with analytical dual #6
…efBus, ρ_eq_scale

Key changes to src/L2OALM.jl:
- Paper-faithful augmented Lagrangian loss (correct g, h from BNK.constraints!)
- Clamp inequality multipliers to μ ≥ 0 (KKT correctness)
- Analytical primal loss: apply ALM dual update per gradient step (λ_eff = clamp(λ̂ + ρ·h, −M, M)); eliminates dual tracking gap
- ρ_eq_scale: separate penalty multiplier for equality constraints
- use_penalty_only kwarg for penalty-only (no dual) ablation

Key changes to test/power.jl and examples/case57_train.jl:
- BoundedOutput head: sigmoid variable-bound enforcement (max_bound_viol ≡ 0)
- FixRefBus layer: reference-bus angle fixed to zero architecturally; GPU-safe
- Reduced-space AC-OPF: eliminate branch-flow variables
- Warmup at ρ_max, lr decay, env-var hyperparameter overrides

New: examples/case57_train_twophase.jl, a two-phase training script (Phase 1: fixed-ρ warm-start to skip the degenerate basin; Phase 2: growing ρ with ρ_eq_scale).

Best result: max_eq = 1.165 p.u. on 5000 held-out case57 test samples, max_ineq = max_bound = 0 (analytical dual, ρ_max=1e4, MAX_DUAL=1e6).
Both flags are now ALMMethod fields (defaults: false, true) instead of train!/single_train_step! kwargs, keeping all method config in one place.
- use_analytical_dual: apply ALM dual update analytically per gradient step
- use_dual_learning: set false to keep the dual network frozen throughout (skips dual training loop and the per-outer-iter deepcopy of dual state)

Remove use_penalty_only from single_train_step! (warmup is handled directly in train! and was never exposed as a kwarg callers needed to set).
Amazing!! Let us know when you are ready for a review.
Hi @andrewrosemberg, I wanted to mention that this is my personal attempt to reproduce the paper's approach. I tried to fix the branch's implementation and make it work on the 57-bus case. The results are reported above. To be honest, I was hoping for even better results, but this may be a limitation of the approach. Feel free to review it, and to improve or tune it further. Many thanks!
Updated the CI workflow to include version 1.12 and removed older versions.
Vc_eq = abs.(h)
Vc = vcat(Vc_ineq, Vc_eq)

lagrangian_term = (sum(μ .* g) + sum(λ .* h)) / num_s
@ivanightingale, I believe this was one of the issues we discussed, correct?
Namely, that we are not correcting the sign after multiplying the dual by the constraint value.
In my implementation, I decided to use the violations instead of the function values, but I see the issue with that too.
The current PR is faithful to the PDL paper, but as we discussed, the paper didn't get ALM exactly right.
Here is the proper augmented Lagrangian (primal loss) from
Rockafellar, R. Tyrrell. "Augmented Lagrange multiplier functions and duality in nonconvex programming." SIAM Journal on Control 12.2 (1974): 268-285.
and a derivation:
More modern implementations like ALGENCAN usually use an equivalent primal objective. So strictly speaking, the "augmented Lagrangian" is not really just "the Lagrangian plus a penalty term".
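For reference, a common way to write the augmented Lagrangian attributed to Rockafellar (often called the Powell-Hestenes-Rockafellar form) is sketched below. The notation is an assumption and may differ from the formula in the comment above: the problem is taken to be $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$, with multipliers $\mu \ge 0$, $\lambda$ and penalty $\rho > 0$.

$$
\mathcal{L}_\rho(x, \mu, \lambda) = f(x) + \lambda^\top h(x) + \frac{\rho}{2}\,\|h(x)\|_2^2 + \frac{1}{2\rho} \sum_j \left( \max\{0,\ \mu_j + \rho\, g_j(x)\}^2 - \mu_j^2 \right)
$$

The shifted-penalty objective used by ALGENCAN-style methods,

$$
f(x) + \frac{\rho}{2} \left( \left\| h(x) + \frac{\lambda}{\rho} \right\|_2^2 + \left\| \max\left\{0,\ g(x) + \frac{\mu}{\rho}\right\} \right\|_2^2 \right),
$$

differs from $\mathcal{L}_\rho$ only by the term $\left(\|\lambda\|_2^2 + \|\mu\|_2^2\right) / (2\rho)$, which does not depend on $x$, so both have the same minimizers in $x$.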
In fact, I have implemented ALGENCAN, LANCELOT, and a version of ALM that uses the augmented Lagrangian from the PDL paper, all within the same framework. The first two converge, but the last one doesn't. I'm talking about optimization algorithms here.
So from a "motivated by the optimization algorithm" perspective, one would want to try swapping in a proper augmented Lagrangian as the primal net loss in PDL. I have tried doing this, but it didn't change the convergence of training much.
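To make the difference concrete for a single inequality g(x) ≤ 0, here is a minimal sketch comparing a "Lagrangian plus penalty" term with the Rockafellar-style term. The function names are hypothetical, and the form of the first term (including the ρ/2 factor) is an assumption based on the `lagrangian_term` line in this PR, not a copy of the package code.

```julia
# Single inequality g(x) <= 0 with multiplier μ >= 0 and penalty ρ > 0.

# "Lagrangian plus penalty" term (assumed form, in the spirit of the PDL-style loss).
naive_term(g, μ, ρ) = μ * g + (ρ / 2) * max(g, 0.0)^2

# Powell-Hestenes-Rockafellar term.
phr_term(g, μ, ρ) = (max(0.0, μ + ρ * g)^2 - μ^2) / (2ρ)

μ, ρ = 2.0, 10.0
for g in (-100.0, -1.0, -0.1, 0.0, 0.1)
    println("g = ", g, "  naive = ", naive_term(g, μ, ρ), "  phr = ", phr_term(g, μ, ρ))
end
```

For violated constraints (g ≥ 0) the two terms coincide. For strictly satisfied constraints, the first term keeps decreasing linearly as g becomes more negative, so it rewards extra slack without bound, while the second is bounded below by -μ²/(2ρ).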
Thanks, this is insightful.
If you have the new versions of ALM, could you also benchmark them to see whether they improve my results?
It could be nice to let users choose between these different options! Any ideas on how to design the API, @xkhainguyen @ivanightingale? Once they are implemented, I can run all of them on comparable hardware and cases to benchmark.
My implementation worked well up to case 6K rte but took a while to converge.
I guess we can just split the cases/approaches within the function single_train_step!
and then choose the options here:
method = ALMMethod(;
    batch_model = bm,
    num_equal = num_equal,
    use_analytical_dual = true,  # recommended; false = use dual net output directly
    use_dual_learning = true,    # false = freeze dual network (ablation only)
    ρmax = 1e4, max_dual = 1e6, τ = 0.8, α = 2.0,
)
train!(method, trainer, data; K=100, L_primal=2500, L_dual=2500, warmup_epochs=25000)
Feel free to add your approaches there
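One way the "split approaches" idea could look is dispatch on a small loss-variant type. This is only a sketch under assumptions: `AbstractALLoss`, `PDLLoss`, `PHRLoss`, and `constraint_loss` are hypothetical names that do not exist in the package, and the PDL-style formula (with a ρ/2 penalty factor) is assumed rather than taken from the source.

```julia
# Hypothetical loss-variant types; none of these names exist in the package.
abstract type AbstractALLoss end
struct PDLLoss <: AbstractALLoss end  # Lagrangian plus penalty (assumed PDL-style form)
struct PHRLoss <: AbstractALLoss end  # Rockafellar-style augmented Lagrangian

# Constraint part of the primal loss for one sample (objective omitted).
function constraint_loss(::PDLLoss, g, h, μ, λ, ρ)
    penalty = (ρ / 2) * (sum(abs2, max.(g, 0.0)) + sum(abs2, h))
    return sum(μ .* g) + sum(λ .* h) + penalty
end

function constraint_loss(::PHRLoss, g, h, μ, λ, ρ)
    ineq = sum(max.(0.0, μ .+ ρ .* g) .^ 2 .- μ .^ 2) / (2ρ)
    return ineq + sum(λ .* h) + (ρ / 2) * sum(abs2, h)
end

# Usage: the variant is picked by the type of the first argument.
g, h, μ, λ, ρ = [-100.0], [0.0], [2.0], [0.0], 10.0
println(constraint_loss(PDLLoss(), g, h, μ, λ, ρ))
println(constraint_loss(PHRLoss(), g, h, μ, λ, ρ))
```

A variant like this could be stored as one more method field, so that a training step only calls `constraint_loss(method.loss, ...)` and new approaches are added as new methods rather than new branches.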
  mean_loss = mean([s.total_loss for s in batch_states])
  return (;
- max_violation = max_violation,
+ max_violation = mean_violations, # proxy for ρ criterion: bump only when mean stalls
The paper uses the max-of-max, but maybe mean-of-max is better.
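The two aggregations can be sketched as follows. The matrix layout (`violations[i, s]` is the violation of constraint i on sample s), the function names, and the ρ update rule are assumptions for illustration; the defaults τ = 0.8, α = 2.0, ρmax = 1e4 are the values shown in the `ALMMethod` call in this thread.

```julia
# Hypothetical layout: violations[i, s] = violation of constraint i on sample s.

# Worst violation over all constraints and all samples.
max_of_max(violations) = maximum(violations)

# Worst violation per sample, averaged over samples.
mean_of_max(violations) = sum(maximum(violations; dims=1)) / size(violations, 2)

# Assumed ρ rule: grow ρ only when the metric failed to shrink by the factor τ.
update_rho(ρ, v_new, v_old; τ=0.8, α=2.0, ρmax=1e4) =
    v_new > τ * v_old ? min(α * ρ, ρmax) : ρ

V = [0.0 0.1 5.0;
     0.2 0.0 0.1]
println(max_of_max(V))
println(mean_of_max(V))
```

With max-of-max, a single outlier sample (the third column here) dominates the metric and can keep triggering ρ increases; mean-of-max dilutes it across the batch.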
Oh, I didn’t notice. I’ve always felt that the field is missing a standardized set of metrics that every paper should report.
@andrewrosemberg @ivanightingale I'd be happy to contribute to a community effort around this.
We started this organization to do just that. We can explore here and then move the evaluation metrics to a dedicated repo. We were creating one repo for each big method (still missing a few, like deepmind cosmos), and we were going to train and benchmark on a public dataset (https://huggingface.co/datasets/LearningToOptimize/Parametric-Optimization-Problems).
@xkhainguyen happy for us all to have a call and plan :)
Summary
Implementation of Primal-Dual Learning (PDL) (Park & Van Hentenryck, AAAI 2023) for parametric AC-OPF, with several fixes and extensions over a naive baseline.
Fixes to the base implementation
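- Paper-faithful augmented Lagrangian loss (correct g, h from BNK.constraints!)
- BoundedOutput sigmoid head: max_bound_viol ≡ 0 by construction; no dead gradient zone (vs. hardsigmoid)
- FixRefBus layer: reference-bus angle fixed to zero architecturally; GPU-safe

Analytical dual correction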
The dual network lags the true multipliers by orders of magnitude at high ρ. Instead of using λ̂(θ) directly, we apply one ALM update analytically at every primal gradient step:
λ_eff = clamp(λ̂ + ρ·h, −M, M)
The gradient flows through g(ŷ) and h(ŷ), not through the dual net, giving the correct ρg² and ρh² penalty terms without tracking lag. The dual network still trains in parallel, providing an improving warm-start base over time.
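A minimal sketch of this correction for one sample is given below. The function name `effective_duals` and the argument names are hypothetical; the equality update follows the λ_eff = clamp(λ̂ + ρ·h, −M, M) rule from the commit message, while the inequality update (clamping to [0, M]) is an assumption based on the "μ ≥ 0" note.

```julia
# g: inequality values (g <= 0 is feasible), h: equality values (h == 0 is feasible).
# μ_hat, λ_hat: dual-network outputs, treated as constants in the primal step.
function effective_duals(μ_hat, λ_hat, g, h; ρ=1e4, max_dual=1e6)
    μ_eff = clamp.(μ_hat .+ ρ .* g, 0.0, max_dual)        # assumed: multipliers stay >= 0
    λ_eff = clamp.(λ_hat .+ ρ .* h, -max_dual, max_dual)  # sign-free equality multipliers
    return μ_eff, λ_eff
end

μ_eff, λ_eff = effective_duals([0.0, 1.0], [0.5], [-0.5, 0.001], [0.01]; ρ=100.0)

# Away from the clamp limits, λ_eff .* h expands to λ_hat .* h .+ ρ .* h .^ 2,
# which is where the ρh² penalty term comes from.
lagrangian_term = sum(μ_eff .* [-0.5, 0.001]) + sum(λ_eff .* [0.01])
println(μ_eff, " ", λ_eff, " ", lagrangian_term)
```

The satisfied inequality (g = -0.5) gets a zero effective multiplier, so it contributes nothing to the loss, while the slightly violated one and the equality both receive multipliers inflated by ρ times their residual.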
API
Both options live in ALMMethod (defaults shown): use_analytical_dual = false, use_dual_learning = true.
Results on case57 (5000 held-out test samples)
- use_analytical_dual=true, ρmax=1e4, max_dual=1e6: max_eq = 1.165 p.u.
- use_analytical_dual=false

Variable bounds and inequalities are satisfied exactly in all runs.
Test plan
- julia --project=. -e 'using Pkg; Pkg.test()' passes on CPU
- BNK_TEST_CUDA=1 julia --project=. -e 'using Pkg; Pkg.test()' passes on GPU
- julia --project=. examples/case57_train.jl runs to completion