{2025.06}[2025b] GROMACS 2025.4 with CUDA-12.9.1 #1482
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120
|
New job on instance
|
|
The build succeeded, but it fails in the CUDA sanity check: I guess it may be related to the
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
|
New job on instance
|
@casparvl The icelake cc80 build with the Surf bot failed because of: Have you encountered this before?
Maybe we just need to add
I tried various things with an interactive job on Snellius, but
edit: the zen4 job also ran out of memory according to Slurm, but somehow kept running and then timed out after a day. @casparvl do you have any idea what's going on?
|
The only thing I can think of: these nodes don't have local disks, so |
I've seen this happen before. If you have, say, 3 processes running, the OOM killer might kill one and leave 2 stray processes that just wait for the killed one to do something. Those then run indefinitely. Slurm doesn't end the job, since there are still running processes.
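To illustrate what I mean, here is a minimal sketch (Linux only, and not the actual build): it starts three sleeping child processes, sends SIGKILL to one of them (the same signal the OOM killer uses), and checks that the other two are still alive. The function name `simulate_oom_kill` and the sleeping workers are made up for this example; a real build would have compiler or make processes instead.

```python
import signal
import subprocess
import sys


def simulate_oom_kill(n_workers=3):
    """Start n_workers sleeping children, SIGKILL one, report the survivors.

    Returns (return code of the killed child, number of children still running).
    """
    # Each worker just sleeps; it stands in for a build process that is
    # waiting on a sibling which will never answer.
    workers = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(n_workers)
    ]
    try:
        victim = workers[0]
        victim.send_signal(signal.SIGKILL)  # same signal the OOM killer sends
        victim.wait()
        # poll() returns None for a process that has not exited yet.
        survivors = [w for w in workers[1:] if w.poll() is None]
        return victim.returncode, len(survivors)
    finally:
        # Clean up, so this sketch does not itself leave stray processes behind.
        for w in workers:
            if w.poll() is None:
                w.kill()
            w.wait()


if __name__ == "__main__":
    rc, alive = simulate_oom_kill()
    print(f"killed child return code: {rc}, children still running: {alive}")
```

As long as those surviving processes exist, the scheduler sees an active job, which would match the job only ending at the time limit.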
|
I've done an interactive build on an A100 node on Snellius with my personal account, on top of EESSI (without a container), and that worked fine: no memory issues, and the maximum memory usage was about 4 GB. I'll do another one with the container.
|
Let me just try this again as well:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
|
Hmmm, something is wrong. We changed some things in the bot config in our config management system, but for some reason it concludes it shouldn't submit a job based on the above commands. Will dig into why... |
|
@bedroge I don't know if you are looking at adding the |
Requires: