sweeps and train on hip/rocm/amdgpu, for 5.0 #573
I apparently messed up a rebase when moving this from the 4.0 branch to the 5.0 branch, so some wrong changes slipped in [they did not affect my test sweep/train on breakout]. Sorry for that. About the sweep-related changes in this PR: there was a problem with sweep_obj [used for early stopping] being passed between processes when a protein sweep runs on an AMD GPU [torch has special handling for this on CUDA, but not on ROCm]. This PR changes how the info related to early stopping is passed around. It has decent performance on a consumer AMD GPU [RX 6850M XT], but I have not tested it on anything else.
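To illustrate the kind of change described above, here is a minimal sketch, not the code in this PR. It assumes the early-stopping info can be reduced to a few scalars and sent between processes as plain Python values, so nothing device-specific has to be pickled. `EarlyStopInfo`, `to_payload`, `should_stop`, `roundtrip` and the stopping rule are all hypothetical names made up for this example.

```python
import multiprocessing as mp
from dataclasses import dataclass, asdict


@dataclass
class EarlyStopInfo:
    """Hypothetical container: plain Python numbers only, no device tensors."""
    step: int
    score: float
    threshold: float


def to_payload(step, score, threshold):
    # int()/float() also accept single-element tensors and numpy scalars,
    # so the values are copied to host memory before crossing processes.
    return asdict(EarlyStopInfo(int(step), float(score), float(threshold)))


def should_stop(payload):
    # Hypothetical rule for this sketch: stop when score is below threshold.
    return payload["score"] < payload["threshold"]


def roundtrip(payload):
    # A Pipe pickles whatever is sent through it. A dict of ints and floats
    # pickles the same way regardless of the GPU backend in use.
    recv_end, send_end = mp.Pipe(duplex=False)
    send_end.send(payload)
    received = recv_end.recv()
    send_end.close()
    recv_end.close()
    return received
```

In a real sweep the two ends of the pipe would live in different processes. The round trip here stays in one process only to keep the sketch short.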
6b59ae4 to d6272f2
I've cleaned it up as much as I could. Tested with a long sweep on an AMD GPU.

Tested with an NVIDIA GPU too [thanks to ViolentGlizzy]. Performance on my AMD GPU is 2-3 times worse than on a supposedly equivalent NVIDIA GPU, but not worse than pufferlib 3.0 performance on an AMD GPU.

Alternative to #562 for the 5.0 branch.
Tested only with a short breakout sweep and eval on an AMD GPU. Made with a lot of help from codex gpt-5.5.