I'm not 100% clear on PyTorch syntax. Should the following two ways of computing the gradient df/dtheta be equivalent? If not, why not? :) I'm also not entirely sure what loss.backward(backward_ones) does. Is it df/d1?
loss.mean().backward(retain_variables=True)
print(reg_funcs.params.grad.data)
reg_funcs.params.data.zero_()
loss.backward(backward_ones)
print(reg_funcs.params.grad.data)
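A minimal, self-contained sketch of the same comparison might help pin down the difference. It assumes a recent PyTorch release, where retain_variables has been renamed retain_graph, and the names theta, x and the toy loss are hypothetical stand-ins for reg_funcs.params and the real model. As far as I understand it, passing a tensor of ones to backward gives the gradient of loss.sum(), so it should differ from the mean() version by a factor equal to the number of loss elements.

```python
import torch

# Hypothetical stand-ins for reg_funcs.params and the real data.
theta = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# A non-scalar loss with one entry per row of x.
loss = (x @ theta) ** 2

# Way 1: reduce to a scalar with mean(), then backpropagate.
loss.mean().backward(retain_graph=True)
grad_mean = theta.grad.clone()

# Gradients accumulate across backward calls, so reset .grad (not .data).
theta.grad.zero_()

# Way 2: backpropagate the non-scalar loss with a vector of ones.
# This computes ones^T @ J, i.e. the gradient of loss.sum().
loss.backward(torch.ones_like(loss))
grad_sum = theta.grad.clone()

print(grad_mean)
print(grad_sum)

# The two results differ by exactly the number of loss elements.
assert torch.allclose(grad_sum, grad_mean * loss.numel())
```

Note that the sketch zeroes theta.grad rather than theta.data: zeroing the parameter values does not reset the accumulated gradient, so the second print would otherwise show the sum of both backward passes.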