Hi, I am running some old code, but with 0.5.5 it fails with a new error: RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient. The tolerance for nondeterminism was 0.0.
The error is raised at test = gradcheck(solver, middle_velocity, eps=1e-8, atol=1e-3)
I have read this post, and I have real = ti.float64 and ti.init(arch=ti.cuda, default_fp=real, debug=False) at the beginning of the file.
By commenting out lines one at a time, I found that this kernel seems to cause the problem, not the I/O lines:
@ti.classkernel
def substep_kernel(self, t: ti.i32):
    for i, j in ti.ndrange((1, self.grid_size + 1), (1, self.grid_size + 1)):
        self.ti_pressure[t, i, j] = (self.ti_pressure[t - 1, i - 1, j] +
                                     self.ti_pressure[t - 1, i, j - 1] +
                                     self.ti_pressure[t - 1, i + 1, j] +
                                     self.ti_pressure[t - 1, i, j + 1] -
                                     self.ti_velocity_divergence[i - 1, j - 1] * self.dx**2
                                     ) / 4
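For reference, here is a minimal plain-Python sketch of what one call of this kernel is meant to compute, which can help when comparing values against the Taichi version. It assumes a square, zero-padded pressure grid of size (grid_size + 2) per axis; jacobi_step and its arguments are hypothetical names for illustration, not part of the actual solver.

```python
def jacobi_step(prev, divergence, dx):
    """One Jacobi sweep on a zero-padded square grid (illustrative reference only)."""
    n = len(divergence)  # grid_size: number of interior cells per axis
    # The new pressure has the same padded shape as prev: (n + 2) x (n + 2).
    new = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            neighbours = (prev[i - 1][j] + prev[i][j - 1] +
                          prev[i + 1][j] + prev[i][j + 1])
            # The divergence array has no padding, hence the -1 offsets.
            new[i][j] = (neighbours - divergence[i - 1][j - 1] * dx ** 2) / 4
    return new


grid_size = 3
pressure = [[0.0] * (grid_size + 2) for _ in range(grid_size + 2)]
divergence = [[4.0] * grid_size for _ in range(grid_size)]
pressure = jacobi_step(pressure, divergence, dx=1.0)
print(pressure[1][1])  # -1.0
```

The padding cells are never written, so they stay at zero across steps.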
One possibility is that your ti_pressure access goes out of bounds. What’s the size of your ti_pressure tensor? Each spatial dimension has to be at least self.grid_size + 2 in this case to avoid out-of-bounds access, because the loop reads index i + 1 with i going up to self.grid_size.
Maybe you meant ti.ndrange((1, self.grid_size - 1), )...?
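To make the bounds reasoning concrete, here is a small sketch that computes the smallest axis length a one-cell stencil needs for a given loop range. It assumes the upper bound in ti.ndrange is exclusive, like Python’s range; required_extent is a hypothetical helper, not a Taichi function.

```python
def required_extent(start, stop, reach=1):
    """Smallest axis length so that i - reach .. i + reach stays in bounds
    for every i in range(start, stop)."""
    lowest = start - reach
    highest = (stop - 1) + reach  # range-style upper bounds are exclusive
    if lowest < 0:
        raise ValueError(f"stencil reads index {lowest}, which is below 0")
    return highest + 1


grid_size = 5
# Loop from the kernel: i in [1, grid_size + 1), reading i - 1 and i + 1.
print(required_extent(1, grid_size + 1))  # 7, i.e. grid_size + 2
# Alternative bounds: i in [1, grid_size - 1).
print(required_extent(1, grid_size - 1))  # 5, i.e. grid_size
```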
Well, I have changed some parameters and this is confusing… It works when grid_size=3 or grid_size=4 but gives the error when grid_size is 5 or more lol
ti_pressure is a larger tensor, (grid_size+2, grid_size+2) in its spatial dimensions: since all the pressure values live in the grid cells instead of on faces, there is an outer layer of padding that is all zero.
I’m going to go carefully through the commit history later to see if I can figure this out… Could you have a look at this if you have time? https://codeshare.io/2Wq4ry
thx for your time!
Hi @143, I feel like it’s a numerical issue. On my end it actually works fine with grid_size = 5, iter_steps = 1000 but fails with grid_size = 10, iter_steps = 1000. I suspect iter_steps is too large for the gradient evaluation to be stable… I changed iter_steps to 20 and it works with grid_size = 20. Do you really need 1000 Jacobi iteration steps?
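One way a backward pass can fail an exact reentrancy check (tolerance 0.0) is summation order: floating-point addition is not associative, so if gradient contributions get accumulated in a different order on each run, as can happen with parallel accumulation on a GPU, the results may differ in the last bits. Whether that is what happens here is only a guess; the sketch below just illustrates the effect in plain Python, and sum_in_order is a made-up helper.

```python
import random


def sum_in_order(values, order):
    """Add values one by one, visiting them in the given index order."""
    total = 0.0
    for idx in order:
        total += values[idx]
    return total


# Simplest case: the same three numbers, grouped differently.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)      # False
print(abs(a - b))  # a difference in the last bits

# Many contributions of mixed magnitude, summed in two different orders.
rng = random.Random(0)
contributions = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8)
                 for _ in range(10000)]
natural_order = list(range(len(contributions)))
shuffled_order = natural_order[:]
rng.shuffle(shuffled_order)
first = sum_in_order(contributions, natural_order)
second = sum_in_order(contributions, shuffled_order)
print(first == second, abs(first - second))
```

If tiny run-to-run differences like this are acceptable for your test, it may be worth checking whether your PyTorch version’s gradcheck accepts a nondet_tol argument (the "tolerance for nondeterminism" in the error message) instead of requiring bitwise-identical gradients.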
Wow that would be great
I started with 200 and kept adding iteration steps. The last time I ran this code, I believe it was working well with a large number of iterations lol
Thanks!