Backward is not reentrant

Hi, I'm running some old code, but it's failing with a new error on 0.5.5:
RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient. The tolerance for nondeterminism was 0.0.
in

    test = gradcheck(solver, middle_velocity, eps=1e-8, atol=1e-3)

I have read this post, and I have

    real = ti.float64
    ti.init(arch=ti.cuda, default_fp=real, debug=False)

at the beginning of the file.
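For context on what this error actually checks: gradcheck runs the backward pass twice with the same input and grad_output and requires identical results within the stated tolerance (0.0 here). A minimal pure-Python sketch of that idea, illustrative only and not PyTorch's actual implementation:

```python
def toy_backward(x, grad_output):
    # Stand-in backward pass: analytical gradient of f(x) = x ** 2.
    return 2.0 * x * grad_output

def is_reentrant(backward_fn, x, grad_output, tol=0.0):
    # Mirror gradcheck's reentrancy test: run backward twice with
    # identical inputs and compare the two results within `tol`.
    g1 = backward_fn(x, grad_output)
    g2 = backward_fn(x, grad_output)
    return abs(g1 - g2) <= tol

print(is_reentrant(toy_backward, 3.0, 1.0))  # deterministic backward -> True
```

A backward pass can fail this check while still matching the numerical gradient, e.g. when parallel floating-point accumulation (atomic adds on GPU) makes results vary at the last bit between runs.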

By commenting lines out one by one, I found that this kernel may be causing the problem, not those I/O lines:

    @ti.classkernel
    def substep_kernel(self, t: ti.i32):
        for i, j in ti.ndrange((1, self.grid_size + 1), (1, self.grid_size + 1)):
            self.ti_pressure[t, i, j] = (self.ti_pressure[t - 1, i - 1, j] +
                                         self.ti_pressure[t - 1, i, j - 1] +
                                         self.ti_pressure[t - 1, i + 1, j] +
                                         self.ti_pressure[t - 1, i, j + 1] -
                                         self.ti_velocity_divergence[i - 1, j - 1] * self.dx ** 2
                                         ) / 4

Thank you!


One possibility is that your ti_pressure access goes out of bounds. What's the size of your ti_pressure tensor? The stencil reads index i + 1 with i going up to self.grid_size, so each spatial dimension has to be at least self.grid_size + 2 to avoid out-of-bounds access.

Or maybe you meant ti.ndrange((1, self.grid_size - 1), ...)?
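To make the bound explicit, here is a standalone Python check (illustrative, not Taichi code) of which indices the kernel's 4-neighbor stencil touches for a loop over ti.ndrange((1, grid_size + 1), (1, grid_size + 1)):

```python
# Collect every index the update reads or writes along each spatial axis.
grid_size = 5
touched = set()
for i in range(1, grid_size + 1):
    for j in range(1, grid_size + 1):
        # reads (i-1, j), (i, j-1), (i+1, j), (i, j+1); writes (i, j)
        touched.update([i - 1, i + 1, j - 1, j + 1, i, j])

print(min(touched), max(touched))  # 0 and grid_size + 1
```

Since indices run from 0 to grid_size + 1, each spatial dimension needs at least grid_size + 2 entries.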

Btw, @ti.classkernel is now deprecated. Please simply use @ti.kernel, both inside and outside classes.

Well, I've changed some parameters and this is confusing… It works when grid_size = 3 or grid_size = 4, but gives the error when it's 5 or more lol

ti_pressure is a larger tensor of shape (grid_size + 2, grid_size + 2). Since all the pressure values live in the grid cells instead of on the faces, the outer padding is all zero.

I'm going to check the commit history carefully later to see if I can figure this out. Could you also have a look at this if you have time?
https://codeshare.io/2Wq4ry
Thanks for your time!

Thanks for sharing the code. I’ll take a close look tomorrow.

Hi @143, I feel like it's a numerical issue. On my end it actually works fine with grid_size = 5, iter_steps = 1000, but fails with grid_size = 10, iter_steps = 1000. I suspect the number of iter_steps is too big for the gradient evaluation to be stable… I changed iter_steps to 20 and it works with grid_size = 20. Do you really need 1000 Jacobi iteration steps?
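On the iteration count: Jacobi converges geometrically on a fixed grid, so for small grids a few dozen sweeps may already reach a tight residual, while every extra differentiable step lengthens the gradient tape. A rough standalone sketch in plain Python (illustrative names, not the original solver class), mirroring the kernel's update with zero Dirichlet padding:

```python
def jacobi_solve(divergence, grid_size, dx, steps):
    # Pressure grid with a one-cell zero padding on every side.
    p = [[0.0] * (grid_size + 2) for _ in range(grid_size + 2)]
    for _ in range(steps):
        new_p = [row[:] for row in p]
        for i in range(1, grid_size + 1):
            for j in range(1, grid_size + 1):
                new_p[i][j] = (p[i - 1][j] + p[i][j - 1] +
                               p[i + 1][j] + p[i][j + 1] -
                               divergence[i - 1][j - 1] * dx * dx) / 4
        p = new_p
    return p

def residual(p, divergence, grid_size, dx):
    # Max |4*p - (neighbor sum) + div*dx^2| over interior cells;
    # zero for the exact solution of the discrete Poisson equation.
    r = 0.0
    for i in range(1, grid_size + 1):
        for j in range(1, grid_size + 1):
            r = max(r, abs(4 * p[i][j] - p[i - 1][j] - p[i][j - 1]
                           - p[i + 1][j] - p[i][j + 1]
                           + divergence[i - 1][j - 1] * dx * dx))
    return r

grid_size, dx = 5, 1.0
div = [[1.0 if (i + j) % 2 == 0 else -1.0 for j in range(grid_size)]
       for i in range(grid_size)]
p20 = jacobi_solve(div, grid_size, dx, 20)
p200 = jacobi_solve(div, grid_size, dx, 200)
print(residual(p20, div, grid_size, dx), residual(p200, div, grid_size, dx))
```

For a 5x5 interior the residual already drops by many orders of magnitude well before 1000 sweeps, so the extra iterations mostly add numerical noise to the backward pass without improving the solution.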

Btw, @ti.classfunc is deprecated. Since v0.5.7 you can use @ti.func to decorate any function, regardless of whether it's a class function or not.

Wow, that would be great!
I started with 200 and kept adding iteration steps :frowning: Last time I ran this piece, I believe it was working well even with large iteration counts lol
Thanks!
