When does Taichi compile kernels when it is called from PyTorch?

I am trying to use Taichi inside PyTorch (in the forward function of a custom autograd function). The code runs correctly, but Taichi seems to compile every time it is called from the torch forward function.
The code below is a simple linear layer implemented with Taichi inside PyTorch.

import torch
from torch.autograd import gradcheck
import torch.nn.functional as F
import math
import torch.nn as nn
import taichi as ti

batch_size = 32
real = ti.f32
global_data = ti.Matrix(batch_size, 6, dt=real, shape=(), needs_grad=True)
global_weight = ti.Matrix(6, 1, dt=real, shape=(), needs_grad=True)
global_bias = ti.Matrix(batch_size, 1, dt=real, shape=(), needs_grad=True)
global_output = ti.Matrix(batch_size, 1, dt=real, shape=(), needs_grad=True)


@ti.kernel
def torch_kernel():
    global_output[None] = global_data[None] @ global_weight[None] + global_bias[None]


class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        global_data.from_torch(input)
        global_weight.from_torch(weight)
        global_bias.from_torch(bias)
        torch_kernel()
        return global_output.to_torch()

    @staticmethod
    def backward(ctx, grad_output):
        ti.clear_all_gradients()
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None
        global_output.grad.from_torch(grad_output)
        torch_kernel.grad()

        if ctx.needs_input_grad[0]:
            grad_input = global_data.grad.to_torch()
        if ctx.needs_input_grad[1]:
            grad_weight = global_weight.grad.to_torch()
        if ctx.needs_input_grad[2]:
            grad_bias = global_bias.grad.to_torch()

        return grad_input, grad_weight, grad_bias


class Linear(nn.Module):
    def __init__(self, input_features, output_features):
        super(Linear, self).__init__()
        self.weight = nn.Parameter(torch.Tensor(input_features, output_features))
        self.bias = nn.Parameter(torch.Tensor(output_features))
        self.weight.data.normal_(0, math.sqrt(2. / output_features / input_features))

    def forward(self, input):
        bias = self.bias.unsqueeze(0).expand(batch_size, 1)
        return LinearFunction.apply(input, self.weight, bias)


data = torch.rand(batch_size, 6, dtype=torch.float32, requires_grad=True)
linear = Linear(6, 1)

test = gradcheck(linear, data, eps=1e-6, atol=1e-3)
print(test)

Part of the output is shown below (the whole output is too long):

/Users/zhe/anaconda3/bin/python /Users/zhe/PycharmProjects/mpm_learning/linear_taichi.py
[Release mode]
[T 12/15/19 01:34:12.312] [logging.cpp:Logger@68] Taichi core started. Thread ID = 19620
[Taichi version 0.2.5, cpu only, commit 1129ba3d]
/Users/zhe/anaconda3/lib/python3.7/site-packages/torch/autograd/gradcheck.py:170: UserWarning: At least one of the inputs that requires gradient is not of double precision floating point. This check will likely fail if all the inputs are not of double precision floating point. 
  'At least one of the inputs that requires gradient '
[I 12/15/19 01:34:12.354] [taichi_llvm_context.cpp:TaichiLLVMContext@59] Creating llvm context for arch: x86_64
Materializing layout...
[I 12/15/19 01:34:12.585] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 201.221 ms
[I 12/15/19 01:34:12.588] [struct_llvm.cpp:operator()@264] Allocating data structure of size 2096
Initializing runtime with 529 elements
Runtime initialized.
Compiling kernel numpy_to_tensor_0_...
[I 12/15/19 01:34:12.657] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.999 ms
Compiling kernel numpy_to_tensor_1_...
[I 12/15/19 01:34:12.713] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.1831 ms
Compiling kernel numpy_to_tensor_2_...
[I 12/15/19 01:34:12.758] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.7729 ms
Compiling kernel numpy_to_tensor_3_...
[I 12/15/19 01:34:12.807] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.7081 ms
Compiling kernel numpy_to_tensor_4_...
[I 12/15/19 01:34:12.856] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.4141 ms
Compiling kernel numpy_to_tensor_5_...
[I 12/15/19 01:34:12.898] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.5941 ms
Compiling kernel numpy_to_tensor_6_...
[I 12/15/19 01:34:12.939] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.3262 ms
Compiling kernel numpy_to_tensor_7_...
[I 12/15/19 01:34:12.979] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.5689 ms
Compiling kernel numpy_to_tensor_8_...
[I 12/15/19 01:34:13.021] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.9758 ms
Compiling kernel numpy_to_tensor_9_...
[I 12/15/19 01:34:13.074] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 21.96 ms
Compiling kernel numpy_to_tensor_10_...
[I 12/15/19 01:34:13.131] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 25.645 ms
Compiling kernel numpy_to_tensor_11_...
[I 12/15/19 01:34:13.206] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 38.4231 ms
Compiling kernel numpy_to_tensor_12_...
[I 12/15/19 01:34:13.311] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 38.877 ms
Compiling kernel numpy_to_tensor_13_...
[I 12/15/19 01:34:13.405] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 23.371 ms
Compiling kernel numpy_to_tensor_14_...
[I 12/15/19 01:34:13.456] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 19.944 ms
Compiling kernel numpy_to_tensor_15_...
[I 12/15/19 01:34:13.498] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.2138 ms
Compiling kernel numpy_to_tensor_16_...
[I 12/15/19 01:34:13.556] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 31.8511 ms
Compiling kernel numpy_to_tensor_17_...
[I 12/15/19 01:34:13.651] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 40.4429 ms
Compiling kernel numpy_to_tensor_18_...
[I 12/15/19 01:34:13.700] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 19.7759 ms
Compiling kernel numpy_to_tensor_19_...
[I 12/15/19 01:34:13.751] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.4091 ms
Compiling kernel numpy_to_tensor_20_...
[I 12/15/19 01:34:13.825] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 37.9131 ms
Compiling kernel numpy_to_tensor_21_...
[I 12/15/19 01:34:13.901] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 22.5151 ms
Compiling kernel numpy_to_tensor_22_...
[I 12/15/19 01:34:13.942] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.5431 ms
Compiling kernel numpy_to_tensor_23_...
[I 12/15/19 01:34:13.982] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.2842 ms
Compiling kernel numpy_to_tensor_24_...
[I 12/15/19 01:34:14.030] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 18.1351 ms
Compiling kernel numpy_to_tensor_25_...
[I 12/15/19 01:34:14.110] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 35.912 ms
Compiling kernel numpy_to_tensor_26_...
[I 12/15/19 01:34:14.187] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 24.992 ms
Compiling kernel numpy_to_tensor_27_...
[I 12/15/19 01:34:14.238] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 18.425 ms
Compiling kernel numpy_to_tensor_28_...
[I 12/15/19 01:34:14.332] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 47.4381 ms
Compiling kernel numpy_to_tensor_29_...
[I 12/15/19 01:34:14.396] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.0861 ms
Compiling kernel numpy_to_tensor_30_...
[I 12/15/19 01:34:14.439] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.324 ms
Compiling kernel numpy_to_tensor_31_...
[I 12/15/19 01:34:14.536] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 48.497 ms
Compiling kernel numpy_to_tensor_32_...
[I 12/15/19 01:34:14.604] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 19.2988 ms
Compiling kernel numpy_to_tensor_33_...
[I 12/15/19 01:34:14.649] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.4189 ms
Compiling kernel numpy_to_tensor_34_...
[I 12/15/19 01:34:14.692] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.978 ms
Compiling kernel numpy_to_tensor_35_...
[I 12/15/19 01:34:14.733] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.4122 ms
Compiling kernel numpy_to_tensor_36_...
[I 12/15/19 01:34:14.818] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 42.2211 ms
Compiling kernel numpy_to_tensor_37_...
[I 12/15/19 01:34:14.907] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 22.645 ms
Compiling kernel numpy_to_tensor_38_...
[I 12/15/19 01:34:14.959] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 25.8501 ms
Compiling kernel numpy_to_tensor_39_...
[I 12/15/19 01:34:15.028] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 25.6801 ms
Compiling kernel numpy_to_tensor_40_...
[I 12/15/19 01:34:15.075] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.789 ms
Compiling kernel numpy_to_tensor_41_...
[I 12/15/19 01:34:15.120] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.3061 ms
Compiling kernel numpy_to_tensor_42_...
[I 12/15/19 01:34:15.169] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 22.9859 ms
Compiling kernel numpy_to_tensor_43_...
[I 12/15/19 01:34:15.297] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 58.4121 ms
Compiling kernel numpy_to_tensor_44_...
[I 12/15/19 01:34:15.353] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 18.641 ms
Compiling kernel numpy_to_tensor_45_...
[I 12/15/19 01:34:15.413] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 31.076 ms
Compiling kernel numpy_to_tensor_46_...
[I 12/15/19 01:34:15.499] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 35.049 ms
Compiling kernel numpy_to_tensor_47_...
[I 12/15/19 01:34:15.542] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.4301 ms
Compiling kernel numpy_to_tensor_48_...
[I 12/15/19 01:34:15.588] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 18.028 ms
Compiling kernel numpy_to_tensor_49_...
[I 12/15/19 01:34:15.662] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 44.1701 ms
Compiling kernel numpy_to_tensor_50_...
[I 12/15/19 01:34:15.751] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 28.2102 ms
Compiling kernel numpy_to_tensor_51_...
[I 12/15/19 01:34:15.803] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.5591 ms
Compiling kernel numpy_to_tensor_52_...
[I 12/15/19 01:34:15.852] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.5929 ms
Compiling kernel numpy_to_tensor_53_...
[I 12/15/19 01:34:15.929] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 32.892 ms
Compiling kernel numpy_to_tensor_54_...
[I 12/15/19 01:34:16.028] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 32.217 ms
Compiling kernel numpy_to_tensor_55_...
[I 12/15/19 01:34:16.077] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.3299 ms
Compiling kernel numpy_to_tensor_56_...
[I 12/15/19 01:34:16.119] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.9301 ms
Compiling kernel numpy_to_tensor_57_...
[I 12/15/19 01:34:16.179] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 35.1121 ms
Compiling kernel numpy_to_tensor_58_...
[I 12/15/19 01:34:16.277] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 32.8052 ms
Compiling kernel numpy_to_tensor_59_...
[I 12/15/19 01:34:16.321] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.444 ms
Compiling kernel numpy_to_tensor_60_...
[I 12/15/19 01:34:16.361] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.0298 ms
Compiling kernel numpy_to_tensor_61_...
[I 12/15/19 01:34:16.432] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 43.339 ms
Compiling kernel numpy_to_tensor_62_...
[I 12/15/19 01:34:16.530] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 30.987 ms
Compiling kernel numpy_to_tensor_63_...
[I 12/15/19 01:34:16.584] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 21.2998 ms
Compiling kernel numpy_to_tensor_64_...
[I 12/15/19 01:34:16.627] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.6612 ms
Compiling kernel numpy_to_tensor_65_...
[I 12/15/19 01:34:16.692] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 35.2621 ms
Compiling kernel numpy_to_tensor_66_...
[I 12/15/19 01:34:16.796] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 25.2621 ms
Compiling kernel numpy_to_tensor_67_...
[I 12/15/19 01:34:16.844] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 22.1651 ms
Compiling kernel numpy_to_tensor_68_...
[I 12/15/19 01:34:16.891] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.976 ms
Compiling kernel numpy_to_tensor_69_...
[I 12/15/19 01:34:16.987] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 48.2569 ms
Compiling kernel numpy_to_tensor_70_...
[I 12/15/19 01:34:17.054] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 19.1801 ms
Compiling kernel numpy_to_tensor_71_...
[I 12/15/19 01:34:17.095] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.87 ms
Compiling kernel numpy_to_tensor_72_...
[I 12/15/19 01:34:17.138] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 18.62 ms
Compiling kernel numpy_to_tensor_73_...
[I 12/15/19 01:34:17.181] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.5889 ms
Compiling kernel numpy_to_tensor_74_...
[I 12/15/19 01:34:17.259] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 45.037 ms
Compiling kernel numpy_to_tensor_75_...
[I 12/15/19 01:34:17.368] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 41.225 ms
Compiling kernel numpy_to_tensor_76_...
[I 12/15/19 01:34:17.414] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.9079 ms
Compiling kernel numpy_to_tensor_77_...
[I 12/15/19 01:34:17.454] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.2499 ms
Compiling kernel numpy_to_tensor_78_...
[I 12/15/19 01:34:17.530] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 37.7719 ms
Compiling kernel numpy_to_tensor_79_...
[I 12/15/19 01:34:17.603] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.6308 ms
Compiling kernel numpy_to_tensor_80_...
[I 12/15/19 01:34:17.643] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.783 ms
Compiling kernel numpy_to_tensor_81_...
[I 12/15/19 01:34:17.684] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.226 ms
Compiling kernel numpy_to_tensor_82_...
[I 12/15/19 01:34:17.723] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 15.4581 ms
Compiling kernel numpy_to_tensor_83_...
[I 12/15/19 01:34:17.817] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 44.3521 ms
Compiling kernel numpy_to_tensor_84_...
[I 12/15/19 01:34:17.912] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 23.2489 ms
Compiling kernel numpy_to_tensor_85_...
[I 12/15/19 01:34:17.958] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 18.6651 ms
Compiling kernel numpy_to_tensor_86_...
[I 12/15/19 01:34:18.000] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.8049 ms
Compiling kernel numpy_to_tensor_87_...
[I 12/15/19 01:34:18.066] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 40.8189 ms
Compiling kernel numpy_to_tensor_88_...
[I 12/15/19 01:34:18.180] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 22.7799 ms
Compiling kernel numpy_to_tensor_89_...
[I 12/15/19 01:34:18.221] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.681 ms
Compiling kernel numpy_to_tensor_90_...
[I 12/15/19 01:34:18.261] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 15.728 ms
Compiling kernel numpy_to_tensor_91_...
[I 12/15/19 01:34:18.331] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 37.0431 ms
Compiling kernel numpy_to_tensor_92_...
[I 12/15/19 01:34:18.408] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.5319 ms
Compiling kernel numpy_to_tensor_93_...
[I 12/15/19 01:34:18.449] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.7451 ms
Compiling kernel numpy_to_tensor_94_...
[I 12/15/19 01:34:18.488] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 15.5342 ms
Compiling kernel numpy_to_tensor_95_...
[I 12/15/19 01:34:18.532] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 19.886 ms
Compiling kernel numpy_to_tensor_96_...
[I 12/15/19 01:34:18.628] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 45.1322 ms
Compiling kernel numpy_to_tensor_97_...
[I 12/15/19 01:34:18.705] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 21.013 ms
Compiling kernel numpy_to_tensor_98_...
[I 12/15/19 01:34:18.752] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 22.994 ms
Compiling kernel numpy_to_tensor_99_...
[I 12/15/19 01:34:18.839] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 30.5619 ms
Compiling kernel numpy_to_tensor_100_...
[I 12/15/19 01:34:18.881] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.5288 ms
Compiling kernel numpy_to_tensor_101_...
[I 12/15/19 01:34:18.921] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.3441 ms
Compiling kernel numpy_to_tensor_102_...
[I 12/15/19 01:34:18.961] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.6841 ms
Compiling kernel numpy_to_tensor_103_...
[I 12/15/19 01:34:19.067] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 49.566 ms
Compiling kernel numpy_to_tensor_104_...
[I 12/15/19 01:34:19.126] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 19.3219 ms
Compiling kernel numpy_to_tensor_105_...
[I 12/15/19 01:34:19.166] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.4912 ms
Compiling kernel numpy_to_tensor_106_...
[I 12/15/19 01:34:19.243] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 38.3141 ms
Compiling kernel numpy_to_tensor_107_...
[I 12/15/19 01:34:19.304] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 19.9041 ms
Compiling kernel numpy_to_tensor_108_...
[I 12/15/19 01:34:19.345] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.096 ms
Compiling kernel numpy_to_tensor_109_...
[I 12/15/19 01:34:19.400] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 29.7561 ms
Compiling kernel numpy_to_tensor_110_...
[I 12/15/19 01:34:19.500] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 29.3882 ms
Compiling kernel numpy_to_tensor_111_...
[I 12/15/19 01:34:19.542] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.494 ms
Compiling kernel numpy_to_tensor_112_...
[I 12/15/19 01:34:19.587] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 20.1821 ms
Compiling kernel numpy_to_tensor_113_...
[I 12/15/19 01:34:19.666] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 41.9819 ms
Compiling kernel numpy_to_tensor_114_...
[I 12/15/19 01:34:19.756] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 22.8288 ms
Compiling kernel numpy_to_tensor_115_...
[I 12/15/19 01:34:19.798] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.6779 ms
Compiling kernel numpy_to_tensor_116_...
[I 12/15/19 01:34:19.839] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.4251 ms
Compiling kernel numpy_to_tensor_117_...
[I 12/15/19 01:34:19.879] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 17.9222 ms
Compiling kernel numpy_to_tensor_118_...
[I 12/15/19 01:34:19.971] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 39.1479 ms
Compiling kernel numpy_to_tensor_119_...
[I 12/15/19 01:34:20.023] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.8641 ms
Compiling kernel numpy_to_tensor_120_...
[I 12/15/19 01:34:20.063] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.197 ms
Compiling kernel numpy_to_tensor_121_...
[I 12/15/19 01:34:20.114] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 27.7119 ms
Compiling kernel numpy_to_tensor_122_...
[I 12/15/19 01:34:20.207] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 37.6899 ms
Compiling kernel numpy_to_tensor_123_...
[I 12/15/19 01:34:20.250] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.4678 ms
Compiling kernel numpy_to_tensor_124_...
[I 12/15/19 01:34:20.290] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 15.9659 ms
Compiling kernel numpy_to_tensor_125_...
[I 12/15/19 01:34:20.340] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 27.4131 ms
Compiling kernel numpy_to_tensor_126_...
[I 12/15/19 01:34:20.431] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 42.695 ms
Compiling kernel numpy_to_tensor_127_...
[I 12/15/19 01:34:20.474] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.562 ms
Compiling kernel numpy_to_tensor_128_...
[I 12/15/19 01:34:20.515] [codegen_llvm_x86.cpp:global_optimize_module_x86_64@93] Global optimization time: 16.273 ms

In addition, I would like to ask whether transferring data between PyTorch and Taichi requires a copy.

Let me take some time to explain what happened. Here’s a suggested implementation:

import torch
from torch.autograd import gradcheck
import torch.nn.functional as F
import math
import torch.nn as nn
import taichi as ti

batch_size = 32
real = ti.f32
global_data = ti.Vector(6, dt=real, shape=batch_size, needs_grad=True)
global_weight = ti.Vector(6, dt=real, shape=(), needs_grad=True)

global_bias = ti.Vector(1, dt=real, shape=batch_size, needs_grad=True)
global_output = ti.Vector(1, dt=real, shape=batch_size, needs_grad=True)


@ti.kernel
def torch_kernel():
  for i in range(batch_size):
    global_output[i] = ti.Matrix([ti.dot(global_data[i], global_weight[None])]) + global_bias[i]


class LinearFunction(torch.autograd.Function):
  @staticmethod
  def forward(ctx, input, weight, bias=None):
    ctx.save_for_backward(input, weight, bias)
    global_data.from_torch(input)
    global_weight.from_torch(weight)
    global_bias.from_torch(bias)
    torch_kernel()
    return global_output.to_torch()
  
  @staticmethod
  def backward(ctx, grad_output):
    ti.clear_all_gradients()
    input, weight, bias = ctx.saved_tensors
    grad_input = grad_weight = grad_bias = None
    global_output.grad.from_torch(grad_output)
    torch_kernel.grad()
    
    if ctx.needs_input_grad[0]:
      grad_input = global_data.grad.to_torch(as_vector=True)
    if ctx.needs_input_grad[1]:
      grad_weight = global_weight.grad.to_torch()
    if ctx.needs_input_grad[2]:
      grad_bias = global_bias.grad.to_torch(as_vector=True)
    
    return grad_input, grad_weight, grad_bias


class Linear(nn.Module):
  def __init__(self, input_features, output_features):
    super(Linear, self).__init__()
    self.weight = nn.Parameter(torch.Tensor(input_features, output_features))
    self.bias = nn.Parameter(torch.Tensor(output_features))
    self.weight.data.normal_(0, math.sqrt(2. / output_features / input_features))
  
  def forward(self, input):
    bias = self.bias.unsqueeze(0).expand(batch_size, 1)
    return LinearFunction.apply(input, self.weight, bias)


data = torch.rand(batch_size, 6, dtype=torch.float32, requires_grad=True)
linear = Linear(6, 1)

test = gradcheck(linear, data, eps=1e-3, atol=1e-4)
print(test)
  • ti.Matrix(batch_size, 6) is too big, so I moved batch_size into the tensor dimensions (shape). Regarding matrix sizes: https://taichi.readthedocs.io/en/latest/tensor_matrix.html#matrix-size
  • torch.autograd.gradcheck uses finite differences. With torch.float32 precision, eps=1e-6 is too small for finite differences to work, so I used eps=1e-3 instead.
  • Compilation only happens once. If you rerun the gradient check within the same script, no extra compilation takes place.
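
To illustrate the finite-difference point above, here is a minimal sketch in plain Python that emulates float32 rounding with the standard struct module. The helper names (f32, central_diff) and the test function f(x) = x * x are assumptions made for illustration and are not part of PyTorch or Taichi. Near 0.7, neighbouring float32 values are about 6e-8 apart, so a step of 1e-6 spans only about 17 of them and the rounding error is a noticeable fraction of the step.

```python
import struct


def f32(v):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def central_diff(x, eps):
    """Central difference of f(x) = x * x, rounding every step to float32."""
    x, eps = f32(x), f32(eps)
    xp, xm = f32(x + eps), f32(x - eps)   # perturbed inputs, stored as float32
    fp, fm = f32(xp * xp), f32(xm * xm)   # function values, stored as float32
    return f32(f32(fp - fm) / f32(2.0 * eps))


x = 0.7
exact = 2.0 * f32(x)                      # analytic derivative of x * x
err_fine = abs(central_diff(x, 1e-6) - exact)
err_coarse = abs(central_diff(x, 1e-3) - exact)
print("eps=1e-6 error:", err_fine)
print("eps=1e-3 error:", err_coarse)
```

In this sketch the error for eps=1e-6 is larger than 1e-3, while the error for eps=1e-3 stays well below it, which is why a tolerance check is much more likely to pass with the larger step in single precision.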

So the repeated "Compiling kernel numpy_to_tensor_i_..." messages are due to the large Matrix size?

Yeah, for the 6x32 matrix, Taichi will generate 6x32=192 kernels…

I can optimize this (fuse 192 into 1) in a future release, but in general it is a good idea to use smaller matrices…

This question is also discussed in the thread "Which dimensions of Matrix are multiplied?"

The short answer is that every kernel is compiled the first time it is invoked. I’ll write more detailed documentation on this tomorrow.
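
The compile-on-first-call behaviour can be pictured with a small pure-Python analogy. This is only a sketch and not Taichi's actual implementation: lazy_kernel and compile_log are hypothetical names, and "compiling" is simulated by appending to a list. It shows why calling one kernel repeatedly costs a single compilation, while one kernel per matrix entry (32 x 6 entries here, as in the original layout) costs one compilation per entry.

```python
compile_log = []  # records every simulated compilation


def lazy_kernel(func):
    """Hypothetical decorator: 'compile' func on its first call, then reuse it."""
    compiled = None

    def wrapper(*args):
        nonlocal compiled
        if compiled is None:
            # Stands in for the expensive compile step.
            compile_log.append(func.__name__)
            compiled = func
        return compiled(*args)

    return wrapper


@lazy_kernel
def add_one(x):
    return x + 1


# Calling the same kernel many times compiles it only once.
results = [add_one(i) for i in range(5)]
print(results)       # [1, 2, 3, 4, 5]
print(compile_log)   # ['add_one']

# One distinct kernel per matrix entry: each is compiled on its first call.
rows, cols = 32, 6
entry_kernels = [
    lazy_kernel(lambda v, offset=r * cols + c: v + offset)
    for r in range(rows)
    for c in range(cols)
]
for kernel in entry_kernels:
    kernel(0.0)

entry_compiles = len(compile_log) - 1  # subtract the add_one compilation
print(entry_compiles)  # 192
```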

A brief intro to Taichi compilation: https://taichi.readthedocs.io/en/latest/compilation.html