The same code throws an error when run on the GPU?

This is very strange.
Let's go straight to the code:

import time

import taichi as ti

ti.init(arch=ti.gpu, default_fp=ti.f64)


@ti.kernel
def calc_pi() -> ti.f64:
    sum = 0.0
    for i in range(66666666):
        n = 2 * i + 1
        sum += pow(-1.0, i) / n
    return sum * 4


t0 = time.time()
print('PI =', calc_pi())
t1 = time.time()
print(f'{t1 - t0:.3} sec')

Running it fails immediately:
[Taichi] version 1.1.3, llvm 10.0.0, commit 1262a70a, win, python 3.10.7
[Taichi] Starting on arch=cuda
C:\Users\001\AppData\Local\Programs\Python\Python310\lib\site-packages\taichi\lang\ast\ast_transformer.py:38: Warning: Casting range_for boundary values from i64 to i32, which may cause numerical issues
warnings.warn(
[E 09/24/22 10:58:15.924 11848] [D:/a/taichi/taichi/taichi/runtime/llvm/llvm_context.cpp:operator()@79] LLVM Fatal Error: Cannot select: t11: f64,ch = AtomicLoadFAdd<(load store seq_cst 8 on %ir.28)> t8:1, t8, t10
t8: i64,ch = load<(load 8 from %ir.27, addrspace 1)> t0, t5, undef:i64
t5: i64 = add nuw t3, Constant:i64<32856>
t3: i64 = addrspacecast[0 -> 1] t2
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t1: i64 = Register %0
t4: i64 = Constant<32856>
t7: i64 = undef
t10: f64,ch = CopyFromReg t0, Register:f64 %16
t9: f64 = Register %16
In function: calc_pi_c76_0_kernel_1_range_for

Assertion failed: !UpdateListeners && "Dangling registered DAGUpdateListeners", file C:\repos\llvm-10.0.0\lib\CodeGen\SelectionDAG\SelectionDAG.cpp, line 1039

However, simply changing gpu to cpu makes it run normally:

import time

import taichi as ti

ti.init(arch=ti.cpu, default_fp=ti.f64)


@ti.kernel
def calc_pi() -> ti.f64:
    sum = 0.0
    for i in range(66666666):
        n = 2 * i + 1
        sum += pow(-1.0, i) / n
    return sum * 4


t0 = time.time()
print('PI =', calc_pi())
t1 = time.time()
print(f'{t1 - t0:.3} sec')

[Taichi] version 1.1.3, llvm 10.0.0, commit 1262a70a, win, python 3.10.7
[Taichi] Starting on arch=x64
PI = 3.1415926385898443
1.9 sec


Mystery solved.
CUDA on my GTX 960 doesn't support atomic operations on f64.
I tried a different graphics card and the code runs fine.
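For anyone who hits the same `Cannot select ... AtomicLoadFAdd` error: the `sum +=` inside the parallelized loop is compiled to an atomic add, and as far as I know CUDA only provides a native f64 `atomicAdd` on devices with compute capability 6.0 or higher. The GTX 960 is a Maxwell card with compute capability 5.2. Below is a minimal sketch of that check in plain Python. The function name and the threshold constant are hypothetical helpers written for illustration and are not part of the Taichi API.

```python
# Assumption: native f64 atomicAdd requires CUDA compute capability >= 6.0
# (Pascal or newer). This helper is illustrative, not a Taichi function.
MIN_F64_ATOMIC_ADD_CC = (6, 0)


def supports_native_f64_atomic_add(major: int, minor: int) -> bool:
    """Return True if a device with this compute capability is expected
    to have a hardware f64 atomic add."""
    return (major, minor) >= MIN_F64_ATOMIC_ADD_CC


# GTX 960 (Maxwell) has compute capability 5.2
print(supports_native_f64_atomic_add(5, 2))
# A Pascal card such as the GTX 1060 has compute capability 6.1
print(supports_native_f64_atomic_add(6, 1))
```

On an older card, running with `arch=ti.cpu` as shown above works. Switching to `default_fp=ti.f32` should also avoid the f64 atomic, at the cost of precision.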


Nice work tracking down the problem yourself!

:grinning: