# 写了一个基于taichi的双调排序算法~

3 Likes

Cool! 关于编译速度：`ti.static` 会 unroll for loop，并不适合循环数特别多的情况。去掉以后就能秒编译了：

``````import taichi as ti
import numpy as np
n2 = 8
n = 2**n2
print('n=', n)
ti.init(ti.opengl)

num = ti.field(ti.i32, n)
value = ti.field(ti.f32, n)
result = ti.field(ti.f32, n)
step = ti.field(ti.i32, 2)  #包含内步长与外步长

step[0] = 2  #外步长
step[1] = 2  #内步长

@ti.kernel
def init():
for i in range(n):
num[i] = i

@ti.func
def xor(a, b):
return (a + b) & 1

@ti.kernel
def getResult():
for i in range(n):
result[i] = value[num[i]]

@ti.kernel
def step1():
for i in range(n // 2):
halfstep = step[1] // 2
i1 = i + (i // (halfstep)) * halfstep
i2 = i1 + halfstep
num1, num2 = num[i1], num[i2]
a, b = value[num1], value[num2]
updown = (i * 2 // step[0]) & 1  #up,down方向
if xor(a > b, updown):
num[i1], num[i2] = num2, num1

def printresult():
getResult()
print(num)
print(value)
print(result)

init()
value.from_numpy(np.random.rand(n))
print('initial finish')
for i in range(1, n2 + 1):
step[0] = 2**i
for j in range(i):
step[1] = 2**(i - j)
step1()
print(step[0], step[1])
for i in range(n):
print(value[num[i]])
printresult()
``````
3 Likes

changeable版本测试结果如下，显卡型号gt740，cpu型号i7 7820x，可以看到taichi版本双调排序略快于numpy的quicksort，期待如果有人有不同的配置的话，也发一下测试结果，看一下如果更好的显卡是否能有更高的加速效果。
[Taichi] mode=release
[Taichi] version 0.7.14, llvm 10.0.0, commit 58feee37, win, python 3.7.6
n= 33554432
[Taichi] Starting on arch=opengl
[Taichi] materializing…
initial finish
steps: 2 2
min: 4.423782229423523e-09
ti bisort time: 2.7536299228668213 s
ti bisort result: [4.4237822e-09 1.4668331e-07 2.2770837e-07 … 9.9999994e-01 1.0000000e+00
1.0000000e+00]
np result [4.4237822e-09 1.4668331e-07 2.2770837e-07 … 9.9999994e-01 1.0000000e+00
1.0000000e+00]
np quicksort time: 3.3600070476531982 s
enter to continue