关于稀疏空间结构的访问问题

mustang · 2022 年2 月 26 日 15:43

在github上提了个issue, 不过大概率是凉了. 也没人发表看法. 这里看看有没有大佬有什么看法. 不过这里的排版不怎么样, 贴个git的地址吧. https://github.com/taichi-dev/taichi/issues/4381

although the data of Sparse Spatial Data Struct has been sparse, but it still is dense for different purpose , if we want to find some the couples of data that satisfy some condition. Or we want to process a part of datas of field for some specific purpose . below is current code.

tp = ti.types.struct(...)
f = tp.field(shape=(M,N))

@ti.kernel
def kernel():
    for i,j in f:
        for ri, rj in f:
            if compare(f[i,j], f[ri,rj]):
                #do some thing 
                ....

it is simple but it is a O(n^2) algorithm, and it must create n thread, n is size of field. and it has the half of compute being redundant at least. if only a couple of data satisfy condition, then 99% of compute is redundant. And if we find a triple data that satisfy some condition, this code will is below:

...
    for i,j in f:
        for i1, j1 in f:
            for i2, j2 in f:
                if compare(f[i,j], f[i1,j1], f[i2, j2]):
                    #do some thing 
                    ....
...

it is O(n^3) and the 5/6 compute is redundant at least. if we find m data, it only has 1/m! of compute is valid at most .

Can we do better ? yes, it is a universal problem that comparison each other among n datas. so we can abstract it to a funtion, we name it as ti.involvere. then we give a condition needing satisfied to finish it. then we will get a mask that is special optimized for access parted data. below is new code:

tp = ti.types.struct(...)
f = tp.field(shape=(M,N))
mask = f.mask()

@ti.func 
def discriminant(a, b): 
    if ... 
        #do some thing
        return a # or b or a,b or None

@ti.kernel
def kernel():
    for i,j in f:
        ti.involvere(discriminant, mask)
    for i,j in mask:
        ...

the number of parameter of discriminant(a,b) rely on the number of referenced data. it is inconsequential that how to process a and b. our focus is what will be returned. obviously, it have four kinds return-value while 2 parameter.

now , we need to compile python to c++ to implement ti.involvere. unfortunately, i am not good at c++, so i write pseudo here.

#the number of parameter rely on python code.
def discriminant(any_type a, any_type  b) -> int:   
    if ...
        ...  
        return 0 
        #the return value is only four kinds while 2 parameter.  
        #we use b0000,b0001,b0010,b0011 to stand for None, a,b,ab


class Involvere:
    def __init__(discriminant, mask):
        self.mask = mask
        self.discriminant= discriminant


    #get index of next WAIT_FOR_COMPARISON from mask, if arrive the end+1 of mask, return None.
    def next():
        self.next_pointer = self.mask.find(self.next_pointer) 
        return self.next_pointer     


    #here define how to select data for discriminant. 
    def volvere(int index_0):
        self.next_pointer = index_0      # a anchor that indicating which data has been processed

        index_1 = self.next()
       #index_2 = self.next()  if is not 2 data to participate in comparison,
       #index_3 ...

        while self.next_pointer:   
            index_of_obsolete = self.discriminant(self.mask.get_field_data(index_0),
                                                  self.mask.get_field_data(index_1)) 
                                                 #self.mask.get_field_data(index_2)
                                                 #self.mask. ...)

            switch index_of_obsolete:
                case 0:             # no data is obloleted.
                    #if discriminant has 3 parameter, here is index_2,  4 parameter to index_3 ...
                    index_1 = self.next()  

                case index_of_obsolete & b0010:
                    self.mask[index_1] = OBSOLETE   # define OBSOLETE 0
                    index_1 = self.next()

                #case index_of_obsolete & b0100:   # discriminant has 3 parameter
                #   self.mask[index_2] = OBSOLETE  
                #   index_2 = self.next()

                # ...

                case index_of_obsolete & b0001:  # if the first data is obsoleted, return
                     self.mask[index_0] = OBSOLETE  
                     return 

def kernel():
...
    involvere = Involvere(discriminant, mask) 

    #i don't know taichi how to implement parallel execute, remaining the same is ok
    threads = create_thread(field.i * field.j)  
    for t in threads:

        #i don't know taichi how parallelly use the index of field. 
        #i assume t.i and t.i stand for i and j of f[i,j]
        involvere.volvere(t.i * field.shape[1] + t.j)  # see funtion code.
...

In addition, the Mask class has below ability:
1.it can be generated by field.mask() to extract active/deactive status.
2.several masks of the same field can be merged to a mask by AND, OR, NOT, XOR…
3.can access a field by mask of this field.

YuPeng · 2022 年3 月 1 日 02:48

Hi @mustang, 非常抱歉，对你的需求还是不太明白。

我看下来的意思是你需要的功能Taichi是不是已经有了? 比如 bitmasked, pointer field。请看：here

mustang · 2022 年3 月 1 日 04:44

比如我将field内容分个组A,B,C吧, 他们是相同的数据, 但是在一些情境下, 我要改变其中一组数据. 要被改动的数据可能需要实时判别, 也可能是提前知道的. 总之, 我并不想整个field一起改动. 且不说每次对其中一组进行改动要重新使能失能bitmasked, pointer . 就算是一组dense, 我也未必就想要整个dense的数据都做相同的改动啊…
当然每次for中我们可以重新用if判断我们需要哪些数据做出改变, 这不是回到问题了吗, 首先判别哪些数据需要被改变这个问题当前算法的效率并不高(当然仅是从代码分析上), 其次每次都使用一个field这么大的GPU并行块, 实际却只有一小部分做运算, 其他做了一次if就全停摆了, 这很浪费算力吧.

mustang · 2022 年3 月 1 日 04:53

当然一种当前比较简单高效的做法是另建一个field作为掩码, 将不同组的数据索引标记到新建的掩码上.然后访问时用新建掩码的索引访问原本的field.

这不是也回来了吗, 既然要新建一个field充当掩码, 何不建一个功能更丰富的Mask class. 他更容易组合不同的掩码进行使用. 以后也更好扩展和优化.

当然还有一个核心的问题没有提到, 就是判别哪些数据是符合我们某种用途的, 这才是本文的重点. 本文提供了一种算法处理这个问题, 效率还需要实际测试, 总之从代码分析上是更快了.

mustang · 2022 年3 月 1 日 05:47

这和taichi设计并不冲突, bitmask, pointer主要是用来回收内存的. 虽然附带了一点过滤访问的能力, 但过滤的是不再需要的数据, 并不是用来做部分访问的. field则是将可能相继访问的数据存储到一起提高时间空间利用率, 但可能相继访问的数据并不一定就需要全部相继访问到. 而mask则是主要用于部分访问的.