Notes on how the TileFlow paper performs tiling across multiple memory levels.

1. Input format

The input consists of the following files, each describing a different aspect of the run.

tileflow arch/arch.yaml prob/prob.yaml map/map.yaml macro.yaml

1.1 arch.yaml

Describes the architecture hierarchy of the whole chip.

architecture: 
  version: 0.2 

  subtree:
  - name: System
    
    local: 
    - name: MainMemory
      class: DRAM 
      attributes:
        block-size: 16384
        depth: 1
        word-bits: 16
        read_bandwidth: 4.3
        write_bandwidth: 2.9
      
    subtree: 
    - name: Buffer 
    
      local:  
      - name: Cache 
        class: SRAM
        attributes:
          word-bits: 16
          block_size: 16384
          depth: 3
          read_bandwidth: 52
          write_bandwidth: 20 # 16 


      subtree:
      - name: PE

        local: 
        - name: RegFile[0..255] 
          class: regfile
          attributes:
            meshX: 16
            meshY: 16
            depth: 1
            block_size: 3
            word-bits: 16
            read_bandwidth: 3.2
            write_bandwidth: 3.2

        - name: mac[0..255] 
          class: intmac 
          attributes: 
            word-bits: 16
            meshX: 16
            meshY: 16
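
To get a feel for how large each level is, the attributes above can be turned into capacities. This is only a sketch: it assumes Timeloop-style semantics where capacity in words is depth times block size, which these notes do not confirm for TileFlow. The key names are normalized to underscores here (the YAML mixes block-size and block_size), and the helper names are made up for illustration.

```python
# Attribute values copied from arch.yaml above (keys normalized to underscores).
LEVELS = {
    "MainMemory": {"block_size": 16384, "depth": 1, "word_bits": 16},
    "Cache": {"block_size": 16384, "depth": 3, "word_bits": 16},
    "RegFile": {"block_size": 3, "depth": 1, "word_bits": 16},
}


def capacity_words(attrs):
    # ASSUMPTION: capacity = depth * block_size words (Timeloop-style reading).
    return attrs["depth"] * attrs["block_size"]


def capacity_bytes(attrs):
    # Convert words to bytes using the word width in bits.
    return capacity_words(attrs) * attrs["word_bits"] // 8


for name, attrs in LEVELS.items():
    print(name, capacity_words(attrs), "words,", capacity_bytes(attrs), "bytes")
```

Under that assumption the Cache holds 49152 words (98304 bytes) and each RegFile instance holds 3 words, one each for A, B and C.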

1.2 prob.yaml

This is similar to Halide: first list all the shared iteration variables, then build each operator from those dimensions. See here for details.

problem:
  io:
    ins: A, B
    outs: C
  dimensions: [M,N,K]
  instance:
    M: M
    N: N
    K: K

  ops:
  - name: GEMM
    dimensions: [M,N,K] 
    data-spaces:
    - name: C 
      projection:
        - [[M]] 
        - [[N]] 
      read-write: True 
    - name: A 
      projection:
        - [[M]]
        - [[K]]
    - name: B
      projection:
        - [[K]]
        - [[N]]
    ins: A, B
    out: C
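
The projections above say which iteration dimensions index each tensor: C by (M, N), A by (M, K), B by (K, N). A minimal Python sketch of what that implies follows: a reference GEMM loop nest, plus the number of elements each tensor contributes to a tile. The function names and the example tile sizes are hypothetical, and the footprint formula only covers simple one-dimension-per-axis projections like the ones in this file.

```python
from math import prod

# Projections copied from prob.yaml: the iteration dimensions indexing each tensor.
PROJECTIONS = {
    "C": ["M", "N"],
    "A": ["M", "K"],
    "B": ["K", "N"],
}


def gemm_reference(A, B, M, N, K):
    """Plain loop nest over the shared dimensions M, N, K."""
    C = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):
                C[m][n] += A[m][k] * B[k][n]  # C is read and written
    return C


def tile_footprint(tile_sizes):
    """Elements of each tensor touched by a tile with the given M/N/K extents."""
    return {
        name: prod(tile_sizes[d] for d in dims)
        for name, dims in PROJECTIONS.items()
    }


# Hypothetical tile: 16 x 2 x 16 in (M, N, K).
print(tile_footprint({"M": 16, "N": 2, "K": 16}))
```

For that example tile, A contributes 256 elements while B and C contribute 32 each.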

1.3 map.yaml

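map is a fairly important configuration, so it is explained in more detail here. type is either temporal (sequential execution) or spatial (parallel execution). At the MainMemory level, factors: M = MO N = NO K= KO splits the M/N/K dimensions into MO/NO/KO tiles respectively. permutation lists the loops from innermost to outermost, so permutation: NMK at the Cache level puts N innermost and K outermost. At that level, factors: M=MM K=KM N=NI splits the three dimensions a second time. multicast: true enables multicast. split: 1 controls how the spatial loops are mapped onto the hardware X/Y mesh; judging by the result in 1.4, with permutation: MK it places M on X and K on Y.

See here for the original documentation.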

mapping:
  node-type: Tile 
  type: temporal 
  factors: M = MO N = NO K= KO
  permutation: KMN 
  target: MainMemory 
  
    
  subtree: 
  - node-type: Tile 
    type: temporal 
    factors: M=MM K=KM N=NI
    permutation: NMK
    target: Cache

    subtree:
    - node-type: Tile 
      type: spatial  
      factors: M=16 K=16
      permutation: MK
      split: 1
      target: Cache
      multicast: true

      subtree: 
      - node-type: Tile 
        type: temporal  
        factors: M=1 N=1 K=1
        permutation: MNK
        target: RegFile
        
        subtree:
        - node-type: Op
          name: GEMM
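
Since permutation is written innermost-first, the mapping tree above can be unrolled into a loop nest by walking the levels from top to bottom and reversing each permutation. The sketch below does only that, on a hand-copied version of map.yaml; it is not TileFlow's parser, and the MAPPING structure and loop_nest helper are made up for illustration.

```python
# Levels copied by hand from map.yaml; symbolic factors stay as strings.
MAPPING = [
    {"target": "MainMemory", "type": "temporal", "permutation": "KMN",
     "factors": {"M": "MO", "N": "NO", "K": "KO"}},
    {"target": "Cache", "type": "temporal", "permutation": "NMK",
     "factors": {"M": "MM", "K": "KM", "N": "NI"}},
    {"target": "Cache", "type": "spatial", "permutation": "MK",
     "factors": {"M": 16, "K": 16}},
    {"target": "RegFile", "type": "temporal", "permutation": "MNK",
     "factors": {"M": 1, "N": 1, "K": 1}},
]


def loop_nest(mapping, op_name="GEMM"):
    """Return the loop nest as indented text lines, outermost loop first."""
    lines = []
    depth = 0
    for level in mapping:
        tag = " (spatial)" if level["type"] == "spatial" else ""
        # permutation is innermost-first, so reverse it to print outermost-first.
        for dim in reversed(level["permutation"]):
            bound = level["factors"][dim]
            lines.append(
                "  " * depth + f"for {dim} in [0:{bound}){tag}, {level['target']}"
            )
            depth += 1
    lines.append("  " * depth + f"Op: {op_name}")
    return lines


print("\n".join(loop_nest(MAPPING)))
```

The loop order this prints (N, M, K at MainMemory; K, M, N at Cache; K, M spatial; K, N, M at RegFile) can be compared with the nest TileFlow reports in section 1.4.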

1.4 Execution result

Here it seems that all the

***Optimal Mapping:
-----------------Nest Analysis----------------
read: C A B update: C 
for N in [0:NO(32)), MainMemory
  for M in [0:MO(1)), MainMemory
    for K in [0:KO(1)), MainMemory
         read: C A B update: C fill: C A B write-back: C 
         for K in [0:KM(1)), Cache
           for M in [0:MM(1)), Cache
             for N in [0:NI(2)), Cache
                  read: C A B update: C fill: C A B write-back: C 
                  for K in [0:16) (Spatial-Y), Cache
                    for M in [0:16) (Spatial-X), Cache
                        read: C A B update: C fill: C A B write-back: C 
                        for K in [0:1), RegFile
                          for N in [0:1), RegFile
                            for M in [0:1), RegFile
                                 read: C A B update: C fill: C A B write-back: C 
                                 Op: GEMM(A,B,)->C

Cycle: 536, Energy: 1.44756e+07
--------------END Nest Analysis---------------
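
As a sanity check on the nest above, multiplying the bounds of each dimension across all levels recovers the full problem size that the tool resolved. The sketch below reads the bounds off the printed nest by hand. The derived step count is just the number of sequential loop iterations and is not expected to equal the reported Cycle value, which comes from TileFlow's own model.

```python
from math import prod

# (dimension, bound, is_spatial) read off the printed nest, outermost first.
LOOPS = [
    ("N", 32, False), ("M", 1, False), ("K", 1, False),   # MainMemory
    ("K", 1, False), ("M", 1, False), ("N", 2, False),    # Cache, temporal
    ("K", 16, True), ("M", 16, True),                     # Cache, spatial
    ("K", 1, False), ("N", 1, False), ("M", 1, False),    # RegFile
]


def dim_sizes(loops):
    """Full extent of each dimension: product of its bounds over all levels."""
    sizes = {}
    for dim, bound, _ in loops:
        sizes[dim] = sizes.get(dim, 1) * bound
    return sizes


sizes = dim_sizes(LOOPS)
total_macs = prod(sizes.values())
parallel_lanes = prod(b for _, b, spatial in LOOPS if spatial)
temporal_steps = prod(b for _, b, spatial in LOOPS if not spatial)

print(sizes, total_macs, parallel_lanes, temporal_steps)
```

This gives M=16, N=64, K=16, i.e. 16384 multiply-accumulates spread over 256 spatial lanes (matching the 16 x 16 mac array in arch.yaml) and 64 sequential iterations.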

TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis