Notes on how the TileFlow paper performs tiling across multiple memory levels.

1. Input format

The input consists of the following files, each describing a different part of the problem:

tileflow arch/arch.yaml prob/prob.yaml map/map.yaml macro.yaml 

1.1 arch.yaml

Describes the architectural hierarchy of the whole chip.

architecture:
  version: 0.2

  subtree:
  - name: System

    local:
    - name: MainMemory
      class: DRAM
      attributes:
        block-size: 16384
        depth: 1
        word-bits: 16
        read_bandwidth: 4.3
        write_bandwidth: 2.9

    subtree:
    - name: Buffer

      local:
      - name: Cache
        class: SRAM
        attributes:
          word-bits: 16
          block_size: 16384
          depth: 3
          read_bandwidth: 52
          write_bandwidth: 20 # 16

      subtree:
      - name: PE

        local:
        - name: RegFile[0..255]
          class: regfile
          attributes:
            meshX: 16
            meshY: 16
            depth: 1
            block_size: 3
            word-bits: 16
            read_bandwidth: 3.2
            write_bandwidth: 3.2

        - name: mac[0..255]
          class: intmac
          attributes:
            word-bits: 16
            meshX: 16
            meshY: 16

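A minimal sketch of how the three storage levels stack up, assuming the Timeloop convention that a level's capacity in words is depth * block_size. The MemLevel class and its methods are hypothetical illustrations, not part of TileFlow; the numbers are copied from arch.yaml, and the 256 RegFile instances come from RegFile[0..255] (equivalently meshX * meshY). The MainMemory figure is only arithmetic on the listed attributes and may not reflect how the model actually bounds DRAM.

```python
from dataclasses import dataclass


@dataclass
class MemLevel:
    name: str
    depth: int          # number of rows (blocks)
    block_size: int     # words per block
    word_bits: int      # bits per word
    instances: int = 1  # number of copies, e.g. RegFile[0..255] -> 256

    def capacity_words(self) -> int:
        # Assumption: Timeloop-style capacity = depth * block_size (in words)
        return self.depth * self.block_size

    def capacity_bytes(self) -> int:
        return self.capacity_words() * self.word_bits // 8


# Attribute values copied from arch.yaml
levels = [
    MemLevel("MainMemory", depth=1, block_size=16384, word_bits=16),
    MemLevel("Cache", depth=3, block_size=16384, word_bits=16),
    MemLevel("RegFile", depth=1, block_size=3, word_bits=16, instances=16 * 16),
]

for lvl in levels:
    print(f"{lvl.name}: {lvl.instances} x {lvl.capacity_words()} words "
          f"({lvl.capacity_bytes()} bytes each)")
```

Under this assumption the Cache holds 3 * 16384 = 49152 words, while each of the 256 PEs only has room for 3 words in its RegFile.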
1.2 prob.yaml

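This is quite similar to Halide: you first list all the shared iteration variables (dimensions), and each operator below is then built from these dimensions. See here for details.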

problem:
  io:
    ins: A, B
    outs: C
  dimensions: [M,N,K]
  instance:
    M: M
    N: N
    K: K

  ops:
  - name: GEMM
    dimensions: [M,N,K]
    data-spaces:
    - name: C
      projection:
      - [[M]]
      - [[N]]
      read-write: True
    - name: A
      projection:
      - [[M]]
      - [[K]]
    - name: B
      projection:
      - [[K]]
      - [[N]]
    ins: A, B
    out: C

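Each projection can be read as a function from an iteration point (m, n, k) to tensor indices: C is indexed by [M][N], A by [M][K], B by [K][N]. The Python sketch below (PROJECTIONS, project and gemm are illustrative names, not TileFlow code) evaluates the GEMM from prob.yaml by following exactly those projections. Because C is marked read-write, every iteration reads a partial sum of C and updates it.

```python
# Projections copied from prob.yaml: data space -> dims used to index it
PROJECTIONS = {
    "C": ("M", "N"),
    "A": ("M", "K"),
    "B": ("K", "N"),
}


def project(space, point):
    """Return the tensor index touched by `space` at iteration `point` (dict dim -> value)."""
    return tuple(point[d] for d in PROJECTIONS[space])


def gemm(A, B, M, N, K):
    """Naive GEMM over the shared iteration space [M, N, K]."""
    C = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):
                p = {"M": m, "N": n, "K": k}
                cm, cn = project("C", p)
                am, ak = project("A", p)
                bk, bn = project("B", p)
                # C is read-write: read the partial sum, accumulate, write back
                C[cm][cn] += A[am][ak] * B[bk][bn]
    return C


print(gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]], M=2, N=2, K=2))  # [[19, 22], [43, 50]]
```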
1.3 map.yaml

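map.yaml is the most important configuration file, so it is explained in more detail here. type is either temporal, meaning the loops at that level execute sequentially, or spatial, meaning they execute in parallel across hardware units. factors: M = MO N = NO K= KO splits the M/N/K dimensions into MO/NO/KO tiles respectively. permutation lists the loops from innermost to outermost, so permutation: NMK means N is the innermost loop and K the outermost (compare the Cache loops in the output of section 1.4). At the next level, factors: M=MM K=KM N=NI tiles the three dimensions again. multicast: true enables multicast, i.e. the same data can be delivered to several PEs at once. split: 1 controls how the spatial loops map onto the X/Y axes of the hardware: the first dimension of the permutation (M) is mapped to X and the remaining one (K) to Y, which matches the Spatial-X / Spatial-Y tags in the output.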

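To make the permutation and split rules concrete, here is a small hypothetical Python helper (loop_order and spatial_axes are not TileFlow functions). The split rule, where the first split dimensions of a spatial permutation go to X and the rest to Y, is an assumption based on Timeloop's convention; it agrees with the Spatial-X / Spatial-Y tags in the section 1.4 output.

```python
def loop_order(permutation):
    """permutation lists dims innermost -> outermost; return them outermost -> innermost."""
    return list(reversed(permutation))


def spatial_axes(permutation, split):
    """Assumption (Timeloop-style): the first `split` dims go to the X axis, the rest to Y."""
    return {d: ("X" if i < split else "Y") for i, d in enumerate(permutation)}


print(loop_order("KMN"))      # ['N', 'M', 'K']: MainMemory loops N, M, K in the output
print(loop_order("NMK"))      # ['K', 'M', 'N']: Cache loops K, M, N in the output
print(spatial_axes("MK", 1))  # {'M': 'X', 'K': 'Y'}
```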
See the original documentation here.

mapping:
  node-type: Tile
  type: temporal
  factors: M = MO N = NO K= KO
  permutation: KMN
  target: MainMemory

  subtree:
  - node-type: Tile
    type: temporal
    factors: M=MM K=KM N=NI
    permutation: NMK
    target: Cache

    subtree:
    - node-type: Tile
      type: spatial
      factors: M=16 K=16
      permutation: MK
      split: 1
      target: Cache
      multicast: true

      subtree:
      - node-type: Tile
        type: temporal
        factors: M=1 N=1 K=1
        permutation: MNK
        target: RegFile

        subtree:
        - node-type: Op
          name: GEMM

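The tile tree can be flattened into a single loop nest by walking the Tile nodes from the outermost to the innermost. The sketch below is a hypothetical in-memory form of map.yaml (TILES, loop_nest and problem_size are illustrative, not TileFlow APIs). The symbolic factors are replaced with the values printed in section 1.4 (NO=32, NI=2, all other symbolic factors 1). Multiplying each dimension's factors across all levels then gives the implied problem size, which is an inference from the output rather than something TileFlow states.

```python
from math import prod

# One entry per Tile node, outermost first; symbolic factors substituted from the output
TILES = [
    {"type": "temporal", "target": "MainMemory", "permutation": "KMN",
     "factors": {"M": 1, "N": 32, "K": 1}},
    {"type": "temporal", "target": "Cache", "permutation": "NMK",
     "factors": {"M": 1, "K": 1, "N": 2}},
    {"type": "spatial", "target": "Cache", "permutation": "MK", "split": 1,
     "factors": {"M": 16, "K": 16}},
    {"type": "temporal", "target": "RegFile", "permutation": "MNK",
     "factors": {"M": 1, "N": 1, "K": 1}},
]


def loop_nest(tiles):
    """Flatten the tile tree into loop lines, outermost first."""
    lines = []
    for t in tiles:
        perm = t["permutation"]
        for d in reversed(perm):  # permutation is listed innermost -> outermost
            extent = t["factors"].get(d, 1)
            tag = ""
            if t["type"] == "spatial":
                # Assumed Timeloop-style split: first `split` dims on X, the rest on Y
                tag = " (Spatial-X)" if perm.index(d) < t["split"] else " (Spatial-Y)"
            lines.append(f"for {d} in [0:{extent}){tag}, {t['target']}")
    return lines


def problem_size(tiles):
    """A dimension's full extent is the product of its factors over all levels."""
    dims = {d for t in tiles for d in t["factors"]}
    return {d: prod(t["factors"].get(d, 1) for t in tiles) for d in sorted(dims)}


print("\n".join(loop_nest(TILES)))
print(problem_size(TILES))  # {'K': 16, 'M': 16, 'N': 64}
```

The printed nest follows the same order as the Nest Analysis in section 1.4, only with concrete numbers in place of the symbolic factor names.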
1.4 Execution results

It seems that all the

***Optimal Mapping:
-----------------Nest Analysis----------------
read: C A B update: C
for N in [0:NO(32)), MainMemory
  for M in [0:MO(1)), MainMemory
    for K in [0:KO(1)), MainMemory
      read: C A B update: C fill: C A B write-back: C
      for K in [0:KM(1)), Cache
        for M in [0:MM(1)), Cache
          for N in [0:NI(2)), Cache
            read: C A B update: C fill: C A B write-back: C
            for K in [0:16) (Spatial-Y), Cache
              for M in [0:16) (Spatial-X), Cache
                read: C A B update: C fill: C A B write-back: C
                for K in [0:1), RegFile
                  for N in [0:1), RegFile
                    for M in [0:1), RegFile
                      read: C A B update: C fill: C A B write-back: C
                      Op: GEMM(A,B,)->C

Cycle: 536, Energy: 1.44756e+07
--------------END Nest Analysis---------------
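As a functional sanity check, the loop nest above can be executed directly and compared with a plain GEMM. This Python sketch is an illustration, not TileFlow code: it takes the factors from the Nest Analysis and infers the problem size as M = MO*MM*16 = 16, N = NO*NI = 64, K = KO*KM*16 = 16 (an inference from the printed factors). The RegFile loops all have extent 1 and are omitted. The two spatial loops run one after another here, whereas on the hardware they are spread over the 16x16 PE array, so this checks only that the mapping covers every (m, n, k) point exactly once. It says nothing about the reported cycles or energy.

```python
import random

# Factors from the Nest Analysis output; the problem size below is inferred from them
MO, NO, KO = 1, 32, 1   # MainMemory level (temporal)
MM, KM, NI = 1, 1, 2    # Cache level (temporal)
SX, SY = 16, 16         # spatial level: M along X, K along Y
M, N, K = MO * MM * SX, NO * NI, KO * KM * SY


def naive(A, B):
    return [[sum(A[m][k] * B[k][n] for k in range(K)) for n in range(N)] for m in range(M)]


def tiled(A, B):
    C = [[0] * N for _ in range(M)]
    for no in range(NO):                  # MainMemory: N, M, K (outer -> inner)
        for mo in range(MO):
            for ko in range(KO):
                for km in range(KM):      # Cache: K, M, N
                    for mm in range(MM):
                        for ni in range(NI):
                            # Spatial loops: parallel on the PE array in hardware, serial here
                            for ky in range(SY):
                                for mx in range(SX):
                                    # Rebuild global indices from the per-level loop variables
                                    m = (mo * MM + mm) * SX + mx
                                    n = no * NI + ni
                                    k = (ko * KM + km) * SY + ky
                                    C[m][n] += A[m][k] * B[k][n]
    return C


random.seed(0)
A = [[random.randint(-4, 4) for _ in range(K)] for _ in range(M)]
B = [[random.randint(-4, 4) for _ in range(N)] for _ in range(K)]
assert tiled(A, B) == naive(A, B)
print("tiled loop nest covers the full", (M, N, K), "iteration space")
```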