TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis
These notes study how the TileFlow paper performs tiling across multiple memory levels.
1. Input format
TileFlow takes the following input files, each describing a different aspect of the run:
  tileflow arch/arch.yaml prob/prob.yaml map/map.yaml macro.yaml
1.1 arch.yaml
Describes the architecture hierarchy of the whole chip.
architecture:
  version: 0.2
  subtree:
  - name: System
    local:
    - name: MainMemory
      class: DRAM
      attributes:
        block-size: 16384
        depth: 1
        word-bits: 16
        read_bandwidth: 4.3
        write_bandwidth: 2.9
    subtree:
    - name: Buffer
      local:
      - name: Cache
        class: SRAM
        attributes:
          word-bits: 16
          block_size: 16384
          depth: 3
          read_bandwidth: 52
          write_bandwidth: 20 # 16
      subtree:
      - name: PE
        local:
        - name: RegFile[0..255]
          class: regfile
          attributes:
            meshX: 16
            meshY: 16
            depth: 1
            block_size: 3
            word-bits: 16
            read_bandwidth: 3.2
            write_bandwidth: 3.2
        - name: mac[0..255]
          class: intmac
          attributes:
            word-bits: 16
            meshX: 16
            meshY: 16
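To make the hierarchy concrete, here is a minimal Python sketch that restates the three storage levels above as plain data and checks that the instance range in a name such as RegFile[0..255] agrees with the meshX and meshY attributes. The LEVELS table and the instance_count and describe helpers are illustrative names, not part of TileFlow, and levels without mesh attributes are assumed to have a single instance.

```python
import re

# Storage levels copied by hand from arch.yaml, outermost first.
# This table is an illustrative restatement, not a TileFlow data structure.
LEVELS = [
    {"name": "MainMemory", "class": "DRAM"},
    {"name": "Cache", "class": "SRAM"},
    {"name": "RegFile[0..255]", "class": "regfile", "meshX": 16, "meshY": 16},
]


def instance_count(name: str) -> int:
    """Return the number of instances encoded as name[lo..hi], or 1 if absent."""
    match = re.search(r"\[(\d+)\.\.(\d+)\]$", name)
    if match is None:
        return 1
    lo, hi = int(match.group(1)), int(match.group(2))
    return hi - lo + 1


def describe(levels):
    """Return one indented line per level, checking count == meshX * meshY."""
    lines = []
    for depth, level in enumerate(levels):
        count = instance_count(level["name"])
        # Assumption: a level without mesh attributes is a single instance.
        mesh = level.get("meshX", 1) * level.get("meshY", 1)
        if mesh != count:
            raise ValueError(f"{level['name']}: mesh {mesh} != instances {count}")
        lines.append(f"{'  ' * depth}{level['name']} ({level['class']}) x{count}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(LEVELS)))
```

The check passes for this file because the range 0..255 covers 256 instances, which matches the 16 x 16 mesh.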
1.2 prob.yaml
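This is similar to Halide: all shared iteration variables are listed first, and each operator below is then built from those dimensions. For details, see here.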
problem: |
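The idea of declaring the shared iteration variables once and building every operator from them can be sketched in Python as follows. This is not TileFlow's prob.yaml syntax: DIMS, Op, and tensor_size are hypothetical names, and the sizes M=16, N=64, K=16 are assumptions chosen for illustration (they are what the loop bounds of the result in section 1.4 multiply out to).

```python
from dataclasses import dataclass
from math import prod

# Shared iteration variables, declared once.
# The sizes are assumed for illustration only.
DIMS = {"M": 16, "N": 64, "K": 16}


def tensor_size(dims) -> int:
    """Number of elements of a tensor indexed by the given dimension names."""
    return prod(DIMS[d] for d in dims)


@dataclass
class Op:
    name: str
    inputs: dict   # tensor name -> tuple of dimension names it is indexed by
    output: tuple  # (tensor name, tuple of dimension names)

    def iteration_space(self) -> int:
        """Number of points in the operator's loop nest."""
        used = set(self.output[1])
        for dims in self.inputs.values():
            used.update(dims)
        return prod(DIMS[d] for d in used)


# GEMM: C[m, n] += A[m, k] * B[k, n], built only from the shared dimensions.
GEMM = Op("GEMM", {"A": ("M", "K"), "B": ("K", "N")}, ("C", ("M", "N")))

if __name__ == "__main__":
    print(GEMM.name, "iteration points:", GEMM.iteration_space())
    for tensor, dims in GEMM.inputs.items():
        print(tensor, dims, tensor_size(dims))
```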
1.3 map.yaml
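The mapping is the most important configuration, so it is explained in more detail here.
type
  temporal means sequential execution; spatial means parallel execution.
factors: M=MO N=NO K=KO
  Splits the M, N, and K dimensions into MO, NO, and KO tiles respectively.
permutation: NMK
  The loop order, from innermost to outermost, is N, M, K.
factors: M=MM K=KM N=NI
  Splits the same three dimensions again at this level.
multicast: true
  Enables multicast.
split: 1
  Controls how the spatial loops are mapped onto the hardware X/Y axes.
For the original documentation, see here.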
mapping: |
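As a rough illustration of what the factors keys express across levels, the following Python sketch multiplies the per-level tiling factors back into the full dimension sizes and computes the tile that each level hands to the levels below it. The FACTORS table and both helper functions are hypothetical; the factor values are taken from the loop bounds of the result in section 1.4, and loop ordering (permutation) is deliberately left out.

```python
from math import prod

# Tiling factors per level, outermost level first.
# Values come from the loop bounds in section 1.4; the structure is hypothetical.
FACTORS = [
    ("MainMemory", "temporal", {"M": 1, "N": 32, "K": 1}),   # MO, NO, KO
    ("Cache", "temporal", {"M": 1, "N": 2, "K": 1}),         # MM, NI, KM
    ("Cache", "spatial", {"M": 16, "N": 1, "K": 16}),        # 16 x 16 mesh
    ("RegFile", "temporal", {"M": 1, "N": 1, "K": 1}),
]


def full_extent(factors, dim: str) -> int:
    """Full size of a dimension: the product of its factors over all levels."""
    return prod(level[2][dim] for level in factors)


def tile_shapes(factors):
    """For each level, the tile that one of its iterations passes downward."""
    shapes = []
    for i, (target, kind, _) in enumerate(factors):
        inner = factors[i + 1:]
        # An empty product is 1, so the innermost level yields a 1x1x1 tile.
        shape = {d: prod(level[2][d] for level in inner) for d in "MNK"}
        shapes.append((target, kind, shape))
    return shapes


if __name__ == "__main__":
    print({d: full_extent(FACTORS, d) for d in "MNK"})
    for target, kind, shape in tile_shapes(FACTORS):
        print(f"{target:<10} {kind:<8} tile = {shape}")
```

With these factors, each MainMemory iteration works on a tile of M=16, N=2, K=16, which the Cache level then splits further.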
1.4 Execution results
Here it seems that all the ...
***Optimal Mapping:
-----------------Nest Analysis----------------
read: C A B update: C
for N in [0:NO(32)), MainMemory
  for M in [0:MO(1)), MainMemory
    for K in [0:KO(1)), MainMemory
      read: C A B update: C fill: C A B write-back: C
      for K in [0:KM(1)), Cache
        for M in [0:MM(1)), Cache
          for N in [0:NI(2)), Cache
            read: C A B update: C fill: C A B write-back: C
            for K in [0:16) (Spatial-Y), Cache
              for M in [0:16) (Spatial-X), Cache
                read: C A B update: C fill: C A B write-back: C
                for K in [0:1), RegFile
                  for N in [0:1), RegFile
                    for M in [0:1), RegFile
                      read: C A B update: C fill: C A B write-back: C
                      Op: GEMM(A,B,)->C
Cycle: 536, Energy: 1.44756e+07
--------------END Nest Analysis---------------
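The loop bounds in the printed nest can be multiplied per dimension to recover the problem size that the mapping covers. The Python sketch below does this with a regular expression over a copy of the loop lines; the NEST string, the LOOP pattern, and the extents helper are illustrative and not part of TileFlow's tooling.

```python
import re
from collections import defaultdict

# The "for" lines of the nest above, copied verbatim (indentation is irrelevant).
NEST = """\
for N in [0:NO(32)), MainMemory
for M in [0:MO(1)), MainMemory
for K in [0:KO(1)), MainMemory
for K in [0:KM(1)), Cache
for M in [0:MM(1)), Cache
for N in [0:NI(2)), Cache
for K in [0:16) (Spatial-Y), Cache
for M in [0:16) (Spatial-X), Cache
for K in [0:1), RegFile
for N in [0:1), RegFile
for M in [0:1), RegFile
"""

# Matches "for N in [0:NO(32))" (named factor) and "for K in [0:16)" (plain bound).
LOOP = re.compile(r"for (\w) in \[0:(?:\w+\((\d+)\)|(\d+))\)")


def extents(nest: str) -> dict:
    """Multiply the bounds of every loop that iterates over the same dimension."""
    total = defaultdict(lambda: 1)
    for match in LOOP.finditer(nest):
        dim, named, plain = match.groups()
        total[dim] *= int(named or plain)
    return dict(total)


if __name__ == "__main__":
    size = extents(NEST)
    print(size)
    print("MACs:", size["M"] * size["N"] * size["K"])
```

For the nest above this gives M=16, N=64, K=16, that is 16384 multiply-accumulate operations for the GEMM.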