❯ clang++ -std=c++17 -stdlib=libc++ -Iout/build/debug/include/ -fno-objc-arc -framework Metal -framework Foundation -framework MetalKit brighter.o main.cpp && ./a.out 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, Entering Pipeline brighter Target: arm-64-osx-debug-metal Input Buffer input: buffer(0, 0x0, 0x16b35a770, 1, float32, {0, 16, 1}, {0, 16, 16}) Output Buffer brighter: buffer(0, 0x0, 0x16b35a370, 0, float32, {0, 16, 1}, {0, 16, 16}) Metal - Allocating: MTLCreateSystemDefaultDevice Metal - Allocating: new_command_queue Caching compiled kernel: 0x143611bd0 id 2 context 0x14480ca00 Time for halide_metal_initialize_kernels: 6.449170e-01 ms halide_copy_to_device validating input buffer: buffer(0, 0x0, 0x16b35a370, 0, float32, {0, 16, 1}, {0, 16, 16}) halide_device_malloc validating input buffer: buffer(0, 0x0, 0x16b35a370, 0, float32, {0, 16, 1}, {0, 16, 16}) halide_device_malloc: target device interface 0x104acb260 halide_metal_device_malloc (user_context: 0x0, buf: 0x16b35abe8) allocating buffer(0, 0x0, 0x16b35a370, 0, float32, {0, 16, 1}, {0, 16, 16}) Time: 8.459000e-03 ms halide_copy_to_device 0x16b35abe8 skipped (host is not dirty) halide_copy_to_device validating input buffer: buffer(0, 0x0, 0x16b35a770, 1, float32, {0, 16, 1}, {0, 16, 16}) 
halide_device_malloc validating input buffer: buffer(0, 0x0, 0x16b35a770, 1, float32, {0, 16, 1}, {0, 16, 16}) halide_device_malloc: target device interface 0x104acb260 halide_metal_device_malloc (user_context: 0x0, buf: 0x16b35ac60) allocating buffer(0, 0x0, 0x16b35a770, 1, float32, {0, 16, 1}, {0, 16, 16}) Time: 3.375000e-03 ms halide_copy_to_device 0x16b35ac60 host is dirty halide_copy_to_device 0x16b35ac60 calling copy_to_device() halide_metal_copy_to_device dev = 0x1436206f0 metal_buffer = 0x143621d00 host = 0x16b35a770 Time for halide_metal_copy_to_device: 1.747500e-01 ms Metal - supports setBytes Total args size is 44 and with padding, size is 44 Setting shared memory length to 0 Dispatching threadgroups (number 0) blocks(2, 2, 1) threads(8, 8, 1) Time for halide_metal_device_run: 7.601670e-01 ms Exiting Pipeline brighter halide_copy_to_host validating input buffer: buffer(5425404128, 0x104acb260, 0x16b35a370, 2, float32, {0, 16, 1}, {0, 16, 16}) copy_to_host_already_locked 0x16b35abe8 dev_dirty is true Time for halide_metal_copy_to_host: 4.510420e-01 ms 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, halide_device_free validating input buffer: buffer(5425404128, 0x104acb260, 0x16b35a370, 0, float32, {0, 16, 1}, {0, 16, 16}) halide_metal_device_free 
called on buf 0x16b35abe8 device is 5425404128 Time: 1.316700e-02 ms halide_device_free validating input buffer: buffer(5425465072, 0x104acb260, 0x16b35a770, 0, float32, {0, 16, 1}, {0, 16, 16}) halide_metal_device_free called on buf 0x16b35ac60 device is 5425465072 Time: 4.042000e-03 ms