Performance History

Overview of test results for the most recent commits:

Performance for matmul_512_512_4096_bf16_f32_O2_npu1_4col_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_512_4096_bf16_f32_O2_npu1_4col_outline_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_512_4096_bf16_f32_O3_npu1_4col_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_512_4096_bf16_f32_O3_npu1_4col_outline_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_512_4096_bf16_f32_O3_npu1_4col_outline_ukernel_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_4096_512_bf16_f32_O3_npu1_4col_outline_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_4096_512_bf16_f32_O3_npu1_4col_outline_ukernel_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_transpose_b_512_4096_512_bf16_f32_O3_npu1_4col_outline_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_ukernel_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_transpose_a_4096_512_512_bf16_f32_O3_npu1_4col_outline_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_empty_benchmark

Total ops: 0 Number of cores: 16

Performance for matmul_512_4096_512_bf16_f32_O3_npu1_4col_outline_4_level_tiling_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_4096_512_bf16_f32_O3_npu1_4col_outline_4_level_tiling_ukernel_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul4d_512_4096_512_bf16_f32_O3_npu1_4col_outline_4_level_tiling_ukernel_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_4096_512_bf16_f32_O3_npu1_4col_outline_empty_4_level_tiling_benchmark

Total ops: 0 Number of cores: 16

Performance for matmul_512_512_512_bf16_f32_O2_npu1_4col_callrepl_100_outline_benchmark

Total ops: 26843545600 Number of cores: 1

Performance for matmul_512_512_512_bf16_f32_O3_npu1_4col_callrepl_100_outline_benchmark

Total ops: 26843545600 Number of cores: 1

Performance for matmul_const_bias_ctrlpkt_1024_1024_1024_i8_i32_benchmark_bias_2

Total ops: 26843545600 Number of cores: 1

Performance for matmul_512_512_4096_bf16_f32_O3_npu1_4col_outline_packet_flow_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_4096_512_bf16_f32_O3_npu1_4col_outline_packet_flow_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_packet_flow_benchmark

Total ops: 2147483648 Number of cores: 16

Performance for matmul_512_512_512_bf16_f32_O3_npu1_4col_callrepl_100_outline_ukernel_benchmark

Total ops: 26843545600 Number of cores: 1

Performance for matmul_1024_1024_1024_i8_i32_npu1_4col_reconfigure_only_benchmark

Performance for matmul_1024_1024_1024_i8_i32_npu1_4col_pdi_load_only_benchmark
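
The "Total ops" figures listed above for the bf16 matmul benchmarks are consistent with counting two operations (one multiply, one add) per multiply-accumulate, i.e. 2 x M x N x K, scaled by the call replication count where the name contains `callrepl_<n>`. The sketch below reproduces that arithmetic from a benchmark name. It is an illustration only: the function name `estimated_total_ops` is hypothetical, and the parsing rules (the first three underscore-separated integers are the matrix dimensions, `_empty_` variants report 0) are assumptions inferred from the entries on this page, not taken from the benchmark tooling.

```python
import re


def estimated_total_ops(benchmark_name: str) -> int:
    """Estimate 'Total ops' from a benchmark name (hypothetical helper).

    Assumptions, inferred from the entries listed on this page:
      * the first run of three underscore-separated integers in the name
        gives the three matmul dimensions;
      * each multiply-accumulate counts as 2 operations;
      * 'callrepl_<n>' multiplies the count by n;
      * names containing '_empty_' report 0 ops.
    """
    dims = re.search(r"_(\d+)_(\d+)_(\d+)_", benchmark_name)
    if dims is None:
        raise ValueError(f"no dimension triple found in {benchmark_name!r}")

    # Names with '_empty_' are listed with 'Total ops: 0'.
    if "_empty_" in benchmark_name:
        return 0

    d0, d1, d2 = (int(g) for g in dims.groups())
    ops = 2 * d0 * d1 * d2

    # Optional call replication factor, e.g. 'callrepl_100'.
    repl = re.search(r"callrepl_(\d+)", benchmark_name)
    if repl is not None:
        ops *= int(repl.group(1))
    return ops


if __name__ == "__main__":
    for name in (
        "matmul_512_512_4096_bf16_f32_O2_npu1_4col_benchmark",
        "matmul_512_512_512_bf16_f32_O3_npu1_4col_callrepl_100_outline_benchmark",
        "matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_empty_benchmark",
    ):
        print(name, estimated_total_ops(name))
```

For example, 2 x 512 x 512 x 4096 = 2147483648, and 2 x 512 x 512 x 512 x 100 = 26843545600, matching the values listed for the corresponding bf16 entries. Entries whose names do not follow this pattern, or that list no "Total ops" line, are outside the scope of the sketch.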