FP8 runs ~100 TFLOPS faster when the kernel name has "cutlass" in it

📅 2025-10-03    ⚓ Hacker News