FP8 runs ~100 TFLOPS faster when the kernel name has "cutlass" in it
📅 2025-10-03 ⚓ Hacker News