NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon
Source: MarkTechPost

Pretraining frontier-scale LLMs in FP8 is now standard practice, but moving to 4-bit floating point has...