AdaMeZO: Adam-style LLM fine-tuning without storing gradient moments in GPU memory
AdaMeZO is a zeroth-order optimizer for fine-tuning large language models that combines Adam-style adaptive updates with the memory efficiency of MeZO. It uses only forward passes, needs up to 70% fewer of them than MeZO, and converges better.
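To make "uses only forward passes" concrete, the following is a minimal sketch of the MeZO-style zeroth-order estimate that AdaMeZO builds on, written with NumPy on a toy parameter vector. The names `direction` and `zo_sgd_step`, the step sizes, and the plain SGD update are illustrative assumptions; the summary above does not specify how AdaMeZO forms its Adam-style moments, so that part is deliberately left out.

```python
import numpy as np

def direction(seed, shape):
    # Regenerate the random perturbation from its seed instead of storing it.
    # This seed trick is what keeps the memory footprint at inference level.
    return np.random.default_rng(seed).standard_normal(shape)

def zo_sgd_step(theta, loss_fn, seed, eps=1e-3, lr=0.05):
    """One zeroth-order SGD step using two forward passes and no backprop.

    Hypothetical helper for illustration; not the AdaMeZO update itself.
    """
    z = direction(seed, theta.shape)
    loss_plus = loss_fn(theta + eps * z)    # forward pass 1
    loss_minus = loss_fn(theta - eps * z)   # forward pass 2
    # Scalar "projected gradient": directional derivative of the loss along z.
    g = (loss_plus - loss_minus) / (2 * eps)
    # The gradient estimate is g * z; z is regenerated from the same seed.
    return theta - lr * g * direction(seed, theta.shape), g
```

Only the scalar `g` and the integer seed are needed to reproduce a step's gradient estimate, which is why this family of methods avoids keeping per-parameter gradient state in memory.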