AWS Nova distillation for video semantic search: 95 percent cost savings and twice the inference speed
AWS demonstrated how model distillation transfers capability from the large Nova Premier model into the smaller Nova Micro for video search routing. Reported results include 95 percent savings on inference cost, 50 percent lower latency (833 ms versus 1,741 ms), and quality preserved according to LLM-as-judge scoring (4.0 out of 5). Training relied entirely on 10,000 synthetic examples generated by Nova Premier.
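The teacher-to-student data flow described above can be sketched in miniature: a large teacher model labels queries with routing decisions, and those labels are written out as a prompt/completion JSONL dataset for fine-tuning the smaller student. This is a hypothetical illustration, not AWS's actual pipeline: `teacher_route` is a stub standing in for a Nova Premier inference call, and the exact JSONL schema expected by a given fine-tuning service may differ.

```python
import json


def teacher_route(query: str) -> str:
    """Stand-in for a Nova Premier call that labels a query with a
    search-routing decision. In the real pipeline this would be a
    model inference request; here it is a hypothetical stub that
    routes short queries to keyword search and long ones to semantic."""
    return "keyword" if len(query.split()) <= 3 else "semantic"


def build_distillation_dataset(queries, path):
    """Write teacher labels as prompt/completion JSONL, one record per
    synthetic example, to serve as training data for a student model."""
    with open(path, "w") as f:
        for q in queries:
            record = {
                "prompt": f"Route this video search query: {q}",
                "completion": teacher_route(q),
            }
            f.write(json.dumps(record) + "\n")


# Small demo set; the article's pipeline used 10,000 such examples.
queries = [
    "cat videos",
    "how to fix a leaking kitchen faucet under the sink",
    "soccer highlights",
]
build_distillation_dataset(queries, "distillation.jsonl")
```

At scale, the same loop would iterate over a large pool of synthetic or logged queries, producing the 10,000-example dataset the article describes before a standard fine-tuning job trains the Micro-sized student on it.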