🤖 Models

35 articles

🟡 🤖 Models April 27, 2026 · 3 min read

arXiv:2604.21764: 'Thinking with Reasoning Skills' reduces reasoning tokens while improving accuracy — ACL 2026 Industry Track

Guangxiang Zhao and co-authors published 'Thinking with Reasoning Skills: Fewer Tokens, More Accuracy' on April 23, 2026; the paper was accepted at the ACL 2026 Industry Track. The approach distills 'reusable reasoning skills' from long chain-of-thought traces and uses them as a retrieval-guided shortcut for new problems, significantly reducing token counts while improving accuracy on coding and math tasks.

🔴 🤖 Models April 24, 2026 · 3 min read

DeepSeek releases V4-Pro and V4-Flash: two open-source models with one million token context and 80.6 on SWE Verified

Editorial illustration: DeepSeek V4 models — modules with one million tokens

DeepSeek on April 24, 2026 released V4-Pro (1.6T / 49B active) and V4-Flash (284B / 13B active), two open-source models with one million token context. V4-Pro scored 80.6 on SWE Verified, near Opus 4.6, with drastically reduced memory consumption.

🔴 🤖 Models April 24, 2026 · 3 min read

OpenAI introduces GPT-5.5: the smartest model for coding, research, and complex data analysis through tools

Editorial illustration: AI model — models

OpenAI launched GPT-5.5 on April 23, 2026, describing it as their smartest model to date. It is aimed at complex tasks such as programming, research, and data analysis through tools. The model launch was accompanied by a System Card and a special Bio Bug Bounty program.

🟡 🤖 Models April 24, 2026 · 3 min read

Thinking with Reasoning Skills (ACL 2026 Industry Track): fewer tokens, higher accuracy through retrieval of reasoning skills

Editorial illustration: reasoning skills — reasoning patterns and tokens

A team led by Zhao et al. published at ACL 2026 Industry Track a paper proposing the distillation of reusable reasoning skills from extensive exploration. Instead of reasoning from scratch, the model retrieves relevant patterns, reducing the number of reasoning tokens while increasing accuracy on coding and math tasks.
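The retrieval step can be sketched as a nearest-neighbor lookup over a small skill library. Everything below is illustrative: the skill names, descriptions, and the bag-of-words similarity are stand-ins for whatever learned representations the paper actually uses.

```python
from collections import Counter
from math import sqrt

# Toy skill library (names and texts are hypothetical, not from the paper).
SKILLS = {
    "casework": "split the problem into exhaustive cases and solve each",
    "invariant": "find a quantity preserved by every allowed operation",
    "two-pointer": "scan an array from both ends to meet a pairing condition",
}

def bow(text):
    """Bag-of-words vector as a Counter (stand-in for a learned embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_skill(problem):
    """Return the skill whose description best matches the problem."""
    q = bow(problem)
    return max(SKILLS, key=lambda s: cosine(q, bow(SKILLS[s])))

# Retrieved skill is prepended to the prompt instead of reasoning from scratch.
skill = retrieve_skill("scan the array from both ends to find a pair summing to k")
prompt = f"Relevant skill ({skill}): {SKILLS[skill]}\nProblem: ..."
```

The retrieved pattern replaces a long exploratory chain of thought, which is where the token savings come from.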

🟡 🤖 Models April 23, 2026 · 3 min read

Google announces GA of gemini-embedding-2: first multimodal embedding model with 5 modalities in one space

Editorial illustration: AI model — models

Google announced the general availability of the gemini-embedding-2 model, which supports text, images, video, audio, and PDF inputs mapped into a unified embedding space. The model was in preview since March 10, 2026, and is now available to everyone via the Gemini API.

🟡 🤖 Models April 23, 2026 · 2 min read

Microsoft AutoAdapt: automatic LLM adaptation to specialized domains in 30 minutes and $4

Editorial illustration: AI model — models

Microsoft Research introduced AutoAdapt, a framework that automates the adaptation of general language models to specialized domains such as medicine, law, and incident response. The system autonomously chooses between RAG and fine-tuning, optimizes hyperparameters, and completes the job in approximately 30 minutes at an additional cost of around $4.

🟢 🤖 Models April 23, 2026 · 3 min read

Apple introduces MANZANO — a unified multimodal model that balances image understanding and generation

Editorial illustration: AI model — models

At ICLR 2026, Apple's research team introduced MANZANO, a unified multimodal framework that addresses the long-standing trade-off between image understanding capabilities and image generation quality. The model uses a hybrid vision tokenizer that produces continuous embeddings for understanding and discrete tokens for generation, a shared encoder, and two specialized adapters, reducing the quality loss that typically occurs when a single model attempts both tasks.

🟢 🤖 Models April 22, 2026 · 2 min read

MathNet: 30,676 olympiad problems from 47 countries, SOTA models still fall short

Editorial illustration: Connected nodes with mathematical symbols and globe fragments from 47 countries

An MIT team published MathNet, a multimodal benchmark with 30,676 olympiad math problems from 47 countries and 17 languages. Gemini-3.1-Pro achieves 78.4%, GPT-5 69.3%, and embedding models have significant difficulty finding mathematically equivalent problems.

🟢 🤖 Models April 22, 2026 · 3 min read

xAI Speech-to-Text API exits beta: general availability for 25 languages

Editorial illustration: Microphone and audio wave streams converting into transcripts in 25 languages through the Grok API

xAI has announced that its Speech-to-Text (STT) API is moving from beta to general availability. The service supports 25 languages, offers batch and streaming modes, and is available without a waitlist — completing the voice stack alongside the previously GA-released Grok Voice Agent.

🔴 🤖 Models April 21, 2026 · 4 min read

Claude Opus 4.7 and Haiku 4.5 Generally Available on Amazon Bedrock: 27 Regions and Self-Serve Enterprise Access

Editorial illustration: Claude Opus 4.7 and Haiku 4.5 generally available on Amazon Bedrock: 27 regions and self-serve enterprise access

Anthropic has moved Claude Opus 4.7 and Haiku 4.5 to general availability within Amazon Bedrock. Both models are now active across 27 AWS regions, without a waitlist, through the standard Messages API endpoint and with support for regional and global request routing.

🟡 🤖 Models April 21, 2026 · 3 min read

Anthropic Retires Claude Haiku 3 from Production: Migration to Haiku 4.5 Mandatory from April 20

Editorial illustration: Anthropic retires Claude Haiku 3 from production: migration to Haiku 4.5 mandatory from April 20

Anthropic formally retired Claude Haiku 3 (model ID claude-3-haiku-20240307) from production on April 20, 2026. All API calls to this model now return an error. The recommended migration target is Claude Haiku 4.5, and the move is part of the deprecation cycle announced in February 2026.

🟢 🤖 Models April 21, 2026 · 4 min read

Why Does Fine-Tuning Promote Hallucinations? Interference Among Semantic Representations, and the Solution Is Self-Distillation SFT

Editorial illustration: Why does fine-tuning promote hallucinations? Interference among semantic representations, and the solution is self-distillation SFT

A new arXiv paper shows that hallucinations after fine-tuning are caused neither by insufficient capacity nor by behavior cloning, but by interference among overlapping semantic representations. The proposed fix is self-distillation SFT, which regularizes output-distribution drift and treats fine-tuning as a continual learning problem.
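The remedy can be sketched as a loss that penalizes drift away from the frozen base model's output distribution. The exact loss form, the λ weight, and the toy distributions below are illustrative assumptions, not the paper's formula.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def self_distill_loss(p_student, p_frozen, target_idx, lam=0.5):
    """Cross-entropy on the fine-tuning target plus a KL penalty that keeps
    the student's output distribution close to the frozen base model,
    limiting the drift the paper links to hallucinations."""
    ce = -math.log(p_student[target_idx])
    return ce + lam * kl(p_student, p_frozen)

# Toy next-token distributions over a 3-token vocabulary.
base = [0.6, 0.3, 0.1]
drifted = [0.1, 0.2, 0.7]   # fine-tuned far away from the base
close = [0.5, 0.35, 0.15]   # fine-tuned but close to the base

# For the same target token, the drifted student pays a larger KL penalty.
```

The KL term is what makes this "continual learning": the new task is learned under a constraint that preserves the base model's prior behavior.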

🟡 🤖 Models April 19, 2026 · 3 min read

YAN: Mixture-of-Experts Flow Matching Achieves 40× Speedup Over Autoregressive LMs with Just 3 Sampling Steps

Editorial illustration: abstract vector field and parallel flow streams of a generative model

YAN is a new generative language model that combines Transformer and Mamba architectures with a Mixture-of-Experts Flow Matching approach — achieving quality comparable to autoregressive models in just 3 sampling steps, delivering a 40× speedup over AR baselines and up to 1000× over diffusion language models. The model decomposes global transport geometries into locally specialized vector fields.
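Few-step sampling in flow matching amounts to integrating a learned velocity field over a handful of Euler steps. The sketch below uses the known constant velocity of straight-line interpolation paths as a stand-in for YAN's learned, expert-decomposed field; it is a toy illustration of the sampling regime, not the model.

```python
def euler_sample(x0, velocity, steps=3):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 in a few steps,
    mirroring the 3-step sampling regime the paper reports."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = [xi + dt * vi for xi, vi in zip(x, velocity(x, t))]
    return x

# Toy case: for straight-line paths x_t = (1-t)*x0 + t*x1, the optimal
# velocity field is the constant x1 - x0, so even 3 Euler steps land on
# the target (up to float error).
x0, x1 = [0.0, 0.0], [2.0, -1.0]
v = lambda x, t: [b - a for a, b in zip(x0, x1)]

sample = euler_sample(x0, v, steps=3)
```

The contrast with autoregressive decoding is that the number of integration steps is fixed and small, independent of sequence length, which is where the reported speedups originate.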

🟢 🤖 Models April 19, 2026 · 2 min read

IG-Search: Reward That Measures Information Gain Improves Search-Augmented Reasoning with 6.4% Overhead

Editorial illustration: information gain curve and search arrows through reasoning steps

IG-Search is a new approach to training AI models for search-augmented reasoning that uses Information Gain as a step-level reward signal. The signal is derived from the model's own generation probabilities without external annotations, and Qwen2.5-3B with this method achieves an average EM score of 0.430 across 7 QA benchmarks — 1.6 points above MR-Search and 0.9 points above GiGPO with a computational overhead of just 6.4%.
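One plausible instantiation of an annotation-free, step-level information-gain reward (the exact formula here is an assumption, not the paper's) scores each reasoning step by how much it raises the model's own log-probability of the gold answer:

```python
import math

def info_gain_rewards(answer_probs):
    """Given the model's probability of the gold answer after each reasoning
    step (index 0 = before any step), reward each step by how much it
    raised the answer's log-probability. No external annotations needed."""
    logs = [math.log(p) for p in answer_probs]
    return [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]

# Toy trajectory: answer probability rises, dips after a bad search, rises again.
probs = [0.10, 0.25, 0.20, 0.60]
rewards = info_gain_rewards(probs)  # middle step gets a negative reward
```

Because the rewards telescope, their sum equals the total log-probability gain of the trajectory, so per-step credit assignment stays consistent with the final outcome.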

🟢 🤖 Models April 19, 2026 · 3 min read

LLMs Learn the Shortest Path on Graphs — But Fail When the Task Horizon Grows

Editorial illustration: graph with nodes and paths, a long horizon fading into the distance

A new arXiv paper systematically investigates LLM generalization on the shortest-path problem across two dimensions: spatial transfer to unseen maps works well, but horizon-length scaling consistently fails due to recursive instability. The findings have direct implications for autonomous agents — training data coverage defines the boundary of capability, RL improves stability but does not extend that boundary, and inference-time scaling helps but does not solve the length-scaling problem.
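The ground-truth task being probed is classical breadth-first search; the sketch below makes the "horizon" the paper scales (the length of the returned path) concrete:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: returns the shortest node sequence from start
    to goal, or None if unreachable. The path length is the task horizon
    on which the paper reports consistent LLM failures."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Tiny directed map (hypothetical example, not from the paper's dataset).
grid = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
```

BFS solves this exactly at any horizon; the paper's point is that LLM accuracy degrades as this horizon grows, even on map layouts seen in training.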

🟡 🤖 Models April 18, 2026 · 3 min read

AWS Nova distillation for video semantic search: 95 percent cost savings and twice the inference speed

AWS demonstrated how model distillation transfers intelligence from the large Nova Premier model into the smaller Nova Micro for video search routing. Results include 95 percent savings on inference costs, 50 percent lower latency (833 ms instead of 1741 ms), and preserved quality per LLM-as-judge scoring (4.0 out of 5). The entire training used 10,000 synthetic examples generated from Nova Premier.

🟡 🤖 Models April 18, 2026 · 4 min read

AWS Nova Multimodal Embeddings for video search: hybrid approach delivers 90 percent recall instead of 51 percent

AWS Nova Multimodal Embeddings is a new architecture that simultaneously processes the visual, audio and text content of a video into a shared 1024-dimensional vector space without converting to text. Combining semantic embedding with BM25 lexical search yields 90 percent Recall@5, compared to 51 percent for baseline combined-mode embeddings — a jump of 30 to 40 percentage points across all metrics.
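The summary does not specify AWS's fusion rule, so here is a minimal sketch under common assumptions: min-max normalization of each score list and an equal-weight linear combination of the lexical (BM25) and semantic (embedding cosine) scores.

```python
def minmax(scores):
    """Scale a score list to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid_rank(lexical, semantic, alpha=0.5):
    """Fuse per-document lexical (e.g. BM25) and semantic (embedding cosine)
    scores after normalization; return document indices ranked best-first."""
    lex, sem = minmax(lexical), minmax(semantic)
    fused = [alpha * l + (1 - alpha) * s for l, s in zip(lex, sem)]
    return sorted(range(len(fused)), key=fused.__getitem__, reverse=True)

# Doc 0 wins on keywords, doc 2 wins semantically, doc 1 is strong on both,
# doc 3 is weak on both (toy scores, not AWS's numbers).
bm25 = [12.0, 9.5, 2.0, 1.0]
cosine = [0.30, 0.72, 0.80, 0.10]
ranking = hybrid_rank(bm25, cosine)
```

The intuition behind the reported recall jump is visible even in the toy case: the fused ranking surfaces the document that neither signal alone would rank first.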

🟡 🤖 Models April 18, 2026 · 4 min read

NVIDIA Nemotron OCR v2: 34.7 pages per second, five languages in one model, 28x faster than PaddleOCR

NVIDIA published Nemotron OCR v2 on HuggingFace — a multilingual OCR model processing 34.7 pages per second on a single A100 GPU. That is 28 times faster than PaddleOCR v5. The model supports English, Chinese, Japanese, Korean and Russian in a single architecture, with no language detection required. Trained on 12.2 million synthetic images, the model and dataset are available under the NVIDIA Open Model License and CC-BY-4.0.

🟢 🤖 Models April 18, 2026 · 3 min read

ArXiv AC/DC: automatic discovery of specialised LLMs through model and task coevolution

AC/DC is a new framework presented at ICLR 2026 that simultaneously evolves LLM models through model merging and tasks through synthetic data. Discovered model populations demonstrate broader expertise coverage than manually curated models without explicit benchmark optimization. Models outperform larger counterparts with less GPU memory, representing a new paradigm in continuous LLM development.

🔴 🤖 Models April 17, 2026 · 2 min read

Anthropic: Claude Opus 4.7 brings high-res vision, task budgets and a new tokenizer — Opus 4 retires

Claude Opus 4.7 is Anthropic's new flagship AI model, replacing Opus 4.6 at the same price of $5 per million input tokens and $25 per million output tokens. It triples the maximum image resolution to 2576 pixels, adds a new xhigh effort level for complex agentic tasks, introduces task budgets that let the model manage its own resources in long loops, and ships a completely new tokenizer.

🟡 🤖 Models April 17, 2026 · 3 min read

ArXiv: conformal prediction exposes hidden unreliability in LLM judges

Diagnosing LLM Judge Reliability is a new study showing that aggregate reliability metrics for LLM-as-a-judge systems mask serious per-instance inconsistencies. Although overall transitivity violation rates are 0.8 to 4.1 percent, as many as 33 to 67 percent of documents have at least one transitivity cycle. The method relies on conformal prediction sets with theoretically guaranteed coverage.
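The study's per-instance guarantees rest on conformal prediction. Below is a generic split-conformal construction for classification, not the study's exact setup; the calibration scores and labels are toy values.

```python
import math

def conformal_set(cal_scores, test_scores, alpha=0.1):
    """Split conformal prediction for classification.
    cal_scores: nonconformity score of the TRUE label for each calibration
    example (higher = worse fit). test_scores: {label: score} for one test
    input. Returns a prediction set with >= 1 - alpha marginal coverage."""
    n = len(cal_scores)
    # Conservative quantile rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    qhat = sorted(cal_scores)[min(k, n) - 1]
    return {label for label, s in test_scores.items() if s <= qhat}

# Toy calibration scores and one test example's per-label scores.
cal = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.9]
test = {"A": 0.15, "B": 0.55, "C": 0.95}
```

Large prediction sets on individual inputs are exactly the per-instance unreliability the aggregate metrics were hiding: coverage is guaranteed, but the judge cannot commit to a single verdict.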

🟡 🤖 Models April 17, 2026 · 2 min read

ArXiv: LongCoT benchmark reveals GPT 5.2 achieves only 9.8% on long chain-of-thought reasoning

LongCoT is a new benchmark with 2,500 expert-designed problems across five domains that tests long chain-of-thought reasoning requiring tens to hundreds of thousands of tokens. Current frontier models fail dramatically, with GPT 5.2 scoring 9.8 percent and Gemini 3 Pro just 6.1 percent, exposing a critical weakness for autonomous deployment of AI agents.

🟡 🤖 Models April 17, 2026 · 2 min read

Google Research: AI generates synthetic neurons and saves 157 person-years in brain mapping

Google Research has developed the MoGen system that uses the PointInfinity point cloud flow matching model to generate synthetic neuron shapes indistinguishable from real ones according to expert assessments. Just 10 percent of synthetic data in training reduces the error rate by 4.4 percent, equivalent to saving 157 person-years of manual labor in mapping a full mouse brain.

🟡 🤖 Models April 17, 2026 · 3 min read

Google Simula: synthetic data as mechanism design rather than sample-by-sample optimization

Simula is Google's framework that treats synthetic data generation as a mechanism design problem rather than individual sample optimization. The system uses reasoning models to build hierarchical taxonomies and controls four independent axes of data generation. It is already in production — powering Gemini safety classifiers, MedGemma, Android fraud detection, and spam filtering in Google Messages.

🟡 🤖 Models April 17, 2026 · 2 min read

OpenAI: GPT-Rosalind — first frontier reasoning model specialized for life sciences

GPT-Rosalind is OpenAI's new frontier reasoning model specialized for research in life sciences, including drug discovery, genomic analysis and protein reasoning. The model continues the trend of specialized AI systems following GPT-5.4-Cyber for cybersecurity, and signals OpenAI's strategic decision to build vertically optimized models for key industries.

🟡 🤖 Models April 16, 2026 · 2 min read

Google: Gemini 3.1 Flash TTS Brings Expressive AI Speech to More Than 70 Languages

Google has launched Gemini 3.1 Flash TTS, a new text-to-speech model supporting more than 70 languages and achieving an Elo score of 1,211 on the Artificial Analysis leaderboard. The key innovation is audio tags — embedding natural language commands directly into text for precise control of voice, intonation, and emotion. The model is available on Google AI Studio, Vertex AI, and Google Vids, with SynthID watermarking for detecting AI-generated audio.

🟢 🤖 Models April 16, 2026 · 2 min read

ArXiv: Numerical Instability in LLMs — How Floating-Point Errors Create Chaos in Transformers

New research rigorously analyzes how floating-point rounding errors propagate through transformer layers and induce chaotic behavior. The paper identifies three behavioral regimes (stable, chaotic, and signal-dominated) and demonstrates that numerical instability is not a bug but a fundamental property of LLMs, one that threatens reproducibility in production systems.
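The seed of the problem is easy to observe: floating-point addition is not associative, so changing the summation order (as a parallel reduction on a GPU does) changes the result at the last-bit level. This is a generic demonstration, independent of the paper.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Addition order changes the rounded result: (a+b)+c != a+(b+c).
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
assert left != right

# Longer sums drift further; math.fsum computes the correctly rounded total.
xs = [0.1] * 10
naive = sum(xs)        # accumulates rounding error, lands just below 1.0
exact = math.fsum(xs)  # 1.0
```

The paper's contribution is showing how such ULP-scale perturbations are amplified, layer by layer, into divergent outputs rather than staying negligible.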

🔴 🤖 Models April 15, 2026 · 2 min read

Anthropic: Claude Sonnet 4 and Opus 4 Retiring on June 15

Anthropic has announced the deprecation of the original Claude Sonnet 4 and Claude Opus 4 models. Both models will be removed from the API on June 15, 2026. Development teams should migrate to version 4.6 as soon as possible.

🟡 🤖 Models April 15, 2026 · 2 min read

ArXiv: Neurons Responsible for Harmful Responses in Large Language Models Identified

Causal analysis of mechanisms within LLMs reveals that harmful content originates in later model layers, primarily through MLP blocks. A small set of neurons in the final layer acts as a control mechanism for harmful responses.

🟡 🤖 Models April 15, 2026 · 1 min read

Google: Gemini Robotics-ER 1.6 Brings Instrument Reading and Spatial Understanding

Google has released Gemini Robotics-ER 1.6 with new instrument reading capabilities and improved spatial and physical understanding. The previous version 1.5 will be shut down on April 30.

🟡 🤖 Models April 14, 2026 · 2 min read

ArXiv: Process Reward Agents — real-time feedback improves AI reasoning in medicine without retraining

Researchers have introduced Process Reward Agents (PRA), a new approach that provides step-by-step feedback during AI reasoning in medical domains. The system works with existing models without retraining and achieves significant results on medical benchmarks.

🟡 🤖 Models April 13, 2026 · 1 min read

ArXiv PRA: 4B model achieves 80.8% on medical benchmark — new SOTA for small scale

Process Reward Agents enable small frozen models (0.5B-8B) to significantly improve medical reasoning without any training — Qwen3-4B achieves a new state-of-the-art of 80.8% on MedQA.

🟡 🤖 Models April 13, 2026 · 1 min read

ArXiv SPPO: Sequence-level PPO solves the credit assignment problem in long reasoning chains

Sequence-Level PPO reformulates LLM reasoning as a contextual bandit problem, achieving the performance of expensive group methods like GRPO with dramatically fewer resources — without multi-sampling.
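Treating the whole reasoning chain as one contextual-bandit action means a single importance ratio and a single advantage per sequence, instead of per-token credit assignment. A sketch of the standard PPO clipped surrogate applied at the sequence level (the log-probabilities and advantage are illustrative values):

```python
import math

def clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate with ONE importance ratio per sequence: the
    whole reasoning chain is a single action, so credit is assigned at the
    sequence level rather than per token."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# The sequence got more likely under the new policy (ratio = e^0.5 ~ 1.65)
# and has a positive advantage; the clip caps the update at ratio 1 + eps.
obj = clipped_objective(logp_new=-9.5, logp_old=-10.0, advantage=1.0)
```

Because the advantage is per sequence, it can come from a single reward signal without the multi-sample group baselines that methods like GRPO require.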

🟡 🤖 Models April 11, 2026 · 2 min read

ArXiv SUPERNOVA: reinforcement learning on natural instructions improves reasoning by 52.8%

A new paper, SUPERNOVA, shows that systematic curation of existing instruction-tuning datasets can significantly improve reasoning in LLMs. Models trained on SUPERNOVA achieve up to a 52.8% relative improvement on the BBEH benchmark.

🟢 🤖 Models April 10, 2026 · 2 min read

Sentence Transformers v5.4 adds support for multimodal embedding and reranker models

Version 5.4 of HuggingFace's Sentence Transformers library introduces multimodal embedding and reranker models. Users can now map text, images, audio, and video into a shared embedding space and perform cross-modal similarity search, unifying retrieval across different content types.