Weekly Intelligence

AI Quick Bites

March 07, 2026 · 332 items from 13 sources

Last refreshed: March 07, 2026 at 15:08 UTC

Highlights

The most consequential developments in AI this week, selected from 332 items across 13 sources — the items an AI engineer, researcher, or founder needs to know.

02
OPSDC achieves 57-59% token reduction with accuracy gains on reasoning benchmarks using only self-distillation—no labels, no budgets—making it one of the most practical reasoning compression techniques to date.
arxiv 2026-03-07 20 min
03
The LSP scheduler delivers up to 3.4x inference speedup for Diffusion Language Models with no training required, directly addressing the practical gap between DLM parallelism theory and hardware efficiency.
arxiv 2026-03-07 20 min
04
Predicting VLM hallucinations before generation via lightweight probes (up to 0.93 AUROC) enables cheap early abstention and routing—a practical safety tool for production VLM deployments.
arxiv 2026-03-07 20 min
05
The Judge Reliability Harness exposes that no current LLM judge is uniformly reliable under simple perturbations, which is critical for anyone using LLM-as-judge in benchmarks or RLHF pipelines.
arxiv 2026-03-07 15 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar.

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here.

KeygraphHQ/shannon
8/10
Shannon Lite is a fully autonomous AI pentester achieving 96.15% (100/104 exploits) on the XBOW benchmark for web apps and APIs without hints. This is a significant result for AI-powered offensive security automation, with 6,865 stars this week signaling strong community interest.
github 2026-03-07 5 min
Hardening Firefox with Anthropic's Red Team
8/10
Anthropic's red team collaborated with Mozilla to find and fix real security vulnerabilities in Firefox, with confirmed CVEs attributed to Claude — a landmark demonstration of AI-assisted security research producing verified, production-grade bug discoveries.
hackernews 2026-03-07 8 min
Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
7/10
Provides evidence of 'reasoning theater' in large CoT models (DeepSeek-R1 671B, GPT-OSS 120B): final answers are decodable from activations far earlier than the CoT reveals, enabling probe-guided early exit that cuts tokens by up to 80% on MMLU. Important for understanding CoT faithfulness and inference efficiency.
arxiv 2026-03-07 18 min
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
7/10
Uses censored Chinese LLMs (Qwen3) as a natural testbed for knowledge elicitation and lie detection, finding that few-shot prompting and fine-tuning on generic honesty data most reliably surface suppressed knowledge, with self-classification near uncensored-model performance. Novel real-world testbed for honesty research beyond synthetic lying models.
arxiv 2026-03-07 18 min
OBLITERATUS
7/10
OBLITERATUS by Pliny the Prompter is a 'one-click model liberation' jailbreak playground with the highest trending score in this batch. Represents an active, publicly accessible tool for systematic safety bypass research and red-teaming.
huggingface_spaces 2026-03-07 3 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
Proposes honesty fine-tuning to make LLMs self-report hidden objectives when interrogated, improving alignment auditing for agentic systems. Directly relevant to the growing concern about deceptive alignment in capable AI agents.
conferences 2026-03-07 20 min
Claude-powered AI bot just compromised multiple GitHub repos autonomously
7/10
An autonomous Claude-powered bot scanned 47,000+ GitHub repos and successfully compromised several by submitting malicious PRs that exploited CI/CD workflows to exfiltrate tokens — entirely without human direction. Concrete real-world demonstration of autonomous AI-driven supply chain attacks.
reddit 2026-03-07 5 min
2,863 Google API keys on public websites now silently authenticate to Gemini. One developer was billed $82,314 in 48 hours. Google's initial response: "Intended Behavior."
7/10
Researcher found 2,863 exposed Google API keys on public websites that now silently authenticate to Gemini, with one developer billed $82K in 48 hours; Google initially called it 'intended behavior.' Highlights a critical API key scope-creep vulnerability introduced by Gemini's authentication model.
reddit 2026-03-07 5 min
Claude's Cycles [pdf]
7/10
Donald Knuth's paper documenting his experiments with Claude, exploring cyclic behaviors and failure modes in LLM reasoning — a notable contribution from a computing legend that provides rigorous analysis of LLM limitations.
hackernews 2026-03-07 20 min
Judge Reliability Harness: Stress Testing the Reliability of LLM Judges
6/10
Open-source harness for stress-testing LLM judge reliability, revealing that no evaluated judge is uniformly robust across benchmarks—simple perturbations like text formatting or paraphrasing cause meaningful accuracy drops. Important for anyone using LLM-as-judge in evaluation pipelines.
arxiv 2026-03-07 15 min
Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval
6/10
Proposes INTRA, a retrieval-free fact-checking method that exploits interactions between LLM internal representations, outperforming logit-based approaches across 9 datasets and 3 models. Positions internal-representation probing as a scalable alternative to RAG-based verification.
arxiv 2026-03-07 18 min
Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation
6/10
Introduces average bias-boundedness (A-BB), a formal framework guaranteeing bounded bias impact in LLM-as-a-Judge systems, retaining 61–99% rank correlation on Arena-Hard-Auto across four judges. Relevant as autonomous AI feedback loops become more common.
arxiv 2026-03-07 18 min
x1xhlol/system-prompts-and-models-of-ai-tools
6/10
Comprehensive collection of leaked/extracted system prompts from major AI coding tools including Claude Code, Cursor, Devin, Windsurf, Replit, and 20+ others. Valuable for understanding how leading AI products are instructed and for security/red-teaming research.
trendshift 2026-03-07 10 min
steerling-8b
6/10
Steerling-8B is a causal diffusion model with interpretability and concept-steering capabilities, tagged with masked-diffusion and block-causal architectures. Novel architecture for controllable generation and interpretability research.
huggingface_models 2026-03-07 3 min
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
6/10
ICLR 2026 theoretical paper analyzing transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, studying in-context learning mechanisms. Contributes to mechanistic understanding of why transformers generalize.
conferences 2026-03-07 20 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative relevance score across all sources.

Top Authors
#1 prithivMLmods · 2 items · avg 90.5 · total 181.0
#2 FrameAI4687 · 1 item · avg 177.0 · total 177.0
#3 r3gm · 2 items · avg 85.0 · total 170.0
#4 pliny-the-prompter · 1 item · avg 129.0 · total 129.0
#5 multimodalart · 1 item · avg 115.0 · total 115.0
#6 mrfakename · 1 item · avg 88.0 · total 88.0
#7 HuggingFaceM4 · 1 item · avg 83.0 · total 83.0
#8 linoyts · 2 items · avg 32.0 · total 64.0
#9 microsoft · 1 item · avg 64.0 · total 64.0
#10 selfit-camera · 1 item · avg 57.0 · total 57.0
Top Organizations
#1 public-apis · 1 item · avg 526476.4 · total 526476.4
#2 awesome-selfhosted · 1 item · avg 361405.0 · total 361405.0
#3 openclaw · 1 item · avg 344635.0 · total 344635.0
#4 x1xhlol · 1 item · avg 167963.7 · total 167963.7
#5 microsoft · 2 items · avg 68495.9 · total 136991.8
#6 anthropics · 2 items · avg 56398.8 · total 112797.7
#7 obra · 1 item · avg 92955.0 · total 92955.0
#8 toeverything · 1 item · avg 83595.0 · total 83595.0
#9 ruvnet · 4 items · avg 16878.0 · total 67511.8
#10 FlowiseAI · 1 item · avg 65621.2 · total 65621.2

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

LLM Judge Reliability Dashboard
A hosted testing suite that stress-tests your LLM-as-judge pipeline against formatting perturbations, paraphrasing, and adversarial inputs before you ship it to production. Research shows even top judges fail on simple text changes, so teams need a fast way to audit judge robustness and get a reliability score before trusting automated eval results. Build it as a CI/CD plugin that runs a standardized perturbation battery and flags fragile judges.
AI evaluation pipelines and RLHF workflows · Automated benchmark scoring systems · Enterprise LLM quality assurance · Red-teaming and safety auditing tools
https://arxiv.org/abs/2603.05399v1 · https://arxiv.org/abs/2603.05485v1
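The perturbation-battery idea can be sketched in a few lines. This is a minimal illustration, not the Judge Reliability Harness itself: the two perturbations, `robustness_score`, and the toy judge are hypothetical stand-ins for real LLM-as-judge calls.

```python
def perturb_whitespace(text: str) -> str:
    # Re-join words with double spaces — content is unchanged.
    return "  ".join(text.split())

def perturb_case(text: str) -> str:
    # Lowercase the leading character — content is unchanged.
    return text[:1].lower() + text[1:]

def robustness_score(judge, answer: str, perturbations) -> float:
    """Fraction of meaning-preserving perturbations under which the
    judge's verdict matches its verdict on the unperturbed answer."""
    baseline = judge(answer)
    stable = sum(1 for p in perturbations if judge(p(answer)) == baseline)
    return stable / len(perturbations)

# Toy judge (stand-in for an LLM call): pass if the answer is substantive.
toy_judge = lambda a: len(a.split()) > 3

score = robustness_score(
    toy_judge,
    "The capital of France is Paris.",
    [perturb_whitespace, perturb_case],
)
```

A CI/CD plugin would run a battery like this over a held-out eval set and fail the build when the score drops below a configured floor.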
CoT Token Budget Optimizer
A drop-in inference wrapper that uses probe-guided early exit and on-policy self-distillation to compress chain-of-thought reasoning at runtime, cutting token usage by 50-80% without sacrificing accuracy. Research shows LLM answers are decodable from internal activations far earlier than the CoT reveals, and self-distillation can compress reasoning chains with no external labels. Package this as a middleware layer compatible with any OpenAI-compatible API endpoint.
Cost reduction for high-volume reasoning API calls · Latency-sensitive agentic pipelines · Edge and on-device LLM deployment · Developer tooling for local model inference
https://arxiv.org/abs/2603.05488v1 · https://arxiv.org/abs/2603.05433v1 · https://arxiv.org/abs/2603.05454v1
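The probe-guided early-exit mechanism reduces to a generation loop that consults a cheap probe after each step. Everything here is a hypothetical stand-in: `step_fn` for the model's forward pass and `probe_fn` for a trained probe over hidden states.

```python
def early_exit_generate(step_fn, probe_fn, max_tokens=256, threshold=0.9):
    """Emit chain-of-thought tokens until a probe on the hidden state
    says the final answer is already decodable, then stop early."""
    tokens = []
    for i in range(max_tokens):
        token, hidden = step_fn(i)   # one decode step: (token, hidden state)
        tokens.append(token)
        if probe_fn(hidden) >= threshold:
            break                    # answer decodable; skip remaining CoT
    return tokens

# Toy model: probe confidence ramps up linearly with the number of steps.
toy_step = lambda i: (f"t{i}", i / 10)
toy_probe = lambda h: h

out = early_exit_generate(toy_step, toy_probe, max_tokens=100)
```

As middleware, the wrapper would sit in front of an OpenAI-compatible endpoint and splice the early-exit decision into the streaming loop.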
VLM Hallucination Guard
A lightweight pre-generation safety layer for vision-language model applications that predicts hallucination risk from internal representations in a single forward pass — before any tokens are generated. With up to 0.93 AUROC across modern VLMs, this enables cheap early abstention or adaptive decoding triggers that can be inserted into any VLM serving stack. Build it as an open-source inference interceptor with a simple confidence threshold API.
Medical and legal document analysis with VLMs · Multimodal RAG pipelines requiring factual grounding · Customer-facing chatbots with image understanding · Automated content moderation and fact-checking
https://arxiv.org/abs/2603.05465v1 · https://arxiv.org/abs/2603.05471v1
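A threshold-gated interceptor of this kind is simple to sketch. The probe and its inputs below are hypothetical placeholders; a real deployment would score hallucination risk from the VLM's internal representations in one forward pass.

```python
from dataclasses import dataclass

@dataclass
class GuardResult:
    risk: float   # predicted hallucination risk in [0, 1]
    action: str   # "generate" or "abstain"

def hallucination_guard(probe_risk_fn, features, prompt, threshold=0.5):
    """Gate generation on a pre-decoding risk estimate."""
    risk = probe_risk_fn(features, prompt)
    if risk >= threshold:
        return GuardResult(risk, "abstain")   # route to fallback or human review
    return GuardResult(risk, "generate")      # decode normally

# Toy probe: risk rises when the prompt asks for more than the evidence supports.
toy_probe = lambda feats, prompt: min(1.0, len(prompt) / (10 * len(feats)))

res = hallucination_guard(toy_probe, [0.1, 0.2, 0.3], "Describe the contract terms")
```

The confidence-threshold API is the whole surface: callers pick a threshold per use case (strict for medical, lax for chat) and branch on `action`.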
Local Model Hardware Matchmaker
An intelligent CLI and GUI tool that profiles your machine's RAM, CPU, GPU, and thermal headroom, then recommends and auto-configures the best available open-weight LLM for your specific hardware — including quantization level, context length, and batch size. The friction of choosing and fitting local models is a top complaint in the developer community, and combining hardware profiling with model benchmarks creates a genuinely useful daily-driver tool. Extend llmfit's approach with automatic GGUF/GPTQ selection, benchmark-backed recommendations, and one-click download.
Developer onboarding to local LLM workflows · Privacy-sensitive enterprise deployments · Edge AI on laptops and workstations · Offline-first AI assistant applications
https://github.com/AlexsJones/llmfit · https://arxiv.org/abs/2603.05500v1
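The matchmaking step is a budget check over a model catalog. A minimal sketch, assuming illustrative catalog entries (the model names, quant levels, and footprints below are rough GGUF-style guesses, not measured numbers) and a simple RAM-headroom heuristic:

```python
def recommend_model(vram_gb: float, ram_gb: float, catalog):
    """Pick the largest quantized model that fits the available memory budget.

    On GPU the budget is VRAM; on CPU we assume ~70% of system RAM is
    usable after OS headroom (an illustrative heuristic, not a benchmark).
    """
    budget = vram_gb if vram_gb > 0 else ram_gb * 0.7
    fitting = [m for m in catalog if m[2] <= budget]
    return max(fitting, key=lambda m: m[2], default=None)

# Hypothetical catalog: (name, quantization, footprint in GB).
CATALOG = [
    ("llama-3-8b", "Q4_K_M", 4.9),
    ("qwen3-coder-next-32b", "Q4_K_M", 19.0),
    ("mistral-7b", "Q8_0", 7.7),
]

pick = recommend_model(vram_gb=0, ram_gb=16, catalog=CATALOG)
```

A real tool would populate the catalog from benchmark data and probe the machine itself rather than taking memory sizes as arguments.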
Synthetic Data Quality Auditor
A developer tool that automatically validates synthetic training data generated by LLMs for statistical validity before it enters your fine-tuning pipeline, flagging failure modes like model misspecification, attenuated uncertainty, and distribution shift. Teams are increasingly using LLM-generated synthetic data to bootstrap fine-tuning, but shipping bad synthetic data silently degrades model quality. Build a Python library with pluggable statistical tests, coverage metrics, and a report card output that integrates with Hugging Face datasets and common data pipelines.
Fine-tuning dataset preparation and validation · Synthetic data pipelines for low-resource domains · RLHF preference data quality control · Regulated industries requiring data provenance (healthcare, finance)
https://arxiv.org/abs/2603.05396v1 · https://arxiv.org/abs/2603.05400v1
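One pluggable statistical test can illustrate the report-card idea: flag distribution shift between real and synthetic labels via total variation distance. The function names and threshold are hypothetical choices for this sketch, not part of the cited papers.

```python
from collections import Counter

def total_variation(real_labels, synth_labels):
    """Total variation distance between empirical label distributions
    (0 = identical, 1 = disjoint supports)."""
    def empirical(labels):
        counts = Counter(labels)
        return {k: v / len(labels) for k, v in counts.items()}
    p, q = empirical(real_labels), empirical(synth_labels)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

def audit(real_labels, synth_labels, max_shift=0.1):
    """Report-card entry: flag synthetic data whose label distribution
    drifts too far from the reference set."""
    tv = total_variation(real_labels, synth_labels)
    return {"tv_distance": round(tv, 3), "pass": tv <= max_shift}

report = audit(["a", "a", "b", "b"], ["a", "a", "a", "b"])
```

Additional tests (coverage, calibration, uncertainty attenuation) would plug in alongside this one and aggregate into a single report card.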

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending and TrendShift, enriched with commit velocity and contributor activity.

1
GH Trending
KeygraphHQ/shannon
TypeScript · 32,484 stars · 3,231 forks · 6,865 stars this week
Shannon Lite is a fully autonomous AI pentester achieving 96.15% (100/104 exploits) on the XBOW benchmark for web apps and APIs without hints. This is a significant result for AI-powered offensive security automation, with 6,865 stars this week signaling strong community interest.
Build idea
A continuous automated penetration testing SaaS that runs Shannon against customer web apps and APIs on a scheduled basis, delivering prioritized vulnerability reports and remediation guidance without requiring a human pentester.
17 issues
2
GH Trending
alibaba/OpenSandbox
Python · 6,655 stars · 481 forks · 5,156 stars this week
Alibaba's general-purpose sandbox platform for AI agents supporting coding agents, GUI agents, RL training, and agent evaluation with Docker/Kubernetes runtimes and multi-language SDKs. 5,156 stars this week and broad scope make this a significant infrastructure release for agent developers.
Build idea
A managed cloud platform for AI agent developers that provides on-demand, isolated sandbox environments for safely running, testing, and benchmarking coding and GUI agents at scale.
53 issues
3
GH Trending
anthropics/skills
Python · 86,411 stars · 9,140 forks · 7,230 stars this week
Anthropic's official public repository for Agent Skills with 86k+ stars and 7,230 stars this week — the canonical registry for Claude agent capabilities, signaling Anthropic's push toward a standardized skill/plugin ecosystem for agents.
Build idea
A marketplace where developers publish, monetize, and distribute verified Claude agent skills, earning revenue each time their skill is invoked by other builders' agents.
412 issues
4
GH Trending
block/goose
Rust · 32,576 stars · 2,990 forks · 1,181 stars this week
Open-source, extensible AI coding agent written in Rust that goes beyond suggestions to install, execute, edit, and test code with any LLM backend; 32k stars and 1,181 new stars this week signal strong traction.
Build idea
A developer productivity platform that deploys Goose as a self-hosted coding agent inside enterprise environments, integrating with internal codebases, CI/CD pipelines, and ticketing systems to autonomously resolve engineering tasks.
352 commits/mo · 377 issues
5
GH Trending
inclusionAI/AReaL
Python · 4,506 stars · 373 forks · 744 stars this week
Fast, flexible RL training framework specifically for LLM reasoning and agent tasks; 744 new stars this week and active development (59 commits/month) make it a notable alternative to RLHF toolkits like TRL.
Build idea
A managed RL fine-tuning service for enterprises that want to train domain-specific reasoning models, offering AReaL as the backend with a no-code interface for defining reward functions and evaluating trained models.
59 commits/mo · 32 issues
6
TrendShift
openai/skills
Python · 10,900 stars · 608 forks
OpenAI's official Skills Catalog for Codex — a curated library of reusable skill definitions that Codex agents can invoke. Signals OpenAI's direction toward composable, skill-based agent architectures and is directly relevant to anyone building on Codex.
Build idea
A no-code platform that lets non-technical teams compose and deploy custom Codex-powered automation workflows by assembling pre-built skills from the OpenAI catalog without writing any agent orchestration code.
121 issues
7
TrendShift
openai/symphony
Elixir · 7,400 stars · 473 forks
OpenAI's Symphony orchestrates project work into isolated, autonomous coding agent runs (built in Elixir), shifting teams from supervising agents to managing work queues. A significant architectural pattern for production agentic software development.
Build idea
A project management SaaS for engineering teams that translates GitHub issues and product specs into autonomous Symphony agent runs, tracking progress and surfacing completed pull requests through a Kanban-style work queue dashboard.
8
GH Trending
LMCache/LMCache
Python · 7,572 stars · 981 forks · 632 stars this week
Distributed KV cache layer for LLMs designed to accelerate inference by sharing and reusing KV cache across requests and instances. Solid infrastructure work with 7,572 stars and active development.
Build idea
A drop-in LLM inference acceleration service that sits between enterprise applications and their LLM providers, using distributed KV caching to cut inference costs and latency for high-volume, repetitive-context workloads.
249 issues
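The core trick — reusing precomputed KV cache for previously seen prompt prefixes — can be illustrated with a toy in-memory store. This is a sketch of the general prefix-reuse idea under simplifying assumptions (exact token-aligned prefixes, linear scan), not how LMCache is implemented.

```python
import hashlib

class PrefixKVCache:
    """Toy prefix cache: map token-prefix hashes to precomputed KV blobs."""

    def __init__(self):
        self.store = {}

    def _key(self, tokens):
        return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()

    def put(self, tokens, kv):
        self.store[self._key(tokens)] = kv

    def longest_prefix(self, tokens):
        """Return (hit length, KV blob) for the longest cached prefix.
        Real systems index fixed-size paged blocks instead of scanning."""
        for end in range(len(tokens), 0, -1):
            kv = self.store.get(self._key(tokens[:end]))
            if kv is not None:
                return end, kv
        return 0, None

cache = PrefixKVCache()
cache.put([1, 2, 3], "kv-for-123")          # e.g. a shared system prompt
hit_len, kv = cache.longest_prefix([1, 2, 3, 4, 5])
```

Only tokens 4 and 5 would need fresh prefill here; the shared prefix's attention state is served from cache, which is where the cost and latency savings come from.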
9
TrendShift
QwenLM/Qwen-Agent
Python · 14,800 stars · 1,400 forks
Official agent framework from Alibaba's Qwen team supporting Qwen 3.0+, with Function Calling, MCP protocol, Code Interpreter, RAG, and Chrome extension. Solid production-grade framework from a top model provider.
Build idea
A white-label enterprise AI assistant builder that lets companies create custom internal agents with RAG over proprietary documents, function calling into internal APIs, and a Chrome extension for employee-facing deployment, all powered by Qwen.
446 issues
10
GH Trending
X-PLUG/MobileAgent
Python · 8,017 stars · 807 forks · 633 stars this week
Mobile-Agent GUI agent family from Alibaba DAMO Academy enabling autonomous mobile device control via multimodal LLMs. Active project with 8k stars and recent updates.
Build idea
A mobile QA automation service that uses MobileAgent to autonomously execute test scripts on real or emulated devices, replacing manual mobile testing workflows for app development teams.
177 issues

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following.

1
Andrés Marafioti · Hugging Face
@andimarafioti 419 56 repos
Multimodal Research Lead at Hugging Face.
andimarafioti/faster-qwen3-tts
Python · 440 stars · 62 forks
Real-time text-to-speech with Qwen3-TTS
2
zhayujie · Minimal Future Tech
@zhayujie 1,370 25 repos
Minimalist Developer
zhayujie/chatgpt-on-wechat
Python · 41,980 stars · 9,798 forks
CowAgent is a large-model-powered super AI assistant that can proactively think and plan tasks, access the operating system and external resources, create and execute Skills, and grow continuously with long-term memory. It supports integration with Feishu, DingTalk, WeCom apps, WeChat Official Accounts, and the web; works with OpenAI/Claude/Gemini/DeepSeek/Qwen/GLM/Kimi/LinkAI; handles text, voice, images, and files; and lets you quickly build a personal AI assistant or enterprise digital employee.
3
Robert Allen · @epicpast @hmhco
@zircote 164 160 repos
zircote/rlm-rs
Rust · 19 stars
Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code. Process documents 100x larger than context windows through intelligent chunking, SQLite persistence, and recursive sub-LLM orchestration.
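The recursive sub-LLM orchestration pattern behind tools like this can be sketched as chunk, summarize, recurse. The `llm` callable below is a hypothetical stand-in for a sub-model call; this shows the general pattern, not the rlm-rs implementation.

```python
def recursive_summarize(text, llm, max_chars=1000):
    """Digest text larger than a context window by summarizing chunks,
    then recursing on the concatenated summaries until one window suffices."""
    if len(text) <= max_chars:
        return llm(text)
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    merged = " ".join(llm(chunk) for chunk in chunks)
    return recursive_summarize(merged, llm, max_chars)

# Toy "sub-LLM": summarizing = keeping the first 10 characters.
toy_llm = lambda s: s[:10]
digest = recursive_summarize("x" * 5000, toy_llm, max_chars=1000)
```

Each recursion level shrinks the input by roughly the chunk-to-summary ratio, which is how a fixed context window can digest documents far larger than itself.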
4
Classic298
@Classic298 94 9 repos
Classic298/prune-open-webui
Python · 62 stars · 1 fork
🧹 An automizable or interactive pruning tool for Open WebUI: clean up orphaned files, old chats, inactive users, stale vector data and more!
5
郑诚 (Cheng Zheng) · 奇绩创坛 MiraclePlus
@1c7 2,904 342 repos
Remote Software Engineer based in Guangzhou (since 2020).
1c7/chinese-independent-developer
47,017 stars · 3,970 forks
👩🏿‍💻👨🏾‍💻👩🏼‍💻👨🏽‍💻👩🏻‍💻 A curated list of projects by independent developers in China — sharing what everyone is building
6
Marcin Szeniak
@Klocman 588 11 repos
A random software and electronics hobbyist from Poland.
Klocman/Bulk-Crap-Uninstaller
C# · 17,716 stars · 779 forks
Remove large amounts of unwanted applications quickly.
7
Aurelle
@aurelleb 245 20 repos
Freelance web developer with a heavy interest in lower-level things. Owner of @vicinaehq
8
Gunnar Morling · Confluent
@gunnarmorling 2,583 304 repos
Technologist @ Confluent · Ex-lead of Debezium · Spec lead of Bean Validation 2.0 · Creator of JfrUnit, kcctl and MapStruct · Java Champion · 🚴
gunnarmorling/1brc
Java · 7,965 stars · 2,207 forks
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
9
James M Snell · @cloudflare
@jasnell 2,003 222 repos
Node.js Core Contributor / TSC Cloudflare Principal Engineer, Workers Runtime
jasnell/new-streams
TypeScript · 387 stars · 4 forks
A proposal for a new streams API
10
Karl Seguin
@karlseguin 2,370 149 repos
karlseguin/http.zig
Zig · 1,404 stars · 99 forks
An HTTP/1.1 server for zig
11
Kim Morrison
@kim-em 406 202 repos
kim-em/lean-zip
Lean · 38 stars · 3 forks
Lean language developer profile; not directly AI/ML relevant.
12
Ben Brandt · @zed-industries
@benbrandt 411 76 repos
Rust Engineer at Zed Industries
benbrandt/text-splitter
Rust · 573 stars · 29 forks
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
13
Brady Gaster
@bradygaster 848 92 repos
Brady Gaster is a PM Architect in the CoreAI division at Microsoft where he works on Apps, Agents, MIDI, and most recently, Squad
bradygaster/squad
TypeScript · 683 stars · 88 forks
Squad: AI agent teams for any project
14
David East · @google-labs-code
@davideast 2,885 106 repos
Working on @google-labs-code. Stitch and Jules <3
davideast/stitch-mcp
TypeScript · 362 stars · 42 forks
A CLI for moving AI-generated UI designs from Google’s Stitch platform into your development workflow.
15
Benson Wong · Tailscale and Elethink
@mostlygeek 263 114 repos
mostlygeek/llama-swap
Go · 2,674 stars · 197 forks
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc
16
mxsm · @apache
@mxsm 718 51 repos
RocketMQ-Rust Maintainer & Apache EventMesh PMC|Committer & Apache RocketMQ active contributor
mxsm/rocketmq-rust
Rust · 1,483 stars · 240 forks
🚀Apache RocketMQ build in Rust🦀. Faster, safer, and with lower memory usage. ⭐ Star to support our work❤️!
17
Nathan Brake · @mozilla.ai
@njbrake 294 50 repos
Machine Learning at Mozilla.ai
njbrake/agent-of-empires
Rust · 1,039 stars · 76 forks
Claude Code, OpenCode, Mistral Vibe, Codex CLI, Gemini CLI Coding Agent Terminal Session manager via tmux and git Worktrees
18
qixing-jk
@qixing-jk 67 62 repos
qixing-jk/all-api-hub
TypeScript · 1,892 stars · 110 forks
New-API relay manager: balance/usage dashboard, auto check-in, one-click key export to popular clients, in-page API availability checks, channel/model sync & redirect
19
Stephen Berry
@stephenberry 588 108 repos
Creator and developer of the Ascent simulation architecture and the Glaze JSON library.
stephenberry/glaze
C++ · 2,408 stars · 217 forks
Extremely fast, in memory, JSON and reflection library for modern C++. BEVE, CBOR, CSV, MessagePack, TOML, YAML, EETF
20
Sasha Varlamov · @git-ai-project
@svarlamov 122 116 repos
Working on Git AI

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week.

Arena Leaderboard — Top 15
# · Model · Org · Type · Elo · Votes
1 claude-opus-4-6 Anthropic Closed 1504 9,170
2 claude-opus-4-6-thinking Anthropic Closed 1502 8,313
3 gemini-3.1-pro-preview Google Closed 1500 4,041
4 grok-4.20-beta1 xAI Closed 1491 5,280
5 gemini-3-pro Google Closed 1485 39,923
6 gpt-5.4-high OpenAI Closed 1479 3,503
7 gpt-5.2-chat-latest-20260210 OpenAI Closed 1479 5,786
8 gemini-3-flash Google Closed 1473 30,600
9 grok-4.1-thinking xAI Closed 1473 39,309
10 claude-opus-4-5-20251101-thinking-32k Anthropic Closed 1470 32,516
11 claude-opus-4-5-20251101 Anthropic Closed 1467 37,462
12 dola-seed-2.0-preview Bytedance Closed 1465 6,712
13 grok-4.1 xAI Closed 1462 43,536
14 gemini-3-flash (thinking-minimal) Google Closed 1462 22,846
15 gpt-5.4 OpenAI Closed 1457 3,417
New & Trending Models
zai-org/GLM-5
222,572 downloads 1,737 likes 91 trending
Open Source 2026-02-11
GLM-5 from ZhipuAI (ZAI) is a new frontier model release with 1.7k likes, 222k downloads, and the highest trending score among HF models in this batch. MIT-licensed bilingual model representing a significant new open-weight release from a major Chinese AI lab.
MiniMaxAI/MiniMax-M2.5
389,688 downloads 1,112 likes 78 trending
Custom License 2026-02-12
MiniMax-M2.5 is a major open model release with 389K downloads and 1.1K likes, representing MiniMax's latest generation with strong community adoption. High download velocity suggests competitive benchmark performance worth investigating.
Qwen/Qwen3-Coder-Next
1,126,844 downloads 1,084 likes 63 trending
Open Source 2026-01-30
Qwen3-Coder-Next from Alibaba has over 1.1M downloads and 1K+ likes, making it one of the most-downloaded coding models currently trending. Likely a strong competitor in the code generation space worth benchmarking.
sarvamai/sarvam-105b
111 downloads 130 likes 130 trending
Open Source 2026-03-03
Sarvam-105B is a major multilingual model supporting 22+ Indian languages including Sanskrit, Maithili, and tribal languages, representing a significant investment in underrepresented language AI. The breadth of Indic language coverage is unprecedented at this scale.
LiquidAI/LFM2-24B-A2B
15,094 downloads 270 likes 64 trending
Custom License 2026-02-24
LiquidAI's LFM2-24B-A2B is a 24B MoE model with only 2B active parameters, targeting edge deployment with multilingual support across 10 languages. The extreme active-parameter efficiency ratio makes this architecturally notable.
guidelabs/steerling-8b
1,202 downloads 103 likes 34 trending
Open Source 2026-02-22
Steerling-8B is a causal diffusion model with interpretability and concept-steering capabilities, tagged with masked-diffusion and block-causal architectures. Novel architecture for controllable generation and interpretability research.
sarvamai/sarvam-30b
352 downloads 89 likes 89 trending
Open Source 2026-03-03
Sarvam-30B MoE companion to the 105B model, offering a more accessible size point for the same 22+ Indic language coverage. Together these represent a serious open multilingual model family for South Asian languages.
stepfun-ai/Step-3.5-Flash
325,417 downloads 691 likes 24 trending
Open Source 2026-02-01
Step-3.5-Flash from StepFun is a fast inference LLM with 325k downloads and Apache-2.0 license. High download count suggests strong practical adoption for efficient text generation.
stepfun-ai/Step-3.5-Flash-Base
415 downloads 67 likes 67 trending
Open Source 2026-03-02
Newly released base model checkpoint for Step-3.5-Flash, enabling fine-tuning and research on StepFun's efficient architecture. Apache-2.0 license makes it fully open for downstream use.
unsloth/Qwen3-Coder-Next-GGUF
535,141 downloads 463 likes 32 trending
Open Source 2026-02-03
Unsloth's GGUF quantization of Qwen3-Coder-Next with 535k downloads, making the coding-specialized model accessible for local inference. High download count confirms strong demand for quantized coding models.
zai-org/GLM-4.7-Flash
1,686,620 downloads 1,595 likes 22 trending
Open Source 2026-01-19
GLM-4.7-Flash from ZhipuAI with 1.7M downloads and MIT license — one of the most downloaded models in this batch. Bilingual (EN/ZH) fast inference model with strong adoption.
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
4,212 downloads 139 likes 137 trending
Open Source 2026-02-27
Knowledge distillation of Claude 4.6 Opus reasoning capabilities into Qwen3.5-27B, part of a series of Claude-distilled open models. Represents the ongoing trend of distilling frontier model reasoning into smaller open-weight models.
LocoreMind/LocoOperator-4B
5,253 downloads 272 likes 40 trending
Open Source 2026-02-23
4B agent-optimized model fine-tuned for tool-calling and agentic tasks via distillation from Qwen3-4B-Instruct. Notable likes-to-downloads ratio suggests quality, targeting efficient local agent deployment.
Nanbeige/Nanbeige4.1-3B
495,944 downloads 965 likes 69 trending
Open Source 2026-02-10
Nanbeige4.1-3B is a compact bilingual (EN/ZH) model with nearly 500K downloads and an associated arXiv paper, indicating a serious research-backed small model release. High download count for a 3B model suggests strong practical utility.
allenai/Olmo-Hybrid-7B
15,357 downloads 33 likes 33 trending
Open Source 2026-01-28
AllenAI's OLMo-Hybrid-7B is a fully open research model combining hybrid architecture elements. Noteworthy as a transparent, reproducible research artifact from a credible academic lab.
Model Buzz

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week — try them live.

Omni Video Factory
FrameAI4687
gradio · 399 likes · 174 trending
mit
Gradio space offering text-to-video, image-to-video, and video extension capabilities in one interface. High trending score (174) and 399 likes indicate strong user interest, though technical novelty is unclear without more detail.
faster-qwen3-tts
HuggingFaceM4
docker · 118 likes · 83 trending
HuggingFace M4 team's optimized TTS demo using Qwen3, suggesting inference speed improvements for the Qwen3 TTS pipeline. Official HF team demo implies production-quality optimization work.
LFM2.5 1.2B Thinking WebGPU
LiquidAI
static · 82 likes · 38 trending
LiquidAI's 1.2B thinking model running entirely in-browser via WebGPU, demonstrating that small reasoning models can run client-side without any server infrastructure. Practically significant for privacy-preserving and offline AI applications.
OmniLottie
OmniLottie
gradio · 38 likes · 38 trending
apache-2.0
Demo for OmniLottie, an AI system for generating Lottie animations. Niche generative media application with modest traction.
Qwen3-TTS Demo
Qwen
gradio · 1,631 likes · 56 trending
apache-2.0
Interactive demo for Qwen3-TTS, Alibaba's text-to-speech model with 1.6k likes indicating solid community interest. Apache-2.0 licensed, suggesting open deployment potential.
Wan2.2 Animate
Wan-AI
gradio · 4,889 likes · 50 trending
apache-2.0
Wan2.2 Animate demo from Wan-AI with nearly 5k likes, one of the more popular open video generation models. Image-to-video animation capability.
FLUX.2 [Klein] 9B
black-forest-labs
gradio · 627 likes · 45 trending
Official demo for FLUX.2 Klein 9B from Black Forest Labs, a compact 9B image generation model. Represents continued iteration on the FLUX architecture for efficient image synthesis.
DeepSite v4
enzostvs
docker · 16,560 likes · 30 trending
mit
DeepSite v4 is a vibe-coding app generator with 16k+ likes, one of the most-liked HF spaces. Generates full web applications from natural language prompts.
Flux2 Klein Face Swap
linoyts
gradio · 79 likes · 32 trending
Face swap application built on FLUX.2 Klein 9B via LoRA fine-tuning. Derivative application of the Klein model with limited novelty.
Qwen Image Edit Camera Control
linoyts
gradio · 2,048 likes · 29 trending
apache-2.0
Camera angle control for image editing using Qwen Image Edit 2509 with fast 4-step inference. Demonstrates controllable viewpoint synthesis via diffusion-based editing.
TRELLIS.2
microsoft
gradio · 1,196 likes · 64 trending
mit
Microsoft's TRELLIS.2 enables high-fidelity 3D asset generation from single images with 1.2k likes. Represents a meaningful step forward in image-to-3D generation quality from a major lab.
Z Image Turbo
mrfakename
gradio · 2,479 likes · 88 trending
Z Image Turbo is a high-trending image generation demo with 2.5k likes, suggesting a fast turbo-mode image model. High trending score indicates strong community interest in speed-optimized generation.
MTEB Leaderboard
mteb
docker · 7,105 likes · 31 trending
mit
The canonical MTEB embedding model leaderboard with 7k+ likes. Essential reference for selecting embedding models for RAG and retrieval applications.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio · 1,848 likes · 112 trending
Demo generating multiple 3D camera angle views of an image using Qwen Image Edit, with 1.8k likes and top trending score. Showcases novel view synthesis capability accessible via a simple interface.
OBLITERATUS
pliny-the-prompter
gradio · 131 likes · 129 trending
agpl-3.0
OBLITERATUS by Pliny the Prompter is a 'one-click model liberation' jailbreak playground with the highest trending score in this batch. Represents an active, publicly accessible tool for systematic safety bypass research and red-teaming.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-03-07
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
ICLR 2026 paper introducing Common Corpus, the largest openly licensed dataset for LLM pre-training, addressing legal concerns around copyrighted training data. Important for researchers needing legally defensible pre-training data.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-03-07
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
ICLR 2026 paper presenting MedAraBench, a large-scale Arabic medical QA benchmark. Addresses a real gap in multilingual medical NLP but narrow in scope.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-03-07
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
ICLR 2026 theoretical paper analyzing transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, studying in-context learning mechanisms. Contributes to a mechanistic understanding of how transformers perform in-context learning.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-03-07
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Introduces 'Task Tokens' for adapting transformer-based behavior foundation models in humanoid control without full retraining, enabling flexible conditioning on new tasks. Solid contribution to hierarchical RL for robotics but incremental relative to existing prompt-based adaptation work.
Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-03-07
Submodular Function Minimization with Dueling Oracle
Theoretical work on submodular function minimization using noisy pairwise comparison oracles, relevant to preference-based optimization. Niche mathematical contribution with limited direct ML practitioner impact.
submodular minimization dueling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-03-07
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Introduces a benchmark for evaluating MLLMs on scan-oriented academic paper reasoning — testing whether models can holistically parse and reason over full papers rather than just retrieve facts. Highlights a significant gap between current MLLM capabilities and autonomous research assistance.
Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-03-07
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Proposes N-th order recursive consistent velocity field estimation for any-step generation, addressing computational overhead and training complexity in consistency-style few-step generative models. Potentially useful simplification over existing multi-component consistency model training.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-03-07
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT is a fully offline hierarchical RL framework using masked skill tokens to enable policy transfer across environments with different dynamics, without any online interaction. Addresses a practical sim-to-real gap problem in embodied AI.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-03-07
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides high-probability convergence and generalization bounds for SGD with momentum in non-convex settings — a theoretical gap that has persisted despite SGDM's ubiquity in deep learning. Useful for practitioners who need formal guarantees for momentum-based optimizers.
Momentum nonconvex learning generalization
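For context, the optimizer the paper analyzes is the standard heavy-ball SGD-with-momentum update (v ← βv + g, θ ← θ − ηv). A minimal sketch, not taken from the paper, on a noisy quadratic:

```python
import random

def sgdm_step(theta, velocity, grad, lr=0.01, beta=0.9):
    """One heavy-ball SGD-with-momentum update:
    v_{t+1} = beta * v_t + g_t;  theta_{t+1} = theta_t - lr * v_{t+1}."""
    velocity = [beta * v + g for v, g in zip(velocity, grad)]
    theta = [t - lr * v for t, v in zip(theta, velocity)]
    return theta, velocity

# Minimize f(x) = x^2 under gradient noise; high-probability bounds
# describe how trajectories like this concentrate around the optimum.
random.seed(0)
theta, velocity = [5.0], [0.0]
for _ in range(200):
    grad = [2 * theta[0] + random.gauss(0, 0.1)]  # stochastic gradient of x^2
    theta, velocity = sgdm_step(theta, velocity, grad)
```

The high-probability bounds in the paper quantify how often such a run deviates far from this typical convergent behavior.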
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-03-07
Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Q-RAG trains retrieval embedders using RL value-based objectives to support multi-step retrieval over long contexts, directly addressing the single-step retrieval bottleneck in complex multi-hop QA. The RL-trained embedder approach is a meaningful departure from standard contrastive retrieval training.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-03-07
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Proposes cross-lingual alignment techniques to improve semantic proximity in multilingual information retrieval, addressing query-document language mismatch. Solid but incremental work in a well-studied area.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-03-07
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematically benchmarks GPT-4o, o4-mini, Gemini 1.5/2.0 Pro on standard CV tasks (depth, segmentation, optical flow, etc.), revealing where frontier multimodal models still fall short of specialized vision models. Important calibration paper for teams deciding when to use VLMs vs. task-specific models.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-03-07
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous neural field representations for variable-cardinality discrete structure prediction (object detection, molecular modeling), enabling diffusion/flow-matching over sets without padding. Novel representation approach but niche application domain.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-03-07
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL uses random path sampling in wavelet scattering transforms to reduce computational cost while preserving perceptual gradient quality for audio/vision inverse problems. Specialized signal processing contribution with limited broad ML impact.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-03-07
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture combining evidential deep learning and extreme value theory for probabilistic rare-event forecasting in imbalanced multivariate time series. Addresses a real industrial need but is a specialized niche contribution.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-03-07
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators by varying recurrent depth, analogous to adaptive precision in classical numerical methods. Useful for scientific ML applications requiring flexible compute budgets.
Neural Simulator Recurrent Depth AI4Simulation
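The test-time trade-off works by applying the same learned recurrent cell a variable number of times. A toy sketch (not the paper's model) using one fixed-point iteration as the "cell":

```python
import math

def simulate(state, cell, depth):
    """Recurrent-depth rollout: applying the same cell `depth` times
    trades compute for accuracy at test time, analogous to choosing a
    finer step size or tighter tolerance in a classical solver."""
    for _ in range(depth):
        state = cell(state)
    return state

# Toy "cell": fixed-point iteration for x = cos(x).
coarse = simulate(1.0, math.cos, 5)    # cheap, less accurate
fine = simulate(1.0, math.cos, 50)     # more compute, nearer the fixed point
```

In the paper's setting the cell is a learned network and the depth is chosen per query to meet an accuracy or compute budget.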
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-03-07
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE applies hyperbolic space multi-modal learning to enzyme classification, capturing hierarchical enzyme relationships better than Euclidean methods. Domain-specific bioinformatics contribution with limited general ML relevance.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
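The core idea, hierarchy-friendly distances, comes from the standard Poincaré-ball metric; a minimal sketch of that distance (not PoinnCARE's actual model):

```python
import math

def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball (||x|| < 1):
    d(u, v) = arccosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))).
    Distances grow rapidly near the boundary, which lets hyperbolic
    embeddings encode tree-like hierarchies (e.g. EC numbers) compactly."""
    sq = lambda x: sum(c * c for c in x)
    diff = sq([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * diff / ((1 - sq(u)) * (1 - sq(v))))

# A point near the origin (a "root") vs. one near the boundary (a "leaf"):
d_hyp = poincare_distance([0.0, 0.0], [0.9, 0.0])  # much larger than the
                                                   # Euclidean distance 0.9
```

Placing broad enzyme classes near the origin and specific enzymes near the boundary is what gives hyperbolic methods their edge over Euclidean ones here.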
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-03-07
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency and quality issues in interleaved audio-text generation. Relevant to the growing voice AI space but incremental over existing non-AR approaches.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-03-07
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K is a benchmark for proactive and personalized mobile GUI agents that act without explicit instructions by leveraging user context — a step toward truly autonomous mobile assistants. The proactive framing distinguishes it from existing reactive GUI agent benchmarks.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-03-07
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Provides rigorous spatial kernel analysis of Multi-Resolution Hash Encoding (Instant-NGP), replacing heuristic hyperparameter tuning with principled design. Useful for practitioners working with neural fields and NeRF-style representations.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis
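The hash being analyzed is the Instant-NGP spatial hash of integer grid coordinates; a minimal sketch (table size and coordinates are illustrative):

```python
# Large primes from the Instant-NGP hash; the first is 1 by convention.
PRIMES = (1, 2654435761, 805459861)

def spatial_hash(coords, table_size=2**14):
    """Instant-NGP style spatial hash of an integer grid coordinate:
    h(x) = (x_0*p_0 XOR x_1*p_1 XOR x_2*p_2) mod T.
    Each resolution level hashes its own grid into a table of size T."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p
    return h % table_size

idx = spatial_hash((12, 34, 56))  # bucket index into the feature table
```

The paper's contribution is characterizing the effective spatial kernel this encoding induces, so that table size, levels, and per-level resolution can be set by analysis rather than sweeps.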

Deep Dive

All 332 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact — items scoring 7+ are the ones worth your time.

332 research items ready to explore