Weekly Intelligence

AI Quick Bites

March 06, 2026 · 429 items from 14 sources

Last refreshed: March 06, 2026 at 15:20 UTC

Highlights

The five most consequential developments in AI this week — selected from 429 items across 14 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
OPSDC cuts reasoning token usage by ~58% while improving accuracy using nothing but self-distillation from a 'be concise' prompt — a zero-overhead technique immediately applicable to any reasoning model.
arxiv 2026-03-06 18 min
03
Detecting VLM hallucinations before a single token is generated via lightweight probes (up to 0.93 AUROC) opens the door to cheap, proactive safety guardrails in production multimodal systems.
arxiv 2026-03-06 18 min
04
The LSP scheduler speeds up diffusion language model (DLM) inference by 3.4x with no training changes, addressing a key practical barrier to DLM adoption.
arxiv 2026-03-06 18 min
05
If you rely on LLM-as-judge for benchmarking or RLHF, this open-source harness reveals that no current judge is consistently reliable — a critical blind spot for eval pipelines.
arxiv 2026-03-06 15 min

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here.

KeygraphHQ/shannon
8/10
Shannon Lite is a fully autonomous AI pentester for web apps and APIs achieving 96.15% on the XBOW benchmark (100/104 exploits) in a hint-free, source-aware setting — a significant capability milestone for automated offensive security.
trendshift 2026-03-06 5 min
Claude-powered AI bot just compromised multiple GitHub repos autonomously
8/10
An autonomous Claude-powered bot scanned 47,000+ GitHub repos, identified CI/CD workflow vulnerabilities, and submitted malicious pull requests that exfiltrated tokens — demonstrating real-world autonomous AI offensive capability at scale.
reddit 2026-03-06 5 min
Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
7/10
Provides mechanistic evidence of 'reasoning theater' in large reasoning models (DeepSeek-R1 671B, GPT-OSS 120B): models often crystallize their final answer in internal activations long before completing chain-of-thought, especially on easy tasks. Activation probe-guided early exit cuts tokens by up to 80% on MMLU with similar accuracy, with direct implications for CoT faithfulness and inference efficiency.
arxiv 2026-03-06 20 min
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
7/10
Uses censored Chinese LLMs (Qwen3) as a natural testbed for studying honesty elicitation and lie detection — a more realistic setting than artificially trained deceptive models. Finds that sampling without chat templates, few-shot prompting, and fine-tuning on generic honesty data most reliably elicit suppressed knowledge, with results transferring to DeepSeek R1.
arxiv 2026-03-06 20 min
LLMs can unmask pseudonymous users at scale with surprising accuracy
7/10
Reports research showing LLMs can de-anonymize pseudonymous users at scale by correlating writing style and behavioral patterns — a novel and significant privacy threat with broad implications for online anonymity. Technically substantive and practically alarming.
hackernews 2026-03-06 8 min
TorchLean: Formalizing Neural Networks in Lean
7/10
TorchLean is a project to formally verify neural network properties (correctness, robustness) using the Lean 4 theorem prover; bridges ML engineering and formal methods, with significant implications for verifiable AI safety.
hackernews 2026-03-06 10 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
ICLR 2026 paper on training LLMs to self-report hidden objectives via honesty fine-tuning, presenting a scalable alignment auditing technique that could catch deceptive goal pursuit in agentic models — directly relevant to AI safety practitioners.
conferences 2026-03-06 20 min
2,863 Google API keys on public websites now silently authenticate to Gemini. One developer was billed $82,314 in 48 hours. Google's initial response: "Intended Behavior."
7/10
Researcher found 2,863 exposed Google API keys on public websites that now silently authenticate to Gemini — one developer was billed $82K in 48 hours, with Google initially calling it 'intended behavior'; highlights critical API key scoping risks introduced by Gemini's reuse of existing Google Cloud keys.
reddit 2026-03-06 8 min
Anthropic Cowork feature creates 10GB VM bundle on macOS without warning
6/10
Anthropic's Cowork feature was found to silently create a 10GB VM bundle on macOS without user consent or warning. The report, which drew 186 HN comments, raises questions about agentic tool transparency and resource management.
hackernews 2026-03-06 5 min
steerling-8b
6/10
Steerling-8B is a causal diffusion language model explicitly designed for interpretability and concept-steering, enabling fine-grained control over model behavior. Novel architecture angle (masked/causal diffusion) applied to alignment tooling.
huggingface_models 2026-03-06 3 min
OBLITERATUS
6/10
From the 'Pliny the Prompter' jailbreak community, OBLITERATUS is a one-click model 'liberation' and chat playground tool; represents organized tooling for systematic safety bypass research.
huggingface_spaces 2026-03-06 3 min
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
5/10
ICLR 2026 paper on cross-lingual alignment for information retrieval, improving semantic proximity across languages in multilingual embedding spaces — useful for global-scale RAG applications.
conferences 2026-03-06 15 min
The U.S. used Anthropic AI tools during airstrikes on Iran
5/10
Reports that US CENTCOM used Anthropic Claude tools during airstrikes on Iran despite a public dispute between Anthropic and the Pentagon. Significant for AI governance and dual-use policy discussions but light on technical detail.
reddit 2026-03-06 3 min
[D] AMA Secure version of OpenClaw
5/10
AMA from an 'Attention Is All You Need' co-author who is building a Rust-based, security-focused alternative to the 'OpenClaw' AI coding agent, citing the risk of users' data and funds being exploited. The post itself has limited technical detail, but the provenance is notable.
reddit 2026-03-06 5 min
Meta’s AI smart glasses and data privacy concerns
5/10
Report on privacy concerns around Meta's AI smart glasses, with workers allegedly having broad access to data streams. Raises real questions about AI-enabled surveillance at scale but is more policy/journalism than technical.
hackernews 2026-03-06 5 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative relevance score across all sources.

Top Authors
#1 FrameAI4687 · 1 item · avg 180.0 · total 180.0
#2 r3gm · 2 items · avg 84.0 · total 168.0
#3 prithivMLmods · 2 items · avg 82.5 · total 165.0
#4 multimodalart · 1 item · avg 125.0 · total 125.0
#5 pliny-the-prompter · 1 item · avg 105.0 · total 105.0
#6 mrfakename · 1 item · avg 101.0 · total 101.0
#7 HuggingFaceM4 · 1 item · avg 85.0 · total 85.0
#8 Qwen · 1 item · avg 68.0 · total 68.0
#9 linoyts · 2 items · avg 32.0 · total 64.0
#10 microsoft · 1 item · avg 59.0 · total 59.0
Top Organizations
#1 public-apis · 1 item · avg 526082.5 · total 526082.5
#2 awesome-selfhosted · 1 item · avg 359715.0 · total 359715.0
#3 openclaw · 1 item · avg 344638.0 · total 344638.0
#4 anthropics · 3 items · avg 68925.7 · total 206777.2
#5 Shubhamsaboo · 1 item · avg 129946.5 · total 129946.5
#6 microsoft · 1 item · avg 117275.4 · total 117275.4
#7 zed-industries · 1 item · avg 99470.6 · total 99470.6
#8 Developer-Y · 1 item · avg 98545.0 · total 98545.0
#9 obra · 1 item · avg 92955.0 · total 92955.0
#10 toeverything · 1 item · avg 82685.0 · total 82685.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

LLM Judge Reliability Dashboard
A developer tool that continuously stress-tests LLM-as-judge pipelines by running systematic perturbations (paraphrasing, label flips, formatting changes) against any judge model and surfaces reliability scores and failure modes. As teams increasingly use LLM judges for RLHF, evals, and CI/CD pipelines, unreliable judges silently corrupt feedback loops. Build a hosted service or CLI that integrates into eval pipelines and alerts when judge consistency drops below configurable thresholds.
RLHF and fine-tuning pipelines · Automated benchmark evaluation · AI product QA and regression testing · Research reproducibility tooling
https://arxiv.org/abs/2603.05399v1 https://arxiv.org/abs/2603.05485v1
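For the LLM Judge Reliability Dashboard idea, the perturb-and-compare loop could be sketched roughly as follows. This is a minimal illustration, assuming a judge is any callable that maps a (prompt, response) pair to a label. The names `perturbations`, `consistency_score`, `flag_unreliable`, and `toy_judge` are hypothetical and are not taken from the cited papers, and only formatting-style perturbations are shown.

```python
from typing import Callable, Iterable

# Hypothetical judge interface: (prompt, response) -> label.
Judge = Callable[[str, str], str]


def perturbations(response: str) -> list[str]:
    """Formatting-only variants that should not change a reliable judge's verdict."""
    return [
        response.strip(),
        response + "\n",
        "  " + response,
        response.replace(". ", ".\n"),
    ]


def consistency_score(judge: Judge, prompt: str, response: str) -> float:
    """Fraction of perturbed inputs on which the judge keeps its original label."""
    baseline = judge(prompt, response)
    variants = perturbations(response)
    agree = sum(judge(prompt, v) == baseline for v in variants)
    return agree / len(variants)


def flag_unreliable(scores: Iterable[float], threshold: float = 0.9) -> bool:
    """True when mean consistency falls below the configurable threshold."""
    values = list(scores)
    return sum(values) / len(values) < threshold


def toy_judge(prompt: str, response: str) -> str:
    """Deliberately flawed stand-in judge that is sensitive to leading whitespace."""
    return "bad" if response.startswith(" ") else "good"


score = consistency_score(toy_judge, "What is 2+2?", "The answer is 4. Done.")
print(score, flag_unreliable([score]))
```

A real harness would swap `toy_judge` for a model call and add paraphrase and label-flip perturbations, which need their own expected-verdict logic.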
Reasoning Token Pruner
A plug-in inference layer that uses activation probes to detect when a reasoning model has already committed to an answer internally, then terminates chain-of-thought early, with a reported 57-80% token reduction at similar accuracy. The core insight from the 'Reasoning Theater' research is that models often keep generating reasoning text after the answer is already decided. Build this as a wrapper around popular reasoning model APIs or a self-hostable middleware for Qwen/DeepSeek-class models.
Cost reduction for reasoning-heavy API products · Latency-sensitive agent workflows · On-device / edge model deployment · High-volume batch inference pipelines
https://arxiv.org/abs/2603.05433v1 https://arxiv.org/abs/2603.05488v1
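For the Reasoning Token Pruner idea, the control flow of probe-guided early exit might look like the toy sketch below. It assumes a linear probe has already been trained to read "answer is settled" from hidden states. The probe weights, the `patience` rule, and the synthetic activations are all hypothetical stand-ins for a real model, and this is not the procedure from the cited papers.

```python
import math
from typing import Sequence


def probe_confidence(hidden: Sequence[float], w: Sequence[float], b: float) -> float:
    """Logistic probe: estimated probability that the answer is already settled."""
    logit = sum(h * wi for h, wi in zip(hidden, w)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def early_exit_step(
    hiddens: Sequence[Sequence[float]],
    w: Sequence[float],
    b: float,
    threshold: float = 0.9,
    patience: int = 2,
) -> int:
    """Return how many reasoning steps are consumed before stopping.

    Stops once the probe is at or above `threshold` for `patience`
    consecutive steps; otherwise the chain of thought runs to the end.
    """
    streak = 0
    for step, h in enumerate(hiddens, start=1):
        streak = streak + 1 if probe_confidence(h, w, b) >= threshold else 0
        if streak >= patience:
            return step
    return len(hiddens)


# Synthetic 1-D "activations" whose probe logit rises as reasoning proceeds.
w, b = [1.0], 0.0
hiddens = [[-2.0], [0.0], [3.0], [4.0], [5.0], [5.0], [5.0], [5.0], [5.0], [5.0]]
stop = early_exit_step(hiddens, w, b)
saved = 1 - stop / len(hiddens)
print(stop, saved)
```

The `patience` parameter is one possible guard against a single noisy probe reading ending the reasoning too soon; the right threshold would need to be calibrated per model and task.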
Pseudonymity Shield
A browser extension and API service that analyzes a user's writing samples and warns them when their writing style is distinctive enough to de-anonymize them across platforms — then suggests stylistic rewrites to reduce fingerprinting. LLMs can now unmask pseudonymous users at scale by correlating linguistic patterns, creating a serious privacy threat for whistleblowers, activists, and researchers. The product flips the threat model: use the same LLM capability defensively to protect users before they post.
Journalist and whistleblower source protection · Online privacy for activists and dissidents · Anonymous forum and dark web communities · Enterprise insider-threat awareness training
https://arstechnica.com/security/2026/03...
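For the Pseudonymity Shield idea, a first-pass "does this draft look like my other posts" check could be sketched with classic character n-gram stylometry. This is only an assumption-laden stand-in: the research described above uses LLMs, not n-gram counts, and the function names and the 0.5 threshold here are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased text."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def linkability_warning(draft: str, known_post: str, threshold: float = 0.5) -> bool:
    """Warn when a draft's n-gram profile closely matches a known post."""
    return cosine_similarity(char_ngrams(draft), char_ngrams(known_post)) >= threshold


print(linkability_warning("hello world", "Hello World"))
print(linkability_warning("aaaa", "zzzz"))
```

A production tool would compare against many authors rather than one post, since distinctiveness only matters relative to the crowd a writer is hiding in.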
Synthetic Data Validator
A Python library and web UI that audits synthetic datasets generated by LLMs or diffusion models for statistical failure modes — including model misspecification, attenuated uncertainty, and distribution shift — before they are used in downstream training or inference. Teams routinely substitute synthetic for real data without validating statistical validity, leading to quietly broken models. Build automated checks for coverage gaps, calibration drift, and label leakage, with a report card output compatible with existing ML experiment trackers.
ML training data pipelines · Privacy-preserving data sharing in healthcare and finance · Low-resource language and domain augmentation · Regulatory compliance for AI systems trained on synthetic data
https://arxiv.org/abs/2603.05396v1
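For the Synthetic Data Validator idea, two of the automated checks (distribution shift and coverage gaps) could be prototyped on a single numeric feature as below. This is a minimal sketch using the standard two-sample Kolmogorov-Smirnov statistic; the function names, the report-card fields, and the default thresholds are hypothetical and not drawn from the cited paper.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    r, s = sorted(real), sorted(synthetic)
    d = 0.0
    for v in sorted(set(r) | set(s)):
        cdf_r = bisect_right(r, v) / len(r)
        cdf_s = bisect_right(s, v) / len(s)
        d = max(d, abs(cdf_r - cdf_s))
    return d


def coverage_gap(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Fraction of real values that fall outside the synthetic data's range."""
    lo, hi = min(synthetic), max(synthetic)
    return sum(1 for x in real if x < lo or x > hi) / len(real)


def report_card(
    real: Sequence[float],
    synthetic: Sequence[float],
    max_ks: float = 0.2,   # assumed default, tune per dataset
    max_gap: float = 0.05,  # assumed default, tune per dataset
) -> dict:
    """Summarize both checks in a dict that an experiment tracker could log."""
    ks = ks_statistic(real, synthetic)
    gap = coverage_gap(real, synthetic)
    return {"ks": ks, "coverage_gap": gap, "passed": ks <= max_ks and gap <= max_gap}


print(report_card([1, 2, 3, 4], [3, 4, 5, 6]))
```

Calibration drift and label leakage need labels and model predictions, so they would be separate checks rather than extensions of this one.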
AI Memory Portability Layer
An open standard and middleware service for exporting, importing, and translating persistent user memory and preferences across AI assistants — similar to what Claude's memory import feature does, but as a vendor-neutral protocol. Anthropic's move to let users migrate context from other assistants signals that memory portability is a real user pain point and competitive battleground. Build an open spec plus connectors for major assistants (ChatGPT, Claude, Gemini, local models), enabling users to own and move their AI context like contacts or calendar data.
Consumer AI assistant switching and multi-assistant workflows · Enterprise AI onboarding acceleration · Developer tooling for agent memory persistence · User data ownership and GDPR-compliant AI memory management
https://claude.com/import-memory https://clwnt.com
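For the AI Memory Portability Layer idea, the core of a vendor-neutral format is a small versioned record plus export and import functions. The sketch below is purely illustrative: the `MemoryItem` fields, the `schema_version` value, and the JSON layout are hypothetical and do not describe Claude's memory import feature or any existing standard.

```python
import json
from dataclasses import asdict, dataclass, field

SCHEMA_VERSION = "0.1"  # hypothetical version tag for this sketch


@dataclass
class MemoryItem:
    """One portable memory entry (hypothetical schema)."""
    key: str
    value: str
    source_assistant: str
    tags: list[str] = field(default_factory=list)


def export_memory(items: list[MemoryItem]) -> str:
    """Serialize memory items to a versioned JSON document."""
    doc = {"schema_version": SCHEMA_VERSION, "items": [asdict(i) for i in items]}
    return json.dumps(doc, sort_keys=True)


def import_memory(payload: str) -> list[MemoryItem]:
    """Parse a JSON document, rejecting versions this code does not understand."""
    doc = json.loads(payload)
    if doc.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {doc.get('schema_version')!r}")
    return [MemoryItem(**raw) for raw in doc["items"]]


items = [MemoryItem("preferred_language", "Python", "assistant-a", ["coding"])]
restored = import_memory(export_memory(items))
print(restored == items)
```

Carrying `source_assistant` on every item is one way to keep provenance visible after a migration; per-assistant connectors would translate to and from this neutral form.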

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending and TrendShift, enriched with commit velocity and contributor activity.

1
TrendShift
KeygraphHQ/shannon
TypeScript · 31,500 stars · 3,100 forks
Shannon Lite is a fully autonomous AI pentester for web apps and APIs achieving 96.15% on the XBOW benchmark (100/104 exploits) in a hint-free, source-aware setting — a significant capability milestone for automated offensive security.
Build idea
A subscription-based automated security auditing SaaS that continuously scans customer web apps and APIs using Shannon's autonomous pentesting engine, delivering prioritized vulnerability reports and remediation guidance without requiring an in-house security team.
61 commits/mo 14 issues
2
TrendShift
anthropics/skills
Python · 84,600 stars · 8,900 forks
Anthropic's official public repository for Agent Skills, now at nearly 85K stars. It provides reusable, composable capabilities for Claude-based agents and signals Anthropic's direction for standardizing agentic tool use.
Build idea
A marketplace for enterprise-grade Claude agent skill packs — pre-built, tested, and compliance-ready composable capabilities (e.g., CRM integration, legal document review, financial analysis) that businesses can license and deploy into their Claude-powered workflows.
2 commits/mo 401 issues
3
GH Trending
LMCache/LMCache
Python · 7,565 stars · 979 forks · 620 stars this week
LMCache provides a high-performance KV cache layer for LLMs, with 620 stars this week and active development (74 monthly commits) — targets a real bottleneck in LLM inference throughput and cost.
Build idea
A managed LLM inference optimization layer offered as a drop-in API proxy that reduces token processing costs and latency for AI-heavy SaaS companies by intelligently caching and reusing KV states across requests.
74 commits/mo 252 issues
4
GH Trending
alibaba/OpenSandbox
Python · 6,497 stars · 468 forks · 5,082 stars this week
Alibaba's open-source general-purpose sandbox platform for AI agents supporting coding agents, GUI agents, RL training, and evaluation — Docker/K8s-native with multi-language SDKs, 5K stars in its launch week.
Build idea
A cloud-hosted secure sandbox-as-a-service platform where enterprises can safely deploy, run, and evaluate AI coding and GUI agents in isolated environments without managing their own Kubernetes infrastructure.
55 issues
5
TrendShift
anthropics/claude-code
Shell · 74,100 stars · 5,900 forks
Anthropic's official terminal-based agentic coding tool continues to dominate with 74K stars and active ecosystem growth — the de facto standard for AI-assisted coding workflows in 2026.
Build idea
A managed developer productivity platform built on Claude Code that integrates with enterprise codebases, enforces org-specific coding standards, and provides audit logs and role-based access controls for teams adopting AI-assisted development at scale.
5678 issues
6
GH Trending
block/goose
Rust · 32,526 stars · 2,977 forks · 1,246 stars this week
Block's open-source extensible AI agent built in Rust that goes beyond code suggestions to install, execute, edit, and test with any LLM; 32K+ stars and 342 commits last month indicate strong momentum.
Build idea
A no-code workflow automation SaaS for non-technical business users that leverages Goose's agentic execution capabilities to let teams define, schedule, and monitor multi-step tasks — like data pulls, report generation, and system updates — without writing code.
342 commits/mo 387 issues
7
GH Trending
bytedance/deer-flow
Python · 25,009 stars · 2,972 forks · 3,812 stars this week
ByteDance's open-source SuperAgent framework supporting sandboxes, memory, tools, skills, and sub-agents for long-horizon tasks; 25K stars and 3,812 this week signal strong community adoption.
Build idea
A research automation SaaS for consulting firms and enterprise strategy teams that uses deer-flow's SuperAgent framework to autonomously gather, synthesize, and deliver structured competitive intelligence reports on any topic.
172 commits/mo 226 issues
8
GH Trending
inclusionAI/AReaL
Python · 4,307 stars · 355 forks · 412 stars this week
AReaL is a fast, flexible RL training framework specifically for LLM reasoning and agent capabilities, with 4.3K stars and active development — fills a gap for efficient RLHF/reasoning training at scale.
Build idea
A fine-tuning and reasoning optimization service for AI teams that uses AReaL to efficiently train domain-specific LLMs with reinforcement learning, offering managed compute, experiment tracking, and deployment pipelines as a turnkey solution.
34 issues
9
TrendShift
openai/symphony
Elixir · 469 stars · 24 forks
OpenAI's officially released agent orchestration framework built in Elixir that turns project tasks into isolated, autonomous coding runs — enabling teams to manage work queues rather than babysit individual agents. Early but signals OpenAI's production agent architecture thinking.
Build idea
A project management SaaS for software teams that uses Symphony's agent orchestration to automatically decompose GitHub issues into isolated coding tasks, assign them to AI agents, and surface results for human review — functioning like an AI engineering team manager.
2 commits/mo
10
GH Trending
NousResearch/hermes-agent
Python · 1,793 stars · 285 forks · 974 stars this week
Nous Research's adaptive agent framework with unusually high commit velocity (596/month) and rapid star growth — worth watching as Nous builds toward a self-improving agent paradigm.
Build idea
A self-improving AI operations platform for enterprises that uses Hermes Agent's adaptive framework to continuously refine its own task-handling strategies based on feedback, reducing manual prompt engineering and agent maintenance overhead over time.
596 commits/mo 194 issues

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following.

1
Andrés Marafioti · Hugging Face
@andimarafioti 414 56 repos
Multimodal Research Lead at Hugging Face.
andimarafioti/faster-qwen3-tts
Python 430 62
Real-time text-to-speech with Qwen3-TTS
2
Ben Brandt · @zed-industries
@benbrandt 410 76 repos
Rust Engineer at Zed Industries
benbrandt/text-splitter
Rust 573 29
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
3
Benson Wong · Tailscale and Elethink
@mostlygeek 255 114 repos
mostlygeek/llama-swap
Go 2,601 196
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc
4
Nathan Brake · @mozilla.ai
@njbrake 287 50 repos
Machine Learning at Mozilla.ai
njbrake/agent-of-empires
Rust 1,029 77
Claude Code, OpenCode, Mistral Vibe, Codex CLI, Gemini CLI Coding Agent Terminal Session manager via tmux and git Worktrees
5
zhayujie · Minimal Future Tech
@zhayujie 1,368 25 repos
Minimalist Developer
zhayujie/chatgpt-on-wechat
Python 41,943 9,795
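CowAgent is a super AI assistant built on large models that can think proactively and plan tasks, access the operating system and external resources, create and execute Skills, and keep long-term memory while continuing to grow. It supports integration with Feishu, DingTalk, WeCom apps, WeChat Official Accounts, and the web; works with OpenAI/Claude/Gemini/DeepSeek/Qwen/GLM/Kimi/LinkAI; handles text, voice, images, and files; and can be used to quickly build personal AI assistants and enterprise digital employees.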
6
Robert Allen · @epicpast @hmhco
@zircote 163 160 repos
zircote/rlm-rs
Rust 17
Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code. Process documents 100x larger than context windows through intelligent chunking, SQLite persistence, and recursive sub-LLM orchestration.
7
qixing-jk
@qixing-jk 63 62 repos
qixing-jk/all-api-hub
TypeScript 1,876 109
One-stop manager for New API-compatible relay accounts: balance/usage dashboard, automatic check-in, one-click key export to popular clients, in-page API availability testing, and channel/model sync and redirect.
8
Arseny Kapoulkine
@zeux 3,064 22 repos
zeux/meshoptimizer
C++ 7,320 612
Mesh optimization library that makes meshes smaller and faster to render
9
zsviczian
@zsviczian 851 53 repos
zsviczian/obsidian-excalidraw-plugin
TypeScript 6,345 388
A plugin to edit and view Excalidraw drawings in Obsidian
10
郑诚 (Cheng Zheng) · 奇绩创坛 MiraclePlus
@1c7 2,902 341 repos
Remote Software Engineer based in Guangzhou (since 2020).
1c7/chinese-independent-developer
47,016 3,970
👩🏿‍💻👨🏾‍💻👩🏼‍💻👨🏽‍💻👩🏻‍💻 List of projects by independent developers in China: sharing what everyone is building
11
Marcin Szeniak
@Klocman 579 11 repos
A random software and electronics hobbyist from Poland.
Klocman/Bulk-Crap-Uninstaller
C# 17,660 777
Remove large amounts of unwanted applications quickly.
12
Aurelle
@aurelleb 244 20 repos
Freelance web developer with a heavy interest in lower-level things. Owner of @vicinaehq
13
Azure SDK Bot · Microsoft
@azure-sdk 4,619 35 repos
Service account for the Azure SDK Team
azure-sdk/azure-docs-sdk-java
Python 103 39
☕️ Azure SDK for Java API documentation repository. Content here is mostly auto-generated.
14
Gunnar Morling · Confluent
@gunnarmorling 2,581 304 repos
Technologist @ Confluent · Ex-lead of Debezium · Spec lead of Bean Validation 2.0 · Creator of JfrUnit, kcctl and MapStruct · Java Champion · 🚴
gunnarmorling/1brc
Java 7,961 2,206
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
15
Kim Morrison
@kim-em 402 202 repos
kim-em/lean-zip
Lean 38 3
Lean theorem proving developer — no direct AI/ML relevance for this audience.
16
Krille-chan · @famedly
@krille-chan 423 58 repos
Coffee-to-code-converter
krille-chan/fluffychat
Dart 2,486 478
The cutest instant messenger in the [matrix]
17
mxsm · @apache
@mxsm 717 50 repos
RocketMQ-Rust Maintainer & Apache EventMesh PMC|Committer & Apache RocketMQ active contributor
mxsm/rocketmq-rust
Rust 1,483 240
🚀Apache RocketMQ build in Rust🦀. Faster, safer, and with lower memory usage. ⭐ Star to support our work❤️!
18
Stephen Berry
@stephenberry 586 108 repos
Creator and developer of the Ascent simulation architecture and the Glaze JSON library.
stephenberry/glaze
C++ 2,406 216
Extremely fast, in memory, JSON and reflection library for modern C++. BEVE, CBOR, CSV, MessagePack, TOML, YAML, EETF
19
YuTengjing · https://lobehub.com
@tjx666 598 377 repos
day day up.
tjx666/awesome-chrome-extension-boilerplate
TypeScript 443 50
Use react + typescript + webpack to enhance your chrome extension development experience
20
Brady Gaster
@bradygaster 845 92 repos
Brady Gaster is a PM Architect in the CoreAI division at Microsoft where he works on Apps, Agents, MIDI, and most recently, Squad
bradygaster/squad
TypeScript 663 79
Squad: AI agent teams for any project
21
David East · @google-labs-code
@davideast 2,874 106 repos
Working on @google-labs-code. Stitch and Jules <3
davideast/stitch-mcp
TypeScript 356 42
A CLI for moving AI-generated UI designs from Google’s Stitch platform into your development workflow.
22
Mattt
@mattt 18,960 128 repos
mattt/AnyLanguageModel
Swift 788 60
An API-compatible, drop-in replacement for Apple's Foundation Models framework with support for custom language model providers.

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week.

Arena Leaderboard — Top 15
#   Model                                  Provider   Type    Elo   Votes
1   claude-opus-4-6                        Anthropic  Closed  1504  8,945
2   gemini-3.1-pro-preview                 Google     Closed  1500  4,042
3   claude-opus-4-6-thinking               Anthropic  Closed  1500  8,073
4   grok-4.20-beta1                        xAI        Closed  1493  5,071
5   gemini-3-pro                           Google     Closed  1485  39,673
6   gpt-5.2-chat-latest-20260210           OpenAI     Closed  1481  5,502
7   gpt-5.4-high                           OpenAI     Closed  1480  2,290
8   gemini-3-flash                         Google     Closed  1473  30,621
9   grok-4.1-thinking                      xAI        Closed  1473  39,058
10  claude-opus-4-5-20251101-thinking-32k  Anthropic  Closed  1471  32,254
11  claude-opus-4-5-20251101               Anthropic  Closed  1467  37,207
12  dola-seed-2.0-preview                  Bytedance  Closed  1466  6,410
13  grok-4.1                               xAI        Closed  1463  43,318
14  gemini-3-flash (thinking-minimal)      Google     Closed  1461  22,593
15  claude-sonnet-4-6                      Anthropic  Closed  1459  5,194
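To put the rating gaps above in perspective, the snippet below converts an Elo difference into an expected head-to-head win rate. It assumes the textbook Elo logistic formula with a 400-point scale; the leaderboard's own rating procedure may differ, so treat the numbers as a rough reading aid only.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expectation that A is preferred over B (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Rank 1 vs rank 2 (1504 vs 1500) and rank 1 vs rank 15 (1504 vs 1459).
top_two = expected_win_rate(1504, 1500)
top_vs_15th = expected_win_rate(1504, 1459)
print(round(top_two, 3), round(top_vs_15th, 3))
```

Under this assumption, the 4-point gap at the top corresponds to close to a coin flip, and even the 45-point spread across the top 15 implies a win rate only modestly above one half.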
New & Trending Models
LiquidAI/LFM2-24B-A2B
14,511 downloads 266 likes 79 trending
Custom License 2026-02-24
LiquidAI's LFM2-24B-A2B is a 24B MoE model with only 2B active parameters, multilingual (10 languages), targeting efficient edge deployment. The liquid neural network / hybrid architecture approach is architecturally novel and the active-parameter efficiency ratio is compelling.
MiniMaxAI/MiniMax-M2.5
370,789 downloads 1,108 likes 148 trending
Custom License 2026-02-12
MiniMax-M2.5 is a frontier-class model with 370K+ downloads and 1100+ likes, representing MiniMax's latest open release with competitive benchmark claims. Worth watching as a strong non-Western lab open-weight release.
zai-org/GLM-5
214,828 downloads 1,723 likes 94 trending
Open Source 2026-02-11
GLM-5 from ZhipuAI (ZAI) is a new frontier model with 1,723 likes and a high trending score; it uses an MoE architecture with DSA (DeepSeek Sparse Attention) and represents a significant new Chinese open-weight frontier release.
Nanbeige/Nanbeige4.1-3B
470,144 downloads 963 likes 147 trending
Open Source 2026-02-10
Nanbeige4.1-3B is a compact bilingual (EN/ZH) model with 470K+ downloads in its first weeks and an associated arXiv paper. Gaining significant traction as a competitive small Chinese-language model.
Qwen/Qwen3-Coder-Next
1,100,429 downloads 1,077 likes 62 trending
Open Source 2026-01-30
Qwen3-Coder-Next is Alibaba's next-generation coding model with 1.1M downloads, indicating strong developer adoption. Likely a checkpoint release previewing the upcoming full Qwen3-Coder series.
guidelabs/steerling-8b
1,110 downloads 102 likes 40 trending
Open Source 2026-02-22
Steerling-8B is a causal diffusion language model explicitly designed for interpretability and concept-steering, enabling fine-grained control over model behavior. Novel architecture angle (masked/causal diffusion) applied to alignment tooling.
stepfun-ai/Step-3.5-Flash
324,795 downloads 690 likes 28 trending
Open Source 2026-02-01
StepFun's Step-3.5-Flash is a fast inference-optimized language model with 324K downloads and Apache-2.0 license; positioned as an efficient open alternative in the competitive flash-inference model space.
unsloth/Qwen3-Coder-Next-GGUF
572,938 downloads 458 likes 32 trending
Open Source 2026-02-03
Unsloth's GGUF quantizations of Qwen3-Coder-Next with 572K downloads; the massive download count confirms Qwen3-Coder-Next as a top local coding model pick for llama.cpp users.
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
3,622 downloads 68 likes 68 trending
Open Source 2026-02-27
Knowledge distillation of Claude 4.6 Opus reasoning capabilities into Qwen3.5-27B, targeting chain-of-thought improvement at a smaller scale. Part of a growing trend of distilling frontier reasoning into open weights.
LocoreMind/LocoOperator-4B
5,100 downloads 271 likes 65 trending
Open Source 2026-02-23
4B agent/tool-calling model distilled from a larger model, based on Qwen3-4B-Instruct, with 271 likes and strong trending. Promising for lightweight agentic applications needing function-calling at the edge.
allenai/Olmo-Hybrid-7B
14,630 downloads 31 likes 31 trending
Open Source 2026-01-28
AllenAI's OLMo-Hybrid-7B is a fully open, academically reproducible 7B model using a hybrid architecture. Relevant for researchers who need transparent model lineage and reproducibility.
janhq/Jan-code-4b
358 downloads 58 likes 58 trending
Open Source 2026-03-02
Jan's code-focused 4B agent model with tool-calling support, built on their proprietary Jan-v3 base. Solid small coding agent option, especially for Jan app users.
openai/gpt-oss-20b
7,288,981 downloads 4,433 likes 30 trending
Open Source 2025-08-04
OpenAI's open-source 20B model with 7.3M downloads — an established release but still one of the few openly available OpenAI-origin models with a corresponding arXiv paper.
stepfun-ai/Step-3.5-Flash-Base
308 downloads 63 likes 63 trending
Open Source 2026-03-02
Base (pre-instruction-tuning) checkpoint for Step-3.5-Flash, useful for researchers who want to fine-tune from the pretrained base rather than the instruction-tuned model; the high trending score relative to download count suggests curiosity about a fresh release.
zai-org/GLM-4.7-Flash
1,681,393 downloads 1,591 likes 20 trending
Open Source 2026-01-19
ZhipuAI's GLM-4.7-Flash with 1.68M downloads and MIT license; one of the most downloaded bilingual (EN/ZH) models reflecting strong production adoption despite lower trending score.

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week — try them live.

Omni Video Factory
FrameAI4687
gradio 368 177
mit
All-in-one Gradio space supporting text-to-video, image-to-video, and video extension workflows; high trending score but limited technical novelty as a wrapper demo.
faster-qwen3-tts
HuggingFaceM4
docker 109 85
HuggingFace M4 team's optimized inference demo for Qwen3-TTS, claiming faster-than-realtime synthesis; signals competitive open TTS landscape with Qwen3 series.
LFM2.5 1.2B Thinking WebGPU
LiquidAI
static 77 37
Liquid AI's LFM2.5-1.2B reasoning model running entirely in-browser via WebGPU, demonstrating edge inference for small thinking models without any server backend.
OmniLottie
OmniLottie
gradio 36 36
apache-2.0
Demo for OmniLottie, an AI system for generating Lottie animation files; niche but interesting application of generative AI to structured vector animation formats.
Qwen3-TTS Demo
Qwen
gradio 1,629 65
apache-2.0
Official demo for Qwen3-TTS, Alibaba's open-weight text-to-speech model with 1,600+ likes indicating strong community adoption; Apache-2.0 license makes it commercially viable.
Wan2.2 Animate
Wan-AI
gradio 4,881 46
apache-2.0
Official Wan2.2 animation demo from Wan-AI with nearly 5K likes, one of the most adopted open video generation models; this space covers the animate-specific workflow.
FLUX.2 [Klein] 9B
black-forest-labs
gradio 618 42
Black Forest Labs' FLUX.2 Klein 9B image generation demo; the 'Klein' variant represents a mid-size FLUX model balancing quality and inference cost.
DeepSite v4
enzostvs
docker 16,553 28
mit
DeepSite v4 is a vibe-coding app generator with 16K+ likes on HF; mature project with broad adoption but incremental update with limited new technical substance.
Flux2 Klein Face Swap
linoyts
gradio 73 29
Face-swap application built on FLUX.2 Klein 9B via LoRA fine-tuning; showcases the personalization flexibility of the Klein model for identity transfer tasks.
Qwen Image Edit Camera Control
linoyts
gradio 2,045 29
apache-2.0
Fast 4-step inference demo for Qwen Image Edit with camera angle control; 2K+ likes signals strong interest in controllable image editing with novel view synthesis capability.
TRELLIS.2
microsoft
gradio 1,184 59
mit
Microsoft's TRELLIS.2 generates high-fidelity 3D assets from images; the sequel to TRELLIS continues to push open-weight 3D generation quality with 1,184 likes.
Z Image Turbo
mrfakename
gradio 2,471 98
High-trending image generation space with 2,471 likes; likely a turbo/distilled variant of a major image model, though sparse documentation limits technical assessment.
MTEB Leaderboard
mteb
docker 7,101 32
mit
The canonical MTEB embedding model leaderboard with 7,100+ likes; a steady reference resource for tracking the state-of-the-art in text embedding models across tasks.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio 1,835 122
Demo generating multi-angle views of objects using Qwen Image Edit with 3D camera control; 1,835 likes highlights strong interest in controllable novel-view synthesis via instruction-tuned models.
OBLITERATUS
pliny-the-prompter
gradio 107 105
agpl-3.0
From the 'Pliny the Prompter' jailbreak community, OBLITERATUS is a one-click model 'liberation' and chat playground tool; represents organized tooling for systematic safety bypass research.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-03-06
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Common Corpus is presented at ICLR 2026 as the largest openly licensed pre-training dataset for LLMs, directly addressing legal/copyright concerns with proprietary training data — timely given ongoing litigation around LLM training data.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-03-06
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
MedAraBench introduces a large-scale Arabic medical QA benchmark at ICLR 2026, addressing a significant gap in multilingual medical NLP evaluation resources.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-03-06
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Theoretical ICLR 2026 paper studying transformers as unsupervised learners through the lens of Gaussian Mixture Models, offering formal grounding for in-context learning behavior — useful for researchers studying ICL mechanisms.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-03-06
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Task Tokens proposes a flexible token-conditioning approach for adapting transformer-based behavior foundation models in humanoid control without full retraining — practical for hierarchical RL in embodied AI.
Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-03-06
Submodular Function Minimization with Dueling Oracle
Theoretical paper on submodular function minimization using noisy pairwise comparison oracles — tangentially relevant to preference-based optimization but narrow in scope.
submodular minimization dueling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-03-06
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
ICLR 2026 benchmark evaluating MLLMs on scan-oriented academic paper reasoning, highlighting that current models handle retrieval but struggle with deep document-level analysis needed for autonomous research.
Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-03-06
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Proposes N-th Order Recursive Consistent Velocity Field Estimation for any-step generation, improving on consistency models with simpler training objectives and reduced computational overhead.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-03-06
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT introduces masked skill token training for fully offline hierarchical RL that generalizes policies across environments with different dynamics — relevant for sim-to-real transfer challenges.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-03-06
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides high-probability convergence and generalization bounds for SGD with momentum in non-convex settings — theoretically sound but primarily of interest to optimization theorists.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-03-06
Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Q-RAG applies RL-based value training to embedders for multi-step RAG, addressing the well-known limitation of single-hop retrieval on complex multi-hop QA tasks — a meaningful step forward for agentic RAG pipelines.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-03-06
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
ICLR 2026 paper on cross-lingual alignment for information retrieval, improving semantic proximity across languages in multilingual embedding spaces — useful for global-scale RAG applications.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-03-06
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematic ICLR 2026 benchmark of GPT-4o, o4-mini, and Gemini variants on classic CV tasks, revealing significant gaps between multimodal LLM marketing claims and structured vision task performance.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-03-06
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous representations for variable-cardinality discrete structures (sets), enabling diffusion/flow matching over objects like detected entities without fixed-size padding — novel approach for object detection and molecular tasks.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-03-06
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL presents a randomized path approximation to scattering transforms for efficient perceptual quality gradients in inverse problems; it is a computationally practical variant of a theoretically motivated signal-processing tool.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-03-06
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture for probabilistic rare-event forecasting in time-series, combining evidential learning with extreme value theory to handle severe class imbalance — niche but practically relevant for anomaly detection systems.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-03-06
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators analogous to classical numerical methods, allowing dynamic compute allocation at inference — useful for scientific ML applications.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-03-06
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE uses hyperbolic space multimodal learning for enzyme classification, better capturing hierarchical enzyme relationships — domain-specific but methodologically interesting for bio-ML practitioners.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-03-06
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency bottlenecks in current autoregressive multimodal LLMs — directly relevant for voice AI product builders.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-03-06
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K benchmarks proactive and personalized mobile GUI agents using MLLMs, going beyond explicit-instruction-following to contextual anticipation — advances the state of mobile agent evaluation.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-03-06
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Rigorous analysis of Multi-Resolution Hash Encoding (NeRF/NeAF backbone) spatial kernels from a physical systems perspective, providing principled hyperparameter guidance instead of heuristics.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis

Deep Dive

All 429 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact — 7+ items are the ones worth your time.
