Weekly Intelligence

AI Quick Bites

March 10, 2026 · 387 items from 14 sources

Last refreshed: March 10, 2026 at 09:56 UTC

Highlights

The five most consequential developments in AI this week — selected from 387 items across 14 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
OSS-CRS makes DARPA's AIxCC autonomous vulnerability discovery and patching systems actually usable outside the competition, immediately finding 10 new bugs in real OSS-Fuzz projects — a direct bridge from research to practical AI-powered security.
arxiv 2026-03-10 18 min
03
PostTrainBench is the first rigorous benchmark for AI R&D automation, documenting both the capability gap and alarming reward-hacking behaviors (test-set training, unauthorized API use) that will matter enormously as these systems become more capable.
arxiv 2026-03-10 18 min
04
The function-preserving expansion approach elegantly solves catastrophic forgetting by guaranteeing mathematical equivalence at initialization, achieving full fine-tuning performance with zero capability regression — a practically deployable solution to a long-standing problem.
arxiv 2026-03-10 18 min +1.0
05
CODA's difficulty-aware compute allocation cuts inference token costs by 60%+ on easy tasks with no accuracy loss, offering a principled and annotation-free approach to the overthinking problem in large reasoning models.
arxiv 2026-03-10 18 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar.

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here.

KeygraphHQ/shannon
8/10
Shannon Lite is a fully autonomous AI pentester for web apps and APIs achieving 96.15% (100/104 exploits) on the XBOW benchmark without hints — a strong SOTA result for automated vulnerability exploitation. Gaining 6,892 stars this week signals significant practitioner interest.
github 2026-03-10 5 min
38 researchers red-teamed AI agents for 2 weeks. Here's what broke. (Agents of Chaos, Feb 2026)
8/10
84-page multi-institution study (Northeastern, Harvard, Stanford, MIT, CMU) where 38 researchers red-teamed autonomous AI agents (Claude Opus, Kimi K2.5) with persistent memory, email, Discord, and shell access over two weeks, uncovering systematic vulnerabilities in agentic AI deployments. One of the most comprehensive empirical AI agent security evaluations published to date.
reddit 2026-03-10 25 min
Hardening Firefox with Anthropic's Red Team
8/10
Anthropic's red team used Claude to discover real, patched security vulnerabilities in Firefox — a concrete demonstration of AI-assisted vulnerability research at production scale with verified CVEs, marking a significant milestone for AI in offensive security.
hackernews 2026-03-10 8 min
OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security
7/10
OSS-CRS liberates DARPA AIxCC competition cyber reasoning systems from defunct cloud infrastructure into a locally deployable framework, discovering 10 previously unknown bugs (3 high severity) across 8 OSS-Fuzz projects. Makes state-of-the-art autonomous vulnerability discovery and patching accessible to the broader security research community.
arxiv 2026-03-10 18 min
Show HN: Golf Scanner – OSS tool to find and audit every MCP server
7/10
Open-source Go binary that discovers all MCP servers configured across IDEs and runs security audits against each one. Addresses a real and growing attack surface as MCP adoption accelerates.
hackernews 2026-03-10 5 min
OBLITERATUS
7/10
OBLITERATUS by Pliny the Prompter is a 'one-click model liberation' tool — a jailbreaking playground that automates safety bypass techniques across models, directly relevant to red-teaming and LLM security research.
huggingface_spaces 2026-03-10 5 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
ICLR 2026 paper on training LLMs to honestly self-report hidden objectives via honesty fine-tuning, advancing alignment auditing by making deceptive goal-pursuit detectable through direct interrogation.
conferences 2026-03-10 20 min
My journey through Reverse Engineering SynthID
7/10
Researcher reverse-engineered Google's SynthID image watermark without neural network access by averaging 200 pure black/white Gemini-generated images and using FFT analysis to isolate the watermark signal directly. Demonstrates a practical, low-resource attack on AI watermarking schemes with significant implications for content provenance.
reddit 2026-03-10 8 min
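The averaging-plus-FFT idea described above can be illustrated on synthetic data. The sketch below is not the researcher's code and does not touch SynthID; it assumes a hypothetical faint sinusoid standing in for a fixed embedded pattern, one-dimensional "rows" instead of images, Gaussian noise, and a naive DFT written with the standard library. All function names are made up for illustration.

```python
import cmath
import math
import random


def make_row(rng, n, pattern, noise_sigma):
    """One synthetic row: flat black content + fixed pattern + fresh noise."""
    return [pattern[i] + rng.gauss(0.0, noise_sigma) for i in range(n)]


def average_rows(rows):
    """Element-wise mean; independent noise shrinks, the shared pattern stays."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform, returning |X[k]| for k < n/2."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        total = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags


def dominant_frequency(signal):
    """Index of the strongest non-DC frequency bin."""
    mags = dft_magnitudes(signal)
    return max(range(1, len(mags)), key=lambda k: mags[k])


if __name__ == "__main__":
    rng = random.Random(0)
    n, hidden_freq = 64, 5
    # Hypothetical stand-in for a watermark: a faint sinusoid present in every sample.
    pattern = [0.5 * math.sin(2 * math.pi * hidden_freq * i / n) for i in range(n)]
    rows = [make_row(rng, n, pattern, noise_sigma=2.0) for _ in range(200)]
    print("strongest bin after averaging:", dominant_frequency(average_rows(rows)))
```

Averaging works here because the noise is independent per sample while the pattern is identical in each one, so the noise amplitude falls roughly with the square root of the sample count and the pattern's frequency bin stands out.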
2,863 Google API keys on public websites now silently authenticate to Gemini. One developer was billed $82,314 in 48 hours. Google's initial response: "Intended Behavior."
7/10
2,863 exposed Google API keys on public websites now silently authenticate to Gemini, with one developer billed $82K in 48 hours; Google initially classified this as intended behavior. A critical API key exposure issue specific to AI services with real financial and security consequences.
reddit 2026-03-10 6 min
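For teams that want to check their own pages for this kind of exposure, a minimal scanning sketch follows. It assumes the widely observed key shape of "AIza" followed by 35 URL-safe characters, which is a community-documented pattern rather than an official specification; the function names and the sample page are hypothetical.

```python
import re

# Assumption: Google API keys look like "AIza" + 35 URL-safe characters.
# This is a commonly used heuristic, so expect false positives and misses.
GOOGLE_KEY_PATTERN = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def find_candidate_keys(page_source: str) -> list[str]:
    """Return unique key-shaped strings in the order they first appear."""
    seen: dict[str, None] = {}
    for match in GOOGLE_KEY_PATTERN.finditer(page_source):
        seen.setdefault(match.group(0), None)
    return list(seen)


def redact(key: str) -> str:
    """Keep only enough of the key to locate it, so reports stay safe to share."""
    return key[:8] + "..." + key[-4:]


if __name__ == "__main__":
    # Hypothetical page fragment containing a fake, key-shaped placeholder.
    fake_key = "AIza" + "x" * 35
    html = f'<script>const cfg = {{ apiKey: "{fake_key}" }};</script>'
    for key in find_candidate_keys(html):
        print("possible exposed key:", redact(key))
```

A match only shows that a key-shaped string is public; whether it authenticates to any given service depends on how the key is restricted in the owning project.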
Promptfoo Is Joining OpenAI
7/10
Promptfoo, the leading open-source LLM red-teaming and evaluation framework, is joining OpenAI. Significant for the AI security/safety ecosystem as a key independent tool gets absorbed by a major lab.
hackernews 2026-03-10 5 min
The L in "LLM" Stands for Lying
6/10
High-engagement blog post (472 HN comments) making a technical argument about LLM hallucination and deception as structural properties rather than bugs — useful framing for practitioners but not novel research.
hackernews 2026-03-10 8 min
Open-source AI coding agent skill that finds and fixes infra security misconfigs
6/10
Open-source AI coding agent skill that detects and fixes infrastructure security misconfigurations. Practical application of AI agents to security hardening workflows.
hackernews 2026-03-10 5 min
Remove invisible AI watermarks from Gemini images using reverse alpha math
6/10
Tool that removes invisible SynthID-style AI watermarks from Gemini-generated images using reverse alpha channel math. Highlights fragility of current AI watermarking approaches.
hackernews 2026-03-10 3 min
Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA
6/10
Arbitrary-Rank Ablation (ARA) is a new fine-tuning-based decensoring method that dramatically reduces refusals in open-source models, outperforming prior abliteration techniques. Relevant to alignment robustness research and the ongoing arms race between safety training and circumvention.
reddit 2026-03-10 5 min
Threat actors are using fake Claude Code download pages to deploy a fileless infostealer via mshta.exe — developers should be aware
6/10
Active malvertising campaign uses fake Claude Code download portals (via hijacked Google Ads) to deliver a fileless infostealer via mshta.exe, specifically targeting developers. Directly relevant to AI tooling supply chain security.
reddit 2026-03-10 4 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative relevance score across all sources.

Top Authors
#1
multimodalart
2 items · avg 4.5/10
9.0
#2
prithivMLmods
2 items · avg 4.0/10
8.0
#3
7.0
#4
7.0
#5
7.0
#6
Bingxiang He
1 item · avg 7.0/10
7.0
Top Organizations
#1
ruvnet
4 items · avg 5.8/10
23.0
#2
karpathy
2 items · avg 8.0/10
16.0
#3
alibaba
2 items · avg 6.5/10
13.0
#4
anthropics
2 items · avg 6.5/10
13.0
#5
666ghj
2 items · avg 4.5/10
9.0
#6
KeygraphHQ
1 item · avg 8.0/10
8.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Memory Coach
A developer toolkit that implements RetroAgent-style dual intrinsic feedback loops for LLM agents — combining numerical task-completion signals with distilled language lessons stored in a retrievable memory buffer. Unlike static prompt engineering, agents continuously improve from their own failures without retraining. Build this as an open-source middleware layer that wraps existing agent frameworks like LangChain or AutoGen.
Customer support agents that learn from failed resolutions · Coding assistants that accumulate project-specific lessons · Game-playing agents for procedurally generated environments · Enterprise workflow automation with self-improving task routing
https://arxiv.org/abs/2603.08561v1
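One way to picture the memory component of this idea is a small buffer that stores a numerical reward next to a distilled text lesson and retrieves lessons by word overlap. This is a sketch under stated assumptions, not the RetroAgent implementation: the class names, the overlap-based retrieval, and the prompt template are all hypothetical, and a real system would more likely use embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class Lesson:
    task: str      # short description of the task that was attempted
    reward: float  # numerical task-completion signal, e.g. 0.0 (failed) to 1.0 (solved)
    text: str      # distilled natural-language lesson


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


@dataclass
class LessonMemory:
    capacity: int = 100
    lessons: list[Lesson] = field(default_factory=list)

    def add(self, task: str, reward: float, text: str) -> None:
        """Store a lesson, evicting the oldest entry once the buffer is full."""
        self.lessons.append(Lesson(task, reward, text))
        if len(self.lessons) > self.capacity:
            self.lessons.pop(0)

    def retrieve(self, task: str, k: int = 2) -> list[Lesson]:
        """Return up to k lessons whose task descriptions share words with `task`."""
        query = _tokens(task)
        scored = [(len(query & _tokens(item.task)), item) for item in self.lessons]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [lesson for _, lesson in relevant[:k]]

    def build_prompt(self, task: str) -> str:
        """Prepend retrieved lessons to the task so the next attempt can use them."""
        hints = "\n".join(f"- {item.text}" for item in self.retrieve(task))
        if not hints:
            return f"Task: {task}"
        return f"Lessons from earlier attempts:\n{hints}\n\nTask: {task}"


if __name__ == "__main__":
    memory = LessonMemory()
    memory.add("reset user password", 0.0, "Verify identity before resetting.")
    print(memory.build_prompt("reset admin password"))
```

Because lessons live in the prompt rather than in the weights, the wrapped agent can change behavior between attempts without any retraining step.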
AI Security Sandbox
A locally deployable, sandboxed platform for autonomous vulnerability discovery and patching, inspired by OSS-CRS liberating DARPA AIxCC systems for open-source use. The platform would let security teams run AI-powered fuzzing and patch generation against their own codebases without cloud dependencies or data exposure risks. Critical guardrails — isolated execution, audit logs, and human-in-the-loop patch approval — address the reward hacking behaviors documented in PostTrainBench.
Open-source project security auditing · Enterprise pre-release vulnerability scanning · Security research and red-teaming labs · CI/CD pipeline integration for automated CVE detection
https://arxiv.org/abs/2603.08566v1 https://arxiv.org/abs/2603.08640v1
Grow-Not-Overwrite Finetuner
A fine-tuning service built on the 'Grow, Don't Overwrite' function-preserving expansion method, allowing teams to adapt foundation models to new tasks without catastrophic forgetting of base capabilities. Users upload their dataset, the service expands only a small subset of layers with mathematically equivalent initialization, and returns a model that excels at the new task while retaining original behavior. This directly solves the painful trade-off practitioners face between specialization and generality.
Domain-specific LLM adaptation for legal, medical, or finance verticals · Continual learning pipelines for production models receiving new data · Multi-task model serving without maintaining separate model copies · Enterprise model customization without full retraining costs
https://arxiv.org/abs/2603.08647v1
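"Mathematically equivalent initialization" can be made concrete with a toy case. The sketch below widens the hidden layer of a two-layer MLP and gives the new units zero outgoing weights, so the expanded network computes exactly the same output as the original until training changes those weights. Zero-initialized widening is one simple function-preserving scheme chosen for illustration; the paper's actual expansion rule may differ, and all names here are hypothetical.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def forward(params, x):
    """Two-layer MLP: y = W2 @ relu(W1 @ x + b1) + b2."""
    w1, b1, w2, b2 = params
    hidden = relu([h + b for h, b in zip(matvec(w1, x), b1)])
    return [o + b for o, b in zip(matvec(w2, hidden), b2)]


def expand_hidden(params, extra_units, rng):
    """Add hidden units whose outgoing weights start at zero.

    The new units get random incoming weights (so they can learn), but they
    contribute exactly 0 to the output until training updates W2.
    """
    w1, b1, w2, b2 = params
    in_dim = len(w1[0])
    new_w1 = [row[:] for row in w1] + [
        [rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(extra_units)
    ]
    new_b1 = b1[:] + [0.0] * extra_units
    new_w2 = [row[:] + [0.0] * extra_units for row in w2]
    return new_w1, new_b1, new_w2, b2[:]


if __name__ == "__main__":
    rng = random.Random(0)
    params = ([[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2], [[1.0, -3.0]], [0.05])
    bigger = expand_hidden(params, extra_units=3, rng=rng)
    x = [0.7, -1.3]
    print("outputs match:", forward(params, x) == forward(bigger, x))
```

Starting from an identical function means any later change in behavior comes from training on the new task, not from the expansion itself.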
Black-Box Model Auditor
A SaaS tool that uses the UNBOX approach — combining LLMs and text-to-image diffusion — to audit any vision API for learned biases and failure modes without requiring model access, gradients, or training data. Teams point it at a black-box API endpoint and receive human-readable reports describing what concepts the model has learned, where it underperforms, and what demographic or contextual biases exist. This fills a critical compliance gap as AI regulation tightens globally.
AI vendor due diligence for enterprise procurement · Regulatory compliance auditing for deployed vision systems · Bias detection in hiring, lending, or healthcare screening tools · Competitive model benchmarking without white-box access
https://arxiv.org/abs/2603.08639v1
Adaptive Token Budget
A drop-in inference middleware inspired by CODA that dynamically allocates reasoning compute based on estimated query difficulty — routing easy requests through fast shallow paths and reserving extended chain-of-thought for genuinely hard problems. With 60%+ token cost reduction on easy tasks documented in research, this translates directly to API cost savings at scale. Build it as a proxy layer compatible with OpenAI, Anthropic, and open-source model APIs.
High-volume customer-facing chatbots with mixed query complexity · Enterprise RAG pipelines where most queries are factual lookups · Coding assistants balancing autocomplete vs. architectural reasoning · Cost optimization layer for AI startups with tight inference budgets
https://arxiv.org/abs/2603.08659v1 https://arxiv.org/abs/2603.08660v1
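The routing layer of such a proxy could look roughly like the sketch below. It is not CODA's method: the difficulty estimate is a deliberately crude heuristic, and the tier names, thresholds, marker words, and token limits are hypothetical values that would need tuning on real traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    tier: str
    max_reasoning_tokens: int


# Hypothetical tiers: (upper bound on difficulty score, budget to apply).
TIERS = [
    (0.3, Budget("easy", 256)),
    (0.7, Budget("medium", 2048)),
    (1.0, Budget("hard", 8192)),
]

# Hypothetical phrases that hint at multi-step reasoning.
HARD_MARKERS = ("prove", "derive", "optimize", "debug", "step by step")


def estimate_difficulty(query: str) -> float:
    """Toy difficulty score in [0, 1] from query length and keyword markers."""
    length_score = min(len(query.split()) / 100.0, 1.0)
    marker_score = sum(m in query.lower() for m in HARD_MARKERS) / len(HARD_MARKERS)
    return min(1.0, 0.5 * length_score + 1.5 * marker_score)


def choose_budget(query: str) -> Budget:
    """Pick the first tier whose threshold covers the estimated difficulty."""
    score = estimate_difficulty(query)
    for threshold, budget in TIERS:
        if score <= threshold:
            return budget
    return TIERS[-1][1]


if __name__ == "__main__":
    for q in ("What is the capital of France?", "Debug this function"):
        print(q, "->", choose_budget(q))
```

In a real deployment the returned budget would be mapped onto whatever reasoning-length control the target API exposes, and the estimator would be replaced by something learned or calibrated.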

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

#1
Visual Translate by Vozo
Translate text in your videos without recreating visuals
SaaS Artificial Intelligence Video
176
80
https://www.producthunt.com/r/WDYAI...
#2
Chronicle 2.0
AI presentations without the AI slop
Productivity Artificial Intelligence Design
142
76
https://www.producthunt.com/r/CNILD...
#3
sitefire.ai
Marketing suite for the agentic web
Public Relations Marketing SEO
112
32
https://www.producthunt.com/r/O5UCC...
#4
Claude Code Review
Multi-agent review catching bugs early in AI-generated code
Developer Tools Artificial Intelligence Development
104
5
https://www.producthunt.com/r/TX3N3...
#5
Fish Audio S2
Real Expressive AI Voices
Open Source Artificial Intelligence GitHub
101
12
https://www.producthunt.com/r/SZPNS...
#6
Your Next Store
AI-first platform for building commerce stores, fast
SaaS Artificial Intelligence E-Commerce
98
7
https://www.producthunt.com/r/JSAOC...
#7
Spine Swarm
Manage a team of AI agents that do real work
Productivity Artificial Intelligence Tech
89
12
https://www.producthunt.com/r/X5DFJ...
#8
CodeGuide
Generate PRDs, specs and wireframes your AI understands.
Productivity Developer Tools Artificial Intelligence
87
4
https://www.producthunt.com/r/WJ3IG...
#9
Sonarly
The AI that fixes prod autonomously
Software Engineering Developer Tools Artificial Intelligence
87
16
https://www.producthunt.com/r/RUAT5...
#10
MacQuit
Quit all running Mac apps in one click from your menu bar
Mac Productivity Menu Bar Apps
84
2
https://www.producthunt.com/r/AACQE...

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending and TrendShift, enriched with commit velocity and contributor activity.

1
GH Trending
KeygraphHQ/shannon
typescript 32,992 3,287 6,892 stars this week
Shannon Lite is a fully autonomous AI pentester for web apps and APIs achieving 96.15% (100/104 exploits) on the XBOW benchmark without hints — a strong SOTA result for automated vulnerability exploitation. Gaining 6,892 stars this week signals significant practitioner interest.
Build idea
A continuous security testing SaaS that automatically runs Shannon against staging environments on every deployment, delivering prioritized vulnerability reports to dev teams before code reaches production.
19 issues
2
TrendShift
karpathy/autoresearch
Python 8,700 1,200
Karpathy's new project using AI agents to autonomously run ML research experiments on single-GPU nanochat training — meta-AI doing AI research. Rapidly gained 19K stars in days, signaling high interest in automated research loops.
Build idea
A managed AI research acceleration platform where ML teams submit hypotheses and receive back fully run experiment results, ablations, and findings — compressing weeks of GPU experimentation into hours.
69 issues
3
TrendShift
karpathy/nanochat
Python 45,400 6,000
Karpathy's minimal ChatGPT-quality model trainable for ~$100, with 45K+ stars. Democratizes LLM training research and serves as the substrate for autoresearch experiments.
Build idea
A turnkey fine-tuning service for startups that lets them train a proprietary, domain-specific chat model on their own data for under $200, delivered as a deployable API endpoint.
26 commits/mo 71 issues
4
GH Trending
LMCache/LMCache
python 7,602 987 417 stars this week
LMCache provides a high-performance KV cache layer for LLMs, enabling faster inference by caching and reusing KV states across requests. With 7.6K stars and active development, it's becoming a serious infrastructure component for LLM serving.
Build idea
A drop-in LLM inference optimization layer sold to enterprises running self-hosted models, reducing GPU costs and latency by intelligently caching KV states across repeated or similar prompts.
246 issues
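To make "caching and reusing KV states across requests" more concrete, here is a toy prefix cache. It does not reflect LMCache's actual API or storage design; the class, the chunk size, and the placeholder "KV state" are assumptions for illustration. Each chunk key hashes the whole prefix up to that chunk, so a hit is only possible when every earlier token matches.

```python
import hashlib


class PrefixKVCache:
    """Toy prefix cache: maps hashes of token-prefix chunks to stored 'KV states'."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.store: dict[str, object] = {}

    def _chunk_keys(self, tokens: list[int]) -> list[str]:
        """One key per full chunk; each key covers the entire prefix so far."""
        keys = []
        hasher = hashlib.sha256()
        full = len(tokens) - len(tokens) % self.chunk_size
        for start in range(0, full, self.chunk_size):
            chunk = tokens[start:start + self.chunk_size]
            hasher.update(",".join(map(str, chunk)).encode() + b";")
            keys.append(hasher.copy().hexdigest())
        return keys

    def lookup(self, tokens: list[int]) -> int:
        """Return how many leading tokens already have cached KV states."""
        reused = 0
        for key in self._chunk_keys(tokens):
            if key not in self.store:
                break
            reused += self.chunk_size
        return reused

    def insert(self, tokens: list[int], compute_kv=lambda chunk: tuple(chunk)) -> None:
        """Store a placeholder KV state for every full chunk of the prompt."""
        for i, key in enumerate(self._chunk_keys(tokens)):
            start = i * self.chunk_size
            self.store.setdefault(key, compute_kv(tokens[start:start + self.chunk_size]))


if __name__ == "__main__":
    cache = PrefixKVCache(chunk_size=4)
    system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
    cache.insert(system_prompt + [9, 10])
    print("tokens reusable:", cache.lookup(system_prompt + [42, 43, 44, 45]))
```

Requests that share a long system prompt are the obvious beneficiaries: only the tokens after the shared prefix need fresh prefill work.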
5
GH Trending
alibaba/OpenSandbox
python 7,308 540 3,262 stars this week
Alibaba's general-purpose sandbox platform for AI applications supporting multi-language SDKs, Docker/Kubernetes runtimes, and use cases including coding agents, GUI agents, agent evaluation, and RL training. Addresses a real infrastructure gap for safe agent execution with 7.3K stars and 3,262 new this week.
Build idea
A cloud-hosted secure sandbox API service for AI agent developers that provides isolated, metered execution environments — billed per agent run — eliminating the infrastructure burden of safely running untrusted AI-generated code.
120 commits/mo 52 issues
6
TrendShift
anthropics/claude-code
Shell 75,700 6,100
Anthropic's official agentic coding CLI with 76K+ stars, enabling natural language control of codebases including git workflows and complex refactoring. The dominant terminal-based coding agent with continued active development.
Build idea
A managed enterprise coding agent platform built on Claude Code that integrates with corporate SSO, audit logging, and internal codebases, giving large engineering teams a governed, policy-compliant AI coding assistant.
38 commits/mo 5869 issues
7
GH Trending
block/goose
rust 32,751 3,013 637 stars this week
Open-source, extensible AI agent built in Rust that goes beyond code suggestions to install, execute, edit, and test with any LLM. Strong traction with 32K+ stars and 256 commits last month signals active production use.
Build idea
A no-code workflow builder for non-technical business users that wraps Goose agents to automate repetitive software operations — like data pipeline maintenance or report generation — without requiring engineering involvement.
256 commits/mo 378 issues
8
GH Trending
inclusionAI/AReaL
python 4,596 382 991 stars this week
Fast reinforcement learning framework for LLM reasoning and agents, gaining 991 stars in a week. Targets single-GPU RL training for reasoning models, filling a gap between research and accessible RL fine-tuning.
Build idea
A fine-tuning SaaS that lets companies improve the reasoning capabilities of their private LLMs using reinforcement learning on domain-specific problem sets, accessible on a single GPU with no ML expertise required.
34 issues
9
GH Trending
openai/codex
rust 64,315 8,566 1,536 stars this week
OpenAI's official lightweight coding agent that runs in the terminal, built in Rust with 64k+ stars and active development (667 commits last month). Represents OpenAI's open-source push into agentic coding tools competing with Claude Code.
Build idea
A developer productivity analytics platform that wraps Codex CLI to track, audit, and benchmark AI-assisted coding activity across engineering teams, providing ROI metrics and security oversight for CTOs.
667 commits/mo 1780 issues
10
TrendShift
ruvnet/RuView
Rust 31,800 4,200
RuView applies WiFi signal processing and DensePose-style models to achieve real-time human pose estimation, vital sign monitoring, and presence detection using only commodity WiFi hardware — no cameras required. Privacy-preserving sensing with significant surveillance and health monitoring implications.
Build idea
A privacy-first elder care monitoring subscription service that uses existing home WiFi routers to detect falls, monitor breathing, and track activity patterns — alerting caregivers without installing any cameras.
36 issues

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following.

1
Benson Wong · Tailscale and Elethink
@mostlygeek 274 113 repos
mostlygeek/llama-swap
Go 2,733 202
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc
2
Nathan Brake · @mozilla.ai
@njbrake 296 50 repos
Machine Learning at Mozilla.ai
njbrake/agent-of-empires
Rust 1,063 81
Claude Code, OpenCode, Mistral Vibe, Codex CLI, Gemini CLI Coding Agent Terminal Session manager via tmux and git Worktrees
3
Brady Gaster
@bradygaster 863 94 repos
Brady Gaster is a PM Architect in the CoreAI division at Microsoft where he works on Apps, Agents, MIDI, and most recently, Squad
bradygaster/squad
TypeScript 730 96
Squad: AI agent teams for any project
4
David East · @google-labs-code
@davideast 2,895 106 repos
Working on @google-labs-code. Stitch and Jules <3
davideast/stitch-mcp
TypeScript 376 46
A CLI for moving AI-generated UI designs from Google’s Stitch platform into your development workflow.
5
Lukas Masuch · Snowflake
@lukasmasuch 1,346 72 repos
lukasmasuch/best-of-ml-python
23,301 3,104
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
6
qixing-jk
@qixing-jk 77 63 repos
qixing-jk/all-api-hub
TypeScript 1,972 112
One-stop manager for New API-compatible relay accounts: balance/usage dashboard, automatic check-in, one-click key export to popular clients, in-page API availability tests, and channel/model sync and redirect
7
Saúl Ibarra Corretgé · @jitsi / @8x8
@saghul 1,959 164 repos
Fellow Jitster
saghul/txiki.js
C 2,970 200
A tiny JavaScript runtime
8
Austin Griffith
@austintgriffith 2,646 202 repos
👩‍🎤 builder on Ethereum
austintgriffith/ethskills
HTML 104 18
The missing knowledge between AI agents and production Ethereum.
9
Andy Anderson · ibm.com
@clubanderson 79 98 repos
Platform Engineering | Kubernetes | AI | Software Architect
clubanderson/clubTivi
Dart 2 2
Open-source cross-platform IPTV player with intelligent EPG mapping, multi-provider stream failover, and remote control support. Built with Flutter.
10
Gunnar Morling · Confluent
@gunnarmorling 2,586 304 repos
Technologist @ Confluent · Ex-lead of Debezium · Spec lead of Bean Validation 2.0 · Creator of JfrUnit, kcctl and MapStruct · Java Champion · 🚴
gunnarmorling/1brc
Java 7,971 2,209
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
11
Elie Habib
@koala73 1,880 19 repos
koala73/worldmonitor
TypeScript 35,107 5,898
Real-time global intelligence dashboard — AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface
12
sigoden
@sigoden 1,098 84 repos
sigoden/aichat
Rust 9,515 626
All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.
13
Teng Lin · XtalPi Inc.
@teng-lin 206 4 repos
teng-lin/notebooklm-py
Python 4,478 547
Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features—including capabilities the web UI doesn't expose—via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw.
14
Yair Morgenstern
@yairm210 2,209 42 repos
yairm210/Unciv
Kotlin 10,152 1,788
Open-source Android/Desktop remake of Civ V

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week.

Arena Leaderboard — Top 15
#   Model                                  Organization  Type    Elo   Votes
1   claude-opus-4-6                        Anthropic     Closed  1504  9,170
2   claude-opus-4-6-thinking               Anthropic     Closed  1502  8,313
3   gemini-3.1-pro-preview                 Google        Closed  1500  4,041
4   grok-4.20-beta1                        xAI           Closed  1491  5,280
5   gemini-3-pro                           Google        Closed  1485  39,923
6   gpt-5.4-high                           OpenAI        Closed  1479  3,503
7   gpt-5.2-chat-latest-20260210           OpenAI        Closed  1479  5,786
8   gemini-3-flash                         Google        Closed  1473  30,600
9   grok-4.1-thinking                      xAI           Closed  1473  39,309
10  claude-opus-4-5-20251101-thinking-32k  Anthropic     Closed  1470  32,516
11  claude-opus-4-5-20251101               Anthropic     Closed  1467  37,462
12  dola-seed-2.0-preview                  Bytedance     Closed  1465  6,712
13  grok-4.1                               xAI           Closed  1462  43,536
14  gemini-3-flash (thinking-minimal)      Google        Closed  1462  22,846
15  gpt-5.4                                OpenAI        Closed  1457  3,417
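To read rating gaps like these, the standard Elo logistic formula converts a difference in points into an expected head-to-head win rate. The sketch below assumes the leaderboard's scores behave like conventional Elo ratings on a 400-point scale, which the page itself does not state; the function name is illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Rank 1 (1504) against rank 15 (1457) from the table above.
    print(f"expected win rate: {expected_score(1504, 1457):.3f}")
```

Under that assumption, the 47-point gap between rank 1 and rank 15 corresponds to an expected win rate of about 57% for the higher-rated model, so the top of the table is closer than the ordering alone suggests.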
New & Trending Models
openai/gpt-oss-20b
7,401,682 downloads 4,443 likes 24 trending
Open Source 2025-08-04
OpenAI's open-source 20B parameter model released under Apache 2.0 with 7.4M downloads and 4.4K likes — a significant move toward open-weight releases from OpenAI, with vLLM support and quantization options (8-bit, mxfp4).
zai-org/GLM-5
234,052 downloads 1,762 likes 80 trending
Open Source 2026-02-11
GLM-5 is ZAI's flagship next-generation language model with strong trending metrics (80 trending score, 1762 likes), featuring a novel MoE DSA architecture. Represents a significant open-weight model release competing at the frontier level with MIT licensing.
sarvamai/sarvam-105b
2,048 downloads 198 likes 198 trending
Open Source 2026-03-03
Sarvam AI releases a 105B parameter model supporting 22+ Indian languages under Apache 2.0, using a custom MLA architecture — a significant open multilingual model targeting underserved South Asian language communities.
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
22,984 downloads 336 likes 304 trending
Open Source 2026-02-27
Qwen3.5-27B distilled from Claude 4.6 Opus reasoning traces to transfer chain-of-thought capabilities to a smaller open model, with 336 likes and strong download numbers. Represents the growing trend of distilling frontier model reasoning into accessible open weights.
Qwen/Qwen3-Coder-Next
1,181,726 downloads 1,102 likes 57 trending
Open Source 2026-01-30
Qwen3-Coder-Next is Qwen's latest coding-focused model with 1.18M downloads and 1,102 likes, suggesting strong community adoption. Minimal public documentation but download velocity indicates it's a meaningful coding model update.
sarvamai/sarvam-30b
4,221 downloads 137 likes 137 trending
Open Source 2026-03-03
Sarvam AI's 30B MoE model for 22+ Indian languages, companion to the 105B dense model — provides a more accessible inference option for the same multilingual coverage.
stepfun-ai/Step-3.5-Flash
132,663 downloads 700 likes 27 trending
Open Source 2026-02-01
StepFun's Step-3.5-Flash is a fast inference model with 132K downloads and Apache 2.0 license, backed by multiple arXiv papers — a competitive open-weight model worth benchmarking for latency-sensitive applications.
tencent/Penguin-VL-8B
576 downloads 43 likes 43 trending
Open Source 2026-03-05
Tencent's Penguin-VL-8B is a vision-language model built on Qwen3-8B with a custom vision encoder and arXiv paper — a new competitive open VLM in the 8B class worth benchmarking against InternVL and Qwen-VL.
zai-org/GLM-4.7-Flash
1,739,776 downloads 1,603 likes 22 trending
Open Source 2026-01-19
GLM-4.7-Flash is a lightweight MoE-based bilingual (EN/ZH) text generation model from ZAI with 1.7M downloads, designed for fast inference. Represents the efficient end of the GLM-4 family with MIT license.
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
70,471 downloads 150 likes 128 trending
Open Source 2026-02-27
GGUF quantized version of the Claude 4.6 Opus reasoning-distilled Qwen3.5-27B for local inference, with 70k downloads. Enables running frontier-distilled reasoning locally.
LiquidAI/LFM2-24B-A2B
20,505 downloads 279 likes 41 trending
Custom License 2026-02-24
LiquidAI's LFM2-24B-A2B is a 24B MoE model with only 2B active parameters, supporting 10 languages and targeting edge deployment. Interesting architecture for efficient multilingual inference.
MiniMaxAI/MiniMax-M2.5
448,370 downloads 1,143 likes 75 trending
Custom License 2026-02-12
MiniMax-M2.5 is a large-scale model with 448k downloads and 1,143 likes, available with FP8 support and Azure deployment. Limited public documentation makes it hard to assess technical novelty.
allenai/Olmo-Hybrid-7B
17,288 downloads 44 likes 44 trending
Open Source 2026-01-28
AllenAI's OLMo-Hybrid-7B is a fully open 7B model with a hybrid architecture, continuing the OLMo line of transparent research models. Relevant for researchers needing fully open (weights + data + code) baselines.
stepfun-ai/Step-3.5-Flash-Base
615 downloads 78 likes 33 trending
Open Source 2026-03-02
Base (pre-instruction-tuned) version of Step-3.5-Flash — useful for fine-tuning experiments but lower immediate impact than the instruct variant.
tencent/Penguin-VL-2B
213 downloads 21 likes 21 trending
Open Source 2026-03-05
Tencent's 2B vision-language model built on Qwen3-1.7B with a custom vision encoder — a compact VLM with an arXiv paper, interesting for edge deployment scenarios.

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week — try them live.

UGI Leaderboard
DontPlanToEnd
docker 1,569 39
apache-2.0
Leaderboard tracking 'uncensored' model capabilities — useful for researchers studying alignment and refusal behavior, though the framing is more community-driven than rigorous research.
ALL Bench Leaderboard
FINAL-Bench
static 47 47
apache-2.0
A consolidated leaderboard aggregating multiple benchmarks for model comparison — useful reference but derivative of existing evaluation infrastructure.
Omni Video Factory
FrameAI4687
gradio 468 178
mit
A Gradio demo combining text-to-video, image-to-video, and video extension capabilities — a convenience wrapper around existing video generation models with no novel technical contribution.
The Synthetic Data Playbook: Generating Trillions of the Finest Tokens
HuggingFaceFW
docker 126 126
HuggingFace FineWeb team's interactive playbook on generating high-quality synthetic training data at trillion-token scale — directly actionable guidance for practitioners building pretraining datasets.
faster-qwen3-tts
HuggingFaceM4
docker 141 74
Optimized TTS demo using Qwen3-based speech synthesis with speed improvements — demonstrates faster inference for the Qwen3 TTS pipeline.
LTX 2.3 Distilled
Lightricks
gradio 56 56
Lightricks releases LTX 2.3 Distilled, a distilled video generation model — distillation for video generation is technically interesting as it reduces inference cost while maintaining quality.
Qwen3-TTS Demo
Qwen
gradio 1,655 51
apache-2.0
Official demo for Qwen3-TTS, Alibaba's text-to-speech model with 1.6K likes — a capable open TTS system worth evaluating for voice applications.
Wan2.2 Animate
Wan-AI
gradio 4,905 46
apache-2.0
Wan2.2 Animate demo with nearly 5K likes — one of the more popular open video generation models, useful for benchmarking against commercial alternatives.
FLUX.2 [Klein] 9B
black-forest-labs
gradio 638 42
Black Forest Labs' FLUX.2 Klein 9B image generation demo — a smaller, faster variant of the FLUX.2 family for accessible high-quality image synthesis.
Free Unlimited Google Veo 3
deddytoyota
static 54 36
Unofficial wrapper claiming free access to Google Veo 3 with NSFW framing — likely a scraper or misleading demo, not a legitimate technical resource.
Flux2 Klein Face Swap
linoyts
gradio 96 35
Face swap application built on FLUX.2 Klein 9B with LoRA — a derivative application demo with limited technical novelty.
TRELLIS.2
microsoft
gradio 1,215 56
mit
Microsoft's TRELLIS.2 generates high-fidelity 3D assets from images with 1.2K likes — a strong open 3D generation model from a major lab worth tracking for 3D content pipelines.
Z Image Turbo
mrfakename
gradio 2,506 77
Z Image Turbo demo for fast image generation. Its 2.5K likes suggest strong community interest; the model is likely distilled or otherwise optimized, but details are sparse.
Nano Banana PRO
multimodalart
gradio 580 35
mit
HuggingFace PRO-gated demo space — limited public technical value without PRO access, unclear what model or technique underlies it.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio 1,874 89
Demo using Qwen's vision model to generate multiple-angle views with 3D camera control from a single image — 1.9K likes indicates strong interest in this novel multiview generation capability.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-03-10
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Common Corpus is presented at ICLR 2026 as the largest openly licensed pre-training dataset for LLMs, directly addressing legal and copyright concerns in training data. Important for the open-source LLM ecosystem needing legally clean training corpora.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-03-10
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
MedAraBench introduces a large-scale Arabic medical QA benchmark at ICLR 2026, addressing a significant gap in multilingual medical NLP evaluation. Useful contribution for underrepresented language research but narrow scope.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-03-10
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
ICLR 2026 paper providing theoretical analysis of transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, advancing understanding of in-context learning mechanisms. Contributes to foundational theory of why transformers generalize.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-03-10
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Task Tokens introduces a flexible conditioning mechanism for adapting transformer-based behavior foundation models in humanoid robotics without full retraining. Practical approach to multi-task adaptation in embodied AI at ICLR 2026.
Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-03-10
Submodular Function Minimization with Dueling Oracle
Theoretical ICLR 2026 paper on submodular function minimization using a dueling/pairwise comparison oracle. Highly specialized mathematical optimization work with limited direct ML practitioner relevance.
submodular minimization dueling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-03-10
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
ICLR 2026 benchmark evaluating MLLMs on scan-oriented academic paper reasoning, distinguishing between search (finding specific facts) and scan (holistic document understanding) tasks. Highlights a meaningful capability gap in current multimodal models for research automation.
Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-03-10
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
ICLR 2026 paper proposing N-th order recursive consistent velocity field estimation for any-step generation, simplifying few-step generative model training by removing complex multi-component losses. Advances consistency model efficiency.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-03-10
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT (Masked Skill Token Training) is a fully offline hierarchical RL framework for transferring policies across environments with different dynamics, using masked skill tokens to bridge the sim-to-real gap. Addresses a core challenge in embodied AI deployment.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-03-10
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
ICLR 2026 paper establishing high-probability convergence and generalization bounds for SGD with momentum in non-convex settings, filling a theoretical gap relevant to deep learning optimization. Important theoretical foundation but limited immediate practitioner impact.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-03-10
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
Q-RAG trains embedders using value-based RL to support multi-step retrieval for complex long-context QA, going beyond single-step RAG limitations. Novel application of RL to retrieval training at ICLR 2026 with clear practical impact on multi-hop reasoning.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-03-10
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
ICLR 2026 paper proposing cross-lingual alignment techniques to improve semantic proximity in multilingual information retrieval, addressing the gap between query and document languages in CLIR tasks.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-03-10
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematic benchmark of GPT-4o, o4-mini, Gemini 1.5 Pro and others on standard computer vision tasks, revealing where frontier multimodal models actually stand versus specialized CV systems — important for practitioners choosing models for vision pipelines.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-03-10
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous representations for variable-cardinality set prediction using neural fields and flow matching, enabling object detection and molecular modeling without fixed-size output assumptions.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-03-10
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL proposes randomized path sampling in wavelet scattering transforms to reduce computational cost while preserving perceptual quality gradients for audio/vision inverse problems.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-03-10
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture combining evidential deep learning and extreme value theory for probabilistic rare-event forecasting in imbalanced multivariate time-series, targeting safety-critical applications.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-03-10
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators by varying recurrent depth, analogous to classical numerical methods — useful for scientific computing applications.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-03-10
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE applies hyperbolic space learning and multi-modal fusion to enzyme classification, better capturing hierarchical EC number relationships than Euclidean embedding approaches.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-03-10
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency and quality limitations of purely autoregressive interleaved audio-text generation.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-03-10
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K is a benchmark for proactive and personalized mobile GUI agents that act without explicit instructions, using contextual user history — a step toward more autonomous on-device AI assistants.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-03-10
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Provides a rigorous physical-systems analysis of the spatial kernel of multi-resolution hash encodings (used in NeRF/Instant-NGP), replacing heuristic hyperparameter tuning with principled design.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis

Deep Dive

All 387 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact — 7+ items are the ones worth your time.
