Weekly Intelligence

AI Quick Bites

March 24, 2026 · 379 items from 14 sources

Last refreshed: March 24, 2026 at 08:47 UTC
Next refresh: March 30, 2026 at 09:00 UTC
Created by Vatsal Bagri

Highlights

The five most consequential developments in AI this week — selected from 379 items across 14 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
ACPO targets Visual Anchor Collapse, a failure mode of DPO in vision-language models in which the model ignores visual evidence and hallucinates, using simple asymmetric gradient scaling that consistently improves results on hallucination benchmarks.
arxiv 2026-03-24 18 min
03
Chimera's 1.2-2.4x latency reduction for heterogeneous multi-agent LLM serving is immediately actionable for teams running production agentic pipelines on mixed model clusters.
arxiv 2026-03-24 18 min
04
Gumbel Distillation offers a principled, model-agnostic way to close the quality gap between parallel and autoregressive language models — a 30% MAUVE improvement is a meaningful step toward viable non-AR generation.
arxiv 2026-03-24 18 min
05
The first theoretical proof that confidence-based decoding for diffusion LMs is efficient gives formal backing to a widely-used empirical heuristic and guides future DLM decoding design.
arxiv 2026-03-24 20 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar. All scores are AI relevance (0–10).

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here. Scores are AI relevance (0–10): 7+ important, 9+ landmark.

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training
8/10
Replication and extension of the RYS method showing that duplicating 3-4 contiguous 'reasoning circuit' layers in a 24B LLM boosts logical deduction from 0.22 to 0.76 with no training, on consumer AMD GPUs. Provides mechanistic interpretability evidence for discrete cognitive circuits in transformers and a no-training capability boost.
hackernews 2026-03-24 10 min
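To make the idea of "duplicating contiguous layers" concrete, here is a minimal sketch that repeats a span inside an ordered layer stack. It is not the RYS implementation: the function name `duplicate_layer_span`, the toy labels, and the chosen span are all hypothetical, and which layers to repeat in a real 24B model is exactly what the post investigates.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def duplicate_layer_span(layers: Sequence[T], start: int, count: int) -> List[T]:
    """Return a new layer order in which layers[start:start+count] runs twice.

    The repeated entries are the same objects, so weights are shared and
    nothing is trained or added.
    """
    if count <= 0 or start < 0 or start + count > len(layers):
        raise ValueError("span must lie inside the layer stack")
    end = start + count
    span = list(layers[start:end])
    return list(layers[:end]) + span + list(layers[end:])


# Toy stand-in for a transformer: each "layer" is just a label.
stack = [f"layer_{i}" for i in range(8)]
expanded = duplicate_layer_span(stack, start=3, count=3)
print(expanded)
```

With a real model the same reordering would be applied to the list of decoder blocks, so the forward pass visits the chosen span twice.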
Paper: Detecting hallucinations before the first token
8/10
Research on detecting hallucinations in transformer LLMs before the first token is generated by analyzing pre-generative epistemic signals — a potentially significant advance in hallucination mitigation that could enable proactive refusal or uncertainty flagging rather than post-hoc detection.
hackernews 2026-03-24 20 min
Show HN: FireClaw – Open-source proxy defending AI agents from prompt injection
7/10
Open-source security proxy that sits between AI agents and the web, running a 4-stage pipeline to prevent prompt injection attacks from malicious web content before they reach the agent. Proactive prevention rather than post-hoc detection is a meaningful architectural distinction.
hackernews 2026-03-24 5 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
Introduces honesty fine-tuning to make LLMs self-report hidden objectives when interrogated, directly tackling the deceptive alignment problem in agentic AI — important safety research for auditing capable AI systems.
conferences 2026-03-24 20 min
Anthropic's Claude Code CLI had a workspace trust bypass (CVE-2026-33068). Repository settings loaded before trust dialog. Classic configuration loading order bug in an AI developer tool
7/10
CVE-2026-33068 (CVSS 7.7) in Anthropic's Claude Code CLI: repository-level settings including a `bypassPermissions` field were loaded before the workspace trust dialog, allowing a malicious repo to silently pre-approve dangerous operations like file writes and command execution. A concrete, high-severity vulnerability in an AI coding agent with broad system access.
reddit 2026-03-24 5 min
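The class of bug described here is a loading-order problem, and a small sketch shows the safe ordering. This is not Claude Code's actual code: `Session`, `open_workspace`, and `ask_user_to_trust` are hypothetical names, and only the `bypassPermissions` field comes from the summary above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Session:
    trusted: bool = False
    settings: Dict[str, object] = field(default_factory=dict)


def open_workspace(repo_settings: Dict[str, object],
                   ask_user_to_trust: Callable[[], bool]) -> Session:
    """Decide trust first; only then apply repository-supplied settings."""
    session = Session()
    session.trusted = ask_user_to_trust()
    if session.trusted:
        # Repository settings are untrusted input until this point.
        session.settings.update(repo_settings)
    return session


malicious = {"bypassPermissions": True}
declined = open_workspace(malicious, ask_user_to_trust=lambda: False)
accepted = open_workspace(malicious, ask_user_to_trust=lambda: True)
print(declined.settings, accepted.settings)
```

The vulnerable ordering would be the reverse: applying `repo_settings` before the trust decision, so the repository's own flags are already in effect when the dialog appears.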
ACPO: Counteracting Likelihood Displacement in Vision-Language Alignment with Asymmetric Constraints
6/10
ACPO addresses likelihood displacement in DPO for vision-language models by applying asymmetric scaling to the rejected reward, preventing Visual Anchor Collapse where models ignore visual evidence in favor of language priors. Shows consistent improvements on hallucination benchmarks over standard DPO.
arxiv 2026-03-24 18 min
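As a rough illustration of what "asymmetric scaling of the rejected reward" could look like, the sketch below adds a scale factor to the rejected term of a DPO-style loss. The summary does not give ACPO's exact formulation, so this is one plausible reading rather than the paper's method; `rejected_scale` is a hypothetical parameter, and setting it to 1.0 recovers the standard DPO loss.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(chosen_logratio: float,
                    rejected_logratio: float,
                    beta: float = 0.1,
                    rejected_scale: float = 1.0) -> float:
    """DPO-style loss with an optional scale on the rejected log-ratio.

    Each log-ratio is log pi(y|x) - log pi_ref(y|x) for one response.
    rejected_scale is an assumed knob, not taken from the ACPO paper.
    """
    margin = beta * (chosen_logratio - rejected_scale * rejected_logratio)
    return -log_sigmoid(margin)


standard = preference_loss(1.0, -2.0)
damped = preference_loss(1.0, -2.0, rejected_scale=0.5)
print(standard, damped)
```

Damping the rejected term means pushing the rejected response's likelihood down earns less of the margin, which is the intuition behind counteracting likelihood displacement.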
Medical AI gets 66% worse when you use automated labels for training, and the benchmark hides it! [R][P]
6/10
Research finding that automated labels amplify bias by ~40% in medical segmentation models for breast cancer, with performance degrading 66% for younger patients — highlights a critical but underreported data quality issue in medical AI benchmarking.
reddit 2026-03-24 8 min
Study of 2.4M workers finds 96% of permissions unused, a manageable problem until AI agents start running 24/7 with the same access
6/10
Research on 2.4M workers showing 96% of permissions go unused — a manageable human problem that becomes critical when AI agents run 24/7 with the same over-provisioned access. Highlights the least-privilege gap as a key AI agent security risk.
reddit 2026-03-24 5 min
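The least-privilege gap is easy to measure once granted and exercised permissions are both logged. The sketch below uses toy data chosen to reproduce a 96% unused rate; the permission names and helper functions are hypothetical and not taken from the study.

```python
from typing import Set


def unused_permissions(granted: Set[str], used: Set[str]) -> Set[str]:
    """Permissions that were granted but never exercised."""
    return granted - used


def unused_fraction(granted: Set[str], used: Set[str]) -> float:
    """Share of granted permissions that went unused (0.0 if none granted)."""
    if not granted:
        return 0.0
    return len(unused_permissions(granted, used)) / len(granted)


# Toy identity: 25 granted permissions, only one ever used.
granted = {f"perm:{i}" for i in range(25)}
used = {"perm:0"}
print(f"{unused_fraction(granted, used):.0%} unused")
```

For an agent running around the clock, the set returned by `unused_permissions` is the natural candidate list for revocation.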
2% of ICML papers desk rejected because the authors used LLM in their reviews
5/10
ICML officially desk-rejected 2% of submitted papers for authors using LLMs in peer reviews, marking a significant enforcement action on AI-assisted reviewing policies at a top ML venue. Important precedent for academic integrity in the AI research community.
hackernews 2026-03-24 5 min
I built a runtime guardrail that stops AI agents from doing dumb things
5/10
MoltGuard is a runtime guardrail tool that intercepts and blocks dangerous AI agent tool calls before execution, claiming 16K+ downloads. Addresses a real problem in agentic AI safety but limited technical detail in the post.
hackernews 2026-03-24 3 min
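Since the post is light on detail, here is a generic sketch of the pattern it describes: a wrapper that checks a policy before a tool call runs. This is not MoltGuard's API; `guarded`, `BlockedToolCall`, and the example policy are hypothetical.

```python
from typing import Any, Callable, Dict


class BlockedToolCall(Exception):
    """Raised when the policy refuses a tool call."""


Policy = Callable[[str, Dict[str, Any]], bool]


def guarded(tool: Callable[..., Any], policy: Policy) -> Callable[..., Any]:
    """Wrap a tool so the policy is consulted before every call."""
    def wrapper(**kwargs: Any) -> Any:
        if not policy(tool.__name__, kwargs):
            raise BlockedToolCall(f"{tool.__name__} blocked with arguments {kwargs!r}")
        return tool(**kwargs)
    return wrapper


def run_shell(command: str) -> str:
    # Stand-in tool: it only echoes the command and executes nothing.
    return f"ran: {command}"


def deny_recursive_delete(tool_name: str, arguments: Dict[str, Any]) -> bool:
    """Example policy: refuse any command containing 'rm -rf'."""
    return "rm -rf" not in str(arguments.get("command", ""))


safe_shell = guarded(run_shell, deny_recursive_delete)
print(safe_shell(command="ls"))
```

Substring matching is deliberately naive here; a production guardrail would need structured argument inspection rather than string checks.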
Are developers trusting AI-generated code too much?
5/10
Developer built a proxy to detect security issues in AI-generated code including hardcoded secrets, unsafe patterns, and prompt injection hidden in comments. Raises valid concern about over-trust in AI codegen but light on technical depth.
hackernews 2026-03-24 3 min
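A minimal sketch of the hardcoded-secret check such a proxy might run on generated code is shown below. The patterns are illustrative assumptions, not the developer's rules: the `AKIA` prefix follows the well-known AWS access key ID format, and the credential pattern is a simple heuristic that will miss many cases.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; a real scanner would use a much larger rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(api_key|secret|password|token)\b\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan(source: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


generated = """import os
api_key = "sk-test-1234567890"
region = os.environ["REGION"]
"""
print(scan(generated))
```

Reading configuration from the environment, as on the third line of the sample, is the pattern the scanner is meant to leave alone.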
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
5/10
Proposes cross-lingual alignment improvements for multilingual information retrieval, addressing semantic proximity gaps between query and document languages — solid but incremental work in CLIR.
conferences 2026-03-24 20 min
[D] ICML rejects papers of reviewers who used LLMs despite agreeing not to
5/10
ICML 2026 reportedly rejected all papers submitted by reviewers who used LLMs for reviews despite opting into a no-LLM track — a notable enforcement action raising questions about AI detection reliability and academic integrity policy.
reddit 2026-03-24 3 min
Anthropic for Science Blog
5/10
Anthropic launches a dedicated science research blog, signaling increased focus on publishing AI safety and interpretability research. Organizational move worth tracking for future technical output.
hackernews 2026-03-24 3 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors
Top Organizations
#1
garrytan
3 items · avg 5.7/10
17.0
#2
anthropics
3 items · avg 5.3/10
16.0
#3
karpathy
2 items · avg 8.0/10
16.0
#4
browser-use
2 items · avg 7.0/10
14.0
#5
bytedance
2 items · avg 7.0/10
14.0
#6
langchain-ai
2 items · avg 7.0/10
14.0
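The ranking number is the sum of per-item scores, while the average shown beside it is rounded, which is why 3 items at "avg 5.7" can total 17.0 rather than 17.1. A small sketch, assuming hypothetical per-item scores of 6, 6, and 5 (the individual scores are not listed in this section):

```python
from typing import List, Tuple


def contributor_summary(scores: List[float]) -> Tuple[int, float, float]:
    """Return (item count, average rounded to one decimal, cumulative score)."""
    total = float(sum(scores))
    return len(scores), round(total / len(scores), 1), total


# Hypothetical per-item relevance scores for one contributor.
items, average, cumulative = contributor_summary([6, 6, 5])
print(f"{items} items · avg {average}/10 · {cumulative}")
# prints: 3 items · avg 5.7/10 · 17.0
```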

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Knowledge Persistence Layer
Build a structured knowledge-sharing system for AI coding agents — a 'Stack Overflow for agents' — where agents can log, query, and reuse learned gotchas, solutions, and domain-specific patterns across sessions and model instances. This directly addresses the stateless amnesia problem where every agent run rediscovers the same errors. The product would expose a lightweight API that any agent (Claude Code, Codex, custom) can call to push and pull verified knowledge units.
Claude Code / Codex workflow persistence across projects · Enterprise codebases where agents repeatedly hit the same internal API quirks · Quantum or domain-specific coding environments where LLMs lack training coverage · Multi-agent pipelines sharing learned context across specialized sub-agents
https://blog.mozilla.ai/cq-stack-overflo... https://arxiv.org/abs/2603.22184v1
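The push-and-pull interface described for this idea can be sketched as an in-memory store. Everything here is an assumption made for illustration: `KnowledgeUnit`, `KnowledgeStore`, the field names, and the rule that only verified units are returned by default.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KnowledgeUnit:
    topic: str
    gotcha: str
    fix: str
    verified: bool = False


class KnowledgeStore:
    """In-memory stand-in for a shared agent knowledge service."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, List[KnowledgeUnit]] = {}

    def push(self, unit: KnowledgeUnit) -> None:
        """Record a learned gotcha under its topic."""
        self._by_topic.setdefault(unit.topic, []).append(unit)

    def pull(self, topic: str, verified_only: bool = True) -> List[KnowledgeUnit]:
        """Fetch units for a topic, optionally including unverified ones."""
        units = self._by_topic.get(topic, [])
        return [u for u in units if u.verified or not verified_only]


store = KnowledgeStore()
store.push(KnowledgeUnit("billing-api", "429 on burst writes", "batch requests", verified=True))
store.push(KnowledgeUnit("billing-api", "auth sometimes fails", "retry once", verified=False))
print(store.pull("billing-api"))
```

Keeping unverified units out of the default result is one way to stop a single agent's wrong guess from spreading to every later session.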
Cardiac AI Second Opinion
Productize the MARCUS architecture as a consumer or clinical-facing cardiac diagnostic assistant that accepts ECG images, echocardiograms, or CMR scans and returns structured diagnostic summaries with confidence scores. Given MARCUS outperforms GPT-4o and Gemini 2.5 Pro by 34-45% on cardiac tasks, there is a clear moat in domain-specific agentic VLMs over generalist models. The product targets cardiologists needing rapid second opinions, rural clinics with limited specialist access, and remote patient monitoring platforms.
Clinical decision support for cardiologists · Remote and rural telehealth cardiac screening · Insurance pre-authorization automation for cardiac procedures · Wearable ECG data interpretation pipelines
https://arxiv.org/abs/2603.22179v1
LLM Judge Config Advisor
Build a developer tool that recommends the optimal LLM-as-judge configuration — model choice, prompt template, and task category — based on the evaluation task a team describes. The research benchmarking 37 LLMs across 5 prompts and 8 task categories provides a concrete empirical foundation, and teams currently waste significant time and money misconfiguring evaluators. The tool would expose a simple interface: describe your eval task, get a ranked recommendation with expected human-correlation scores and cost estimates.
AI teams setting up automated eval pipelines · Fine-tuning workflows needing reliable reward signal · RAG system quality monitoring · Agentic pipeline output validation
https://arxiv.org/abs/2603.22214v1 https://blog.icml.cc/2026/03/18/on-viola...
Single-GPU High-Rank Fine-Tuner
Package the Scaling DoRA optimizations — factored norm computation and fused Triton kernels — into a user-friendly fine-tuning toolkit that enables high-rank (256-384) LoRA-style adaptation of 8-32B VLMs on a single consumer or prosumer GPU. The 1.5-2.7x speedup and up to 7GB VRAM reduction make previously infeasible fine-tuning runs accessible to individual researchers and small teams. Wrap this in a CLI and Python API with sensible defaults, integrated with Hugging Face and llamafile for local deployment.
Individual researchers fine-tuning large VLMs without cloud GPU budgets · Domain-specific VLM adaptation for medical, legal, or scientific imaging · Rapid prototyping of instruction-tuned models for startups · Local enterprise fine-tuning with data privacy constraints
https://arxiv.org/abs/2603.22276v1 https://blog.mozilla.ai/llamafile-reload...
Autonomous Mobile QA Agent
Build a productized autonomous QA agent for mobile apps that uses a VLM to interpret screenshots, plan test flows, and execute UI interactions — turning natural language test descriptions into repeatable regression suites. The community is already hand-rolling this with Claude, but there is no polished product that handles the full loop: test spec input, device farm integration, screenshot-grounded failure reporting, and CI/CD hooks. This fills a real gap between manual QA and brittle Appium-style scripted tests.
Mobile app regression testing in CI/CD pipelines · Accessibility compliance verification across device sizes · Startup QA without dedicated mobile test engineers · Cross-platform iOS and Android parity validation
https://christophermeiklejohn.com/ai/zab... https://arxiv.org/abs/2603.22169v1

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

#1
Cekura
Observe and analyze your voice and chat AI agents
SaaS · Developer Tools · Audio
95
12
https://www.producthunt.com/r/BVQAP...
#2
Kitty Points Leaderboard
Find interesting community members and see how you stack up
Product Hunt
93
3
https://www.producthunt.com/r/UA5BM...
#3
Claude Computer Use
Enable Claude to use your computer to complete tasks
Productivity · Task Management · Artificial Intelligence
91
2
https://www.producthunt.com/r/AYTPI...
#4
Agent Hub Builder
Build a Netflix-style library of AI-powered tools to sell
Artificial Intelligence · No-Code · Online Learning
87
10
https://www.producthunt.com/r/QXVYV...
#5
jared.so
Your AI employee that delivers. Every day.
Productivity · Artificial Intelligence · Business
87
3
https://www.producthunt.com/r/7P5NY...
#6
Drift: Claude Code for robot simulations
AI agent to run robot simulations 10x faster and reliably.
Robots · Developer Tools · Artificial Intelligence
79
4
https://www.producthunt.com/r/W65GW...
#7
Google Gemini in Chrome
Turn your browser into an AI workspace
Productivity · Artificial Intelligence · Tech
77
1
https://www.producthunt.com/r/3VDHS...
#8
Ordo: Save, Organise & Rediscover.
Finally a saving app that works.
Productivity · Social Media · Artificial Intelligence
74
2
https://www.producthunt.com/r/SJFYA...
#9
TypeScript 6.0
The last TypeScript release built on JavaScript
Open Source · GitHub · Development Language
73
1
https://www.producthunt.com/r/INYIH...
#10
Jotform AI
Build forms faster with Jotform AI
Productivity · Artificial Intelligence · No-Code
72
1
https://www.producthunt.com/r/VAASW...

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1
TrendShift
karpathy/autoresearch
Python 50,800 7,100
Karpathy's autoresearch uses AI agents to autonomously run ML research experiments on a single GPU, closing the loop between hypothesis generation and empirical validation for nanochat training. At roughly 51k stars, it is a genuine research automation milestone.
Build idea
A SaaS platform for ML teams that autonomously runs hyperparameter searches, ablation studies, and model comparisons overnight on rented GPUs, delivering a structured research report by morning.
155 issues
2
TrendShift
browser-use/browser-use
Python 83,500 9,700
Browser-use makes websites accessible to AI agents for task automation, now at 84k stars — one of the most widely adopted browser-control frameworks for LLM agents.
Build idea
A no-code workflow automation tool for SMBs that lets non-technical users describe repetitive web tasks in plain English — like pulling competitor prices or filling procurement forms — and executes them automatically via AI browser agents.
219 issues
3
TrendShift
bytedance/deer-flow
Python 39,400 4,600
ByteDance's open-source SuperAgent framework with sandboxes, memory, tools, and subagents for long-horizon research and coding tasks; roughly 39k stars and actively maintained.
Build idea
An enterprise AI research assistant service where companies subscribe to a managed agent that autonomously monitors industry news, synthesizes competitive intelligence, and delivers weekly briefings with cited sources.
283 issues
4
GH Trending
langchain-ai/deepagents
python 17,191 2,423 4,831 stars this week
LangChain's official deep agent harness built on LangGraph, featuring planning tools, filesystem access, and subagent spawning for complex multi-step tasks. Gained 4,800+ stars in a week, signaling strong developer interest in production-grade agentic scaffolding.
Build idea
A developer platform that lets software teams deploy production-ready AI agents for internal tasks — such as codebase audits, ticket triage, and documentation generation — with built-in observability and approval workflows.
421 commits/mo 160 issues
5
GH Trending
openai/codex
rust 67,224 8,996 1,563 stars this week
OpenAI's official lightweight terminal-based coding agent written in Rust, with 67k stars and active development (616 commits last month). The go-to reference implementation for CLI coding agents.
Build idea
A subscription CLI tool for freelance developers and agencies that acts as an always-on coding copilot in the terminal, automating boilerplate generation, refactoring, and bug fixes across any codebase without leaving the command line.
616 commits/mo 2222 issues
6
GH Trending
unslothai/unsloth
python 57,879 4,881 3,719 stars this week
Unsloth Studio adds a web UI for training and running open models (Qwen, DeepSeek, Gemma) locally, with 57k+ stars and nearly 4k new stars this week — one of the most actively used fine-tuning frameworks in the open-source ecosystem.
Build idea
A managed fine-tuning service targeting mid-market companies that want custom private LLMs — customers upload their data, select a base model, and receive a fine-tuned model endpoint without needing ML expertise or cloud infrastructure knowledge.
563 commits/mo 1029 issues
7
GH Trending
vllm-project/vllm-omni
python 3,710 616 549 stars this week
Official vLLM extension for omni-modality model inference (text, audio, vision, etc.) with 3.7k stars and active development — extends vLLM's high-throughput serving to multimodal frontier models.
Build idea
A multimodal AI inference API service for product teams that need to process mixed inputs — such as voice memos, screenshots, and text — in a single unified pipeline, billed per token across modalities.
248 commits/mo 487 issues
8
GH Trending
volcengine/OpenViking
python 18,578 1,273 4,636 stars this week
ByteDance's Volcengine open-sources OpenViking, a context database for AI agents that unifies memory, resources, and skills via a filesystem paradigm with hierarchical delivery and self-evolution; 18k stars and 4.6k new stars this week signal strong traction.
Build idea
A persistent memory and context management layer sold as a B2B API, enabling companies building AI agents to give those agents long-term, structured knowledge of users, past interactions, and domain-specific skills without building custom memory infrastructure.
281 commits/mo 79 issues
9
GH Trending
MiroMindAI/MiroThinker
python 8,019 584 1,070 stars this week
Deep research agent with models achieving 74.0 and 88.2 on BrowseComp benchmark, targeting complex research and prediction tasks. Competitive benchmark scores on web browsing comprehension are notable.
Build idea
A premium deep research subscription service for analysts, investors, and consultants that takes complex multi-part questions and returns thoroughly sourced, structured research reports compiled autonomously by web-browsing AI agents.
13 commits/mo 55 issues
10
GH Trending
alibaba/page-agent
typescript 13,654 1,041 4,261 stars this week
Alibaba's JavaScript in-page GUI agent enabling natural language control of web interfaces without browser extensions. 4K+ stars this week signals strong developer interest in lightweight web automation.
Build idea
An embeddable AI concierge widget that e-commerce and SaaS companies add to their websites, allowing customers to navigate, filter, fill forms, and complete purchases using natural language commands without any browser extension required.
140 commits/mo 56 issues

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1
jakevin · @apache
@jackwener 1,803 34 repos
AI enthusiast! Love Open Source! PMC member & Committer of @apache Arrow & Datafusion & Doris
jackwener/opencli
TypeScript 5,770 468
Make Any Website & Tool Your CLI. A universal CLI Hub and AI-native runtime. Transform any website, Electron app, or local binary into a standardized command-line interface. Built for AI Agents to discover, learn, and execute tools seamlessly via a unified AGENT.md integration.
2
Jarrod Watts
@jarrodwatts 1,000 144 repos
jarrodwatts/claude-hud
JavaScript 12,290 511
A Claude Code plugin that shows what's happening - context usage, active tools, running agents, and todo progress
3
Brady Gaster
@bradygaster 1,018 98 repos
Brady Gaster is a PM Architect in the CoreAI division at Microsoft where he works on Apps, Agents, MIDI, and most recently, Squad
bradygaster/squad
TypeScript 1,245 167
Squad: AI agent teams for any project
4
David East · @google-labs-code
@davideast 3,039 106 repos
Working on @google-labs-code. Stitch and Jules <3
davideast/stitch-mcp
TypeScript 576 69
A CLI for moving AI-generated UI designs from Google’s Stitch platform into your development workflow.
5
Matt Van Horn
@mvanhorn 393 269 repos
Co-founded June ("self-driving oven" acquired by Weber) & the co that became Lyft. Building again, more soon. Vibe coding Last30Days research tool
mvanhorn/last30days-skill
Python 4,776 526
AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
6
Nathan Brake · @mozilla.ai
@njbrake 442 50 repos
Machine Learning at Mozilla.ai
njbrake/agent-of-empires
Rust 1,312 105
Claude Code, OpenCode, Mistral Vibe, Codex CLI, Gemini CLI, Pi.dev, Copilot CLI Coding Agent Terminal Session manager via tmux and git Worktrees
7
Fengda Huang · @Thoughtworks
@phodal 20,540 382 repos
I'm digging holes.
phodal/routa
TypeScript 223 40
Build Your Agent Team for Real-World AI Development - Workspace-first multi-agent coordination platform for AI development, with shared Specs, Kanban orchestration, and MCP/ACP/A2A support across web and desktop.
8
Sebastian Raschka
@rasbt 36,624 147 repos
AI Research Engineer working on LLMs.
rasbt/LLMs-from-scratch
Jupyter Notebook 89,124 13,596
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
9
Paul Bakaus · Google
@pbakaus 837 24 repos
Creative Technologist ☕️ Entrepreneur • 🗝 Gate opener • 🗺 Adventurer • 🎬 Geek • Created jQuery UI, Google for Creators, Spotter Studio • he/him
pbakaus/impeccable
JavaScript 12,905 525
The design language that makes your AI harness better at design.
10
qixing-jk
@qixing-jk 111 63 repos
qixing-jk/all-api-hub
TypeScript 2,363 139
New-API relay manager: balance/usage, auto check-in, one-click key export to popular clients, in-page API checks, channel/model sync & redirect
11
郑诚 (Cheng Zheng) · 奇绩创坛 MiraclePlus
@1c7 2,948 342 repos
AI Engineer & Full-Stack Builder. Focused on applied AI, RAG architectures, and shipping robust business tools.
1c7/chinese-independent-developer
47,334 4,028
👩🏿‍💻👨🏾‍💻👩🏼‍💻👨🏽‍💻👩🏻‍💻 List of projects by independent Chinese developers -- sharing what everyone is working on
12
Daniel Griesser · @getsentry
@HazAT 300 58 repos
( ͡° ͜ʖ ͡°)
HazAT/glimpse
JavaScript 453 19
Native macOS micro-UI for scripts and agents — sub-50ms WKWebView windows with bidirectional JSON communication
13
Alireza Rezvani · @lindera-engineering
@alirezarezvani 622 27 repos
CTO in HealthTech | Engineering AI that helps people move safely and live better. Augmented AI | Agentic Coding | Turning complex problems into simple solution
alirezarezvani/claude-skills
Python 6,626 785
+192 Claude Code skills & agent plugins for Claude Code, Codex, Gemini CLI, Cursor, and 8 more coding agents — engineering, marketing, product, compliance, C-level advisory.
14
Duy /zuey/ · @digitopvn
@mrgoonie 511 102 repos
CEO/CTO @ wearetopgroup.com 🇻🇳 Founder of GoClaw.sh, ClaudeKit.cc, IndieBoosting.com, GitRace.dev & DxUp.dev 🚀
mrgoonie/claudekit-skills
Python 1,882 374
All powerful skills of ClaudeKit.cc!
15
Willem Jiang
@WillemJiang 794 116 repos
16
Bartek Iwańczuk · @denoland
@bartlomieju 1,082 93 repos
Engineering Manager at @denoland
17
Dotta · @forgottenrunes
@cryppadotta 410 18 repos
CEO Forgotten Runes Wizards Cult. Crypto-quant. Creator of Dotlicense, the ERC721 software license (2018).
cryppadotta/dotta-license
JavaScript 507 101
ERC721-based Software Licensing Framework
18
Dream Hunter · @awsl-project
@dreamhunter2333 402 111 repos
I don't know why the lemons keep circling around me
dreamhunter2333/cloudflare_temp_email
TypeScript 7,217 3,882
CloudFlare free temp domain email: free sending and receiving, temporary domain mailbox, attachment support, IMAP, SMTP, TelegramBot
19
Hartmut Kaiser
@hkaiser 209 25 repos
20
Josh Lehman · Martian Engineering
@jalehman 199 91 repos
Partner at Martian Engineering
jalehman/xc
TypeScript 15 4
CLI client for the X API v2
21
Jorge Manrubia · @basecamp
@jorgemanrubia 699 73 repos
22
Klaus Post · @minio
@klauspost 3,813 139 repos
klauspost/compress
Go 5,439 368
Optimized Go Compression Packages
23
Lawrence Chen · @manaflow-ai
@lawrencecchen 128 173 repos
lawrencecchen/awesome-libghostty
6
Curated list of projects and resources built with libghostty and libghostty-vt
24
Matthew Diakonov
@m13v 141 94 repos
m13v/fazm
Swift 77 7
Fazm Desktop for macOS

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard — Top 15
#   Model                                  Organization  Type    Elo   Votes
1   claude-opus-4-6-thinking               Anthropic     Closed  1502  11,801
2   claude-opus-4-6                        Anthropic     Closed  1501  12,546
3   gemini-3.1-pro-preview                 Google        Closed  1493  14,677
4   grok-4.20-beta1                        xAI           Closed  1492  7,396
5   gemini-3-pro                           Google        Closed  1486  41,762
6   gpt-5.4-high                           OpenAI        Closed  1485  4,965
7   gpt-5.2-chat-latest-20260210           OpenAI        Closed  1482  10,140
8   grok-4.20-beta-0309-reasoning          xAI           Closed  1481  4,504
9   gemini-3-flash                         Google        Closed  1475  31,060
10  claude-opus-4-5-20251101-thinking-32k  Anthropic     Closed  1474  37,036
11  grok-4.1-thinking                      xAI           Closed  1472  43,930
12  claude-opus-4-5-20251101               Anthropic     Closed  1469  41,976
13  claude-sonnet-4-6                      Anthropic     Closed  1465  9,843
14  qwen3.5-max-preview                    Alibaba       Closed  1464  4,252
15  gpt-5.3-chat-latest                    OpenAI        Closed  1464  8,942
New & Trending Models
nvidia/Nemotron-Cascade-2-30B-A3B
19,722 downloads 234 likes 234 trending
Custom License 2026-03-18
NVIDIA's Nemotron-Cascade-2 is a 30B total / 3B active parameter MoE model combining reasoning and general-purpose capabilities via SFT+RL, with a very high trending score (234) suggesting it's a notable new release with strong efficiency-to-performance ratio.
openai/gpt-oss-120b
4,523,709 downloads 4,604 likes 26 trending
Open Source 2025-08-04
OpenAI's open-source 120B model released on HuggingFace under Apache 2.0, with 4.5M downloads and 4.6K likes — a landmark open release from OpenAI that signals a significant shift in their open-source strategy.
Qwen/Qwen3-Coder-Next
1,248,880 downloads 1,172 likes 41 trending
Open Source 2026-01-30
Qwen3-Coder-Next from Alibaba's Qwen team with 1.25M downloads and 1.17k likes — a leading open-weight coding model with strong community adoption and agentic capabilities.
deepseek-ai/DeepSeek-V3.2
308,961 downloads 1,330 likes 24 trending
Open Source 2025-12-01
DeepSeek-V3.2 with 309k downloads and 1.33k likes — an updated iteration of DeepSeek's flagship MoE model, continuing to be one of the most downloaded open-weight frontier models.
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
116,259 downloads 289 likes 64 trending
Custom License 2026-03-10
NVIDIA's Nemotron-3 Super 120B MoE model (12B active parameters) in BF16, using a latent-MoE architecture with multi-token prediction — strong download numbers (116K) suggest real community interest in this efficient large-scale model.
MiniMaxAI/MiniMax-M2.5
498,634 downloads 1,275 likes 68 trending
Custom License 2026-02-12
MiniMax's M2.5 model with 498k downloads and 1.2k likes — a competitive frontier model from MiniMax with strong community uptake, though limited public technical details available.
Multilingual-Multimodal-NLP/IndustrialCoder
299 downloads 34 likes 34 trending
Open Source 2026-03-13
Domain-specific code model targeting industrial and hardware-oriented code (Verilog, CUDA, Triton, chip design, CAD); fills a real gap for hardware engineers that general code models underserve.
microsoft/bitnet-b1.58-2B-4T
15,604 downloads 1,396 likes 30 trending
Open Source 2025-04-15
Microsoft's BitNet b1.58 2B model trained on 4T tokens using 1.58-bit weights — continues to be a reference point for extreme quantization research with 1.4k likes and active interest.
nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
32,255 downloads 54 likes 37 trending
Custom License 2026-03-07
NVIDIA's Nemotron-3 Nano 4B model trained on a rich multi-dataset mix including agentic, math, and competitive programming data — compact model with strong post-training recipe targeting edge deployment.
nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF
8,769 downloads 63 likes 53 trending
Custom License 2026-03-07
NVIDIA's Nemotron-3 Nano 4B in GGUF format, trained on specialized datasets including agentic, math, and competitive programming data. A compact, locally-runnable model with reasoning capabilities from NVIDIA's Nemotron-3 family.
silx-ai/Quasar-10B
148 downloads 33 likes 31 trending
Open Source 2026-03-09
Quasar-10B is a 10B linear-attention model based on Qwen3.5-9B-Base with a claimed 2M token context window using GLA (Gated Linear Attention), which is technically interesting for long-context inference without quadratic attention costs.
Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
136,277 downloads 141 likes 39 trending
Open Source 2026-03-03
GGUF-quantized Qwen3.5-9B distilled from Claude Opus reasoning traces with 136k downloads — demonstrates continued community interest in reasoning distillation from frontier models into smaller open weights.
Rakuten/RakutenAI-3.0
595 downloads 67 likes 45 trending
Open Source 2026-03-16
Rakuten's new bilingual (Japanese/English) large model built on DeepSeek-V3 architecture — notable as a production-grade Japanese-English model from a major enterprise, though technical details are sparse.
Tesslate/OmniCoder-9B
20,303 downloads 371 likes 119 trending
Open Source 2026-03-12
Multimodal coding agent model (image-text-to-text) fine-tuned from Qwen3.5-9B with 20k downloads and 371 likes — combines vision and code generation for agentic use cases.
XiaomiMiMo/MiMo-V2-Flash
175,196 downloads 673 likes 20 trending
Open Source 2025-12-16
Xiaomi's MiMo-V2-Flash with 175k downloads — a fast inference variant of their MiMo reasoning model series, though trending score has dropped suggesting initial hype has settled.

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week — try them live. Flame icon = HuggingFace trending score. Hearts = community likes.

Tensorbend
Ex0bit
static 49 47
A HuggingFace Space with minimal metadata; insufficient information to assess technical substance.
Omni Video Factory
FrameAI4687
gradio 682 100
mit
A Gradio demo space combining text-to-video, image-to-video, and video extension capabilities; high likes (682) but appears to be a wrapper aggregating existing video generation models rather than novel research.
NSFW Uncensored Adult Image
Heartsync
gradio 591 33
NSFW image generation space; not relevant to AI research or engineering.
LTX 2.3 Distilled
Lightricks
gradio 222 53
Official demo for LTX Video 2.3 Distilled from Lightricks, a distilled video generation model offering faster inference; represents continued progress in efficient video synthesis from a serious commercial player.
Wan2.2 Animate
Wan-AI
gradio 5,027 71
apache-2.0
Official Wan2.2 Animate demo from Wan-AI with 5K+ likes, enabling animation generation; part of the Wan2.x video generation model family that has gained significant community traction.
Fish Audio S2 Pro
artificialguybr
gradio 112 55
other
Zero-GPU demo of Fish Audio S2 Pro text-to-speech model; Fish Audio has been producing competitive TTS models and this provides accessible inference for the latest S2 Pro version.
FLUX.2 Klein 9B KV
black-forest-labs
gradio 108 66
Official Black Forest Labs demo for FLUX.2 Klein 9B with KV caching — a new FLUX model variant optimized for faster inference via KV cache, from the team behind the leading open image generation models.
Free Unlimited Google Veo 3
deddytoyota
static 265 131
Appears to be an unofficial/scraped wrapper claiming free access to Google Veo 3 with NSFW content; likely a spam/scam space with no legitimate technical substance.
LTX 2.3 First-Last Frame
linoyts
gradio 69 37
LTX Video 2.3 demo with first-and-last-frame conditioning, enabling controlled video generation between two keyframes — a useful capability for video editing workflows.
Voxtral Realtime WebGPU
mistralai
static 76 38
Mistral's Voxtral real-time speech transcription running entirely in-browser via WebGPU — notable for client-side inference of a capable ASR model without server round-trips.
Z Image Turbo
mrfakename
gradio 2,646 81
Z-Image Turbo is a fast image generation model with 2.6K likes indicating strong community adoption; limited metadata but high engagement suggests a competitive fast-inference image model.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio 1,953 38
Demo using Qwen's image model to generate multiple camera angles of a scene in 3D, with nearly 2K likes; interesting application of multimodal models for novel-view synthesis-style outputs.
Kimodo
nvidia
docker 54 54
apache-2.0
NVIDIA's Kimodo generates high-quality human motion sequences from text prompts; relevant to animation, robotics, and embodied AI research as a text-to-motion foundation model.
FireRed Image Edit 1.0 Fast
prithivMLmods
gradio 425 136
apache-2.0
Fast image editing demo combining FireRed-Image-Edit with Qwen-Image-Edit-Rapid; a community wrapper with high trending score but derivative of existing editing models.
Qwen-Image-Edit-2511-LoRAs-Fast
prithivMLmods
gradio 1,140 63
apache-2.0
Collection of LoRA adapters for Qwen Image Edit with 1.1K likes; useful for practitioners but derivative fine-tuning work on an existing model.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-03-24
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
ICLR 2026 paper introducing Common Corpus, presented as the largest ethically sourced pre-training dataset for LLMs, built from uncopyrighted or permissively licensed data. Important for researchers needing legally defensible training data.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-03-24
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
ICLR 2026 paper presenting MedAraBench, a large-scale Arabic medical QA benchmark addressing a significant gap in multilingual medical NLP evaluation. Useful for researchers in low-resource language AI.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-03-24
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
ICLR 2026 theoretical paper analyzing transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, providing formal grounding for in-context learning behavior. Relevant for researchers studying ICL mechanisms.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-03-24
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
ICLR 2026 paper introducing Task Tokens for adapting transformer-based behavior foundation models in humanoid robotics without full retraining. Incremental but practically useful for robotics researchers.
Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-03-24
Submodular Function Minimization with Dueling Oracle
ICLR 2026 theory paper on submodular function minimization with a dueling oracle. Tangentially relevant to preference-based optimization but not directly applicable to mainstream AI/ML practice.
submodular minimization dueling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-03-24
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
ICLR 2026 benchmark paper evaluating MLLMs on scan-oriented academic paper reasoning, finding current models still far from autonomous research capability. Useful for tracking MLLM progress on complex document understanding.
Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-03-24
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
ICLR 2026 paper proposing N-th order recursive consistent velocity field estimation for flexible any-step generation, addressing computational overhead in consistency models. Technically solid contribution to efficient generative modeling.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-03-24
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT introduces a fully offline hierarchical RL framework using masked skill token training to transfer policies across environments with different dynamics — relevant for sim-to-real transfer in robotics without requiring online fine-tuning.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-03-24
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides high-probability convergence and generalization bounds for SGD with momentum in non-convex settings — theoretically solid but incremental contribution to optimization theory.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-03-24
Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Q-RAG applies RL-based value training to embedders for multi-step retrieval in long-context QA, addressing the fundamental limitation of single-step RAG on complex multi-hop questions — a meaningful advance over standard RAG pipelines.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-03-24
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Proposes cross-lingual alignment improvements for multilingual information retrieval, addressing semantic proximity gaps between query and document languages — solid but incremental work in CLIR.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-03-24
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematic benchmark of GPT-4o, o4-mini, Gemini 1.5 Pro and Flash on standard CV tasks (depth, segmentation, etc.) reveals where frontier multimodal models still fall short of specialized vision models — important calibration for practitioners choosing between VLMs and task-specific models.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-03-24
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous neural field representations for variable-cardinality set prediction (object detection, molecular modeling), enabling diffusion/flow-matching over discrete structures without padding hacks.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-03-24
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL uses random path sampling in wavelet scattering transforms to reduce computational cost while preserving perceptual gradient quality for audio/vision inverse problems — useful for differentiable audio synthesis practitioners.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-03-24
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture combining evidential deep learning and extreme value theory for rare-event forecasting in multivariate time series — niche but technically interesting for anomaly detection applications.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-03-24
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators by varying recurrent depth, analogous to adaptive precision in classical numerical methods — useful for AI4Science applications.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-03-24
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE applies hyperbolic space multi-modal learning to enzyme classification, capturing hierarchical EC number relationships — specialized bioinformatics contribution with limited general ML impact.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-03-24
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for speech-to-speech LLMs, addressing latency and quality issues in interleaved audio-text generation — relevant for real-time voice AI systems.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-03-24
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K is a 20K-sample benchmark for proactive and personalized mobile GUI agents that act without explicit instructions by leveraging user context — advances evaluation of next-gen agentic mobile assistants.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-03-24
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Provides rigorous spatial kernel analysis of Multi-Resolution Hash Encoding (Instant-NGP), replacing heuristic hyperparameter tuning with principled design — useful for practitioners working with neural radiance fields.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis

Deep Dive

All 379 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact — 7+ items are the ones worth your time.
