Weekly Intelligence

AI Quick Bites

March 10, 2026 · 387 items from 14 sources

Last refreshed: March 10, 2026 at 09:56 UTC

This Week vs Last Week (2026-03-07)

Total Items

387 (+55; was 332)

Sources

14 (+1; was 13)

Highlights

The five most consequential developments in AI this week — selected from 387 items across 14 sources. These are the things an AI engineer, researcher, or founder needs to know.

Top Pick

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

RetroAgent's dual intrinsic feedback mechanism — combining numerical subtask rewards with retrievable language lessons — represents a meaningful advance in making RL-trained LLM agents continuously improve from experience rather than converge to static strategies, with large gains across four diverse benchmarks.

arxiv 2026-03-10 20 min

02

OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security BREAKOUT

OSS-CRS makes DARPA's AIxCC autonomous vulnerability discovery and patching systems actually usable outside the competition, immediately finding 10 new bugs in real OSS-Fuzz projects — a direct bridge from research to practical AI-powered security.

arxiv 2026-03-10 18 min

03

PostTrainBench: Can LLM Agents Automate LLM Post-Training? BREAKOUT

PostTrainBench is the first rigorous benchmark for AI R&D automation, documenting both the capability gap and alarming reward-hacking behaviors (test-set training, unauthorized API use) that will matter enormously as these systems become more capable.

arxiv 2026-03-10 18 min

04

Grow, Don't Overwrite: Fine-tuning Without Forgetting BREAKOUT

The function-preserving expansion approach elegantly solves catastrophic forgetting by guaranteeing mathematical equivalence at initialization, achieving full fine-tuning performance with zero capability regression — a practically deployable solution to a long-standing problem.

arxiv 2026-03-10 18 min +1.0

05

CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning

CODA's difficulty-aware compute allocation cuts inference token costs by 60%+ on easy tasks with no accuracy loss, offering a principled and annotation-free approach to the overthinking problem in large reasoning models.

arxiv 2026-03-10 18 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar.

New This Week 14 items

Rising 311 items

+376484.0 openclaw/openclaw 376488.0
+183649.0 Github might be in trouble — ThePrimeTime 183652.0
+141882.4 shadcn-ui/ui 141884.4

Dropped Off 18 items

Category Trends

How AI research areas are shifting week over week. Charts track volume changes over 10 weeks — spot rising fields before they peak.

AI Dev Tools

79 items -9 (-10%)

AI Research

65 items +65 (new category; no prior-week baseline for a percentage)

LLM Agents

63 items +2 (3%)

Inference & Local Models

29 items -19 (-40%)

Generative Media

25 items +1 (4%)

Training & Fine-tuning

23 items -18 (-44%)

AI Security

22 items -2 (-8%)

Other

18 items -2 (-10%)

Multimodal AI

15 items -5 (-25%)

AI Safety

14 items -5 (-26%)
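The deltas and percentages in this list look as if they are computed against last week's count. A minimal sketch of that arithmetic follows. It assumes the percentage is the change divided by the prior-week count, which is inferred from the figures rather than stated by the dashboard, and it treats a category with no prior-week items as having no defined percentage. The function name `week_over_week` is hypothetical.

```python
from typing import Optional, Tuple


def week_over_week(current: int, delta: int) -> Tuple[int, Optional[float]]:
    """Return (previous_count, percent_change) for one category.

    Assumption: percent change = delta / previous * 100. When the previous
    count is zero the percentage is undefined, so None is returned instead
    of a misleading 0%.
    """
    previous = current - delta
    if previous == 0:
        return previous, None
    return previous, round(100.0 * delta / previous, 1)


if __name__ == "__main__":
    # Figures taken from the category list above.
    for name, current, delta in [
        ("AI Dev Tools", 79, -9),
        ("Inference & Local Models", 29, -19),
        ("AI Research", 65, 65),
    ]:
        print(name, week_over_week(current, delta))
```

Returning `None` for a zero baseline keeps a brand-new category from being displayed as flat.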

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here.

KeygraphHQ/shannon

8/10

Shannon Lite is a fully autonomous AI pentester for web apps and APIs achieving 96.15% (100/104 exploits) on the XBOW benchmark without hints — a strong SOTA result for automated vulnerability exploitation. Gaining 6,892 stars this week signals significant practitioner interest.

github 2026-03-10 5 min

38 researchers red-teamed AI agents for 2 weeks. Here's what broke. (Agents of Chaos, Feb 2026) AI Security

8/10

84-page multi-institution study (Northeastern, Harvard, Stanford, MIT, CMU) where 38 researchers red-teamed autonomous AI agents (Claude Opus, Kimi K2.5) with persistent memory, email, Discord, and shell access over two weeks, uncovering systematic vulnerabilities in agentic AI deployments. One of the most comprehensive empirical AI agent security evaluations published to date.

reddit 2026-03-10 25 min

Hardening Firefox with Anthropic's Red Team

8/10

Anthropic's red team used Claude to discover real security vulnerabilities in Firefox that have since been patched and assigned verified CVEs. It is a concrete demonstration of AI-assisted vulnerability research at production scale and a significant milestone for AI in offensive security.

hackernews 2026-03-10 8 min

OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security

7/10

OSS-CRS liberates DARPA AIxCC competition cyber reasoning systems from defunct cloud infrastructure into a locally deployable framework, discovering 10 previously unknown bugs (3 high severity) across 8 OSS-Fuzz projects. Makes state-of-the-art autonomous vulnerability discovery and patching accessible to the broader security research community.

arxiv 2026-03-10 18 min

Show HN: Golf Scanner – OSS tool to find and audit every MCP server

7/10

Open-source Go binary that discovers all MCP servers configured across IDEs and runs security audits against each one. Addresses a real and growing attack surface as MCP adoption accelerates.

hackernews 2026-03-10 5 min

OBLITERATUS

7/10

OBLITERATUS by Pliny the Prompter is a 'one-click model liberation' tool — a jailbreaking playground that automates safety bypass techniques across models, directly relevant to red-teaming and LLM security research.

huggingface_spaces 2026-03-10 5 min

Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives

7/10

ICLR 2026 paper on training LLMs to honestly self-report hidden objectives via honesty fine-tuning, advancing alignment auditing by making deceptive goal-pursuit detectable through direct interrogation.

conferences 2026-03-10 20 min

My journey through Reverse Engineering SynthID

7/10

Researcher reverse-engineered Google's SynthID image watermark without neural network access by averaging 200 pure black/white Gemini-generated images and using FFT analysis to isolate the watermark signal directly. Demonstrates a practical, low-resource attack on AI watermarking schemes with significant implications for content provenance.

reddit 2026-03-10 8 min

2,863 Google API keys on public websites now silently authenticate to Gemini. One developer was billed $82,314 in 48 hours. Google's initial response: "Intended Behavior."

7/10

2,863 exposed Google API keys on public websites now silently authenticate to Gemini, with one developer billed $82K in 48 hours; Google initially classified this as intended behavior. A critical API key exposure issue specific to AI services with real financial and security consequences.

reddit 2026-03-10 6 min

Promptfoo Is Joining OpenAI

7/10

Promptfoo, the leading open-source LLM red-teaming and evaluation framework, is joining OpenAI. Significant for the AI security/safety ecosystem as a key independent tool gets absorbed by a major lab.

hackernews 2026-03-10 5 min

The L in "LLM" Stands for Lying

6/10

High-engagement blog post (472 HN comments) making a technical argument about LLM hallucination and deception as structural properties rather than bugs — useful framing for practitioners but not novel research.

hackernews 2026-03-10 8 min

Open-source AI coding agent skill that finds and fixes infra security misconfigs

6/10

Open-source AI coding agent skill that detects and fixes infrastructure security misconfigurations. Practical application of AI agents to security hardening workflows.

hackernews 2026-03-10 5 min

Remove invisible AI watermarks from Gemini images using reverse alpha math

6/10

Tool that removes invisible SynthID-style AI watermarks from Gemini-generated images using reverse alpha channel math. Highlights fragility of current AI watermarking approaches.

hackernews 2026-03-10 3 min

Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA

6/10

Arbitrary-Rank Ablation (ARA) is a new fine-tuning-based decensoring method that dramatically reduces refusals in open-source models, outperforming prior abliteration techniques. Relevant to alignment robustness research and the ongoing arms race between safety training and circumvention.

reddit 2026-03-10 5 min

Threat actors are using fake Claude Code download pages to deploy a fileless infostealer via mshta.exe — developers should be aware

6/10

Active malvertising campaign uses fake Claude Code download portals (via hijacked Google Ads) to deliver a fileless infostealer via mshta.exe, specifically targeting developers. Directly relevant to AI tooling supply chain security.

reddit 2026-03-10 4 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative relevance score across all sources.

Top Authors

#1

multimodalart

2 items · avg 4.5/10

Qwen Image Multiple Angles 3D Camera

9.0

#2

prithivMLmods

2 items · avg 4.0/10

FireRed Image Edit 1.0 Fast

8.0

#3

Xiaoying Zhang

1 item · avg 7.0/10

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

7.0

#4

Andrew Chin

1 item · avg 7.0/10

OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security

7.0

#5

Ben Rank

1 item · avg 7.0/10

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

7.0

#6

Bingxiang He

1 item · avg 7.0/10

How Far Can Unsupervised RLVR Scale LLM Training?

7.0

Top Organizations

#1

ruvnet

4 items · avg 5.8/10

ruvnet/RuView

23.0

#2

karpathy

2 items · avg 8.0/10

karpathy/autoresearch

16.0

#3

alibaba

2 items · avg 6.5/10

alibaba/OpenSandbox

13.0

#4

anthropics

2 items · avg 6.5/10

anthropics/claude-code

13.0

#5

666ghj

2 items · avg 4.5/10

666ghj/BettaFish

9.0

#6

KeygraphHQ

1 item · avg 8.0/10

KeygraphHQ/shannon

8.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Memory Coach

A developer toolkit that implements RetroAgent-style dual intrinsic feedback loops for LLM agents — combining numerical task-completion signals with distilled language lessons stored in a retrievable memory buffer. Unlike static prompt engineering, agents continuously improve from their own failures without retraining. Build this as an open-source middleware layer that wraps existing agent frameworks like LangChain or AutoGen.

Customer support agents that learn from failed resolutions
Coding assistants that accumulate project-specific lessons
Game-playing agents for procedurally generated environments
Enterprise workflow automation with self-improving task routing

https://arxiv.org/abs/2603.08561v1
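The pairing of a numerical signal with retrievable language lessons can be sketched as a small memory buffer. This is an illustration only, not RetroAgent's actual design: the class names `Lesson` and `LessonMemory`, the word-overlap retrieval, and the use of reward as a tie-breaker are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Lesson:
    text: str      # distilled language lesson from a past episode
    reward: float  # numerical signal recorded for that episode


@dataclass
class LessonMemory:
    """Hypothetical retrievable buffer of lessons (not the paper's method)."""
    lessons: List[Lesson] = field(default_factory=list)

    def record(self, text: str, reward: float) -> None:
        self.lessons.append(Lesson(text, reward))

    def retrieve(self, task: str, k: int = 2) -> List[str]:
        """Rank lessons by word overlap with the task; ties go to higher reward."""
        task_words = set(task.lower().split())

        def score(lesson: Lesson) -> Tuple[int, float]:
            overlap = len(task_words & set(lesson.text.lower().split()))
            return (overlap, lesson.reward)

        ranked = sorted(self.lessons, key=score, reverse=True)
        # Drop lessons that share no words with the task at all.
        return [l.text for l in ranked[:k] if score(l)[0] > 0]


def build_prompt(task: str, memory: LessonMemory) -> str:
    """Prepend retrieved lessons to the task before calling the agent."""
    lessons = memory.retrieve(task)
    header = "\n".join(f"Lesson: {t}" for t in lessons)
    return f"{header}\nTask: {task}" if header else f"Task: {task}"
```

A real middleware layer would likely replace the word-overlap scorer with embedding search, but the wrap-the-prompt shape stays the same.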

AI Security Sandbox

A locally deployable, sandboxed platform for autonomous vulnerability discovery and patching, inspired by OSS-CRS liberating DARPA AIxCC systems for open-source use. The platform would let security teams run AI-powered fuzzing and patch generation against their own codebases without cloud dependencies or data exposure risks. Critical guardrails — isolated execution, audit logs, and human-in-the-loop patch approval — address the reward hacking behaviors documented in PostTrainBench.

Open-source project security auditing
Enterprise pre-release vulnerability scanning
Security research and red-teaming labs
CI/CD pipeline integration for automated CVE detection

https://arxiv.org/abs/2603.08566v1 https://arxiv.org/abs/2603.08640v1

Grow-Not-Overwrite Finetuner

A fine-tuning service built on the 'Grow, Don't Overwrite' function-preserving expansion method, allowing teams to adapt foundation models to new tasks without catastrophic forgetting of base capabilities. Users upload their dataset, the service expands only a small subset of layers with mathematically equivalent initialization, and returns a model that excels at the new task while retaining original behavior. This directly solves the painful trade-off practitioners face between specialization and generality.

Domain-specific LLM adaptation for legal, medical, or finance verticals
Continual learning pipelines for production models receiving new data
Multi-task model serving without maintaining separate model copies
Enterprise model customization without full retraining costs

https://arxiv.org/abs/2603.08647v1
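One generic way to get "mathematically equivalent initialization" when growing a network is to give new hidden units zero output weights, so they contribute nothing until training moves them. The sketch below shows that idea on a tiny two-layer MLP in plain Python. It is a stand-in chosen for illustration and may differ from the paper's actual expansion scheme; `expand_hidden` and the other names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def forward(w_in: Matrix, w_out: Matrix, x: List[float]) -> List[float]:
    """Two-layer MLP: w_out @ relu(w_in @ x)."""
    return matvec(w_out, relu(matvec(w_in, x)))


def expand_hidden(
    w_in: Matrix, w_out: Matrix, extra: int, seed: int = 0
) -> Tuple[Matrix, Matrix]:
    """Add `extra` hidden units without changing the network's output.

    New input-side rows are random so the units can learn later, while the
    matching output-side columns are zero, so the units contribute nothing
    at initialization.
    """
    rng = random.Random(seed)
    n_inputs = len(w_in[0])
    new_w_in = [row[:] for row in w_in] + [
        [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(extra)
    ]
    new_w_out = [row[:] + [0.0] * extra for row in w_out]
    return new_w_in, new_w_out
```

Because the original weights are copied unchanged, a service built this way could train only the added parameters and leave the base model's behavior untouched at step zero.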

Black-Box Model Auditor

A SaaS tool that uses the UNBOX approach — combining LLMs and text-to-image diffusion — to audit any vision API for learned biases and failure modes without requiring model access, gradients, or training data. Teams point it at a black-box API endpoint and receive human-readable reports describing what concepts the model has learned, where it underperforms, and what demographic or contextual biases exist. This fills a critical compliance gap as AI regulation tightens globally.

AI vendor due diligence for enterprise procurement
Regulatory compliance auditing for deployed vision systems
Bias detection in hiring, lending, or healthcare screening tools
Competitive model benchmarking without white-box access

https://arxiv.org/abs/2603.08639v1

Adaptive Token Budget

A drop-in inference middleware inspired by CODA that dynamically allocates reasoning compute based on estimated query difficulty — routing easy requests through fast shallow paths and reserving extended chain-of-thought for genuinely hard problems. With 60%+ token cost reduction on easy tasks documented in research, this translates directly to API cost savings at scale. Build it as a proxy layer compatible with OpenAI, Anthropic, and open-source model APIs.

High-volume customer-facing chatbots with mixed query complexity
Enterprise RAG pipelines where most queries are factual lookups
Coding assistants balancing autocomplete vs. architectural reasoning
Cost optimization layer for AI startups with tight inference budgets

https://arxiv.org/abs/2603.08659v1 https://arxiv.org/abs/2603.08660v1
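The routing step of such a proxy can be sketched as a function from estimated difficulty to a reasoning-token budget. The estimator below is a deliberately crude keyword-and-length heuristic standing in for whatever signal a real system would use. It is not CODA's method, and the function names, marker words, and budget limits are all assumptions for illustration.

```python
def estimate_difficulty(query: str) -> float:
    """Toy difficulty score in [0, 1]; a stand-in for a learned estimator."""
    hard_markers = ("prove", "derive", "optimize", "debug", "design", "why")
    words = query.lower().split()
    # Count words that hint at multi-step reasoning, ignoring punctuation.
    marker_hits = sum(1 for w in words if w.strip("?.,!") in hard_markers)
    length_term = min(len(words) / 50.0, 1.0)
    return min(1.0, 0.5 * length_term + 0.25 * marker_hits)


def token_budget(query: str, min_tokens: int = 128, max_tokens: int = 4096) -> int:
    """Scale the reasoning-token budget linearly with estimated difficulty."""
    difficulty = estimate_difficulty(query)
    return int(min_tokens + difficulty * (max_tokens - min_tokens))


if __name__ == "__main__":
    for q in (
        "What is the capital of France?",
        "Prove why this algorithm is optimal and derive its complexity",
    ):
        print(token_budget(q), q)
```

The proxy would pass the returned budget as the model's maximum reasoning or output length, so easy lookups stop early while hard queries keep their full allowance.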

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

#1

Visual Translate by Vozo

Translate text in your videos without recreating visuals

SaaS Artificial Intelligence Video

176

80

https://www.producthunt.com/r/WDYAI...

#2

Chronicle 2.0

AI presentations without the AI slop

Productivity Artificial Intelligence Design

142

76

https://www.producthunt.com/r/CNILD...

#3

sitefire.ai

Marketing suite for the agentic web

Public Relations Marketing SEO

112

32

https://www.producthunt.com/r/O5UCC...

#4

Claude Code Review

Multi-agent review catching bugs early in AI-generated code

Developer Tools Artificial Intelligence Development

104

5

https://www.producthunt.com/r/TX3N3...

#5

Fish Audio S2

Real Expressive AI Voices

Open Source Artificial Intelligence GitHub

101

12

https://www.producthunt.com/r/SZPNS...

#6

Your Next Store

AI-first platform for building commerce stores, fast

SaaS Artificial Intelligence E-Commerce

98

7

https://www.producthunt.com/r/JSAOC...

#7

Spine Swarm

Manage a team of AI agents that do real work

Productivity Artificial Intelligence Tech

89

12

https://www.producthunt.com/r/X5DFJ...

#8

CodeGuide

Generate PRDs, specs and wireframes your AI understands.

Productivity Developer Tools Artificial Intelligence

87

4

https://www.producthunt.com/r/WJ3IG...

#9

Sonarly

The AI that fixes prod autonomously

Software Engineering Developer Tools Artificial Intelligence

87

16

https://www.producthunt.com/r/RUAT5...

#10

MacQuit

Quit all running Mac apps in one click from your menu bar

Mac Productivity Menu Bar Apps

84

2

https://www.producthunt.com/r/AACQE...

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending and TrendShift, enriched with commit velocity and contributor activity.

1

GH Trending

KeygraphHQ/shannon

typescript 32,992 3,287 6,892 stars this week

Shannon Lite is a fully autonomous AI pentester for web apps and APIs achieving 96.15% (100/104 exploits) on the XBOW benchmark without hints — a strong SOTA result for automated vulnerability exploitation. Gaining 6,892 stars this week signals significant practitioner interest.

Build idea

A continuous security testing SaaS that automatically runs Shannon against staging environments on every deployment, delivering prioritized vulnerability reports to dev teams before code reaches production.

19 issues

2

TrendShift

karpathy/autoresearch

Python 8,700 1,200

Karpathy's new project using AI agents to autonomously run ML research experiments on single-GPU nanochat training — meta-AI doing AI research. Rapidly gained 19K stars in days, signaling high interest in automated research loops.

Build idea

A managed AI research acceleration platform where ML teams submit hypotheses and receive back fully run experiment results, ablations, and findings — compressing weeks of GPU experimentation into hours.

3

TrendShift

karpathy/nanochat

Karpathy's minimal ChatGPT-quality model trainable for ~$100, with 45K+ stars. Democratizes LLM training research and serves as the substrate for autoresearch experiments.

Build idea

A turnkey fine-tuning service for startups that lets them train a proprietary, domain-specific chat model on their own data for under $200, delivered as a deployable API endpoint.

26 commits/mo 71 issues

4

GH Trending

LMCache/LMCache

python 7,602 987 417 stars this week

LMCache provides a high-performance KV cache layer for LLMs, enabling faster inference by caching and reusing KV states across requests. With 7.6K stars and active development, it's becoming a serious infrastructure component for LLM serving.

Build idea

A drop-in LLM inference optimization layer sold to enterprises running self-hosted models, reducing GPU costs and latency by intelligently caching KV states across repeated or similar prompts.
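Reusing KV states across requests comes down to finding the longest already-computed prefix of a new prompt and running the model only on the remainder. The toy cache below illustrates that lookup. It is not LMCache's API: the class `PrefixKVCache`, the string used as placeholder "state", and the linear prefix search are assumptions made to keep the example self-contained.

```python
from typing import Dict, List, Tuple


class PrefixKVCache:
    """Toy cache mapping token prefixes to precomputed state (hypothetical).

    The stored state is a placeholder string; a real system would hold
    attention key/value tensors here.
    """

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], str] = {}

    def put(self, tokens: List[int], state: str) -> None:
        self._store[tuple(tokens)] = state

    def longest_prefix(self, tokens: List[int]) -> Tuple[int, str]:
        """Return (matched_length, state) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            state = self._store.get(tuple(tokens[:end]))
            if state is not None:
                return end, state
        return 0, ""


def tokens_to_compute(cache: PrefixKVCache, tokens: List[int]) -> List[int]:
    """Only the suffix after the cached prefix needs a fresh forward pass."""
    matched, _ = cache.longest_prefix(tokens)
    return tokens[matched:]
```

With a shared system prompt cached once, every later request that starts with the same tokens pays only for its own suffix, which is where the latency and GPU savings would come from.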

5

alibaba/OpenSandbox

python 7,308 540 3,262 stars this week

Alibaba's general-purpose sandbox platform for AI applications supporting multi-language SDKs, Docker/Kubernetes runtimes, and use cases including coding agents, GUI agents, agent evaluation, and RL training. Addresses a real infrastructure gap for safe agent execution with 7.3K stars and 3,262 new this week.

Build idea

A cloud-hosted secure sandbox API service for AI agent developers that provides isolated, metered execution environments — billed per agent run — eliminating the infrastructure burden of safely running untrusted AI-generated code.

120 commits/mo 52 issues

6

TrendShift

anthropics/claude-code

Shell 75,700 6,100

Anthropic's official agentic coding CLI with 76K+ stars, enabling natural language control of codebases including git workflows and complex refactoring. The dominant terminal-based coding agent with continued active development.

Build idea

A managed enterprise coding agent platform built on Claude Code that integrates with corporate SSO, audit logging, and internal codebases, giving large engineering teams a governed, policy-compliant AI coding assistant.

38 commits/mo 5869 issues

7

GH Trending

block/goose

rust 32,751 3,013 637 stars this week

Open-source, extensible AI agent built in Rust that goes beyond code suggestions to install, execute, edit, and test with any LLM. Strong traction with 32K+ stars and 256 commits last month signals active production use.

Build idea

A no-code workflow builder for non-technical business users that wraps Goose agents to automate repetitive software operations — like data pipeline maintenance or report generation — without requiring engineering involvement.

256 commits/mo 378 issues

8

GH Trending

inclusionAI/AReaL

python 4,596 382 991 stars this week

Fast reinforcement learning framework for LLM reasoning and agents, gaining 991 stars in a week. Targets single-GPU RL training for reasoning models, filling a gap between research and accessible RL fine-tuning.

Build idea

A fine-tuning SaaS that lets companies improve the reasoning capabilities of their private LLMs using reinforcement learning on domain-specific problem sets, accessible on a single GPU with no ML expertise required.

9

openai/codex

rust 64,315 8,566 1,536 stars this week

OpenAI's official lightweight coding agent that runs in the terminal, built in Rust with 64k+ stars and active development (667 commits last month). Represents OpenAI's open-source push into agentic coding tools competing with Claude Code.

Build idea

A developer productivity analytics platform that wraps Codex CLI to track, audit, and benchmark AI-assisted coding activity across engineering teams, providing ROI metrics and security oversight for CTOs.

667 commits/mo 1780 issues

10

ruvnet/RuView

RuView applies WiFi signal processing and DensePose-style models to achieve real-time human pose estimation, vital sign monitoring, and presence detection using only commodity WiFi hardware — no cameras required. Privacy-preserving sensing with significant surveillance and health monitoring implications.

Build idea

A privacy-first elder care monitoring subscription service that uses existing home WiFi routers to detect falls, monitor breathing, and track activity patterns — alerting caregivers without installing any cameras.

36 issues

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following.

1

Benson Wong · Tailscale and Elethink

@mostlygeek 274 113 repos

mostlygeek/llama-swap

Go 2,733 202

Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc

2

Nathan Brake · @mozilla.ai

@njbrake 296 50 repos

Machine Learning at Mozilla.ai

njbrake/agent-of-empires

Rust 1,063 81

Claude Code, OpenCode, Mistral Vibe, Codex CLI, Gemini CLI Coding Agent Terminal Session manager via tmux and git Worktrees

3

Brady Gaster

@bradygaster 863 94 repos

Brady Gaster is a PM Architect in the CoreAI division at Microsoft where he works on Apps, Agents, MIDI, and most recently, Squad

bradygaster/squad

TypeScript 730 96

Squad: AI agent teams for any project

4

David East · @google-labs-code

@davideast 2,895 106 repos

Working on @google-labs-code. Stitch and Jules <3

davideast/stitch-mcp

TypeScript 376 46

A CLI for moving AI-generated UI designs from Google’s Stitch platform into your development workflow.

5

Lukas Masuch · Snowflake

@lukasmasuch 1,346 72 repos

lukasmasuch/best-of-ml-python

23,301 3,104

🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

6

qixing-jk

@qixing-jk 77 63 repos

qixing-jk/all-api-hub

TypeScript 1,972 112

One-stop manager for New API-compatible relay accounts: balance/usage dashboard, automatic check-in, one-click key export to popular clients, in-page API availability tests, and channel/model sync and redirect

7

Saúl Ibarra Corretgé · @jitsi / @8x8

@saghul 1,959 164 repos

Fellow Jitster

saghul/txiki.js

C 2,970 200

A tiny JavaScript runtime

8

Austin Griffith

@austintgriffith 2,646 202 repos

👩‍🎤 builder on Ethereum

austintgriffith/ethskills

HTML 104 18

The missing knowledge between AI agents and production Ethereum.

9

Andy Anderson · ibm.com

@clubanderson 79 98 repos

Platform Engineering | Kubernetes | AI | Software Architect

clubanderson/clubTivi

Dart 2 2

Open-source cross-platform IPTV player with intelligent EPG mapping, multi-provider stream failover, and remote control support. Built with Flutter.

10

Gunnar Morling · Confluent

@gunnarmorling 2,586 304 repos

Technologist @ Confluent · Ex-lead of Debezium · Spec lead of Bean Validation 2.0 · Creator of JfrUnit, kcctl and MapStruct · Java Champion · 🚴

gunnarmorling/1brc

Java 7,971 2,209

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

11

Elie Habib

@koala73 1,880 19 repos

koala73/worldmonitor

TypeScript 35,107 5,898

Real-time global intelligence dashboard — AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface

12

sigoden

@sigoden 1,098 84 repos

sigoden/aichat

Rust 9,515 626

All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.

13

Teng Lin · XtalPi Inc.

@teng-lin 206 4 repos

teng-lin/notebooklm-py

Python 4,478 547

Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features—including capabilities the web UI doesn't expose—via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw.

14

Yair Morgenstern

@yairm210 2,209 42 repos

yairm210/Unciv

Kotlin 10,152 1,788

Open-source Android/Desktop remake of Civ V

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week.

Arena Leaderboard — Top 15

#	Model	Organization	Type	Elo	Votes
1	claude-opus-4-6	Anthropic	Closed	1504	9,170
2	claude-opus-4-6-thinking	Anthropic	Closed	1502	8,313
3	gemini-3.1-pro-preview	Google	Closed	1500	4,041
4	grok-4.20-beta1	xAI	Closed	1491	5,280
5	gemini-3-pro	Google	Closed	1485	39,923
6	gpt-5.4-high	OpenAI	Closed	1479	3,503
7	gpt-5.2-chat-latest-20260210	OpenAI	Closed	1479	5,786
8	gemini-3-flash	Google	Closed	1473	30,600
9	grok-4.1-thinking	xAI	Closed	1473	39,309
10	claude-opus-4-5-20251101-thinking-32k	Anthropic	Closed	1470	32,516
11	claude-opus-4-5-20251101	Anthropic	Closed	1467	37,462
12	dola-seed-2.0-preview	Bytedance	Closed	1465	6,712
13	grok-4.1	xAI	Closed	1462	43,536
14	gemini-3-flash (thinking-minimal)	Google	Closed	1462	22,846
15	gpt-5.4	OpenAI	Closed	1457	3,417

New & Trending Models

openai/gpt-oss-20b

7,401,682 downloads 4,443 likes 24 trending

Open Source 2025-08-04

OpenAI's open-source 20B parameter model released under Apache 2.0 with 7.4M downloads and 4.4K likes — a significant move toward open-weight releases from OpenAI, with vLLM support and quantization options (8-bit, mxfp4).

zai-org/GLM-5

234,052 downloads 1,762 likes 80 trending

Open Source 2026-02-11

GLM-5 is ZAI's flagship next-generation language model with strong trending metrics (80 trending score, 1762 likes), featuring a novel MoE DSA architecture. Represents a significant open-weight model release competing at the frontier level with MIT licensing.

sarvamai/sarvam-105b

2,048 downloads 198 likes 198 trending

Open Source 2026-03-03

Sarvam AI releases a 105B parameter model supporting 22+ Indian languages under Apache 2.0, using a custom MLA architecture — a significant open multilingual model targeting underserved South Asian language communities.

Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

22,984 downloads 336 likes 304 trending

Open Source 2026-02-27

Qwen3.5-27B distilled from Claude 4.6 Opus reasoning traces to transfer chain-of-thought capabilities to a smaller open model, with 336 likes and strong download numbers. Represents the growing trend of distilling frontier model reasoning into accessible open weights.

Qwen/Qwen3-Coder-Next

1,181,726 downloads 1,102 likes 57 trending

Open Source 2026-01-30

Qwen3-Coder-Next is Qwen's latest coding-focused model with 1.18M downloads and 1,102 likes, suggesting strong community adoption. Minimal public documentation but download velocity indicates it's a meaningful coding model update.

sarvamai/sarvam-30b

4,221 downloads 137 likes 137 trending

Open Source 2026-03-03

Sarvam AI's 30B MoE model for 22+ Indian languages, companion to the 105B dense model — provides a more accessible inference option for the same multilingual coverage.

stepfun-ai/Step-3.5-Flash

132,663 downloads 700 likes 27 trending

Open Source 2026-02-01

StepFun's Step-3.5-Flash is a fast inference model with 132K downloads and Apache 2.0 license, backed by multiple arXiv papers — a competitive open-weight model worth benchmarking for latency-sensitive applications.

tencent/Penguin-VL-8B

576 downloads 43 likes 43 trending

Open Source 2026-03-05

Tencent's Penguin-VL-8B is a vision-language model built on Qwen3-8B with a custom vision encoder and arXiv paper — a new competitive open VLM in the 8B class worth benchmarking against InternVL and Qwen-VL.

zai-org/GLM-4.7-Flash

1,739,776 downloads 1,603 likes 22 trending

Open Source 2026-01-19

GLM-4.7-Flash is a lightweight MoE-based bilingual (EN/ZH) text generation model from ZAI with 1.7M downloads, designed for fast inference. Represents the efficient end of the GLM-4 family with MIT license.

Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF

70,471 downloads 150 likes 128 trending

Open Source 2026-02-27

GGUF quantized version of the Claude 4.6 Opus reasoning-distilled Qwen3.5-27B for local inference, with 70k downloads. Enables running frontier-distilled reasoning locally.

LiquidAI/LFM2-24B-A2B

20,505 downloads 279 likes 41 trending

Custom License 2026-02-24

LiquidAI's LFM2-24B-A2B is a 24B MoE model with only 2B active parameters, supporting 10 languages and targeting edge deployment. Interesting architecture for efficient multilingual inference.

MiniMaxAI/MiniMax-M2.5

448,370 downloads 1,143 likes 75 trending

Custom License 2026-02-12

MiniMax-M2.5 is a large-scale model with 448k downloads and 1,143 likes, available with FP8 support and Azure deployment. Limited public documentation makes it hard to assess technical novelty.

allenai/Olmo-Hybrid-7B

17,288 downloads 44 likes 44 trending

Open Source 2026-01-28

AllenAI's OLMo-Hybrid-7B is a fully open 7B model with a hybrid architecture, continuing the OLMo line of transparent research models. Relevant for researchers needing fully open (weights + data + code) baselines.

stepfun-ai/Step-3.5-Flash-Base

615 downloads 78 likes 33 trending

Open Source 2026-03-02

Base (pre-instruction-tuned) version of Step-3.5-Flash — useful for fine-tuning experiments but lower immediate impact than the instruct variant.

tencent/Penguin-VL-2B

213 downloads 21 likes 21 trending

Open Source 2026-03-05

Tencent's 2B vision-language model built on Qwen3-1.7B with a custom vision encoder — a compact VLM with an arXiv paper, interesting for edge deployment scenarios.

Model Buzz

GPT-5.4

hackernews 9/10 2026-03-10

KeygraphHQ/shannon

github 8/10 2026-03-10

karpathy/nanochat

trendshift 8/10 2026-03-10

GPT‑5.3 Instant

hackernews 8/10 2026-03-10

38 researchers red-teamed AI agents for 2 weeks. Here's what broke. (Agents of Chaos, Feb 2026) AI Security

reddit 8/10 2026-03-10

Hardening Firefox with Anthropic's Red Team

hackernews 8/10 2026-03-10

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

arxiv 7/10 2026-03-10

Code Review for Claude Code

hackernews 7/10 2026-03-10

anthropics/claude-code

trendshift 7/10 2026-03-10

Show HN: I wrote an LLM inference engine in pure Go – 48 tok/s zero dependencies

hackernews 7/10 2026-03-10

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week — try them live.

Leaderboard tracking 'uncensored' model capabilities — useful for researchers studying alignment and refusal behavior, though the framing is more community-driven than rigorous research.

ALL Bench Leaderboard

FINAL-Bench

static 47 47

apache-2.0

A consolidated leaderboard aggregating multiple benchmarks for model comparison — useful reference but derivative of existing evaluation infrastructure.

A Gradio demo combining text-to-video, image-to-video, and video extension capabilities — a convenience wrapper around existing video generation models with no novel technical contribution.

The Synthetic Data Playbook: Generating Trillions of the Finest Tokens

HuggingFaceFW

docker 126 126

HuggingFace FineWeb team's interactive playbook on generating high-quality synthetic training data at trillion-token scale — directly actionable guidance for practitioners building pretraining datasets.

faster-qwen3-tts

HuggingFaceM4

docker 141 74

Optimized TTS demo using Qwen3-based speech synthesis with speed improvements — demonstrates faster inference for the Qwen3 TTS pipeline.

LTX 2.3 Distilled

Lightricks

gradio 56 56

Lightricks releases LTX 2.3 Distilled, a distilled video generation model. Distillation is technically interesting for video generation because it reduces inference cost while aiming to preserve quality.

Official demo for Qwen3-TTS, Alibaba's text-to-speech model with 1.6K likes — a capable open TTS system worth evaluating for voice applications.

Wan2.2 Animate demo with nearly 5K likes — one of the more popular open video generation models, useful for benchmarking against commercial alternatives.

FLUX.2 [Klein] 9B

black-forest-labs

gradio 638 42

Black Forest Labs' FLUX.2 Klein 9B image generation demo — a smaller, faster variant of the FLUX.2 family for accessible high-quality image synthesis.

Free Unlimited Google Veo 3

deddytoyota

static 54 36

Unofficial wrapper claiming free access to Google Veo 3 with NSFW framing — likely a scraper or misleading demo, not a legitimate technical resource.

Flux2 Klein Face Swap

linoyts

gradio 96 35

Face swap application built on FLUX.2 Klein 9B with LoRA — a derivative application demo with limited technical novelty.

Microsoft's TRELLIS.2 generates high-fidelity 3D assets from images with 1.2K likes — a strong open 3D generation model from a major lab worth tracking for 3D content pipelines.

Z Image Turbo

mrfakename

gradio 2,506 77

The Z Image Turbo demo has 2.5K likes, which suggests strong community interest in fast image generation. It is likely a distilled or optimized image model, but details are sparse.

HuggingFace PRO-gated demo space — limited public technical value without PRO access, unclear what model or technique underlies it.

Qwen Image Multiple Angles 3D Camera

multimodalart

gradio 1,874 89

Demo using Qwen's vision model to generate multiple-angle views with 3D camera control from a single image — 1.9K likes indicates strong interest in this novel multiview generation capability.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-03-10

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

Common Corpus is presented at ICLR 2026 as the largest openly licensed pre-training dataset for LLMs, directly addressing legal and copyright concerns in training data. Important for the open-source LLM ecosystem needing legally clean training corpora.

dataset pre-training large language models open data open science

ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-03-10

MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark

MedAraBench introduces a large-scale Arabic medical QA benchmark at ICLR 2026, addressing a significant gap in multilingual medical NLP evaluation. A useful contribution to research on underrepresented languages, though narrow in scope.

Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering

ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-03-10

Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures

ICLR 2026 paper providing theoretical analysis of transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, advancing understanding of in-context learning mechanisms. Contributes to foundational theory of why transformers generalize.

In-context learning Gaussian Mixture Models Theory

ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-03-10

Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Task Tokens introduces a flexible conditioning mechanism for adapting transformer-based behavior foundation models in humanoid robotics without full retraining. Practical approach to multi-task adaptation in embodied AI at ICLR 2026.

Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control

ICLR 2026 Kaien Sho, Shinji Ito 2026-03-10

Submodular Function Minimization with Dueling Oracle

Theoretical ICLR 2026 paper on submodular function minimization using a dueling/pairwise comparison oracle. Highly specialized mathematical optimization work with limited direct ML practitioner relevance.

submodular minimization dueling oracle preference-based optimization

ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-03-10

Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning

ICLR 2026 benchmark evaluating MLLMs on scan-oriented academic paper reasoning, distinguishing between search (finding specific facts) and scan (holistic document understanding) tasks. Highlights a meaningful capability gap in current multimodal models for research automation.

Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning

ICLR 2026 Peng Sun, Tao Lin 2026-03-10

Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation

ICLR 2026 paper proposing N-th order recursive consistent velocity field estimation for any-step generation, simplifying few-step generative model training by removing complex multi-component losses. Advances consistency model efficiency.

Generative Models

ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-03-10

Masked Skill Token Training for Hierarchical Off-Dynamics Transfer

MSTT (Masked Skill Token Training) is a fully offline hierarchical RL framework for transferring policies across environments with different dynamics, using masked skill tokens to bridge the sim-to-real gap. Addresses a core challenge in embodied AI deployment.

Transfer Learning Skills Hierarchical RL Embodied AI

ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-03-10

High Probability Bounds for Non-Convex Stochastic Optimization with Momentum

ICLR 2026 paper establishing high-probability convergence and generalization bounds for SGD with momentum in non-convex settings, filling a theoretical gap relevant to deep learning optimization. Important theoretical foundation but limited immediate practitioner impact.

Momentum nonconvex learning generalization

ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-03-10

Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training

Q-RAG trains embedders using value-based RL to support multi-step retrieval for complex long-context QA, going beyond single-step RAG limitations. Novel application of RL to retrieval training at ICLR 2026 with clear practical impact on multi-hop reasoning.

Reinforcement Learning RL QA Long-context RAG

ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-03-10

Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment

ICLR 2026 paper proposing cross-lingual alignment techniques to improve semantic proximity in multilingual information retrieval, addressing the gap between query and document languages in CLIR tasks.

Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval

ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-03-10

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Systematic benchmark of GPT-4o, o4-mini, Gemini 1.5 Pro and others on standard computer vision tasks, revealing where frontier multimodal models actually stand versus specialized CV systems — important for practitioners choosing models for vision pipelines.

vision benchmark multimodal foundation models vision language models standard computer vision tasks

ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-03-10

CORDS - Continuous Representations of Discrete Structures

CORDS introduces continuous representations for variable-cardinality set prediction using neural fields and flow matching, enabling object detection and molecular modeling without fixed-size output assumptions.

Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching

ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-03-10

SCRAPL: Scattering Transform with Random Paths for Machine Learning

SCRAPL proposes randomized path sampling in wavelet scattering transforms to reduce computational cost while preserving perceptual quality gradients for audio/vision inverse problems.

scattering transform wavelets stochastic optimization ddsp perceptual quality assessment

ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-03-10

EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty

EVEREST is a transformer architecture combining evidential deep learning and extreme value theory for probabilistic rare-event forecasting in imbalanced multivariate time-series, targeting safety-critical applications.

Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification

ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-03-10

Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth

Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators by varying recurrent depth, analogous to classical numerical methods — useful for scientific computing applications.

Neural Simulator Recurrent Depth AI4Simulation

ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-03-10

PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification

PoinnCARE applies hyperbolic space learning and multi-modal fusion to enzyme classification, better capturing hierarchical EC number relationships than Euclidean embedding approaches.

EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure

ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-03-10

From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training

Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency and quality limitations of purely autoregressive interleaved audio-text generation.

Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning

ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-03-10

FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents

FingerTip 20K is a benchmark for proactive and personalized mobile GUI agents that act without explicit instructions, using contextual user history — a step toward more autonomous on-device AI assistants.

Mobile Agent LLM Agent GUI Proactive Agent Personalization

ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-03-10

Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings

Provides a rigorous physical-systems analysis of multi-resolution hash encoding's spatial kernel (used in NeRF/Instant-NGP), replacing heuristic hyperparameter tuning with principled design.

multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis

Deep Dive

All 387 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact — 7+ items are the ones worth your time.
