Weekly Intelligence

AI Quick Bites

April 06, 2026 · 369 items from 13 sources

Last refreshed: April 06, 2026 at 10:30 UTC
Next refresh: April 13, 2026 at 09:00 UTC
Created by Vatsal Bagri

Highlights

The five most consequential developments in AI this week, selected from 369 items across 13 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
Empirical evidence that tool-augmented agent frameworks are dramatically riskier than their underlying models, with credential leakage and privilege escalation emerging as systematic failure modes across all six tested frameworks.
arxiv 2026-04-06 20 min
03
Challenges the core assumption of multimodal RL training: models can improve reasoning scores even when visual input is corrupted, suggesting benchmark gains may not reflect genuine visual grounding.
arxiv 2026-04-06 18 min
04
Mechanistic finding that a 2D valence-arousal subspace in LLM representations directly controls refusal and sycophancy rates, offering a new interpretability handle for safety-relevant model behaviors.
arxiv 2026-04-06 18 min
05
Quantifies citation hallucination at scale across 10 models and provides an open-source fix that reduces hallucinated URLs by up to 79x, directly actionable for anyone building research or RAG agents.
arxiv 2026-04-06 18 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar. All scores are AI relevance (0–10).

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here. Scores are AI relevance (0–10): 7+ important, 9+ landmark.

Show HN: ACE – A dynamic benchmark measuring the cost to break AI agents
8/10
ACE benchmark quantifies adversarial effort to breach LLM agents in token/dollar cost rather than binary pass/fail, enabling game-theoretic analysis of attack economics across six budget-tier models. Novel framing that shifts agent security evaluation from capability to cost-efficiency.
hackernews 2026-04-06 8 min
Claude Code Found a Linux Vulnerability Hidden for 23 Years
8/10
Claude Code autonomously discovered a 23-year-old Linux kernel vulnerability during a coding session, a compelling real-world demonstration of AI-assisted vulnerability research with significant implications for security tooling.
hackernews 2026-04-06 10 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
8/10
Proposes honesty fine-tuning to make LLMs self-report hidden objectives during alignment auditing, addressing the weakness that models can deceive interrogators. Directly relevant to agentic safety and deceptive alignment research.
conferences 2026-04-06 20 min
Claude Code's source code has been leaked via a map file in their NPM registry
8/10
Claude Code's minified source was accidentally exposed via a source map file in the NPM package, revealing internal architecture including fake tools, frustration regexes, and stealth mode. Major accidental disclosure with significant implications for understanding production LLM agent design.
hackernews 2026-04-06 5 min
An Independent Safety Evaluation of Kimi K2.5
7/10
Independent safety evaluation of Kimi K2.5 finds it has fewer CBRNE refusals than GPT and Claude equivalents, concerning sabotage/self-replication propensity, and narrow political censorship, highlighting safety gaps in open-weight frontier models released without safety evaluations.
arxiv 2026-04-06 20 min
Learning the Signature of Memorization in Autoregressive Language Models
7/10
LT-MIA introduces the first transferable learned membership inference attack for LLMs, discovering an architecture-invariant memorization signature that transfers zero-shot from transformers to Mamba, RWKV-4, and RecurrentGemma (0.93-0.97 AUC), achieving 2.8x higher TPR at 0.1% FPR than baselines. Significant advance in LLM privacy attacks with broad architectural generalization.
arxiv 2026-04-06 20 min
Show HN: Live simulation of AI agents scamming each other (and getting caught)
7/10
Live simulation exposing critical trust vulnerabilities in AI agent payment ecosystems: one wallet registered 10,000+ fake agent services on x402, and ~1,900 MCP tools silently changed behavior post-approval with no reputation system to catch it. Demonstrates concrete, real-world attack surfaces emerging as agent payment rails go live at Stripe, Coinbase, and Visa.
hackernews 2026-04-06 5 min
KeygraphHQ/shannon
7/10
Shannon Lite is an autonomous white-box AI pentester that analyzes source code, identifies attack vectors, and executes real exploits against web apps and APIs; 35k stars signals strong practitioner interest in AI-driven offensive security tooling.
trendshift 2026-04-06 8 min
Airupt – open-source red-teaming for LLMs (79 attack vectors)
7/10
Airupt is an open-source red-teaming framework for LLMs covering 79 distinct attack vectors, providing a structured toolkit for systematic LLM vulnerability assessment. Novel breadth of attack coverage makes this a meaningful contribution to LLM security tooling.
hackernews 2026-04-06 5 min
Claude AI finds Vim, Emacs RCE bugs that trigger on file open
7/10
Claude AI autonomously discovered RCE vulnerabilities in Vim and Emacs that trigger on file open, demonstrating LLMs as effective automated vulnerability research tools. Concrete evidence of AI-assisted security research finding real critical bugs.
hackernews 2026-04-06 5 min
Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models
6/10
Proposes universal adversarial patches (UCGP) targeting infrared vision-language models in physical-world deployments, disrupting cross-modal semantic alignment rather than manipulating labels. Novel attack surface for IR-VLMs with real-world physical effectiveness demonstrated.
arxiv 2026-04-06 20 min
A Systematic Security Evaluation of OpenClaw and Its Variants
6/10
Systematic security evaluation of six OpenClaw agent frameworks across 205 test cases finds all exhibit substantial vulnerabilities including credential leakage and privilege escalation, with agentized systems significantly riskier than their underlying models alone.
arxiv 2026-04-06 20 min
Show HN: CargoWall – eBPF Firewall for GitHub Actions
6/10
CargoWall is an eBPF-based firewall for GitHub Actions that uses DNS proxying to restrict outbound connections from CI runners and LLM agents, directly addressing supply chain attack risks in agentic CI pipelines. Practical security tool with clear AI agent relevance.
hackernews 2026-04-06 5 min
WTF, Anthropic's Claude Code keeps track of every time you swear
6/10
Anthropic's leaked Claude Code source revealed a 'frustration detector' that tracks user profanity to infer emotional state, raising questions about undisclosed behavioral telemetry in AI coding tools and consent around affective monitoring.
hackernews 2026-04-06 6 min
Anthropic Races to Contain Leak of Code Behind Claude AI Agent
6/10
Anthropic accidentally exposed Claude Code's internal source code publicly, revealing unreleased features including an 'undercover mode' and frustration tracking. The leak was cloned within hours before takedown, exposing proprietary agent architecture.
hackernews 2026-04-06 5 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors
#1
Qwen
3 items · avg 6.0/10
18.0
#2
r3gm
2 items · avg 6.0/10
12.0
#3
9.0
#4
prithivMLmods
2 items · avg 4.5/10
9.0
#5
8.0
#6
Zheng-Xin Yong
1 item · avg 7.0/10
7.0
Top Organizations
#1
microsoft
6 items · avg 6.3/10
38.0
#2
HKUDS
6 items · avg 5.3/10
32.0
#3
SakanaAI
2 items · avg 8.5/10
17.0
#4
openai
2 items · avg 8.0/10
16.0
#5
KeygraphHQ
2 items · avg 7.0/10
14.0
#6
NousResearch
2 items · avg 7.0/10
14.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Trust Firewall
A real-time reputation and behavioral monitoring layer for AI agent ecosystems that detects post-approval tool mutations, fake service registrations, and anomalous payment flows. As agent payment rails from Stripe, Coinbase, and Visa go live, there is no system to catch agents that change behavior after being approved or wallets that flood marketplaces with fake services. Build a lightweight middleware that fingerprints tool behavior at registration and continuously diffs it against live execution, flagging drift and revoking trust scores automatically.
AI agent marketplaces and MCP tool registries Autonomous payment and x402 protocol deployments Enterprise agentic workflow security auditing Multi-agent orchestration platforms
http://5.161.255.238:8888 https://arxiv.org/abs/2604.03131v1
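As a rough illustration of the "fingerprint at registration, diff against live execution" idea, the sketch below hashes a tool's declared definition when it is approved and flags any later definition whose hash differs. The tool fields (`name`, `description`, `params`) and the function names are hypothetical, and the sketch only covers declared metadata; detecting drift in runtime behavior would need additional signals that are not shown here.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash a canonical JSON encoding of the tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(approved: dict, live_tools: list) -> list:
    """Return names of live tools that are unknown or whose fingerprint changed."""
    flagged = []
    for tool in live_tools:
        name = tool["name"]
        if approved.get(name) != fingerprint(tool):
            flagged.append(name)
    return flagged


# Hypothetical tool definitions, for illustration only.
registered = {"name": "get_quote", "description": "Return a price quote.", "params": ["sku"]}
approved = {registered["name"]: fingerprint(registered)}

# The same tool after a silent post-approval change to its description.
mutated = dict(registered, description="Return a price quote and charge the wallet.")

print(detect_drift(approved, [registered]))  # []
print(detect_drift(approved, [mutated]))     # ['get_quote']
```

Sorting keys before hashing keeps the fingerprint stable when a registry re-serializes the same definition in a different field order.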
Citation Integrity Guard
An agentic self-correction pipeline that validates academic and web citations generated by LLMs before they reach the user, catching hallucinated BibTeX entries and non-resolving URLs in real time. Research shows only 50.9% of LLM-generated BibTeX entries are fully correct and 3-13% of URLs never existed, yet deep research agents hallucinate at even higher rates. Build this as a drop-in post-processing layer or browser extension that runs two-stage verification (existence check, then metadata validation), reducing citation errors by up to 79x.
AI-assisted academic writing and literature review tools Deep research agent outputs (Perplexity, ChatGPT Deep Research) Legal and compliance document generation Journalism and fact-checking workflows
https://arxiv.org/abs/2604.03173v1 https://arxiv.org/abs/2604.03159v1
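The two-stage check described above (existence first, then metadata) might look roughly like the following. This is a minimal sketch, not the method from the cited papers: the `lookup` callable, the compared fields (`title`, `year`), and the in-memory index are all assumptions that stand in for a real resolver such as an HTTP fetch or a bibliographic API.

```python
from typing import Callable, Optional


def verify_citation(citation: dict, lookup: Callable[[str], Optional[dict]]) -> str:
    """Stage 1: does the URL resolve to a record? Stage 2: does the metadata match?"""
    record = lookup(citation["url"])
    if record is None:
        return "missing"
    for field in ("title", "year"):
        cited = str(citation.get(field, "")).strip().lower()
        actual = str(record.get(field, "")).strip().lower()
        if cited != actual:
            return "metadata_mismatch"
    return "ok"


# Hypothetical in-memory index standing in for a real resolver.
index = {"https://example.org/paper-a": {"title": "Paper A", "year": 2024}}

good = {"url": "https://example.org/paper-a", "title": "paper a", "year": 2024}
wrong_year = {"url": "https://example.org/paper-a", "title": "Paper A", "year": 2021}
invented = {"url": "https://example.org/nope", "title": "Paper Z", "year": 2024}

print(verify_citation(good, index.get))        # ok
print(verify_citation(wrong_year, index.get))  # metadata_mismatch
print(verify_citation(invented, index.get))    # missing
```

Passing the resolver in as a parameter keeps the verification logic testable offline and lets a real deployment swap in whichever lookup service it trusts.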
Open-Weight Safety Scanner
A standardized, automated safety evaluation suite for open-weight frontier models that runs before public release, covering CBRNE refusal rates, self-replication propensity, sabotage behaviors, and political censorship patterns. The independent evaluation of Kimi K2.5 revealed alarming safety gaps that went undetected because no mandatory pre-release evaluation existed for open-weight models. Build this as an open-source CI/CD-style pipeline that model authors can run locally and publish results to a public leaderboard, creating community accountability without requiring centralized gatekeeping.
Open-weight model release pipelines (Hugging Face, Ollama) Enterprise procurement and model vetting workflows AI safety research benchmarking Government and regulatory compliance reporting
https://arxiv.org/abs/2604.03121v1 https://arxiv.org/abs/2604.03114v1
Local Coding Agent Stack
A fully local, privacy-preserving coding agent that combines on-device model inference (via Apple Silicon or WebGPU) with an agentic loop modeled on Claude Code's architecture (including tool use, frustration detection, and iterative self-correction) but running entirely offline. The Claude Code source leak revealed the concrete engineering patterns behind a production coding agent, and benchmarks show WebGPU inference is now viable for mid-size models in-browser. Build an open-source reference implementation that lets developers run a capable coding agent without sending code to external APIs.
Privacy-sensitive enterprise codebases Offline and air-gapped development environments Browser-native coding assistants with no backend Developer tools for regulated industries (finance, healthcare, defense)
https://alex000kim.com/posts/2026-03-31-... https://arxiv.org/abs/2604.02344 https://ai.georgeliu.com/p/running-googl... https://apfel.franzai.com
LLM Privacy Audit Tool
A membership inference attack toolkit that helps organizations detect whether their proprietary or sensitive data was used to train a given LLM, leveraging the newly discovered architecture-invariant memorization signature that transfers zero-shot across transformers, Mamba, and RWKV models. With LT-MIA achieving 0.93-0.97 AUC and 2.8x higher true positive rates than prior baselines, this attack surface is now practical enough to productize. Build a SaaS tool where enterprises submit sample documents and receive a probabilistic report on training data exposure risk across major open and closed models.
Legal discovery and IP litigation support GDPR and CCPA right-to-erasure compliance verification Enterprise data governance and model procurement auditing Journalism and investigative research into model training practices
https://arxiv.org/abs/2604.03199v1 https://arxiv.org/abs/2604.03121v1
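The "TPR at 0.1% FPR" figure quoted for LT-MIA is a standard way to score membership inference attacks: pick the threshold that flags at most a fixed fraction of known non-members, then measure how many true members exceed it. The sketch below computes that metric on made-up scores; it is not the LT-MIA attack itself, and the function name and toy numbers are illustrative assumptions.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """True positive rate at a threshold that flags at most max_fpr of non-members."""
    negatives = sorted(nonmember_scores, reverse=True)
    allowed = int(max_fpr * len(negatives))  # how many non-members may be flagged
    # A sample is flagged when its score is strictly above the threshold, so
    # at most `allowed` non-members can exceed the (allowed)-th largest score.
    threshold = negatives[allowed] if allowed < len(negatives) else float("-inf")
    hits = sum(score > threshold for score in member_scores)
    return hits / len(member_scores)


# Toy attack scores (higher = "more likely a training member"); illustrative only.
nonmembers = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
members = [0.99, 0.92, 0.85, 0.4]

print(tpr_at_fpr(members, nonmembers, max_fpr=0.1))  # 0.5
```

With only ten non-member scores, the smallest non-zero false positive rate is 10%, which is why the toy call uses `max_fpr=0.1`; evaluating at 0.1% FPR needs at least a thousand non-member samples.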

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

Trending Repos

Repositories gaining serious momentum this week, sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1
GH Trending
SakanaAI/AI-Scientist-v2
python 5,029 696 1,201 stars this week
AI Scientist-v2 from SakanaAI uses agentic tree search to automate scientific discovery at workshop-paper quality, representing a significant leap in autonomous research agents. With 5K+ stars and 696 forks, this is one of the most substantive AI agent systems publicly released.
Build idea
A SaaS platform for biotech and pharma R&D teams that uses AI-Scientist-v2 to autonomously generate, test, and summarize novel research hypotheses, dramatically accelerating early-stage drug discovery literature and experiment design.
2
GH Trending
openai/codex
rust 73,388 10,317 5,169 stars this week
OpenAI's official lightweight coding agent written in Rust that runs directly in the terminal, enabling agentic code generation and editing workflows without leaving the CLI; 73K+ stars and breakout traction signal strong developer interest.
Build idea
A developer productivity tool that wraps Codex into a team-shared CLI environment with audit logs, role-based permissions, and usage analytics, sold as an enterprise coding agent solution for engineering teams wanting agentic workflows without leaving the terminal.
3
TrendShift
KeygraphHQ/shannon
TypeScript 35,300 3,600
Shannon Lite is an autonomous white-box AI pentester that analyzes source code, identifies attack vectors, and executes real exploits against web apps and APIs; 35k stars signals strong practitioner interest in AI-driven offensive security tooling.
Build idea
A continuous security testing SaaS that integrates Shannon into CI/CD pipelines to automatically pentest web apps and APIs on every code push, delivering actionable exploit reports before vulnerabilities reach production.
4
TrendShift
NousResearch/hermes-agent
Python 26,600 3,500
NousResearch's Hermes Agent framework with 26.6k stars. From the team behind the Hermes model series, this represents a serious open-source agent system worth tracking for its model-agent co-design approach.
Build idea
A vertical AI agent builder for legal and compliance teams that leverages Hermes Agent's model-agent co-design to deploy specialized autonomous agents for contract review, regulatory monitoring, and compliance reporting.
5
GH Trending
microsoft/VibeVoice
python 36,586 4,200 10,543 stars this week
Microsoft's open-source frontier voice AI project with 36K+ stars and 10,500+ stars this week, one of the fastest-growing repos in the batch. Signals Microsoft's push into open voice AI, potentially competitive with ElevenLabs and OpenAI's voice stack.
Build idea
A white-label voice AI platform for call centers and customer support teams that uses VibeVoice to deploy branded, low-latency conversational voice agents without dependency on ElevenLabs or OpenAI voice APIs.
6
TrendShift
Arthur-Ficial/apfel
Swift 1,900 58
Swift CLI tool exposing Apple's on-device FoundationModels framework for local LLM inference with no API keys or cloud dependency, gaining 1900 stars quickly. Interesting for Apple Silicon on-device AI development.
Build idea
A privacy-first AI writing and productivity app for macOS that runs entirely on-device using Apple's FoundationModels via apfel, marketed to professionals in regulated industries like healthcare and law who cannot send data to the cloud.
7
GH Trending
EricLBuehler/mistral.rs
rust 6,869 558 126 stars this week
Fast Rust-based LLM inference engine with broad model support, continuing to gain traction with 6.8k stars; a solid alternative to llama.cpp for performance-critical deployments.
Build idea
A managed on-premise LLM inference appliance for enterprises, pre-configured with mistral.rs on high-performance hardware, offering a plug-and-play private AI backend with guaranteed throughput SLAs.
8
GH Trending
HKUDS/RAG-Anything
python 15,258 1,820 500 stars this week
All-in-one RAG framework from HKUDS supporting diverse data modalities, gaining strong traction at 15k stars with 500 new this week; broad coverage but likely incremental over existing RAG stacks.
Build idea
A document intelligence SaaS for enterprises that uses RAG-Anything to ingest mixed-modality corporate knowledge bases (PDFs, spreadsheets, images, and databases) and expose a unified Q&A API for internal tools and chatbots.
9
GH Trending
TabbyML/tabby
rust 33,304 1,720 131 stars this week
Tabby is a mature self-hosted AI coding assistant with 33K+ stars, offering a privacy-preserving alternative to GitHub Copilot. Steady but not a new release; included for its continued traction.
Build idea
A managed private Tabby hosting service for mid-market companies that want GitHub Copilot-level coding assistance without sending proprietary code to third-party servers, offered as a fully managed deployment on the customer's own cloud account.
10
GH Trending
Yeachan-Heo/oh-my-codex
typescript 17,008 1,626 13,476 stars this week
OmX extends OpenAI Codex with hooks, agent teams, and HUDs: a plugin/harness layer for coding agents that gained 13K+ stars in a week, signaling strong developer interest in agent orchestration tooling.
Build idea
A marketplace and management platform for OmX-compatible coding agent plugins and hook libraries, letting developers publish, monetize, and compose agent team configurations for specific tech stacks or workflows.

Trending Developers

Developers gaining traction on GitHub this week, shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1
Илия
@777genius
777genius/claude-code-source-code
GitHub developer trending for a repo claiming to contain Claude Code source code; likely a leak or reconstruction, but provenance and completeness are unverified.
2
Benson Wong
@mostlygeek
mostlygeek/llama-swap
Developer behind llama-swap, a tool for reliable model swapping across local OpenAI/Anthropic-compatible servers (llama.cpp, vllm). Useful for local inference orchestration.
3
Cole Murray
@ColeMurray
ColeMurray/background-agents
Trending developer with an open-source background agents coding system repo; minimal detail available to assess technical novelty.
4
Mervin Praison
@MervinPraison
MervinPraison/PraisonAI
PraisonAI multi-agent framework for automating complex tasks with low-code interface; another entry in the crowded multi-agent framework space.
5
Alireza Rezvani
@alirezarezvani
alirezarezvani/claude-skills
Collection of 220+ Claude Code skills and agent plugins for various coding agents; a useful prompt/plugin library but derivative in nature.
6
Frank Bria
@frankbria
frankbria/ralph-claude-code
Developer profile featuring an autonomous AI development loop for Claude Code with intelligent exit detection. Minimal technical detail available from profile alone.
7
jakevin
@jackwener
jackwener/opencli
Developer behind OpenCLI, a universal CLI hub that can wrap websites and apps into CLI interfaces with AI-native runtime. Interesting concept but sparse technical detail from profile.
8
Shantanu
@hauntsaninja
hauntsaninja/git_bayesect
Developer profile featuring a Bayesian git bisect tool; interesting engineering but not AI/ML relevant.
9
Duy /zuey/
@mrgoonie
mrgoonie/claudekit-skills
Developer profile for ClaudeKit skills collection; insufficient technical detail to evaluate.
10
LmeSzinc
@LmeSzinc
LmeSzinc/AzurLaneAutoScript
Game automation bot for Azur Lane; not AI/ML relevant.
11
Mahesh Sanikommu
@MaheshtheDev
12
Alexey Milovidov
@alexey-milovidov
alexey-milovidov/font-selector
Trending developer profile for ClickHouse creator; the featured repo is a font selector, not AI-related.
13
Bartek Iwańczuk
@bartlomieju
14
Michael Bolin
@bolinfest
bolinfest/monaco-tm
Trending developer profile; featured repo is Monaco/TextMate grammar integration, not AI-related.
15
Andy Anderson
@clubanderson
clubanderson/clubTivi
Trending developer profile; featured repo is an IPTV player, not AI-related.
16
dkhamsing
@dkhamsing
dkhamsing/open-source-ios-apps
Trending developer profile; featured repo is a list of open-source iOS apps, not AI-related.
17
Lukas Holecek
@hluk
hluk/CopyQ
Clipboard manager developer profile; not AI-related.
18
Liangsheng Yin
@hnyls2002
hnyls2002/my-toolbox
Personal toolbox developer profile; not AI-relevant.
19
Keith Smiley
@keith
keith/reminders-cli
macOS reminders CLI developer profile; not AI-relevant.
20
Matt Van Horn
@mvanhorn
mvanhorn/last30days-skill
AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
21
Fengda Huang
@phodal
phodal/routa
Build Your Agent Team for Real-World AI Development - Workspace-first multi-agent coordination platform for AI development, with shared S…
22
Stephen Berry
@stephenberry
stephenberry/glaze
Extremely fast, in memory, JSON and reflection library for modern C++. BEVE, CBOR, CSV, MessagePack, TOML, YAML, EETF

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard: Top 15
#   Model                                  Organization  Type    Elo   Votes
1   claude-opus-4-6-thinking               Anthropic     Closed  1504  13,979
2   claude-opus-4-6                        Anthropic     Closed  1499  14,934
3   gemini-3.1-pro-preview                 Google        Closed  1494  17,559
4   grok-4.20-beta1                        xAI           Closed  1491  7,380
5   gemini-3-pro                           Google        Closed  1486  41,632
6   gpt-5.4-high                           OpenAI        Closed  1484  7,160
7   grok-4.20-beta-0309-reasoning          xAI           Closed  1481  7,344
8   gpt-5.2-chat-latest-20260210           OpenAI        Closed  1478  13,083
9   gemini-3-flash                         Google        Closed  1474  30,966
10  grok-4.20-multi-agent-beta-0309        xAI           Closed  1474  7,815
11  claude-opus-4-5-20251101-thinking-32k  Anthropic     Closed  1474  37,467
12  grok-4.1-thinking                      xAI           Closed  1471  45,399
13  claude-opus-4-5-20251101               Anthropic     Closed  1468  44,715
14  qwen3.5-max-preview                    Alibaba       Closed  1467  5,899
15  dola-seed-2.0-pro                      Bytedance     Closed  1465  2,986
New & Trending Models
zai-org/GLM-5
295,830 downloads 1,953 likes 67 trending
Open Source 2026-02-11
GLM-5 from Zhipu AI (ZAI) is a new MoE-based foundation model with 295K downloads and 1953 likes, backed by an arXiv paper; it represents a significant new Chinese-developed frontier model release.
LiquidAI/LFM2.5-350M
17,695 downloads 232 likes 232 trending
Custom License 2026-03-31
LiquidAI's LFM2.5-350M is a compact multilingual edge model (10 languages) based on their liquid foundation model architecture, notable for on-device deployment with strong multilingual coverage at sub-1B scale.
chromadb/context-1
3,729 downloads 371 likes 116 trending
Open Source 2026-03-12
ChromaDB releases context-1, a fine-tune of OpenAI's gpt-oss-20b optimized for retrieval/context tasks, notable as a vector DB company entering the model space to improve RAG pipelines.
nvidia/Gemma-4-31B-IT-NVFP4
129,352 downloads 207 likes 207 trending
Custom License 2026-04-02
NVIDIA releases NVFP4 quantized Gemma-4-31B-IT using their ModelOpt toolkit. FP4 quantization enables significantly faster inference on Hopper/Blackwell GPUs, with 129K downloads indicating strong adoption.
prism-ml/Bonsai-8B-gguf
45,185 downloads 444 likes 444 trending
Open Source 2026-03-18
PrismML's Bonsai-8B 1-bit GGUF is the breakout model this week with 444 trending score and 45K downloads: extreme 1-bit quantization of an 8B model enabling very low memory on-device inference.
unsloth/Qwen3-Coder-Next-GGUF
273,309 downloads 544 likes 23 trending
Open Source 2026-02-03
Unsloth's GGUF quantization of Qwen3-Coder-Next with 273K downloads signals strong demand for locally-runnable coding models; imatrix quantization improves quality at lower bit depths.
zai-org/GLM-4.7-Flash
996,011 downloads 1,653 likes 21 trending
Open Source 2026-01-19
GLM-4.7-Flash from Zhipu AI is a fast, lightweight bilingual (EN/ZH) model with nearly 1M downloads, positioned as an efficient alternative in the GLM-4 family for production inference.
zed-industries/zeta-2
1,445 downloads 112 likes 27 trending
Open Source 2026-03-23
Zeta-2 is Zed editor's next-edit-prediction model fine-tuned from ByteDance's Seed-Coder-8B, designed for inline code suggestions; a specialized coding assistant model from a developer tooling company.
0xSero/gemma-4-21b-a4b-it-REAP
536 downloads 55 likes 55 trending
gemma 2026-04-05
A pruned variant of Gemma-4-21B using the REAP (expert pruning) method from Cerebras, reducing MoE active parameters to 4B; an interesting application of structured pruning to MoE models.
MiniMaxAI/MiniMax-M2.5
631,119 downloads 1,345 likes 36 trending
Custom License 2026-02-12
MiniMax-M2.5 is a large open-weight model with 630K+ downloads, supporting FP8 inference and Azure deployment; an established model continuing to see community traction.
Qwen/Qwen3-Coder-Next
743,293 downloads 1,225 likes 30 trending
Open Source 2026-01-30
Qwen3-Coder-Next is a code-focused model from Alibaba's Qwen team with 743K downloads, suggesting strong community adoption for coding tasks; limited metadata available.
Rta-AILabs/Nandi-Mini-150M
5,642 downloads 98 likes 98 trending
Open Source 2026-04-01
Nandi-Mini-150M is a 150M parameter model supporting 11 Indian languages (Hindi, Tamil, Telugu, Kannada, etc.), notable for low-resource multilingual coverage at tiny scale.
arcee-ai/Trinity-Large-Thinking
7,107 downloads 106 likes 106 trending
Open Source 2026-04-01
Arcee AI's Trinity-Large-Thinking is a multilingual MoE model with reasoning, tool-calling, and agentic capabilities; it targets enterprise agentic workflows with broad language support.
nvidia/Nemotron-Cascade-2-30B-A3B
159,371 downloads 457 likes 50 trending
Custom License 2026-03-18
NVIDIA's Nemotron-Cascade-2 is a 30B MoE model (3B active) with SFT+RL training for reasoning and general tasks; a hybrid architecture with 159K downloads showing solid adoption.
openai/gpt-oss-120b
3,753,149 downloads 4,649 likes 25 trending
Open Source 2025-08-04
OpenAI's open-source 120B model continues high download volume (3.75M); an established release, with no new developments this week.
Model Buzz

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week β€” try them live. Flame icon = HuggingFace trending score. Hearts = community likes.

Omni Video Factory
FrameAI4687
gradio 821 68
mit
A Gradio space supporting text-to-video, image-to-video, and video extension; a popular demo (821 likes) but limited technical novelty as a UI wrapper.
NSFW Uncensored Adult Image
Heartsync
gradio 654 44
NSFW image generation space; not technically relevant to AI research or engineering.
Qwen3-TTS Demo
Qwen
gradio 1,816 43
apache-2.0
Qwen3-TTS is Alibaba's latest text-to-speech model demo with 1800+ likes on HuggingFace, indicating strong community interest in this new TTS release from the Qwen team.
Qwen3.5 Omni Offline Demo
Qwen
gradio 103 103
Qwen3.5 Omni is a new multimodal model from Alibaba supporting offline inference, extending the Qwen series with full omni-modal capabilities (audio, vision, text) in a single model.
Qwen3.5 Omni Online Demo
Qwen
gradio 55 55
apache-2.0
Online demo companion to the Qwen3.5 Omni offline demo; same model, different serving mode. Duplicate entry with the offline demo for practical purposes.
OmniVoice
k2-fsa
gradio 163 163
apache-2.0
OmniVoice offers high-quality voice cloning TTS supporting 600+ languages under Apache 2.0, making it one of the broadest multilingual TTS systems publicly available.
LTX 2.3 Sync
linoyts
gradio 104 58
LTX 2.3 Sync enables portrait animation and lip-sync generation, building on the LTX video generation model with audio-driven facial animation capabilities.
TRELLIS.2
microsoft
gradio 1,356 63
mit
Microsoft's TRELLIS.2 generates high-fidelity 3D assets from images with 1300+ likes, representing a significant update to their 3D generation pipeline with strong community adoption.
Voxtral TTS Demo
mistralai
gradio 179 69
Mistral's Voxtral TTS demo showcases their entry into the text-to-speech space, notable as Mistral has primarily focused on text LLMs; signals expansion into audio generation.
Z Image Turbo
mrfakename
gradio 2,802 93
Z Image Turbo is a fast image generation demo with 2800+ likes, suggesting a high-speed image synthesis model though details on the underlying architecture are sparse.
MTEB Leaderboard
mteb
docker 7,223 38
mit
The MTEB embedding leaderboard is the canonical benchmark for text embedding models; its continued trending indicates active community use for model selection in RAG and retrieval pipelines.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio 2,151 128
Demo using Qwen's image model to generate multiple camera angles of a scene in 3D, with 2100+ likes indicating strong interest in controllable multi-view image generation.
FireRed Image Edit 1.0 Fast
prithivMLmods
gradio 690 156
apache-2.0
FireRed Image Edit combines FireRed and Qwen image editing models for rapid instruction-based image editing via Transformers, with 690 likes suggesting practical utility.
Qwen-Image-Edit-2511-LoRAs-Fast
prithivMLmods
gradio 1,244 59
apache-2.0
Collection of LoRA adapters for Qwen image editing; useful for practitioners wanting style-specific editing but largely derivative of the base Qwen image edit work.
Wan2.2 14B Preview
r3gm
gradio 1,862 196
Wan2.2 14B is a large video generation model (image-to-video with text prompt) running with FP8 quantization and AOTI compilation for efficient inference, with 1800+ likes indicating strong interest.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-04-06
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Common Corpus is presented at ICLR 2026 as the largest openly licensed dataset for LLM pre-training, directly addressing legal and copyright concerns around training data. Important for practitioners who need legally defensible training pipelines.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-04-06
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
MedAraBench introduces a large-scale Arabic medical QA benchmark at ICLR 2026, filling a significant gap for multilingual medical NLP evaluation. Useful for researchers working on low-resource or non-English medical AI.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-04-06
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Theoretical ICLR 2026 paper analyzing transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, providing formal grounding for in-context learning behavior. Relevant for researchers seeking theoretical understanding of ICL.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-04-06
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Task Tokens proposes a flexible conditioning mechanism for adapting transformer-based behavior foundation models in humanoid robotics without full retraining. Incremental but practically useful for multi-task robot control.
Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-04-06
Submodular Function Minimization with Dueling Oracle
Theoretical paper on submodular function minimization using a dueling (pairwise comparison) oracle; niche optimization theory with limited direct AI application.
submodular minimization dueling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-04-06
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
ICLR 2026 benchmark evaluating MLLMs on scan-oriented academic paper reasoning, finding current models still far from autonomous research capability. Useful for understanding MLLM limitations on structured document understanding.
Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-04-06
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Proposes N-th Order Recursive Consistent Velocity Field Estimation for any-step generation, addressing computational overhead and complex loss functions in consistency models. Could improve few-step diffusion model practicality.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-04-06
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT introduces a fully offline hierarchical RL framework using masked skill token training for cross-dynamics policy transfer, enabling adaptation without environment interaction. Relevant for sim-to-real and offline RL researchers.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-04-06
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides high-probability convergence and generalization bounds for SGD with momentum in non-convex settings; theoretically rigorous but incremental contribution to optimization theory.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-04-06
Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Q-RAG applies value-based RL to train embedders for multi-step retrieval in long-context QA, addressing the fundamental limitation of single-step RAG on complex multi-hop questions. Novel use of RL for retrieval training.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-04-06
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Cross-lingual alignment technique for improving semantic proximity in multilingual information retrieval; solid but incremental work in a well-studied area.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-04-06
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
ICLR 2026 systematic benchmark of GPT-4o, o4-mini, Gemini 1.5 Pro and others on standard computer vision tasks, revealing where multimodal foundation models still fall short of specialized CV models. Provides actionable signal for practitioners choosing models for vision tasks.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-04-06
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous representations for variable-cardinality discrete structure prediction (object detection, molecular modeling) using neural fields and flow matching. Novel approach to a fundamental set-prediction problem.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-04-06
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL uses randomized scattering transform paths to reduce computational cost of wavelet-based perceptual loss functions for audio/vision inverse problems. Niche but practically useful for audio ML practitioners.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-04-06
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture for probabilistic rare-event forecasting in multivariate time series, combining evidential deep learning with extreme value theory to handle severe class imbalance. Relevant for safety-critical forecasting applications.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-04-06
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators analogous to classical numerical methods, allowing dynamic compute allocation. Useful for scientific ML but niche.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-04-06
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE applies hyperbolic multi-modal learning to enzyme classification, capturing hierarchical EC number relationships better than Euclidean methods. Specialized bioinformatics contribution.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-04-06
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency and quality limitations of purely autoregressive approaches. Relevant for real-time voice AI development.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-04-06
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K is an ICLR 2026 benchmark for proactive and personalized mobile GUI agents that act without explicit instructions, covering 20K tasks across real device contexts. Advances evaluation infrastructure for autonomous mobile agents.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-04-06
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Provides rigorous spatial kernel analysis of Multi-Resolution Hash Encoding (Instant NGP's core technique), enabling principled hyperparameter selection instead of heuristics. Useful for practitioners working with neural radiance fields and implicit representations.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis

Deep Dive

All 369 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact; 7+ items are the ones worth your time.
