Weekly Intelligence

AI Quick Bites

May 25, 2026 · 332 items from 13 sources

Last refreshed: May 25, 2026 at 12:51 UTC
Next refresh: June 01, 2026 at 09:00 UTC
Created by Vatsal Bagri

Highlights

The five most consequential developments in AI this week — selected from 332 items across 13 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
Weak teachers can improve larger LLM students during pretraining distillation — overturning a core assumption and opening cheaper training strategies.
arxiv 2026-05-25 18 min
03
Geopolitical bias in LLMs originates in post-training (RLHF/alignment) rather than being inherited from pretraining data; an 18x shift in China-favorability odds for Qwen 2.5 after post-training calls for urgent scrutiny of alignment processes.
arxiv 2026-05-25 18 min
04
Claude Code achieves 98.1% on a Lean 4 program verification benchmark, signaling that agentic provers have outpaced existing benchmarks and new evaluation methodology is urgently needed.
arxiv 2026-05-25 20 min
05
Training-free looped transformers improve frozen pretrained models at test time, a practical test-time compute trick applicable to any existing checkpoint without fine-tuning.
arxiv 2026-05-25 18 min
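Item 03's "18x odds shift" is a ratio of odds, not of probabilities, and the two are easy to confuse. The sketch below shows the arithmetic. The 10% and 2/3 probabilities are hypothetical illustrations, not figures from the paper.

```python
def odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_after: float, p_before: float) -> float:
    """How many times larger the odds are after a change than before it."""
    return odds(p_after) / odds(p_before)


def prob_from_odds(o: float) -> float:
    """Invert odds back to a probability."""
    return o / (1.0 + o)


# Hypothetical example: a base model answers favorably 10% of the time
# (odds 1:9); after post-training it does so 2/3 of the time (odds 2:1).
print(f"odds ratio: {odds_ratio(2 / 3, 0.10):.1f}")  # prints: odds ratio: 18.0

# The same 18x odds shift starting from a 50% rate ends near 94.7%.
print(f"{prob_from_odds(18 * odds(0.5)):.3f}")  # prints: 0.947
```

The same 18x odds shift can therefore correspond to very different probability changes, depending on the starting rate.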

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar. All scores are AI relevance (0–10).

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here. Scores are AI relevance (0–10): 7+ important, 9+ landmark.

microsoft/agent-governance-toolkit
8/10
Microsoft's Agent Governance Toolkit provides policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents, covering all ten risks in the OWASP Agentic Top 10. A significant open-source release for teams deploying production agents who need structured governance frameworks.
github 2026-05-25 10 min
Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems
7/10
Demonstrates domain-camouflaged prompt injection attacks that disguise malicious instructions as legitimate domain content, evading existing detection systems in multi-agent LLM pipelines. Novel attack vector with direct implications for any production multi-agent system.
hackernews 2026-05-25 20 min
Gemini randomly dumped its system prompt
7/10
Gemini was observed spontaneously leaking its own system prompt without any explicit extraction attempt, raising concerns about prompt confidentiality in production deployments. The HN thread (115 points, 44 comments) reflects real practitioner concern.
hackernews 2026-05-25 3 min
AI Agent Security Lecture
7/10
Lecture materials covering AI agent security topics including prompt injection, tool misuse, and adversarial inputs against agentic LLM systems — a structured educational resource from a credible researcher with practical attack/defense framing.
hackernews 2026-05-25 20 min
Remove-AI-Watermarks – CLI and library for removing AI watermarks from images
7/10
CLI tool and library for removing AI-generated image watermarks, sparking significant HN discussion (258 comments) about the robustness of AI content provenance systems. Directly relevant to the ongoing debate about watermarking as a reliable AI detection mechanism.
hackernews 2026-05-25 5 min
Mexican government breached by solo user with Claude, 150 GB exfiltrated
7/10
First-person account of a solo researcher breaching the Mexican government's systems using Claude as an AI-assisted hacking tool, exfiltrating 150 GB of data. Concrete real-world demonstration of AI-augmented offensive security capabilities with significant policy implications.
hackernews 2026-05-25 10 min
Tell HN: Claude Code now allows Anthropic to remotely inject system prompts
7/10
An HN user reports that Claude Code v2.1.150 added a remote system prompt injection mechanism via api.anthropic.com/api/claude_cli/bootstrap, which would allow Anthropic to silently modify agent behavior at runtime. A significant supply-chain trust concern for enterprises deploying Claude Code in sensitive environments.
hackernews 2026-05-25 5 min
OpenAI Adopts Google's SynthID Watermark for AI Images with Verification Tool
7/10
OpenAI adopts Google DeepMind's SynthID watermarking standard for AI-generated images and launches a public verification tool, marking a significant cross-industry alignment on content provenance infrastructure. This interoperability move could become the de facto standard for AI image authentication.
hackernews 2026-05-25 6 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
Introduces honesty fine-tuning to make LLMs self-report hidden objectives under interrogation, advancing alignment auditing for agentic AI systems. Directly addresses the challenge of detecting deceptive or misaligned behavior in capable models.
conferences 2026-05-25 20 min
What political censorship looks like inside an LLM's weights (Qwen 3.5)
7/10
Technical analysis of political censorship embedded in Qwen 3.5's weights, revealing how ideological constraints manifest at the model internals level — important for understanding model transparency and geopolitical bias in open-weight models.
hackernews 2026-05-25 10 min
Microsoft's new multi-model agentic security system tops leading benchmark
7/10
Microsoft's multi-model agentic security system achieves top scores on a leading cybersecurity benchmark — demonstrates practical deployment of LLM agent ensembles for real-time threat detection and response.
hackernews 2026-05-25 8 min
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt
6/10
Empirical study across 7 LLM pairs shows geopolitical bias originates in post-training (RLHF/alignment), not pretraining data, with Qwen 2.5 showing an 18x shift in China-favorability odds after post-training. Critical finding for AI alignment transparency and auditing of RLHF processes.
arxiv 2026-05-25 18 min
Sieve – scans Cursor/Claude chat history for leaked API keys
6/10
Sieve is a macOS tool that scans AI coding assistant chat histories for leaked API keys. It targets a real security risk: AI coding tools (Cursor, Claude Code, Copilot) embed secrets from .env files into plaintext SQLite transcript databases. A practical finding for any developer using AI coding assistants.
hackernews 2026-05-25 3 min
The Verification Problem (On OpenAI's Erdős Disproof)
6/10
Analysis of the verification problem raised by OpenAI's claimed Erdős disproof — examining how humans can validate mathematical reasoning produced by AI systems that may exceed human expert capability. Touches on a fundamental alignment and trust challenge for frontier AI.
hackernews 2026-05-25 8 min
You can access Gemini chat history without unlocking your phone with Android 16
6/10
Android 16 security vulnerability allows access to Gemini chat history from the lock screen without authentication. Practical privacy/security issue for AI assistant deployments on mobile, reportedly unacknowledged by Google.
hackernews 2026-05-25 4 min
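The Sieve entry describes scanning SQLite chat transcripts for leaked keys. Below is a minimal sketch of that general approach. It is not Sieve's implementation. The three regex rules are common public key formats chosen for illustration and are far from exhaustive. No particular tool's database schema is assumed: the scanner walks every table and every text cell. `scan_text` and `scan_sqlite` are hypothetical names.

```python
import re
import sqlite3

# Illustrative rules only; production scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "sk_prefixed_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
}


def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_secret) pairs found in a string."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        hits.extend((name, m.group(0)) for m in pattern.finditer(text))
    return hits


def scan_sqlite(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Scan every text/blob cell of every table and return redacted findings."""
    findings = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'  # safe identifier quoting
        for row in conn.execute(f"SELECT * FROM {quoted}"):
            for cell in row:
                if isinstance(cell, bytes):
                    cell = cell.decode("utf-8", errors="ignore")
                if not isinstance(cell, str):
                    continue
                for name, secret in scan_text(cell):
                    # Never echo a full secret; keep a short prefix for triage.
                    findings.append((table, name, secret[:8] + "..."))
    return findings


if __name__ == "__main__":
    # Hypothetical transcript table standing in for a real chat-history DB.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
    db.execute("INSERT INTO messages (content) VALUES (?)",
               ("OPENAI_API_KEY=sk-" + "a" * 24,))
    db.execute("INSERT INTO messages (content) VALUES (?)", ("no secrets here",))
    for finding in scan_sqlite(db):
        print(finding)
```

Pointing `sqlite3.connect` at a real transcript file works the same way. Open it read-only or on a copy so that the scan cannot modify the assistant's history.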

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors
#1
r3gm
2 items · avg 5.0/10
10.0
#2
prithivMLmods
2 items · avg 4.0/10
8.0
#3
Pál András Papp
1 item · avg 7.0/10
7.0
#4
TencentARC
1 item · avg 7.0/10
7.0
#5
7.0
#6
7.0
Top Organizations
#1
anthropics
4 items · avg 7.5/10
30.0
#2
HKUDS
4 items · avg 7.0/10
28.0
#3
ChromeDevTools
2 items · avg 8.0/10
16.0
#4
microsoft
2 items · avg 8.0/10
16.0
#5
rohitg00
4 items · avg 4.0/10
16.0
#6
ruvnet
3 items · avg 5.0/10
15.0
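Each contributor's headline number is the cumulative relevance score, which equals item count times average (anthropics: 4 × 7.5 = 30.0). The sketch below shows that aggregation. The per-item scores in the example are hypothetical values chosen to reproduce three of the published totals, and `rank_contributors` is an illustrative name.

```python
from collections import defaultdict


def rank_contributors(items: list[tuple[str, float]]) -> list[tuple[str, int, float, float]]:
    """Aggregate (contributor, relevance_score) pairs.

    Returns (name, item_count, average_score, cumulative_score) tuples sorted by
    cumulative score, highest first. Ties keep first-seen order because
    Python's sort is stable even with reverse=True.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for name, score in items:
        totals[name] += score
        counts[name] += 1
    ranked = sorted(totals, key=lambda n: totals[n], reverse=True)
    return [(n, counts[n], totals[n] / counts[n], totals[n]) for n in ranked]


# Hypothetical per-item scores that reproduce three totals listed above.
example_items = (
    [("anthropics", 8), ("anthropics", 8), ("anthropics", 7), ("anthropics", 7)]
    + [("HKUDS", 7)] * 4
    + [("ruvnet", 5)] * 3
)
for rank, (name, n, avg, total) in enumerate(rank_contributors(example_items), 1):
    print(f"#{rank} {name}: {n} items · avg {avg:.1f}/10 · {total:.1f}")
```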

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Constraint Drift Monitor
A developer tool that tracks and enforces constraints throughout long-horizon agentic coding sessions, alerting when the agent begins to violate or forget earlier requirements. Constraint decay is a documented failure mode where LLM agents progressively ignore earlier instructions during complex backend code generation. Build a lightweight middleware layer that maintains a live constraint registry, scores each agent output against it, and injects reminders or halts execution when drift is detected.
Use cases: Agentic coding assistants (Cursor, Claude Code, Copilot) · Backend API and database schema generation · Multi-step refactoring workflows · CI/CD pipeline automation with LLM agents
https://arxiv.org/abs/2605.06445
AI Secret Vault Scanner
A cross-platform security tool that continuously monitors AI coding assistant chat histories, local SQLite transcript databases, and project directories for leaked API keys, tokens, and credentials. AI coding tools like Cursor and Claude Code store conversation history in plaintext databases that often contain secrets copied from .env files. Build an open-source daemon that watches these paths, integrates with pre-commit hooks, and alerts developers before secrets are committed or exfiltrated.
Use cases: Developer workstation security · Enterprise AI coding tool governance · Open-source project protection · Security audit tooling for AI-assisted codebases
https://apps.apple.com/us/app/sieve-secr...
LLM Bias Auditor
A SaaS platform that audits LLM outputs for geopolitical, cultural, and ideological biases introduced during post-training alignment, helping enterprises understand and document model behavior before deployment. Research shows that RLHF and alignment processes, not pretraining data, are the primary source of geopolitical bias, with Qwen 2.5 showing an 18x shift in China-favorability odds after post-training. Build a structured evaluation suite that probes models across sensitive topics, compares base vs. instruction-tuned variants, and generates compliance-ready bias reports.
Use cases: Enterprise AI procurement and vendor evaluation · Regulatory compliance and AI governance · Journalism and media organizations using LLMs · Government and NGO AI deployment review
https://arxiv.org/abs/2605.23825v1 https://arstechnica.com/ai/2026/05/anthr...
Agent Skill Optimizer
A meta-learning layer for LLM agent frameworks that treats skill documents and system prompts as trainable external state, automatically refining them based on task performance feedback without touching model weights. Inspired by SkillOpt's approach of using a separate optimizer model to make bounded text edits validated against held-out scores, this tool plugs into existing agent harnesses and iteratively improves skill libraries across deployments. Build it as an open-source library compatible with LangChain, AutoGen, and custom agent loops.
Use cases: Customer support agent quality improvement · Code generation agent specialization · Enterprise knowledge worker automation · Multi-agent pipeline tuning
https://arxiv.org/abs/2605.23904v1 https://arxiv.org/abs/2605.23899v1
Prompt Injection Firewall
A security proxy for multi-agent LLM systems that detects and blocks domain-camouflaged prompt injection attacks, where malicious instructions are disguised as legitimate domain content to evade existing filters. As multi-agent pipelines process external data from emails, documents, and web content, domain-camouflaged injections represent a critical unaddressed attack surface. Build a real-time inspection layer that uses semantic analysis and domain-context modeling to flag instructions that are anomalous relative to expected content patterns, deployable as a sidecar in any agent pipeline.
Use cases: Agentic email and document processing systems · RAG pipelines ingesting external web content · Customer-facing LLM chatbots · Automated code review and PR agents
https://arxiv.org/abs/2605.22001
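The core loop of the Constraint Drift Monitor idea can be sketched as follows. This sketch assumes every constraint can be expressed as a forbidden regex pattern. The `Constraint` and `DriftMonitor` names, the 0.5 halt threshold, and the example rules are all hypothetical. Real backend constraints, such as schema invariants or API contracts, would need richer checks than regexes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Constraint:
    name: str
    forbidden: str  # regex that must NOT appear in the agent's output
    reminder: str   # text to re-inject into the agent's context on violation


@dataclass
class DriftMonitor:
    constraints: list[Constraint]
    halt_below: float = 0.5  # hypothetical threshold; tune per project
    history: list[float] = field(default_factory=list)  # per-step scores

    def check(self, output: str) -> dict:
        """Score one agent output against the live constraint registry."""
        violated = [c for c in self.constraints if re.search(c.forbidden, output)]
        total = len(self.constraints)
        score = 1.0 - len(violated) / total if total else 1.0
        self.history.append(score)
        if score < self.halt_below:
            action = "halt"
        elif violated:
            action = "remind"
        else:
            action = "ok"
        return {"score": score, "action": action,
                "reminders": [c.reminder for c in violated]}


monitor = DriftMonitor([
    Constraint("no_select_star", r"SELECT\s+\*", "Always list columns explicitly."),
    Constraint("no_raw_print", r"\bprint\(", "Use the project logger, not print()."),
])
print(monitor.check("SELECT id, email FROM users"))    # action: ok
print(monitor.check("SELECT * FROM users"))            # action: remind
print(monitor.check("print(x)\nSELECT * FROM users"))  # action: halt
```

The `history` list records per-step scores, so a falling trend across a long session can be flagged before any single output crosses the halt threshold.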

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

#1
Unabyss
MCP-native self-updating context layer for your AI
Productivity · Artificial Intelligence
258
58
https://www.producthunt.com/r/3EY2V...
#2
own.page
Make your own personal website with bento tiles
Social Network · Social Media · Website Builder
223
33
https://www.producthunt.com/r/MG5MN...
#3
Yansu
AI that learns how you work and turns it into software
Productivity · Artificial Intelligence · Maker Tools
190
70
https://www.producthunt.com/r/YAPEO...
#4
Supaboard 3.0
AI data analysts that understand your business
SaaS · Data & Analytics · Data Visualization
156
18
https://www.producthunt.com/r/AV6HL...
#5
tweet.md
X posts as clean Markdown
API · Social Media · Artificial Intelligence
134
19
https://www.producthunt.com/r/UYSS5...
#6
Pi Coding Agent
The coding-agent harness you can make your own
Open Source · Artificial Intelligence · Development
110
6
https://www.producthunt.com/r/4RD3V...
#7
Tiny CV
Resume builder that fits on one page
Hiring · Productivity · GitHub
110
5
https://www.producthunt.com/r/G3B2V...
#8
LLMTest
Use the right LLMs in your apps. Setup fallbacks. Be happy.
API · Developer Tools · Artificial Intelligence
94
2
https://www.producthunt.com/r/ZTTNK...
#9
Orchestria
AI music engine with granular stem control
Web App · Music · Artificial Intelligence
90
7
https://www.producthunt.com/r/7LPYC...
#10
Databerry
Track all your business data in a single dashboard
Analytics · Data & Analytics · Business Intelligence
86
7
https://www.producthunt.com/r/RNGRK...

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1
GH Trending
ChromeDevTools/chrome-devtools-mcp
typescript 41,641 2,648 1,732 stars this week
Official Chrome DevTools MCP server (41k+ stars, 1,732 new this week) enables AI coding agents to directly interact with browser DevTools for debugging, inspection, and automation — a significant infrastructure piece for browser-native agents.
Build idea
Build a SaaS QA automation platform where AI agents autonomously detect, reproduce, and file bug reports by directly interacting with web apps through Chrome DevTools, eliminating manual testing cycles.
2
GH Trending
anthropics/claude-plugins-official
python 27,658 2,941 7,666 stars this week
Anthropic's official, curated directory of Claude Code plugins, with 27k+ stars and 7,600+ new stars this week alone, signals a rapidly growing ecosystem around Claude Code as a development platform.
Build idea
Launch a marketplace and certification service for enterprise Claude Code plugins, offering vetted, compliance-ready integrations for industries like finance, healthcare, and legal where teams need audited AI tooling.
3
GH Trending
microsoft/agent-governance-toolkit
python 2,107 379 508 stars this week
Microsoft's Agent Governance Toolkit provides policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents, covering all ten risks in the OWASP Agentic Top 10. A significant open-source release for teams deploying production agents who need structured governance frameworks.
Build idea
Offer a managed compliance-as-a-service platform that deploys and monitors the Agent Governance Toolkit for enterprises, providing dashboards, audit logs, and policy templates to meet regulatory requirements for AI agent deployments.
4
GH Trending
HKUDS/CLI-Anything
python 40,251 3,797 4,759 stars this week
CLI-Anything wraps arbitrary CLI tools to make them agent-native, enabling AI agents to invoke any software via a unified interface. 40k+ stars with 4,759 new this week indicates strong developer adoption for agent tooling.
Build idea
Build a no-code agent integration platform that lets businesses expose their existing CLI-based internal tools to AI agents without writing custom APIs, dramatically reducing the cost of automating legacy software workflows.
5
GH Trending
HKUDS/ViMax
python 7,473 1,156 3,018 stars this week
ViMax is an agentic video generation system that combines director, screenwriter, producer, and video generator roles in one pipeline. 7,473 stars with 3,018 new this week suggests it's a notable open-source entry in autonomous video creation.
Build idea
Create a B2B video marketing SaaS where brands input a product brief and receive fully produced short-form ad videos autonomously, cutting production costs and turnaround time from weeks to minutes.
6
GH Trending
Michael-A-Kuykendall/shimmy
rust 5,262 486 444 stars this week
Shimmy is a Python-free Rust inference server with an OpenAI-compatible API. It supports GGUF and SafeTensors, with hot model swapping and auto-discovery, in a single binary. It addresses real pain points in local inference deployment with a minimal-dependency approach.
Build idea
Offer a managed on-premise LLM appliance service for privacy-sensitive enterprises — ship a pre-configured hardware box running Shimmy that IT teams can deploy in air-gapped environments with zero Python dependency headaches.
7
GH Trending
NVlabs/Sana
python 7,582 577 1,457 stars this week
NVIDIA Labs' Sana uses a Linear Diffusion Transformer for efficient high-resolution image synthesis, achieving strong quality with reduced compute. 7,582 stars with continued weekly growth indicates sustained research and practitioner interest.
Build idea
Build a high-volume AI image generation API service targeting e-commerce and media companies that need cost-efficient, high-resolution product visuals at scale, using Sana's reduced compute requirements to undercut competitors on price.
8
GH Trending
ai-dynamo/dynamo
rust 7,065 1,164 249 stars this week
Dynamo is a datacenter-scale distributed inference serving framework written in Rust, targeting high-throughput LLM deployment across clusters. 7k+ stars positions it as a serious infrastructure option alongside vLLM and TensorRT-LLM.
Build idea
Launch a cloud-agnostic LLM inference hosting service built on Dynamo that offers enterprises predictable high-throughput SLAs and multi-cluster failover, positioned as a performance-focused alternative to OpenAI's API for latency-sensitive applications.
9
GH Trending
anthropics/knowledge-work-plugins
python 14,623 1,796 2,253 stars this week
Anthropic-published open-source plugins for knowledge workers using Claude Cowork, covering document analysis, research, and productivity workflows — extends Claude's utility beyond coding into enterprise knowledge tasks.
Build idea
Build a vertical SaaS for law firms or consulting agencies that bundles and customizes these knowledge-work plugins into a branded Claude-powered research and document analysis workspace with firm-specific compliance controls.
10
GH Trending
can1357/oh-my-pi
typescript 7,202 581 2,361 stars this week
Terminal-based AI coding agent featuring hash-anchored edits, LSP integration, browser control, Python execution, and sub-agent support — a technically ambitious open-source alternative to Cursor/Claude Code with 2,300+ stars this week.
Build idea
Offer a self-hosted, privacy-first AI coding agent subscription for regulated industries like defense and finance, packaging oh-my-pi with enterprise support, SSO, and audit logging for teams that cannot use cloud-based coding assistants.
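CLI-Anything's own interface is not described here. The following sketch shows only the general pattern of wrapping a CLI tool behind a uniform, structured call that an agent can use. It assumes the agent harness accepts JSON tool results. `ToolResult`, `run_cli_tool`, and `to_tool_message` are hypothetical names.

```python
import json
import subprocess
import sys
from dataclasses import asdict, dataclass


@dataclass
class ToolResult:
    command: list[str]
    exit_code: int
    stdout: str
    stderr: str


def run_cli_tool(binary: str, args: list[str], timeout: float = 30.0) -> ToolResult:
    """Run a CLI tool without a shell and capture its output for an agent."""
    cmd = [binary, *args]
    # Passing an argument list (no shell=True) prevents shell injection
    # through agent-chosen arguments.
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return ToolResult(cmd, proc.returncode, proc.stdout, proc.stderr)


def to_tool_message(result: ToolResult) -> str:
    """Serialize the result as JSON, a format most agent loops can consume."""
    return json.dumps(asdict(result))


if __name__ == "__main__":
    res = run_cli_tool(sys.executable, ["-c", "print('hello from a CLI')"])
    print(to_tool_message(res))
```

A production wrapper would also restrict `binary` to an allowlist. Otherwise the agent itself decides which programs get executed.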

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1
Lukasz Jagiello
@ljagiello
ljagiello/ctf-skills
Agent skills repo for solving CTF challenges across web exploitation, binary pwn, crypto, and more — interesting application of AI agents to security competitions but light on technical detail from the profile summary alone.
2
Ryan Marten
@RyanMarten
RyanMarten/harvey-labs
Harvey Labs benchmark for evaluating AI agent capabilities on legal work tasks. Domain-specific evaluation benchmark with limited broader applicability.
3
Mish Ushakov
@mishushakov
mishushakov/llm-scraper
LLM-powered web scraper that converts any webpage into structured data — useful tool but a well-trodden pattern.
4
Aaron Stannard
@Aaronontheweb
Aaronontheweb/dotnet-skills
Claude Code skills and sub-agents tailored for .NET developers. Niche utility for a specific developer community.
5
mumu
@ZhuLinsen
ZhuLinsen/daily_stock_analysis
LLM-powered stock analysis system for A/H/US markets with multi-source data and multi-channel push notifications. Practical application but derivative of existing LLM finance tooling.
6
Alireza Rezvani
@alirezarezvani
alirezarezvani/claude-skills
A large collection of 329 Claude Code skills, custom commands, and agent plugins. Useful as a resource dump but lacks novelty as a curated list.
7
Chris Tate
@ctate
ctate/3d-model-generator
GitHub profile with a 3D model generator project — minimal detail available.
8
rUv
@ruvnet
ruvnet/ruflo
Developer profile for an agent orchestration platform for Claude — insufficient detail to evaluate technical merit.
9
Vivek Chand
@vivekchand
vivekchand/clawmetry
Real-time observability dashboard for OpenClaw AI agents — potentially useful but insufficient detail from profile summary.
10
Liran Tal
@lirantal
lirantal/npm-security-best-practices
npm security best practices collection — generic package security, not AI-specific.
11
Chi Wang
@sonichi
sonichi/sutando
Chi Wang (AutoGen creator) trending profile — insufficient detail from summary to score the actual project.
12
Hans-Kristian Arntzen
@HansKristian-Work
HansKristian-Work/vkd3d-proton
VKD3D-Proton fork for Direct3D 12 on Linux via Vulkan. Not AI-related.
13
MichaIng
@MichaIng
MichaIng/DietPi
DietPi lightweight OS for single-board computers. Not AI-related.
14
Kai
@RealKai42
RealKai42/qwerty-learner
Qwerty-learner typing/vocabulary training software. Not AI-related.
15
Andre Rinas
@andreknieriem
andreknieriem/headunit-revived
GitHub profile for an Android Auto headunit app developer — no AI relevance.
16
Stanislas
@angristan
angristan/openvpn-install
GitHub profile known for an OpenVPN install script — no AI relevance.
17
J. Nick Koston
@bdraco
bdraco/frozenlist
GitHub profile for a Python library developer — no AI relevance.
18
Jean Boussier
@byroot
byroot/pysrt
GitHub profile for a Ruby/Python developer — no AI relevance.
19
Philip Rebohle
@doitsujin
doitsujin/dxvk
GitHub profile for DXVK (DirectX-to-Vulkan) developer — no AI relevance.
20
fzyzcjy
@fzyzcjy
fzyzcjy/flutter_rust_bridge
GitHub profile for Flutter/Rust binding generator developer — no AI relevance.
21
Max Lv
@madeye
madeye/meow-ios
iOS VPN/proxy client — not AI-related.
22
Jonathan Manning
@pinin4fjords
pinin4fjords/nf-metro
Bioinformatics pipeline visualization tool — not AI-relevant.
23
Ruben Fiszel
@rubenfiszel
rubenfiszel/scala-flow
Trending developer profile with no clear AI-specific contribution surfaced.
24
Yair Morgenstern
@yairm210
yairm210/Unciv
Open-source Android/Desktop remake of Civ V — not AI-related.
25
zsviczian
@zsviczian
zsviczian/obsidian-excalidraw-plugin
Obsidian plugin for Excalidraw drawings — not AI-related.

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard — Top 15
#   Model                          Provider   Type    Elo   Votes
1   claude-opus-4-6-thinking       Anthropic  Closed  1502  27,454
2   claude-opus-4-7-thinking       Anthropic  Closed  1500  12,920
3   claude-opus-4-6                Anthropic  Closed  1498  29,240
4   claude-opus-4-7                Anthropic  Closed  1492  13,571
5   muse-spark                     Meta       Closed  1489  11,103
6   gemini-3.1-pro-preview         Google     Closed  1488  34,189
7   gemini-3-pro                   Google     Closed  1486  41,331
8   gpt-5.5-high                   OpenAI     Closed  1481  10,172
9   gemini-3.5-flash               Google     Closed  1480  5,907
10  gpt-5.4-high                   OpenAI     Closed  1480  21,023
11  gpt-5.5                        OpenAI     Closed  1478  10,294
12  grok-4.20-beta1                xAI        Closed  1478  22,458
13  gpt-5.2-chat-latest-20260210   OpenAI     Closed  1477  27,988
14  qwen3.7-max-preview            Alibaba    Closed  1475  3,741
15  grok-4.20-beta-0309-reasoning  xAI        Closed  1475  21,572
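Arena ratings are reported on an Elo-style scale. Assuming the conventional base-10, 400-point scale, a rating gap can be converted into an expected head-to-head win rate. The function name is ours, and the result is only a rough guide, because Arena derives its ratings from a statistical fit over all votes.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for A over B on an Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# #1 (1502) vs #15 (1475) on this week's leaderboard: a 27-point gap.
print(f"{expected_win_rate(1502, 1475):.3f}")  # prints: 0.539
```

The whole top 15 spans only 27 points, so even #1 against #15 works out to roughly a 54/46 split.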
New & Trending Models
deepseek-ai/DeepSeek-V4-Pro
4,820,866 downloads 4,249 likes 156 trending
Open Source 2026-04-22
DeepSeek-V4-Pro is the flagship release with 4.8M downloads and 4,249 likes, representing a major open-weight frontier model competing with top closed models; MIT license makes it highly deployable.
sapientinc/HRM-Text-1B
90,026 downloads 276 likes 270 trending
Open Source 2026-05-17
HRM-Text-1B introduces a Hierarchical Reasoning Model architecture with a prefix-LM design and an associated arXiv paper (2605.20613); 90K downloads and 276 likes signal strong community interest in this novel pre-alignment reasoning approach.
deepseek-ai/DeepSeek-V4-Flash
2,953,721 downloads 1,226 likes 69 trending
Open Source 2026-04-22
DeepSeek's V4-Flash model — a faster, lighter variant of DeepSeek-V4-Pro with nearly 3M downloads, offering a strong speed/quality tradeoff for production inference.
HuggingFaceBio/Carbon-500M
1,766 downloads 28 likes 24 trending
Open Source 2026-05-12
HuggingFace Bio's 500M genomic language model for DNA sequences with speculative decoding support; a notable entry in the biological foundation model space from a credible lab.
nvidia/Nemotron-Labs-Diffusion-14B
5,195 downloads 93 likes 90 trending
Custom License 2026-04-22
NVIDIA's 14B diffusion-based language model from Nemotron Labs; diffusion LMs are an emerging alternative to autoregressive generation and this is a significant-scale release from a major lab.
zai-org/GLM-5.1
165,923 downloads 1,688 likes 25 trending
Open Source 2026-04-03
GLM-5.1 from ZhipuAI (zai-org) is a MoE-based text generation model with 165K+ downloads and MIT license, supporting English and Chinese. The MoE DSA architecture tag suggests a novel sparse attention variant worth investigating.
inclusionAI/Ring-2.6-1T
4,766 downloads 94 likes 20 trending
Open Source 2026-05-14
Ring-2.6-1T is a 1-trillion parameter hybrid architecture model from inclusionAI; notable scale but limited public documentation and low download count suggest early-stage release.
nvidia/Nemotron-Labs-Diffusion-3B
19,496 downloads 26 likes 24 trending
Custom License 2026-03-02
3B parameter diffusion language model from NVIDIA; smaller companion to the 14B, useful for studying the diffusion LM paradigm at accessible scale.
nvidia/Nemotron-Labs-Diffusion-8B
27,984 downloads 26 likes 26 trending
Custom License 2026-03-18
8B diffusion language model from NVIDIA Nemotron Labs; part of a family exploring non-autoregressive text generation at scale.
openbmb/BitCPM-CANN-8B
689 downloads 25 likes 22 trending
Open Source 2026-05-15
OpenBMB's BitCPM 8B model optimized for Huawei CANN (Ascend) hardware; notable for targeting non-NVIDIA AI accelerators in the Chinese ecosystem.
Ex0bit/Qwen3.6-27B-PRISM-PRO-DQ
1,869 downloads 29 likes 29 trending
Open Source 2026-05-19
Dynamic quantization GGUF of Qwen3.6-27B with speculative decoding support for llama.cpp; useful for local inference but derivative quantization work.
Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF
37,628 downloads 93 likes 85 trending
Open Source 2026-05-18
GGUF quantization of a Qwen3.5-9B coding fine-tune with multi-token prediction for speculative decoding; high download count but derivative fine-tune work.
Jackrong/Qwopus3.6-27B-v2-MTP-GGUF
23,762 downloads 84 likes 82 trending
Open Source 2026-05-21
27B Qwen3.6 fine-tune with multi-token prediction targeting coding, math, and science tasks in GGUF format; derivative but well-rounded capability profile.
antirez/deepseek-v4-gguf
360,101 downloads 178 likes 30 trending
Open Source 2026-04-26
GGUF quantizations of DeepSeek-V4-Flash optimized for Apple Silicon Metal inference; notable for making a large MoE model accessible locally.
openbmb/BitCPM-CANN-8B-gguf
789 downloads 31 likes 27 trending
Open Source 2026-05-16
GGUF quantization of BitCPM-CANN-8B for broader local inference compatibility; derivative of the base model release.
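Several entries above (Carbon-500M, the PRISM-PRO-DQ and Qwopus GGUFs) advertise speculative decoding or multi-token prediction. The sketch below illustrates the greedy variant of speculative decoding. The "models" are toy stand-ins: plain functions that map a token list to the next token id. The sketch shows only the acceptance rule and is not llama.cpp's implementation. Real systems also verify all drafted positions in one batched forward pass of the target model, not in a Python loop.

```python
from typing import Callable, List

# A "model" here is any function from the current token sequence to the next
# token id under greedy decoding. Both stand-ins below are hypothetical toys.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """One round of greedy speculative decoding; returns the newly emitted tokens.

    The output is exactly what greedy decoding with `target` alone would produce,
    but up to k + 1 tokens can be emitted per round instead of one.
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        token = draft(ctx)
        proposal.append(token)
        ctx.append(token)

    # 2. The target checks each proposed position (batched in real systems).
    accepted, ctx = [], list(prefix)
    for token in proposal:
        expected = target(ctx)
        if expected != token:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. Every draft token matched, so the target contributes one bonus token.
    accepted.append(target(ctx))
    return accepted


def target_model(ctx: List[int]) -> int:
    return len(ctx) % 5


def draft_model(ctx: List[int]) -> int:
    # Agrees with the target for short contexts, then starts guessing 0.
    return len(ctx) % 5 if len(ctx) < 3 else 0


print(speculative_step([9], draft_model, target_model, k=4))  # prints: [1, 2, 3]
```

Two draft tokens are accepted, and the third position is corrected by the target. The round therefore emits three tokens for one (batched) target pass.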

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week; try them live. Each entry lists the Space SDK, community likes, and the HuggingFace trending score, followed by the license where one is declared.

Carbon
HuggingFaceBio
docker 83 81
Interactive demo for Carbon-500M, HuggingFace Bio's genomic DNA language model; provides hands-on access to biological sequence modeling capabilities.
Qwen Image Edit 2509 LoRAs Fast
Onise
gradio 86 50
apache-2.0
Fast demo of a collection of Qwen-based image editing LoRAs; useful for exploring instruction-guided image editing but derivative of existing Qwen multimodal work.
DramaBox
ResembleAI
gradio 88 30
other
ResembleAI's DramaBox demo showcases expressive TTS with voice cloning capabilities; competitive in the growing expressive speech synthesis space.
Omni-Video-Factory-API-iframe
Saravutw
gradio 59 33
apache-2.0
Video generation API demo with iframe embedding; minimal documentation and unclear technical novelty.
Supertonic 3 (TTS)
Supertone
static 193 61
openrail
Supertonic-3 is a fast, on-device multilingual TTS system from Supertone; the on-device angle and multilingual accuracy make it notable for edge deployment use cases.
Pixal3D
TencentARC
gradio 254 88
Pixal3D from TencentARC achieves high-fidelity pixel-aligned image-to-3D generation; the pixel-alignment approach addresses a key quality bottleneck in single-image 3D reconstruction.
Wan2.2 14B Fast Preview
cbensimon
gradio 147 72
Fast preview demo of Wan2.2 14B image-to-video model using FP8 quantization and AOT compilation for accelerated inference. Multiple trending spaces around this model suggest a notable new release in video generation.
Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_P
mikeee
docker 358 76
mit
Uncensored fine-tune of Gemma-4 running via a repurposed Qwen chat space. Low technical novelty; primarily a jailbreak-adjacent model demo with no research substance.
Z Image Turbo
mrfakename
gradio 3,221 34
High-likes demo space for Z-Image Turbo, a fast image generation model. Minimal description available but strong community traction suggests a capable fast diffusion variant.
MTEB Leaderboard
mteb
docker 7,412 29
mit
The canonical MTEB embedding model leaderboard — a persistent reference resource for comparing text embedding models across retrieval, classification, and clustering benchmarks. Useful ongoing reference rather than a new release.
L2P - Z-Image 6B Pixel-Space
multimodalart
gradio 30 30
apache-2.0
Demo of Z-Image 6B operating in pixel space end-to-end via L2P (Latent-to-Pixel), bypassing the VAE bottleneck common in latent diffusion models. Architecturally interesting approach to direct pixel-space generation at scale.
FireRed Image Edit 1.0 Fast
prithivMLmods
gradio 1,337 45
apache-2.0
Fast image editing demo combining FireRed-Image-Edit with Qwen-Image-Edit-Rapid via Transformers. Incremental demo of existing image editing models with no novel research contribution.
Qwen-Image-Edit-2511-LoRAs-Fast
prithivMLmods
gradio 1,500 50
apache-2.0
Collection of LoRA adapters for Qwen image editing models, enabling fast style/task-specific image editing. Useful practitioner resource but derivative of the base Qwen image edit work.
Wan2.2 14B Preview
r3gm
gradio 2,648 28
Another Wan2.2 14B image-to-video preview space; it duplicates the fast variant but has a much lower trending score. See the fast preview space for the more relevant version.
Wan2.2 14B Fast Preview
r3gm
gradio 1,356 102
Top-trending Wan2.2 14B Fast demo using FP8 dynamic quantization and AOT compilation, enabling significantly faster image-to-video generation. The high trending score and likes indicate this is the most-used public access point for the Wan2.2 model.
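Both Wan2.2 Fast spaces credit FP8 dynamic quantization for their speed. "Dynamic" means the scale factor is computed from each tensor's actual values at runtime rather than calibrated in advance. The sketch below shows that scaling logic using symmetric int8 as a stand-in. FP8 formats such as E4M3 store small floating-point values rather than integers, so this is not the spaces' implementation. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_dynamic_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization with a scale computed at runtime."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [x * scale for x in q]


weights = [0.3, -1.0, 0.12, 0.0]
q, scale = quantize_dynamic_int8(weights)
restored = dequantize(q, scale)
# Per-element rounding error is bounded by half a quantization step.
print(q, max(abs(a - b) for a, b in zip(weights, restored)) <= scale / 2)
```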

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-05-25
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Common Corpus is presented as the largest openly licensed dataset for LLM pre-training, addressing legal and copyright concerns around training data. Relevant for practitioners needing compliant pre-training data at scale.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-05-25
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
MedAraBench introduces a large-scale Arabic medical QA benchmark to address the underrepresentation of Arabic in NLP, particularly for clinical applications. Useful for multilingual medical NLP researchers but narrow in scope.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-05-25
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Theoretical study analyzing transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, providing formal grounding for in-context learning behavior. Advances mechanistic understanding of why pre-trained LLMs generalize.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-05-25
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Task Tokens introduces a flexible conditioning mechanism for adapting transformer-based behavior foundation models in humanoid robotics without full retraining. Addresses the practical challenge of task generalization in multi-modal robotic control.
Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-05-25
Submodular Function Minimization with Dueling Oracle
Theoretical work on submodular function minimization using a noisy pairwise comparison oracle; tangentially relevant to preference-based optimization but not directly impactful for mainstream AI/ML practitioners.
submodular minimization dueling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-05-25
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Introduces a benchmark for evaluating MLLMs on scan-oriented academic paper reasoning, distinguishing between search and scan cognitive tasks. Highlights a gap between current MLLM capabilities and autonomous research assistance.
Multimodal Large Language Models Academic Paper Reasoning Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-05-25
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Proposes N-th Order Recursive Consistent Velocity Field Estimation for any-step generation, simplifying few-step generative model training by removing complex multi-component losses. Could reduce computational overhead for consistency-model-style generation.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-05-25
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT is a fully offline hierarchical RL framework using masked skill token training to transfer policies across environments with different dynamics. Addresses a key sim-to-real and cross-domain transfer challenge without requiring online interaction.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-05-25
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides high-probability convergence and generalization bounds for SGD with momentum in non-convex settings. Theoretically solid but incremental contribution to optimization theory.
Momentum nonconvex learning generalization
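For reference, the sketch below shows the heavy-ball form of SGD with momentum that such bounds typically analyse. This parameterisation is an assumption: some frameworks instead use v ← βv + (1 − β)g, and the summary does not say which form the paper uses. The demo uses an exact gradient on a 1-D quadratic for reproducibility. In the stochastic setting, `grad_fn` would return a noisy estimate.

```python
def sgd_momentum(grad_fn, x0, lr=0.1, beta=0.9, steps=500):
    # Heavy-ball form: v <- beta * v + g, then x <- x - lr * v
    x, v = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)  # may be a stochastic gradient estimate in practice
        v = beta * v + g
        x = x - lr * v
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_star = sgd_momentum(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)
```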
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-05-25
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
Q-RAG applies RL-based value training to embedders for multi-step retrieval in long-context RAG, addressing the limitation of single-step retrieval for complex multi-hop questions. Combines RL with retrieval training in a novel way.
Reinforcement Learning RL QA Long-context RAG
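To make the multi-step idea concrete, the sketch below runs a greedy retrieval loop. At each step it picks the chunk with the highest value under a current state and then folds that chunk into the state. The linear value Q(state, chunk) = state · chunk and the additive state update are hypothetical stand-ins: Q-RAG learns its value function by RL-training the embedder, which this toy does not do.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def multi_step_retrieve(query_emb, chunk_embs, steps):
    state = list(query_emb)
    selected = []
    for _ in range(steps):
        candidates = [i for i in range(len(chunk_embs)) if i not in selected]
        if not candidates:
            break
        # Greedy policy: choose the chunk with the highest hypothetical value Q(state, chunk)
        best = max(candidates, key=lambda i: dot(state, chunk_embs[i]))
        selected.append(best)
        # Fold retrieved evidence into the state so the next step can follow a new hop
        state = [s + c for s, c in zip(state, chunk_embs[best])]
    return selected

query = [1.0, 0.0, 0.0]
chunks = [
    [1.0, 1.0, 0.0],  # hop 1: matches the query and mentions a bridge entity
    [0.0, 0.0, 1.0],  # unrelated
    [0.0, 1.0, 0.0],  # hop 2: only relevant once the bridge entity is known
    [0.4, 0.0, 0.0],  # weak direct match to the query
]
print(multi_step_retrieve(query, chunks, steps=2))  # -> [0, 2]
```

Single-step top-2 retrieval by query similarity alone would return chunks 0 and 3 and miss the second hop.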
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-05-25
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Proposes cross-lingual alignment techniques to improve semantic proximity in multilingual information retrieval. Incremental improvement on CLIR with practical value for multilingual search systems.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-05-25
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematic benchmark of GPT-4o, o4-mini, Gemini 1.5/2.0 Pro on standard computer vision tasks, revealing where frontier multimodal models still fall short of specialized CV models. Provides actionable signal for practitioners choosing between general and specialized vision models.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-05-25
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous representations for variable-cardinality discrete structure prediction using neural fields and flow matching. Novel approach for object detection and molecular modeling but niche applicability.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-05-25
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL uses random path sampling in scattering transforms to reduce computational cost while maintaining perceptual quality gradients for audio/vision inverse problems. Useful for audio ML practitioners but specialized.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-05-25
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture combining evidential deep learning and extreme value theory for rare-event forecasting in multivariate time series. Addresses severe class imbalance and distributional uncertainty in anomaly detection.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
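The evidential half of this design can be illustrated with the common Dirichlet-based formulation. The network outputs non-negative evidence e_k per class, the Dirichlet parameters are α_k = e_k + 1, and uncertainty is K / Σα, which equals 1 when there is no evidence. The sketch covers only this generic evidential head. EVEREST's exact head and its extreme-value-theory tail component are not shown, and the function name is illustrative.

```python
def evidential_uncertainty(evidence):
    # Dirichlet parameters from non-negative per-class evidence
    alpha = [max(e, 0.0) + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]   # expected class probabilities
    uncertainty = len(alpha) / strength     # K / S: 1.0 with zero evidence, shrinks as evidence grows
    return probs, uncertainty

print(evidential_uncertainty([0.0, 0.0, 0.0]))   # no evidence: uniform probabilities, maximal uncertainty
print(evidential_uncertainty([18.0, 1.0, 0.0]))  # strong evidence for class 0
```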
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-05-25
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators analogous to classical numerical methods. Relevant for scientific ML but limited broader AI impact.
Neural Simulator Recurrent Depth AI4Simulation
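The analogy to classical numerical methods is that an iterative solver applies the same update repeatedly, so the iteration count chosen at run time sets the accuracy-cost trade-off. The sketch shows only the classical side with Jacobi iteration on a small diagonally dominant system. Reading the Recurrent-Depth Simulator as a learned, weight-tied block applied a variable number of times in the same spirit is our assumption from the summary.

```python
def jacobi(A, b, iters):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # One application of the same fixed update rule (weight-tied, in network terms)
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
exact = [1.0 / 11.0, 7.0 / 11.0]

def max_error(iters):
    return max(abs(a - e) for a, e in zip(jacobi(A, b, iters), exact))

for k in (1, 5, 20):
    print(k, max_error(k))  # more iterations: higher cost, lower error
```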
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-05-25
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE applies hyperbolic multi-modal learning to enzyme classification, capturing hierarchical enzyme relationships better than Euclidean methods. Niche but solid contribution to computational biology.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
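Hyperbolic space suits hierarchies because distances grow rapidly toward the boundary, leaving room for exponentially many leaves. The sketch below computes the geodesic distance in the Poincaré ball (curvature −1), d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). That PoinnCARE uses this particular model of hyperbolic space is an assumption suggested by its name.

```python
import math

def sq_norm(x):
    return sum(t * t for t in x)

def poincare_distance(u, v):
    # Geodesic distance in the Poincare ball; both points must satisfy ||x|| < 1
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

# The same Euclidean gap of 0.1 is "longer" near the boundary than near the origin
print(poincare_distance([0.0, 0.0], [0.1, 0.0]))
print(poincare_distance([0.8, 0.0], [0.9, 0.0]))
```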
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-05-25
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency and quality limitations of purely autoregressive approaches. Relevant for real-time voice AI applications.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-05-25
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K benchmarks proactive and personalized mobile GUI agents that act without explicit instructions by leveraging user context. Pushes mobile agent research toward anticipatory behavior rather than reactive command execution.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-05-25
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Provides rigorous spatial kernel analysis of Multi-Resolution Hash Encoding (NeRF/NeRP backbone), replacing heuristic hyperparameter tuning with principled design. Useful for practitioners building neural field applications.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis
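For readers new to the encoding being analysed, the sketch below follows the spatial hash published with Instant-NGP: XOR the integer grid coordinates multiplied by per-dimension primes, then reduce modulo the table size. It is applied at grid resolutions that grow geometrically from coarse to fine. The level count, resolutions and table size are illustrative assumptions. For brevity, the sketch hashes only the lower cell corner at every level, whereas the published method indexes coarse levels directly when they fit in the table and interpolates features from all 8 corners.

```python
import math

# Per-dimension primes from the Instant-NGP spatial hash (the first coordinate uses 1)
PRIMES = (1, 2654435761, 805459861)

def spatial_hash(cell, table_size):
    # XOR the coordinate-prime products, then keep the low bits via the modulus.
    # With a power-of-two table_size this matches 32-bit unsigned overflow arithmetic.
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= c * p
    return h % table_size

def level_resolutions(n_min, n_max, levels):
    # Grid resolutions grow geometrically from n_min to n_max
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (levels - 1))
    return [math.floor(n_min * growth ** l) for l in range(levels)]

def encode_slots(point, n_min=16, n_max=512, levels=6, table_size=2 ** 14):
    # For each level, the table slot of the lower grid corner around a point in [0, 1]^3
    slots = []
    for res in level_resolutions(n_min, n_max, levels):
        cell = tuple(math.floor(x * res) for x in point)
        slots.append(spatial_hash(cell, table_size))
    return slots

print(encode_slots((0.25, 0.5, 0.75)))
```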

Deep Dive

All 332 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact — 7+ items are the ones worth your time.
