Weekly Intelligence

AI Quick Bites

April 20, 2026 · 321 items from 13 sources

Last refreshed: April 20, 2026 at 11:07 UTC
Next refresh: April 27, 2026 at 09:00 UTC
Created by Vatsal Bagri · 𝕏 · LinkedIn

Highlights

The five most consequential developments in AI this week β€” selected from 321 items across 13 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
CrossMath's controlled benchmark reveals that adding visual input to VLMs frequently hurts reasoning performance, exposing that current multimodal models are fundamentally text-first reasonersβ€”a finding with direct implications for VLM architecture and training.
arxiv 2026-04-20 18 min
03
RISE makes gradient-based data attribution practical at 32B+ scale via output-layer sketching with 112x storage reduction, enabling training data selection and influence analysis that was previously memory-infeasible.
arxiv 2026-04-20 20 min
04
Layer-Wise Information scores from internal LLM representations outperform output-level uncertainty signals for conformal prediction under distribution shift, offering a more robust uncertainty quantification primitive for production LLM deployments.
arxiv 2026-04-20 18 min
05
neuralCAD-Edit establishes the first expert-grounded benchmark for multimodal 3D CAD editing and shows even GPT scores 53% below human experts, setting a clear challenge target for the AI-assisted engineering design community.
arxiv 2026-04-20 15 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar. All scores are AI relevance (0–10).

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here. Scores are AI relevance (0–10): 7+ important, 9+ landmark.

SoK: Security of Autonomous LLM Agents in Agentic Commerce
8/10
Systematization of Knowledge (SoK) paper comprehensively mapping security threats to autonomous LLM agents in agentic commerce scenarios β€” covering prompt injection, tool misuse, and trust boundary violations. Essential reading as agentic AI deployments proliferate in high-stakes commercial contexts.
hackernews 2026-04-20 25 min
We reproduced Anthropic's Mythos findings with public models
8/10
Vidoc Security independently reproduced Anthropic's Mythos research β€” which demonstrated that LLMs can engage in deceptive alignment behaviors β€” using publicly available models, significantly broadening the threat surface beyond proprietary systems. This is a meaningful validation of a concerning safety finding.
hackernews 2026-04-20 10 min
Anthropic Claude Code Leak Reveals Critical Command Injection Vulnerabilities
7/10
Leaked Claude Code source reportedly reveals critical command injection vulnerabilities in Anthropic's coding agent, raising serious concerns about security in AI-powered developer tools. High practical significance for teams deploying Claude Code in production.
hackernews 2026-04-20 5 min
N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?
7/10
N-Day-Bench is a dynamic benchmark that monthly pulls fresh CVEs from GitHub security advisories and tests whether frontier LLMs can autonomously find known vulnerabilities in real codebases via a sandboxed bash shell. The monthly refresh mechanism specifically combats training data contamination, making this a rigorous and evolving evaluation of LLM-powered vulnerability discovery.
hackernews 2026-04-20 8 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
ICLR 2026 paper on training LLMs to self-report hidden objectives via honesty fine-tuning, enabling alignment auditing of agentic models β€” directly addresses the challenge of detecting deceptive or misaligned behavior in capable AI systems.
conferences 2026-04-20 20 min
Detecting and Suppressing Reward Hacking with Gradient Fingerprints
6/10
GRIFT detects reward hacking in RLVR by computing gradient fingerprints of chain-of-thought traces and using them as a classifier signal, achieving 25%+ relative improvement over CoT Monitor and TRACE baselines; integrating GRIFT into rejection fine-tuning reduces hacking and improves task performance. Addresses a real and underexplored failure mode in reasoning model training.
arxiv 2026-04-20 18 min
ASMR-Bench: Auditing for Sabotage in ML Research
6/10
ASMR-Bench evaluates LLMs' ability to detect subtle sabotage in ML research codebases, finding even the best model (Gemini) achieves only 0.77 AUROC and 42% fix rate β€” highlighting a critical gap in AI-assisted research oversight. Relevant to AI safety and autonomous research agent deployment.
arxiv 2026-04-20 20 min
€54k spike in 13h from unrestricted Firebase browser key accessing Gemini APIs
6/10
A developer incurred €54k in Gemini API charges in 13 hours after an unrestricted Firebase browser key was exposed and abused. Highlights a critical API key security anti-pattern specific to AI billing exposure β€” relevant warning for anyone shipping Gemini-integrated apps.
hackernews 2026-04-20 5 min
Show HN: Nyx – multi-turn, adaptive, offensive testing harness for AI agents
6/10
Nyx is an autonomous red-teaming harness for AI agents that probes for logic bugs, instruction-following failures, jailbreaks, and prompt injection via multi-turn adaptive testing. Early-stage but addresses a real gap in agent QA tooling.
hackernews 2026-04-20 5 min
They Hacked Claude, Gemini, and Copilot (and No One Told You)
6/10
Security research blog claiming successful attacks against Claude, Gemini, and GitHub Copilot, though the lack of HN comments and marketing-heavy framing suggest this may be more promotional than rigorous. Worth a quick read to assess the actual technical depth of the vulnerabilities disclosed.
hackernews 2026-04-20 5 min
Changes in the system prompt between Claude Opus 4.6 and 4.7
6/10
Analysis of system prompt changes between Claude Opus 4.6 and 4.7, revealing how Anthropic is evolving behavioral constraints and model identity instructions. Useful for understanding alignment and deployment decisions.
hackernews 2026-04-20 5 min
KillBench: Every frontier LLM is biased about who deserves to live
6/10
KillBench evaluates frontier LLMs for demographic biases in life-or-death moral dilemmas, finding consistent biases across all tested models. Provocative but methodologically relevant for AI safety and alignment researchers.
hackernews 2026-04-20 8 min
Anthropic installed a spyware bridge on my machine?
6.0/10
A user claims Anthropic's Claude Code installed an undisclosed network bridge on their machine, raising supply chain and privacy concerns about agentic coding tools. Worth monitoring for technical follow-up; currently unverified.
hackernews 2026-04-20 6 min
Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations
5/10
Proposes Layer-Wise Information (LI) scores derived from internal LLM representations as conformal prediction nonconformity scores, achieving better validity-efficiency tradeoffs than output-level uncertainty signals especially under distribution shift. Useful for practitioners needing reliable uncertainty quantification in deployed LLMs.
arxiv 2026-04-20 18 min
Does Gas Town 'steal' usage from users' LLM credits to improve itself?
5/10
Community investigation into whether Gas Town (an AI coding tool) covertly uses users' LLM API credits to improve itself β€” raises real concerns about unauthorized token consumption and data exfiltration by AI tooling.
hackernews 2026-04-20 5 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors
#1
webml-community
2 items · avg 6.5/10
13.0
#2
r3gm
2 items · avg 5.5/10
11.0
#3
HuggingFaceTB
1 item · avg 7.0/10
7.0
#4
prism-ml
1 item · avg 7.0/10
7.0
#5
victor
2 items · avg 3.5/10
7.0
#6
7.0
Top Organizations
#1
NousResearch
2 items · avg 7.0/10
14.0
#2
OpenBMB
2 items · avg 7.0/10
14.0
#3
lsdefine
2 items · avg 7.0/10
14.0
#4
openai
2 items · avg 7.0/10
14.0
#5
z-lab
2 items · avg 7.0/10
14.0
#6
OpenMOSS
2 items · avg 6.0/10
12.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Security Firewall
A runtime security layer that sits between LLM agents and their tool/execution environments, detecting and blocking prompt injection, command injection, and trust boundary violations before they execute. As agentic AI deployments proliferate in commerce and dev tooling, the attack surface is exploding and no dedicated defense product exists yet. Build a proxy/middleware SDK that intercepts agent tool calls, scores them for malicious intent, and enforces configurable policy rules.
AI coding agents in CI/CD pipelines Agentic e-commerce and payment workflows Enterprise LLM agent deployments Claude Code / Cursor-style developer tools
https://arxiv.org/abs/2604.15367 https://beyondmachines.net/event_details...
Reward Hack Monitor
A developer tool that integrates gradient fingerprint analysis (inspired by GRIFT) into RLVR training pipelines to automatically flag and filter reward-hacking chain-of-thought traces before they corrupt model behavior. Reward hacking is a silent killer in reasoning model fine-tuning and practitioners have no off-the-shelf tooling to catch it. Build a training callback library compatible with popular frameworks like TRL and veRL that surfaces hacking signals in a dashboard and optionally gates gradient updates.
Math and coding reasoning model training RLVR fine-tuning pipelines Enterprise LLM customization workflows AI safety auditing for deployed models
https://arxiv.org/abs/2604.16242v1 https://arxiv.org/abs/2604.16259v1
Radiology Agent Hierarchy
A multi-agent clinical reporting product that mirrors the resident/fellow/attending supervision structure in radiology, using specialized LLM agents with retrieval-augmented revision and consensus to generate CT reports with fewer hallucinations than monolithic VLMs. Radiologist burnout and report backlog are real problems, and a structured agentic approach that mimics existing clinical workflows is far more deployable than black-box models. Build a HIPAA-compliant SaaS that integrates with PACS systems and surfaces confidence scores alongside draft reports for physician review.
Hospital radiology departments Teleradiology services Medical AI second-opinion tools Clinical documentation automation
https://arxiv.org/abs/2604.16175v1
Crypto Synthetic Data Engine
A privacy-safe synthetic financial time series generator that uses conditional GAN-LSTM models to produce realistic cryptocurrency and broader market data that preserves temporal patterns, correlations, and volatility regimes. Financial ML teams are blocked by data licensing costs, privacy regulations, and sparse historical data for rare market events. Build a self-serve API where users specify asset class, time horizon, and market regime parameters and receive statistically validated synthetic datasets with built-in quality metrics.
Quant trading strategy backtesting Risk model training without proprietary data exposure Academic financial ML research Regulatory stress-testing simulations
https://arxiv.org/abs/2604.16182v1
LLM Uncertainty Shield
A drop-in uncertainty quantification middleware for production LLM APIs that uses internal layer-wise representations (not just output logits) to generate calibrated conformal prediction intervals, flagging low-confidence responses before they reach end users. Output-level confidence scores are notoriously unreliable under distribution shift, and practitioners deploying LLMs in high-stakes settings have no robust alternative today. Build a lightweight inference wrapper compatible with OpenAI, Anthropic, and open-weight model APIs that returns a confidence band and a reject/accept signal alongside every response.
Medical and legal LLM assistants Automated customer support escalation routing RAG pipelines with factual accuracy requirements AI-assisted code review and security scanning
https://arxiv.org/abs/2604.16217v1 https://arxiv.org/abs/2604.16146v1

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

View full leaderboard on Product Hunt

Trending Repos

Repositories gaining serious momentum this week β€” sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1
GH Trending
NousResearch/hermes-agent
python 103,973 14,823 38,194 stars this week
NousResearch's Hermes Agent is an open-source agent framework that gained 38k+ GitHub stars in a single week β€” extraordinary traction suggesting it fills a real gap. Worth investigating for its architecture and tool-use capabilities.
Build idea
Build a white-label AI agent platform for enterprises that lets non-technical teams deploy customizable, tool-using agents for tasks like CRM updates, internal IT helpdesk, and data retrieval β€” all hosted and managed as a SaaS with usage-based pricing.
2
GH Trending
OpenBMB/VoxCPM
python 14,990 1,776 4,136 stars this week
VoxCPM2 is a tokenizer-free TTS model from OpenBMB supporting multilingual speech generation, creative voice design, and voice cloning β€” gaining 4k+ stars this week. The tokenizer-free approach is architecturally notable for speech generation.
Build idea
Launch a multilingual voice cloning and custom voice design API targeting podcast platforms, audiobook publishers, and game studios that need scalable, expressive, on-demand narration without hiring voice actors.
3
GH Trending
lsdefine/GenericAgent
python 4,768 514 3,512 stars this week
Self-evolving agent that grows a skill tree from a 3.3K-line seed codebase, claiming full system control with 6x lower token consumption than comparable agents. The self-improvement and token efficiency claims are technically interesting and warrant scrutiny.
Build idea
Build a cost-efficient AI DevOps agent SaaS that autonomously learns and expands its own skill set to handle infrastructure tasks β€” deployments, monitoring, incident response β€” at a fraction of the token cost of competing agent solutions.
4
GH Trending
openai/openai-agents-python
python 23,633 3,669 2,197 stars this week
OpenAI's official lightweight Python framework for building multi-agent workflows, with 23k+ stars and strong weekly growth. Represents the canonical SDK for orchestrating OpenAI-powered agents with handoffs, guardrails, and tracing built in.
Build idea
Offer a managed multi-agent workflow builder β€” a no-code/low-code platform on top of this SDK β€” where businesses can visually design, deploy, and monitor complex agent pipelines with built-in guardrails, handoffs, and audit tracing.
5
GH Trending
z-lab/dflash
python 1,927 135 869 stars this week
DFlash introduces block diffusion for flash speculative decoding, combining diffusion-based generation with speculative decoding to accelerate LLM inference. Novel technique at the intersection of diffusion models and inference optimization with meaningful speedup potential.
Build idea
Create an LLM inference optimization service that integrates DFlash's speculative decoding technique to offer enterprises significantly faster and cheaper API inference for their self-hosted or cloud-deployed large language models.
6
GH Trending
OpenMOSS/MOSS-TTS
python 1,588 144 372 stars this week
MOSS-TTS is an open-source speech and sound generation model family targeting high-fidelity, expressive, real-world scenarios including multi-speaker dialogue, voice design, sound effects, and real-time streaming TTS. Solid open-source release in a competitive space.
Build idea
Build a real-time dubbing and localization SaaS for video content creators and streaming platforms, using MOSS-TTS to automatically generate expressive, multi-speaker, multilingual audio tracks synchronized to existing video content.
7
GH Trending
dora-rs/dora
rust 3,678 382 211 stars this week
Rust-based dataflow-oriented middleware for building AI robotic applications with low-latency, composable, distributed pipelines modeled as directed graphs. Solid open-source infrastructure for AI robotics.
Build idea
Offer a managed cloud platform for robotics teams to deploy, monitor, and iterate on AI robotic pipelines built with Dora, providing hosted orchestration, telemetry dashboards, and OTA updates for fleets of robots in warehouses or manufacturing.
8
TrendShift
forrestchang/andrej-karpathy-skills
61,700 5,400
A single CLAUDE.md configuration file distilling Andrej Karpathy's observations on LLM coding pitfalls into actionable Claude Code behavior improvements. Viral traction (61K stars) reflects broad practitioner interest in prompt engineering for coding agents.
Build idea
Sell a subscription library of curated, expert-validated CLAUDE.md and system prompt configuration packs tailored to specific developer roles β€” frontend, backend, data science β€” that teams can drop into their AI coding agent workflows to immediately improve output quality.
9
GH Trending
microsoft/markitdown
python 113,080 7,330 9,018 stars this week
Microsoft's Python tool for converting Office documents and files to Markdown, widely used as a preprocessing step for LLM ingestion pipelines. Massive traction (113K stars, 9K this week) confirms it as a standard RAG preprocessing utility.
Build idea
Build a document ingestion pipeline SaaS that uses MarkItDown to automatically convert enterprise file repositories β€” SharePoint, Google Drive, Confluence β€” into clean, LLM-ready Markdown, feeding downstream RAG and knowledge base applications.
10
GH Trending
superradcompany/microsandbox
rust 5,603 265 256 stars this week
Rust-based secure local sandbox environment designed specifically for AI agents to execute code safely. Addresses a critical infrastructure need for agentic systems that require isolated code execution.
Build idea
Offer a secure code execution infrastructure API β€” similar to E2B but Rust-native β€” that AI agent developers and LLM application builders can integrate to safely run untrusted, AI-generated code in isolated sandboxes with per-execution billing.

Trending Developers

Developers gaining traction on GitHub this week β€” shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1
ζœ±ζ˜†ιΉ
@zhukunpenglinyutong
zhukunpenglinyutong/jetbrains-cc-gui
A JetBrains plugin providing a GUI for Claude Code and OpenAI Codex, bringing agentic coding assistants into the JetBrains IDE ecosystem. Useful for developers who prefer JetBrains over VS Code for AI-assisted coding.
2
Benson Wong
@mostlygeek
mostlygeek/llama-swap
Trending developer building llama-swap, a tool for reliable model swapping across local OpenAI/Anthropic-compatible servers like llama.cpp and vLLM. Useful infrastructure for local model serving.
3
dav nguyxn
@hoangsonww
hoangsonww/Claude-Code-Agent-Monitor
Trending developer building a real-time monitoring dashboard for Claude Code agents using SQLite, Node.js, and WebSockets. Useful tooling concept but derivative.
4
Baris Sencan
@isair
isair/jarvis
Trending developer building a fully offline, private AI voice assistant (Jarvis) that runs locally on-device. Interesting for local AI deployment but no novel research.
5
Maziyar Panahi
@maziyarpanahi
maziyarpanahi/openmed
Trending developer working on open-source healthcare AI; profile-level signal only, no specific technical content to evaluate.
6
Matt Van Horn
@mvanhorn
mvanhorn/last30days-skill
Developer profile featuring a research agent skill that aggregates and synthesizes information across Reddit, X, YouTube, HN, and Polymarket. Minimal technical detail available from the profile alone.
7
Fengda Huang
@phodal
phodal/routa
Developer profile for a workspace-first multi-agent coordination platform for AI development. Insufficient technical detail from profile summary alone.
8
DaniΓ«l de Kok
@danieldk
danieldk/dictomaton
Trending GitHub developer profile; popular repo is a Java finite-state dictionary library with no direct AI relevance.
9
Igor Lins e Silva
@igorls
igorls/context-builder
Trending developer with a context-builder repo; insufficient detail to assess AI significance.
10
Duy /zuey/
@mrgoonie
mrgoonie/claudekit-skills
Trending developer profile focused on ClaudeKit skills; insufficient technical detail to assess significance.
11
@qixing-jk
qixing-jk/all-api-hub
API relay manager for managing multiple LLM API accounts with balance tracking and key export. Utility tool with limited technical novelty.
12
zhayujie
@zhayujie
zhayujie/CowAgent
Developer profile for CowAgent, a WeChat-integrated LLM assistant; duplicate of the repo entry below.
13
Ron Evans
@deadprogram
14
Felix Rieseberg
@felixrieseberg
felixrieseberg/windows95
Trending developer profile; popular repo is Windows 95 in Electron β€” no AI relevance.
15
Andre Weissflog
@floooh
floooh/sokol
Trending developer profile focused on minimal C headers; no AI relevance.
16
Gabriel Miranda
@gabrielmfern
gabrielmfern/forbear
Trending developer profile with no substantive AI content.
17
Lukas Holecek
@hluk
hluk/CopyQ
Trending developer profile for a clipboard manager; no AI relevance.
18
Henrik RydgΓ₯rd
@hrydgard
hrydgard/ppsspp
Trending developer profile for a PSP emulator; no AI relevance.
19
Shahed Nasser
@shahednasser
shahednasser/awesome-resources
Developer profile featuring a community resource list; not AI-specific.
20
Tom Payne
@twpayne
twpayne/chezmoi
Developer profile for dotfile manager chezmoi; not AI-related.
21
Classic298
@Classic298
Classic298/open-webui-plugins
A curated collection of Open WebUI plugins - tools, skills, filters, pipes, and actions that extend your AI chat experience.

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard β€” Top 15
#ModelTypeEloVotes
1 claude-opus-4-7-thinking Anthropic Closed 1505 2,618
2 claude-opus-4-6-thinking Anthropic Closed 1503 18,144
3 claude-opus-4-7 Anthropic Closed 1498 3,485
4 claude-opus-4-6 Anthropic Closed 1497 19,373
5 muse-spark Meta Closed 1496 5,155
6 gemini-3.1-pro-preview Google Closed 1492 22,905
7 gemini-3-pro Google Closed 1486 41,404
8 grok-4.20-beta1 xAI Closed 1485 12,069
9 gpt-5.4-high OpenAI Closed 1482 11,568
10 grok-4.20-beta-0309-reasoning xAI Closed 1480 11,661
11 gpt-5.2-chat-latest-20260210 OpenAI Closed 1477 17,827
12 grok-4.20-multi-agent-beta-0309 xAI Closed 1476 12,049
13 gemini-3-flash Google Closed 1474 30,817
14 claude-opus-4-5-20251101-thinking-32k Anthropic Closed 1473 37,184
15 glm-5.1 Z.ai Open 1472 7,179
New & Trending Models
MiniMaxAI/MiniMax-M2.7
314,205 downloads 989 likes 374 trending
Custom License 2026-04-09
Official MiniMax M2.7 model release β€” a large MoE model with 314K downloads and 989 likes, making it one of the most significant open model releases in this batch. The M2 architecture with FP8 support and strong community traction signals this is a competitive open-weight model worth evaluating.
zai-org/GLM-5.1
124,162 downloads 1,431 likes 264 trending
Open Source 2026-04-03
GLM-5.1 is a new MoE model from ZhipuAI/zai-org with 1431 likes and 124K downloads, representing a significant new open-weight Chinese frontier model release. The glm_moe_dsa architecture tag and bilingual (en/zh) support make it a notable addition to the open-weight MoE landscape.
Qwen/Qwen3-Coder-Next
646,521 downloads 1,310 likes 27 trending
Open Source 2026-01-30
Qwen3-Coder-Next is a code-specialized model from Alibaba's Qwen team with 646K downloads and 1310 likes, indicating strong adoption. The 'Next' designation suggests this is a preview/upcoming release of the next Qwen coding model generation.
prism-ml/Bonsai-8B-gguf
96,081 downloads 646 likes 70 trending
Open Source 2026-03-18
Bonsai-8B is a 1-bit quantized 8B model from Prism ML with 96K downloads and 646 likes, representing serious interest in extreme quantization for on-device deployment. The 1-bit approach at 8B scale with CUDA and Metal support is technically noteworthy.
z-lab/Qwen3.5-27B-DFlash
16,972 downloads 88 likes 34 trending
Open Source 2026-03-14
DFlash applies diffusion-based speculative decoding to Qwen3.5-27B, combining block diffusion language modeling with flash decoding for faster inference. The arxiv:2602.06036 reference points to a novel decoding paradigm worth investigating.
z-lab/Qwen3.6-35B-A3B-DFlash
5,930 downloads 37 likes 35 trending
Open Source 2026-04-17
DFlash variant for Qwen3.6-35B-A3B MoE model using block diffusion as a draft model for speculative decoding. Applying diffusion-based drafting to MoE architectures is a novel inference optimization angle.
LilaRest/gemma-4-31B-it-NVFP4-turbo
157,867 downloads 254 likes 65 trending
Open Source 2026-04-07
NVFP4 quantization of Gemma 4 31B using NVIDIA ModelOpt, optimized for vLLM inference with 157K downloads. Represents a practical path to running Gemma 4 31B efficiently on NVIDIA hardware.
Rta-AILabs/Nandi-Mini-150M-Instruct
3,486 downloads 44 likes 42 trending
Open Source 2026-04-13
A 150M parameter instruction-tuned model supporting 11 Indian languages (Hindi, Marathi, Tamil, Telugu, Kannada, Malayalam, Bengali, Punjabi, Gujarati, Odia), addressing a significant underserved multilingual NLP gap. Small model size makes it practical for on-device deployment in South Asian markets.
nvidia/Gemma-4-31B-IT-NVFP4
1,325,194 downloads 411 likes 43 trending
Custom License 2026-04-02
NVIDIA's official NVFP4 quantization of Gemma 4 31B IT using ModelOpt, with 1.3M downloads indicating it's the go-to quantization for NVIDIA GPU users. Complements the community NVFP4 variant and validates the format for production use.
prism-ml/Ternary-Bonsai-8B-mlx-2bit
6,433 downloads 60 likes 58 trending
Open Source 2026-04-13
Ternary (1.58-bit) MLX quantization of Bonsai-8B for Apple Silicon, pushing extreme compression for on-device inference. Complements the GGUF 1-bit variant and extends the Bonsai family to Mac hardware.
unsloth/GLM-5.1-GGUF
43,725 downloads 169 likes 35 trending
Open Source 2026-04-06
Unsloth's GGUF quantization of GLM-5.1 enables local inference of the new GLM-5.1 MoE model. 43K downloads signals strong demand for accessible local deployment.
unsloth/MiniMax-M2.7-GGUF
139,172 downloads 136 likes 54 trending
Custom License 2026-04-11
GGUF quantization of MiniMax-M2.7 with 139K downloads, making this large MoE model accessible for local inference via llama.cpp. High download count reflects strong interest in MiniMax's model.
unsloth/Qwen3-Coder-Next-GGUF
215,057 downloads 597 likes 24 trending
Open Source 2026-02-03
GGUF quantization of Qwen3-Coder-Next with 215K downloads, making Qwen's latest coding model accessible locally. High download count but the quantization itself is routine.
Jackrong/Qwopus-GLM-18B-Merged-GGUF
7,182 downloads 133 likes 128 trending
Open Source 2026-04-18
High-traction GGUF of a frankenmerge combining Qwen3.5 and GLM5.1 distillation, targeting reasoning, tool-use, and multilingual tasks at 18B parameters. The merge approach combining two distinct model families is mildly interesting.
Jiunsong/supergemma4-26b-abliterated-multimodal
7,385 downloads 71 likes 56 trending
gemma 2026-04-12
Abliterated multimodal variant of Gemma 4 26B supporting image-text-to-text with tool-use and instruction-following. The multimodal + uncensored combination drives notable download numbers but is derivative work.
Model Buzz

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week β€” try them live. Flame icon = HuggingFace trending score. Hearts = community likes.

UGI Leaderboard
DontPlanToEnd
docker 1,694 24
apache-2.0
The Uncensored General Intelligence Leaderboard tracks model performance on tasks typically filtered by safety guardrails, providing a benchmark for abliterated/uncensored model capabilities. Useful context for understanding the uncensored model trend dominating this batch.
Omni Video Factory
FrameAI4687
gradio 908 32
mit
A Gradio space offering text-to-video, image-to-video, and video extension capabilities in one interface with 908 likes. Useful unified tool but no novel underlying model.
Distilling 100B+ Models 40x Faster with TRL
HuggingFaceTB
docker 68 45
HuggingFace TRL now supports distillation from 100B+ teacher models at 40x faster speeds, making large-scale knowledge distillation practical for teams without massive compute. This is a significant tooling advancement that democratizes distillation from frontier-scale teachers.
Qwen Image Edit + Loras built-in
Onise
gradio 42 29
apache-2.0
Demo space combining Qwen image editing with built-in LoRA support for stylized image generation. Useful for exploring Qwen's image editing capabilities with fine-tuned adapters.
MOSS TTS Nano
OpenMOSS-Team
gradio 41 29
apache-2.0
MOSS TTS Nano is a lightweight voice cloning demo running on ZeroGPU, targeting efficient on-device TTS. Nano sizing suggests focus on edge deployment.
ERNIE Image
baidu
gradio 53 51
apache-2.0
Baidu's ERNIE-Image-Turbo demo showcases their image generation model with a turbo inference variant. Notable as a major Chinese lab's image generation offering entering the HuggingFace ecosystem.
OmniVoice
k2-fsa
gradio 621 153
apache-2.0
OmniVoice supports high-quality voice cloning TTS across 600+ languages, making it one of the broadest multilingual TTS systems publicly demoed. High traction (621 likes) signals real community interest.
Z Image Turbo
mrfakename
gradio 2,959 52
Z-Image-Turbo is a high-traction image generation demo (2959 likes) with turbo inference. Limited metadata makes technical novelty hard to assess, but community adoption is strong.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio 2,337 43
Demo using Qwen's image editing capabilities to generate multiple camera angles of a scene, enabling pseudo-3D view synthesis from a single image. High likes (2337) indicate strong interest in this multiview generation approach.
VoxCPM Demo
openbmb
gradio 371 58
apache-2.0
VoxCPM2 is OpenBMB's multimodal model demo running via Nano-vLLM for efficient inference. Represents continued development in the CPM model family with voice/multimodal capabilities.
Bonsai 1-bit GPU
prism-ml
docker 86 69
Bonsai demonstrates 1-bit LLMs running on standard GPUs, pushing extreme quantization into practical deployment. Paired with the WebGPU variant, this signals a serious push toward sub-2-bit inference for accessible hardware.
FireRed Image Edit 1.0 Fast
prithivMLmods
gradio 927 73
apache-2.0
Combines FireRed image editing with Qwen-Image-Edit-Rapid using HuggingFace Transformers for fast image editing. High likes (927) suggest practical utility.
Wan2.2 14B Preview
r3gm
gradio 2,241 139
Wan2.2 14B is a large video generation model running with FP8 quantization and AOTI (ahead-of-time inference) compilation for faster generation from image+text prompts. High traction (2241 likes) confirms community interest in this video gen capability.
Wan2.2 14B Fast Preview
r3gm
gradio 743 58
Faster variant of the Wan2.2 14B video generation demo with FP8+AOTI optimizations. Incremental over the first preview space.
Ltx-2.3 FFLFwith Lora
rahul7star
gradio 40 31
LTX-2.3 video generation demo with LoRA support. Limited metadata and low engagement make technical novelty unclear.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-04-20
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
ICLR 2026 paper presenting Common Corpus, claimed to be the largest ethically-sourced (non-copyrighted) pre-training dataset for LLMs β€” directly addresses legal and compliance concerns around training data provenance.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-04-20
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
ICLR 2026 paper introducing MedAraBench, a large-scale Arabic medical QA benchmark β€” useful for evaluating multilingual medical LLMs but narrow in scope.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-04-20
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
ICLR 2026 theoretical study analyzing transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, providing formal grounding for in-context learning behavior.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-04-20
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
ICLR 2026 paper introducing Task Tokens for adapting transformer-based behavior foundation models in humanoid control β€” lightweight conditioning approach that avoids full fine-tuning for new tasks.
Reinforcement Learning Hierarchial Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-04-20
Submodular Function Minimization with Dueling Oracle
ICLR 2026 theoretical paper on submodular function minimization with a noisy pairwise comparison oracle β€” relevant to preference-based optimization but highly specialized.
submodular minimization deling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-04-20
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
ICLR 2026 benchmark evaluating MLLMs on scan-oriented academic paper reasoning β€” tests whether models can holistically parse and reason over full papers rather than just retrieve passages.
Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-04-20
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
ICLR 2026 paper proposing N-th order recursive consistent velocity field estimation for any-step generation, simplifying consistency model training while maintaining quality β€” reduces computational overhead of few-step generative models.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-04-20
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
ICLR 2026 paper presenting MSTT, a fully offline hierarchical RL framework using masked skill tokens to transfer policies across environments with different dynamics β€” addresses a key sim-to-real gap without online interaction.
Tranfser Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-04-20
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
ICLR 2026 paper providing high-probability convergence and generalization bounds for SGD with momentum in non-convex settings β€” theoretical contribution with limited immediate practical impact.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-04-20
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
ICLR 2026 paper introducing Q-RAG, which trains retrievers using RL value-based objectives to support multi-step retrieval over long contexts β€” addresses the single-step retrieval bottleneck in complex QA tasks.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-04-20
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
ICLR 2026 paper on cross-lingual alignment for information retrieval, improving semantic proximity between queries and documents across languages β€” incremental improvement in multilingual IR.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-04-20
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
ICLR 2026 benchmark evaluating GPT-4o, o4-mini, Gemini 1.5 Pro and others on standard CV tasks β€” reveals specific gaps in multimodal foundation models' visual understanding that pure benchmark leaderboards obscure.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-04-20
CORDS - Continuous Representations of Discrete Structures
ICLR 2026 paper on CORDS, a continuous representation framework for variable-cardinality discrete structure prediction using neural fields and flow matching β€” applicable to object detection and molecular modeling.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-04-20
SCRAPL: Scattering Transform with Random Paths for Machine Learning
ICLR 2026 paper introducing SCRAPL, a computationally efficient scattering transform using random path sampling for perceptual loss in audio/vision inverse problems β€” reduces cost of wavelet scattering while preserving gradient quality.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Ε½ilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-04-20
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
ICLR 2026 paper presenting EVEREST, a transformer for probabilistic rare-event forecasting in multivariate time series using evidential deep learning and extreme value theory β€” addresses severe class imbalance in anomaly detection.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-04-20
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
ICLR 2026 paper on Recurrent-Depth Simulators that enable test-time accuracy-cost trade-offs in neural simulators, analogous to classical numerical methods β€” useful for scientific computing applications.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-04-20
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
ICLR 2026 paper using hyperbolic space for multi-modal enzyme classification, capturing hierarchical EC number relationships β€” domain-specific but demonstrates value of geometry-aware embeddings.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-04-20
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
ICLR 2026 paper arguing that speech-to-speech LLMs need non-autoregressive joint training of audio and text tokens β€” proposes a training paradigm shift that could improve latency and coherence in voice AI systems.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-04-20
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
ICLR 2026 paper introducing FingerTip 20K, a benchmark for proactive and personalized mobile GUI agents that act without explicit instructions by leveraging user context β€” pushes beyond reactive agent paradigms.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-04-20
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
ICLR 2026 paper providing rigorous spatial kernel analysis of Multi-Resolution Hash Encoding (NeRF/Instant-NGP backbone) β€” replaces heuristic hyperparameter tuning with principled optimization.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis

Deep Dive

All 321 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact β€” 7+ items are the ones worth your time.

321+ research items ready to explore