Weekly Intelligence

AI Quick Bites

April 13, 2026 · 365 items from 13 sources

Last refreshed: April 13, 2026 at 11:05 UTC
Next refresh: April 20, 2026 at 09:00 UTC
Created by Vatsal Bagri · 𝕏 · LinkedIn

Highlights

The five most consequential developments in AI this week β€” selected from 365 items across 13 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
The most comprehensive taxonomy of credit assignment methods for RL-trained LLMs to date, with actionable benchmark protocols and a reporting checklist that will shape how the field evaluates agentic RL going forward.
arxiv 2026-04-13 25 min
03
PRA sets a new 4B-scale SOTA on MedQA and improves frozen models up to 25.7% without retraining, validating a paradigm where domain-specific reward modules are decoupled from policy models.
arxiv 2026-04-13 20 min
04
XFED proves federated learning is vulnerable to coordinated poisoning even without attacker communication, bypassing all 8 tested defenses and fundamentally challenging FL security assumptions.
arxiv 2026-04-13 20 min
05
UIPress delivers 9.1x inference speedup for UI-to-Code generation with only 0.26% additional parameters, directly addressing the token efficiency bottleneck in production VLM deployments.
arxiv 2026-04-13 18 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar. All scores are AI relevance (0–10).

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here. Scores are AI relevance (0–10): 7+ important, 9+ landmark.

System Card: Claude Mythos Preview [pdf]
9/10
Official system card for Claude Mythos Preview, Anthropic's most capable model to date, covering safety evaluations, cybersecurity capability assessments, and deployment constraints β€” essential primary source for understanding frontier model safety posture.
hackernews 2026-04-13 30 min
Reverse engineering Gemini's SynthID detection
8/10
Reverse-engineering attempt on Google Gemini's SynthID watermarking system, probing whether the detection mechanism can be circumvented or decoded. Novel attack surface on AI-generated content provenance with 54 HN comments indicating significant community interest.
hackernews 2026-04-13 10 min
Show HN: Prompt injection detector beats ProtectAI by 19% accuracy, 8.9x smaller
8/10
A 83MB DeBERTa-based prompt injection classifier that outperforms ProtectAI's leading model by 19% accuracy and 8.9x smaller footprint β€” 91.68% vs 72.28% accuracy with 101ms vs 646ms CPU latency, directly benchmarked on an independent dataset.
hackernews 2026-04-13 5 min
Exploiting the most prominent AI agent benchmarks
8/10
Berkeley RDI researchers demonstrate concrete exploits against leading AI agent benchmarks, showing how benchmark gaming can inflate reported performance β€” critical reading for anyone using these benchmarks to make model selection decisions.
hackernews 2026-04-13 12 min
Assessing Claude Mythos Preview's cybersecurity capabilities
8/10
Anthropic's red team formally assesses Claude Mythos Preview's offensive cybersecurity capabilities, providing structured evaluation of a frontier model's ability to assist with real-world exploits β€” directly relevant to AI safety and deployment policy.
hackernews 2026-04-13 10 min
Project Glasswing: Securing critical software for the AI era
8/10
Anthropic's Project Glasswing initiative deploys Claude Mythos to proactively find and fix vulnerabilities in critical open-source software β€” a landmark application of frontier AI to defensive security at scale.
hackernews 2026-04-13 8 min
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
7/10
Uses targeted weight pruning as a causal probe to show that harmful content generation in LLMs depends on a compact, harm-type-general set of weights distinct from benign capabilities, and that alignment compresses these weightsβ€”explaining why fine-tuning causes emergent misalignment. Provides mechanistic insight with direct implications for safer fine-tuning practices.
arxiv 2026-04-13 20 min
Your Agent Is Mine: Measuring Malicious Attacks on the LLM Supply Chain
7/10
Systematic measurement of malicious attack surfaces across the LLM agent supply chainβ€”covering model providers, plugins, and tool integrationsβ€”quantifying real attack vectors that can compromise agent behavior at scale. Directly relevant to anyone deploying LLM agents in production.
hackernews 2026-04-13 20 min
Claude mixes up who said what
7/10
Detailed analysis showing Claude systematically confuses speaker attribution in multi-turn conversations, misattributing statements between user and assistant β€” a concrete reliability failure with real implications for agentic and multi-party use cases.
hackernews 2026-04-13 6 min
Scan any LLM chatbot for vulnerabilities. Built by Mozilla
7/10
Mozilla-backed open-source security scanner (0Din project) that probes LLM chatbots for vulnerabilities including prompt injection, jailbreaks, and data leakage β€” a credible, practitioner-focused red-teaming tool with institutional backing.
hackernews 2026-04-13 5 min
AI Agent Sandboxes Got Security Wrong
7/10
Argues that current AI agent sandboxes focus on OS-level isolation while missing the semantic layer β€” agents can be manipulated to perform harmful actions within their permitted scope. Raises important architectural critique for anyone building or deploying agentic systems.
hackernews 2026-04-13 7 min
Ask HN: What's the state of multimodal prompt injection defence in 2026?
7/10
Practitioner shares structured test results from 225 multimodal prompt injection attacks across 5 modalities (text, image, audio, document, cross-modal), finding audio more defensible via FFT analysis but cross-modal attacks particularly hard to detect. Rare empirical data on multimodal attack surfaces.
hackernews 2026-04-13 5 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
Introduces honesty fine-tuning to make LLMs self-report hidden objectives when interrogated, improving alignment auditing reliability over naive prompting. Directly relevant to AI safety evaluations and model auditing pipelines.
conferences 2026-04-13 20 min
Defender – Local prompt injection detection for AI agents (no API calls)
7/10
Local npm package for detecting prompt injection attacks in AI agents without requiring external API calls, enabling offline/low-latency defense for agentic pipelines. Addresses a real and growing attack surface as LLM agents proliferate.
hackernews 2026-04-13 3 min
ALERT: Researchers discover 26 third-party AI LLM routers secretly injecting malicious tool c
7/10
Researchers identified 26 third-party LLM router services secretly injecting malicious tool calls to steal credentials, with specific risk to developers using AI coding agents like Claude Code on crypto wallets β€” a novel supply-chain attack vector targeting the LLM routing layer.
twitter 2026-04-13 3 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors
#1
r3gm
2 items · avg 6.0/10
12.0
#2
7.0
#3
7.0
#4
7.0
#5
webml-community
1 item · avg 7.0/10
7.0
#6
7.0
Top Organizations
#1
NousResearch
4 items · avg 6.5/10
26.0
#2
26.0
#3
HKUDS
6 items · avg 4.0/10
24.0
#4
OpenBMB
2 items · avg 7.0/10
14.0
#5
aaif-goose
2 items · avg 7.0/10
14.0
#6
bytedance
2 items · avg 7.0/10
14.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Permission Firewall
A middleware layer that enforces privilege hierarchies across multi-source LLM agent deployments, resolving instruction conflicts before they reach the model. Inspired by the ManyIH research showing frontier models only achieve ~40% accuracy on instruction conflict tasks, this tool would intercept, classify, and arbitrate competing instructions from system prompts, users, plugins, and tool outputs. Builds directly on real supply-chain trust concerns surfaced by the Vercel/Claude Code telemetry incident and LLM supply chain attack research.
Enterprise agentic workflow orchestration Multi-plugin AI coding assistant sandboxing Customer-facing chatbot policy enforcement Regulated industries requiring audit trails of instruction sources
https://arxiv.org/abs/2604.09443v1 https://akshaychugh.xyz/writings/png/ver... https://arxiv.org/abs/2604.08407
Cheap LLM Eval Engine
A drop-in evaluation service that replaces expensive LLM-as-a-Judge pipelines with a lightweight BERT-based judge, dramatically cutting evaluation costs for teams running continuous model assessments. The service exposes a simple API accepting model outputs and references, returning quality scores comparable to GPT-4-level judges at a fraction of the cost. With AI coding tool economics becoming a major community pain point, affordable, reliable eval infrastructure is a clear gap.
CI/CD pipelines for LLM-powered products A/B testing prompt variants at scale Fine-tuning reward signal generation RAG retrieval quality monitoring
https://arxiv.org/abs/2604.09497v1 https://braw.dev/blog/2026-04-06-realloc...
Federated ML Security Scanner
A testing and auditing toolkit for federated learning deployments that probes for non-collusive model poisoning vulnerabilities, simulating independent adversarial clients to stress-test Byzantine-robust defenses. The XFED research demonstrates that current FL defenses are far weaker than assumed against uncoordinated attacks, creating an urgent need for practitioners to assess their own systems. The tool would generate attack scenario reports and recommend defense configurations.
Healthcare federated learning compliance audits Financial institution cross-silo FL deployments Mobile on-device federated model training Academic FL research benchmarking
https://arxiv.org/abs/2604.09489v1 https://arxiv.org/abs/2604.08407
Noisy Dataset Cleaner
A training-free preprocessing pipeline for generative model practitioners that applies spectral analysis to identify and regenerate corrupted high-frequency components in training images before or after model training, based on the SCoRe method. Many real-world datasets scraped from the web contain noise artifacts that silently degrade model output quality, and this tool surfaces and fixes those issues without requiring retraining. Pairs naturally with the growing trend of teams training on noisy scraped data at scale.
Diffusion model fine-tuning on web-scraped datasets Medical imaging model training data preparation Product image generation for e-commerce Synthetic data quality assurance pipelines
https://arxiv.org/abs/2604.09436v1 https://arxiv.org/abs/2604.09531v1
Process Reward API
A hosted API service that wraps the Process Reward Agent paradigm, providing step-wise domain-grounded reward signals to any frozen LLM at inference time without requiring retraining. Inspired by PRA achieving 25.7% improvement on frozen models and new SOTA on MedQA, this service lets teams plug in domain-specific reward modules β€” medical, legal, financial β€” to steer existing model outputs toward higher-quality reasoning. The decoupled architecture means customers can upgrade reward modules independently of their base model.
Medical question answering and clinical decision support Legal document reasoning and contract analysis Financial analysis and compliance checking Educational tutoring with step-level feedback
https://arxiv.org/abs/2604.09482v1 https://arxiv.org/abs/2604.09459v1

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

View full leaderboard on Product Hunt

Trending Repos

Repositories gaining serious momentum this week β€” sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1
TrendShift
NousResearch/hermes-agent
Python 61,000 8,200
NousResearch's Hermes Agent framework with 61k stars β€” a production-oriented agent system designed to grow and adapt with users, representing a serious open-source alternative to commercial agent platforms.
Build idea
Build a white-label AI agent platform for enterprises that lets non-technical teams deploy, customize, and monitor autonomous agents for workflows like customer support, sales research, and internal IT helpdesks β€” without relying on OpenAI or Anthropic's proprietary agent APIs.
2
TrendShift
OpenBMB/VoxCPM
Python 9,500 1,100
VoxCPM2 is a tokenizer-free TTS system from OpenBMB supporting multilingual speech generation, creative voice design, and voice cloning β€” the tokenizer-free architecture is a notable departure from standard TTS pipelines with 9.5k stars.
Build idea
Launch a multilingual voice cloning SaaS for content creators and e-learning platforms that lets users generate studio-quality narration in any language using a custom cloned voice, with per-minute usage pricing.
3
GH Trending
aaif-goose/goose
rust 41,637 4,162 4,951 stars this week
Open-source, extensible AI agent written in Rust that goes beyond code suggestions to install, execute, edit, and test with any LLM backend. Strong traction (~42k stars, ~5k stars this week) signals real developer adoption.
Build idea
Offer a managed developer productivity platform built on Goose that gives software teams a self-hosted AI coding agent capable of autonomously handling full dev tasks β€” writing, testing, and deploying code β€” with audit logs and team access controls.
4
GH Trending
bytedance/Protenix
python 1,803 259 48 stars this week
ByteDance's open-source high-accuracy biomolecular structure prediction model, positioning as an AlphaFold-class alternative. Significant for computational biology and open-source scientific AI.
Build idea
Build a drug discovery SaaS for biotech startups and academic labs that provides on-demand protein structure prediction, binding site analysis, and mutation impact scoring via a simple API, at a fraction of the cost of wet-lab experiments.
5
GH Trending
google-deepmind/gemma
python 4,824 839 413 stars this week
Official Google DeepMind library for Gemma open-weight LLMs, seeing a breakout week with 413 new stars. Key reference repo for fine-tuning and deploying Gemma models.
Build idea
Create a fine-tuning and deployment platform specifically optimized for Gemma models, targeting regulated industries like healthcare and legal that need on-premise open-weight LLMs with compliance guarantees and no data leaving their infrastructure.
6
GH Trending
google-research/timesfm
python 16,754 1,583 1,585 stars this week
Google Research's pretrained time-series foundation model for forecasting, surging with 1,585 stars this week β€” signals growing interest in domain-specific foundation models beyond NLP/vision.
Build idea
Build a plug-and-play demand forecasting SaaS for retail and supply chain teams that uses TimesFM under the hood to deliver accurate multi-horizon forecasts from raw sales CSVs with zero ML expertise required.
7
GH Trending
rivet-dev/agent-os
rust 2,678 109 863 stars this week
Portable open-source OS for AI agents using WebAssembly and V8 isolates, claiming ~6ms cold starts and 32x cost reduction vs. sandboxes. Addresses a real bottleneck in agent deployment infrastructure.
Build idea
Offer a serverless agent hosting platform built on Agent OS that lets developers deploy AI agents at scale with near-instant cold starts and pay-per-execution pricing, undercutting existing sandbox-based agent infrastructure costs.
8
GH Trending
EricLBuehler/mistral.rs
rust 6,972 568 111 stars this week
Fast Rust-based LLM inference engine with broad model support β€” steady traction project (7k stars) with ongoing updates for local deployment.
Build idea
Sell a turnkey on-premise LLM appliance for enterprises β€” pre-configured hardware plus mistral.rs software β€” that delivers fast, private local inference for internal tools without cloud dependency or per-token costs.
9
TrendShift
JuliusBrussee/caveman
Python 18,700 852
Viral Claude Code skill that reduces token usage by ~65% by compressing prompts into minimal 'caveman' syntax β€” surprisingly effective prompt compression technique with 18k+ stars, though more trick than research.
Build idea
Build a browser extension or API middleware product that automatically compresses prompts sent to any LLM API using caveman-style syntax, reducing token costs by up to 65% for high-volume AI applications with a simple monthly subscription.
10
GH Trending
NVIDIA/personaplex
python 9,154 1,288 2,331 stars this week
NVIDIA's PersonaPlex release β€” a framework for persona-conditioned model training/inference with 9k+ stars and strong weekly growth; details sparse but NVIDIA provenance adds credibility.
Build idea
Launch a persona-as-a-service platform for customer experience teams that uses PersonaPlex to train and deploy brand-consistent AI personas for chatbots, virtual assistants, and interactive marketing campaigns across multiple channels.

Trending Developers

Developers gaining traction on GitHub this week β€” shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1
Mervin Praison
@MervinPraison
MervinPraison/PraisonAI
GitHub developer profile for PraisonAI agent framework β€” profile page, not a specific technical item.
2
Ismail Pelaseyed
@homanp
homanp/infinite-monitor
Developer profile trending on GitHub; popular repo is a real-time monitoring tool. Insufficient technical substance to evaluate.
3
jakevin
@jackwener
jackwener/OpenCLI
Developer profile trending on GitHub; popular repo is a universal CLI hub. Minimal AI-specific relevance.
4
Nathan Brake
@njbrake
njbrake/agent-of-empires
Developer profile trending on GitHub; popular repo is an agent session manager. Covered separately under the repo entry.
5
Fengda Huang
@phodal
phodal/routa
Developer profile trending on GitHub; popular repo is a multi-agent coordination platform. Insufficient detail to evaluate.
6
Shahed Nasser
@shahednasser
shahednasser/awesome-resources
GitHub developer profile for Shahed Nasser, whose popular repo is a community-curated awesome-resources list. Not AI-specific and not technically substantive.
7
ζœ±ζ˜†ιΉ
@zhukunpenglinyutong
zhukunpenglinyutong/jetbrains-cc-gui
GitHub profile whose popular repo is a JetBrains GUI plugin for Claude Code and Codex. Marginally relevant as a developer tool wrapper.
8
Alex Kuleshov
@0xAX
0xAX/linux-insides
GitHub developer profile for linux-insides book author β€” not AI-related.
9
ιƒ‘θ―š (Cheng Zheng)
@1c7
1c7/chinese-independent-developer
GitHub developer profile for a Chinese indie developer list curator β€” not AI-related.
10
Jamie Magee
@JamieMagee
JamieMagee/highway
GitHub developer profile β€” not AI-related.
11
Lalit Maganti
@LalitMaganti
LalitMaganti/syntaqlite
GitHub developer profile for SQLite tooling β€” not AI-related.
12
Jorrit Rouwe
@jrouwe
jrouwe/JoltPhysics
Physics/collision detection library developer trending on GitHub β€” not AI-relevant.
13
Leonardo Maldonado
@leonardomso
leonardomso/33-js-concepts
JavaScript concepts repo developer trending on GitHub β€” not AI-relevant.
14
Ran Aroussi
@ranaroussi
ranaroussi/yfinance
Developer of yfinance trending on GitHub β€” not AI-relevant.
15
tangly1024
@tangly1024
tangly1024/NotionNext
GitHub developer profile for a NextJS+Notion blog builder. Not AI-relevant.
16
Arseny Kapoulkine
@zeux
zeux/meshoptimizer
GitHub profile for a mesh optimization library author. Not AI-relevant.
17
Elie Steinbock
@elie222
elie222/inbox-zero
The world's best AI personal assistant for email. Open source app to help you reach inbox zero fast.
18
Eric Curtin
@ericcurtin
ericcurtin/inferrs
A TurboQuant inference server
19
Frank Bria
@frankbria
frankbria/ralph-claude-code
Autonomous AI development loop for Claude Code with intelligent exit detection

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard β€” Top 15
#ModelTypeEloVotes
1 claude-opus-4-6-thinking Anthropic Closed 1504 16,278
2 claude-opus-4-6 Anthropic Closed 1496 17,416
3 muse-spark Meta Closed 1493 3,268
4 gemini-3.1-pro-preview Google Closed 1492 20,531
5 gemini-3-pro Google Closed 1486 41,585
6 grok-4.20-beta1 xAI Closed 1486 9,689
7 gpt-5.4-high OpenAI Closed 1484 9,681
8 grok-4.20-beta-0309-reasoning xAI Closed 1478 9,781
9 gpt-5.2-chat-latest-20260210 OpenAI Closed 1477 15,704
10 grok-4.20-multi-agent-beta-0309 xAI Closed 1476 10,112
11 gemini-3-flash Google Closed 1474 30,918
12 claude-opus-4-5-20251101-thinking-32k Anthropic Closed 1473 37,307
13 glm-5.1 Z.ai Open 1471 5,326
14 grok-4.1-thinking xAI Closed 1471 47,508
15 claude-opus-4-5-20251101 Anthropic Closed 1468 47,320
New & Trending Models
zai-org/GLM-5.1
35,906 downloads 1,103 likes 1103 trending
Open Source 2026-04-03
GLM-5.1 is a major update to ZhipuAI's GLM series using MoE-DSA architecture, achieving top trending scores on HuggingFace with 1,100+ likes and 35K downloads in under two weeks. The MoE-DSA (Dynamic Sparse Attention) architecture represents a notable architectural innovation for efficient large-scale training.
MiniMaxAI/MiniMax-M2.7
18,279 downloads 568 likes 568 trending
Custom License 2026-04-09
MiniMax-M2.7 is a new MoE text-generation model with strong trending traction (568 likes, 18K downloads shortly after release), suggesting a notable open release from MiniMax AI worth evaluating for capability benchmarks.
LilaRest/gemma-4-31B-it-NVFP4-turbo
28,829 downloads 179 likes 179 trending
Open Source 2026-04-07
NVFP4 quantization of Gemma-4 31B using NVIDIA's ModelOpt toolkit, optimized for vLLM inference. High downloads (28.8k) and likes (179) indicate this is a practically useful quantization for production NVIDIA deployments.
LiquidAI/LFM2.5-350M
34,559 downloads 267 likes 35 trending
Custom License 2026-03-31
LFM2.5-350M is a tiny edge-optimized model from Liquid AI supporting 10+ languages. High download count (34.5k) for a sub-400M parameter multilingual model suggests strong interest in efficient edge deployment.
MiniMaxAI/MiniMax-M2.5
780,760 downloads 1,377 likes 36 trending
Custom License 2026-02-12
MiniMax-M2.5 has accumulated 780k downloads and 1.3k likes, making it one of the most-downloaded models in this batch. FP8-supported and Azure-deployable; the scale of adoption warrants attention even without detailed release notes here.
Qwen/Qwen3-Coder-Next
679,412 downloads 1,260 likes 37 trending
Open Source 2026-01-30
Qwen3-Coder-Next is a coding-focused model from Alibaba's Qwen team with 679K downloads and Apache-2.0 license, indicating strong community adoption for code generation tasks.
nvidia/Gemma-4-31B-IT-NVFP4
757,154 downloads 364 likes 158 trending
Custom License 2026-04-02
NVIDIA's NVFP4 quantization of Gemma-4-31B-IT using ModelOpt achieves 757K downloads, demonstrating practical deployment optimization for the new Gemma-4 architecture on NVIDIA hardware.
openai/gpt-oss-120b
3,468,454 downloads 4,679 likes 32 trending
Open Source 2025-08-04
OpenAI's open-source 120B model with 3.4M downloads and Apache-2.0 license, supporting vLLM and MXFP4 quantization β€” a significant open release from OpenAI with an associated arXiv paper.
prism-ml/Bonsai-8B-gguf
76,787 downloads 570 likes 126 trending
Open Source 2026-03-18
Bonsai-8B is a 1-bit quantized GGUF model optimized for on-device inference (CUDA/Metal) with 76K downloads and 570 likes, representing strong community interest in extreme quantization for edge deployment.
unsloth/GLM-5.1-GGUF
28,533 downloads 129 likes 129 trending
Open Source 2026-04-06
Unsloth's GGUF quantization of GLM-5.1, a MoE-DSA architecture model with strong bilingual (EN/ZH) capabilities. High download count (28K+) indicates strong community uptake for local inference.
z-lab/Qwen3.5-27B-DFlash
5,689 downloads 50 likes 40 trending
Open Source 2026-03-14
DFlash applies diffusion-based speculative decoding to Qwen3.5-27B, combining flash decoding with diffusion LM techniques for efficiency gains. References arxiv:2602.06036 suggesting a novel inference acceleration approach worth investigating.
0xSero/gemma-4-21b-a4b-it-REAP
4,443 downloads 81 likes 26 trending
gemma 2026-04-05
Gemma-4 21B variant with REAP (expert pruning) applied to the MoE architecture, reducing active parameters. Interesting application of structured pruning to MoE models for efficiency.
AIDC-AI/Marco-Mini-Instruct
720 downloads 34 likes 34 trending
Open Source 2026-04-08
Marco-Mini is a multilingual MoE model built on Qwen3 via upcycling and on-policy distillation, supporting 30+ languages. The upcycling + distillation combination for multilingual MoE is a noteworthy training approach.
Zigeng/DMax-Coder-16B
637 downloads 28 likes 28 trending
Open Source 2026-04-08
DMax-Coder-16B is a 16B coding model fine-tuned on LLaDA2.0-mini (a diffusion LLM architecture) using code trajectories, with an associated arXiv paper β€” interesting for diffusion-based LLM coding research.
arcee-ai/Trinity-Large-Thinking
15,066 downloads 151 likes 45 trending
Open Source 2026-04-01
Trinity-Large-Thinking from Arcee AI is a multilingual MoE reasoning model with tool-calling and agentic capabilities under Apache-2.0, targeting enterprise deployment with thinking/reasoning modes.
Model Buzz

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week β€” try them live. Flame icon = HuggingFace trending score. Hearts = community likes.

See-through: Layer Decomposition
24yearsold
gradio 72 42
mit
A layer decomposition demo for image see-through effects; limited metadata makes technical depth hard to assess, but the concept has niche computer vision interest.
ACE-Step v1.5
ACE-Step
gradio 501 35
mit
ACE-Step v1.5 is an updated music generation foundation model with 501 likes, representing continued iteration on open-source audio generation capabilities.
Omni Video Factory
FrameAI4687
gradio 862 43
mit
Omni Video Factory is a multi-mode video generation space (text-to-video, image-to-video, video extension) with 862 likes; useful demo but derivative of existing video generation pipelines.
Qwen3-TTS Demo
Qwen
gradio 1,855 40
apache-2.0
Qwen3-TTS demo from Alibaba with 1,855 likes indicates strong community interest in Qwen's text-to-speech capabilities as part of the broader Qwen3 ecosystem.
daVinci-MagiHuman
SII-GAIR
gradio 152 32
daVinci-MagiHuman appears to be a human-focused generative model demo with limited metadata; insufficient information to assess technical novelty.
OmniVoice
k2-fsa
gradio 406 242
apache-2.0
OmniVoice is a high-quality voice cloning TTS system supporting 600+ languages with Apache-2.0 license and strong trending score (242), making it one of the most accessible multilingual TTS tools available.
Unfolding Robotics: Open-Source Shirt Folding from Data to Deployment
lerobot
docker 60 60
LeRobot's open-source shirt-folding robot demo covers the full pipeline from data collection to deployment, representing a concrete end-to-end dexterous manipulation benchmark with reproducible open-source methodology.
Flux2 Klein Face Swap
linoyts
gradio 270 48
A face swap application using Flux.2 Klein 9B LoRA; technically interesting as a LoRA application but raises ethical concerns and is derivative of existing face-swap approaches.
TRELLIS.2
microsoft
gradio 1,394 40
mit
TRELLIS.2 from Microsoft generates high-fidelity 3D assets from images with 1,394 likes; a significant update to one of the leading open 3D generation systems.
Z Image Turbo
mrfakename
gradio 2,866 65
Z Image Turbo is a fast image generation demo with 2,866 likes and strong trending, suggesting a high-quality turbo-distilled image model worth benchmarking for speed/quality tradeoffs.
MTEB Leaderboard
mteb
docker 7,251 28
mit
The MTEB embedding leaderboard is a standard reference resource for evaluating text embedding models; no new developments, trending due to ongoing community use.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio 2,230 79
A demo using Qwen's image model to generate consistent multi-angle views with 3D camera control, achieving 2,230 likes β€” demonstrates practical novel-view synthesis capability from a multimodal LLM.
Wan2.2 I2V LoRA Demo
obsxrver
gradio 46 35
A Wan2.2 image-to-video LoRA demo featuring a 'Blink' effect; niche creative application with limited technical novelty beyond the base model.
VoxCPM Demo
openbmb
gradio 296 128
apache-2.0
VoxCPM2 is a voice-capable multimodal model demo from OpenBMB running on Nano-vLLM, with 296 likes and strong trending β€” notable for combining voice and vision in a compact deployable package.
FireRed Image Edit 1.0 Fast
prithivMLmods
gradio 802 117
apache-2.0
FireRed Image Edit combines FireRed and Qwen-Image-Edit-Rapid for fast instruction-based image editing; 802 likes suggest community utility but the approach is derivative of existing edit pipelines.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-04-13
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Common Corpus is presented as the largest openly licensed dataset for LLM pre-training, directly addressing legal and ethical concerns around copyrighted training data. Critical infrastructure for researchers who need legally defensible pre-training data.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-04-13
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
Large-scale Arabic medical QA benchmark addressing a significant gap in multilingual NLP evaluation, particularly for clinical applications. Useful for researchers working on low-resource medical NLP.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-04-13
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Theoretical analysis showing transformers can implicitly perform unsupervised learning (specifically EM-like inference on Gaussian Mixtures) during in-context learning, providing formal grounding for why ICL works. Advances mechanistic understanding of transformer inference-time learning.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-04-13
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Task Tokens introduces lightweight prompt-based adaptation of transformer behavior foundation models for humanoid control without full retraining. Enables flexible task specification for multi-modal robotic policies.
Reinforcement Learning Hierarchial Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-04-13
Submodular Function Minimization with Dueling Oracle
Theoretical work on submodular function minimization using noisy pairwise comparison oracles, relevant to preference-based optimization. Niche theoretical contribution with limited direct AI engineering impact.
submodular minimization deling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-04-13
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Benchmark evaluating MLLMs on scan-oriented academic paper reasoning (holistic document understanding vs. targeted search), revealing significant gaps in current models' ability to autonomously process research literature.
Multimodal Large Language Models; Academic Paper Reasoning; Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-04-13
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Proposes N-th order recursive consistent velocity field estimation for any-step generation, simplifying consistency model training while maintaining quality across variable step counts. Addresses computational overhead of few-step generative models.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-04-13
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT is a fully offline hierarchical RL framework using masked skill tokens to transfer policies across environments with different dynamics, without requiring online interaction. Addresses a key practical barrier in sim-to-real transfer.
Tranfser Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-04-13
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides high-probability convergence and generalization bounds for SGD with momentum in non-convex settings, filling a theoretical gap. Primarily of interest to optimization theorists rather than practitioners.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-04-13
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
Q-RAG trains retrievers using RL value-based objectives to support multi-step retrieval for complex multi-hop QA, going beyond single-step RAG limitations. Applying RL to embedder training is a novel and practically impactful direction.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-04-13
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Proposes cross-lingual alignment techniques to improve semantic proximity in multilingual information retrieval. Solid but incremental work in a well-studied area.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-04-13
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematic benchmark of GPT-4o, o4-mini, Gemini 1.5/2.0 on standard CV tasks (depth, segmentation, optical flow, etc.), revealing where frontier multimodal models still fall short of specialized vision models. Essential reading for anyone deploying VLMs on vision-heavy tasks.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-04-13
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous neural field representations for variable-cardinality discrete structure prediction (object detection, molecular modeling), enabling invertible encoding/decoding without padding. Novel approach to a fundamental representation challenge.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-04-13
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL uses random path sampling in scattering transforms to reduce computational cost of perceptual loss functions for audio/vision inverse problems. Niche but useful for practitioners working with differentiable signal processing.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Ε½ilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-04-13
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture combining evidential deep learning and extreme value theory for probabilistic rare-event forecasting in multivariate time series. Addresses severe class imbalance and distributional uncertainty in anomaly detection.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-04-13
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators by varying recurrent depth, analogous to adaptive step-size in classical numerical methods. Useful for scientific ML applications requiring flexible compute budgets.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-04-13
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE uses hyperbolic space for multi-modal enzyme classification, capturing hierarchical EC number relationships better than Euclidean methods. Domain-specific but demonstrates value of geometry-aware embeddings for biology.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-04-13
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency and quality issues of purely autoregressive approaches. Relevant to anyone building real-time voice AI systems.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-04-13
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K benchmarks proactive and personalized mobile GUI agents that act without explicit instructions by inferring user intent from context. Pushes mobile agent evaluation beyond reactive instruction-following toward anticipatory behavior.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-04-13
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Provides rigorous spatial kernel analysis of Multi-Resolution Hash Encoding (Instant-NGP), replacing heuristic hyperparameter tuning with principled design. Useful for practitioners working with neural radiance fields and implicit neural representations.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis

Deep Dive

All 365 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact β€” 7+ items are the ones worth your time.

365+ research items ready to explore