Weekly Intelligence

AI Quick Bites

May 18, 2026 · 325 items from 13 sources

Last refreshed: May 18, 2026 at 12:51 UTC
Next refresh: May 25, 2026 at 09:00 UTC
Created by Vatsal Bagri

Highlights

The five most consequential developments in AI this week, selected from 325 items across 13 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
Asteria makes second-order LLM training (SOAP, KL-Shampoo) practically viable at 7B scale by offloading optimizer state and async computation, removing the main systems barrier to moving beyond AdamW.
arxiv 2026-05-18 20 min
03
PCM cuts GRPO-based VLA robot learning wall-clock time by 2.38x with no performance loss by identifying and skipping gradient computation on trajectory phases that don't contribute learning signal.
arxiv 2026-05-18 20 min
04
The Explore-then-Act paradigm with a new verifiable exploration metric directly addresses why LLM agents fail in novel environments, a foundational capability gap for real-world deployment.
arxiv 2026-05-18 20 min
05
Fully Open Meditron is the first clinical LLM with a fully auditable pipeline (data + training + eval), setting a reproducibility standard that matters as medical AI faces regulatory scrutiny.
arxiv 2026-05-18 20 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar. All scores are AI relevance (0–10).

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here. Scores are AI relevance (0–10): 7+ important, 9+ landmark.

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
8/10
Anthropic's Transformer Circuits team introduces Natural Language Autoencoders that produce unsupervised, human-readable explanations of LLM internal activations, a meaningful advance in mechanistic interpretability that requires no manual labeling.
hackernews 2026-05-18 20 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7/10
Trains LLMs to self-report hidden objectives via honesty fine-tuning, improving alignment auditing by making deceptive goal-pursuit detectable through direct interrogation. Directly relevant to AI safety and model auditing pipelines.
conferences 2026-05-18 20 min
Company behind GLiNER model released open source model for running LLM guardrail
7/10
GLiGuard is an open-source small language model for LLM safety moderation that runs 16x faster than comparable guardrail solutions, from the team behind the widely-used GLiNER model. Directly deployable for production LLM safety pipelines with significant inference cost reduction.
hackernews 2026-05-18 7 min
A Geometric Calculator Inside a Neural Network
7/10
Goodfire AI research reveals that neural networks implement geometric computation structures internally, providing mechanistic evidence for how arithmetic is encoded in model weights and advancing interpretability understanding.
hackernews 2026-05-18 10 min
DeepSeek-V4-Flash means LLM steering is interesting again
7/10
Argues that DeepSeek-V4-Flash's open weights reignite practical interest in activation steering / representation engineering, since the model's internals are accessible for mechanistic intervention. Provides hands-on analysis of steering vectors as a viable interpretability and control technique.
hackernews 2026-05-18 8 min
Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems
6/10
Combines formal methods (Linear Temporal Logic) with LLMs to enable runtime monitoring and auditing of AI agent behavior against temporally extended safety constraints. Shows small-model labelers can match frontier LLM judges and that predictive monitors significantly reduce violation rates, which is practically useful for AI governance and compliance pipelines.
arxiv 2026-05-18 20 min
Anthropic's Mythos helped find macOS bugs that bypass Apple security
6/10
Anthropic's Mythos AI system was used to discover macOS vulnerabilities that bypass Apple security controls, demonstrating practical AI-assisted vulnerability research. Noteworthy as a concrete example of LLMs finding real OS-level security bugs.
hackernews 2026-05-18 5 min
MemPrivacy-1.7B-SFT
6/10
MemPrivacy-1.7B is a small SFT model for privacy-aware agent memory management, detecting and filtering sensitive information in personalized memory systems. It addresses a real gap in LLM agent deployments.
huggingface_models 2026-05-18 10 min
The Agent Security Stack: Transport, Identity, Policy, Runtime
6/10
Proposes a structured security framework for AI agents covering transport, identity, policy, and runtime layers; useful architectural thinking for teams deploying production agents.
hackernews 2026-05-18 7 min
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
5/10
Improves cross-lingual information retrieval through better multilingual embedding alignment, addressing semantic proximity across languages. Solid but incremental work in a well-studied area.
conferences 2026-05-18 15 min
Chrome's Silent Gemini Nano Download Has a Consent Problem
5/10
Documents Chrome silently downloading Gemini Nano without user consent, raising privacy and data governance concerns about on-device AI model deployment.
hackernews 2026-05-18 4 min
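As a rough illustration of the runtime monitoring described in the Formal Methods Meet LLMs item above, the sketch below checks one temporally extended constraint over a stream of labeled agent events: every "delete" must be preceded by its own "confirm". The event names, the class, and the rule are all hypothetical, and this is a hand-written state machine in the spirit of an LTL safety property, not the paper's method or a general LTL engine.

```python
from dataclasses import dataclass, field


@dataclass
class ConfirmBeforeDeleteMonitor:
    """Hypothetical monitor: each 'delete' needs a fresh prior 'confirm'."""
    confirmed: bool = False
    violations: list = field(default_factory=list)

    def observe(self, step: int, event: str) -> bool:
        """Feed one labeled event; return False if this event violates the rule."""
        if event == "confirm":
            self.confirmed = True
        elif event == "delete":
            if not self.confirmed:
                self.violations.append(step)  # record where the rule broke
                return False
            self.confirmed = False  # one confirmation covers one deletion only
        return True


def check_trace(events):
    """Run the monitor over a whole trace and return the violating step indices."""
    monitor = ConfirmBeforeDeleteMonitor()
    for step, event in enumerate(events):
        monitor.observe(step, event)
    return monitor.violations
```

In a live deployment the return value of `observe` could gate the next agent action rather than being collected after the fact; the event labels would come from a labeler model, which this sketch leaves out.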

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors
#1
12.0
#2
11.0
#3
r3gm
2 items · avg 4.5/10
9.0
#4
prithivMLmods
2 items · avg 4.0/10
8.0
#5
TencentARC
1 item · avg 7.0/10
7.0
#6
7.0
Top Organizations
#1
anthropics
6 items · avg 5.7/10
34.0
#2
bytedance
4 items · avg 7.0/10
28.0
#3
yichuan-w
2 items · avg 8.0/10
16.0
#4
ruvnet
3 items · avg 4.3/10
13.0
#5
AIDC-AI
2 items · avg 6.0/10
12.0
#6
ArthurBrussee
2 items · avg 6.0/10
12.0
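The cumulative score in each row appears to be the item count multiplied by the average relevance. That reading is inferred from the figures shown, not from a documented formula, and the displayed averages are themselves rounded, so small gaps are expected. A short check using rows from the lists above (function names are hypothetical):

```python
def cumulative_score(items: int, avg: float) -> float:
    """Approximate the cumulative score as item count times average relevance."""
    return items * avg


# (name, items, displayed average, displayed cumulative score)
rows = [
    ("r3gm", 2, 4.5, 9.0),
    ("bytedance", 4, 7.0, 28.0),
    ("yichuan-w", 2, 8.0, 16.0),
    ("anthropics", 6, 5.7, 34.0),  # the displayed average is rounded
]


def max_gap(rows) -> float:
    """Largest difference between the recomputed and the displayed score."""
    return max(abs(cumulative_score(n, avg) - shown) for _, n, avg, shown in rows)
```

Under this reading, a contributor with many mid-scoring items can outrank one with a single high-scoring item, which matches the ordering shown.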

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Memory Evolution Kit
A drop-in library that implements population-based natural language memory evolution for LLM agents, inspired by FORGE's gradient-free approach. Instead of expensive fine-tuning, agents share and propagate their best-performing memory artifacts across a population, continuously improving task performance. This is especially valuable for teams deploying weaker/cheaper models who need performance gains without retraining costs.
Customer support agent fleets that learn from top-performing conversation patterns; coding assistants that evolve debugging and code-review heuristics over time; cyber-defense and security agents operating in adversarial environments; enterprise workflow automation where agents handle repetitive multi-step tasks
https://arxiv.org/abs/2605.16233v1 https://arxiv.org/abs/2605.16205v1
AI Compliance Watchdog
A runtime monitoring SaaS that uses lightweight LTL-based formal rules combined with small fine-tuned labeler models to audit and intervene on LLM agent behavior in production. Inspired by research showing small models can match frontier LLM judges for compliance checking at a fraction of the cost, this tool lets enterprises define temporal safety constraints and get real-time violation alerts and automatic intervention hooks. It directly addresses the growing regulatory pressure on AI deployments in finance, healthcare, and legal sectors.
Financial services AI agents subject to regulatory compliance (MiFID, SOX); healthcare LLM assistants requiring HIPAA-aligned behavior monitoring; enterprise AI governance and audit trail generation; agentic coding pipelines where unsafe code generation must be intercepted
https://arxiv.org/abs/2605.16198v1 https://claude.com/blog/claude-platform-...
Deep Research Copilot
A self-hosted or API-accessible deep research agent built on the Searcher-Navigator architecture from Argus, designed to parallelize web evidence gathering while keeping synthesis context lean and cost-efficient. Unlike monolithic research agents that bloat context windows, this tool spawns multiple specialized searchers and feeds only distilled evidence to a navigator that produces structured reports. It targets knowledge workers, analysts, and developers who need reliable, citation-backed research at scale without paying frontier model prices for every query.
Competitive intelligence and market research for product teams; legal and regulatory research requiring multi-source evidence assembly; scientific literature review and hypothesis generation for researchers; investor due diligence and financial analysis workflows
https://arxiv.org/abs/2605.16217v1 https://arxiv.org/abs/2605.16143v1
AI Tutor Feedback Auditor
A benchmarking and correction layer for LLM-based tutoring systems that detects and fixes the systematic over-rejection of valid reasoning and over-validation of incorrect answers identified in recent research. The tool wraps existing tutoring LLMs with a calibration module trained on human-verified feedback pairs, surfacing confidence scores and flagging likely feedback errors before they reach students. EdTech companies and learning platforms can integrate this to measurably improve pedagogical quality without replacing their underlying models.
K-12 and university online learning platforms using AI grading; STEM tutoring apps covering logic, math, and programming; corporate training and certification platforms with AI assessors; language learning apps where nuanced feedback quality is critical
https://arxiv.org/abs/2605.16207v1
Codebase Context Compressor
A developer tool that sits between large codebases and AI coding assistants, automatically compressing and prioritizing context using token-reduction techniques to cut LLM input costs by 50-70%. Inspired by both the community interest in token reduction tools and Anthropic's own guidance on managing large codebase context with Claude Code, this product generates smart CLAUDE.md-style summaries, dependency graphs, and relevance-ranked file chunks tailored to the current task. It integrates as a VS Code extension or CLI pre-processor, making AI-assisted development on enterprise-scale repos economically viable.
Enterprise engineering teams using Claude Code or Copilot on monorepos; open source maintainers reviewing large PRs with AI assistance; DevOps and platform engineers navigating complex infrastructure-as-code repos; AI coding agent pipelines where context window costs compound across many calls
https://adola.app/ https://claude.com/blog/how-claude-code-...
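One way to picture the relevance-ranked file chunks described for the Codebase Context Compressor: score each chunk against the current task, then greedily pack the highest-scoring chunks into a token budget. The sketch below is illustrative only. The keyword-overlap scorer, the whitespace token count, and every name in it are assumptions, not how any product or tool mentioned here works.

```python
def score(chunk: str, task: str) -> int:
    """Crude relevance: number of distinct task words that appear in the chunk."""
    task_words = set(task.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(task_words & chunk_words)


def pack_context(chunks: dict, task: str, budget: int) -> list:
    """Pick chunk names by descending relevance until the token budget is spent."""
    ranked = sorted(chunks, key=lambda name: score(chunks[name], task), reverse=True)
    chosen, used = [], 0
    for name in ranked:
        cost = len(chunks[name].split())  # whitespace tokens as a stand-in
        # Skip chunks with no overlap, and chunks that would exceed the budget.
        if score(chunks[name], task) > 0 and used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


# Hypothetical repository chunks, keyed by file name.
example_chunks = {
    "auth.py": "login token refresh handler",
    "ui.py": "button color theme",
    "db.py": "token store refresh query cache",
}
```

Calling `pack_context(example_chunks, "fix token refresh", budget=6)` keeps only the chunk that fits; raising the budget admits the second relevant file. A real tool would likely swap in an embedding-based scorer and a proper tokenizer.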

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

#1
SocLeads 3.0
Scrape emails from socials and maps by location
Email Social Media Marketing
210
47
https://www.producthunt.com/r/ALZHO...
#2
LobeHub
Your Chief Agent Operator for multi-agent work
Productivity Artificial Intelligence
166
39
https://www.producthunt.com/r/GDRCL...
#3
ReactVision Studio
Build AR/VR Apps in React Native + ship directly to devices
Virtual Reality Developer Tools Augmented Reality
132
8
https://www.producthunt.com/r/NULQJ...
#4
Searchad.ai
Run Apple Search Ads by Chatting with AI
iOS Marketing Apple
110
5
https://www.producthunt.com/r/5QMQT...
#5
Shadow
AI computer screen and voice control with custom automation
Productivity Writing Meetings
109
9
https://www.producthunt.com/r/RTMG3...
#6
Origio
A personalized way to discover where to live
Travel Remote Work Data & Analytics
98
4
https://www.producthunt.com/r/4KAC3...
#7
M1 by Montage
Agentic UI that scales on demand
User Experience Developer Tools Artificial Intelligence
95
8
https://www.producthunt.com/r/VUCSA...
#8
Triggered Agents by Adaptive
AI agents that run automatically on business events
Developer Tools Artificial Intelligence Marketing automation
93
3
https://www.producthunt.com/r/7SJYK...
#9
LandingHero AI
24/7 Salesperson on Your Website
Sales
91
2
https://www.producthunt.com/r/5WBKE...
#10
Krea 2
An image model built for style control and moodboards
Photography Artificial Intelligence Graphics & Design
88
2
https://www.producthunt.com/r/5K3FW...

Trending Repos

Repositories gaining serious momentum this week, sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1
GH Trending
yichuan-w/LEANN
python 11,423 1,020 450 stars this week
MLSys 2026 paper introducing LEANN, a vector index achieving 97% storage reduction for on-device RAG with maintained accuracy. It enables fully private RAG on personal devices without cloud dependency.
Build idea
Build a privacy-first personal AI assistant app (for lawyers, doctors, or executives) that runs fully on-device RAG over sensitive documents using LEANN, marketed as a zero-cloud, zero-data-leak alternative to cloud-based AI search tools.
2
GH Trending
anthropics/skills
python 136,764 16,143 4,823 stars this week
Anthropic's public repository for Agent Skills (reusable capability modules for Claude agents), with massive traction (136K stars, ~5K this week). Represents Anthropic's official framework for composable agent capabilities.
Build idea
Launch a marketplace where developers publish, sell, and monetize Claude Agent Skills as plug-and-play capability modules, similar to an app store for composable AI agent behaviors targeting enterprise automation buyers.
3
GH Trending
bytedance/UI-TARS
python 10,631 795 268 stars this week
ByteDance's native GUI interaction agent framework with 10K+ stars, enabling automated interaction with desktop/web UIs without accessibility APIs. Strong research and engineering foundation for computer-use agents.
Build idea
Build a no-code QA automation SaaS that uses UI-TARS to visually interact with and test web and desktop applications without requiring accessibility APIs or custom test scripts, targeting SMBs priced out of enterprise QA tools.
4
GH Trending
bytedance/UI-TARS-desktop
typescript 34,562 3,465 2,563 stars this week
Desktop application stack for UI-TARS, ByteDance's multimodal GUI agent, connecting frontier models with agent infrastructure; 34K stars with strong weekly growth. Practical open-source computer-use agent stack.
Build idea
Offer a managed desktop automation service for repetitive back-office workflows (data entry, report generation, legacy software interaction) where businesses pay per task completed by a UI-TARS-powered computer-use agent.
5
GH Trending
AIDC-AI/Pixelle-Video
python 17,970 2,572 3,327 stars this week
Pixelle-Video is a fully automated AI short video generation engine with 17K+ stars and strong weekly growth, targeting end-to-end video content creation pipelines. Useful for developers building automated media workflows.
Build idea
Build a white-label short-form video content factory SaaS for e-commerce brands and agencies, using Pixelle-Video to auto-generate product showcase videos from a URL or product feed at scale.
6
GH Trending
ArthurBrussee/brush
rust 4,551 249 486 stars this week
Brush is a Rust-based 3D reconstruction library (likely Gaussian Splatting) with 4.5K stars, offering cross-platform 3D scene reconstruction. Rust implementation suggests performance focus and broad deployment potential.
Build idea
Launch a SaaS platform for real estate agents and architects that converts smartphone video walkthroughs into photorealistic 3D Gaussian Splat scenes embeddable on property listing pages, powered by Brush's cross-platform reconstruction engine.
7
GH Trending
Hmbown/DeepSeek-TUI
rust 31,750 2,684 7,444 stars this week
DeepSeek-TUI is a terminal-based coding agent for DeepSeek models built in Rust, gaining 7.4K stars this week alone. Strong traction suggests real developer demand for lightweight, local DeepSeek coding workflows.
Build idea
Offer a subscription-based, privacy-first coding assistant CLI tool built on DeepSeek-TUI targeting developers in regulated industries who need a fully local, auditable AI coding workflow with no cloud telemetry.
8
GH Trending
MemoriLabs/Memori
python 14,583 2,173 323 stars this week
Memori is an LLM-agnostic memory infrastructure layer that converts agent execution and conversations into structured persistent state for production systems. Addresses a real gap in production agent deployments.
Build idea
Sell a managed memory-as-a-service layer for enterprise AI agent deployments, providing persistent, structured agent state with compliance logging, audit trails, and multi-agent memory sharing out of the box.
9
GH Trending
anthropics/financial-services
python 24,992 3,457 5,977 stars this week
Anthropic's official reference repo for Claude applied to financial services use cases, gaining ~6K stars this week. Part of a pattern of Anthropic releasing vertical-specific implementation guides.
Build idea
Build a vertical SaaS for wealth management firms that uses Claude's financial services patterns to automate client portfolio summaries, regulatory report drafting, and compliance Q&A, pre-configured for SEC and FINRA requirements.
10
GH Trending
millionco/react-doctor
typescript 10,073 322 2,430 stars this week
Static analysis tool that catches bad React patterns generated by AI coding agents, gaining 2,400+ stars this week. Addresses a real pain point as agentic code generation becomes mainstream.
Build idea
Offer a CI/CD plugin and dashboard service that automatically scans AI-generated React pull requests for anti-patterns using React Doctor, giving engineering teams a quality gate specifically tuned for agentic code output.

Trending Developers

Developers gaining traction on GitHub this week, shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1
Soju06
@Soju06
Soju06/codex-lb
Developer behind codex-lb, a multi-account load balancer and proxy for Codex/ChatGPT with usage tracking and OpenCode-compatible endpoints. Useful infrastructure for teams managing LLM API costs.
2
mumu
@ZhuLinsen
ZhuLinsen/daily_stock_analysis
Developer profile for an LLM-powered stock analysis system for A/H/US markets; the underlying project has some interest but the profile itself is not substantive.
3
Michael Ramos
@backnotprop
backnotprop/plannotator
Developer profile for Plannotator, a tool to visually annotate and review coding agent plans and diffs. Interesting niche but minimal public traction.
4
Garry Tan
@garrytan
garrytan/gstack
YC president Garry Tan's Claude Code setup with 23 opinionated agent tools (CEO, designer, eng manager roles). Interesting as a practitioner workflow but derivative.
5
Matt Van Horn
@mvanhorn
mvanhorn/last30days-skill
GitHub profile featuring an AI agent skill that researches topics across Reddit, X, YouTube, and HN then synthesizes summaries. Minimal technical detail available.
6
rUv
@ruvnet
ruvnet/ruflo
GitHub profile for a developer building Claude-based multi-agent orchestration tools. Limited technical detail from profile alone.
7
Trevin Chow
@tmchow
tmchow/hzl
GitHub profile featuring a Kanban task manager for OpenClaw agents; minimal technical detail available.
8
朱昆鹏
@zhukunpenglinyutong
zhukunpenglinyutong/desktop-cc-gui
A VibeCoding GUI client for developers; minimal technical detail available, low AI research relevance.
9
郑诚 (Cheng Zheng)
@1c7
1c7/chinese-independent-developer
GitHub profile of a Chinese independent developer; not directly AI-relevant.
10
James Newton-King
@JamesNK
JamesNK/Newtonsoft.Json
GitHub profile of the Json.NET creator; not AI-relevant.
11
Indra Aryadi
@RajaSunrise
RajaSunrise/RajaSunrise
Generic trending developer profile with no notable AI-relevant projects.
12
Alexey Milovidov
@alexey-milovidov
alexey-milovidov/go-clickhouse
GitHub profile for the ClickHouse creator; not AI-relevant content.
13
Andre Rinas
@andreknieriem
andreknieriem/headunit-revived
GitHub profile for an Android Auto headunit app developer; not AI-relevant.
14
Stanislas
@angristan
angristan/openvpn-install
GitHub profile for OpenVPN setup scripts; not AI-relevant.
15
Daniel Öster
@dalathegreat
dalathegreat/Battery-Emulator
Developer profile for EV battery emulator software; not AI-relevant.
16
Marco Cadetg
@domcyrus
domcyrus/rustnet
Developer profile for the Rustnet network monitoring tool; not AI-relevant.
17
Leigh
@leighmcculloch
leighmcculloch/looks.wtf
GitHub profile for a trending developer with no AI-relevant content, just a collection of ASCII faces.
18
Liran Tal
@lirantal
lirantal/npm-security-best-practices
Trending GitHub developer profile focused on npm security best practices; not AI-related.
19
Mohsen Azimi
@mohsen1
mohsen1/tsz
Trending GitHub developer profile for a TypeScript performance tooling author; not AI-relevant.
20
Nat
@nazt
nazt/kien-thai
Trending GitHub developer profile with a Thai language teaching project; not AI-relevant.
21
R.I.Pienaar
@ripienaar
ripienaar/free-for-dev
Trending GitHub developer profile known for the free-for-dev SaaS list; not AI-relevant.
22
robobun
@robobun
robobun/robobun
Trending GitHub developer profile with no substantive AI content.
23
Stephen Berry
@stephenberry
stephenberry/glaze
Trending GitHub developer profile for a C++ serialization library author; not AI-relevant.
24
Zoltan Kochan
@zkochan
zkochan/git-wt
Trending developer profile with no clear AI relevance; git-wt is a git worktree tool.

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard: Top 15
#   Model                            Org        Type    Elo   Votes
1   claude-opus-4-6-thinking         Anthropic  Closed  1502  25,736
2   claude-opus-4-7-thinking         Anthropic  Closed  1500  11,197
3   claude-opus-4-6                  Anthropic  Closed  1498  27,338
4   claude-opus-4-7                  Anthropic  Closed  1492  11,792
5   muse-spark                       Meta       Closed  1490  10,785
6   gemini-3.1-pro-preview           Google     Closed  1489  31,925
7   gemini-3-pro                     Google     Closed  1486  41,331
8   gpt-5.5-high                     OpenAI     Closed  1484  8,595
9   gpt-5.4-high                     OpenAI     Closed  1479  19,345
10  grok-4.20-beta1                  xAI        Closed  1479  21,190
11  gpt-5.2-chat-latest-20260210     OpenAI     Closed  1477  26,000
12  gpt-5.5                          OpenAI     Closed  1476  8,686
13  grok-4.20-beta-0309-reasoning    xAI        Closed  1476  19,765
14  grok-4.20-multi-agent-beta-0309  xAI        Closed  1475  19,883
15  gemini-3-flash                   Google     Closed  1473  30,750
New & Trending Models
deepseek-ai/DeepSeek-V4-Pro
3,435,748 downloads 4,026 likes 152 trending
Open Source 2026-04-22
DeepSeek-V4-Pro is DeepSeek's flagship open-weight model with 3.4M downloads and 4k likes, one of the most significant open-weight model releases this cycle, with FP8 support and strong benchmark results.
deepseek-ai/DeepSeek-V4-Flash
1,904,105 downloads 1,140 likes 96 trending
Open Source 2026-04-22
DeepSeek-V4-Flash is DeepSeek's latest fast inference model with 1.9M downloads and FP8 support, a major open-weight release targeting high-throughput deployment scenarios.
JonasGeiping/stream-qwen3.5-27b
652 downloads 15 likes 15 trending
Open Source 2026-05-12
Stream-Qwen3.5-27B introduces a multi-stream parallel cognition architecture on top of Qwen3.5, enabling monitorable parallel reasoning threads; a novel architectural experiment in interpretable/parallel LLM cognition worth watching.
Qwen/WebWorld-8B
1,376 downloads 44 likes 25 trending
Open Source 2026-02-13
WebWorld-8B from Qwen is a web agent world model fine-tuned on Qwen3-8B for long-horizon browser tasks, trained on synthetic trajectories with accessibility tree/HTML understanding; a strong open-source web agent model with a dedicated dataset.
ByteDance-Seed/Cola-DLM
19 likes 19 trending
Open Source 2026-05-15
ByteDance's Cola-DLM is a diffusion language model using latent diffusion and flow-matching for text generation. It represents continued exploration of non-autoregressive LLM paradigms, with an associated arxiv paper.
IAAR-Shanghai/MemPrivacy-1.7B-SFT
763 downloads 25 likes 23 trending
cc-by-nc-nd-4.0 2026-05-09
MemPrivacy-1.7B is a small SFT model for privacy-aware agent memory management, detecting and filtering sensitive information in personalized memory systems. It addresses a real gap in LLM agent deployments.
inclusionAI/Ring-2.6-1T
2,406 downloads 74 likes 73 trending
Open Source 2026-05-14
Ring-2.6-1T is a 1-trillion parameter hybrid architecture model from inclusionAI with compressed-tensor support; notable scale for an open-weight release, though limited documentation is available.
nvidia/Kimi-K2.6-NVFP4
36,700 downloads 20 likes 20 trending
Custom License 2026-05-11
NVIDIA's FP4 quantization of Kimi-K2.6 using ModelOpt. It demonstrates NVIDIA's push for FP4 inference efficiency on their hardware stack, with 36k downloads indicating practical adoption.
z-lab/Qwen3.6-27B-DFlash
62,256 downloads 305 likes 25 trending
Open Source 2026-04-23
Qwen3.6-27B fine-tuned with DFlash (diffusion-based flash decoding) for speculative decoding efficiency, referencing arxiv:2602.06036; 62K downloads suggests real adoption of this inference optimization approach.
zai-org/GLM-5.1
248,947 downloads 1,662 likes 34 trending
Open Source 2026-04-03
GLM-5.1 from Zhipu AI is a MoE-based bilingual (EN/ZH) model with 248K downloads and 1662 likes, referencing arxiv:2602.15763; a significant Chinese frontier model release worth tracking for multilingual benchmarks.
zed-industries/zeta-2.1
1,819 downloads 40 likes 28 trending
Open Source 2026-05-07
Zed's Zeta-2.1 is a fine-tuned Seed-Coder-8B model specialized for next-edit prediction and edit suggestion within the Zed editor; represents a purpose-built code editing model from a developer tools company with real deployment context.
FINAL-Bench/Darwin-28B-REASON
139 downloads 22 likes 19 trending
Open Source 2026-05-17
Darwin-28B-REASON is a 28B reasoning model fine-tuned via reasoning trace distillation on Qwen3.5, targeting GPQA-level benchmarks with chain-of-thought and test-time compute scaling.
FrontiersMind/Nandi-Mini-600M-Early-Checkpoint
18,203 downloads 90 likes 67 trending
Open Source 2026-05-13
Nandi-Mini-600M is a small multilingual model supporting 10+ Indian languages (Hindi, Tamil, Telugu, etc.); notable for Indic language coverage at a compact size, with 18k downloads suggesting real community interest.
XiaomiMiMo/MiMo-V2.5-Pro
60,415 downloads 527 likes 20 trending
Open Source 2026-04-27
Xiaomi's MiMo-V2.5-Pro is a long-context, agent-capable model with FP8 support and 60k downloads; a competitive open-source offering, but with limited public technical documentation.
antirez/deepseek-v4-gguf
295,917 downloads 143 likes 73 trending
Open Source 2026-04-26
GGUF quantizations of DeepSeek-V4-Flash by antirez (Redis creator), optimized for Apple Silicon with 2-bit and 4-bit variants; notable for the author's profile and 295k downloads indicating strong local inference demand.
Model Buzz

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week. Try them live. Flame icon = HuggingFace trending score. Hearts = community likes.

The ultimate guide to RL environments: building and scaling them in the LLM era
AdithyaSK
docker 161 35
A comprehensive guide to building and scaling RL environments for LLM training; a practical resource covering environment design patterns for the RLHF/RLAIF era, useful for practitioners implementing reward-based LLM training.
Omni Video Factory
FrameAI4687
gradio 1,085 34
mit
Omni Video Factory is a Gradio demo supporting text-to-video, image-to-video, and video extension; a useful demo space but not a novel research contribution.
HiDream O1 Image
HiDream-ai
gradio 99 56
mit
HiDream O1 Image is a reasoning-augmented image generation demo that applies o1-style thinking to image synthesis, an emerging direction worth monitoring.
Qwen Image Edit 2509 LoRAs Fast
Onise
gradio 35 29
apache-2.0
A collection of Qwen-based image editing LoRAs with a fast inference demo; useful for practitioners doing image editing fine-tuning but incremental.
DramaBox
ResembleAI
gradio 58 58
other
DramaBox by ResembleAI is an expressive TTS system with voice cloning capabilities. It demonstrates high-quality emotional speech synthesis with cloning, relevant for audio AI practitioners.
Supertonic 3 (TTS)
Supertone
static 127 104
openrail
Supertonic-3 is an on-device, multilingual TTS system claiming lightning-fast inference; the on-device angle and multilingual support make this notable for edge deployment of speech synthesis.
Pixal3D
TencentARC
gradio 160 147
Pixal3D from TencentARC achieves high-fidelity pixel-aligned image-to-3D generation; pixel alignment is a key quality improvement over prior methods, and Tencent's backing suggests strong engineering behind the demo.
Qwen Image Edit + Loras built-in
akhaliq
gradio 53 40
apache-2.0
Demo space combining Qwen image editing with built-in LoRA support for stylized image editing. Useful for exploring LoRA-augmented instruction-based image editing but incremental.
Wan2.2 14B Fast Preview
cbensimon
gradio 60 56
Fast preview of Wan2.2 14B image-to-video model using FP8 quantization and AOT compilation for accelerated inference. Signals continued progress on efficient video generation at scale.
OmniVoice
k2-fsa
gradio 870 43
apache-2.0
High-quality voice cloning TTS system supporting 600+ languages with 870 likes, suggesting broad multilingual coverage is a differentiator. Noteworthy for its language breadth but limited technical detail available.
Wan2.2 14B Preview
kulkas2pintu
gradio 155 32
Another community-hosted Wan2.2 14B image-to-video demo; duplicate of other Wan2.2 spaces with no additional novelty.
Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_P
mikeee
docker 268 105
mit
Uncensored Gemma-4 variant running in a repurposed Qwen chat space; primarily a jailbreak-adjacent demo with minimal technical contribution.
FireRed Image Edit 1.0 Fast
prithivMLmods
gradio 1,279 68
apache-2.0
Fast image editing demo combining FireRed and Qwen-Image-Edit-Rapid via Transformers, with 1279 likes indicating community traction. Incremental tooling on top of existing models.
Qwen-Image-Edit-2511-LoRAs-Fast
prithivMLmods
gradio 1,444 54
apache-2.0
Collection of Qwen Image Edit LoRAs with 1444 likes; demonstrates the ecosystem forming around Qwen's image editing capabilities but is derivative of the base model.
Wan2.2 14B Preview
r3gm
gradio 2,612 47
Wan2.2 14B image-to-video preview with 2612 likes; one of several community spaces showcasing this model's FP8 inference capabilities.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-05-18
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
ICLR 2026 paper introducing Common Corpus, the largest openly licensed dataset for LLM pre-training designed to avoid copyright issues; directly addresses the legal sustainability of open LLM training data.
dataset pre-training large language models open data open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-05-18
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
Large-scale Arabic medical QA dataset and benchmark for ICLR 2026, addressing a significant gap in multilingual medical NLP. Useful for researchers working on low-resource language models in clinical domains.
Dataset Benchmark Large Language Models Arabic Natural Language Processing Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-05-18
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Theoretical study showing transformers can implicitly perform unsupervised learning (e.g., Gaussian mixture fitting) during inference, advancing understanding of in-context learning mechanisms. Provides formal grounding for why ICL works beyond simple pattern matching.
In-context learning Gaussian Mixture Models Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-05-18
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Introduces Task Tokens for adapting transformer-based behavior foundation models in humanoid control without full retraining, enabling flexible multi-task conditioning. Relevant to the growing field of generalist robot policies.
Reinforcement Learning Hierarchical Reinforcement Learning Behavior Foundation Models Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-05-18
Submodular Function Minimization with Dueling Oracle
Theoretical work on submodular function minimization using noisy pairwise comparison oracles, with connections to preference-based optimization. Niche theory paper with limited direct ML practitioner impact.
submodular minimization dueling oracle preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-05-18
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Benchmarks MLLMs on scan-oriented academic paper reasoning, a harder task than retrieval that requires models to synthesize information across full documents. Highlights a key gap between current MLLM capabilities and autonomous research assistance.
Multimodal Large Language Models Academic Paper Reasoning Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-05-18
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Proposes N-th order recursive consistent velocity field estimation for any-step generation, simplifying few-step generative model training while reducing computational overhead versus consistency models. Incremental but practically useful improvement.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-05-18
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT is a fully offline hierarchical RL framework using masked skill tokens to transfer policies across environments with different dynamics. Addresses a practical sim-to-real gap problem without requiring online interaction.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-05-18
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides high-probability convergence and generalization bounds for SGD with momentum in non-convex settings, filling a theoretical gap. Primarily of interest to optimization theorists rather than practitioners.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-05-18
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
Q-RAG trains retrieval embedders using RL value functions to support multi-step retrieval for complex multi-hop QA, going beyond single-step RAG limitations. Applies RL-based training signal to the retrieval component itself, which is a novel angle.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-05-18
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Improves cross-lingual information retrieval through better multilingual embedding alignment, addressing semantic proximity across languages. Solid but incremental work in a well-studied area.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-05-18
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematic benchmark of GPT-4o, o4-mini, Gemini 1.5/2.0 Pro on standard CV tasks (depth, segmentation, optical flow, etc.), revealing where frontier multimodal models still fall short versus specialized vision models. Important calibration for teams deciding when to use VLMs vs. task-specific models.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-05-18
CORDS - Continuous Representations of Discrete Structures
CORDS introduces continuous neural field representations for variable-cardinality set prediction (object detection, molecular modeling), enabling diffusion/flow matching over discrete structures. Novel representation approach with broad applicability.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-05-18
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL uses random path sampling in scattering transforms to reduce computational cost while preserving perceptual gradient quality for audio/vision inverse problems. Niche but useful for audio ML practitioners.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-05-18
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture for probabilistic rare-event forecasting in multivariate time series, combining evidential deep learning with extreme value theory to handle severe class imbalance. Relevant for anomaly detection in production systems.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-05-18
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Recurrent-Depth Simulator enables test-time accuracy-cost trade-offs in neural simulators by varying recurrent depth, analogous to adaptive step-size in classical numerical methods. Useful for scientific ML applications requiring flexible compute budgets.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-05-18
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE applies hyperbolic space multi-modal learning to enzyme classification, capturing hierarchical EC number relationships better than Euclidean methods. Domain-specific but methodologically interesting for bio-ML researchers.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
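For readers new to hyperbolic embeddings, here is a minimal sketch of the distance function on the Poincaré ball, the model the paper's name suggests. The digest does not state which hyperbolic model or curvature PoinnCARE uses, so treat the unit ball with curvature -1 and the function name `poincare_distance` as illustrative assumptions, not the authors' implementation.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball.

    Assumes curvature -1; both points must have Euclidean norm < 1.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


if __name__ == "__main__":
    origin = (0.0, 0.0)
    # The same Euclidean gap of 0.1 costs more distance near the boundary.
    print(poincare_distance(origin, (0.1, 0.0)))
    print(poincare_distance((0.8, 0.0), (0.9, 0.0)))
```

Distances grow rapidly toward the boundary of the ball, which is why tree-like label hierarchies such as EC numbers tend to embed with less distortion than in Euclidean space.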
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-05-18
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Proposes non-autoregressive joint training for audio-language models in speech-to-speech systems, addressing latency and quality issues in interleaved audio-text generation. Relevant to real-time voice AI applications.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-05-18
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K benchmarks proactive and personalized mobile GUI agents that act without explicit instructions by inferring user intent from context. Pushes mobile agent evaluation beyond reactive instruction-following toward anticipatory behavior.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-05-18
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Provides rigorous spatial kernel analysis of Multi-Resolution Hash Encoding (Instant-NGP), replacing heuristic hyperparameter tuning with principled design. Useful for practitioners building NeRF/neural field pipelines.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis
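As background for the hash-encoding item above, here is a minimal sketch of the two ingredients of Multi-Resolution Hash Encoding as described in the Instant-NGP paper: geometrically spaced grid resolutions and an XOR-of-primes spatial hash. It illustrates the encoding being analyzed, not the ICLR paper's kernel analysis. The function names and the example parameter values are illustrative assumptions, and Python's unbounded integers match the original 32-bit arithmetic only when the table size is a power of two no larger than 2^32.

```python
import math

# Per-dimension multipliers for the spatial hash, as given in the Instant-NGP paper.
PRIMES = (1, 2654435761, 805459861)


def level_resolutions(n_min, n_max, num_levels):
    """Grid resolution per level, growing geometrically from n_min toward n_max."""
    if num_levels == 1:
        return [n_min]
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    return [math.floor(n_min * growth ** level) for level in range(num_levels)]


def hash_index(coords, table_size):
    """Map non-negative integer grid coordinates (up to 3-D) to a table slot."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p  # XOR the per-dimension products together
    return h % table_size


if __name__ == "__main__":
    # Example values only; real pipelines tune these per scene.
    print(level_resolutions(n_min=16, n_max=512, num_levels=6))
    print(hash_index((3, 7, 11), table_size=2 ** 14))
```

The resolution range, number of levels, and table size are the hyperparameters usually set by trial and error, which is the tuning the paper aims to replace with principled design.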

Deep Dive

All 325 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact; 7+ items are the ones worth your time.
