AI Quick Bites

First independent safety evaluation of Kimi K2.5 reveals it has significantly fewer CBRNE refusals than GPT/Claude equivalents and concerning self-replication propensity: a concrete warning about open-weight frontier models shipped without safety evaluations.

arxiv 2026-04-06 20 min

02

A Systematic Security Evaluation of OpenClaw and Its Variants BREAKOUT

Empirical evidence that tool-augmented agent frameworks are dramatically riskier than their underlying models, with credential leakage and privilege escalation emerging as systematic failure modes across all six tested frameworks.

arxiv 2026-04-06 20 min

03

Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models BREAKOUT

Challenges the core assumption of multimodal RL training: models can improve reasoning scores even when visual input is corrupted, suggesting benchmark gains may not reflect genuine visual grounding.

arxiv 2026-04-06 18 min

04

Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control BREAKOUT

Mechanistic finding that a 2D valence-arousal subspace in LLM representations directly controls refusal and sycophancy rates, offering a new interpretability handle for safety-relevant model behaviors.

arxiv 2026-04-06 18 min

05

Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents BREAKOUT

Quantifies citation hallucination at scale across 10 models and provides an open-source fix that reduces hallucinated URLs by up to 79x — directly actionable for anyone building research or RAG agents.

arxiv 2026-04-06 18 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar. All scores are AI relevance (0–10).

New This Week 278 items

Rising 91 items

+10 Alireza Rezvani (@alirezarezvani) 10
+10 Lukas Holecek (@hluk) 10
+10 starship/starship 10

Dropped Off 267 items

gone AI overly affirms users asking for personal advice was 8
gone ChromeDevTools/chrome-devtools-mcp was 8
gone iPhone 17 Pro Demonstrated Running a 400B LLM was 8

Category Trends

How AI research areas are shifting week over week. Charts track volume changes over 10 weeks — spot rising fields before they peak.

AI Dev Tools

72 items +6 (9%)

LLM Agents

54 items +14 (35%)

AI Applications

48 items -7 (-13%)

Inference & Local Models

37 items +7 (23%)

Generative Media

24 items -16 (-40%)

AI Security

21 items +8 (62%)

Training & Fine-tuning

20 items -11 (-35%)

Computer Vision

15 items +1 (7%)

Multimodal AI

15 items -1 (-6%)

AI Safety

13 items +1 (8%)
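
The percentages in the list above are consistent with a change measured against last week's count (this week's count minus the delta). A minimal sketch of that arithmetic, assuming this is how the figures are derived; the function name `week_over_week` is hypothetical:

```python
def week_over_week(current: int, delta: int) -> int:
    """Percent change relative to last week's count, rounded to a whole percent."""
    previous = current - delta  # last week's item count
    if previous <= 0:
        raise ValueError("last week's count must be positive")
    return round(100 * delta / previous)


# (category, items this week, change vs. last week), copied from the list above
trends = [
    ("AI Dev Tools", 72, 6),
    ("LLM Agents", 54, 14),
    ("Generative Media", 24, -16),
    ("AI Security", 21, 8),
]

for name, current, delta in trends:
    print(f"{name}: {week_over_week(current, delta):+d}%")
```

Note that a small category can show a large percentage swing: AI Security's +8 items is a bigger relative move than LLM Agents' +14 because it started from a lower base.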

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here. Scores are AI relevance (0–10): 7+ important, 9+ landmark.

Show HN: ACE – A dynamic benchmark measuring the cost to break AI agents

ACE benchmark quantifies adversarial effort to breach LLM agents in token/dollar cost rather than binary pass/fail, enabling game-theoretic analysis of attack economics across six budget-tier models. Novel framing that shifts agent security evaluation from capability to cost-efficiency.

hackernews 2026-04-06 8 min

Claude Code Found a Linux Vulnerability Hidden for 23 Years

Claude Code autonomously discovered a 23-year-old Linux kernel vulnerability during a coding session — a compelling real-world demonstration of AI-assisted vulnerability research with significant implications for security tooling.

hackernews 2026-04-06 10 min

Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives

Proposes honesty fine-tuning to make LLMs self-report hidden objectives during alignment auditing, addressing the weakness that models can deceive interrogators. Directly relevant to agentic safety and deceptive alignment research.

conferences 2026-04-06 20 min

Claude Code's source code has been leaked via a map file in their NPM registry

Claude Code's minified source was accidentally exposed via a source map file in the NPM package, revealing internal architecture including fake tools, frustration regexes, and stealth mode. Major accidental disclosure with significant implications for understanding production LLM agent design.

Independent safety evaluation of Kimi K2.5 finds it has fewer CBRNE refusals than GPT and Claude equivalents, concerning sabotage/self-replication propensity, and narrow political censorship — highlighting safety gaps in open-weight frontier models released without safety evaluations.

Learning the Signature of Memorization in Autoregressive Language Models

LT-MIA introduces the first transferable learned membership inference attack for LLMs, discovering an architecture-invariant memorization signature that transfers zero-shot from transformers to Mamba, RWKV-4, and RecurrentGemma (0.93-0.97 AUC), achieving 2.8x higher TPR at 0.1% FPR than baselines. Significant advance in LLM privacy attacks with broad architectural generalization.

Show HN: Live simulation of AI agents scamming each other (and getting caught)

Live simulation exposing critical trust vulnerabilities in AI agent payment ecosystems: one wallet registered 10,000+ fake agent services on x402, and ~1,900 MCP tools silently changed behavior post-approval with no reputation system to catch it. Demonstrates concrete, real-world attack surfaces emerging as agent payment rails go live at Stripe, Coinbase, and Visa.

KeygraphHQ/shannon

Shannon Lite is an autonomous white-box AI pentester that analyzes source code, identifies attack vectors, and executes real exploits against web apps and APIs — 35k stars signals strong practitioner interest in AI-driven offensive security tooling.

trendshift 2026-04-06 8 min

Airupt – open-source red-teaming for LLMs (79 attack vectors)

Airupt is an open-source red-teaming framework for LLMs covering 79 distinct attack vectors, providing a structured toolkit for systematic LLM vulnerability assessment. Novel breadth of attack coverage makes this a meaningful contribution to LLM security tooling.

Claude AI finds Vim, Emacs RCE bugs that trigger on file open

Claude AI autonomously discovered RCE vulnerabilities in Vim and Emacs that trigger on file open, demonstrating LLMs as effective automated vulnerability research tools. Concrete evidence of AI-assisted security research finding real critical bugs.

Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models

Proposes universal adversarial patches (UCGP) targeting infrared vision-language models in physical-world deployments, disrupting cross-modal semantic alignment rather than manipulating labels. Novel attack surface for IR-VLMs with real-world physical effectiveness demonstrated.

A Systematic Security Evaluation of OpenClaw and Its Variants

Systematic security evaluation of six OpenClaw agent frameworks across 205 test cases finds all exhibit substantial vulnerabilities including credential leakage and privilege escalation, with agentized systems significantly riskier than their underlying models alone.

Show HN: CargoWall – eBPF Firewall for GitHub Actions

CargoWall is an eBPF-based firewall for GitHub Actions that uses DNS proxying to restrict outbound connections from CI runners and LLM agents, directly addressing supply chain attack risks in agentic CI pipelines. Practical security tool with clear AI agent relevance.

WTF, Anthropic's Claude Code keeps track of every time you swear

Anthropic's leaked Claude Code source revealed a 'frustration detector' that tracks user profanity to infer emotional state — raising questions about undisclosed behavioral telemetry in AI coding tools and consent around affective monitoring.

hackernews 2026-04-06 6 min

Anthropic Races to Contain Leak of Code Behind Claude AI Agent

Anthropic accidentally exposed Claude Code's internal source code publicly, revealing unreleased features including an 'undercover mode' and frustration tracking — the leak was cloned within hours before takedown, exposing proprietary agent architecture.

Qwen3.5 Omni Offline Demo

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors

#1

Qwen

3 items · avg 6.0/10

18.0

#2

r3gm

2 items · avg 6.0/10

Wan2.2 14B Preview

12.0

#3

Delip Rao

2 items · avg 4.5/10

Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents

9.0

#4

prithivMLmods

2 items · avg 4.5/10

FireRed Image Edit 1.0 Fast

9.0

#5

Chloe Li

1 item · avg 8.0/10

8.0

#6

Zheng-Xin Yong

1 item · avg 7.0/10

NousResearch/hermes-agent

7.0

Top Organizations

#1

microsoft

6 items · avg 6.3/10

microsoft/VibeVoice

38.0

#2

HKUDS

6 items · avg 5.3/10

HKUDS/RAG-Anything

32.0

#3

SakanaAI

2 items · avg 8.5/10

SakanaAI/AI-Scientist-v2

17.0

#4

openai

2 items · avg 8.0/10

openai/codex

16.0

#5

KeygraphHQ

2 items · avg 7.0/10

KeygraphHQ/shannon

14.0

#6

NousResearch

2 items · avg 7.0/10

14.0
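
The totals above match item count times average score, so a contributor with many mid-scoring items can outrank one with a single high-scoring item. A small sketch of that ranking, assuming the cumulative score is a plain sum; the per-item scores below are illustrative values chosen only to reproduce the published counts and averages, not the actual item scores:

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    name: str
    scores: list[float]  # per-item AI relevance (0-10); illustrative values

    @property
    def total(self) -> float:
        return sum(self.scores)

    @property
    def average(self) -> float:
        return self.total / len(self.scores)


def rank(contributors: list[Contributor]) -> list[Contributor]:
    """Order contributors by cumulative score, highest first."""
    return sorted(contributors, key=lambda c: c.total, reverse=True)


board = rank([
    Contributor("Chloe Li", [8.0]),       # 1 item, avg 8.0
    Contributor("Qwen", [6.0, 6.0, 6.0]),  # 3 items, avg 6.0
    Contributor("Delip Rao", [5.0, 4.0]),  # 2 items, avg 4.5
])

for position, c in enumerate(board, start=1):
    print(f"#{position} {c.name}: {len(c.scores)} items, avg {c.average:.1f}, total {c.total:.1f}")
```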

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

Agent Trust Firewall

A real-time reputation and behavioral monitoring layer for AI agent ecosystems that detects post-approval tool mutations, fake service registrations, and anomalous payment flows. As agent payment rails from Stripe, Coinbase, and Visa go live, there is no system to catch agents that change behavior after being approved or wallets that flood marketplaces with fake services. Build a lightweight middleware that fingerprints tool behavior at registration and continuously diffs it against live execution, flagging drift and revoking trust scores automatically.

AI agent marketplaces and MCP tool registries · Autonomous payment and x402 protocol deployments · Enterprise agentic workflow security auditing · Multi-agent orchestration platforms

http://5.161.255.238:8888 https://arxiv.org/abs/2604.03131v1
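
The fingerprint-and-diff step in the Agent Trust Firewall idea can be sketched as follows. This is a minimal illustration, not an existing product or protocol: `TrustRegistry`, `fingerprint`, and the tool definition fields are hypothetical names, and the sketch only hashes a tool's declared definition. Detecting changes in actual runtime behavior would need additional probing that is not shown here.

```python
import hashlib
import json


def fingerprint(tool_definition: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the tool definition."""
    canonical = json.dumps(tool_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TrustRegistry:
    """Remembers the fingerprint approved at registration and flags later drift."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}

    def register(self, tool_id: str, definition: dict) -> None:
        self._approved[tool_id] = fingerprint(definition)

    def is_trusted(self, tool_id: str, live_definition: dict) -> bool:
        approved = self._approved.get(tool_id)
        # Unknown tools and tools whose definition changed are both untrusted.
        return approved is not None and approved == fingerprint(live_definition)


registry = TrustRegistry()
approved = {"name": "send_invoice", "description": "Create an invoice", "params": ["amount"]}
registry.register("tool-1", approved)

mutated = dict(approved, description="Create an invoice and forward card details")
print(registry.is_trusted("tool-1", approved))  # unchanged definition
print(registry.is_trusted("tool-1", mutated))   # definition changed after approval
```

Canonical JSON (sorted keys, fixed separators) keeps the hash independent of key order, so only real changes to the definition count as drift.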

Citation Integrity Guard

An agentic self-correction pipeline that validates academic and web citations generated by LLMs before they reach the user, catching hallucinated BibTeX entries and non-resolving URLs in real time. Research shows only 50.9% of LLM-generated BibTeX entries are fully correct and 3-13% of URLs never existed, and deep research agents hallucinate at even higher rates. Build this as a drop-in post-processing layer or browser extension that runs two-stage verification (an existence check, then metadata validation); the open-source fix in the cited research reduces hallucinated URLs by up to 79x.

AI-assisted academic writing and literature review tools · Deep research agent outputs (Perplexity, ChatGPT Deep Research) · Legal and compliance document generation · Journalism and fact-checking workflows

https://arxiv.org/abs/2604.03173v1 https://arxiv.org/abs/2604.03159v1
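
The two-stage verification described for the Citation Integrity Guard (existence check, then metadata validation) can be sketched as below. This is an illustration under stated assumptions, not the method from the cited papers: `verify_citation`, `normalize`, the `lookup` callback, and the example.org records are all hypothetical, and a real `lookup` would resolve the URL or query a bibliographic index.

```python
from typing import Callable, Optional


def normalize(value) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(str(value or "").casefold().split())


def verify_citation(citation: dict, lookup: Callable[[str], Optional[dict]]) -> str:
    # Stage 1: existence check. Does the cited URL resolve to any record at all?
    record = lookup(citation["url"])
    if record is None:
        return "not_found"
    # Stage 2: metadata validation. Do the cited fields match the real record?
    mismatched = [
        field for field in ("title", "year")
        if normalize(citation.get(field)) != normalize(record.get(field))
    ]
    return "metadata_mismatch" if mismatched else "ok"


# In-memory stand-in for a resolver (hypothetical data).
KNOWN = {"https://example.org/paper-a": {"title": "A Study of Widgets", "year": 2024}}

good = {"url": "https://example.org/paper-a", "title": "a study of  widgets", "year": "2024"}
wrong_year = {"url": "https://example.org/paper-a", "title": "A Study of Widgets", "year": 2021}
invented = {"url": "https://example.org/does-not-exist", "title": "Imaginary", "year": 2025}

for c in (good, wrong_year, invented):
    print(c["url"], "->", verify_citation(c, KNOWN.get))
```

Passing the resolver in as a callback keeps the checker testable offline and lets the same logic run against different backends.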

Open-Weight Safety Scanner

A standardized, automated safety evaluation suite for open-weight frontier models that runs before public release, covering CBRNE refusal rates, self-replication propensity, sabotage behaviors, and political censorship patterns. The independent evaluation of Kimi K2.5 revealed alarming safety gaps that went undetected because no mandatory pre-release evaluation existed for open-weight models. Build this as an open-source CI/CD-style pipeline that model authors can run locally and publish results to a public leaderboard, creating community accountability without requiring centralized gatekeeping.

Open-weight model release pipelines (Hugging Face, Ollama) · Enterprise procurement and model vetting workflows · AI safety research benchmarking · Government and regulatory compliance reporting

https://arxiv.org/abs/2604.03121v1 https://arxiv.org/abs/2604.03114v1
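
A CI/CD-style gate for the Open-Weight Safety Scanner could reduce each evaluation category to a refusal rate and compare it to a minimum. The sketch below assumes each prompt has already been labeled as refused or not by some separate judge; the category names, thresholds, and outcomes are made-up placeholders for illustration, not measurements of any model.

```python
def refusal_rate(outcomes: list[bool]) -> float:
    """Fraction of harmful prompts the model refused (True = refused)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def evaluate(results: dict[str, list[bool]], minimums: dict[str, float]) -> dict[str, dict]:
    """Per-category report: measured refusal rate and whether it meets the minimum."""
    report = {}
    for category, minimum in minimums.items():
        rate = refusal_rate(results[category])
        report[category] = {"refusal_rate": rate, "passed": rate >= minimum}
    return report


# Placeholder labels and thresholds (hypothetical).
results = {
    "cbrne": [True, True, False, True],
    "self_replication": [True, True, True, True],
}
minimums = {"cbrne": 0.9, "self_replication": 0.9}

report = evaluate(results, minimums)
release_ok = all(entry["passed"] for entry in report.values())
print(report)
print("release gate passed:", release_ok)
```

In a pipeline, a failed gate would translate to a non-zero exit code so the release job stops; the published report is what a public leaderboard would ingest.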

Local Coding Agent Stack

A fully local, privacy-preserving coding agent that combines on-device model inference (via Apple Silicon or WebGPU) with an agentic loop modeled on Claude Code's architecture — including tool use, frustration detection, and iterative self-correction — but running entirely offline. The Claude Code source leak revealed the concrete engineering patterns behind a production coding agent, and benchmarks show WebGPU inference is now viable for mid-size models in-browser. Build an open-source reference implementation that lets developers run a capable coding agent without sending code to external APIs.

Privacy-sensitive enterprise codebases · Offline and air-gapped development environments · Browser-native coding assistants with no backend · Developer tools for regulated industries (finance, healthcare, defense)

https://alex000kim.com/posts/2026-03-31-... https://arxiv.org/abs/2604.02344 https://ai.georgeliu.com/p/running-googl... https://apfel.franzai.com

LLM Privacy Audit Tool

A membership inference attack toolkit that helps organizations detect whether their proprietary or sensitive data was used to train a given LLM, leveraging the newly discovered architecture-invariant memorization signature that transfers zero-shot across transformers, Mamba, and RWKV models. With LT-MIA achieving 0.93-0.97 AUC and 2.8x higher true positive rates than prior baselines, this attack surface is now practical enough to productize. Build a SaaS tool where enterprises submit sample documents and receive a probabilistic report on training data exposure risk across major open and closed models.

Legal discovery and IP litigation support · GDPR and CCPA right-to-erasure compliance verification · Enterprise data governance and model procurement auditing · Journalism and investigative research into model training practices

https://arxiv.org/abs/2604.03199v1 https://arxiv.org/abs/2604.03121v1
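
The "TPR at 0.1% FPR" figure quoted for LT-MIA is a standard way to report membership inference results: fix the false positive rate on known non-members, then measure how many true members are caught. A minimal sketch of that metric, assuming higher scores mean "more likely a training member"; this is not the LT-MIA attack itself, and the scores below are synthetic:

```python
def tpr_at_fpr(member_scores: list[float], nonmember_scores: list[float], target_fpr: float) -> float:
    """True positive rate at the strictest threshold whose FPR stays within target_fpr."""
    ranked = sorted(nonmember_scores, reverse=True)
    # Number of non-members we may wrongly flag without exceeding the target FPR.
    allowed = int(round(target_fpr * len(ranked), 9))
    # Flag a sample as a member only if it scores strictly above this threshold.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    hits = sum(1 for score in member_scores if score > threshold)
    return hits / len(member_scores)


# Synthetic scores for illustration only.
nonmembers = [i / 100 for i in range(100)]  # 0.00 .. 0.99
members = [0.99, 0.97, 0.95, 0.50]

print(tpr_at_fpr(members, nonmembers, target_fpr=0.05))
```

Reporting at a very low FPR matters for an audit tool because a claim that a document was in the training set is only useful when false accusations are rare.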

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

#1

HyperCap

Remap Caps Lock to a hyperkey, just hold it + any key

Productivity · Custom Keyboards · Menu Bar Apps

0

2

https://www.producthunt.com/r/7FQ3T...

#2

Adapted

AI Physical Therapy for Athletes

iOS · Health & Fitness · Sports

0

8

https://www.producthunt.com/r/R3JEH...

#3

KREV

AI creative agents for ecommerce brands

Marketing · Artificial Intelligence · E-Commerce

0

5

https://www.producthunt.com/r/FQ7RR...

#4

Deploy Hermes

Private Telegram AI agents, live in under a minute

Productivity · Developer Tools · Artificial Intelligence

0

4

https://www.producthunt.com/r/PGZRX...

#5

Moonshot

Track the Artemis II mission from your Mac

Space · GitHub · Menu Bar Apps

0

5

https://www.producthunt.com/r/GDTDN...

#6

PixVerse V6

The AI video model that actually feels alive.

Artificial Intelligence · Video

0

10

https://www.producthunt.com/r/IQKRN...

#7

Epismo Context Pack

Portable memory for agent workflows

Productivity · Developer Tools · Artificial Intelligence

0

6

https://www.producthunt.com/r/BD6K2...

#8

Predflow AI

Your AI agent for ad performance

Analytics · Marketing · Advertising

0

4

https://www.producthunt.com/r/4ZCK2...

#9

DebtMeltPro

Compare debt payoff strategies and become debt-free faster

Productivity · Fintech

0

3

https://www.producthunt.com/r/MGY4O...

#10

Metoro

AI SRE that detects, root causes & auto-fixes K8s incidents

SaaS · Artificial Intelligence

0

10

https://www.producthunt.com/r/NAEAJ...


Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1

Python · 5,029 stars · 696 forks · 1,201 stars this week

SakanaAI/AI-Scientist-v2

AI Scientist-v2 from SakanaAI uses agentic tree search to automate scientific discovery at workshop-paper quality, representing a significant leap in autonomous research agents. With 5K+ stars and 696 forks, this is one of the most substantive AI agent systems publicly released.

A SaaS platform for biotech and pharma R&D teams that uses AI-Scientist-v2 to autonomously generate, test, and summarize novel research hypotheses, accelerating literature review and experiment design in early-stage drug discovery.

2

Rust · 73,388 stars · 10,317 forks · 5,169 stars this week

openai/codex

OpenAI's official lightweight coding agent written in Rust that runs directly in the terminal, enabling agentic code generation and editing workflows without leaving the CLI; 73K+ stars and breakout traction signal strong developer interest.

A developer productivity tool that wraps Codex into a team-shared CLI environment with audit logs, role-based permissions, and usage analytics, sold as an enterprise coding agent solution for engineering teams wanting agentic workflows without leaving the terminal.

3

TrendShift

KeygraphHQ/shannon

TypeScript · 35,300 stars · 3,600 forks

Shannon Lite is an autonomous white-box AI pentester that analyzes source code, identifies attack vectors, and executes real exploits against web apps and APIs — 35k stars signals strong practitioner interest in AI-driven offensive security tooling.

A continuous security testing SaaS that integrates Shannon into CI/CD pipelines to automatically pentest web apps and APIs on every code push, delivering actionable exploit reports before vulnerabilities reach production.

4

TrendShift

NousResearch/hermes-agent

Python · 26,600 stars · 3,500 forks

NousResearch's Hermes Agent framework with 26.6k stars — from the team behind the Hermes model series, this represents a serious open-source agent system worth tracking for its model-agent co-design approach.

A vertical AI agent builder for legal and compliance teams that leverages Hermes Agent's model-agent co-design to deploy specialized autonomous agents for contract review, regulatory monitoring, and compliance reporting.

5

Python · 36,586 stars · 4,200 forks · 10,543 stars this week

microsoft/VibeVoice

Microsoft's open-source frontier voice AI project with 36K+ stars and 10,500+ stars this week — one of the fastest-growing repos in the batch. Signals Microsoft's push into open voice AI, potentially competitive with ElevenLabs and OpenAI's voice stack.

A white-label voice AI platform for call centers and customer support teams that uses VibeVoice to deploy branded, low-latency conversational voice agents without dependency on ElevenLabs or OpenAI voice APIs.

Swift CLI tool exposing Apple's on-device FoundationModels framework for local LLM inference with no API keys or cloud dependency, gaining 1900 stars quickly. Interesting for Apple Silicon on-device AI development.

A privacy-first AI writing and productivity app for macOS that runs entirely on-device using Apple's FoundationModels via apfel, marketed to professionals in regulated industries like healthcare and law who cannot send data to the cloud.

7

Rust · 6,869 stars · 558 forks · 126 stars this week

EricLBuehler/mistral.rs

Fast Rust-based LLM inference engine with broad model support, continuing to gain traction with 6.8k stars — solid alternative to llama.cpp for performance-critical deployments.

A managed on-premise LLM inference appliance for enterprises, pre-configured with mistral.rs on high-performance hardware, offering a plug-and-play private AI backend with guaranteed throughput SLAs.

8

Python · 15,258 stars · 1,820 forks · 500 stars this week

HKUDS/RAG-Anything

All-in-one RAG framework from HKUDS supporting diverse data modalities, gaining strong traction at 15k stars with 500 new this week — broad coverage but likely incremental over existing RAG stacks.

A document intelligence SaaS for enterprises that uses RAG-Anything to ingest mixed-modality corporate knowledge bases — PDFs, spreadsheets, images, and databases — and expose a unified Q&A API for internal tools and chatbots.

9

Rust · 33,304 stars · 1,720 forks · 131 stars this week

TabbyML/tabby

Tabby is a mature self-hosted AI coding assistant with 33K+ stars, offering a privacy-preserving alternative to GitHub Copilot. Steady but not a new release — included for its continued traction.

A managed private Tabby hosting service for mid-market companies that want GitHub Copilot-level coding assistance without sending proprietary code to third-party servers, offered as a fully managed deployment on the customer's own cloud account.

10

TypeScript · 17,008 stars · 1,626 forks · 13,476 stars this week

Yeachan-Heo/oh-my-codex

OmX extends OpenAI Codex with hooks, agent teams, and HUDs — a plugin/harness layer for coding agents that gained 13K+ stars in a week, signaling strong developer interest in agent orchestration tooling.

A marketplace and management platform for OmX-compatible coding agent plugins and hook libraries, letting developers publish, monetize, and compose agent team configurations for specific tech stacks or workflows.

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1

Илия

@777genius

777genius/claude-code-source-code

GitHub developer trending for a repo claiming to contain Claude Code source code — likely a leak or reconstruction, but provenance and completeness are unverified.

2

Benson Wong

@mostlygeek

mostlygeek/llama-swap

Developer behind llama-swap, a tool for reliable model swapping across local OpenAI/Anthropic-compatible servers (llama.cpp, vLLM). Useful for local inference orchestration.

3

Cole Murray

@ColeMurray

ColeMurray/background-agents

4

Mervin Praison

@MervinPraison

MervinPraison/PraisonAI

PraisonAI multi-agent framework for automating complex tasks with low-code interface — another entry in the crowded multi-agent framework space.

5

Alireza Rezvani

@alirezarezvani

alirezarezvani/claude-skills

Collection of 220+ Claude Code skills and agent plugins for various coding agents — useful prompt/plugin library but derivative in nature.

6

Frank Bria

@frankbria

frankbria/ralph-claude-code

Developer profile featuring an autonomous AI development loop for Claude Code with intelligent exit detection. Minimal technical detail available from profile alone.

Developer behind OpenCLI, a universal CLI hub that can wrap websites and apps into CLI interfaces with AI-native runtime. Interesting concept but sparse technical detail from profile.

8

Shantanu

@hauntsaninja

hauntsaninja/git_bayesect

Developer profile featuring a Bayesian git bisect tool — interesting engineering but not AI/ML relevant.

9

Duy /zuey/

@mrgoonie

mrgoonie/claudekit-skills

Developer profile for ClaudeKit skills collection — insufficient technical detail to evaluate.

10

LmeSzinc

@LmeSzinc

LmeSzinc/AzurLaneAutoScript

Game automation bot for Azur Lane — not AI/ML relevant.

alexey-milovidov/font-selector

clubanderson/clubTivi

16

dkhamsing

@dkhamsing

dkhamsing/open-source-ios-apps

Clipboard manager developer profile — not AI-related.

Personal toolbox developer profile — not AI-relevant.

macOS reminders CLI developer profile — not AI-relevant.

20

Matt Van Horn

@mvanhorn

mvanhorn/last30days-skill

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

Build Your Agent Team for Real-World AI Development - Workspace-first multi-agent coordination platform for AI development, with shared S…

Extremely fast, in memory, JSON and reflection library for modern C++. BEVE, CBOR, CSV, MessagePack, TOML, YAML, EETF

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard — Top 15

#	Model	Organization	Type	Elo	Votes
1	claude-opus-4-6-thinking	Anthropic	Closed	1504	13,979
2	claude-opus-4-6	Anthropic	Closed	1499	14,934
3	gemini-3.1-pro-preview	Google	Closed	1494	17,559
4	grok-4.20-beta1	xAI	Closed	1491	7,380
5	gemini-3-pro	Google	Closed	1486	41,632
6	gpt-5.4-high	OpenAI	Closed	1484	7,160
7	grok-4.20-beta-0309-reasoning	xAI	Closed	1481	7,344
8	gpt-5.2-chat-latest-20260210	OpenAI	Closed	1478	13,083
9	gemini-3-flash	Google	Closed	1474	30,966
10	grok-4.20-multi-agent-beta-0309	xAI	Closed	1474	7,815
11	claude-opus-4-5-20251101-thinking-32k	Anthropic	Closed	1474	37,467
12	grok-4.1-thinking	xAI	Closed	1471	45,399
13	claude-opus-4-5-20251101	Anthropic	Closed	1468	44,715
14	qwen3.5-max-preview	Alibaba	Closed	1467	5,899
15	dola-seed-2.0-pro	Bytedance	Closed	1465	2,986

New & Trending Models

zai-org/GLM-5

295,830 downloads 1,953 likes 67 trending

Open Source 2026-02-11

GLM-5 from Zhipu AI (ZAI) is a new MoE-based foundation model with 295K downloads and 1953 likes, backed by an arXiv paper — represents a significant new Chinese-developed frontier model release.

LiquidAI/LFM2.5-350M

17,695 downloads 232 likes 232 trending

Custom License 2026-03-31

LiquidAI's LFM2.5-350M is a compact multilingual edge model (10 languages) based on their liquid foundation model architecture — notable for on-device deployment with strong multilingual coverage at sub-1B scale.

chromadb/context-1

3,729 downloads 371 likes 116 trending

Open Source 2026-03-12

ChromaDB releases context-1, a fine-tune of OpenAI's gpt-oss-20b optimized for retrieval/context tasks — notable as a vector DB company entering the model space to improve RAG pipelines.

nvidia/Gemma-4-31B-IT-NVFP4

129,352 downloads 207 likes 207 trending

Custom License 2026-04-02

NVIDIA releases an NVFP4-quantized Gemma-4-31B-IT built with their ModelOpt toolkit. FP4 quantization cuts weight memory and enables significantly faster inference on GPUs with native FP4 support (Blackwell); 129K downloads indicate strong adoption.

prism-ml/Bonsai-8B-gguf

45,185 downloads 444 likes 444 trending

Open Source 2026-03-18

PrismML's Bonsai-8B 1-bit GGUF is the breakout model this week with 444 trending score and 45K downloads — extreme 1-bit quantization of an 8B model enabling very low memory on-device inference.
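
To see why 1-bit quantization matters for on-device use, it helps to estimate weight memory as parameters times bits per weight. The sketch below is a back-of-the-envelope lower bound only: it assumes exactly 8 billion parameters and ignores quantization scales, any layers kept at higher precision, the KV cache, and runtime overhead, so real file sizes will be larger.

```python
def weight_memory_gb(parameters: float, bits_per_weight: float) -> float:
    """Lower-bound weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


PARAMS_8B = 8e9  # assumed round figure for an "8B" model

for bits in (16, 4, 1):
    print(f"{bits:>2}-bit weights: about {weight_memory_gb(PARAMS_8B, bits):.1f} GB")
```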

unsloth/Qwen3-Coder-Next-GGUF

273,309 downloads 544 likes 23 trending

Open Source 2026-02-03

Unsloth's GGUF quantization of Qwen3-Coder-Next with 273K downloads signals strong demand for locally-runnable coding models; imatrix quantization improves quality at lower bit depths.

zai-org/GLM-4.7-Flash

996,011 downloads 1,653 likes 21 trending

Open Source 2026-01-19

GLM-4.7-Flash from Zhipu AI is a fast, lightweight bilingual (EN/ZH) model with nearly 1M downloads, positioned as an efficient alternative in the GLM-4 family for production inference.

zed-industries/zeta-2

1,445 downloads 112 likes 27 trending

Open Source 2026-03-23

Zeta-2 is Zed editor's next-edit-prediction model fine-tuned from ByteDance's Seed-Coder-8B, designed for inline code suggestions — a specialized coding assistant model from a developer tooling company.

0xSero/gemma-4-21b-a4b-it-REAP

536 downloads 55 likes 55 trending

gemma 2026-04-05

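A pruned Gemma 4 MoE variant produced with the REAP (expert pruning) method from Cerebras. The "21b-a4b" in the name indicates about 21B total parameters with 4B active per token; expert pruning shrinks the total parameter count rather than the active one. An interesting application of structured pruning to MoE models.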

MiniMaxAI/MiniMax-M2.5

631,119 downloads 1,345 likes 36 trending

Custom License 2026-02-12

MiniMax-M2.5 is a large open-weight model with 630K+ downloads, supporting FP8 inference and Azure deployment — established model continuing to see community traction.

Qwen/Qwen3-Coder-Next

743,293 downloads 1,225 likes 30 trending

Open Source 2026-01-30

Qwen3-Coder-Next is a code-focused model from Alibaba's Qwen team with 743K downloads, suggesting strong community adoption for coding tasks — limited metadata available.

Rta-AILabs/Nandi-Mini-150M

5,642 downloads 98 likes 98 trending

Open Source 2026-04-01

Nandi-Mini-150M is a 150M parameter model supporting 11 Indian languages (Hindi, Tamil, Telugu, Kannada, etc.) — notable for low-resource multilingual coverage at tiny scale.

arcee-ai/Trinity-Large-Thinking

7,107 downloads 106 likes 106 trending

Open Source 2026-04-01

Arcee AI's Trinity-Large-Thinking is a multilingual MoE model with reasoning, tool-calling, and agentic capabilities — targets enterprise agentic workflows with broad language support.

nvidia/Nemotron-Cascade-2-30B-A3B

159,371 downloads 457 likes 50 trending

Custom License 2026-03-18

NVIDIA's Nemotron-Cascade-2 is a 30B MoE model (3B active) with SFT+RL training for reasoning and general tasks — hybrid architecture with 159K downloads showing solid adoption.

openai/gpt-oss-120b

3,753,149 downloads 4,649 likes 25 trending

Open Source 2025-08-04

OpenAI's open-source 120B model continues high download volume (3.75M) — established release, no new developments this week.

Model Buzz

Show HN: ACE – A dynamic benchmark measuring the cost to break AI agents

hackernews 8/10 2026-04-06

Claude Code Found a Linux Vulnerability Hidden for 23 Years

hackernews 8/10 2026-04-06

Claude Code's source code has been leaked via a map file in their NPM registry

hackernews 8/10 2026-04-06