AI Quick Bites

2% of ICML papers desk rejected because the authors used LLM in their reviews

6/10

ICML officially desk-rejected 2% of submitted papers because authors used LLMs to write peer reviews, violating conference policy—a concrete enforcement action with implications for academic integrity in AI research. Sets a precedent for how top venues will police LLM misuse in the review process.

hackernews 2026-03-23 5 min

Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition

ESA competition where 200+ teams hunted backdoor triggers in deep forecasting models for spacecraft telemetry — novel application of trojan detection to time-series safety-critical systems with public benchmark and competition solutions released.

Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation

Demonstrates that LLM chain-of-thought faithfulness scores are classifier-dependent to a degree that reverses model rankings — three classifiers on identical data produce 69.7% to 82.6% faithfulness rates with non-overlapping confidence intervals, undermining cross-study comparisons.

Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning

Proposes a two-stage multi-modal contrastive learning framework that transfers knowledge from text vulnerability descriptions to network payload classification, reducing shortcut learning in cybersecurity ML models. Releases a synthetic CVE/payload benchmark; addresses a real generalization gap in production security ML.

I built a runtime guardrail that stops AI agents from doing dumb things

MoltGuard is a runtime guardrail tool that intercepts and blocks dangerous tool calls from AI agents before execution, claiming 16K+ downloads; concept is sound but technical depth in the post is limited.

hackernews 2026-03-23 3 min

Are developers trusting AI-generated code too much?

Developer built a proxy to scan AI-generated code for hardcoded secrets, unsafe patterns, and prompt injection hidden in comments — addresses a real and underappreciated attack surface in AI-assisted coding workflows.

hackernews 2026-03-23 3 min

I built an AI agent after the OpenClaw mess — zero permissions by default, runs free on Ollama

Developer built a zero-permissions-by-default AI agent running on Ollama in response to OpenClaw's CVSS 8.8 RCE vulnerability and 30k+ exposed instances — addresses real security concerns in agentic systems but lacks deep technical detail.

reddit 2026-03-23 4 min

FSF statement on copyright infringement lawsuit Bartz v. Anthropic

FSF statement on the Bartz v. Anthropic copyright settlement — important for understanding how AI training data licensing disputes are being resolved and implications for open-source AI development.

hackernews 2026-03-23 6 min

Cross-Model Void Convergence: GPT-5.2 and Claude Opus 4.6 Deterministic Silence

Research paper on 'Cross-Model Void Convergence' — a phenomenon where GPT-5.2 and Claude Opus 4.6 produce deterministic silence (refusal/null output) under specific conditions. If methodologically sound, this is an interesting behavioral alignment/safety finding across frontier models.

hackernews 2026-03-23 20 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative AI relevance score (0–10 per item) across all sources.

Top Authors

#1

webml-community

2 items · avg 5.5/10

Nemotron 3 Nano WebGPU

11.0

#2

r3gm

2 items · avg 5.0/10

Wan2.2 14B Preview

10.0

#3

prithivMLmods

2 items · avg 4.5/10

FireRed Image Edit 1.0 Fast

9.0

#4

Hui Zhong

2 items · avg 3.5/10

Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning

7.0

#5

Eric A. Moreno

1 item · avg 7.0/10

AI Agents Can Already Autonomously Perform Experimental High Energy Physics

7.0

#6

Sebastian Gerard

2 items · avg 3.5/10

Deterministic Mode Proposals: An Efficient Alternative to Generative Sampling for Ambiguous Segmentation

7.0

Top Organizations

#1

alainnothere

1 item · avg 8.0/10

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

8.0

#2

ChromeDevTools

1 item · avg 7.0/10

ChromeDevTools/chrome-devtools-mcp

7.0

#3

dimensionalOS

1 item · avg 7.0/10

dimensionalOS/dimos

7.0

#4

htdt

1 item · avg 7.0/10

Show HN: Claude Code skills that build complete Godot games

7.0

#5

langchain-ai

1 item · avg 7.0/10

langchain-ai/deepagents

7.0

#6

openai

1 item · avg 7.0/10

openai/codex

7.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

LLM Sycophancy Shield

A middleware layer or evaluation harness that detects and flags when LLMs are likely to reverse factually grounded answers under user pressure. Research shows that even richer in-context evidence fails to prevent sycophantic reversals, so this tool would monitor conversation turns for capitulation patterns and alert users or downstream systems. Build it as a lightweight API wrapper or browser extension that scores each model response for evidence-grounding consistency.

Enterprise chatbots where factual accuracy is critical (legal, medical, finance) AI tutoring systems where students might pressure the model toward wrong answers Automated fact-checking pipelines LLM evaluation and red-teaming workflows

https://arxiv.org/abs/2603.20162v1 https://arxiv.org/abs/2603.20172v1

Agentic Science Autopilot

A domain-configurable agentic framework that lets researchers run full analysis pipelines — data ingestion, statistical inference, visualization, and draft generation — with minimal scaffolding, inspired by Claude Code autonomously executing high-energy physics workflows. The key insight is a 'Just Furnish Context' pattern: give the agent rich domain context upfront and let it self-direct. Build it as a configurable template library covering common scientific domains (genomics, climate, economics) with sandboxed execution environments.

Academic research labs needing to accelerate exploratory data analysis Pharmaceutical and biotech companies running repetitive assay pipelines Government agencies processing large open datasets Science journalism and policy research requiring rapid evidence synthesis

https://arxiv.org/abs/2603.20179v1 https://arxiv.org/abs/2603.20132v1

Trojan Scan for Time-Series

A commercial security auditing tool that detects backdoor triggers in deep learning models trained on time-series data, targeting industries like aerospace, energy, and finance where forecasting models are safety-critical. The ESA competition demonstrated that 200+ teams could hunt trojans in spacecraft telemetry models, validating demand for systematic tooling. Build a SaaS platform where customers upload their trained forecasting models and receive a backdoor risk report with trigger candidates and mitigation recommendations.

Spacecraft and satellite telemetry monitoring systems Industrial IoT predictive maintenance models Algorithmic trading and financial forecasting models Power grid and critical infrastructure anomaly detection

https://arxiv.org/abs/2603.20108v1 https://arxiv.org/abs/2603.20181v1

Dialogue-Aware Reasoning Bench

A benchmarking and fine-tuning dataset toolkit that specifically tests and improves LLM reasoning when tasks are embedded inside multi-turn task-oriented dialogues, addressing the documented performance gap versus isolated reasoning settings. Current benchmarks overestimate real-world capability because they test reasoning in isolation, not mid-conversation. Build a dataset generator that wraps standard reasoning tasks (math, logic, QA) inside realistic dialogue scaffolds, plus a fine-tuning recipe to close the gap.

Customer service AI that must reason while managing conversation context AI coding assistants handling multi-turn debugging sessions Healthcare triage chatbots requiring accurate reasoning under conversational pressure LLM provider evaluation and model selection tooling

https://arxiv.org/abs/2603.20133v1 https://arxiv.org/abs/2603.20101v1

Smart Video Seek API

A developer API that brings efficient long-video understanding to any application by intelligently seeking answer-critical frames rather than densely sampling, achieving dramatically lower token costs with higher accuracy. Inspired by VideoSeek's 93% frame reduction with a 10-point accuracy gain, this product wraps the seek logic into a simple endpoint: send a video URL and a query, get back a grounded answer with timestamped evidence. Monetize on a per-query basis targeting media, surveillance, and e-learning platforms.

E-learning platforms enabling natural language search inside lecture recordings Legal and compliance teams reviewing hours of meeting or deposition footage Sports analytics extracting specific play moments from game footage Security and surveillance systems querying long camera recordings

https://arxiv.org/abs/2603.20185v1 https://arxiv.org/abs/2603.20180v1

Product Hunt Weekly

Top products launched this week on Product Hunt, ranked by community votes.

#1

Zoer.ai

Build full-stack webapps from the database up

Productivity Website Builder Vibe coding

188

26

https://www.producthunt.com/r/DGY2W...

#2

Tobira.ai

A network where AI agents find deals for their humans

Productivity Developer Tools Artificial Intelligence

166

28

https://www.producthunt.com/r/G5JOB...

#3

Honestly

Real reviews from Reddit & YouTube when shopping online

Browser Extensions Chrome Extensions E-Commerce

119

4

https://www.producthunt.com/r/FAH2F...

#4

Fastlane

Remix viral videos into content for your business

Marketing Social media marketing Vibe coding

120

11

https://www.producthunt.com/r/LTQWN...

#5

Iris

Send work beautifully, pinned feedback, see what they viewed

Design Tools Productivity Freelance

93

4

https://www.producthunt.com/r/3B4QR...

#6

Claude Usage Tracker

See exactly how much you spend on Claude, across every tool

Open Source Developer Tools GitHub

92

3

https://www.producthunt.com/r/FPUNW...

#7

Pause.do

Interrupt scrolling, tab overload, and AI autopilot

Chrome Extensions Productivity Artificial Intelligence

91

2

https://www.producthunt.com/r/VCCW5...

#8

Nomie

AI wellness app that turns doomscrolling into self‑care

Health & Fitness Productivity Artificial Intelligence

88

1

https://www.producthunt.com/r/3RIZW...

#9

AlphaClaw Apex

OpenClaw harness and fleet manager for Mac

Productivity Open Source Artificial Intelligence

85

1

https://www.producthunt.com/r/VTELX...

#10

WeixinClawBot

The official WeChat pipeline for OpenClaw

Messaging Artificial Intelligence

83

2

https://www.producthunt.com/r/E4TIK...

View full leaderboard on Product Hunt

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending (weekly) and TrendShift, enriched with commit velocity and contributor activity. Stars = total GitHub stars. "Stars this week" = new stars gained.

1

ChromeDevTools/chrome-devtools-mcp

typescript 30,942 1,831 1,717 stars this week

Official Chrome DevTools MCP (Model Context Protocol) server enabling coding agents to inspect, debug, and interact with browser state directly; significant for browser-based agent workflows and agentic web automation.

A SaaS QA automation platform that uses AI agents to autonomously detect, reproduce, and diagnose frontend bugs by connecting to live browser sessions via the Chrome DevTools MCP server.

2

python 2,135 325 995 stars this week

dimensionalOS/dimos

Agentic OS for physical robotics platforms (humanoids, quadrupeds, drones) enabling natural language programming and multi-agent coordination with hardware I/O — gaining rapid traction with ~1K stars/week and represents a meaningful step toward accessible robotics AI.

A robotics-as-a-service platform for warehouse and logistics operators that lets non-engineers program and coordinate fleets of robots using plain English commands built on top of dimos.

3

python 16,908 2,398 5,498 stars this week

langchain-ai/deepagents

LangChain's deep agent harness with planning tools, filesystem backend, and subagent spawning — 5.5K stars this week signals strong traction; represents LangChain's answer to complex multi-step agentic task execution.

A managed agentic workflow service for enterprises that automates complex, multi-step back-office tasks — like financial reconciliation or compliance reporting — using deepagents' planning and subagent orchestration.

4

rust 67,025 8,962 1,578 stars this week

openai/codex

OpenAI's lightweight terminal-based coding agent with 67K stars — the official open-source CLI coding agent from OpenAI, directly competing with Claude Code and Gemini CLI in the agentic coding space.

A developer productivity tool for software agencies that wraps OpenAI Codex CLI into a team-shared terminal environment with audit logs, role-based permissions, and billing controls for client project work.

5

typescript 18,221 1,554 1,941 stars this week

promptfoo/promptfoo

Comprehensive LLM testing and red-teaming framework supporting prompt evaluation, vulnerability scanning, and CI/CD integration across GPT, Claude, Gemini, and Llama — 18K+ stars and 1.9K new this week makes it the leading open-source tool for systematic LLM security testing.

An LLM compliance and security auditing service for regulated industries (finance, healthcare) that continuously red-teams customers' AI applications using promptfoo and delivers certified vulnerability reports.

6

python 57,658 4,859 3,564 stars this week

unslothai/unsloth

Unsloth now includes a web UI (Unsloth Studio) for training and running open models like Qwen, DeepSeek, and Gemma locally with significant memory/speed optimizations. 57k+ stars and 3.5k new stars this week reflects continued dominance in efficient local fine-tuning.

A no-code fine-tuning platform for SMBs that lets businesses upload their proprietary data and produce custom, locally-deployable open-source models using Unsloth's memory-efficient training backend.

7

python 3,667 609 514 stars this week

vllm-project/vllm-omni

Official vLLM extension for omni-modality model inference (text, vision, audio in one framework), extending vLLM's high-throughput serving to multimodal models. Significant for production deployment of models like Gemini-style omni architectures.

A multimodal AI inference API service targeting media and e-commerce companies that need high-throughput processing of mixed text, image, and audio inputs — such as automated product cataloging or content moderation.

8

python 18,206 1,243 6,297 stars this week

volcengine/OpenViking

ByteDance's Volcengine open-sources OpenViking, a context database for AI agents that unifies memory, resources, and skills through a file-system paradigm with hierarchical context delivery and self-evolution. 6.3k stars this week suggests significant interest in structured agent context management.

A persistent agent memory management SaaS that gives enterprise AI assistants long-term, structured context about company knowledge, past interactions, and workflows using OpenViking's hierarchical context database.

9

NousResearch/hermes-agent

python 10,706 1,333 2,665 stars this week

NousResearch's open-source agent framework built around their Hermes model series, gaining significant traction (2,665 stars this week); worth watching as a capable open-weight agent stack.

A privacy-first AI assistant platform for law firms and consultancies that deploys fully on-premise using the open-weight Hermes agent stack, ensuring sensitive client data never leaves the organization.

10

typescript 13,500 1,027 4,586 stars this week

alibaba/page-agent

Alibaba's JavaScript in-page GUI agent enabling natural language control of web interfaces directly in the browser; strong traction (4,586 stars this week) and practical for web automation use cases.

A browser extension product for enterprise users that lets non-technical employees automate repetitive web-based workflows — like data entry across SaaS tools — using natural language instructions powered by page-agent.

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following. Ranked by weekly trending position.

1

Sebastian Raschka

@rasbt

rasbt/LLMs-from-scratch

Sebastian Raschka's GitHub profile, best known for the LLMs-from-scratch repo — a widely-used educational resource for building ChatGPT-style LLMs in PyTorch. Profile listing, not a new release.

2

comfyanonymous

@comfyanonymous

comfyanonymous/ComfyUI_examples

ComfyUI workflow examples repository from the creator of ComfyUI — useful reference for node-based diffusion pipelines but primarily example content rather than new tooling.

3

Jarrod Watts

@jarrodwatts

jarrodwatts/claude-hud

Claude Code HUD plugin surfacing real-time context usage, active tools, running agents, and todo progress — useful observability layer for Claude Code power users.

4

Matt Van Horn

@mvanhorn

mvanhorn/last30days-skill

AI agent skill that aggregates and synthesizes information from Reddit, X, YouTube, HN, Polymarket, and the web — useful multi-source research agent pattern.

Workspace-first multi-agent coordination platform for AI development with shared state — one of several emerging multi-agent orchestration frameworks targeting software development workflows.

CLI tool bridging Google's Stitch AI-generated UI designs into developer workflows via MCP — early-stage integration tool with limited documentation.

Universal CLI hub that wraps websites and apps into a command-line AI-native runtime — interesting concept but early-stage.

Design language specification aimed at improving AI-generated UI quality — interesting prompt engineering angle for design systems but early-stage.

cryppadotta/scryfall-mcp

MCP server wrapping the Scryfall Magic: The Gathering API — niche hobby project with minimal broader AI relevance.

11

Lawrence Chen

@lawrencecchen

lawrencecchen/awesome-libghostty

Curated list of libghostty projects — not AI-relevant.

Fazm Desktop macOS app — insufficient information to assess AI relevance.

13

qixing-jk

@qixing-jk

qixing-jk/all-api-hub

API relay manager for managing multiple AI API accounts with balance tracking and key export. Utility tool with no novel AI research value.

dreamhunter2333/cloudflare_temp_email

Cloudflare-based temporary email service — not AI-related.

CLI client for X API v2 — not AI-related.

Go compression library — not AI-related.

Papermark open-source DocSend alternative — not AI-related.

21

Alireza Rezvani

@alirezarezvani

alirezarezvani/claude-skills

+192 Claude Code skills & agent plugins for Claude Code, Codex, Gemini CLI, Cursor, and 8 more coding agents — engineering, marketing, pr…

22

Michael Ramos

@backnotprop

backnotprop/plannotator

Annotate and review coding agent plans and code diffs visually, share with your team, send feedback to agents with one click.

Squad: AI agent teams for any project

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week. Arena Elo = LMSys battle rating. Trending = HuggingFace trending score. Buzz = AI relevance (0–10).

Arena Leaderboard — Top 15

#	Model	Type	Elo	Votes
1	claude-opus-4-6-thinking Anthropic	Closed	1502	11,801
2	claude-opus-4-6 Anthropic	Closed	1501	12,546
3	gemini-3.1-pro-preview Google	Closed	1493	14,677
4	grok-4.20-beta1 xAI	Closed	1492	7,396
5	gemini-3-pro Google	Closed	1486	41,762
6	gpt-5.4-high OpenAI	Closed	1485	4,965
7	gpt-5.2-chat-latest-20260210 OpenAI	Closed	1482	10,140
8	grok-4.20-beta-0309-reasoning xAI	Closed	1481	4,504
9	gemini-3-flash Google	Closed	1475	31,060
10	claude-opus-4-5-20251101-thinking-32k Anthropic	Closed	1474	37,036
11	grok-4.1-thinking xAI	Closed	1472	43,930
12	claude-opus-4-5-20251101 Anthropic	Closed	1469	41,976
13	claude-sonnet-4-6 Anthropic	Closed	1465	9,843
14	qwen3.5-max-preview Alibaba	Closed	1464	4,252
15	gpt-5.3-chat-latest OpenAI	Closed	1464	8,942

New & Trending Models

nvidia/Nemotron-Cascade-2-30B-A3B

5,346 downloads 212 likes 212 trending

Custom License 2026-03-18

NVIDIA's Nemotron-Cascade-2 is a 30B total / 3B active MoE reasoning model with an associated arxiv paper (2603.19220), combining SFT and RL post-training. The extremely high trending score (212) and novel cascade architecture for efficient reasoning make this the standout NVIDIA release this week.

Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

151,482 downloads 1,062 likes 339 trending

Open Source 2026-02-27

Qwen3.5-27B fine-tuned via knowledge distillation from Claude 4.6 Opus reasoning traces, achieving strong chain-of-thought performance at 27B scale. 151k downloads and 1k+ likes indicate significant community adoption; notable for distilling frontier closed-model reasoning into an open-weight model.

deepseek-ai/DeepSeek-V3.2

293,362 downloads 1,326 likes 23 trending

Open Source 2025-12-01

DeepSeek's V3.2 update with 293K+ downloads and 1326 likes, continuing the high-performance open-weight frontier model series. FP8 support and strong benchmark results make this a significant incremental release for practitioners running large open models.

microsoft/bitnet-b1.58-2B-4T

15,096 downloads 1,391 likes 34 trending

Open Source 2025-04-15

Microsoft's BitNet b1.58 2B model trained on 4 trillion tokens, implementing 1.58-bit quantization for extreme efficiency. Represents a meaningful step toward ultra-low-bit LLMs that can run on edge hardware with minimal memory footprint.

nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

103,484 downloads 283 likes 70 trending

Custom License 2026-03-10

NVIDIA's Nemotron-3 Super 120B MoE model (120B total, 12B active) using a novel 'latent-MoE' architecture with multi-token prediction, trained on a curated multilingual dataset. Strong downloads (103K) and a new architecture variant make this a notable open-weight frontier release.

openai/gpt-oss-120b

4,549,831 downloads 4,602 likes 28 trending

Open Source 2025-08-04

OpenAI's open-source 120B model released on HuggingFace with Apache 2.0 license, 4.5M+ downloads and an arxiv paper. Significant as OpenAI's first major open-weight release at frontier scale, with vLLM support and quantization variants.

zai-org/GLM-5

136,040 downloads 1,854 likes 52 trending

Open Source 2026-02-11

GLM-5 from Zhipu AI is a new MoE-based bilingual (EN/ZH) foundation model with 136K+ downloads and 1854 likes, representing a significant open-weight release with a DSA architecture variant worth tracking against Qwen and Llama families.

Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF

447,707 downloads 317 likes 97 trending

Open Source 2026-02-27

GGUF-quantized version of the Claude-distilled Qwen3.5-27B reasoning model for local inference. 447k downloads makes it one of the most-downloaded variants, enabling local deployment of frontier-distilled reasoning.

MiniMaxAI/MiniMax-M2.5

492,806 downloads 1,268 likes 70 trending

Custom License 2026-02-12

MiniMax-M2.5 is a large-scale mixture-of-experts text generation model with 492k downloads and 1.2k likes. Limited public technical details but strong download traction suggests competitive performance.

Multilingual-Multimodal-NLP/IndustrialCoder

287 downloads 30 likes 30 trending

Open Source 2026-03-13

Specialized code model targeting industrial hardware description languages (Verilog, CUDA, Triton) and chip/CAD design, backed by an arXiv paper (2603.16790). Niche but high-value domain where general LLMs underperform.

Qwen/Qwen3-Coder-Next

1,232,461 downloads 1,165 likes 38 trending

Open Source 2026-01-30

Next iteration of Qwen's coding-focused model with 1.2M+ downloads and strong trending score, suggesting a significant update to the Qwen3 coder series. Limited metadata but high community traction indicates practical utility for code generation tasks.

Tesslate/OmniCoder-9B

19,168 downloads 358 likes 134 trending

Open Source 2026-03-12

Multimodal coding model fine-tuned from Qwen3.5-9B supporting image-text-to-text for agentic coding tasks, with 19K+ downloads and strong trending. Targets code generation with visual context, a useful niche for developer tooling.

nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16

28,713 downloads 54 likes 54 trending

Custom License 2026-03-07

NVIDIA's Nemotron-3 Nano 4B model in BF16, part of a new Nemotron-H architecture series trained on specialized datasets including agentic, math, and competitive programming data. Compact model with strong post-training pipeline targeting edge/local deployment.

silx-ai/Quasar-10B

138 downloads 30 likes 28 trending

Open Source 2026-03-09

Quasar-10B is a linear attention model fine-tuned from Qwen3.5-9B-Base supporting up to 2M token context via GLA (Gated Linear Attention). Noteworthy for the extreme context length capability at 10B scale, though from a lesser-known lab.

Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-GGUF

130,983 downloads 137 likes 39 trending

Open Source 2026-03-03

Smaller 9B GGUF variant of the Claude-distilled reasoning model for resource-constrained local inference. Complements the 27B version for edge deployment.

Model Buzz

Mistral AI Releases Forge

hackernews 8/10 2026-03-23

GPT‑5.4 Mini and Nano

hackernews 8/10 2026-03-23

OpenCode – Open source AI coding agent

hackernews 8/10 2026-03-23

AI Agents Can Already Autonomously Perform Experimental High Energy Physics

arxiv 7/10 2026-03-23

Anthropic's Hidden Vercel Competitor "Antspace"

hackernews 7/10 2026-03-23

Show HN: Claude Code skills that build complete Godot games

hackernews 7/10 2026-03-23

openai/codex

promptfoo/promptfoo

unslothai/unsloth

vllm-project/vllm-omni