Weekly Intelligence

AI Quick Bites

March 09, 2026 · 361 items from 12 sources

Last refreshed: March 09, 2026 at 10:14 UTC

Highlights

The five most consequential developments in AI this week — selected from 361 items across 12 sources. These are the things an AI engineer, researcher, or founder needs to know.

02
NOBLE's nonlinear low-rank branches deliver up to 1.47x pretraining step speedup with only 4% extra parameters and 7% step overhead — a drop-in architectural improvement validated across LLMs, ViT, and VQGAN.
arxiv 2026-03-09 18 min
03
COLD-Steer enables inference-time LLM behavior steering with 50x fewer labeled examples by approximating fine-tuning dynamics without any parameter updates — directly useful for alignment and personalization applications.
arxiv 2026-03-09 18 min
04
Paired 'data analogies' across embodiments beat large-scale unpaired datasets by 22.5% for cross-robot transfer — a concrete data curation insight for anyone building generalist robot policies.
arxiv 2026-03-09 20 min
05
Backdoor Modality Collapse reveals that multi-modal backdoor attacks in diffusion models are fundamentally weaker than assumed, with critical implications for how we evaluate and defend against multimodal model attacks.
arxiv 2026-03-09 18 min

What Changed This Week

Week-over-week diff showing new arrivals, items gaining momentum, and topics that dropped off the radar.

AI Security

Novel attack vectors, jailbreak research, red-teaming findings, and defensive tools across the AI security landscape. Only items with genuine technical substance make it here.

KeygraphHQ/shannon
8/10
Shannon Lite is a fully autonomous AI pentester for web apps and APIs, achieving 96.15% (100/104 exploits) on a hint-free variant of the XBOW benchmark — the strongest published result on this benchmark. Represents a significant capability milestone for autonomous offensive AI security agents.
github 2026-03-09 5 min
Hardening Firefox with Anthropic's Red Team
8/10
Anthropic's red team collaborated with Mozilla to discover and patch real Firefox security vulnerabilities, with bugs attributed to Claude in official Mozilla security advisories. First high-profile example of an AI red team finding confirmed CVEs in major production browser software.
hackernews 2026-03-09 8 min
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
7.5/10
ICLR 2026 paper introduces honesty fine-tuning techniques to make LLMs self-report hidden objectives, addressing a core alignment auditing challenge. Directly relevant to detecting deceptive or misaligned agentic systems.
conferences 2026-03-09 20 min
A tool that removes censorship from open-weight LLMs
7/10
Open-source tool for removing safety fine-tuning from open-weight LLMs, sparking substantial HN discussion (83 comments) about model alignment and the fragility of RLHF-based safety measures. Directly relevant to AI safety research and the open-weight model risk surface.
hackernews 2026-03-09 5 min
Show HN: Golf Scanner – OSS tool to find and audit every MCP server
7/10
Open-source Go binary that discovers all MCP servers configured across IDEs and runs security audits — addressing a real emerging risk as engineers routinely connect AI agents to production systems without vetting MCP server permissions.
hackernews 2026-03-09 5 min
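The core idea behind such a scanner can be sketched in a few lines: walk known client config files and list every configured MCP server before an agent is allowed to use them. The config paths below are illustrative assumptions (real clients and locations vary by OS and IDE); only the `mcpServers` key follows the common MCP config convention — this is not Golf Scanner's actual implementation.

```python
import json
from pathlib import Path

# Illustrative config locations only -- real MCP clients and paths vary.
CANDIDATE_CONFIGS = [
    Path.home() / ".cursor" / "mcp.json",
    Path.home() / "Library" / "Application Support" / "Claude"
    / "claude_desktop_config.json",
]

def discover_mcp_servers(paths=CANDIDATE_CONFIGS):
    """Return (config_file, server_name, command) for every configured server."""
    found = []
    for path in paths:
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed config: skip, don't crash
        for name, spec in config.get("mcpServers", {}).items():
            found.append((str(path), name, spec.get("command", "?")))
    return found
```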
OBLITERATUS
7/10
OBLITERATUS by 'pliny-the-prompter' is a one-click model jailbreak/liberation tool and chat playground — high trending score (153) and AGPL license. Directly relevant to LLM safety red-teaming research.
huggingface_spaces 2026-03-09 3 min
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
7/10
THEMIS is a new multimodal benchmark for evaluating MLLMs on detecting scientific paper fraud (image manipulation, data fabrication) in real-world academic scenarios. Fills a genuine gap in evaluation tooling for integrity detection.
conferences 2026-03-09 20 min
Heads up: prompt injection payload targeting OpenClaw agents circulating in the wild
7/10
Real-world prompt injection payload targeting OpenClaw agents circulating in the wild, disguised as a post-context-compaction audit message to trick agents into reading attacker-controlled files. Concrete in-the-wild example of indirect prompt injection exploiting agent memory/tool-use patterns.
reddit 2026-03-09 3 min
The L in "LLM" Stands for Lying
6/10
High-engagement blog post (472 HN comments) arguing that LLM hallucination is structural rather than a fixable bug — the model generates plausible text rather than grounded truth. Useful framing for practitioners setting user expectations, though not novel research.
hackernews 2026-03-09 8 min
Anthropic Cowork feature creates 10GB VM bundle on macOS without warning
6/10
Claude Code's new Cowork feature silently downloads and installs a ~10GB VM bundle on macOS without user consent, raising significant concerns about transparency and permission models in agentic dev tools. 186 HN comments signal strong community concern.
hackernews 2026-03-09 5 min
steerling-8b
6/10
Steerling-8B is a novel causal diffusion LM with interpretability-first design, featuring concept-steering and masked diffusion architecture — interesting for alignment and interpretability research, though low downloads suggest early-stage.
huggingface_models 2026-03-09 4 min
Meta’s AI smart glasses and data privacy concerns
6/10
Investigation reveals Meta's Ray-Ban smart glasses workers have broad access to user video/audio data used for AI training, raising serious surveillance and data privacy concerns with real-world AI deployment implications.
hackernews 2026-03-09 6 min
This GitHub repo can permanently remove LLM censorship in 45 minutes. It's called Heretic.
6/10
Tweet about 'Heretic,' an open-source tool claiming to permanently remove refusal behaviors from local LLMs (Llama, Qwen, Gemma) via fine-tuning in 45 minutes — noteworthy as an alignment/safety concern and jailbreak-adjacent technique that bypasses prompt-level defenses entirely.
twitter 2026-03-09 2 min
BREAKING: researchers planted a single bad actor inside a group of LLM agents.
6/10
Research finding that a single malicious LLM agent embedded in a multi-agent network can prevent consensus — important adversarial robustness result for multi-agent system designers, though the tweet is thin on methodology.
twitter 2026-03-09 1 min
When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models
5/10
Discovers 'Backdoor Modality Collapse' in multimodal diffusion models — multi-modal attacks degenerate to single-modality dominance with negligible cross-modal interaction, revealing that high attack success rates mask fundamental reliance on a subset of modalities. Introduces TMA and CTI metrics to quantify this behavior.
arxiv 2026-03-09 18 min

Top Contributors

Authors and organizations making the biggest impact this week, ranked by cumulative relevance score across all sources.

Top Authors
#1
prithivMLmods
2 items · avg score 112.5
225.0
#2
FrameAI4687
1 item · avg score 182.0
182.0
#3
pliny-the-prompter
1 item · avg score 153.0
153.0
#4
r3gm
1 item · avg score 138.0
138.0
#5
multimodalart
2 items · avg score 66.5
133.0
#6
99.0
#7
mrfakename
1 item · avg score 88.0
88.0
#8
HuggingFaceM4
1 item · avg score 86.0
86.0
#9
microsoft
1 item · avg score 59.0
59.0
#10
selfit-camera
1 item · avg score 55.0
55.0
Top Organizations
#1
openclaw
1 item · avg score 358025.0
358025.0
#2
anthropics
3 items · avg score 70655.4
211966.3
#3
f
1 item · avg score 195658.0
195658.0
#4
shadcn-ui
1 item · avg score 141185.0
141185.0
#5
microsoft
1 item · avg score 117539.3
117539.3
#6
openai
2 items · avg score 47749.2
95498.4
#7
toeverything
1 item · avg score 85119.9
85119.9
#8
affaan-m
1 item · avg score 83857.0
83857.0
#9
ruvnet
4 items · avg score 17993.1
71972.2
#10
karpathy
2 items · avg score 34715.0
69430.0

Build Ideas

Actionable product ideas distilled from this week's highest-scoring research and discussions. Each includes specific use cases and the source material that inspired it.

LLM Acceptance Criteria Coach
A developer tool that prompts users to define explicit acceptance criteria, test cases, and edge cases before submitting any code-generation request to an LLM. The tool enforces a TDD-style discipline by blocking prompt submission until minimal criteria are set, then automatically validates LLM output against those criteria and flags hallucinated or untestable claims. This directly addresses the structural reliability gap that makes LLM-generated code unreliable in production.
IDE plugins for VS Code and JetBrains · CI/CD pipeline LLM code review gates · Enterprise AI coding assistants with audit trails · LLM-assisted test suite generation
https://blog.katanaquant.com/p/your-llm-... https://acko.net/blog/the-l-in-llm-stand...
Real-Time Streaming Voice
A production-ready streaming TTS middleware layer that wraps any LLM-based text-to-speech pipeline with prosodic boundary detection, enabling low-latency, natural-sounding synthesis from streaming text input. Using the boundary-aware early stopping technique from recent research, it prevents mid-word cuts and unnatural pauses that plague current real-time voice AI systems. This is a drop-in SDK for developers building voice agents, meeting bots, or read-aloud features.
Voice AI agents and call center bots · Real-time document and article read-aloud · Live captioning and audio narration tools · Multilingual voice interfaces for LLM apps
https://arxiv.org/abs/2603.06444v1
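The buffering idea can be sketched simply, using punctuation as a crude stand-in for learned prosodic boundaries (the cited research detects boundaries from model state, not regexes; this is an assumption-laden illustration, not the paper's method):

```python
import re

# Crude proxy for a prosodic boundary: clause-ending punctuation + whitespace.
BOUNDARY = re.compile(r"[.!?;,]\s")

def flush_on_boundaries(stream):
    """Yield synthesis-ready chunks only at likely prosodic boundaries,
    so the TTS engine never starts speaking mid-word or mid-clause."""
    buf = ""
    for token in stream:
        buf += token
        while (m := BOUNDARY.search(buf)):
            cut = m.end()
            yield buf[:cut].strip()
            buf = buf[cut:]
    if buf.strip():
        yield buf.strip()  # flush whatever remains at end of stream
```

In a real pipeline each yielded chunk would be handed to the TTS engine while the LLM keeps streaming, bounding latency to roughly one clause.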
Robot Demo Pairing Studio
A data management and annotation platform for robotics teams that structures paired cross-embodiment demonstrations, making it easy to record, align, and export 'data analogies' between different robot morphologies. The 22.5% transfer improvement from paired data over large unpaired datasets means most teams are leaving performance on the table due to poor data organization. This tool closes the gap between raw demonstration recordings and policy-ready training sets.
Cross-embodiment policy transfer for warehouse robots · Generalist robot policy training pipelines · Academic robotics lab data management · Robot simulation-to-real transfer workflows
https://arxiv.org/abs/2603.06450v1
Inference Behavior Steering API
A lightweight inference-time activation steering API built on the COLD-Steer technique that lets developers steer LLM behavior — tone, persona, factual focus, safety constraints — using only a handful of in-context examples rather than fine-tuning. With 95% steering effectiveness at 50x lower sample cost than baselines, this makes behavior customization accessible without model ownership or GPU budgets. Offered as a middleware layer that wraps any open or proprietary LLM endpoint.
Enterprise LLM persona and brand voice control · Safety constraint enforcement at inference time · Domain-specific assistant behavior tuning · A/B testing LLM output styles in production
https://arxiv.org/abs/2603.06495v1
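COLD-Steer's exact mechanism is in the paper; as a rough illustration of the activation-steering family it belongs to, here is a minimal difference-of-means sketch. All names and the steering rule are generic assumptions, not the paper's algorithm:

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Difference-of-means direction computed from a handful of labeled
    example activations (shape: [n_examples, hidden_dim])."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def steer(hidden, vec, alpha=1.0):
    """Nudge a layer's hidden state toward the target behavior at
    inference time, with no parameter updates."""
    return hidden + alpha * vec
```

The appeal is that `vec` is computed once from a few examples and applied per-request, which is what makes sample costs so much lower than fine-tuning.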
Multimodal Backdoor Auditor
A security auditing tool for teams deploying multimodal diffusion models that automatically detects 'Backdoor Modality Collapse' — where adversarial attacks cause the model to silently rely on a single modality while appearing to use all inputs. Using the TMA and CTI diagnostic metrics introduced in recent research, the tool surfaces hidden modality dependencies and attack vulnerabilities before production deployment. This fills a critical gap in MLSecOps tooling for image-text and video-text pipelines.
Pre-deployment red-teaming for multimodal AI products · Compliance audits for AI systems in regulated industries · Research reproducibility checks on multimodal model claims · Continuous monitoring of fine-tuned diffusion models
https://arxiv.org/abs/2603.06508v1

Trending Repos

Repositories gaining serious momentum this week — sourced from GitHub Trending and TrendShift, enriched with commit velocity and contributor activity.

1
GH Trending
KeygraphHQ/shannon
TypeScript · 32,802 stars · 3,267 forks · 6,900 stars this week
Shannon Lite is a fully autonomous AI pentester for web apps and APIs, achieving 96.15% (100/104 exploits) on a hint-free variant of the XBOW benchmark — the strongest published result on this benchmark. Represents a significant capability milestone for autonomous offensive AI security agents.
Build idea
A continuous web application security service that automatically runs autonomous penetration tests on staging environments before each deployment, delivering detailed exploit reports and remediation guidance without requiring human security researchers.
2
TrendShift
karpathy/autoresearch
Python · 8,700 stars · 1,200 forks
Karpathy's AI agent system that autonomously runs ML research experiments on single-GPU nanochat training setups. Represents a concrete step toward self-directed AI research loops — high signal given the author's track record.
Build idea
A cloud platform where ML teams submit research hypotheses and receive fully automated experiment results, ablation studies, and model comparisons — turning a single GPU overnight into a junior researcher's week of work.
3
GH Trending
openai/codex
Rust · 64,048 stars · 8,538 forks · 1,437 stars this week
OpenAI's official lightweight terminal-based coding agent written in Rust, accumulating 64K stars rapidly. Signals OpenAI's move to productize agentic coding workflows natively in the terminal, competing directly with Claude Code.
Build idea
A managed enterprise terminal coding agent service that integrates with corporate codebases via SSO and VPN, providing audit logs, usage controls, and compliance guardrails on top of agentic coding workflows for regulated industries.
4
GH Trending
LMCache/LMCache
Python · 7,586 stars · 985 forks · 632 stars this week
High-performance KV cache layer for LLM inference that decouples cache from compute, enabling cross-instance cache sharing and significant latency/cost reduction. Gaining solid momentum (632 stars/week) as a production-grade inference optimization tool.
Build idea
A drop-in LLM inference optimization layer sold to AI-native companies that dramatically cuts their GPU costs by intelligently sharing and persisting KV caches across inference instances, offered as a managed service with a cost-savings guarantee.
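The core trick, reusing attention state across requests that share a token prefix, can be illustrated with a toy lookup. This is a conceptual sketch only, not LMCache's API: real systems store per-layer KV tensors, handle eviction, and share state across machines.

```python
class PrefixKVCache:
    """Toy cross-request KV cache keyed by token prefix."""

    def __init__(self):
        self._store = {}  # prefix tuple -> opaque KV state

    def put(self, tokens, kv_state):
        """Remember the KV state produced after prefilling `tokens`."""
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        """Return (matched_length, kv_state) for the longest cached prefix
        of `tokens`, so prefill can resume from there instead of token 0."""
        for end in range(len(tokens), 0, -1):
            state = self._store.get(tuple(tokens[:end]))
            if state is not None:
                return end, state
        return 0, None
```

With a shared system prompt of a few thousand tokens, skipping its prefill on every request is where the latency and GPU savings come from.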
5
GH Trending
QwenLM/Qwen-Agent
Python · 15,240 stars · 1,462 forks · 1,735 stars this week
Official agent framework from Alibaba's Qwen team supporting Qwen 3.0+, featuring MCP protocol integration, function calling, code interpreter, and RAG. Strong weekly growth (1,735 stars) signals growing adoption of Qwen as an agent backbone.
Build idea
A no-code enterprise AI agent builder that lets operations teams deploy Qwen-backed agents with pre-built connectors to internal tools like Salesforce, Jira, and Confluence, without writing a single line of code.
6
GH Trending
alibaba/OpenSandbox
Python · 7,058 stars · 520 forks · 3,959 stars this week
Alibaba's general-purpose sandboxed execution platform for AI agents, supporting multi-language SDKs, Docker/Kubernetes runtimes, and scenarios including coding agents, GUI agents, and RL training. Notable breakout week (3,959 stars) for a production-grade agent execution infrastructure.
Build idea
A sandboxed AI agent execution cloud that lets developers deploy untrusted or experimental agents in fully isolated, billable runtime environments with usage metering, making it safe and economical to run third-party AI agents in production.
7
TrendShift
anthropics/claude-code
Shell · 74,900 stars · 6,000 forks
Anthropic's official agentic coding CLI dominating GitHub trends with 74,900 stars and a surrounding ecosystem explosion this week. The anchor project driving most of the claude-skills/OpenClaw activity in this digest.
Build idea
A SaaS layer on top of Claude Code that provides team-wide session management, shared context, cost allocation by developer or project, and compliance logging for enterprises adopting agentic coding at scale.
8
GH Trending
anthropics/skills
Python · 87,793 stars · 9,307 forks · 7,152 stars this week
Anthropic's official public repository for Agent Skills — the canonical source for the claude-skills/OpenClaw ecosystem that has spawned dozens of derivative repos this week. 87,793 stars and 7,152 new this week make it the most-watched AI dev tools repo in this cycle.
Build idea
A marketplace where developers publish, monetize, and discover verified Claude Agent Skills — earning revenue-share each time their skill is invoked by other users' Claude Code workflows.
9
GH Trending
bytedance/deer-flow
Python · 26,391 stars · 3,117 forks · 3,150 stars this week
ByteDance's open-source SuperAgent framework with 26k+ stars that orchestrates research, coding, and content creation via sandboxed subagents with memory and tool use. Gaining serious traction as a multi-agent harness for complex long-horizon tasks.
Build idea
A research and competitive intelligence SaaS that deploys DeerFlow-based multi-agent pipelines to continuously monitor industries, synthesize findings from the web, and deliver structured briefings to executive teams on a scheduled basis.
10
GH Trending
inclusionAI/AReaL
Python · 4,570 stars · 381 forks · 969 stars this week
Fast reinforcement learning framework for LLM reasoning and agent training, emphasizing simplicity and flexibility. With nearly 1k stars gained this week, it's gaining traction as a practical alternative to complex RL pipelines for post-training LLMs.
Build idea
A fine-tuning platform for AI teams that uses AReaL to apply reinforcement learning post-training to custom LLMs, improving domain-specific reasoning (legal, medical, finance) without requiring deep RL expertise in-house.

Trending Developers

Developers gaining traction on GitHub this week — shipping open-source AI tools, models, and frameworks worth following.

1
Robert Allen
@zircote
zircote/rlm-rs
Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code, enabling processing of documents 100x larger than context windows via recursive summarization — practically useful for large-codebase agentic workflows.
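The recursive-summarization pattern behind RLM can be sketched in a few lines, assuming a fixed-window summarizer callable (`summarize` here is a placeholder for an actual LLM call, and must shrink its input for the recursion to terminate; this is not rlm-rs's implementation):

```python
def recursive_summarize(text, summarize, max_chars=4000):
    """Summarize text of any length with a fixed-window summarizer:
    split into window-sized chunks, summarize each, then recursively
    summarize the concatenated summaries until one window suffices."""
    if len(text) <= max_chars:
        return summarize(text)
    parts = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    merged = "\n".join(summarize(p) for p in parts)
    return recursive_summarize(merged, summarize, max_chars)
```

A real implementation would chunk on token counts and semantic boundaries rather than raw character offsets.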
2
Benson Wong
@mostlygeek
mostlygeek/llama-swap
llama-swap enables reliable model swapping across local OpenAI/Anthropic-compatible servers including llama.cpp and vLLM. Practical tooling for local inference setups managing multiple models.
3
zhayujie
@zhayujie
zhayujie/chatgpt-on-wechat
CowAgent is a multi-platform LLM-powered super assistant supporting autonomous task planning, OS/web access, skill creation, and long-term memory, with integrations for WeChat, DingTalk, and Lark across OpenAI/Claude/Gemini/DeepSeek/Qwen backends.
4
Brady Gaster
@bradygaster
bradygaster/squad
Squad is a GitHub project for building AI agent teams. Minimal context provided to evaluate technical novelty.
5
David East
@davideast
davideast/stitch-mcp
CLI tool for bridging Google's Stitch AI UI design platform into developer workflows via MCP. Narrow use case with limited broader impact.
6
Nathan Brake
@njbrake
njbrake/agent-of-empires
Developer profile page listing AI coding agent tools (Claude Code, Codex CLI, etc.) — no substantive technical content.
7
Gunnar Morling
@gunnarmorling
gunnarmorling/1brc
Java performance challenge for aggregating 1B rows — not AI-related.
8
Karl Seguin
@karlseguin
karlseguin/http.zig
HTTP server for Zig — not AI-related.
9
Kim Morrison
@kim-em
kim-em/lean-zip
Lean theorem proving tooling — not directly AI/ML research.
10
Krille-chan
@krille-chan
krille-chan/fluffychat
Matrix messaging client — not AI-related.
11
mxsm
@mxsm
mxsm/rocketmq-rust
Apache RocketMQ reimplemented in Rust — not AI-related.
12
qixing-jk
@qixing-jk
qixing-jk/all-api-hub
API relay manager for LLM API keys — utility tool with minimal technical novelty.
13
Saúl Ibarra Corretgé
@saghul
saghul/txiki.js
Developer profile for txiki.js (tiny JS runtime) — not AI-related.
14
Stephen Berry
@stephenberry
stephenberry/glaze
C++ JSON/reflection library developer profile — not AI-related.
15
YuTengjing
@tjx666
tjx666/awesome-chrome-extension-boilerplate
Chrome extension boilerplate developer profile — not AI-related.
16
Yair Morgenstern
@yairm210
yairm210/Unciv
Open-source Civ V remake for Android/Desktop — not AI-related.
17
Austin Griffith
@austintgriffith
austintgriffith/ethskills
The missing knowledge between AI agents and production Ethereum.

Models & Benchmarks

New model releases, arena rankings, and benchmark results across frontier and open-source AI models this week.

Arena Leaderboard — Top 15
# · Model · Org · Type · Elo · Votes
1 claude-opus-4-6 Anthropic Closed 1504 9,170
2 claude-opus-4-6-thinking Anthropic Closed 1502 8,313
3 gemini-3.1-pro-preview Google Closed 1500 4,041
4 grok-4.20-beta1 xAI Closed 1491 5,280
5 gemini-3-pro Google Closed 1485 39,923
6 gpt-5.4-high OpenAI Closed 1479 3,503
7 gpt-5.2-chat-latest-20260210 OpenAI Closed 1479 5,786
8 gemini-3-flash Google Closed 1473 30,600
9 grok-4.1-thinking xAI Closed 1473 39,309
10 claude-opus-4-5-20251101-thinking-32k Anthropic Closed 1470 32,516
11 claude-opus-4-5-20251101 Anthropic Closed 1467 37,462
12 dola-seed-2.0-preview Bytedance Closed 1465 6,712
13 grok-4.1 xAI Closed 1462 43,536
14 gemini-3-flash (thinking-minimal) Google Closed 1462 22,846
15 gpt-5.4 OpenAI Closed 1457 3,417
New & Trending Models
sarvamai/sarvam-105b
1,389 downloads 178 likes 178 trending
Open Source 2026-03-03
Sarvam-105B is a new large multilingual model covering 22+ Indian languages with a custom MLA architecture, under Apache 2.0 — a landmark open model for Indic language AI with the highest trending score this week.
zai-org/GLM-5
228,106 downloads 1,750 likes 81 trending
Open Source 2026-02-11
GLM-5 from Zhipu AI (ZAI) is a MoE+DSA architecture model with 1750 likes, 228k downloads, MIT license, and strong benchmark results — major open-weight model release competing at the frontier.
LiquidAI/LFM2-24B-A2B
17,414 downloads 275 likes 50 trending
Custom License 2026-02-24
LiquidAI's LFM2-24B-A2B is a novel MoE architecture model with only 2B active parameters from 24B total, supporting 10 languages and targeting edge deployment; strong downloads (17k) and likes (275) suggest real community interest in the architecture.
MiniMaxAI/MiniMax-M2.5
435,012 downloads 1,134 likes 79 trending
Custom License 2026-02-12
MiniMax-M2.5 is a major open model release with 435k downloads and 1134 likes, using a custom MoE-style architecture with FP8 support and Azure deployment — one of the highest-traction new models this week.
Qwen/Qwen3-Coder-Next
1,176,160 downloads 1,092 likes 53 trending
Open Source 2026-01-30
Qwen3-Coder-Next from Alibaba's Qwen team is a next-generation code model with 1.17M downloads, signaling broad adoption and likely represents the upcoming iteration of the Qwen coding model line.
sarvamai/sarvam-30b
2,657 downloads 126 likes 126 trending
Open Source 2026-03-03
Sarvam-30B is the MoE sibling to Sarvam-105B, also covering 22+ Indian languages under Apache 2.0 — together these models represent a major milestone for open Indic-language foundation models.
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
15,720 downloads 280 likes 252 trending
Open Source 2026-02-27
Knowledge distillation of Claude 4.6 Opus reasoning traces into Qwen3.5-27B, producing a strong open-weight reasoning model with chain-of-thought; trending heavily with 280 likes and 15k+ downloads across the model family.
Nanbeige/Nanbeige4.1-3B
542,492 downloads 981 likes 66 trending
Open Source 2026-02-10
Nanbeige4.1-3B is a compact bilingual (EN/ZH) LLaMA-based model with an accompanying arXiv paper, 542k downloads and 981 likes — notable traction for a 3B model with academic backing.
allenai/Olmo-Hybrid-7B
16,395 downloads 40 likes 40 trending
Open Source 2026-01-28
AllenAI's OLMo-Hybrid-7B is a fully open (Apache-2.0) 7B model using a hybrid architecture, continuing the OLMo lineage of transparent, reproducible language model research.
guidelabs/steerling-8b
1,321 downloads 106 likes 22 trending
Open Source 2026-02-22
Steerling-8B is a novel causal diffusion LM with interpretability-first design, featuring concept-steering and masked diffusion architecture — interesting for alignment and interpretability research, though low downloads suggest early-stage.
openai/gpt-oss-20b
7,324,821 downloads 4,439 likes 30 trending
Open Source 2025-08-04
OpenAI's open-source 20B model (gpt-oss-20b) with 7.3M downloads and an associated arXiv paper represents a notable open weight release from OpenAI supporting vLLM, FP8, and MXFP4 quantization.
stepfun-ai/Step-3.5-Flash
254,792 downloads 697 likes 26 trending
Open Source 2026-02-01
StepFun's Step-3.5-Flash is a fast, Apache-2.0-licensed LLM with 254k downloads and accompanying arXiv papers. Competitive open-weight model from a Chinese lab worth benchmarking.
stepfun-ai/Step-3.5-Flash-Base
513 downloads 73 likes 73 trending
Open Source 2026-03-02
Base (pre-RLHF) version of Step-3.5-Flash, freshly released March 2026 with strong trending score. Useful for fine-tuning researchers wanting access to the raw pretrained weights.
tencent/Penguin-VL-8B
189 downloads 30 likes 30 trending
Open Source 2026-03-05
Tencent's Penguin-VL-8B is a vision-language model built on Qwen3-8B with a custom vision encoder, Apache-2.0 licensed. Competitive multimodal model from a major lab with an accompanying arXiv paper.
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
58,196 downloads 132 likes 123 trending
Open Source 2026-02-27
GGUF quantization of the Claude 4.6 Opus reasoning-distilled Qwen3.5-27B, enabling local inference of the distilled reasoning model with 58k downloads.

Trending Spaces

The hottest interactive demos and apps on HuggingFace Spaces this week — try them live.

UGI Leaderboard
DontPlanToEnd
docker · 1,561 likes · 34 trending
apache-2.0
The Uncensored General Intelligence Leaderboard tracks model behavior when safety guardrails are removed; useful reference for alignment researchers studying refusal training robustness, though the leaderboard itself is not new.
All Bench Leaderboard
FINAL-Bench
static · 37 likes · 37 trending
apache-2.0
A new aggregated benchmark leaderboard covering 90+ generative AI models across multiple evaluation metrics — useful for quick model comparisons but not technically novel.
Omni Video Factory
FrameAI4687
gradio · 444 likes · 182 trending
mit
Gradio space combining text-to-video, image-to-video, and video extension capabilities in one interface. High trending score but limited technical novelty — aggregates existing capabilities.
The Synthetic Data Playbook: Generating Trillions of the Finest Tokens
HuggingFaceFW
docker · 96 likes · 96 trending
HuggingFace FineWeb team's interactive playbook on generating synthetic training data at scale (trillions of tokens). Practical resource for practitioners building large-scale pretraining pipelines.
faster-qwen3-tts
HuggingFaceM4
docker · 134 likes · 86 trending
Optimized demo of Qwen3-TTS with faster inference, showing HuggingFace M4 team's work on accelerating the model. Useful for those evaluating open TTS options.
LFM2.5 1.2B Thinking WebGPU
LiquidAI
static · 87 likes · 40 trending
Liquid AI's LFM2.5-1.2B reasoning model running entirely in-browser via WebGPU — demonstrates non-transformer architecture viability for edge/client-side inference.
Qwen3-TTS Demo
Qwen
gradio · 1,645 likes · 51 trending
apache-2.0
Official demo for Qwen3-TTS, Alibaba's text-to-speech model with 1645 likes indicating strong community interest. Apache-2.0 licensed and competitive with commercial TTS offerings.
Wan2.2 Animate
Wan-AI
gradio · 4,896 likes · 43 trending
apache-2.0
Wan2.2 animation demo from Wan-AI with nearly 5k likes, one of the most popular open video generation spaces. Apache-2.0 licensed with strong community adoption.
FLUX.2 [Klein] 9B
black-forest-labs
gradio · 632 likes · 38 trending
Black Forest Labs' FLUX.2 Klein 9B image generation model demo — their latest iteration with 632 likes. Notable as an official release from the leading open image generation lab.
Free Unlimited Google Veo 3
deddytoyota
static · 48 likes · 31 trending
Unofficial space claiming free unlimited access to Google Veo 3 with NSFW content — likely a wrapper or spam. Not technically substantive.
Flux2 Klein Face Swap
linoyts
gradio · 89 likes · 33 trending
Face swap application built on FLUX.2 Klein 9B using LoRA fine-tuning. Demonstrates downstream application of FLUX.2 but limited technical novelty.
TRELLIS.2
microsoft
gradio · 1,206 likes · 59 trending
mit
Microsoft's TRELLIS.2 generates high-fidelity 3D assets from single images, with 1206 likes and MIT license. Second generation of a strong open-source 3D generation system.
Z Image Turbo
mrfakename
gradio · 2,497 likes · 85 trending
High-traction image generation space (2497 likes, trending breakout) suggesting a fast turbo-mode image model. Community adoption signals competitive quality/speed tradeoff.
Nano Banana PRO
multimodalart
gradio · 576 likes · 31 trending
mit
Nano Banana image generation demo exclusive to HuggingFace PRO users — 576 likes but limited public accessibility reduces broad relevance.
Qwen Image Multiple Angles 3D Camera
multimodalart
gradio · 1,862 likes · 96 trending
Uses Qwen vision model to generate consistent multi-angle views of objects with simulated 3D camera control — 1862 likes signals strong practitioner interest in novel controllable generation.

Conference Papers

Accepted papers from top AI conferences via OpenReview.

Showing accepted papers from active venues. Next deadlines: ICML 2026 (submissions open), NeurIPS 2026 (coming soon).

ICLR 2026 Pierre-Carl Langlais, Pavel Chizhov, Catherine Arnett et al. 2026-03-09
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
ICLR 2026 paper presents Common Corpus, claimed to be the largest openly licensed dataset for LLM pre-training, directly addressing copyright concerns in foundation model training. Critical resource for researchers needing legally clean training data at scale.
dataset · pre-training · large language models · open data · open science
ICLR 2026 Mouath Abu Daoud, Leen Kharouf, Omar El Hajj et al. 2026-03-09
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
MedAraBench provides a large-scale Arabic medical QA benchmark to address a significant NLP resource gap. Useful for multilingual medical AI researchers but narrow scope.
Dataset · Benchmark · Large Language Models · Arabic Natural Language Processing · Medical Question Answering
ICLR 2026 Zhiheng Chen, Ruofan Wu, Guanhua Fang et al. 2026-03-09
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Theoretical ICLR 2026 paper formalizing transformers as unsupervised learning algorithms through the lens of Gaussian Mixture Models, offering new in-context learning theory. Niche theoretical interest.
In-context learning · Gaussian Mixture Models · Theory
ICLR 2026 Ron Vainshtein, Zohar Rimon, Shie Mannor et al. 2026-03-09
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Task Tokens proposes a lightweight adapter method for behavior foundation models to support new tasks via token conditioning, enabling flexible humanoid control without full retraining. Incremental but practical contribution for robotics foundation models.
Reinforcement Learning · Hierarchical Reinforcement Learning · Behavior Foundation Models · Humanoid Control
ICLR 2026 Kaien Sho, Shinji Ito 2026-03-09
Submodular Function Minimization with Dueling Oracle
Theoretical work on submodular function minimization using pairwise comparison oracles, applicable to preference-based optimization. Highly specialized theoretical contribution with tangential ML relevance.
submodular minimization · dueling oracle · preference-based optimization
ICLR 2026 Rongjin Li, Zichen Tang, Xianghe Wang et al. 2026-03-09
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Introduces a benchmark testing MLLMs on scan-oriented reasoning over academic papers — a harder task than search — exposing significant capability gaps in autonomous research assistance. Useful for evaluating document-understanding models.
Multimodal Large Language Models · Academic Paper Reasoning · Scan-Oriented Reasoning
ICLR 2026 Peng Sun, Tao Lin 2026-03-09
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Proposes N-th order recursive consistent velocity field estimation to enable flexible step-count generation in diffusion/flow models without multi-component losses. Incremental improvement over consistency models.
Generative Models
ICLR 2026 Zeyu Feng, Haiyan Yin, Yew-Soon Ong et al. 2026-03-09
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MSTT introduces masked skill token training for hierarchical RL to enable policy transfer across environments with different dynamics — fully offline, without fine-tuning. Relevant for real-world robotics deployment where sim-to-real gaps are common.
Transfer Learning Skills Hierarchical RL Embodied AI
ICLR 2026 Shaojie Li, Pengwei Tang, Bowei Zhu et al. 2026-03-09
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Provides the first high-probability convergence and generalization bounds for SGDM in non-convex settings, filling a theoretical gap. Relevant for researchers working on optimizer theory but limited immediate practical impact.
Momentum nonconvex learning generalization
ICLR 2026 Artyom Sorokin, Nazar Buzun, Aleksandr Anokhin et al. 2026-03-09
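The object of these bounds is the standard SGD-with-momentum (heavy-ball) update. As an illustrative sketch only (a toy scalar non-convex objective, not the paper's setting or assumptions):

```python
import numpy as np

def sgdm_step(w, velocity, grad, lr=0.01, beta=0.9):
    """One heavy-ball SGDM step:
    v_{t+1} = beta * v_t + g_t,   w_{t+1} = w_t - lr * v_{t+1}."""
    velocity = beta * velocity + grad
    w = w - lr * velocity
    return w, velocity

# Toy non-convex objective f(w) = w^4 - w^2, with two symmetric minima
# at w = +/- 1/sqrt(2) and a local maximum at w = 0.
f_grad = lambda w: 4 * w**3 - 2 * w

w, v = 1.5, 0.0
for _ in range(500):
    w, v = sgdm_step(w, v, f_grad(w))
# Momentum carries the iterate through the flat region and it settles
# into one of the two minima (a stationary point with small gradient).
```

The `beta`/`lr` values here are generic defaults for illustration; the non-convex case is exactly where only in-expectation guarantees existed before, which is the gap the high-probability bounds close.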
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
Q-RAG introduces value-based (RL-trained) embedders for multi-step retrieval in long-context settings, addressing the single-step retrieval limitation in standard RAG. Novel training signal for embedders with practical multi-hop QA gains.
Reinforcement Learning RL QA Long-context RAG
ICLR 2026 Seongtae Hong, Youngjoon Jang, Jungseob Lee et al. 2026-03-09
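The multi-step retrieval loop that Q-RAG's value-trained embedders are meant to serve can be sketched at inference time. This is a toy illustration of why single-step retrieval fails on multi-hop questions, with a fixed dot-product scorer standing in for the learned embedder; the RL training signal itself is not shown:

```python
import numpy as np

def multi_step_retrieve(query_vec, doc_vecs, steps=2):
    """Toy multi-hop retrieval: at each step pick the highest-scoring
    unretrieved document, then fold its embedding into the query state
    so the next hop can reach documents the raw query scores poorly."""
    state = query_vec.copy()
    retrieved = []
    for _ in range(steps):
        scores = doc_vecs @ state
        scores[retrieved] = -np.inf      # never pick the same doc twice
        best = int(np.argmax(scores))
        retrieved.append(best)
        state = state + doc_vecs[best]   # simple additive state update
        state /= np.linalg.norm(state)
    return retrieved

query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.6, 0.85, 0.0],   # bridge doc: relevant to the query AND to doc 1
    [0.0, 1.00, 0.0],   # answer doc: near-zero score against the raw query
    [0.5, 0.00, 0.5],   # distractor: decent raw-query score, a dead end
])
# Hop 1 fetches the bridge doc; hop 2 then reaches the answer doc,
# whereas a single-step retriever would rank the distractor second.
```

Q-RAG's contribution is training the embedder so these hop-by-hop scores behave like an RL value function; the additive state update above is a placeholder for that learned behavior.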
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Cross-lingual alignment technique for improving semantic proximity in multilingual information retrieval embeddings. Solid but incremental work in a well-studied area.
Cross-Lingual Alignment Information Retrieval Multilingual Embedding Cross-Lingual Information Retrieval
ICLR 2026 Rahul Ramachandran, Ali Garjani, Roman Bachmann et al. 2026-03-09
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Systematic ICLR 2026 benchmark of GPT-4o, o4-mini, Gemini 1.5 Pro/Flash on standard computer vision tasks reveals where multimodal foundation models still fall short versus task-specific models. Useful calibration for practitioners deciding when to use VLMs versus specialized CV models.
vision benchmark multimodal foundation models vision language models standard computer vision tasks
ICLR 2026 Tin Hadži Veljković, Erik J Bekkers, Michael Tiemann et al. 2026-03-09
CORDS - Continuous Representations of Discrete Structures
CORDS proposes continuous representations for variable-cardinality discrete structure prediction (object detection, molecular modeling) via neural fields. Interesting theoretical angle but limited demonstrated impact.
Continuous set representations Neural fields Variable-cardinality prediction Invertible encoding/decoding Diffusion and flow matching
ICLR 2026 Christopher Mitcheltree, Vincent Lostanlen, Emmanouil Benetos et al. 2026-03-09
SCRAPL: Scattering Transform with Random Paths for Machine Learning
SCRAPL makes scattering transform-based perceptual losses computationally tractable via random path sampling, enabling use in audio and vision deep inverse problems. Niche signal processing / generative audio contribution.
scattering transform wavelets stochastic optimization ddsp perceptual quality assessment
ICLR 2026 Antanas Žilinskas, Robert Noel Shorten, Jakub Marecek et al. 2026-03-09
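The tractability trick is the familiar one of replacing an expensive sum over all scattering paths with an unbiased per-step sample. A minimal sketch of that estimator, with simple weighted distances standing in for real per-path scattering losses:

```python
import numpy as np

def full_loss(x, y, path_losses):
    """Expensive loss: a sum of per-path terms over all scattering paths."""
    return sum(pl(x, y) for pl in path_losses)

def sampled_loss(x, y, path_losses, rng):
    """Unbiased one-sample estimate: draw a random path uniformly and
    rescale by the path count, so E[sampled_loss] == full_loss."""
    pl = path_losses[rng.integers(len(path_losses))]
    return len(path_losses) * pl(x, y)

# Stand-in "paths": weighted absolute distances (illustrative only).
path_losses = [
    (lambda x, y, w=w: w * abs(x - y)) for w in (0.5, 1.0, 2.0, 4.0)
]
# Averaged over many optimization steps, the sampled loss matches the
# full sum while paying for only one path evaluation per step.
```

Each gradient step sees one random path instead of all of them, which is what makes scattering-based perceptual losses affordable inside a training loop.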
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
EVEREST is a transformer architecture for rare-event forecasting in multivariate time series, combining evidential deep learning and extreme value theory to handle severe class imbalance. Solid niche contribution for anomaly detection use cases.
Transformer models Uncertainty quantification Evidential deep learning Extreme value theory Imbalanced classification
ICLR 2026 Harris Abdul Majid, Pietro Sittoni, Francesco Tudisco et al. 2026-03-09
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Introduces recurrent-depth neural simulators that allow test-time accuracy-cost trade-offs analogous to classical numerical methods, enabling adaptive compute for scientific simulation. Relevant for AI4Science practitioners.
Neural Simulator Recurrent Depth AI4Simulation
ICLR 2026 Kun XIE, Peng Zhou, Xingyi Zhang et al. 2026-03-09
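The accuracy-cost dial can be illustrated with a toy fixed-point iteration that reuses one shared block: more test-time unrolls cost more compute but land closer to the converged solution, mirroring how classical iterative solvers trade steps for accuracy. This is a sketch of the idea, not the paper's simulator architecture:

```python
import numpy as np

def recurrent_depth_forward(x, W, b, steps):
    """Apply one weight-shared block `steps` times; the step count is
    chosen at test time, trading compute for solution accuracy."""
    h = np.zeros_like(b)
    for _ in range(steps):
        h = np.tanh(W @ h + x + b)   # same weights reused at every depth
    return h

# Small-norm W makes the block a contraction, so iterates converge.
W = np.full((4, 4), 0.05)
b = np.array([0.5, -0.3, 0.2, 0.1])
x = np.array([1.0, -1.0, 0.5, 0.0])

ref = recurrent_depth_forward(x, W, b, steps=100)   # "expensive" answer
cheap = recurrent_depth_forward(x, W, b, steps=2)   # fast, rougher answer
# Extra test-time iterations shrink the gap to the converged output.
```

The point is that the same trained weights serve every budget: a deployment can dial `steps` up for accuracy-critical runs and down for cheap previews.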
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
PoinnCARE applies hyperbolic space multi-modal learning to enzyme function classification, capturing hierarchical EC number relationships better than Euclidean methods. Domain-specific bioinformatics contribution.
EC number prediction enzyme function hyperbolic space learning multi-modal learning enzyme structure
ICLR 2026 Tianqiao Liu, Xueyi Li, Hao Wang et al. 2026-03-09
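The reason hyperbolic space suits hierarchical labels like EC numbers is that distances near the Poincaré-ball boundary grow exponentially, so trees embed with low distortion. A sketch of the standard Poincaré-ball distance that such methods build on (this is the textbook formula, not PoinnCARE's specific model):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball:
    d(u, v) = arccosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    uu, vv = np.dot(u, u), np.dot(v, v)
    duv = np.dot(u - v, u - v)
    arg = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
    return np.arccosh(arg)

# A point at Euclidean radius 0.95 is far more than 9.5x as distant from
# the origin as one at radius 0.1 -- room to spread out deep hierarchy levels.
origin = np.zeros(2)
shallow = np.array([0.1, 0.0])
deep = np.array([0.95, 0.0])
```

Coarse EC classes can sit near the origin and fine-grained subclasses near the boundary, with distances respecting the hierarchy.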
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
ICLR 2026 paper argues that audio-language models for speech-to-speech conversation require non-autoregressive joint training to overcome latency and quality limitations of current autoregressive approaches. Relevant to the emerging voice AI space.
Large Multimodal Models Multi-token Prediction Non-Autoregressive Learning
ICLR 2026 Qinglong Yang, Haoming Li, Haotian Zhao et al. 2026-03-09
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
FingerTip 20K is an ICLR 2026 benchmark for proactive, personalized mobile GUI agents that act without explicit user instructions by inferring context. Advances the frontier of autonomous mobile agents beyond reactive instruction-following.
Mobile Agent LLM Agent GUI Proactive Agent Personalization
ICLR 2026 Tianxiang Dai, Jonathan Fan 2026-03-09
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Analyzes the spatial kernel of multi-resolution hash encodings (as used in NeRF/Instant-NGP) from a physical systems perspective to enable principled hyperparameter selection. Niche but useful for neural fields researchers.
multi-resolution hash encoding implicit neural representations neural fields point spread function spatial kernel analysis
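The object of this analysis is the Instant-NGP-style encoding: per resolution level, hash the grid vertices around a query point, interpolate their learned features, and concatenate across levels. A simplified 1-D sketch (assuming inputs in [0, 1); the hash constant is the usual multiplicative one, and table sizes/feature dims here are arbitrary):

```python
import numpy as np

def hash_encode_1d(x, tables, resolutions):
    """Simplified 1-D multi-resolution hash encoding: at each level,
    linearly interpolate the two hashed grid-vertex features that
    bracket x, then concatenate the per-level features."""
    feats = []
    for table, res in zip(tables, resolutions):
        pos = x * res                 # position in this level's grid
        i0 = int(np.floor(pos))
        w = pos - i0                  # interpolation weight in [0, 1)
        size = table.shape[0]
        f0 = table[(i0 * 2654435761) % size]        # hashed vertex lookups
        f1 = table[((i0 + 1) * 2654435761) % size]
        feats.append((1 - w) * f0 + w * f1)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
resolutions = [16, 64, 256]                         # coarse -> fine levels
tables = [rng.standard_normal((128, 2)) for _ in resolutions]
enc = hash_encode_1d(0.37, tables, resolutions)     # 3 levels x 2 features
```

The interpolation kernel and the level spacing are exactly the knobs whose effective spatial kernel the paper characterizes for principled hyperparameter choices.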

Deep Dive

All 361 items scored and categorized. Relevance scores reflect novelty, technical depth, and practical impact — 7+ items are the ones worth your time.
