🕵️

Undercover Mode — Stealth Attribution

Auto-strips AI identity when contributing to public repos
P2 · Security
Summary
~90 lines in undercover.ts. Auto-activated for Anthropic employees (USER_TYPE === 'ant') working in non-internal repositories. Strips Co-Authored-By attribution, forbids mentioning internal details, and prevents references to unreleased models in commit messages.
Technical Details
System prompt injected: "You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository. Your commit messages MUST NOT contain ANY Anthropic-internal information. Do not blow your cover." Forbidden strings include Capybara, Tengu, Opus 4.7, Sonnet 4.8, Claude Code, internal repo names, Slack channels, and internal short links. The code is dead-code-eliminated in external builds, so regular users never encounter it. Its existence nonetheless raises questions about AI companies contributing anonymously to open source.
Implementation Pattern
TypeScript (conceptual)
// Undercover mode activation (conceptual reconstruction)
const FORBIDDEN_STRINGS = [
  'Capybara', 'Tengu', 'Opus 4.7', 'Sonnet 4.8',
  'Claude Code', /* internal repo names, Slack channels */
];

// Minimal shapes for the types the check depends on.
interface User { type: 'ant' | 'external'; }
interface Repository { isInternal: boolean; }

function shouldActivate(user: User, repo: Repository): boolean {
  return user.type === 'ant' && !repo.isInternal;
}

function injectUndercoverPrompt(): string {
  return `You are operating UNDERCOVER in a PUBLIC repository.
Your commits MUST NOT contain Anthropic-internal information.
Do not blow your cover.`;
}
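The prompt-level prohibition above can also be enforced mechanically by scanning commit messages for forbidden strings before they leave the machine. This is a hedged sketch, not the actual source: `findLeaks` and `isSafeCommitMessage` are hypothetical helper names.

```typescript
// Hypothetical helpers (not from the leaked source): scan a commit
// message for forbidden strings and gate the commit on the result.
const FORBIDDEN: string[] = [
  'Capybara', 'Tengu', 'Opus 4.7', 'Sonnet 4.8', 'Claude Code',
];

// Case-insensitive scan; returns every forbidden string found.
function findLeaks(message: string): string[] {
  const lower = message.toLowerCase();
  return FORBIDDEN.filter((s) => lower.includes(s.toLowerCase()));
}

// Guard: a commit message is safe only if no forbidden string appears.
function isSafeCommitMessage(message: string): boolean {
  return findLeaks(message).length === 0;
}
```

A deterministic scan like this complements the prompt injection: the model is told not to leak, and the tooling verifies that it didn't.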
Architecture Insight
The irony is instructive: they built a system to prevent information leaks, then leaked the system itself. This validates the V5.3 principle: standards should be hooks (automated), not hope (manual).
Official / Public Basis
~90 lines in undercover.ts found in source. Dead-code-eliminated in external builds. Triggered only for Anthropic employees (USER_TYPE === 'ant').
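The dead-code elimination can be sketched with a compile-time constant: a bundler define flag replaces the gate with a literal, and the unreachable branch is dropped from external bundles. This is an assumed build setup; BUILD_INTERNAL is a hypothetical flag name, not the real configuration.

```typescript
// Sketch only: BUILD_INTERNAL stands in for a compile-time define
// (e.g. a bundler --define substitution set to false for external
// builds). With a constant false, the bundler can eliminate the
// entire internal branch, so the feature never ships externally.
const BUILD_INTERNAL: boolean = false;

function getUndercoverPrompt(): string | null {
  if (!BUILD_INTERNAL) {
    return null; // external build: branch below is eliminated
  }
  return 'You are operating UNDERCOVER in a PUBLIC repository.';
}
```

This pattern explains why regular users never see the feature: it is not hidden behind a runtime check alone, it is absent from the shipped artifact.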
Governance Concerns
Raises fundamental questions about AI transparency: should companies disclose when AI contributes to open-source? LightHope policy recommendation: always disclose AI involvement.
LightHope Ecosystem Mapping
LightHope — transparency policy decision (always disclose AI involvement vs. configurable), attribution standards for AI-generated content, internal vs. public contribution governance