Chapter 11.9

Insider Threat & Human-Layer Security

Insider threat is the one attack vector that runs through almost every other one, which is why it is the dominant gap keeping frontier programs at RAND Security Level 2 — and why the path to SL4-5 is bought with human-layer controls and organizational friction, not more cryptography.


What you'll decide here

How far up the two-person-rule ladder you go for weight-bearing operations — advisory logging, mandatory dual-approval, or hard cryptographic m-of-n — and what that costs you in research velocity and on-call latency.
Where the egress choke point lives and how tight you set it: who can move weights off the secure enclave at all, through which inspected path, at what rate cap — and the legitimate bulk-distribution workflows that policy will break.
Whether you compartmentalize weight access (need-to-know cells, separation of duties between the people who train, deploy, and audit) or run a flat-trust research org — and the talent and collaboration cost of the former.
What personnel-security depth your threat model justifies: background screening tier, re-investigation cadence, continuous-evaluation feeds, and the legal/privacy ceiling on behavioral monitoring in each jurisdiction where you operate.
How offboarding and vendor/contractor access revocation are engineered to be same-day, automatic, and provable, because the gap between termination intent and access revocation is the single most reliably exploited insider window.

Every chapter before this one hardened a boundary against an outsider: a fence against a drone (Chapter 11.2), a root of trust against a firmware implant (Chapter 11.4), a TEE against a privileged host (Chapter 11.5), a microsegment against lateral movement (Chapter 11.7), envelope encryption against a stolen disk (Chapter 11.8). The insider is the actor those controls were not built to stop, because the insider is inside the boundary by design — holding a valid badge, a valid credential, a legitimate reason to touch the asset, and often the very privileges the control plane grants to keep the cluster running. This is why RAND's analysis treats insider threat not as one of its 38 attack vectors but as a property that runs through most of them: the human with access can socially engineer, can exfiltrate, can plant, can disable, can simply copy. It is the reason the consensus assessment puts frontier labs at roughly Security Level 2 — defending against opportunistic outsiders and the casual insider, but not the determined one — and the reason the climb to SL4-5 is dominated by human-layer controls rather than more silicon.

This chapter applies the security calculus to people, where the question is not "secure vs insecure" but how much friction you impose on your own researchers and operators, and what that friction costs in the currency this guide cares about — goodput. Two-person rules slow every weight-bearing action. Compartmentalization fractures the collaboration that frontier research runs on. Behavioral monitoring collides with privacy law and with the trust that retains talent. Every control in this chapter buys down insider risk by spending something the business values, and the engineering discipline is to spend it where the threat model actually justifies it and nowhere else.

Why the insider is the dominant unaddressed vector

Start with the base rates, because they are worse than intuition suggests and they are getting worse. Across the general enterprise population, the 2025 Ponemon Cost of Insider Risks study, the largest the institute has run, puts the average annual cost of insider incidents at $17.4M. Negligent insiders drive ~55% of incidents and malicious insiders ~25%; credential theft accounts for ~20% but is the costliest per event, at ~$779,797 (Ponemon / DTEX, 2025). Verizon's 2025 DBIR, across 12,195 confirmed breaches, finds the human element present in ~60% of breaches and finds that the leading motive for deliberate misuse is now convenience (60%), as when a researcher emails a file to a personal account to work from home, ahead of financial gain (Verizon DBIR, 2025). The dwell time is the part that should frighten an AI operator: the average insider incident took 81 days to detect and contain in 2025, whereas an exfiltration of weights needs only the minutes it takes to copy a checkpoint.

Now apply the AI-data-center multiplier. The asset is not a customer database that loses value the moment it leaks and can be partially mitigated after the fact; it is a frontier model weight file — a single artifact, copyable in full, whose theft is permanent, whose value RAND treats as a national-security concern, and which an insider with legitimate access can carry out in one act. The standard enterprise insider playbook (DLP on email, a UEBA license, an annual training module) was calibrated for a world where the worst insider day costs low-seven-figures and is recoverable. The frontier-weights world inverts that: the worst insider day is a one-way door, and the people best positioned to open it are precisely the high-trust, high-access researchers and infrastructure engineers you cannot operate without.

You cannot encrypt your way past someone authorized to decrypt

RAND's framework assigns a Security Level by whether the program can stop an adversary trying to steal weights in under two months. The reason almost no operator is honestly above SL2-SL3 is not that their cryptography is weak — modern envelope encryption and GPU confidential computing are genuinely strong (Chapter 11.5, Chapter 11.8). It is that the insider holds the keys those controls were built to protect. You cannot encrypt your way past a person who is authorized to decrypt. Past SL3, every additional level is overwhelmingly bought with human-layer controls — two-person rules, compartmentalization, continuous personnel evaluation, hardened egress — each of which taxes the research it protects. That tradeoff, not the math, is the real frontier of weights security.

Personnel security: the screening-depth decision

The first fork is upstream of any technical control: who do you let near the asset at all, and how confident are you in them over time? Personnel security is a ladder, and each rung trades cost, hiring friction, and legal exposure against assurance. The naive enterprise default — a one-time pre-employment background check, then implicit lifetime trust — is the floor, and for frontier weights it is well below the threat model. A determined insider is rarely a bad hire on day one; they are a good hire whose circumstances, loyalties, or coercion exposure change in year three. A point-in-time check cannot see that. Continuous evaluation — periodic re-investigation, plus automated feeds (financial-distress signals, criminal records, in some regimes foreign-travel and contact reporting) — is the control that addresses the time dimension, and it is also the control that most directly collides with privacy law and with the culture of a research organization that recruited on autonomy and trust.

The consequence of climbing this ladder is not abstract. Higher screening tiers lengthen time-to-hire in a market where the scarce input is exactly the people you are screening; they shrink the candidate pool (a clearance-style requirement can exclude the majority of the global talent you would otherwise recruit); and they impose ongoing program cost and a compliance surface that varies by jurisdiction. Run a single global standard and you over-control in permissive jurisdictions and break the law in strict ones (EU/UK employment-monitoring and data-protection regimes sharply constrain what a US-style continuous-evaluation feed may even collect). The defensible answer is tiered: deep screening and continuous evaluation gated to the small population with standing weight access, lighter controls for the broad population — which is itself an argument for shrinking that high-access population in the first place.

Two-person rules and separation of duties: the friction dial

The two-person rule is the canonical insider control because it converts a single point of human failure into a conspiracy requirement: no one person can perform the sensitive action alone. The fork is how strong you make it, and the strength is a literal dial with a goodput cost at every setting. At the weak end, an advisory two-person rule logs and notifies but does not block: it deters the casual insider and creates an audit trail, costs almost nothing, and does nothing to stop a determined one. In the middle, mandatory dual-approval requires a second authorized human to approve before the action proceeds: now an exfiltration or a destructive change needs two colluding insiders, or one insider plus successful social engineering of the approver. At the strong end, cryptographic m-of-n splits the capability itself: a quorum of key-holders (Shamir secret sharing, threshold signatures, or hardware quorum on an HSM, Chapter 11.8) must each act for the weight to be decrypted, exported, or signed, so collusion must defeat the cryptography, not just the policy.
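To make the strong end of the dial concrete, here is a minimal sketch of Shamir secret sharing, one of the m-of-n mechanisms named above. It is an illustration under stated assumptions, not a production design: the field prime, the function names `split_secret` and `recover_secret`, and the use of a small integer as the "key" are all choices made for this example, and a real deployment would rely on a vetted library or an HSM quorum feature rather than hand-rolled code.

```python
import secrets

# Illustrative field modulus: 2**127 - 1 is a Mersenne prime.
# The secret must be an integer smaller than this value.
PRIME = 2**127 - 1


def split_secret(secret: int, m: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any m of them reconstruct it."""
    if not (1 <= m <= n):
        raise ValueError("need 1 <= m <= n")
    if not (0 <= secret < PRIME):
        raise ValueError("secret out of range for the field")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]

    def evaluate(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        return y

    # Share i is the point (i, f(i)); x = 0 is never handed out.
    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Example: a 3-of-5 quorum over a stand-in key value.
key = 123456789
shares = split_secret(key, m=3, n=5)
```

Any three of the five shares passed to `recover_secret` return the original key, while fewer than three carry no information about it. That is the property that turns the collusion floor from a policy statement into a mathematical one.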

The downstream cost is paid in latency and in the most expensive currency a frontier program has: researcher time. Make every weight read a dual-approval event and you have inserted a human in the loop of the inner research cycle; make it an m-of-n ceremony and you have made a 3am incident response wait on assembling a quorum. The engineering art is to apply the strong rule only to the irreversible, weight-bearing operations — export of a full checkpoint past the egress boundary, key rotation/revocation, deletion of the canonical copy, deployment of a new weights version to production — and to leave the high-frequency, reversible, in-enclave operations on advisory or no two-person rule at all. Frontier-lab practice is converging on a related pattern: no standing access to weight-bearing infrastructure, with engineers requesting time-limited, business-justified, peer-approved access per task (Anthropic's "multi-party authorization" design; OpenAI's insider-threat safeguards for unreleased weights, 2025). The just-in-time grant is two-person-by-construction without taxing every action equally.
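The just-in-time pattern can be sketched as a small data model. This is a hedged illustration only: `AccessGrant`, `issue_grant`, `is_active`, and the eight-hour ceiling `MAX_TTL` are hypothetical names and values chosen for the example, and they do not describe any lab's actual system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TTL = timedelta(hours=8)  # assumed ceiling for a single task window


@dataclass(frozen=True)
class AccessGrant:
    requester: str
    approver: str
    resource: str
    justification: str
    expires_at: datetime


def issue_grant(requester: str, approver: str, resource: str, justification: str,
                now: datetime, ttl: timedelta, approvers: frozenset[str]) -> AccessGrant:
    """Issue a time-boxed grant only if a distinct, authorized peer approved it."""
    if approver == requester:
        raise PermissionError("requester cannot approve their own access")
    if approver not in approvers:
        raise PermissionError("approver is not authorized for this resource")
    if not justification.strip():
        raise ValueError("a business justification is required")
    if not (timedelta(0) < ttl <= MAX_TTL):
        raise ValueError("ttl must be positive and within the task-window ceiling")
    return AccessGrant(requester, approver, resource, justification, now + ttl)


def is_active(grant: AccessGrant, user: str, resource: str, now: datetime) -> bool:
    """Access exists only for the named user and resource, inside the window."""
    return (grant.requester == user
            and grant.resource == resource
            and now < grant.expires_at)


# Example: a two-hour grant approved by a peer.
t0 = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
grant = issue_grant("alice", "bob", "weights-store", "eval run for release review",
                    t0, timedelta(hours=2), frozenset({"bob"}))
```

Because the grant expires by default, the absence of a revocation step cannot leave standing access behind; the cost is the per-task request overhead the paragraph above describes.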

The two-person-rule ladder: strength vs goodput cost

Control rung	What it blocks	Collusion floor	Goodput / latency cost	Where it fits
None / pure logging	Nothing in real time; deters only via after-the-fact audit	1 (any authorized person)	Zero	High-frequency reversible in-enclave ops
Advisory two-person (notify, don't block)	Casual/opportunistic insider; raises detection odds	1 (but visible)	Negligible	Routine privileged ops where speed matters
Mandatory dual-approval	Solo malicious insider acting alone	2 colluders, or 1 + social-engineered approver	Approver-availability latency; on-call friction	Sensitive but recoverable actions; config changes
Just-in-time, peer-approved, time-boxed grant	Standing-access abuse; dormant credential misuse	2 (requester + approver) per task window	Per-task request overhead; tooling investment	All weight-infrastructure access (frontier default)
Cryptographic m-of-n (quorum key release)	Solo and small-collusion theft of the capability itself	m colluding key-holders (defeat the crypto)	Quorum-assembly latency; painful in incidents	Weight export, key rotation, canonical-copy deletion

How the dual-control setting trades insider assurance against the friction it imposes. 'Collusion floor' is the minimum conspiracy required to defeat it. Apply the strong rungs only to irreversible, weight-bearing operations.

Separation of duties is the structural sibling of the two-person rule: rather than requiring two people for one action, it ensures that no single role spans an end-to-end abuse path. The person who can train and export a model should not also be the person who administers the audit log that would record the export; the person who provisions access should not be the person who reviews access grants; the person who writes deployment automation should not be the sole approver of what it deploys. Collapse these and you have built a single role that can both commit and conceal — the textbook insider precondition. The cost of enforcing separation is organizational: more roles, more handoffs, and a standing temptation for a small fast-moving team to re-merge duties "just to ship," which is exactly how the control quietly dies.
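One way to keep separation of duties from quietly dying is to check it mechanically whenever roles change. The sketch below assumes roles are expressed as sets of permission labels; the labels and the `TOXIC_COMBINATIONS` list are hypothetical names that mirror the three examples in the paragraph above, not an established taxonomy.

```python
# Permission names are hypothetical labels for the duties described in the text.
TOXIC_COMBINATIONS: list[tuple[frozenset[str], str]] = [
    (frozenset({"export_weights", "administer_audit_log"}),
     "can export and conceal the export"),
    (frozenset({"provision_access", "review_access_grants"}),
     "can grant access and approve the review"),
    (frozenset({"write_deploy_automation", "approve_deployment"}),
     "can write and approve a deployment"),
]


def sod_violations(role_permissions: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (role, reason) for every role that spans a commit-and-conceal path."""
    findings = []
    for role, perms in sorted(role_permissions.items()):
        for combo, reason in TOXIC_COMBINATIONS:
            if combo <= perms:  # the role holds every permission in the combination
                findings.append((role, reason))
    return findings


# Example: one merged role, two clean ones.
roles = {
    "ml_platform_admin": {"export_weights", "administer_audit_log", "train_model"},
    "researcher": {"train_model", "export_weights"},
    "auditor": {"administer_audit_log"},
}
violations = sod_violations(roles)
```

Running a check like this on every role change makes the "just to ship" re-merge visible at the moment it happens rather than at the next audit.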

Egress as the choke point

If you can stop weights from leaving, you have bounded the worst insider outcome regardless of who holds what credential inside. This is why egress is the single most important architectural choke point for the human layer — and why RAND's defensive recommendations center on hardening interfaces against weight exfiltration and centralizing weights to a small number of monitored, access-controlled systems. The design goal is that there is exactly one sanctioned path for a weight artifact to leave the secure enclave, that path is inspected and rate-limited, every other path is physically or logically severed, and the rate cap is set below the bandwidth a full checkpoint would require to leave unnoticed.
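The rate-cap reasoning is simple arithmetic, sketched below. The checkpoint size (2 TB), the line rate (400 Gb/s), and the 72-hour detection window are assumptions chosen for illustration; substitute your own figures.

```python
def transfer_seconds(checkpoint_bytes: float, rate_bits_per_s: float) -> float:
    """Time for a checkpoint of the given size to cross a link at the given rate."""
    return checkpoint_bytes * 8 / rate_bits_per_s


def cap_for_window(checkpoint_bytes: float, window_seconds: float) -> float:
    """Highest rate cap (bits/s) at which a full copy still takes the whole window."""
    return checkpoint_bytes * 8 / window_seconds


# Assumed inputs, for illustration only.
CHECKPOINT_BYTES = 2e12          # 2 TB checkpoint
LINE_RATE = 400e9                # 400 Gb/s uncapped path
DETECTION_WINDOW_S = 72 * 3600   # time you believe detection reliably takes

uncapped_s = transfer_seconds(CHECKPOINT_BYTES, LINE_RATE)      # 40 seconds
cap_bps = cap_for_window(CHECKPOINT_BYTES, DETECTION_WINDOW_S)  # about 61.7 Mb/s
```

Under these assumptions an uncapped path moves the whole checkpoint in 40 seconds, while a cap near 61.7 Mb/s stretches the same copy to 72 hours. The same cap applies to legitimate transfers on that path, which is the tension the design has to resolve with a separately controlled bulk-distribution workflow.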

The forks are concrete. Removable media: ban it outright at weight-bearing tiers (no USB mass storage, no writable optical, ports disabled in firmware) or allow it under dual control — the ban is simple and strong and occasionally infuriating for legitimate transfers. Outbound network: default-deny egress from the enclave with an explicit allowlist, deep inspection on the one sanctioned path, and hardware-enforced rate caps on the backend fabric (Chapter 11.7) — the cost is that inline inspection and caps at 400/800G are non-trivial to engineer and can throttle legitimate bulk distribution. The attestation-gated release from Chapter 11.5 closes the loop on the in-use side: weights decrypt only into an attested TEE, so even an insider with the ciphertext and the storage credentials cannot turn them into a usable model off the sanctioned hardware. The honest tension is that the tighter you draw egress, the more you break the routine, legitimate movement of multi-terabyte checkpoints between research, eval, and deployment — and teams that find the sanctioned path too slow will route around it, recreating the very exfiltration channel you closed.

The exception path is the attack path

Every egress control breeds an exception process for the legitimate cases it blocks — the urgent cross-org model handoff, the partner eval, the incident that needs data off the box now. That exception process, not the control, is where insiders operate, because it is staffed by humans under time pressure who are trained to say yes. If your break-glass procedure is a Slack message to a tired on-call engineer at 2am, you have a hole the size of your busiest night. Break-glass must be more controlled than the steady state, not less: pre-authorized scope, automatic dual-approval, full session recording, time-boxed, and reviewed after the fact as a security event. An exception path that is faster and quieter than the front door is not an exception path — it is the door the attacker uses.
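The break-glass requirements listed above can be expressed as a pre-flight check. This is a sketch under assumptions: the scope names, the two-hour limit, and the `BreakGlassRequest` fields are hypothetical, and the after-the-fact review is a process step that sits outside this code.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical pre-authorized scopes and time limit, for illustration.
PREAUTHORIZED_SCOPES = frozenset({"incident-data-pull", "partner-eval-handoff"})
MAX_DURATION = timedelta(hours=2)


@dataclass(frozen=True)
class BreakGlassRequest:
    requester: str
    scope: str
    approvers: tuple[str, ...]
    duration: timedelta
    session_recording: bool


def break_glass_violations(req: BreakGlassRequest) -> list[str]:
    """List every reason the request fails the break-glass requirements."""
    problems = []
    if req.scope not in PREAUTHORIZED_SCOPES:
        problems.append("scope is not pre-authorized")
    # Dual approval: at least one approver who is not the requester.
    if not (set(req.approvers) - {req.requester}):
        problems.append("no approver other than the requester")
    if not (timedelta(0) < req.duration <= MAX_DURATION):
        problems.append("duration is not time-boxed within the limit")
    if not req.session_recording:
        problems.append("session recording is off")
    return problems


# Example: one compliant request, one that fails every requirement.
ok = BreakGlassRequest("dana", "incident-data-pull", ("erin",), timedelta(hours=1), True)
bad = BreakGlassRequest("dana", "ad-hoc-copy", ("dana",), timedelta(hours=12), False)
```

An empty list means the request may proceed; anything else is a refusal with reasons. The point of encoding it is that the tired on-call engineer is no longer the control.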

PAM, behavioral monitoring, and the privacy ceiling

Privileged Access Management (PAM) is the operational spine of insider control: it removes standing administrative credentials, brokers time-limited just-in-time grants, vaults and rotates secrets, and — critically — records privileged sessions so that an insider with legitimate root cannot act invisibly. PAM is what makes the just-in-time grant and the no-standing-access pattern real rather than aspirational, and it is where insider control and control-plane secrets management (Chapter 11.7; key hierarchy in Chapter 11.8) physically converge. The decision is depth: session metadata only, full keystroke/command logging, or full screen recording of privileged sessions — each step buys forensic and deterrent value and adds storage, review burden, and employee-surveillance discomfort.

Behavioral monitoring / UEBA sits on top: baselining normal access patterns per user and alerting on deviation — the researcher who suddenly pulls checkpoints they have never touched, the off-hours bulk read, the access from a new device or geography, the resignation-correlated spike in data access that is the most reliable malicious-insider tell in the literature. The value is real and the limits are equally real. UEBA is a detection control, not a prevention control — it tells you an exfiltration is happening or has happened, against an asset where 'has happened' may already be unrecoverable. It generates false positives that erode analyst trust and, over time, get tuned into silence. And it runs straight into the privacy ceiling: pervasive employee monitoring is sharply constrained by EU/UK data-protection and works-council regimes, is a live cause of attrition among exactly the autonomous researchers you are trying to retain, and can poison the culture of trust that is itself a deterrence asset. The fork is not whether to monitor but how much, scoped to which population, under what legal basis, and with what transparency — and the answer legitimately differs by jurisdiction and by the sensitivity tier of the access.
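The baselining idea can be illustrated with the simplest possible detector: a per-user z-score on daily bytes read. Commercial UEBA products use far richer models; the threshold of 3.0, the seven-day minimum history, and the function name `is_anomalous` are assumptions made for this sketch.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], today: float,
                 threshold: float = 3.0, min_days: int = 7) -> bool:
    """Flag today's volume if it sits far above the user's own baseline."""
    if len(history) < min_days:
        return False  # not enough history to form a baseline
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        return today > mu  # perfectly flat baseline: any increase stands out
    return (today - mu) / sigma > threshold


# Example: daily gigabytes read by one user over eight days.
baseline_gb = [10, 12, 11, 9, 10, 13, 11, 10]
normal_day = is_anomalous(baseline_gb, 12)
bulk_read = is_anomalous(baseline_gb, 500)
```

Here a 12 GB day is within the baseline and a 500 GB day is flagged. The threshold is the false-positive dial described above: lower it and analysts drown, raise it and the detector is tuned into silence.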

~SL2: where consensus assesses frontier labs sit; insider threat is the dominant gap blocking SL4-5, which need human-layer controls, not more crypto. Source: RAND RRA2849-1 (Securing AI Model Weights) and IST SL5 Task Force, 2024-2025.

<2 months: RAND theft benchmark; a Security Level is defined by stopping an adversary attempting weight theft inside this window. Source: RAND RRA2849-1, 2024.

38 vectors: distinct attack vectors in RAND's model; insider threat spans most of them rather than being one isolated path. Source: RAND RRA2849-1 (5 SL, 5 OC tiers, 38 vectors), 2024.

$17.4M: average annual cost of insider risk per organization (largest Ponemon insider study to date). Source: Ponemon / DTEX 2025 Cost of Insider Risks, 2025.

~55% / ~25%: share of insider incidents that are negligent vs malicious; credential theft is ~20% but costliest at ~$779,797/event. Source: Ponemon 2025 Cost of Insider Risks, 2025.

81 days: average time to detect and contain an insider incident (down from 86 in 2023); far longer than a checkpoint copy takes. Source: Ponemon 2025 Cost of Insider Risks, 2025.

~60%: of breaches involve the human element; convenience (60%) now leads deliberate-misuse motive ahead of financial gain (33%). Source: Verizon 2025 DBIR (12,195 breaches), 2025.

no standing access: frontier pattern of time-limited, peer-approved, business-justified grants to weight infrastructure (multi-party authorization). Source: Anthropic Frontier Model Security; OpenAI frontier-risk, 2025.

Offboarding and the vendor/contractor surface

The most reliably exploited insider window is mundane: the gap between the moment a person's relationship with the organization ends (termination, resignation, contract expiry, project rotation) and the moment their access is actually revoked. A departing insider with a grievance and a still-live credential is the textbook malicious case, and the literature consistently finds data-access spikes correlated with resignations. The decision is whether offboarding is event-driven and automatic (an HR status change immediately deprovisions access on every system through identity automation, revokes keys, invalidates sessions, disables badges, and produces a provable revocation record) or ticket-driven and manual (someone files a request, IT works the queue, badges and SaaS grants linger for days). Manual offboarding does not fail loudly; it fails silently, leaving orphaned access that no one notices until an audit or an incident finds it. For weight-bearing access, same-day is not a target, it is the floor, and the only way to hit it reliably is to make deprovisioning a consequence of the HR event rather than a task that depends on a human remembering.
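Event-driven offboarding can be sketched as a handler that fans an HR separation event out to every access system and keeps a record of what was confirmed. The names `on_hr_separation`, `RevocationRecord`, and the per-system revoker callables are hypothetical; real identity platforms expose their own APIs, which are not modeled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# A revoker returns True once access for the user is confirmed removed.
Revoker = Callable[[str], bool]


@dataclass
class RevocationRecord:
    user: str
    started_at: datetime
    results: dict[str, bool] = field(default_factory=dict)

    @property
    def complete(self) -> bool:
        """True only if every system confirmed revocation."""
        return bool(self.results) and all(self.results.values())

    def pending(self) -> list[str]:
        """Systems where revocation failed or was not confirmed."""
        return sorted(s for s, done in self.results.items() if not done)


def on_hr_separation(user: str, revokers: dict[str, Revoker],
                     now: datetime) -> RevocationRecord:
    """Revoke access everywhere as a consequence of the HR event."""
    record = RevocationRecord(user, now)
    for system, revoke in revokers.items():
        try:
            record.results[system] = bool(revoke(user))
        except Exception:
            # A failed call is recorded as unrevoked so it surfaces loudly.
            record.results[system] = False
    return record


def _hsm_down(user: str) -> bool:
    raise RuntimeError("HSM API unavailable")  # simulated outage


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
clean = on_hr_separation("frank", {"sso": lambda u: True, "badge": lambda u: True}, t0)
partial = on_hr_separation("frank", {"sso": lambda u: True, "hsm": _hsm_down}, t0)
```

The record is the provable part: `partial.pending()` names the system that still holds live access, which turns a silent failure into an item someone must close the same day.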

Vendors and contractors are the same problem with weaker levers. They are inside your boundary — maintenance technicians in the data hall, integration partners with fabric access, managed-service staff with admin rights, neocloud and colo operators whose own employees can reach your racks (Chapter 11.6). You typically cannot screen them to your own standard, cannot monitor them under your own policy, and cannot offboard them through your own HR system, so their access tends to outlive its purpose. The controls that work are structural: escort and supervision for physical access to weight-bearing zones, time-boxed and scoped credentials that expire by default, no shared accounts (every contractor action attributable to a named human), and contractual flow-down of your security obligations with audit rights. The fork is supply-chain-shaped — see Chapter 11.3 for hardware provenance — but the human-access piece is yours to own: a contractor with a standing admin credential and no expiry is an insider you did not hire and cannot fire.
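The structural controls for contractor credentials lend themselves to a periodic audit. The sketch below assumes a credential inventory with an owner and an expiry; the `Credential` fields and finding strings are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Credential:
    account: str
    named_human: Optional[str]      # None marks a shared or generic account
    expires_at: Optional[datetime]  # None marks a standing credential


def credential_findings(creds: list[Credential], now: datetime) -> list[tuple[str, str]]:
    """Flag contractor credentials that violate the expire-by-default, no-sharing rules."""
    findings = []
    for c in creds:
        if c.named_human is None:
            findings.append((c.account, "shared account: actions not attributable"))
        if c.expires_at is None:
            findings.append((c.account, "no expiry: standing access"))
        elif c.expires_at <= now:
            findings.append((c.account, "expired but still present"))
    return findings


# Example inventory with one compliant and two non-compliant credentials.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
inventory = [
    Credential("vendor-tech-17", "G. Ortiz", datetime(2025, 6, 3, tzinfo=timezone.utc)),
    Credential("colo-admin", None, None),
    Credential("integrator-02", "H. Lindqvist", datetime(2025, 5, 1, tzinfo=timezone.utc)),
]
findings = credential_findings(inventory, now)
```

In this example the shared `colo-admin` account produces two findings and the lapsed integrator credential one, while the named, time-boxed technician credential passes.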

Deep dive: compartmentalization — the need-to-know cell vs the flat research org

The most powerful insider control is also the one most in tension with how frontier research actually works: compartmentalization — partitioning weight access into need-to-know cells so that no single insider can reach more than a slice of the crown jewels. In its strong form, the people who can read a given model's weights are a small named set; trainers, deployers, and auditors are different populations; pre-release weights live in a cell distinct from production; and movement of an artifact between cells is itself a two-person, egress-controlled event. The security payoff is large and direct: compartmentalization shrinks the high-access population (the lever that makes every other personnel control cheaper), it caps the blast radius of any single compromised or malicious insider, and it is the structural prerequisite for honestly claiming a higher Security Level — because RAND's higher SLs assume an adversary cannot simply recruit one well-placed person and get everything.

The cost is equally direct and is the reason most labs resist it. Frontier research runs on broad, fast, informal access to models, checkpoints, and each other's work; compartmentalization replaces that with boundaries, approvals, and handoffs. The productivity hit of moving from a flat-trust SL2 posture to a compartmentalized SL4 posture is real but largely unquantified in public; it is the open question the field has not answered. There is also a deception risk: cells that are drawn for security but cut across natural collaboration lines breed shadow-sharing, where researchers recreate informal access channels outside the controls, which is worse than no compartmentalization because it is invisible. The defensible path is to compartmentalize the asset aggressively (a small cell for full canonical weights, tight egress between cells) while keeping the research environment as open as the threat model allows (derived artifacts, sandboxed copies, eval harnesses), accepting that this is a perpetual negotiation between the security organization and the research organization rather than a setting you configure once.
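The blast-radius claim can be made measurable with a small calculation: for each person, the fraction of all weight artifacts reachable through the cells they belong to. The cell names, artifact names, and the function `blast_radius` are hypothetical and exist only for this sketch.

```python
def blast_radius(cell_members: dict[str, set[str]],
                 cell_artifacts: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of all artifacts each person can reach via cell membership."""
    all_artifacts: set[str] = set().union(*cell_artifacts.values())
    if not all_artifacts:
        return {}
    reach: dict[str, set[str]] = {}
    for cell, members in cell_members.items():
        for person in members:
            reach.setdefault(person, set()).update(cell_artifacts.get(cell, set()))
    return {person: len(items) / len(all_artifacts) for person, items in reach.items()}


# Flat-trust org: one cell, everyone reaches everything.
flat = blast_radius(
    {"research": {"alice", "bob", "carol"}},
    {"research": {"m1-pre", "m2-pre", "m1-prod", "m2-prod"}},
)

# Compartmentalized: pre-release and production are separate cells.
celled = blast_radius(
    {"prerelease": {"alice", "carol"}, "production": {"bob", "carol"}},
    {"prerelease": {"m1-pre", "m2-pre"}, "production": {"m1-prod", "m2-prod"}},
)
```

In the flat case every person's radius is 1.0. In the celled case it drops to 0.5 for anyone in a single cell, and the person who sits in both cells stands out at 1.0, which identifies where the deepest personnel controls should be concentrated.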

Culture, deterrence, and the limits of control

Every control in this chapter is defeatable by a sufficiently motivated, sufficiently well-placed insider, which is why the last layer is not technical at all: it is deterrence and culture. Deterrence is the credible expectation that an insider action will be detected, attributed, and consequential, and it is mostly free relative to its effect. Pervasive logging that employees know is comprehensive, attribution that ties every action to a named human (no shared accounts, ever), a visible insider-risk program, clear and enforced consequences, and a trusted, low-friction reporting channel for colleagues who notice something: these change the insider's expected-value calculation before any control is tested. The negligent majority (the ~55% of incidents that are carelessness, not malice) are addressed less by enforcement than by making the secure path the easy path, so that 'convenience misuse', the Verizon-leading motive, has nowhere to go.

The human layer is where security and the rest of the business are most directly opposed, and pretending otherwise produces worse security, not better. Over-control a research org and you do not get a secure lab — you get attrition of the people the controls were protecting, shadow workarounds that defeat the controls invisibly, and a culture of mutual suspicion that erodes the trust deterrence depends on. Under-control it and the base rates above tell you what happens. There is no setting that optimizes both; there is only a defensible, threat-model-driven balance that you revisit as the asset value, the regulatory environment, and the threat landscape move — and that you implement with enough transparency that the people inside the boundary understand why the friction exists. The governance regime that ratifies these tradeoffs and the metrics that prove they are working live in Chapter 11.11; the detection and response machinery that catches the insider who beats the controls lives in Chapter 11.12.

The insider threat is the human face of the asset protected in Chapter 11.8 (weights at-rest/in-transit/in-use) and bounded by the in-use attestation gate of Chapter 11.5. It is scored against the SL/OC framework introduced in Chapter 11.1, and it shares the egress choke point and secrets plane with Chapter 11.7. The vendor/contractor surface connects to supply-chain provenance in Chapter 11.3 and multi-tenant operator trust in Chapter 11.6. PAM session logging and behavioral telemetry ride the observability plane of Chapter 10.6, and the converged human-and-system incident command that catches the insider who beats these controls is detailed in Chapter 11.12 and Chapter 14.11. The governance, audit, and metrics that ratify the velocity-vs-security tradeoffs sit in Chapter 11.11; they are distinct from data governance and privacy, which are treated in Chapter 10.10.