Evidence Register

Structured, maintainer-facing register tracing extracted sources into openbotrisk’s site pages, signals taxonomy, and reading decisions.

Note

This is the project’s bibliographic memory: one row per source read, structured so that extracted evidence can be traced into site pages, the signals/techniques taxonomy, and reading decisions. It is not the narrative evidence review, which lives in the Foundations, Background, and Technical territory sections. Per GOVERNANCE.md §6 and EVIDENCE-REVIEW.md §6, the register is public, including sources read and rejected.

Purpose and scope

This register is the structured projection of the per-source extraction entries in working/register-entries/. It exists so that the analytical fields the extraction prompt produces — evidence basis, signals / techniques, threat types, framing distance, what it cannot show — are queryable across the whole corpus rather than buried in free text.

It tracks every source read, with provenance and review state.
It records what kind of evidence each source actually is (evidence basis), so a vendor marketing claim is never silently treated as equivalent to a controlled study.
It cross-indexes sources by signal/technique family, so a reader can find which sources cover JA3/JA4, mouse dynamics, residential proxies, GAN/RL evasion, etc.
It tracks scarce-resource abuse as a cross-cutting tag family where relevant, not as a separate evidence category.
It carries the framing-distance ledger — the project’s central analytical discipline (EVIDENCE-REVIEW.md §5).
It records sources read and rejected, so they aren’t re-read.
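To make the cross-index idea concrete, here is a minimal Python sketch that groups sources by signal family using the signals / techniques column. It assumes that terms in a cell are separated by semicolons, as in most rows, and that a family can be recognised by simple keyword matching. FAMILY_KEYWORDS and build_cross_index are hypothetical names, not part of the project's tooling.

```python
from collections import defaultdict

# Hypothetical family -> keyword mapping, for illustration only; the register's
# actual families are maintained by hand in the cross-index.
FAMILY_KEYWORDS = {
    "JA3/JA4": ("ja3", "ja4"),
    "residential proxies": ("residential prox",),
    "CAPTCHA": ("captcha",),
}


def build_cross_index(signals_by_id: dict[str, str]) -> dict[str, list[str]]:
    """Map each signal family to the sorted SRC ids whose signals cell mentions it.

    signals_by_id maps a register id (e.g. "SRC-002") to its raw
    signals / techniques cell. Terms are split on semicolons and matched by
    case-insensitive substring, so comma-separated cells still match.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for src_id, cell in signals_by_id.items():
        terms = [t.strip().lower() for t in cell.split(";") if t.strip()]
        for family, keywords in FAMILY_KEYWORDS.items():
            if any(k in term for term in terms for k in keywords):
                index[family].add(src_id)
    # Sorted lists give a stable, diff-friendly output.
    return {family: sorted(ids) for family, ids in index.items()}
```

If you feed it the SRC-002 and SRC-088 cells from the vendor table below, the sketch files SRC-002 under JA3/JA4 and SRC-088 under residential proxies. Keyword matching like this can only propose cross-index rows. A maintainer still decides where each source belongs.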

The source of truth for any individual source is its entry file under working/register-entries/. This page does not re-read source material and does not infer beyond the entries.

How to update this register

Append-only by default.

One register id (SRC-NNN) per distinct source. Distinct sources from the same vendor stay as separate rows — they are never folded together. Selecting between overlapping sources is the page-writing step’s job, not the register’s.
Add one inventory row per new distinct source, in the relevant category sub-table.
Add the source to the framing-distance ledger and to any signals/techniques cross-index rows it belongs to.
If the source concerns scarce-resource abuse, add it to the scarce-resource abuse index and carry the conditional fields from the extraction entry.
Set evidence basis, provenance (agent + model), and review state from the entry’s run-metadata block — do not leave provenance blank.
Record sources read and rejected in the rejected table with the reason. Rejected entries are kept, not deleted — the row is the “don’t re-read” record.
Do not rewrite an existing judgement silently. If a judgement changes, add a dated note in the update log explaining why and preserve the prior context.
Keep current relevance separate from future relevance in the entry; the register surfaces current project impact only.
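Here is a minimal sketch of the append-only rule. It assumes the register has been loaded into an in-memory mapping from id to row. It also assumes that a new id continues from the highest SRC-NNN already assigned. The existing numbering has gaps, so that allocation rule is an assumption and not documented policy. next_src_id and append_row are hypothetical helpers. The real mechanics are specified in prompts/register-update-prompt.md.

```python
import re

# Register ids look like SRC-001; allow more digits as the register grows.
SRC_ID = re.compile(r"^SRC-(\d{3,})$")


def next_src_id(existing_ids) -> str:
    """Return the id after the highest SRC-NNN already assigned (assumed policy)."""
    numbers = [int(m.group(1)) for i in existing_ids if (m := SRC_ID.match(i))]
    return f"SRC-{max(numbers, default=0) + 1:03d}"


def append_row(register: dict, row: dict) -> str:
    """Add a new source row without touching existing rows.

    register maps id -> row (a hypothetical in-memory shape). An existing id is
    never overwritten: a changed judgement belongs in a dated update-log note.
    """
    src_id = row.get("id") or next_src_id(register)
    if src_id in register:
        raise ValueError(f"{src_id} already exists; record the change in the update log")
    register[src_id] = {**row, "id": src_id}
    return src_id
```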

Versioning and reconciliation. Re-extracting a source under a new prompt version, or with a different agent, produces a new versioned entry file alongside the old one; the old file is never overwritten or deleted. Filename convention: the stem is the source slug and the suffix marks the version or state. The agent and model are recorded inside the file’s run-metadata block, not in the filename:

<slug>.md — original extraction
<slug>.v2.md — re-extraction under prompt v2 (.v3, … as prompts advance)
<slug>.combined.md — a reconciliation of two or more extractions of the same source

A re-extraction does not create a new register row. It is added to the existing source’s row under the same SRC-NNN: list all of the source’s entry files in the entry file cell, and tag the cell with a reconciliation state, one of [single], [multiple — unreconciled], or [combined] (see appendix). The file canonical for citation is the .combined file if one exists, otherwise the latest version. Reconciling multiple extractions into a .combined file is done when useful; there is no obligation to reconcile immediately.
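The versioning and canonical-file rules above can be sketched in Python. The sketch assumes that entry filenames follow the <slug>.md / <slug>.vN.md / <slug>.combined.md convention exactly. It treats a bare <slug>.md as version 1 and assumes a source has at most one .combined file. reconcile is a hypothetical helper.

```python
import re

# "<slug>.md", "<slug>.vN.md" or "<slug>.combined.md"; the slug itself may contain dots.
ENTRY_FILE = re.compile(
    r"^(?P<slug>.+?)(?:\.v(?P<version>\d+)|\.(?P<combined>combined))?\.md$"
)
MULTIPLE = "[multiple \u2014 unreconciled]"  # em dash, matching the register's tag


def reconcile(files: list[str]) -> tuple[str, str]:
    """Return (reconciliation state, canonical file) for one source's entry files."""
    parsed = []
    for name in files:
        m = ENTRY_FILE.match(name)
        if m is None:
            raise ValueError(f"not an entry file name: {name}")
        # A bare <slug>.md is the original extraction, treated here as version 1.
        version = int(m["version"]) if m["version"] else 1
        parsed.append((name, version, m["combined"] is not None))
    combined = [name for name, _, is_combined in parsed if is_combined]
    if combined:
        return "[combined]", combined[0]  # assumes at most one .combined file
    latest = max(parsed, key=lambda p: p[1])[0]
    return ("[single]" if len(parsed) == 1 else MULTIPLE), latest
```

Take a source with a.md and a.v2.md. It resolves to the unreconciled state with a.v2.md as the canonical file. Once someone adds a.combined.md, that file becomes the canonical one.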

The mechanics of projecting a reviewed entry into this register (assigning ids, versioning, cross-index maintenance, the update-log line) are specified in prompts/register-update-prompt.md.

Migration state (2026-06-02)

This register was migrated from the flat working/reading-register.md. The earlier register stored everything analytical in a single free-text Notes column and did not record which agent/model produced each extraction. As a result:

provenance is not recorded for every migrated row. It is populated going forward from the extraction prompt’s run-metadata block (v2).
evidence basis and signals / techniques are migrated from the old one-line notes where those notes supported them; otherwise marked see entry.
The framing-distance ledger and threat types are mostly tbd — backfill from entry: those fields live in the per-source entry files (working/register-entries/), which were not re-read during migration.
review state is migrated — review pending for all rows: extraction quality cannot be judged from a one-line note.

Completing the register is a mechanical backfill pass: an agent reads each working/register-entries/<slug>.md, lifts evidence basis, signals / techniques, threat types, framing distance, and what it cannot show, and fills the stubs. Treat any cell marked tbd/see entry/not recorded as unverified until that pass runs.
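A backfill pass first has to find the stub cells. Here is a minimal sketch, assuming each row has already been parsed into a column-to-cell mapping. The marker list mirrors the three placeholders named above, and the helper name is hypothetical.

```python
# The three placeholder markers the migration note says to treat as unverified.
UNVERIFIED_MARKERS = ("tbd", "see entry", "not recorded")


def unverified_cells(row: dict[str, str]) -> list[str]:
    """Return the columns whose cell still carries a migration placeholder.

    row maps column name -> cell text. Matching is a crude case-insensitive
    substring test, so a partially filled cell such as
    "not recorded; provisional draft" is still flagged for backfill.
    """
    return [
        column
        for column, cell in row.items()
        if any(marker in cell.lower() for marker in UNVERIFIED_MARKERS)
    ]
```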

Extraction inventory

Columns: id · source · org / authors · year · evidence basis · operational proximity · signals / techniques · threat types · provenance (agent / model) · review state · project impact · entry file. Vocabulary for the ordinal/controlled columns is defined in the appendix.
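Here is a minimal sketch of reading one inventory row into named fields. It assumes rows are stored one per line with tab-separated cells, as in the tables below. It uses the short column names from the table headers, so the column is provenance rather than provenance (agent / model). parse_inventory_row is a hypothetical helper.

```python
# Column order as used in the inventory table headers.
COLUMNS = (
    "id", "source", "org / authors", "year", "evidence basis",
    "operational proximity", "signals / techniques", "threat types",
    "provenance", "review state", "project impact", "entry file",
)


def parse_inventory_row(line: str) -> dict[str, str]:
    """Split one tab-separated inventory row into a column -> cell mapping."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    if not cells[0].startswith("SRC-"):
        raise ValueError(f"not a source row: {cells[0]!r}")  # e.g. the header row
    return dict(zip(COLUMNS, cells))
```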

Foundations

id	source	org / authors	year	evidence basis	operational proximity	signals / techniques	threat types	provenance	review state	project impact	entry file
SRC-001	Automated Threats to Web Applications (project page)	OWASP	n.d.	taxonomy	n/a (taxonomy)	OAT category set	All OAT (taxonomy)	not recorded	migrated — review pending	OAT taxonomy spine for threat-type vocabulary; project page only, full Handbook queued	`owasp-automated-threats-to-web-applications.md`
SRC-027	Automated Threat Handbook: Web Applications v1.3	OWASP / Watson & Zaw	2026	taxonomy	n/a (taxonomy)	21 OAT categories; countermeasure classes; symptoms; fingerprinting/reputation/rate/monitoring classes	All OAT (taxonomy)	not recorded; provisional draft	needs review	Full OAT Handbook taxonomy and countermeasure-class reference; provisional extraction produced without repo scope docs, so verify before citation	`[single]`; canonical: `owasp-automated-threat-handbook-v1v3.md`
SRC-028	How to Get and Use Cookies in Playwright	armanabbasi / Medium	2023	capability-doc	capability	Playwright browser contexts; cookie extraction; session/authentication state capture	Not threat-specific	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2	needs review	Low-level foundations example showing browser automation can access and preserve cookie/session state	`[single]`; canonical: `medium-playwright-cookies-source-extraction.md`
SRC-041	Top 15 Scraper Sites to Enhance Your Data Collection Skills	ScrapingBee	2026	capability-doc	capability (training / sandbox)	scraping practice sites; static HTML; pagination; authentication; cookies/sessions; JSON APIs; JavaScript rendering; proxy management; CAPTCHA handling	Not threat-specific; scraper skill-building and production-path context	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Low-priority foundations/context source for how scraping skills are taught and how vendors frame the move from sandbox practice to production scraping	`[single]`; canonical: `scrapingbee-2026-scraper-test-sites(1).md`
SRC-059	Cross-Origin Resource Sharing (CORS)	MDN Web Docs	2026	reference-doc	n/a (foundational reference)	CORS; same-origin policy; Origin header; preflight OPTIONS; Access-Control-* headers; credentialed cross-origin requests	Not threat-specific; browser-side cross-origin data access and CSRF-adjacent reasoning	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Foundation reference for CORS; useful mainly to prevent treating CORS as a general anti-scraping defence	`[single]`; canonical: `mdn-2026-cors(1).md`
SRC-060	HTTP caching	MDN Web Docs	2026	reference-doc	n/a (foundational reference)	Cache-Control; private/shared/proxy/CDN caches; ETag; Last-Modified; If-None-Match; Vary; conditional requests	Not threat-specific; cache-aware crawling, origin-load interpretation, and shared-cache privacy risk	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Foundation reference for caching and why repeated crawler/scraper requests may affect origin servers differently depending on cache behaviour	`[single]`; canonical: `mdn-2026-http-caching(1).md`
SRC-061	HTTP authentication	MDN Web Docs	2026	reference-doc	n/a (foundational reference)	401/403/407; WWW-Authenticate; Authorization; Proxy-Authenticate; Proxy-Authorization; Basic auth; bearer tokens	Not threat-specific; credential-bearing requests, proxy authentication, and access-control boundary concepts	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Foundation reference for separating HTTP authentication, proxy authentication, login sessions, and account-abuse concepts	`[single]`; canonical: `mdn-2026-http-authentication(1).md`
SRC-062	Using HTTP cookies	MDN Web Docs	2026	reference-doc	n/a (foundational reference)	Set-Cookie; Cookie; session IDs; session/permanent cookies; Secure; HttpOnly; SameSite; Domain/Path; session fixation	Not threat-specific; session state, tracking, account takeover, scraping behind login, and cookie-continuity detection	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Core foundation source for explaining how stateless HTTP becomes stateful through cookies and sessions	`[single]`; canonical: `mdn-2026-using-http-cookies(1).md`
SRC-063	User-Agent header	MDN Web Docs	2026	reference-doc	n/a (foundational reference)	User-Agent strings; browser/crawler/tool identification; User-Agent reduction; Client Hints; Navigator.userAgent	Not threat-specific; crawler identification, spoofing, browser impersonation, and passive fingerprinting surface	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Foundation entry for what User-Agent is and why it sits between compatibility, crawler identity, bot detection, and privacy	`[single]`; canonical: `mdn-2026-user-agent-header(1).md`
SRC-064	HTTP headers	MDN Web Docs	2026	reference-doc	n/a (foundational reference)	request/response/representation/payload headers; User-Agent; Accept-Language; Cookie; Authorization; CORS; cache; proxy headers	Not threat-specific; header-based detection, spoofing/mismatch checks, session/authentication-bearing requests, and proxy-aware analysis	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Neutral vocabulary bridge between basic HTTP mechanics and later sources on header-order checks, proxy headers, and browser impersonation	`[single]`; canonical: `mdn-2026-http-headers(1).md`
SRC-065	Overview of HTTP	MDN Web Docs	2026	reference-doc	n/a (foundational reference)	HTTP requests/responses; client-server model; user-agents; browser resource fetching; proxies; cookies/sessions; authentication	Not threat-specific; foundation for scraping, crawling, request-pattern detection, and session-based abuse	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Plain-language foundation for browsers, crawlers, and scripts as user-agents making HTTP requests	`[single]`; canonical: `mdn-2026-overview-of-http(1).md`
SRC-069	OAuth 2.0 authentication vulnerabilities	PortSwigger Web Security Academy	2026	methods-taxonomy	capability	OAuth grant types; authorization endpoints; redirect URI validation; state parameter; authorization codes; access tokens; scope validation; OpenID Connect	OAuth authentication bypass; token/code leakage; forced profile linking; third-party authentication abuse; account takeover	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	OAuth / third-party-authentication foundation entry; broadens login-abuse coverage beyond password forms	`[single]`; canonical: `portswigger-2026-oauth-2-authentication-vulnerabilities(1).md`
SRC-070	How to secure your authentication mechanisms	PortSwigger Web Security Academy	2026	control-guidance	capability (defensive control guidance)	password strength checking; zxcvbn; generic errors; response-time equalisation; IP-based rate limiting; CAPTCHA; MFA/2FA; password reset/change flows	credential disclosure; username enumeration; brute-force login; password-reset abuse; weak MFA; credential-stuffing-adjacent login abuse	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Defensive counterpart to login-vulnerability entries; useful for “what closes down easy routes” without treating controls as proven sufficient	`[single]`; canonical: `portswigger-2026-secure-authentication-mechanisms(1).md`
SRC-071	Vulnerabilities in password-based login	PortSwigger Web Security Academy	2026	methods-taxonomy	capability	username enumeration; status-code/error-message/response-timing differences; brute-force wordlists; account locking; IP blocking; rate limiting; CAPTCHA; HTTP Basic Auth; Authorization header	brute-force login; credential stuffing; username enumeration; account takeover; basic-auth brute force	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Core foundation source for credential stuffing and password-login abuse mechanics; shows why simple IP/account-lock controls are partial	`[single]`; canonical: `portswigger-2026-password-based-login-vulnerabilities(1).md`
SRC-072	Authentication vulnerabilities	PortSwigger Web Security Academy	2026	vulnerability-taxonomy	capability	broken authentication; brute force; authentication bypass; password-based login; MFA weaknesses; third-party authentication; OAuth	account takeover; brute-force login; authentication bypass; post-login attack-surface expansion; high-privilege account compromise	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Foundation overview for authentication as an attack surface and how login abuse can lead to account takeover and follow-on exploitation	`[single]`; canonical: `portswigger-2026-authentication-vulnerabilities(1).md`
SRC-073	Digital Identity Guidelines: Authentication and Authenticator Management	NIST / Temoshok et al.	2025	standards-reference / authentication-guidance / control-requirements	control	AAL; MFA; phishing resistance; passwords; credential-stuffing throttling; session secrets; fraud indicators; browser cookies	Credential stuffing; credential cracking; account takeover; session abuse	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Standards foundation for authentication, sessions, rate limiting, and why fraud indicators do not replace authenticators	`[single]`; canonical: `nist-2025-sp-800-63b-4-authentication-authenticator-management(1).md`
SRC-090	HTTP/2 and HTTP/3 protocol foundations	IETF / Thomson, Benfield & Bishop	2022	protocol-standard / technical specification	n/a (foundational protocol standard)	HTTP/2; HTTP/3; streams; multiplexing; HPACK; QPACK; QUIC; ALPN; binary framing; protocol identifiers	Not threat-specific; protocol foundation for HTTP fingerprinting and request-layer analysis	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Current protocol-standard foundation for HTTP/2 and HTTP/3; use RFC 9113 for current HTTP/2 claims rather than RFC 7540	`[single]`; canonical: `rfc-9113-9114-http2-http3-protocol-foundations(1).md`
SRC-091	Multi-process Architecture	Chromium Projects	n.d.	browser-architecture reference / design documentation	n/a (foundational browser architecture)	browser process; renderer process; Blink; Mojo; IPC; sandboxing; GPU/network/storage services; process isolation	Not threat-specific; browser-native automation and browser-security foundation	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Foundation for explaining why modern browsers are not simple HTTP clients and why browser-native automation has a different threat surface	`[single]`; canonical: `chromium-multi-process-architecture(1).md`
SRC-092	Hypertext Transfer Protocol Version 2 (HTTP/2)	IETF / Belshe, Peon & Thomson	2015	protocol-standard / technical specification	n/a (historical protocol standard)	HTTP/2; binary framing; streams; multiplexing; HPACK; ALPN; h2/h2c; server push	Not threat-specific; historical HTTP/2 protocol foundation	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Historical HTTP/2 specification; keep for provenance/history, but use RFC 9113 / `SRC-090` for current HTTP/2 wording	`[single]`; canonical: `rfc-7540-http2(1).md`
SRC-093	Application Security Verification Standard, Version 5.0.0	OWASP Foundation	2025	standards-reference / control-requirements / defensive-guidance	control	anti-automation; business logic; rate limiting; session management; authentication; authorization; HTTP validation; logging	Credential stuffing; scraping; scalping; sniping; account creation; denial of inventory; expediting; DoS; account aggregation	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Defensive-control foundation connecting automated-abuse categories to verifiable application-security requirements	`[single]`; canonical: `owasp-2025-application-security-verification-standard-5-0(1).md`

Vendor and industry

Sources here are treated as evidence of what the field claims, not as independent proof. Efficacy and prevalence figures are vendor-measured.

id	source	org / authors	year	evidence basis	operational proximity	signals / techniques	threat types	provenance	review state	project impact	entry file
SRC-002	Bot scores, JA3/JA4, Detection IDs, Web Bot Auth, custom rules (Bots docs)	Cloudflare	2026	capability-doc	capability	bot score 1–99, JA3/JA4, Detection IDs, Web Bot Auth, WAF rule fields	Not threat-specific	not recorded	migrated — review pending	Supports “Cloudflare exposes/uses X”, not “X works”	`cloudflare-2026-bot-scores-detection-engines-ja3-ja4-web-bot-auth-custom-rules.md`
SRC-003	Bot Management documentation	Cloudflare	2026	capability-doc	capability	bot score; WAF custom rules; Workers; Bot Analytics; logs; verified bots; JavaScript detections; machine-learning model updates; endpoint-specific policy	bot traffic; login automation; application abuse; unwanted access to protected resources	not recorded; ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Primary Cloudflare entry for per-request scoring and endpoint-specific bot policy; legacy migrated row now has a v3 re-extraction attached	`[multiple — unreconciled]`; previous: `cloudflare-2026-bot-management-docs.md`; canonical: `cloudflare-2026-bot-management(1).md`
SRC-004	Bot Protect, AI Detection Engine, 2025 Global Bot Security Report	DataDome	2025–2026	vendor-claim; threat-intel	observed (vendor-measured)	intent-based detection; signal-family taxonomy	tbd — backfill from entry	not recorded	migrated — review pending	Intent-based framing; 2.8% “fully protected” is vendor-measured; pair with SRC-015 for external evidence	`datadome-2025-2026-bot-protect-ai-detection-global-bot-security-report.md`
SRC-005	Bot Management (product brochure)	Netacea	n.d.	vendor-claim	claimed	server-side / no-client-JS positioning; 2 case studies	tbd — backfill from entry	not recorded	migrated — review pending	Product-positioning evidence	`netacea-bot-management-product-brochure.md`
SRC-006	Technical Showcase: ML in Advanced Bot Management	Netacea	n.d.	methods-taxonomy	capability	supervised/unsupervised, real-time/batch, general/specific; Intent Analytics	Not threat-specific	not recorded	migrated — review pending	ML-methods taxonomy; no reproducible detail	`netacea-technical-showcase-machine-learning.md`
SRC-007	Death by a Billion Bots	Netacea	2023	survey	claimed (self-report)	440-executive survey; $85.6M/company business-impact framing	tbd — backfill from entry	not recorded	migrated — review pending	Survey evidence; origin/geopolitical claims out of scope	`netacea-2023-death-by-a-billion-bots.md`
SRC-008	Bot Manager, ACTIR, Agentic AI Security Report	Arkose Labs	2023–2026	vendor-claim; survey	claimed	dynamic challenges; attacker-cost framing; agentic-AI survey	tbd — backfill from entry	not recorded	migrated — review pending	Account-integrity + attacker-cost angle; several reports gated	`arkose-2023-2026-bot-manager-actir-agentic-ai-reports.md`
SRC-009	Bot Defense, Adversarial Techniques, AI Agent Trust, 2026 Benchmark	Kasada	2025–2026	vendor-claim; threat-intel	observed (vendor-measured)	solver/proxy/CAPTCHA pricing; proof-of-execution; AI-agent governance	tbd — backfill from entry	not recorded	migrated — review pending	Strong attacker-economy angle	`kasada-2025-2026-bot-defense-adversarial-retooling-ai-agent-trust.md`
SRC-010	Sightline, AI Agent Detection, OpenClaw, 2026 benchmark	HUMAN Security / PerimeterX	2026	vendor-claim; threat-intel	observed (vendor-measured)	cyberfraud-journey framing; AI-agent detection signal categories; OpenClaw observations	tbd — backfill from entry	not recorded	migrated — review pending	Concrete AI-agent detection signal categories	`human-2026-sightline-bot-mitigation-ai-agent-detection-openclaw.md`
SRC-034	2021 Credential Stuffing Report	F5 Labs / Vinberg & Overson	2021	threat-intel; empirical-operational	observed (vendor-measured)	credential-spill aggregation; login success-rate / password-reset / diurnal anomalies; browser automation; CAPTCHA-solving microwork; attacker sophistication tiers	credential stuffing; account takeover	Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3	needs review	Primary observed-use anchor for credential stuffing; combines spill-supply evidence with vendor-measured login abuse against large production sites	`[single]`; canonical: `f5-2021-credential-stuffing-report.md`
SRC-036	OpenClaw in the wild: How autonomous agents can drive abuse at scale	HUMAN Security / Kaiserman & Cirlig	2026	empirical-operational	observed (vendor-measured)	autonomous-agent browser automation; exposed agent gateways; request bursts; referral UTM tagging; reconnaissance; directory/file probing	synthetic engagement; referral manipulation; reconnaissance; browser-native automation abuse	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Observed-use evidence for agentic browser automation abuse, with explicit attribution caveats	`[single]`; canonical: `human-2026-openclaw-in-the-wild(1).md`
SRC-037	Agentic Visibility: How to See AI Agents in Your Traffic	HUMAN Security / McArtney	2026	capability-doc	capability	AI-agent identification and classification; trust levels; HTTP Message Signatures and key directories; session and route analysis; dashboard visibility	Not threat-specific; AI-agent visibility and analytics contamination	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Product/capability framing for agentic visibility, trust classification, and the shift from visibility to control	`[single]`; canonical: `human-2026-agentic-visibility-how-to-see-ai-agent-traffic(1).md`
SRC-038	State of Agentic Traffic – May 2026	HUMAN Security / Kaiserman	2026	empirical-operational	observed (vendor-measured)	agentic-traffic telemetry; named agent/operator mix; sector distribution; page-route categorisation; blocking rate; policy controls	Not threat-specific; AI-agent traffic across product/search, account, authentication, content, and checkout routes	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Current vendor-telemetry snapshot of agentic traffic patterns and route exposure, not proof of malicious intent	`[single]`; canonical: `human-2026-state-agentic-traffic-may(1).md`
SRC-039	2026 Thales Bad Bot Report: Bad Bots in the Agentic Age	Thales / Imperva	2026	empirical-operational; threat-intel	observed (vendor-measured)	bot traffic classes; AI crawlers/fetchers; API endpoint targeting; browser impersonation; session consistency; residential/mobile proxies; CAPTCHA solving; headless automation; signed AI bots	account takeover; API abuse; scraping; scalping/inventory hoarding; SMS pumping; carding/payment-flow abuse	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Broad vendor-measured snapshot of production bot, API, ATO, inventory, and AI-agent abuse; useful but not independent prevalence evidence	`[single]`; canonical: `thales-2026-bad-bot-report(1).md`
SRC-040	AI-Empowered Botnets and API Visibility Gaps: Attack Trends in Financial Services	Akamai Security	2026	empirical-operational	observed (vendor-measured)	WAF/API alerts; API endpoint attack tracking; BOLA/BOPLA; shadow/zombie APIs; behavioural heuristics; user-risk telemetry; low-and-slow tactics; headless browsers; AI crawler classification	API abuse; scraping/AI crawler activity; bot evasion; financial-services web attacks; ATO/fraud adjacent	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Vendor telemetry source for financial-services API/bot abuse and the public-data boundary; alerts are not proof of successful attacks	`[single]`; canonical: `akamai-2026-financial-services-security-trends(1).md`
SRC-049	How to Restore Fairness in Online Ticketing by Fighting Ticket Bots	DataDome / Falokun	2026	methods-taxonomy	claimed	ticket-bot lifecycle; account creation/takeover; rapid refresh; availability scraping; checkout automation; CAPTCHA bypass; virtual waiting rooms; intent-aware detection	ticket bots; scalping; queue abuse; limited-stock attacks; checkout automation; resale	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Vendor taxonomy for ticket bots / slot-sniping and scarce-inventory abuse; useful when paired with FTC legal-record evidence	`[single]`; canonical: `datadome-2026-ticket-bots(1).md`
SRC-051	Comodo ModSecurity WAF Rules Update: The 2026 Solution / SBB-WAF-Rules	StopBadBots / sminozzi	2025	tooling-readme	capability	ModSecurity rules; WAF augmentation; user-agent blocklists; AI-crawler blocking; scanner detection; behavioural thresholds; WordPress hardening	bot blocking; scanner/reconnaissance; AI crawler blocking; web-application probing	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Defensive-tooling example for the simple WAF/blocklist/behavioural-threshold end of the control stack	`[single]`; canonical: `stopbadbots-2025-sbb-waf-rules(1).md`
SRC-054	Block AI Bots - Cloudflare bot solutions docs	Cloudflare	2026	capability-doc	capability	verified AI crawler classification; unverified AI-like bot blocking; hostname-level controls; ad-hostname blocking; AI Crawl Control	AI crawler access; unverified AI-like crawling; content-access governance	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Current-trend source for AI crawler management as defensive product categorisation and publisher/content-access governance	`[single]`; canonical: `cloudflare-2026-block-ai-bots(1).md`
SRC-055	Overview - Cloudflare Turnstile docs	Cloudflare	2026	capability-doc	capability	non-interactive JavaScript challenges; proof-of-work; proof-of-space; Web API probing; browser quirks; human-behaviour checks; pre-clearance cookie	automated scripts; non-human browser environments; protected form/login flows	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Challenge-system / CAPTCHA-alternative source showing the move from visual puzzles to browser/environment/human-like signal evaluation	`[single]`; canonical: `cloudflare-2026-turnstile(1).md`
SRC-056	Detection IDs - Cloudflare bot solutions docs	Cloudflare	2026	capability-doc	capability	Detection IDs/tags; claimed-browser consistency; HTTP header order; heuristics; verified-bot and anomaly detections; Logpush; WAF custom rules	predictable bot behaviour; header-order mismatch; browser impersonation; endpoint-specific bot traffic	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Concrete Cloudflare source for coherence checks and turning detection signals into rules, analytics, and logging	`[single]`; canonical: `cloudflare-2026-detection-ids(1).md`
SRC-057	Bot detection engines - Cloudflare bot solutions docs	Cloudflare	2026	methods-taxonomy	capability	heuristic checks; malicious fingerprints; JavaScript detections; headless-browser detection; headers; session characteristics; browser signals; supervised ML; bot score; anomaly detection	simple automation; headless-browser automation; sophisticated bots; malicious fingerprints	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Central Cloudflare defensive-methods taxonomy; useful vendor-side mirror of scraper-side evasion layers	`[single]`; canonical: `cloudflare-2026-bot-detection-engines(1).md`
SRC-058	Overview - Cloudflare bot solutions docs	Cloudflare	2026	capability-doc	capability	Bot Fight Mode; Super Bot Fight Mode; Bot Analytics; firewall variables; WAF; Turnstile; API Shield; DDoS protection; defensive stack	automated traffic; known bot patterns; unwanted crawling; resource abuse; automated endpoint interaction	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Cloudflare defensive-stack overview bridging simple controls, challenge systems, per-request scoring, analytics, and endpoint-specific policy	`[single]`; canonical: `cloudflare-2026-bot-solutions-overview(1).md`
SRC-079	DataDome Releases VM-Based Obfuscation: The Next Evolution in Client-Side Detection Security	DataDome / Vayno	2026	capability-doc / vendor product announcement / defensive architecture explanation	capability	VM obfuscation; client-side detection; browser detection; Device Check; Slider; WebAssembly; dynamic code regeneration; proprietary bytecode; anti-reverse-engineering	Bot detection code protection; client-side detection arms race	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Direct vendor source for VM-based obfuscation applied to commercial client-side bot-detection logic	`[single]`; canonical: `datadome-2026-vm-based-obfuscation-client-side-detection(1).md`
SRC-084	Commercial CAPTCHA-solving API ecosystem	CapSolver; Hayes; HasData / Skakun	2025–2026	capability-doc / vendor marketing / tutorial ecosystem / vendor-adjacent benchmark	capability	CAPTCHA-solving APIs; reCAPTCHA; Turnstile; Geetest; AWS WAF CAPTCHA; token generation; solver markets; AI agents; automation workflows	CAPTCHA defeat; scraping; automated account management; price monitoring; SEO/SERP automation	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Capability evidence that CAPTCHA-solving APIs are openly marketed and integrated into automation and AI-agent workflows	`[single]`; canonical: `commercial-captcha-solving-api-ecosystem-2026(1).md`
SRC-086	OpenClaw: exposed AI-agent gateways and enterprise risk	Bitsight / Cruz	2026	empirical-exposure measurement / attack-surface analysis / threat-intelligence	observed (exposure measurement)	OpenClaw; exposed services; internet scanning; autonomous agents; integrations; prompt injection; RCE; credential exposure; WebSocket API; weak token	AI-agent exposure; exposed-agent attack surface; misconfiguration	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Complements HUMAN OpenClaw by measuring exposed gateways and configuration/blast-radius risk rather than traffic abuse	`[single]`; canonical: `bitsight-2026-openclaw-exposed-ai-agent-gateways(1).md`
SRC-087	How the Peer-to-Business Model Redefines App Monetization	Infatica SDK Experts	2025	capability-doc / business-model description / vendor marketing	capability	peer-to-business SDK; residential proxies; idle bandwidth; opt-in peers; public web data; geo-restrictions; rate limits; CAPTCHA walls	Commercial scraping infrastructure; proxy supply-chain; access-barrier bypass	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Vendor business-model source for how residential/peer proxy supply can be built through SDK-enabled app users	`[single]`; canonical: `infatica-2025-peer-to-business-app-monetization-sdk(1).md`
SRC-088	Residential Proxies: Definition, Use Cases, and Best Providers	Bright Data / Zanini	2026	capability-doc / vendor marketing / market map	capability	residential proxies; ISP proxies; rotating proxies; sticky sessions; geo-targeting; IP reputation; rate-limit and IP-ban avoidance	Web scraping; price monitoring; ad verification; sneaker and ticket purchasing; SEO monitoring; social-media management	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Commercial proxy-ecosystem source explaining residential proxy capabilities and use cases, including limited-stock purchasing	`[single]`; canonical: `brightdata-2026-residential-proxies-definition-use-cases-best-providers(1).md`
SRC-098	X-Force Threat Intelligence Index 2026	IBM X-Force	2026	threat-intelligence synthesis / vendor report	observed-vendor-threat-intel	public-facing application exploitation; supply-chain compromise; credential theft; AI-assisted social engineering; identity protection; weak authentication; misconfiguration	cloud/application exploitation; credential theft; supply-chain compromise; ransomware/extortion context; AI-assisted operations	not recorded; raw IBM TXT/HTML source supplied	needs extraction / review	Broad current threat-landscape context for identity, cloud/application exposure, AI-enabled acceleration, and basic security hygiene; not bot-specific evidence	`[multiple — raw source; extraction pending]`; canonical: `X-Force Threat Intelligence Index 2026(1).txt`; supporting: `X-Force 2026 Threat Intelligence Index - Executive Summary _ IBM(1).html`
SRC-099	2025 Cloud Threat Hunting and Defense Landscape	Recorded Future Insikt Group	2026	threat-intelligence synthesis / observed incidents / mitigations and detections	threat-intelligence synthesis	cloud/SaaS abuse; valid accounts; tokens/keys/service accounts; cloud APIs; CI/CD; backups/snapshots; SaaS functionality; LLM/ML service abuse	cloud abuse; SaaS abuse; credential abuse; account takeover; third-party compromise; cloud ransomware; supply-chain abuse	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Cloud/SaaS adversarial-infrastructure anchor; useful for showing legitimate cloud functions and identities as attack infrastructure, not bot-specific telemetry	`[single]`; canonical: `recordedfuture-2026-cloud-saas-abuse-adversarial-infrastructure(1).md`
SRC-100	Quarterly Threat Intelligence Report: Q1 2026	KasadaIQ	2026	vendor telemetry / marketplace monitoring / threat-intelligence assessment	observed-vendor-telemetry	threat enablers; bots-as-a-service; automated checkout; account markets; verification/KYC/2FA bypass services; residential proxies; no-code/vibe-coded bots; AI-account demand	account takeover; automated checkout; scalping; credential markets; verification bypass; reselling communities; limited-inventory abuse	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Main bot/automated-abuse source for SaaSification of adversarial infrastructure and market-enabled automation; use cautiously because raw telemetry and source lists are not reproducible	`[single]`; canonical: `kasada-2026-q1-threat-enablers-saasification-adversarial-infrastructure(1).md`
SRC-101	Mythos and the cost of attacking	Summers / Netwrix	2026	opinion / strategic commentary / vendor blog	low	AI attacker economics; cost-of-attack framing; OODA loop; Pyramid of Pain; vulnerability discovery; phishing; command-and-control; intent detection	Not threat-specific; AI-enabled attacker cost framing	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Low-priority security-economics context for the claim that cheap AI can reduce attacker cost; do not use as empirical evidence	`[single]`; canonical: `summers-2026-mythos-cost-of-attacking-ai-security-economics(1).md`

Academic and research

id	source	org / authors	year	evidence basis	operational proximity	signals / techniques	threat types	provenance	review state	project impact	entry file
SRC-011	ML-Based Detection and Evasion Techniques for Advanced Web Bots (PhD thesis, Bournemouth)	Iliou, C.	2022	empirical-academic	capability	sophistication taxonomy (simple→advanced); web-log + mouse detection; RL & GAN evasion	tbd — backfill from entry	not recorded	migrated — review pending	Primary academic anchor; controlled/academic setting	`iliou-2022-thesis-advanced-web-bots.md`
SRC-012	Towards a framework for detecting advanced web bots (ARES)	Iliou et al.	2019	empirical-academic	capability	advanced-bot AUC ~0.68 at low FPR; proxy labels	tbd — backfill from entry	not recorded	migrated — review pending	Cleanest source for “simple-bot results hide weak advanced-bot detection”	`iliou-2019-ares-detecting-advanced-web-bots.md`
SRC-013	Web Bot Detection Evasion Using GANs (CSR)	Iliou et al.	2021	empirical-academic	capability	GAN evasion of CNN mouse/touch detectors; web mouse recall → ~0.45	tbd — backfill from entry	not recorded	migrated — review pending	Adversarial framing	`iliou-2021-csr-web-bot-detection-evasion-gans.md`
SRC-014	Web Bot Detection Evasion Using Deep RL (ARES)	Iliou et al.	2022	empirical-academic	capability	RL web-log evasion; detection/evasion as repeated game	tbd — backfill from entry	not recorded	migrated — review pending	PoC mechanism, not observed campaigns	`iliou-2022-ares-web-bot-detection-evasion-deep-rl.md`
SRC-015	FP-Inconsistent (arXiv 2406.07647)	Venugopalan et al.	2025	empirical-operational	measured (honey-site)	purchased evasive bot traffic vs DataDome/BotD on a honey site; fingerprint inconsistency rules	impression / ad fraud	not recorded	migrated — review pending	Strongest operational academic anchor; external evidence on DataDome	`venugopalan-2025-fp-inconsistent-fingerprint-inconsistencies-evasive-bot-traffic.md`
SRC-016	FP-Inspector (IEEE S&P)	Iqbal et al.	2021	empirical-academic	n/a (not bot-use)	detecting fingerprinting scripts (static + dynamic JS)	Not threat-specific	not recorded	migrated — review pending	Foundations for fingerprinting section; not direct bot-detection evidence	`iqbal-2021-fingerprinting-the-fingerprinters-fp-inspector.md`
SRC-017	Browser fingerprints for web authentication (ACM TWEB)	Andriamilanto et al.	2021	empirical-academic	n/a (not bot-use)	fingerprint distinctiveness/stability at scale	Not threat-specific (auth context)	not recorded	migrated — review pending	Auth context, not bots; 2016–17 data, so needs a replication caveat	`andriamilanto-2021-large-scale-browser-fingerprints-web-authentication.md`
SRC-018	Detecting Bad Bots via TLS Fingerprints (arXiv 2602.09606)	Jarad & Bıçakcı	2026	empirical-academic	measured (weak labels)	JA4/TLS classification; XGBoost/CatBoost AUC ~0.998	tbd — backfill from entry	not recorded	migrated — review pending	Strong headline metrics; labelling (“bot” in app field) is a real caveat	`jarad-2026-handshakes-tell-truth-tls-fingerprints-ja4-bad-bots.md`
SRC-019	BeCAPTCHA-Mouse (arXiv 2005.00890)	Acien et al.	2021	empirical-academic	capability	mouse-dynamics detection; synthetic (function/GAN) trajectories; public benchmark	Not threat-specific	not recorded	migrated — review pending	Constrained point-and-click task	`acien-2021-becaptcha-mouse-synthetic-mouse-trajectories.md`
SRC-020	Hacking reCAPTCHA v3 using RL (arXiv 1903.01003)	Akrout et al.	2019	empirical-academic	capability	RL mouse-movement vs reCAPTCHA v3 score	CAPTCHA defeat	not recorded	migrated — review pending	2019 PoC against one setup; narrow, likely stale	`akrout-2019-recaptcha-v3-reinforcement-learning.md`
SRC-033	FP-Agent: Fingerprinting AI Browsing Agents	Wang, Shafiq & Vekaria	2026	empirical-academic; empirical-operational	measured (honey-site)	browser fingerprints; behavioural fingerprints; typing latency; paste/change events; scroll and mouse movement; XGBoost/SHAP; Cloudflare free-tier case study	Not threat-specific; AI-agent detection on benign web tasks	Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3	needs review	Independent measured anchor for AI-agent detectability; external check of Cloudflare free-tier behaviour, not enterprise efficacy	`[single]`; canonical: `wang-2026-fp-agent-fingerprinting-ai-browsing-agents.md`
SRC-047	Balancing Security and Privacy: Web Bot Detection, Privacy Challenges, and Regulatory Compliance under the GDPR and AI Act	Martínez Llamas et al.	2025	review / methods-taxonomy / legal-regulatory analysis	capability	network/request data; browser/device/TLS fingerprinting; behavioural biometrics; proxies; headless browsers; adversarial fingerprints; PETs; GDPR / AI Act controls	bot detection/evasion; credential stuffing; scraping; scalping; privacy/compliance risk	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Academic taxonomy and compliance anchor for detection signals, evasion classes, privacy risks, and GDPR / AI Act implications; re-extraction attached for review-paper framing	`[multiple — unreconciled]`; previous: `martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act(1).md`; canonical: `martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act-review(1).md`
SRC-048	How long does it take to get owned?	Wardle	2019	empirical-academic	measured (honey identities)	honey identities; leaked credential publication; paste sites; login monitoring; 2FA alerts; honeytokens; IP/user-agent cautions	leaked-credential use; credential stuffing adjacent; account takeover risk	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Independent measured-use evidence for leaked credential use and honey-account methodology; small and dated but transparent	`[single]`; canonical: `wardle-2019-how-long-does-it-take-to-get-owned(1).md`
SRC-066	Browser Fingerprinting: A survey	Laperdrix, Bielova, Baudry & Avoine	2020	review; methods-taxonomy; foundations	n/a (foundational survey)	browser/device fingerprinting; User-Agent; HTTP headers; JavaScript APIs; Canvas; WebGL; AudioContext; fonts; plugins; extensions; entropy; anonymity sets; fingerprinting defences	cross-site tracking; stateless device identification; browser/device re-identification; privacy loss; dual-use security/fraud signal	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Core browser-fingerprinting foundation source covering concepts, attributes, metrics, history, and defences; not observed bot-abuse evidence	`[single]`; canonical: `laperdrix-2020-browser-fingerprinting-survey(1).md`; source family: ACM 2020 paper, with arXiv v2 as alternate
SRC-067	How Unique is Whose Web Browser? The Role of Demographics in Browser Fingerprinting among US Users	Berke et al.	2025	empirical-measurement; dataset paper	measured (browser-attribute / demographic dataset)	browser fingerprinting; demographics; User-Agent; languages; timezone; screen resolution; platform; hardware concurrency; device memory; WebGL; entropy; anonymity sets; demographic inference	cross-site tracking; passive/active fingerprinting; re-identification risk; demographic inference; unequal privacy risk	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Empirical update to fingerprinting foundations; useful for unequal privacy-risk and demographic-inference framing, not bot-abuse prevalence	`[single]`; canonical: `berke-2025-how-unique-whose-web-browser-demographics(1).md`
SRC-068	Battling bots and bad data: enhancing data quality in online surveys	Sudbury & Marks	2026	empirical-methods; review-informed case study	measured (online survey quality controls)	CAPTCHA/reCAPTCHA; open-ended bot checks; attention checks; consistency checks; quota screening; speed/page-time checks; IP/location/duplicate checks; Qualtrics fraud controls	online survey bots; bad data; low-quality responses; duplicate participation; quota-gaming; survey fraud	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Review-informed case study showing bot-like automation and inattentive humans can degrade online survey data; supports layered controls outside classic cybersecurity	`[single]`; canonical: `sudbury-marks-2026-battling-bots-bad-data(1).md`
SRC-074	Secure Development of a Hooking-Based Deception Framework Against Keylogging Techniques	Sajid, Ahmed & Sosnoski	2025	empirical-method demonstration / preprint	measured-but-bounded	API hooking; runtime instrumentation; EasyHook; Microsoft Detours; decoy injection; input perturbation; anti-hooking resilience	Not bot-specific; keylogging deception and credential-theft context	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Smaller related entry for cyber-deception, runtime instrumentation, and credential-theft-adjacent defences	`[single]`; canonical: `sajid-2025-hooking-based-deception-keylogging(1).md`
SRC-075	Protecting Client Browsers with a Principal-based Approach	Cao	2014	dissertation / architecture proposal / method demonstration	n/a (browser-security foundation)	browser principals; client-side isolation; Virtual Browser; JavaScript sandboxing; third-party JavaScript; JShield; XSS; postMessage	Not bot-specific; browser security and malicious-content detection foundation	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Browser-security foundation for client-side isolation, JavaScript virtualisation, and principal boundaries	`[single]`; canonical: `cao-2014-protecting-client-browsers-principal-based-approach(1).md`
SRC-076	Layered obfuscation: a taxonomy of software obfuscation techniques for layered security	Xu, Zhou, Ming & Lyu	2020	review / taxonomy / conceptual framework	n/a (foundation / taxonomy)	layered obfuscation; reverse engineering; control-flow obfuscation; data obfuscation; JavaScript obfuscation; application-layer obfuscation	Not bot-specific; anti-reverse-engineering and client-side protection background	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Low-priority academic foundation for layered obfuscation as risk management rather than one magic protection	`[single]`; canonical: `xu-2020-layered-obfuscation-taxonomy-software-security(1).md`
SRC-078	Pushan: Trace-Free Deobfuscation of Virtualization-Obfuscated Binaries	Sudhir et al.	2026	empirical-method demonstration / deobfuscation research / preprint	measured-but-bounded	VM obfuscation; virtualization obfuscation; deobfuscation; VMProtect; Themida; Tigress; symbolic emulation; CFG recovery	Not bot-specific; reverse-engineering arms-race background	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Academic counterweight to VM-obfuscation claims; shows VM obfuscation is strong but actively attacked by deobfuscation research	`[single]`; canonical: `sudhir-2026-pushan-trace-free-deobfuscation-vm-obfuscated-binaries(1).md`
SRC-082	AI/ML for cybersecurity and cyber-risk management	Kolhar & Sridevi	2025–2026	review / conceptual framework / governance overview	n/a (background)	AI cybersecurity; supervised and unsupervised learning; anomaly detection; UEBA; SOC; adversarial ML; XAI; governance	Broad cybersecurity detection and governance; not bot-specific	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Low-priority background source for ML/security governance language; use cautiously because it is broad and not bot-specific	`[single]`; canonical: `kolhar-sridevi-2025-2026-ai-ml-cybersecurity-governance-background(1).md`
SRC-083	API Security Testing and Exploitation Techniques	Kolhar & Gundoor	2026	API-security taxonomy / testing-methods overview / defensive-guidance	control-and-capability	API security; BOLA; broken authentication; rate limiting; business-logic abuse; API scraping; OAuth2; OIDC; JWT; shadow APIs	API abuse; brute force; API scraping; business-logic abuse; API DoS	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Secondary overview for API abuse, API security testing, and business-logic risk; less authoritative than OWASP/NIST/PortSwigger	`[single]`; canonical: `kolhar-gundoor-2026-api-security-testing-exploitation-techniques(1).md`
SRC-085	CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training	Chen et al.	2026	empirical-measurement / method demonstration / preprint	measured-but-bounded	CAPTCHA solving; GUI agents; VLMs; ReCAP; OCR; slider CAPTCHA; image grid; self-correction; reasoning-action traces	CAPTCHA defeat; AI-agent automation; challenge-response bypass capability	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Academic capability evidence that GUI agents can be trained for modern interactive CAPTCHA tasks, bounded by synthetic/benchmark setting	`[single]`; canonical: `chen-2026-recap-captcha-native-gui-agents(1).md`
SRC-089	Understanding the Proxy Ecosystem: A Comparative Analysis of Residential and Open Proxies on the Internet	Choi et al.	2020	empirical-measurement / infrastructure-analysis	measured	open proxies; residential proxies; proxy geolocation; ASN; blacklists; IP reputation; malicious activity; evasion infrastructure	Proxy-enabled abuse infrastructure; not a specific web-abuse campaign	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Independent empirical foundation for residential/open proxy infrastructure and blacklist overlap	`[single]`; canonical: `choi-2020-understanding-proxy-ecosystem(1).md`
SRC-097	Measuring the Changing Cost of Cybercrime	Anderson et al.	2019	literature synthesis / measurement framework / cost analysis	framework	cost decomposition; criminal revenue; direct/indirect losses; defence costs; supporting infrastructure; botnets; pay-per-install; measurement bias	Not bot-specific; cybercrime economics and supporting-infrastructure cost framing	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Non-vendor balance source for cost/economics claims; useful to separate attacker revenue, victim loss, defence cost, and wider social cost	`[duplicate upload deduped]`; canonical: `anderson-2019-measuring-changing-cost-cybercrime(1).md`; duplicate: `anderson-2019-measuring-changing-cost-cybercrime(2).md`

Threat surface and territory

Capability and infrastructure evidence in this section is not proof of malicious use or bypass success. README and marketing claims are not independent test results.

id	source	org / authors	year	evidence basis	operational proximity	signals / techniques	threat types	provenance	review state	project impact	entry file
SRC-021	Official documentation (Playwright / Puppeteer / Selenium)	project maintainers	2026	capability-doc	capability	baseline browser-automation capability layer	Not threat-specific	not recorded	migrated — review pending	Capability, not intent; sits beneath stealth/cloud layers	`playwright-puppeteer-selenium-2026-browser-automation-docs.md`
SRC-022	undetected-chromedriver (GitHub/PyPI)	ultrafunkamsterdam	2021–2024	tooling-readme	capability	Selenium ChromeDriver evasion layer; explicit IP-reputation caveat	Not threat-specific	not recorded	migrated — review pending	README claims, not independent tests	`ultrafunkamsterdam-2024-undetected-chromedriver-docs-github.md`
SRC-023	puppeteer-extra-plugin-stealth (GitHub/npm)	berstend	2018–2023	tooling-readme	claimed	modular evasion catalogue: webdriver, plugins, codecs, WebGL	Not threat-specific	not recorded	migrated — review pending	“Passes public bot tests” does not establish production effectiveness	`berstend-2023-puppeteer-extra-plugin-stealth-docs-github.md`
SRC-024	Anti-scraping bypass, stealth, proxies, fingerprints, Cloudflare bypass	ScrapFly	2025–2026	capability-doc; vendor-claim	claimed	API-level bypass (`asp`); byte-perfect JA4/HTTP2/QUIC claims; names Nodriver/Camoufox/UC Mode	scraping	not recorded	migrated — review pending	Documents the attacker mental model for Cloudflare	`scrapfly-2025-2026-anti-scraping-bypass-stealth-proxies-fingerprints.md`
SRC-025	Web Unlocker, Browser API, proxies, agentic web execution	Bright Data	2026	capability-doc; vendor-claim	claimed	managed proxies/fingerprints/CAPTCHA + cloud browsers; password entry disabled by default	scraping	not recorded	migrated — review pending	Compliance/public-data framing	`brightdata-2026-web-unlocker-browser-api-proxies-anti-bot-bypass.md`
SRC-026	Cloud-browser & agent docs (Browserless / Browserbase / Hyperbrowser)	respective vendors	2026	capability-doc	capability	cloud browsers + AI-agent infra; stealth, proxies, CAPTCHA-solving, persistent sessions	Not threat-specific	not recorded	migrated — review pending	Bridges automation to agentic browsers	`browserless-browserbase-hyperbrowser-2026-cloud-browser-agent-automation-docs.md`
SRC-029	How to Use Rnet: The Blazing-Fast Python HTTP Client	RoundProxies / Marius Bernard	2025	tooling-readme; capability-doc	capability	browser TLS/HTTP2 impersonation; JA3/custom fingerprints; header order; cookies; sticky sessions; proxies; WebSockets	scraping	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2	needs review	Scraper-side evidence that browser-like TLS/protocol impersonation is treated as a normal evasion capability	`[single]`; canonical: `roundproxies-rnet-source-extraction.md`
SRC-030	How to Bypass Cloudflare Turnstile	ScrapFly / Hisham	2026	tooling-readme; vendor-claim; capability-doc	claimed	Turnstile modes; browser fingerprinting; canvas/WebGL; behavioural signals; JA3/JA4; proof-of-work/token handling; cloud browsers; residential proxies	scraping; CAPTCHA/challenge-response evasion	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2	needs review	Turnstile-specific scraper-side account of challenge-response signal families and bypass classes	`[single]`; canonical: `scrapfly-cloudflare-turnstile-source-extraction.md`
SRC-031	How to Bypass Imperva Incapsula when Web Scraping in 2026	ScrapFly / Bernardas Alisauskas	2026	tooling-readme; capability-doc; vendor-claim	claimed	Imperva/Incapsula block indicators; JA3/JA4; IP reputation; header order; JS/canvas/WebGL/audio fingerprinting; cookies/sessions; rate limiting; stealth browsers	scraping; API scraping; WAF/bot-protection evasion	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2	needs review	Imperva-specific scraper-side view of WAF/bot-protection detection surfaces and claimed evasion patterns	`[single]`; canonical: `scrapfly-imperva-incapsula-source-extraction.md`
SRC-032	Quo vadis, crawlers? Progress and what’s next on safeguarding our infrastructure	Wikimedia Foundation	2026	empirical-operational; threat-intel	observed (first-party operator-reported)	AI crawler traffic; residential proxies; browser-identity spoofing; rate-limit circumvention; robot-policy updates; identification-tiered API limits; bot detection	scraping / aggressive crawling; infrastructure strain	Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3	needs review	First named-operator account in the register; strong operator-side evidence for AI-crawler pressure and residential-proxy evasion, but platform-specific	`[single]`; canonical: `wikimedia-2026-quo-vadis-crawlers-infrastructure.md`
SRC-035	How we’re dealing with bots and the reselling of driving tests	DVSA / Ryder	2023	threat-intel	observed (platform-side)	appointment search/reservation automation; CAPTCHA; bot-protection measures; ADI-service monitoring; cancellation-rate and account-link controls	scarce-resource appointment abuse; slot-sniping; slot-resale; denial of inventory	Codex / GPT-5 / source-extraction-prompt v3	needs review	Concrete public-sector appointment-abuse example for the booking-style worked example and scarce-resource abuse lane	`[single]`; canonical: `dvsa-2023-bots-reselling-driving-tests.md`
SRC-042	Is Web Scraping Legal? Key Insights and Guidelines You Need to Know	ScrapingBee	2026	legal-explainer	n/a (legal context; not use evidence)	terms-of-service; copyright/fair-use risk; personal-data processing; GDPR/CCPA/CFAA framing; robots.txt; rate limiting; CAPTCHA/paywall/login/IP-block risk	scraping; unwanted automation; unauthorised-access risk; privacy-risky data collection	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Scraper-side governance framing around the boundary between technical capability and permission; verify legal claims against primary/specialist sources before use	`[single]`; canonical: `scrapingbee-2026-web-scraping-legal-guidelines(1).md`
SRC-043	Advanced Web Scraping: Hidden Techniques Pro Developers Actually Use	ScrapingBee	2026	bypass-guide	capability	async orchestration; multiprocessing; rate control; backoff/jitter; circuit breakers; recursive filtering; JavaScript rendering; AJAX/API discovery; proxy rotation; CAPTCHA solving	web scraping at scale; large-scale data extraction; pagination-limit circumvention; anti-blocking	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Public scraper-side engineering-pattern source for robust extraction at scale; high dual-use, so cite technique families only	`[single]`; canonical: `scrapingbee-2026-advanced-web-scraping-hidden-techniques(1).md`
SRC-044	Best Price Scraping Tools for 2026: Top Services Compared	ScrapingBee	2026	capability-doc	capability	price scraping; scraping APIs; no-code scrapers; JavaScript rendering; headless browsers; proxy rotation; CAPTCHA handling; anti-bot reliability; AI extraction	ecommerce scraping; price intelligence; competitive-intelligence collection; unwanted automated price collection	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Market-map evidence showing price scraping as packaged commercial service with anti-bot handling as a core buying criterion	`[single]`; canonical: `scrapingbee-2026-price-scraping-tools(1).md`
SRC-045	How To Bypass PerimeterX Anti-Bot Protection System In 2026	ScrapingBee / Krukowski	2026	bypass-guide	capability	IP reputation; TLS fingerprints; HTTP/2; header order; browser fingerprinting; cookies/tokens; session continuity; behavioural signals; residential/mobile proxies; stealth browsers	anti-bot evasion; web scraping against PerimeterX/HUMAN-protected sites; browser automation	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Public scraper-side evidence that named-defender bypass thinking is framed as multi-layer signal alignment across IP, TLS, HTTP, fingerprint, session, and behaviour	`[single]`; canonical: `scrapingbee-2026-perimeterx-human-bypass(1).md`
SRC-046	Avoiding bot detection: How to scrape the web without getting blocked? / browser-fingerprinting	niespodd	n.d. / ongoing	tooling-readme	claimed	browser fingerprinting; anti-detection; stealth browsers; Puppeteer/Playwright/Selenium; residential proxies; CAPTCHA solving; TLS/JA3/JA4; WebGL/fonts/client hints/WebDriver	scraper-side evasion; anti-detection; proxy-assisted scraping; browser automation	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Public scraper-side/evasion mental model and tooling-ecosystem map; maintainer claims, not independent effectiveness evidence	`[single]`; canonical: `niespodd-browser-fingerprinting(1).md`
SRC-050	FTC Brings First-Ever Cases Under the BOTS Act	Federal Trade Commission	2021	legal-record	observed (enforcement record)	automated ticket search/reservation; IP-address concealment; fictitious accounts; multiple credit cards; purchase-limit circumvention	ticket bots; scalping; limited-stock inventory capture; purchase-limit circumvention; resale-market abuse	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	High-value observed-use / enforcement evidence for ticket-bot abuse and limited-stock automation; cite as FTC allegations/orders unless underlying records are checked	`[single]`; canonical: `ftc-2021-first-bots-act-cases(1).md`
SRC-052	bad-asn-list: open-source ASN blocklist for cloud/hosting/colo traffic	Hamachek	n.d. (~2019–2020 unverified)	tooling-readme; empirical-operational	observed (first-party anecdotal)	datacenter/hosting/colo ASN blocklist; IP→ASN lookup; network-origin reputation; VPN/hosting egress; signup fraud scoring	fake account creation; signup abuse; datacenter-origin automation	Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3	needs review	Worked example of network-origin / ASN-reputation detection and the datacenter-blocking → residential-proxy arms-race baseline; anecdotal and dated	`[single]`; canonical: `hamachek-bad-asn-list-datacenter-asn-blocklist.md`
SRC-053	The Best Web Scraping API to Avoid Getting Blocked	ScrapingBee	2026	capability-doc	capability	managed scraping API; headless Chrome; JavaScript rendering; selector waits; custom interactions; proxy rotation; residential/stealth proxies; AI extraction; LLM/RAG data ingestion	web scraping; commercial scraping infrastructure; anti-blocking abstraction; ecommerce and LLM data collection	ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3	needs review	Commercial capability source showing scraping-as-a-service packaging of browsers, proxies, extraction, geotargeting, and anti-blocking features	`[single]`; canonical: `scrapingbee-2026-web-scraping-api(1).md`
SRC-077	On the Architecture of Bot Detection Services	Tschacher	2021	technical explainer / architecture analysis / attacker-aware commentary	n/a (architecture analysis)	passive detection; client-side detection; JavaScript fingerprinting; TLS/TCP/IP fingerprinting; HTTP headers; IP reputation; cookies; sessions; JavaScript obfuscation; JavaScript VMs	Bot detection architecture; browser automation detection context	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Architecture-level source connecting client/server signal layers, spoofability, and protection of client-side detection scripts	`[single]`; canonical: `tschacher-2021-architecture-of-bot-detection-services(1).md`
SRC-080	Ticketmaster v. Prestige Entertainment West: ticket bots, dummy accounts, CAPTCHA, and legal remedies	Ticketmaster litigation; Proskauer summary; Ballon context	2018–2019	court pleadings/order / litigation allegations / settlement summary / legal analysis	observed (legal case / alleged conduct)	ticket bots; dummy accounts; CAPTCHA; access controls; CFAA; DMCA; BOTS Act; purchase limits; resale	Ticket scalping; automated ticket purchasing; purchase-limit circumvention; inventory capture	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Strong legal-case evidence for alleged ticket-bot activity, dummy accounts, CAPTCHA circumvention context, and settlement remedies	`[single]`; canonical: `ticketmaster-prestige-2018-2019-ticket-bots-settlement(1).md`
SRC-081	U.S. Senate Ticketmaster / Taylor Swift case: scalper bots, Verified Fan, and live-event ticketing	Berchtold; Bradish; Guardian / U.S. Senate hearing	2023	public testimony / contested case account / secondary press reporting	observed-claim	Ticketmaster; Taylor Swift; Verified Fan; scalper bots; access-code servers; BOTS Act; secondary ticketing; live events	Ticket bots; queue pressure; access-code-server attack pressure; scalping; live-event ticketing abuse	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	High-profile public-hearing source for ticket-bot pressure, with clear caveat that core bot-volume claims come from Live Nation/Ticketmaster	`[single]`; canonical: `us-senate-2023-ticketmaster-taylor-swift-scalper-bots(1).md`
SRC-094	Detecting Post-Compromise Threat Activity in Microsoft Cloud Environments	CISA	2021	government advisory / detection guidance / incident-linked TTPs	observed-guidance (historical / archived)	Microsoft cloud identity; Azure AD; M365/O365; federated identity; forged tokens; OAuth/SAML; service principals; API access; Sparrow	cloud identity compromise; API-based persistence; post-compromise cloud activity; credential/service-account abuse	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Supporting official source for cloud identity/API persistence after compromise; archived and historical, so not current trend evidence	`[single]`; canonical: `cisa-2021-post-compromise-microsoft-cloud-identity-api-access(1).md`
SRC-095	Scattered Spider	FBI / CISA / RCMP / ASD ACSC / AFP / CCCS / NCSC-UK	2025	government advisory / investigation-derived TTPs / MITRE ATT&CK mapping	observed-investigative	helpdesk social engineering; SIM swap; MFA fatigue; OTP; valid accounts; SSO; RMM/remote-access tools; cloud discovery; ransomware	identity abuse; account takeover; helpdesk compromise; valid-account intrusion; legitimate-tool abuse; ransomware/extortion	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Strong official non-vendor evidence for identity/social-engineering/legitimate-tool abuse, but actor-specific and not bot-specific	`[single]`; canonical: `cisa-fbi-2025-scattered-spider-identity-helpdesk-legitimate-tools(1).md`
SRC-096	Hiding in Plain Sight: Tracking Bulletproof Hosting and Abused RDP Infrastructure	Censys	2026	internet-scale scanning analysis / technical measurement / threat detection	measured infrastructure	bulletproof hosting; abused RDP; Windows hostnames; VM templates; ASNs; VPS; infrastructure clustering; takedown evasion	adversarial infrastructure; ransomware infrastructure context; hosting abuse; persistence/evasion	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Infrastructure-measurement source for adversarial hosting and abused RDP patterns; complements generic threat reports with internet-scale artifact analysis	`[single]`; canonical: `censys-2026-bulletproof-hosting-abused-rdp-infrastructure(1).md`
SRC-102	Commercial automation cost stack 2026: scraping APIs, proxies, CAPTCHA solving, managed scraping, and SMS verification	Combined pricing / market source cluster	2026	vendor pricing pages / vendor-adjacent comparison / industry survey / pricing snapshots	market availability	scraping APIs; managed scraping; residential proxies; CAPTCHA solving; temporary SMS; browser rendering; proxy routing; account verification inputs	scraping; CAPTCHA defeat; account creation; credential-stuffing support; indirect scarce-resource automation	ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern	needs review	Cost-of-capability source showing key automation inputs are modular and purchasable; not abuse prevalence or effectiveness evidence	`[single]`; canonical: `commercial-automation-cost-stack-2026-scraping-proxies-captcha-sms(1).md`

Framing-distance ledger

The project’s central analytical discipline: each source approximates the real problem differently and fails to represent it differently (EVIDENCE-REVIEW.md §5). The “what it cannot show” column below is migrated from the old register’s notes where present. Cells marked tbd (currently the “what it fails to represent” column for SRC-001 to SRC-026) are pending backfill from the matching entry in working/register-entries/.
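The tbd cells form the backfill queue, so a maintainer can list them mechanically instead of scanning the table by eye. The following is a minimal Python sketch. It assumes the ledger has been exported as tab-separated text with the header row shown in the table that follows. `pending_backfill` is a hypothetical helper, not existing project tooling.

```python
import csv
import io

# Hypothetical convention: any ledger cell starting with "tbd" awaits backfill.
TBD_PREFIX = "tbd"


def pending_backfill(ledger_tsv: str) -> list[str]:
    """Return the ids of ledger rows that still have at least one tbd cell.

    Assumes tab-separated text whose header row starts with an 'id' column,
    mirroring the framing-distance ledger layout on this page.
    """
    # QUOTE_NONE: cells contain quotation marks that are content, not CSV quoting.
    reader = csv.DictReader(
        io.StringIO(ledger_tsv), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    other_columns = [name for name in (reader.fieldnames or []) if name != "id"]
    pending = []
    for row in reader:
        # Missing trailing cells come back as None, so treat them as empty.
        cells = [(row.get(name) or "").strip() for name in other_columns]
        if any(cell.startswith(TBD_PREFIX) for cell in cells):
            pending.append(row["id"])
    return pending


# Illustrative two-row excerpt; "\u2014" is the dash used in the ledger's tbd cells.
sample = (
    "id\tsource\twhat it approximates\twhat it fails to represent\twhat it cannot show (migrated)\n"
    "SRC-001\tOWASP OAT\ta shared vocabulary\ttbd \u2014 backfill from entry\tTaxonomy only\n"
    "SRC-027\tOWASP Handbook v1.3\tthe defender's naming layer\tno prevalence\tTaxonomy only\n"
)
print(pending_backfill(sample))
```

Run against the full export, the result should match the SRC-001 to SRC-026 range noted above until the backfill pass begins.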

id	source	what it approximates	what it fails to represent	what it cannot show (migrated)
SRC-001	OWASP OAT	a shared vocabulary for automated-threat types	tbd — backfill from entry	Taxonomy/ontology only — no detection or prevalence evidence
SRC-002	Cloudflare bots docs	what one major vendor’s control plane exposes and uses	tbd — backfill from entry	That any exposed signal actually works in production — only that Cloudflare uses/exposes it
SRC-003	Cloudflare Bot Mgmt	the product surface area and WAF/Workers variables	tbd — backfill from entry	Efficacy; product structure ≠ detection performance
SRC-004	DataDome	intent-based detection framing and signal families	tbd — backfill from entry	“Fully protected” exposure figures are vendor-measured, not independently verifiable
SRC-005	Netacea brochure	server-side / no-client-JS detection positioning	tbd — backfill from entry	Case-study results are vendor-reported
SRC-006	Netacea ML showcase	a taxonomy of ML approaches to bot detection	tbd — backfill from entry	No reproducible method detail; cannot validate any approach
SRC-007	Netacea survey	executive-perceived business impact of bots	tbd — backfill from entry	Self-reported survey; not measured prevalence or efficacy
SRC-008	Arkose	attacker-cost / dynamic-challenge framing; agentic-AI claims	tbd — backfill from entry	Several reports gated; survey/vendor evidence only
SRC-009	Kasada	the attacker economy (solver/proxy/CAPTCHA pricing)	tbd — backfill from entry	Vendor/threat-intel framing; pricing claims not independently audited
SRC-010	HUMAN/PerimeterX	AI-agent detection signal categories; OpenClaw observations	tbd — backfill from entry	Vendor/threat-intel; detection-category claims not externally verified
SRC-011	Iliou thesis	advanced-bot detection in a controlled academic setting	tbd — backfill from entry	Controlled/academic setting; does not establish production behaviour
SRC-012	Iliou 2019	that simple-bot metrics hide weak advanced-bot detection	tbd — backfill from entry	Proxy labels; advanced-bot AUC ~0.68 at low FPR is the honest figure
SRC-013	Iliou 2021	GAN evasion of CNN mouse/touch detectors	tbd — backfill from entry	One adversarial strategy; recall drop to ~0.45 is not a worst-case bound
SRC-014	Iliou 2022 RL	detection/evasion as a repeated game (RL evasion)	tbd — backfill from entry	PoC mechanism, not observed campaigns
SRC-015	FP-Inconsistent	evasive bot traffic vs commercial detectors on a honey site	tbd — backfill from entry	Threat model is impression fraud; one honey site, not general production
SRC-016	FP-Inspector	detecting fingerprinting scripts, not bots	tbd — backfill from entry	Not direct bot-detection evidence
SRC-017	Andriamilanto	fingerprint distinctiveness/stability at scale (auth)	tbd — backfill from entry	Auth context not bots; 2016–17 data, needs replication
SRC-018	Jarad TLS	JA4/TLS classification of bad bots	tbd — backfill from entry	Labelling (“bot” in app field) caveats the headline AUC ~0.998
SRC-019	BeCAPTCHA-Mouse	mouse-dynamics detection + synthetic trajectories	tbd — backfill from entry	Constrained point-and-click task; not free browsing
SRC-020	Akrout reCAPTCHA	RL mouse movement vs one reCAPTCHA v3 setup	tbd — backfill from entry	2019 PoC against one setup; likely stale
SRC-021	automation docs	the baseline automation capability layer	tbd — backfill from entry	Capability, not intent or malicious use
SRC-022	undetected-chromedriver	a Selenium evasion layer’s claimed capabilities	tbd — backfill from entry	README claims, not independent tests; maintainer’s own IP-reputation caveat
SRC-023	puppeteer-stealth	a modular evasion catalogue	tbd — backfill from entry	Passing public bot tests ≠ evading production detection
SRC-024	ScrapFly	the attacker mental model for Cloudflare bypass	tbd — backfill from entry	Byte-perfect-fingerprint claims are vendor claims, not verified bypass success
SRC-025	Bright Data	managed bypass infrastructure + cloud browsers	tbd — backfill from entry	Compliance framing; capability claims not independently verified
SRC-026	cloud-browser/agent docs	cloud-browser + AI-agent infrastructure features	tbd — backfill from entry	Feature availability, not evasion efficacy
SRC-027	OWASP Handbook v1.3	the defender’s naming layer for automated web-application threats and broad countermeasure classes	no prevalence, detection performance, algorithmic detail, production telemetry, or empirical AI-agent evidence; intent categories may not separate cleanly in observed traffic	Taxonomy and countermeasure suggestions only; cannot show detectability, prevalence, or efficacy
SRC-028	Playwright cookies tutorial	the basic automation capability of reading and preserving cookie/session state from a browser context	no adversarial setting, no bot detection, no cookie replay at scale, no session binding or risk-scoring controls	Cannot show that cookies are sufficient to impersonate users or bypass bot detection
SRC-029	RoundProxies Rnet	scraper-side HTTP-client evolution toward browser-like TLS/HTTP2/header/cookie/proxy behaviour	no defender-side logic, production traffic, behavioural JS challenges, account history, graph/entity signals, or verified bypass outcomes	Cannot show that Rnet is undetectable or that TLS matching is sufficient to bypass modern anti-bot systems
SRC-030	ScrapFly Turnstile	scraper-side understanding of Cloudflare Turnstile challenge mechanisms and bypass classes	no verified Cloudflare internals, defender telemetry, independent success rates, false-positive handling, or broader Cloudflare decisioning	Cannot show the complete Turnstile signal set or that any bypass method works reliably across production sites
SRC-031	ScrapFly Imperva	scraper-side view of websites protected by Imperva/Incapsula and the signal families scraper tooling believes matter	no Imperva internals, cross-customer telemetry, signal weightings, independent validation, durable success rates, or false-positive behaviour	Cannot show Imperva’s actual internal trust-score mechanism or that listed bypass approaches work reliably
SRC-032	Wikimedia crawlers	operator-side view of AI-crawler pressure, residential-proxy evasion, and tiered API/rate-limit response	first-party blog; platform-specific; no methodology, false-positive rate, or independent verification; open knowledge platform differs from commercial booking/e-commerce targets	Cannot show prevalence outside Wikimedia, rigorous defence efficacy, or direct generality to credential stuffing, ATO, or scalping
SRC-033	FP-Agent	current commercial AI browsing agents’ browser and behavioural fingerprints during realistic benign web tasks; includes a Cloudflare free-tier check	controlled honey site; benign tasks; closed-world known agents; narrow human population; point-in-time; not enterprise bot management	Cannot show production abuse prevalence, open-world detection, durability under adversarial humanisation, or general Cloudflare/vendor efficacy
SRC-034	F5 credential stuffing	end-to-end credential-stuffing landscape from spill supply to login traffic against large consumer sites	vendor telemetry from large enterprise customers; disclosed spill data only; 2020-era tooling; no reproducible method	Cannot show web-wide prevalence, independent detection efficacy, current-era tooling coverage, or generality to smaller booking-style targets
SRC-035	DVSA driving-test bots	public-sector appointment-slot abuse: automated monitoring, booking, holding, resale, and platform mitigation	platform-side public blog; no traffic counts, bot counts, detection logs, or false-positive rates; platform-specific	Cannot show prevalence, exact automation mechanism, share of bookings affected, all third-party service behaviour, or mitigation efficacy
SRC-036	HUMAN OpenClaw	vendor-observed exposed autonomous-agent gateways producing browser-automation traffic across engagement manipulation and reconnaissance patterns	vendor telemetry; attribution uncertainty; no raw data, query method, full validation, or disclosed detection logic	Cannot show autonomous execution for every request, internet-wide prevalence, attack success, or harm magnitude
SRC-037	HUMAN Agentic Visibility	a vendor capability model for identifying, classifying, measuring, and controlling AI-agent traffic	product explanation; dashboard screenshots; no independent evaluation or primary telemetry in this source	Cannot show traffic prevalence, classification accuracy, abuse prevalence, or effectiveness of the controls
SRC-038	HUMAN State of Agentic Traffic May 2026	monthly vendor telemetry on observed AI-agent mix, sector destinations, page-route categories, and blocking rates	HUMAN-visible traffic only; opaque classification; traffic is not necessarily malicious; one-month snapshot	Cannot show malicious intent, internet-wide prevalence, attack success, or accuracy of named-agent attribution
SRC-039	Thales Bad Bot Report 2026	production-facing bot-management view of automated traffic, API-first abuse, AI clients, ATO, and inventory abuse across protected customers	vendor-visible traffic; sampling/classification opaque; mixes telemetry, analyst interpretation, and product framing	Cannot show internet-wide prevalence, classification quality, attack success rates, comparative vendor efficacy, or AI causality
SRC-040	Akamai financial-services trends	edge/WAF/API telemetry view of financial-services web attacks, API visibility gaps, AI-labelled bot traffic, and scraping/API targeting	Akamai customer/product visibility; finance-sector-specific; alerts not success; mixes telemetry, survey claims, and interpretation	Cannot show market-wide prevalence, classification accuracy, detection efficacy, AI causality, or generalisation beyond Akamai-protected finance traffic
SRC-041	ScrapingBee scraper test sites	the educational pipeline from controlled scraper practice to production-style dynamic scraping	training-site article; not abuse, evasion success, prevalence, or harm	Cannot show real-world bot traffic, abuse, hostile scraping, or that practice sites cause misuse
SRC-042	ScrapingBee legal guidelines	scraper-vendor compliance framing around lawful/risky/impermissible scraping	vendor legal explainer; not primary law, legal advice, or jurisdiction-specific analysis	Cannot determine legality, replace legal advice, establish lawful customer behaviour, or validate legal-risk reduction claims
SRC-043	ScrapingBee advanced scraping techniques	public scraper-side engineering maturity: scaling, reliability, JavaScript rendering, pagination workarounds, and anti-blocking infrastructure	capability guide; no observed abuse, target-specific impact, independent validation, or success rates	Cannot show abusive use, prevalence, success against protected targets, or defender impact; high dual-use potential
SRC-044	ScrapingBee price scraping tools	commercial packaging of price scraping as a normal business workflow with anti-bot handling as a feature	vendor comparison/marketing; not neutral benchmark or defender-side evidence	Cannot show independent tool performance, abuse prevalence, or that price-scraping activity is lawful or wanted by targets
SRC-045	ScrapingBee PerimeterX/HUMAN bypass guide	the public evasion mental model for named commercial bot management as multi-layer signal alignment	scraper-vendor bypass narrative; no independent success evidence, raw tests, or defender confirmation	Cannot show PerimeterX/HUMAN weakness, bypass success rates, observed abuse, or target-specific effectiveness
SRC-046	niespodd browser-fingerprinting	a public scraper-side anti-detection taxonomy and tooling ecosystem map	maintainer claims/tool catalogue; no raw tests, denominators, success rates, or independent validation	Cannot establish tool effectiveness, prevalence, legality, safety, or production bypass success
SRC-047	Martínez Llamas et al. privacy/GDPR/AI Act review	academic synthesis of detection signal families, evasion classes, privacy risks, and regulatory controls	review/taxonomy, not production telemetry or empirical measurement	Cannot show abuse prevalence, detection efficacy, legal compliance in a specific deployment, or current production practice
SRC-048	Wardle honey identities	independent measurement of leaked-credential use through honey identities and paste-site publication	small, dated, paste-site-only honey experiment; observes unauthorised access, not necessarily bots	Cannot show market-wide prevalence, automation share, modern credential-stuffing infrastructure, or vendor-control efficacy
SRC-049	DataDome ticket bots	vendor taxonomy of ticket-bot activity across account preparation, sale/queue pressure, checkout, and resale	vendor explainer/product marketing; no primary telemetry, independent measurement, or product validation	Cannot show ticket-bot prevalence, DataDome efficacy, false-positive/negative rates, or that bots dominate resale prices
SRC-050	FTC BOTS Act cases	enforcement-record proximity to alleged real ticket-bot use against scarce-inventory ticketing flows	legal press release; allegations/proposed orders, not a technical measurement study or prevalence estimate	Cannot show defence success rates, detection signals, full bot architecture, or generalisation beyond named enforcement cases
SRC-051	StopBadBots SBB-WAF-Rules	small-site / self-managed-server defensive controls: WAF rules, blocklists, user-agent filters, scanner heuristics	maintainer claims attached to tooling; no independent evaluation, prevalence, or false-positive measurement	Cannot show general blocking efficacy, current blocklist quality, low false positives, or coverage of browser-native agents
SRC-052	Hamachek bad-asn-list	a concrete operator account of datacenter/ASN blocking for signup abuse and a reusable defensive artifact	single-site anecdote; dated and no methodology/false-positive measurement	Cannot establish general efficacy, current usefulness, false-positive rates, or prevalence beyond one operator account
SRC-053	ScrapingBee web scraping API	commercial scraping-as-a-service abstraction of browsers, proxies, extraction, geotargeting, and anti-blocking	vendor product page; no independent benchmark, target list, raw logs, or defender corroboration	Cannot show abuse, effectiveness, validity of claimed success rates, or legality of customer use
SRC-054	Cloudflare Block AI Bots	site-owner control over AI crawler and AI-like crawler access	product-control documentation; no classification details, traffic counts, effectiveness, or harm evidence	Cannot show AI crawler prevalence, malice, false positives, effectiveness, or legal/policy sufficiency
SRC-055	Cloudflare Turnstile	browser/form challenge systems that assess browser environment and behaviour before or instead of visual puzzles	vendor documentation; no independent bypass, accessibility, usability, or privacy assessment	Cannot show real-world challenge effectiveness, advanced-bot resistance, user impact, abuse prevalence, or legal sufficiency
SRC-056	Cloudflare Detection IDs	operational bot detection where low-level signals are exposed for rule-making and troubleshooting	no full detection list, exact logic, performance data, or false-positive metrics	Cannot prove zero human overlap, adaptation resistance, prevalence, or correctness of a particular rule
SRC-057	Cloudflare bot detection engines	layered production detection across heuristics, JavaScript detections, ML, anomaly detection, headers, sessions, and browser signals	product-method summary; not a model disclosure, benchmark, audit, or telemetry study	Cannot prove JS/ML effectiveness, data necessity, false-positive rates, or real-world attack prevalence
SRC-058	Cloudflare bot solutions overview	the packaging of bot defence as a layered operational stack from simple challenges to enterprise scoring and analytics	vendor capability overview; no independent prevalence, error-rate, or enforcement-outcome data	Cannot show effectiveness, comparative performance, prevalence, or legal/governance sufficiency
SRC-059	MDN CORS	the browser cross-origin sharing mechanism that later scraping/browser-security explanations rely on	neutral technical reference, not threat or efficacy evidence	Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality
SRC-060	MDN HTTP caching	cache semantics that affect server load, repeated crawler requests, conditional requests, and shared-cache privacy/security issues	neutral technical reference, not threat or efficacy evidence	Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality
SRC-061	MDN HTTP authentication	HTTP and proxy authentication concepts behind credential-bearing automated requests and access-control boundaries	neutral technical reference, not threat or efficacy evidence	Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality
SRC-062	MDN cookies	cookie/session mechanics that support session continuity, login state, tracking, and bot-management cookie checks	neutral technical reference, not threat or efficacy evidence	Cannot show account-abuse prevalence, attacker use, detection effectiveness, or legality
SRC-063	MDN User-Agent header	the basic identity/compatibility string used by browsers, crawlers, tools, and spoofing or reduction discussions	neutral technical reference, not threat or efficacy evidence	Cannot show spoofing prevalence, malicious use, detection effectiveness, or legality
SRC-064	MDN HTTP headers	the vocabulary of HTTP request/response fields later used in header-based detection and scraper-side spoofing discussions	neutral technical reference, not threat or efficacy evidence	Cannot show malicious header use, prevalence, detection effectiveness, or legality
SRC-065	MDN Overview of HTTP	the basic client-server/user-agent model behind browsers, crawlers, scripts, proxies, cookies, and request patterns	neutral technical reference, not threat or efficacy evidence	Cannot show abuse, prevalence, attacker use, detection effectiveness, or legality
SRC-066	Laperdrix browser-fingerprinting survey	the browser/device fingerprinting layer used in tracking, fraud detection, bot detection, and privacy-invasive identification	survey/foundation source; not current production bot telemetry or observed abuse; browser/API surfaces have changed since publication	Cannot show current prevalence, modern anti-bot performance, abuse against a target, or legal compliance
SRC-067	Berke et al. browser demographics	browser/device-attribute identifiability and unequal privacy risk across demographic groups	US Prolific sample; Dec 2023 device/browser snapshot; not bot traffic or anti-bot detection	Cannot show bot prevalence, detection performance, malicious use, or that a specific API change fixes demographic inference risk
SRC-068	Sudbury & Marks survey bots	automated and low-quality participation in incentivised online surveys and layered survey-quality controls	online-survey domain; not commercial web scraping, credential stuffing, ticket bots, or vendor telemetry	Cannot prove all bad responses were bots, quantify internet-wide bot prevalence, or validate commercial bot-management controls
SRC-069	PortSwigger OAuth vulnerabilities	abuse of third-party login and OAuth-based authentication flows, including account takeover and token/API misuse	educational vulnerability taxonomy; not measured prevalence, bot volume, or incident counts	Cannot prove OAuth is generally unsafe, quantify automation around OAuth, or replace OAuth/OIDC specifications and BCPs
SRC-070	PortSwigger secure authentication	defensive hardening against automated login abuse and authentication bypass	practical guidance; not empirical control-effectiveness evidence	Cannot prove a control is sufficient, quantify CAPTCHA/rate-limit effectiveness, or replace formal standards such as ASVS/NIST
SRC-071	PortSwigger password login vulnerabilities	automated login abuse through brute force, credential stuffing, username enumeration, and weak password-login controls	educational attack/defence taxonomy; not real-world attack-frequency or success-rate evidence	Cannot quantify credential-stuffing prevalence or prove CAPTCHA, rate limiting, IP blocking, or account locking works in the wild
SRC-072	PortSwigger authentication vulnerabilities	authentication as an attack surface linking bot automation to account takeover and follow-on exploitation	educational overview/lab framing; not production telemetry or observed abuse evidence	Cannot show automation prevalence, economic harm, bot-detection effectiveness, or legal/regulatory status
SRC-073	NIST SP 800-63B-4	standards-backed authentication, session, throttling, and authenticator-management controls for account-abuse contexts	normative control guidance; no bot-abuse telemetry, no control-effectiveness data, and no vendor/tool comparison	Cannot show credential-stuffing prevalence, detection performance, or that a specific fraud indicator reliably separates bots from humans
SRC-074	Sajid et al. hooking deception	runtime deception and API-hooking concepts that may inform anti-tamper/instrumentation thinking	endpoint keylogging domain, not web-bot detection or browser automation; preprint	Cannot show web-abuse prevalence, bot detection, or operational effectiveness in browser/client anti-bot systems
SRC-075	Cao browser principals	browser principal boundaries, JavaScript virtualisation, and client-side isolation concepts	older browser-security dissertation; not current bot-detection telemetry or modern browser-agent evidence	Cannot show modern browser-automation detection, current anti-bot efficacy, or observed abuse
SRC-076	Xu layered obfuscation	layered obfuscation as risk management and a taxonomy of obfuscation targets and layers	general software-security taxonomy, not bot-specific and not empirical bot evidence	Cannot show that client-side bot-detection obfuscation works, or how attackers respond in production
SRC-077	Tschacher bot-detection architecture	architecture-level explanation of passive bot detection, spoofable client signals, and layered signal collection	public technical commentary, not telemetry, benchmark, or vendor-validated architecture	Cannot quantify prevalence, false positives, or the effectiveness of any specific detection service
SRC-078	Pushan deobfuscation	reverse-engineering pressure against VM-obfuscated binaries and limits of obfuscation as a durable defence	binary deobfuscation research, not browser JavaScript bot detection; preprint	Cannot show that DataDome-style VM obfuscation is broken or effective in browser deployments
SRC-079	DataDome VM obfuscation	commercial use of VM-based obfuscation to protect exposed client-side bot-detection logic	vendor product announcement; no independent measurement, no raw attack data, no performance or usability data	Cannot prove the protection works, quantify attacker cost increase, or show detection efficacy
SRC-080	Ticketmaster Prestige litigation	legal-case evidence of alleged large-scale automated ticket purchasing, dummy accounts, and access-control circumvention	allegations and settlement context, not a full technical study or final trial finding on every claim	Cannot provide bot-code details, systematic prevalence, or independently measured detection/control performance
SRC-081	U.S. Senate Ticketmaster hearing	high-profile public account of ticket-bot pressure and access-code-server attack claims around the Taylor Swift presale	contested public testimony; key technical claims come from Live Nation/Ticketmaster; antitrust frame is partly separate	Cannot independently verify bot volume, causality, or platform-specific failure mechanics
SRC-082	Kolhar & Sridevi AI/ML cybersecurity	broad AI/ML cybersecurity and governance vocabulary for anomaly detection, SOC, UEBA, and human oversight	generic cybersecurity overview, not bot-specific and not empirical	Cannot support bot-specific claims, prevalence, or method effectiveness without stronger sources
SRC-083	Kolhar & Gundoor API security	API abuse vocabulary: BOLA, broken auth, rate limits, business logic, API scraping, shadow APIs	secondary book-chapter overview; less authoritative than OWASP/NIST/PortSwigger and not observed abuse	Cannot quantify API abuse or validate specific API-security controls in production
SRC-084	CAPTCHA-solving ecosystem	open commercial solver market and integration of CAPTCHA solving into automation and AI-agent workflows	vendor/tutorial/benchmark ecosystem; not independent prevalence or defender-side validation	Cannot show abuse volume, real-target success rates, or that a solver works against a given site
SRC-085	Chen ReCAP	controlled evidence that GUI agents can be trained to solve modern interactive CAPTCHA variants	synthetic and benchmark-focused preprint; not live abuse or deployed solver telemetry	Cannot show operational CAPTCHA-bypass prevalence, target impact, or production reliability
SRC-086	Bitsight OpenClaw exposure	internet-exposure measurement of AI-agent gateways and deployment-risk/blast-radius framing	stronger for exposure than abuse; vendor scan methodology and coverage require checking before using numbers	Cannot show confirmed abuse from each exposed agent or full internet-wide completeness
SRC-087	Infatica P2B SDK	residential proxy supply-chain business model using SDK-enabled app users and idle bandwidth	vendor marketing; compliance and consent claims are not independently validated	Cannot prove opt-in quality, compliance, effectiveness, abuse prevalence, or end-use legitimacy
SRC-088	Bright Data residential proxies	commercial explanation of residential/ISP proxy capabilities, rotation, sticky sessions, and limited-stock use cases	vendor market map; not independent measurement and not proof of effectiveness	Cannot establish provider quality, legal use, real-world success, or abuse prevalence
SRC-089	Choi proxy ecosystem	independent empirical comparison of open and residential proxies and blacklist overlap	dataset vintage and reliance on a prior residential-proxy dataset limit currentness; not a specific web-abuse campaign	Cannot show current proxy market structure, target-specific abuse, or detection performance in bot-management systems
SRC-090	RFC 9113/9114	current HTTP/2 and HTTP/3 protocol mechanics used as foundation for protocol-layer interpretation	protocol standards, not bot evidence or detection evidence	Cannot show abuse, fingerprint prevalence, or effectiveness of protocol-level detection
SRC-091	Chromium multi-process architecture	browser architecture vocabulary for renderer/browser processes, sandboxing, IPC, and browser-native automation context	design reference, not bot/security telemetry	Cannot show browser-automation detection, abuse prevalence, or fingerprinting behaviour
SRC-092	RFC 7540	historical HTTP/2 protocol specification and original terminology	obsoleted by RFC 9113; historical only for current HTTP/2 claims	Cannot support current HTTP/2 wording where RFC 9113 differs or supersedes it
SRC-093	OWASP ASVS 5.0.0	standards-backed application controls for anti-automation, business logic, auth, sessions, API validation, and logging	requirements standard, not threat taxonomy, telemetry, or modern bot-management method source	Cannot show prevalence, detection effectiveness, false-positive trade-offs, or AI/browser-agent trends
SRC-094	CISA Microsoft cloud post-compromise	cloud identity compromise, forged token/OAuth/SAML abuse, and API-based persistence after compromise	historical SolarWinds/SVR-linked advisory; not bot-specific; archived; no raw telemetry or prevalence	Cannot show current 2026 cloud threat prevalence, bot-specific abuse, SaaS pricing, or detection efficacy
SRC-095	Scattered Spider advisory	identity/helpdesk/social-engineering abuse using valid accounts, MFA manipulation, SSO, and legitimate remote tools	actor-specific; investigation-derived; not bot-specific; no neutral prevalence or raw victim logs	Cannot show bot automation prevalence, full campaign reconstruction, or general market-wide identity-abuse rates
SRC-096	Censys bulletproof hosting / RDP	infrastructure measurement of abused hosting patterns, exposed RDP artifacts, and clustering signals	not bot-specific; intent uncertain for individual hosts; no full raw dataset; infrastructure evidence, not abuse outcome evidence	Cannot prove criminal intent for every host, quantify bot abuse, or show account/booking/scraping impacts
SRC-097	Anderson cybercrime costs	security-economics framing separating criminal revenue, direct loss, indirect loss, defence cost, and social cost	2019 synthesis; not bot-specific; not current automation pricing or 2026 telemetry	Cannot show current bot markets, current attacker costs, or effectiveness of specific controls
SRC-098	IBM X-Force 2026	current vendor threat-landscape framing around public-facing application exploitation, credential theft, supply chain/cloud dependencies, and AI-accelerated operations	raw IBM page/source rather than reviewed extraction; vendor perspective; broad cyber, not bot-specific	Cannot show bot-specific prevalence, detailed methodology from the supplied raw page, or independent validation
SRC-099	Recorded Future cloud/SaaS abuse	cloud and SaaS functionality as adversarial infrastructure: valid accounts, APIs, CI/CD, storage, backups, LLM/ML services	threat-intelligence synthesis; many examples based on third-party reporting; not bot-specific; no raw telemetry for all cases	Cannot provide neutral prevalence, complete primary evidence, or scraping/proxy/CAPTCHA-specific evidence
SRC-100	Kasada threat enablers	mature service economy around automated checkout, account markets, verification bypass, bots-as-a-service, and reselling communities	vendor telemetry and marketplace monitoring; source list/method not fully reproducible; product-sector lens	Cannot independently validate all figures, prove all account sales led to abuse, or isolate AI as causal
SRC-101	Netwrix Mythos cost of attacking	strategic argument that AI lowers marginal attacker costs and compresses decision cycles	opinion/vendor commentary; speculative in places; not bot-specific and not empirical	Cannot verify Mythos claims, quantify attacker adoption, or support pricing/prevalence claims
SRC-102	Commercial automation cost stack	modular purchasability and approximate cost of scraping APIs, proxies, CAPTCHA solving, managed scraping, and SMS verification	pricing/market snapshots; not observed abuse; effectiveness and terms vary; sources are vendor or vendor-adjacent	Cannot prove malicious use, success rates, legal compliance, or that any component works against a specific target

Signals and techniques cross-index

The view the flat register lacked: which sources cover which technical material. Membership is migrated from the inventory’s signals / techniques column; treat as a starting index to be completed by the backfill pass.

signal / technique family	sources
TLS / network fingerprints (JA3/JA4)	SRC-002, SRC-018, SRC-024, SRC-029, SRC-030, SRC-031, SRC-045, SRC-046, SRC-047, SRC-057, SRC-066, SRC-077, SRC-090, SRC-092, SRC-096, SRC-102
Browser fingerprinting (JS / canvas / attributes / scripts)	SRC-016, SRC-017, SRC-023, SRC-024, SRC-030, SRC-031, SRC-033, SRC-034, SRC-039, SRC-040, SRC-045, SRC-046, SRC-047, SRC-055, SRC-057, SRC-063, SRC-066, SRC-067, SRC-075, SRC-077, SRC-079, SRC-091
Behavioural — mouse / touch dynamics	SRC-011, SRC-013, SRC-019, SRC-020, SRC-030, SRC-033, SRC-034, SRC-039, SRC-045, SRC-046, SRC-047, SRC-049, SRC-055, SRC-068, SRC-084, SRC-085
Behavioural — request/session timing and navigation patterns	SRC-027, SRC-030, SRC-031, SRC-032, SRC-033, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-051, SRC-055, SRC-057, SRC-068, SRC-073, SRC-077, SRC-080, SRC-081, SRC-083, SRC-084, SRC-088, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099, SRC-100
ML detection methods (supervised/unsupervised, boosting, CNN)	SRC-006, SRC-011, SRC-018, SRC-019, SRC-033, SRC-047, SRC-057, SRC-067, SRC-082, SRC-083, SRC-085
Adversarial evasion (GAN / RL)	SRC-011, SRC-013, SRC-014, SRC-020, SRC-047
Fingerprint-inconsistency / evasion detection	SRC-015, SRC-024, SRC-030, SRC-031, SRC-033, SRC-034, SRC-039, SRC-040, SRC-045, SRC-046, SRC-047, SRC-056, SRC-057, SRC-077, SRC-079
Browser automation & stealth layers	SRC-021, SRC-022, SRC-023, SRC-024, SRC-028, SRC-030, SRC-031, SRC-033, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-041, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-053, SRC-077, SRC-079, SRC-084, SRC-085, SRC-086, SRC-091
HTTP headers / header order / protocol details	SRC-002, SRC-003, SRC-027, SRC-029, SRC-031, SRC-037, SRC-039, SRC-040, SRC-041, SRC-042, SRC-045, SRC-046, SRC-047, SRC-051, SRC-053, SRC-056, SRC-057, SRC-059, SRC-060, SRC-061, SRC-063, SRC-064, SRC-065, SRC-066, SRC-069, SRC-071, SRC-073, SRC-077, SRC-083, SRC-090, SRC-092, SRC-093, SRC-094, SRC-098, SRC-099
Cookies / session persistence	SRC-002, SRC-028, SRC-029, SRC-031, SRC-034, SRC-039, SRC-041, SRC-045, SRC-047, SRC-055, SRC-061, SRC-062, SRC-065, SRC-066, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-075, SRC-077, SRC-083, SRC-091, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099
Proxy / infrastructure / cloud browsers	SRC-024, SRC-025, SRC-026, SRC-029, SRC-030, SRC-031, SRC-032, SRC-033, SRC-034, SRC-036, SRC-039, SRC-040, SRC-041, SRC-043, SRC-044, SRC-045, SRC-046, SRC-047, SRC-052, SRC-053, SRC-061, SRC-065, SRC-086, SRC-087, SRC-088, SRC-089, SRC-096, SRC-099, SRC-100, SRC-102
AI-agent signals & governance	SRC-008, SRC-009, SRC-010, SRC-026, SRC-032, SRC-033, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-047, SRC-049, SRC-053, SRC-054, SRC-058, SRC-082, SRC-084, SRC-085, SRC-086, SRC-098, SRC-099, SRC-100, SRC-101
Intent / journey / score-based detection	SRC-002, SRC-003, SRC-004, SRC-009, SRC-010, SRC-027, SRC-030, SRC-031, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-047, SRC-049, SRC-057, SRC-058, SRC-079, SRC-080, SRC-081, SRC-093
Challenge-response / CAPTCHA / proof-of-work	SRC-009, SRC-020, SRC-030, SRC-034, SRC-035, SRC-039, SRC-041, SRC-042, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-055, SRC-058, SRC-068, SRC-070, SRC-071, SRC-080, SRC-081, SRC-084, SRC-085, SRC-093, SRC-100, SRC-102
Taxonomy / canonical threat vocabulary	SRC-001, SRC-027
Countermeasure classes / symptoms	SRC-027, SRC-032, SRC-034, SRC-035, SRC-039, SRC-040, SRC-042, SRC-047, SRC-049, SRC-051, SRC-054, SRC-055, SRC-056, SRC-057, SRC-058, SRC-068, SRC-070, SRC-073, SRC-079, SRC-083, SRC-093
API abuse / API endpoint visibility	SRC-039, SRC-040, SRC-041, SRC-053, SRC-069, SRC-083, SRC-093, SRC-094, SRC-098, SRC-099
Credential / account-abuse signals	SRC-034, SRC-039, SRC-047, SRC-048, SRC-049, SRC-052, SRC-061, SRC-062, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-080, SRC-081, SRC-083, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099, SRC-100, SRC-102
Scarce-resource / inventory-abuse signals	SRC-035, SRC-039, SRC-049, SRC-050, SRC-080, SRC-081, SRC-088, SRC-093, SRC-100, SRC-102
Standards / canonical reference (`standards` / `reference-doc`)	SRC-059, SRC-060, SRC-061, SRC-062, SRC-063, SRC-064, SRC-065, SRC-066, SRC-073, SRC-090, SRC-091, SRC-092, SRC-093
Scraper training / practice environments	SRC-041
Commercial scraping / scraping-as-a-service	SRC-043, SRC-044, SRC-045, SRC-053, SRC-087, SRC-088, SRC-102
Legal / governance boundary for scraping and bot detection	SRC-042, SRC-047, SRC-050, SRC-066, SRC-067, SRC-073, SRC-080, SRC-081, SRC-082, SRC-093, SRC-097
Defensive tooling / WAF / blocklists	SRC-051, SRC-052, SRC-054, SRC-055, SRC-056, SRC-057, SRC-058, SRC-073, SRC-079, SRC-083, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099
Honey accounts / honeypots / honeytokens	SRC-048
Network-origin / IP / ASN reputation	SRC-045, SRC-047, SRC-052, SRC-087, SRC-088, SRC-089, SRC-096, SRC-100, SRC-102
AI crawlers / content-access governance	SRC-032, SRC-039, SRC-054, SRC-058, SRC-086
HTTP foundations / web basics	SRC-059, SRC-060, SRC-061, SRC-062, SRC-063, SRC-064, SRC-065, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-090, SRC-091, SRC-092, SRC-093
CORS / browser security boundaries	SRC-059
Caching / conditional requests	SRC-060
Browser-fingerprinting surveys / privacy measurement	SRC-016, SRC-017, SRC-066, SRC-067, SRC-075, SRC-077
Authentication / account-abuse foundations	SRC-061, SRC-062, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-083, SRC-093
Survey/data-quality abuse and form-quality controls	SRC-068
Client-side detection code protection / obfuscation	SRC-075, SRC-076, SRC-077, SRC-078, SRC-079
Residential / peer-proxy ecosystem	SRC-087, SRC-088, SRC-089, SRC-100, SRC-102
CAPTCHA-solving / solver ecosystem	SRC-084, SRC-085
Legal / hearing evidence for ticket bots	SRC-050, SRC-080, SRC-081
Browser architecture / browser-native automation foundations	SRC-075, SRC-077, SRC-091
API security / business-logic controls	SRC-083, SRC-093, SRC-094, SRC-099
Cloud / SaaS / identity abuse	SRC-094, SRC-095, SRC-098, SRC-099
Threat infrastructure / bulletproof hosting / RDP	SRC-096, SRC-099
Security economics / cost-of-abuse framing	SRC-097, SRC-100, SRC-101, SRC-102
Commercial automation cost stack	SRC-100, SRC-102
Government advisories / official TTP guidance	SRC-094, SRC-095
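Membership is stored one row per family, so the reverse question (which families is a given source indexed under?) means inverting the table. The following is a minimal sketch, not project tooling. It assumes the cross-index is exported as tab-separated text, with the family in the first column and comma-separated ids in the second. `parse_cross_index` and `invert` are hypothetical names.

```python
from collections import defaultdict


def parse_cross_index(tsv_text: str) -> dict[str, list[str]]:
    """Map each signal/technique family to its list of SRC ids."""
    index: dict[str, list[str]] = {}
    for line in tsv_text.strip().splitlines():
        family, _, sources = line.partition("\t")
        if family == "signal / technique family":
            continue  # skip the header row
        index[family] = [s.strip() for s in sources.split(",") if s.strip()]
    return index


def invert(index: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each SRC id to the families it appears under, in table order."""
    by_source: dict[str, list[str]] = defaultdict(list)
    for family, ids in index.items():
        for src in ids:
            by_source[src].append(family)
    return dict(by_source)


# Three rows copied from the cross-index table.
SAMPLE = (
    "signal / technique family\tsources\n"
    "Cloud / SaaS / identity abuse\tSRC-094, SRC-095, SRC-098, SRC-099\n"
    "Threat infrastructure / bulletproof hosting / RDP\tSRC-096, SRC-099\n"
    "Government advisories / official TTP guidance\tSRC-094, SRC-095\n"
)

if __name__ == "__main__":
    by_source = invert(parse_cross_index(SAMPLE))
    print(by_source["SRC-099"])
```

The inverted map is the natural place to spot sources that belong in a family but were missed during migration, which is what the backfill pass needs to catch.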

Scarce-resource abuse index

Scarce-resource abuse is a cross-cutting tag family for sources about competition over a limited transactional resource. It is not a fifth category: sources still belong to foundations, vendor, academic, or threat-surface.

Rows are added only when a source concerns appointment, ticketing, reservation, product-drop, queueing, cancellation-monitoring, booking-flow, inventory-hoarding, or limited-inventory abuse. Otherwise these fields are not applicable.

id	tags	scarce_resource_targeted	abuse_phase	website_facing_action	evidence_of_use	abuse_outcome
SRC-035	scarce-resource-abuse; slot-sniping; limited-inventory; appointment-abuse; inventory-hoarding; booking-flow-abuse; availability-polling; cancellation-monitoring; fast-booking; auto-booking; slot-resale	appointment	monitoring / booking / holding / resale / cancellation exploitation	polling availability / completing booking / holding inventory / reselling	observed-use	ordinary users blocked / inventory unavailable / inflated resale price / degraded fairness / operational load
SRC-039	scarce-resource-abuse; limited-inventory; inventory-hoarding; denial-of-inventory; scalping; booking-flow-abuse; availability-polling	booking / product / reservation	monitoring / booking / holding	polling availability / completing booking / holding inventory	vendor-measured	inventory unavailable / distorted metrics / operational load
SRC-049	scarce-resource-abuse; ticketing-abuse; limited-inventory; scalping; queue-abuse; account-preparation; ticket-resale; booking-flow-abuse; availability-polling; fast-booking	ticket	account preparation / queue entry / monitoring / booking / resale	creating accounts / entering queue / polling availability / completing booking / reselling	capability-only	ordinary users blocked / inventory unavailable / inflated resale price / degraded fairness
SRC-050	scarce-resource-abuse; ticketing-abuse; limited-inventory; inventory-hoarding; scalping; purchase-limit-circumvention; account-preparation; ticket-resale	ticket	account preparation / booking / holding / resale	automated search and reservation / completing booking / using accounts and payment identities / reselling	legal-record	inventory unavailable / inflated resale price / degraded fairness
SRC-100	scarce-resource-abuse; limited-inventory; scalping; automated-checkout; account-preparation; reselling-communities; verification-bypass	product / ticket / booking	account preparation / monitoring / checkout / resale	using accounts / bypassing verification / completing checkout / reselling	vendor-measured / market-evidence	inventory unavailable / inflated resale price / degraded fairness
SRC-102	scarce-resource-abuse; limited-inventory; indirect-cost-stack; CAPTCHA-solving; proxies; temporary-SMS; scraping-APIs	product / ticket / appointment / booking (indirect)	monitoring / account preparation / challenge solving / booking support	polling availability / solving challenge / account verification / completing booking support	market-evidence / capability-only	indirect capability support only; no specific abuse outcome evidenced

Read and rejected

Recorded so they aren’t re-read (EVIDENCE-REVIEW.md §6). Both are title-collision retrieval artefacts — “Actions Speak Louder than Words” papers pulled in by string match, unrelated to bots.

id	source	org / authors	year	reason rejected	entry file
SRC-R01	Loyalty Program Building Blocks (Economics and Sociology)	Kwiatek et al.	2018	Marketing/consumer-perception study. No bot/abuse content. Out of scope; keep only if a loyalty-abuse adjacency is later added.	`kwiatek-2018-actions-speak-louder-loyalty-program-building-blocks.md`
SRC-R02	Figurative Language & Gesturing in Entrepreneurial Pitches (AMJ)	Healey et al.	2018	Communication/persuasion study. No bot/abuse content. Out of scope; possible use only for a dissemination/communication note.	`healey-2018-actions-speak-louder-than-words-entrepreneurial-pitches.md`

Queued

Not yet read or not yet extracted as a distinct source. This queue has been pruned so it no longer lists sources already represented by SRC-027, SRC-034, SRC-039, SRC-046, SRC-066, SRC-067, SRC-069–SRC-073, or SRC-079–SRC-093.

Highest-priority gaps

source / area	category	why flagged
Remaining foundations primers: browser storage, DNS/TLS/CDN basics, IP/network identity	foundations	MDN HTTP/CORS/cookies/header basics are represented by `SRC-059`…`SRC-065`, PortSwigger authentication foundations by `SRC-069`…`SRC-072`, NIST/OWASP control standards by `SRC-073` and `SRC-093`, and HTTP/2/3 RFCs by `SRC-090`/`SRC-092`; remaining need is DNS/TLS/CDN, browser storage, and IP/network identity foundations.
Independent in-the-wild bot, credential-stuffing, ad-fraud, fake-account, or scraping measurement studies	academic / threat-surface	Observed-use lane remains thinner than capability/vendor evidence; independent measurement is highest value.
Victim/operator engineering postmortems from platforms affected by scraping, credential stuffing, account creation, booking abuse, or crawler pressure	threat-surface	Balances vendor telemetry with first-party target/operator accounts.
Primary legal/enforcement records: BOTS Act complaints/orders, UK ticketing/consumer-law material, DVSA booking terms, regulator guidance where legal claims become load-bearing	legal-record / governance	Vendor legal explainers are not enough for legal claims.

Useful but second-order

source / area	category	why flagged
OpenAI agent / Operator documentation and safety material; Anthropic Claude computer-use / browser-use material; Browser Use and Skyvern docs	vendor / threat-surface	Needed to represent agent-builder framing rather than only defender-vendor framing.
Akamai, Imperva/Thales, F5 technical docs for bot management, WAF controls, API-security controls, and exposed rule/score/control-plane fields	vendor	Cloudflare control-plane docs are now represented by `SRC-003`, `SRC-054`…`SRC-058`; non-Cloudflare technical docs remain incomplete.
Anti-detect browsers and stealth tooling as distinct entries: Multilogin, GoLogin, Camoufox, Nodriver, SeleniumBase UC Mode	threat-surface	Currently mostly present through secondary mentions and catalogues, not their own source entries.
Browser-extension and userscript automation material: Tampermonkey/userscripts, browser add-ons used for page monitoring or form automation	threat-surface	Important for the “individual tool running inside the browser” / slot-sniping argument.
Public datasets for methodology investigations: ad-fraud, clickstream, login/session, credential-stuffing proxies, web-log, fraud graph datasets	methodology / academic	Needed to connect the written review to reproducible public-data investigations and to state framing distance clearly.
Additional Cloudflare Radar / AI crawler / bot traffic reports	vendor telemetry	Cloudflare product/capability docs are now represented; remaining Cloudflare gap is telemetry/prevalence.

Recently resolved from the old queue

old queued item	resolved by	note
OWASP Automated Threat Handbook full source	SRC-027	Now extracted; still marked needs review because the extraction was provisional.
Imperva Bad Bot Report annual	SRC-039	Covered via 2026 Thales / Imperva Bad Bot Report.
F5 Labs reports	SRC-034	Credential-stuffing report extracted; further F5 technical docs can still be queued separately if needed.
niespodd/browser-fingerprinting GitHub catalogue	SRC-046	Now extracted as its own threat-surface source.
Akamai financial-services report	SRC-040	Vendor telemetry report extracted; technical docs remain queued separately.
Ticket-bot / scarce-resource enforcement example	SRC-050	FTC BOTS Act source added; more primary legal records can still be useful.
MDN HTTP/CORS/cookies/header foundations	SRC-059…SRC-065	Core MDN foundations now extracted; remaining foundations should focus on DNS/TLS/CDN, browser storage, and IP/network identity.
Cloudflare bot product/control-plane docs	SRC-003, SRC-054…SRC-058	Capability layer now better represented; Cloudflare telemetry/Radar remains separate if needed.
Browser-fingerprinting survey / empirical update	SRC-066, SRC-067	Laperdrix survey and Berke demographic-fingerprinting paper added; remaining fingerprinting work should be targeted rather than generic survey collection.
Web-bot detection privacy / methods review	SRC-047	Re-extraction attached to existing Martínez Llamas row; use as methods/privacy/governance anchor, not observed-use evidence.
PortSwigger authentication foundations	SRC-069…SRC-072	Worked authentication/OAuth/password-login sources added; remaining foundations should focus on DNS/TLS/CDN, browser storage, and IP/network identity.
NIST authentication/session standard	SRC-073	SP 800-63B-4 added as authentication and session-management control foundation.
OWASP ASVS application-security controls	SRC-093	ASVS 5.0 added as defensive-control foundation for anti-automation, business logic, auth, sessions, API validation, and logging.
HTTP/2 and HTTP/3 protocol standards	SRC-090, SRC-092	RFC 9113/9114 added for current protocol claims; RFC 7540 kept as historical/obsolete HTTP/2 source.
Chromium browser architecture	SRC-091	Multi-process architecture added for browser-native automation and browser architecture foundations.
Proxy ecosystem and residential proxy supply	SRC-087, SRC-088, SRC-089	Commercial peer/residential proxy sources and independent proxy-ecosystem measurement added.
VM obfuscation and client-side bot-detection code protection	SRC-076, SRC-078, SRC-079	General obfuscation taxonomy, deobfuscation counterweight, and DataDome VM-obfuscation source added.
Ticketmaster legal/hearing evidence	SRC-080, SRC-081	Earlier Ticketmaster/Prestige legal case and 2023 Senate/Taylor Swift hearing source added.
CAPTCHA-solving ecosystem and GUI-agent CAPTCHA capability	SRC-084, SRC-085	Commercial solver ecosystem and ReCAP GUI-agent CAPTCHA-solving paper added.
OpenClaw exposure measurement	SRC-086	Bitsight exposure source added as complement to HUMAN OpenClaw traffic-abuse source.
API-security/business-logic overview	SRC-083	Secondary API-security chapter added; OWASP/NIST/PortSwigger remain stronger control references.
Cloud/SaaS abuse and official identity-abuse advisories	SRC-094, SRC-095, SRC-098, SRC-099	CISA Microsoft cloud, Scattered Spider, IBM X-Force, and Recorded Future entries added; still avoid treating these as bot-specific.
Bulletproof hosting / adversarial infrastructure measurement	SRC-096	Censys RDP/BPH infrastructure source added.
Cybercrime economics and automation cost framing	SRC-097, SRC-101, SRC-102	Anderson gives a non-vendor economics framework; Summers is low-priority opinion; the commercial cost stack gives availability/pricing context.
SaaSification of automated abuse infrastructure	SRC-100, SRC-102	Kasada Q1 2026 and commercial automation cost stack added; cite as vendor/market evidence, not independent prevalence.

Appendices

Register taxonomy

category — foundations / vendor / academic / threat-surface (EVIDENCE-REVIEW.md §2).

evidence basis — what kind of evidence the source actually is. This is the column that prevents a marketing claim being treated as equivalent to a study.

empirical-academic — controlled study or dataset experiment in a research setting.
empirical-operational — measurement against real or purchased traffic / a live honey site.
survey — practitioner/executive survey; self-reported.
vendor-claim — vendor marketing / efficacy / prevalence claims; vendor-measured, not independently verifiable.
capability-doc — product or platform documentation describing what a system exposes or can do.
tooling-readme — open-source tool README / docs; maintainer claims, not independent tests.
bypass-guide — scraper-side or evasion-side guidance describing ways to avoid blocking or align detection surfaces. High dual-use; cite only at technique-family level, not as a recipe.
methods-taxonomy — a categorisation of methods with no reproducible detail.
taxonomy — a canonical categorisation of the field (e.g. OWASP OAT).
threat-intel — vendor observation/threat reports.
legal-record — court filings, indictments, or enforcement actions. Used for technique and operational-proximity evidence only, with actor/campaign attribution stripped (EVIDENCE-REVIEW.md §3).
legal-explainer — non-authoritative legal/compliance explainer or guidance source. Treat as context only; load-bearing legal claims require primary law, regulator guidance, specialist legal analysis, or legal records.
primary-law / regulator-guidance — primary legal text or regulator guidance used only under the regulatory-constraint lane. Treat as jurisdiction-bound, time-varying, non-load-bearing, and not legal advice.
reference-doc — neutral technical reference documentation used for foundations/protocol concepts. Not evidence of abuse, prevalence, or defensive performance.
standards-reference / protocol-standard — normative standards or specifications used for control/protocol foundations. Not evidence of abuse or control effectiveness.
control-requirements / authentication-guidance / defensive-guidance — defensive requirements or implementation guidance. Useful for control vocabulary; not proof that controls work in a given deployment.
browser-architecture reference / architecture analysis — architecture explanations used to understand browser/runtime/client-server boundaries. Not telemetry or detection evidence.
empirical-method demonstration — method evaluation in a bounded research setting. Rigour may be high, but operational proximity remains limited unless it measures real-world abuse.
empirical-exposure measurement / infrastructure-analysis — measurement of exposed services or infrastructure such as proxies, not necessarily measurement of confirmed abuse.
vendor product announcement / tutorial ecosystem — vendor or adjacent ecosystem material documenting available capabilities or workflow culture; treat as capability evidence unless independently measured.
review — literature review or survey source synthesising prior work. Use for academic/foundation surveys where the source is not itself measuring live abuse.
empirical-measurement — original measurement or dataset paper that measures a relevant phenomenon but not necessarily bot abuse directly.
dataset paper — source whose primary contribution is a dataset or dataset-linked measurement.
empirical-methods — empirical methods paper or case study evaluating controls/processes in a bounded setting.
review-informed case study — case study grounded in prior literature but not designed as a controlled comparative experiment.
control-guidance — defensive implementation guidance or checklist. Useful for controls vocabulary, not proof of control efficacy.
vulnerability-taxonomy — educational or reference taxonomy of weakness classes; not prevalence evidence.

operational proximity — how close the source sits to observed abuse against a real target. Orthogonal to evidence basis (which records source type/rigour); the two are tracked separately so a rigorous lab result and a vendor blog are not flattened onto one axis. Where a source mixes levels (e.g. a vendor report carrying both capability claims and production telemetry), the cell records the highest level the source independently supports, with a parenthetical caveat.

capability — establishes only that a tool, technique, or capability exists or is feasible. Includes documentation, tool READMEs, and controlled academic PoCs — a lab demonstration that an evasion works is still not observation of real-target abuse; its rigour lives in evidence basis, not here.
claimed — an interested party asserts the capability is used or works against targets, without independent observation (vendor “we stop X” claims, bypass-vendor “works against Y” claims, self-report surveys).
observed — the activity has been seen against a real or realistic target, but not cleanly or independently quantified (vendor telemetry reports — vendor-measured; victim engineering postmortems; enforcement/legal records describing technique).
measured — controlled or operational measurement quantifying the activity against a real or realistic target (honey-site experiments; in-the-wild measurement studies).
n/a — the source is a taxonomy or a non-bot-use foundation; the axis does not apply.
control — the source is control guidance or a requirements standard. It can say what should be built or verified, but not whether the control works against a specific live threat.
measured-but-bounded — a controlled benchmark or method evaluation with quantitative results, but outside the project’s core live web-abuse setting.
observed-claim / observed-exposure — public testimony, legal/hearing claims, or exposure scans. Useful as operational-proximity evidence, but weaker than independent measurement of confirmed abuse.

Migrated and pre-v3 rows carry provisional proximity values assigned from the one-line summaries; they inherit the row’s standing review state and are part of the same entry-file backfill, not yet reviewed.
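The core ordinal and the mixed-level rule (record the highest level the source independently supports) can be sketched as follows. This is a minimal illustration, not project tooling. `Proximity` and `proximity_cell` are hypothetical names. The text does not rank the extended tokens (control, measured-but-bounded, observed-claim / observed-exposure) against the core four, so the sketch rejects them rather than guessing a position. The parenthetical caveat is still written by hand.

```python
from enum import IntEnum


class Proximity(IntEnum):
    """Core ordinal, lowest to highest: capability < claimed < observed < measured."""

    CAPABILITY = 1
    CLAIMED = 2
    OBSERVED = 3
    MEASURED = 4


def proximity_cell(supported: set[str]) -> str:
    """Return the level to record for a source that supports one or more levels.

    For a vendor report carrying both capability claims and production
    telemetry, `supported` would be {"capability", "observed"}; the cell
    records the highest level.
    """
    if not supported:
        raise ValueError("at least one supported level is required")
    ranks = []
    for token in supported:
        try:
            ranks.append(Proximity[token.upper()])
        except KeyError:
            # n/a and the extended tokens (control, measured-but-bounded,
            # observed-claim, observed-exposure) are not ranked in this sketch.
            raise ValueError(f"not on the core ordinal: {token}") from None
    return max(ranks).name.lower()


print(proximity_cell({"capability", "observed"}))  # observed
```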

Scarce-resource abuse tags — a cross-cutting tag family, not a top-level category. Apply scarce-resource-abuse as the umbrella tag when a source concerns scarce transactional resources, and add the more specific tags supported by the source:

slot-sniping
limited-inventory
appointment-abuse
reservation-abuse
ticketing-abuse
inventory-hoarding
denial-of-inventory
scalping
queue-abuse
booking-flow-abuse
availability-polling
cancellation-monitoring
fast-booking
auto-booking
slot-resale
ticket-resale
reservation-resale
booking-transfer
account-preparation

Scarce-resource abuse fields — conditional fields for sources tagged scarce-resource-abuse. They apply only when a source concerns appointment, ticketing, reservation, product-drop, queueing, cancellation-monitoring, booking-flow, inventory-hoarding, or limited-inventory abuse. Availability polling may be scraping-like, but the abuse pattern is competition for a scarce transactional resource, so do not collapse it into generic scraping.

scarce_resource_targeted — appointment / ticket / reservation / product / booking / queue position / other.
abuse_phase — monitoring / account preparation / queue entry / booking / holding / transfer / resale / no-show / cancellation exploitation.
website_facing_action — polling availability / entering queue / solving challenge / completing booking / holding inventory / changing booking / transferring booking / reselling.
evidence_of_use — measured-use / observed-use / vendor-measured / legal-record / regulatory-record / market-evidence / capability-only / controlled-PoC. This is the scarce-resource-specific use classification; it does not replace operational proximity, which remains the broader corpus-level capability-to-use axis.
abuse_outcome — ordinary users blocked / inventory unavailable / inflated resale price / no-show / degraded fairness / distorted metrics / operational load.
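Each of these fields draws on a closed vocabulary, and several index cells hold more than one value. That means a row can be checked mechanically before it is added to the scarce-resource index. The following is a minimal sketch. It assumes multi-valued cells are separated by ` / `, as in the index table, and that values must match the vocabulary exactly. `SCARCE_VOCAB` and `off_vocabulary` are hypothetical names.

```python
# Controlled vocabularies copied from the field definitions above.
SCARCE_VOCAB: dict[str, set[str]] = {
    "scarce_resource_targeted": {
        "appointment", "ticket", "reservation", "product",
        "booking", "queue position", "other",
    },
    "abuse_phase": {
        "monitoring", "account preparation", "queue entry", "booking",
        "holding", "transfer", "resale", "no-show", "cancellation exploitation",
    },
    "website_facing_action": {
        "polling availability", "entering queue", "solving challenge",
        "completing booking", "holding inventory", "changing booking",
        "transferring booking", "reselling",
    },
    "evidence_of_use": {
        "measured-use", "observed-use", "vendor-measured", "legal-record",
        "regulatory-record", "market-evidence", "capability-only", "controlled-PoC",
    },
    "abuse_outcome": {
        "ordinary users blocked", "inventory unavailable", "inflated resale price",
        "no-show", "degraded fairness", "distorted metrics", "operational load",
    },
}


def off_vocabulary(field: str, cell: str) -> list[str]:
    """Return the ' / '-separated values in `cell` that are not in the field's vocabulary."""
    allowed = SCARCE_VOCAB[field]
    values = [v.strip() for v in cell.split(" / ")]
    return [v for v in values if v not in allowed]


# Cells taken from the SRC-039 row of the index.
print(off_vocabulary("abuse_phase", "monitoring / booking / holding"))  # []
```

An empty list means the cell uses only listed values. Anything returned is either a typo or a qualifier (such as "(indirect)") that a maintainer should keep deliberately or add to the vocabulary.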

Regulatory-constraint tag and fields — conditional vocabulary for sources admitted under EVIDENCE-REVIEW.md §2.6. This is a cross-cutting lane, not a category. Apply regulatory-constraint only when the source is being read for how a rule constrains a technique family.

jurisdiction — UK / EU / US / Canada / Australia / other.
currency — free text; must include an as-of date and the caveat `subject to change; not verified current`. Honest token if unknown: `as-of unknown — verify before use`.
constrains_technique — the technique family the rule bears on, preferably matching an existing signals-and-techniques cross-index row such as browser fingerprinting, cookies/session persistence, behavioural signals, scraping/access control, or AI crawlers/content-access governance.
operational proximity — always n/a for regulatory-constraint entries. These sources explain technique-deployment constraints; they are not evidence of abuse prevalence, bot behaviour, or control effectiveness.
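These rules are concrete enough to check before a regulatory-constraint entry is projected into the register. The following is a minimal sketch, not project tooling. `regulatory_problems` is a hypothetical name, and the sketch assumes one jurisdiction per entry. It also assumes as-of dates are written in ISO 8601 form (YYYY-MM-DD); the register does not fix a date format.

```python
import re

JURISDICTIONS = {"UK", "EU", "US", "Canada", "Australia", "other"}
CAVEAT = "subject to change; not verified current"
UNKNOWN_PREFIX = "as-of unknown"
# Assumption: as-of dates are ISO 8601 (YYYY-MM-DD); adjust if the register settles on another form.
AS_OF_DATE = re.compile(r"as-of \d{4}-\d{2}-\d{2}")


def regulatory_problems(jurisdiction: str, currency: str, proximity: str) -> list[str]:
    """List rule violations for one regulatory-constraint entry (empty list means it passes)."""
    problems = []
    if jurisdiction not in JURISDICTIONS:
        problems.append(f"unknown jurisdiction: {jurisdiction}")
    if proximity != "n/a":
        problems.append("operational proximity must be n/a")
    # The honest unknown token replaces both the date and the caveat.
    if not currency.startswith(UNKNOWN_PREFIX):
        if not AS_OF_DATE.search(currency):
            problems.append("currency lacks an as-of date")
        if CAVEAT not in currency:
            problems.append("currency lacks the caveat")
    return problems


print(regulatory_problems("UK", "as-of 2026-06-06; subject to change; not verified current", "n/a"))  # []
```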

provenance — extraction agent and model, from the entry’s run-metadata block (e.g. Claude Code / Opus 4.8). Rows migrated before the v2 prompt carry `not recorded`. Where a source has several extraction files (different prompt versions or agents), provenance lists each.

reconciliation — whether a source’s row points at one extraction or several. Tagged in the entry file cell.

[single] — one extraction file.
[multiple — unreconciled] — two or more extraction files of the same source (e.g. <slug>.md + <slug>.v2.md, or two agents) not yet reconciled. canonical for citation = latest version.
[combined] — a <slug>.combined.md reconciliation exists. canonical for citation = the combined file; the source extractions are kept and listed.
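The canonical-for-citation rule can be sketched as a small selector over one source's extraction files. This is a minimal illustration, not project tooling, and `canonical_for_citation` is a hypothetical name. The sketch assumes "latest version" can be read from a `.vN.md` filename suffix, with a bare `<slug>.md` counting as v1. Two extractions by different agents that carry no version suffix cannot be ordered this way; they need the dates in each entry's run-metadata block, which the sketch does not read.

```python
import re

VERSION = re.compile(r"\.v(\d+)\.md$")


def canonical_for_citation(files: list[str]) -> str:
    """Pick the extraction file to cite for one source."""
    combined = [f for f in files if f.endswith(".combined.md")]
    if combined:
        return combined[0]  # [combined]: the reconciliation file wins

    def version(name: str) -> int:
        match = VERSION.search(name)
        return int(match.group(1)) if match else 1  # bare <slug>.md treated as v1

    # [single] or [multiple - unreconciled]: latest version by filename suffix.
    return max(files, key=version)


print(canonical_for_citation(["slug.md", "slug.v2.md"]))  # slug.v2.md
```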

review state

solid — extraction reviewed; sufficient for register use and citation.
conditional — usable for cautious register reference, but check the entry before quoting numbers, equations, or specific claims.
needs review — do not use without reading the entry / source.
migrated — review pending — carried over from the flat register; not yet reviewed against its entry file.

threat types — OWASP OAT categories where they map, else project vocabulary (scraping, credential stuffing, scalping, account takeover, click/ad fraud, carding). Not threat-specific for method/infrastructure-only sources.

Update log

Append-only. New entries at the bottom.

2026-06-02 — Register schema v2; migrated from working/reading-register.md. Replaced the flat four-column register (Reference / Status / Entry / Notes) with a structured projection of the extraction fields. Added: evidence basis, provenance, review state, and threat types columns to the inventory; a framing-distance ledger; a signals-and-techniques cross-index; and this controlled-vocabulary appendix. 26 in-scope sources (SRC-001…SRC-026) and 2 rejected (SRC-R01, SRC-R02) migrated. Provenance is not recorded for all migrated rows because the prior register did not track extraction agent/model; the v2 extraction prompt records it going forward. The framing-distance “what it fails to represent” cells, threat types, and several evidence basis/signals cells are stubbed as tbd, with backfill from the entries pending a read of working/register-entries/.

2026-06-05 — Added SRC-027…SRC-031 from reviewed extraction entries. Added OWASP Automated Threat Handbook v1.3 (SRC-027, provenance not recorded, provisional draft), Medium Playwright cookies tutorial (SRC-028, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), RoundProxies Rnet tutorial (SRC-029, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), ScrapFly Cloudflare Turnstile post (SRC-030, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), and ScrapFly Imperva/Incapsula post (SRC-031, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2). Added corresponding framing-distance rows and cross-index memberships; removed the now-stale queued OWASP Handbook row after representing it as SRC-027. Review state for all five is needs review; SRC-027 specifically requires review because its entry was produced without access to the repo scope docs.

2026-06-06 — Schema v3: added operational proximity axis; legal-record evidence basis. Formalised the capability-vs-use distinction — previously implicit in framing-distance prose and the threat-surface table note — into a queryable ordinal (capability / claimed / observed / measured / n/a), orthogonal to evidence basis, positioned after it in every inventory table. Added legal-record as an evidence basis for enforcement/court sources, admitted under a strict technique-not-attribution rule and a dual-use no-recipe rule (EVIDENCE-REVIEW.md §3; editorial enforcement in GOVERNANCE.md §4/§7). The source-extraction-prompt (v3) now emits the proximity field; register-update-prompt projects it. Proximity values across SRC-001…031 were assigned provisionally from the one-line summaries and inherit each row’s review state. First pass: the corpus concentrates at capability / claimed; observed is vendor-measured telemetry only (SRC-004, SRC-009, SRC-010); measured is essentially SRC-015 (honey-site) plus SRC-018 (weak labels). The five sources added 2026-06-05 sit at capability (SRC-027 taxonomy → n/a, SRC-028, SRC-029) and claimed (SRC-030, SRC-031 — the named-defender bypass writeups). In short the register evidences capability and market existence far more strongly than real-world prevalence — closing that is the new observed-use reading lane (Queued; EVIDENCE-REVIEW.md §2.5, §4).

2026-06-06 — Schema v3 extension: added scarce-resource abuse tags and conditional fields. Added scarce-resource-abuse as a cross-cutting umbrella tag, the specific tag vocabulary for appointment/ticketing/reservation/product-drop/queueing/booking/inventory abuse, and a conditional scarce-resource abuse index carrying scarce_resource_targeted, abuse_phase, website_facing_action, evidence_of_use, and abuse_outcome. This is schema support only; no source rows were added or reclassified.

2026-06-06 — Added SRC-032…SRC-040 from reviewed v3 extraction entries: Wikimedia crawler infrastructure account (SRC-032, threat-surface), Wang et al. FP-Agent (SRC-033, academic), F5 Labs credential-stuffing report (SRC-034, vendor), DVSA driving-test bot/resale post (SRC-035, threat-surface), HUMAN OpenClaw (SRC-036), HUMAN Agentic Visibility (SRC-037), HUMAN State of Agentic Traffic May 2026 (SRC-038), Thales / Imperva 2026 Bad Bot Report (SRC-039), and Akamai financial-services security trends (SRC-040). Extraction provenance: SRC-032…SRC-034 Claude / Claude Opus 4.8 / source-extraction-prompt v3; SRC-035 Codex / GPT-5 / source-extraction-prompt v3; SRC-036…SRC-040 (the HUMAN, Thales, and Akamai entries) ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3. All are new distinct sources and [single] extractions. Added framing-distance rows, updated cross-index memberships, introduced the cross-index families API abuse / API endpoint visibility, Credential / account-abuse signals, and Scarce-resource / inventory-abuse signals, and added scarce-resource rows for SRC-035 and SRC-039. Flags: SRC-032 uses threat-surface as a least-bad category for first-party operator evidence; SRC-039 has scarce-resource coverage but no dedicated scarce-resource block in its entry, so review its projected scarce-resource fields before relying on them; several supplied filenames retain (1) suffixes from the uploaded extraction files.

2026-06-06 — Added SRC-041…SRC-053 from reviewed v3 extraction entries: ScrapingBee scraper test sites (SRC-041, foundations), ScrapingBee legal guidelines (SRC-042, threat-surface after normalising the entry’s non-schema governance category), ScrapingBee advanced scraping techniques (SRC-043), ScrapingBee price-scraping tools (SRC-044), ScrapingBee PerimeterX/HUMAN bypass guide (SRC-045), niespodd browser-fingerprinting / anti-detection README (SRC-046), Martínez Llamas et al. GDPR/AI Act bot-detection review (SRC-047), Wardle honey-identity leaked-credential experiment (SRC-048), DataDome ticket-bot explainer (SRC-049), FTC’s first BOTS Act enforcement cases (SRC-050), StopBadBots SBB-WAF-Rules (SRC-051), Hamachek bad-ASN-list (SRC-052), and ScrapingBee web-scraping API product page (SRC-053). Added the corresponding framing-distance rows, cross-index memberships, and scarce-resource rows for SRC-049 and SRC-050. Schema note: added legal-explainer as an evidence-basis token for non-authoritative legal/compliance explainers. Normalisation flags: SRC-041 proximity was entered as low and has been normalised to capability (training / sandbox); SRC-042 proximity was entered as context and has been normalised to n/a (legal context; not use evidence); SRC-052 remains observed but explicitly at the anecdotal/first-party floor. Several entries are highly dual-use, scraper-side sources and should be cited only at the technique-family level, not as operational recipes.

2026-06-06 — Added SRC-054…SRC-065 from reviewed v3 extraction entries; updated SRC-003 as a re-extraction. Added Cloudflare Block AI Bots (SRC-054), Cloudflare Turnstile (SRC-055), Cloudflare Detection IDs (SRC-056), Cloudflare bot detection engines (SRC-057), Cloudflare bot solutions overview (SRC-058), MDN CORS (SRC-059), MDN HTTP caching (SRC-060), MDN HTTP authentication (SRC-061), MDN HTTP cookies (SRC-062), MDN User-Agent header (SRC-063), MDN HTTP headers (SRC-064), and MDN Overview of HTTP (SRC-065). Treated the new Cloudflare Bot Management entry as a re-extraction of existing SRC-003 rather than a new source row: the existing migrated row now lists both the legacy extraction file and the v3 extraction file as [multiple — unreconciled], with the v3 file canonical pending reconciliation. Added framing-distance rows, cross-index memberships, and two new cross-index families (AI crawlers / content-access governance; HTTP foundations / web basics). Added reference-doc as an evidence-basis token for neutral technical foundation material. Normalisation flag: all MDN entries supplied foundational as their proximity value; this was normalised to n/a (foundational reference) because operational proximity does not apply to non-abuse protocol references.

2026-06-06 — Added SRC-066…SRC-072 from reviewed v3 extraction entries; updated SRC-047 as a re-extraction. Added Laperdrix et al. browser-fingerprinting survey (SRC-066), Berke et al. browser-fingerprinting demographics/dataset paper (SRC-067), Sudbury & Marks online-survey bots / bad-data case study (SRC-068), and four PortSwigger Web Security Academy authentication/OAuth/password-login foundation entries (SRC-069…SRC-072). Attached martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act-review(1).md as a re-extraction of existing SRC-047 rather than creating a duplicate row. Normalisation flags: the Laperdrix entry’s foundational proximity value is represented as n/a (foundational survey); PortSwigger educational/security guidance is placed under Foundations rather than Vendor because it is used as worked reference material, not as vendor evidence. Updated the framing ledger, the cross-index, and resolved queue notes.

2026-06-07 — Added SRC-073…SRC-093 from reviewed extraction entries. Added NIST SP 800-63B-4 authentication/authenticator-management standard (SRC-073), Sajid et al. hooking-based deception preprint (SRC-074), Cao browser-principal dissertation (SRC-075), Xu et al. layered obfuscation taxonomy (SRC-076), Tschacher bot-detection architecture explainer (SRC-077), Sudhir et al. Pushan deobfuscation preprint (SRC-078), DataDome VM-based obfuscation announcement (SRC-079), Ticketmaster v. Prestige legal/settlement source family (SRC-080), U.S. Senate Ticketmaster/Taylor Swift hearing source family (SRC-081), Kolhar & Sridevi AI/ML cybersecurity background (SRC-082), Kolhar & Gundoor API security chapter (SRC-083), commercial CAPTCHA-solving API ecosystem (SRC-084), Chen et al. ReCAP GUI-agent CAPTCHA paper (SRC-085), Bitsight OpenClaw exposure measurement (SRC-086), Infatica P2B SDK residential proxy source (SRC-087), Bright Data residential proxy market source (SRC-088), Choi et al. proxy-ecosystem measurement (SRC-089), RFC 9113/9114 HTTP/2 and HTTP/3 protocol foundations (SRC-090), Chromium multi-process architecture (SRC-091), RFC 7540 historical HTTP/2 standard (SRC-092), and OWASP ASVS 5.0 (SRC-093). Added corresponding framing-distance rows, cross-index memberships, new cross-index families for client-side obfuscation, residential/peer proxies, CAPTCHA-solving, ticket-bot legal/hearing evidence, browser architecture, and API/business-logic controls, plus scarce-resource rows for SRC-080, SRC-081, SRC-088, and SRC-093. Normalisation flags: RFC 7540 is retained as historical/obsolete because RFC 9113 supersedes it for current HTTP/2 claims; broad AI/ML cybersecurity background (SRC-082) is low-priority and should not carry bot-specific claims; the two Ticketmaster rows are legal/hearing evidence and should be cited as allegations/testimony rather than independent technical measurement.

2026-06-07 — Added SRC-094…SRC-102 from the cloud/SaaS, adversarial-infrastructure, cost-economics, and commercial automation batch. Added CISA Microsoft cloud post-compromise advisory (SRC-094), FBI/CISA Scattered Spider advisory (SRC-095), Censys bulletproof-hosting/RDP infrastructure measurement (SRC-096), Anderson et al. cybercrime-cost framework (SRC-097, duplicate upload deduped), IBM X-Force Threat Intelligence Index 2026 (SRC-098, raw TXT/HTML source family; extraction still needed), Recorded Future cloud/SaaS abuse landscape (SRC-099), Kasada Q1 2026 threat-enablers report (SRC-100), Summers/Netwrix AI attacker-cost commentary (SRC-101), and the commercial automation cost-stack cluster (SRC-102). Added framing-distance rows, cross-index updates, new cross-index families for cloud/SaaS identity abuse, threat infrastructure, official advisories, and security/cost economics, plus scarce-resource rows for SRC-100 and SRC-102. Normalisation flags: EVIDENCE-REVIEW(6).md was treated as a scope document rather than a source row; Anderson (1)/(2) are identical and were deduped; IBM TXT/HTML files were represented as one provisional raw-source row rather than a reviewed extraction entry.

2026-06-12 — Schema v3 extension: added regulatory-constraint lane vocabulary. Added regulatory-constraint as a cross-cutting tag for sources read only as constraints on technique deployment, plus the conditional fields jurisdiction, currency, and constrains_technique. Added primary-law and regulator-guidance as evidence-basis tokens for primary legal text and regulator guidance read under this lane. Regulatory-constraint entries always carry operational proximity n/a and are treated as jurisdiction-bound, time-varying, non-load-bearing, and not legal advice.
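A minimal sketch of this lane's invariants expressed as a lint check, assuming entries are plain dicts with hypothetical `tags` and `proximity` keys plus the three conditional field names from this entry. It checks only what the entry states (proximity must be n/a, and the conditional fields must be present when the tag is set); it cannot judge whether the legal content itself is current or correctly scoped to its jurisdiction. The example entry and its values are hypothetical.

```python
REGULATORY_TAG = "regulatory-constraint"
# Conditional fields that apply only when the regulatory-constraint tag is present.
CONDITIONAL_FIELDS = ("jurisdiction", "currency", "constrains_technique")


def check_regulatory_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    if REGULATORY_TAG not in entry.get("tags", []):
        return problems  # the lane's rules do not apply to untagged entries
    if entry.get("proximity") != "n/a":
        problems.append(f"proximity must be n/a, got {entry.get('proximity')!r}")
    for field in CONDITIONAL_FIELDS:
        if not entry.get(field):
            problems.append(f"missing conditional field: {field}")
    return problems


# Hypothetical entry, for illustration only.
example = {
    "id": "SRC-XXX",
    "tags": ["regulatory-constraint"],
    "proximity": "capability",
    "jurisdiction": "EU",
}
for problem in check_regulatory_entry(example):
    print(problem)
```

Returning a list of problems rather than failing on the first one lets a maintainer fix a malformed entry in a single pass.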