Evidence Register
This is the project’s bibliographic memory: one row per source read, structured so that extracted evidence can be traced into site pages, the signals/techniques taxonomy, and reading decisions. It is not the narrative evidence review — that lives in the Foundations, Background, and Technical territory sections. Per GOVERNANCE.md §6 and EVIDENCE-REVIEW.md §6 the register is public, including sources read and rejected.
Purpose and scope
This register is the structured projection of the per-source extraction entries in working/register-entries/. It exists so that the analytical fields the extraction prompt produces — evidence basis, signals / techniques, threat types, framing distance, what it cannot show — are queryable across the whole corpus rather than buried in free text.
- It tracks every source read, with provenance and review state.
- It records what kind of evidence each source actually is (`evidence basis`), so a vendor marketing claim is never silently treated as equivalent to a controlled study.
- It cross-indexes sources by signal/technique family, so a reader can find which sources cover JA3/JA4, mouse dynamics, residential proxies, GAN/RL evasion, etc.
- It tracks scarce-resource abuse as a cross-cutting tag family where relevant, not as a separate evidence category.
- It carries the framing-distance ledger, the project’s central analytical discipline (EVIDENCE-REVIEW.md §5).
- It records sources read and rejected, so they aren’t re-read.
The source of truth for any individual source is its entry file under working/register-entries/. This page does not re-read source material and does not infer beyond the entries.
How to update this register
Append-only by default.
- One register id (`SRC-NNN`) per distinct source. Distinct sources from the same vendor stay as separate rows; they are never folded together. Selecting between overlapping sources is the page-writing step’s job, not the register’s.
- Add one inventory row per new distinct source, in the relevant category sub-table.
- Add the source to the framing-distance ledger and to any signals/techniques cross-index rows it belongs to.
- If the source concerns scarce-resource abuse, add it to the scarce-resource abuse index and carry the conditional fields from the extraction entry.
- Set `evidence basis`, `provenance` (agent + model), and `review state` from the entry’s run-metadata block; do not leave provenance blank.
- Record sources read and rejected in the rejected table with the reason. Rejected entries are kept, not deleted: the row is the “don’t re-read” record.
- Do not rewrite an existing judgement silently. If a judgement changes, add a dated note in the update log explaining why and preserve the prior context.
- Keep `current relevance` separate from `future relevance` in the entry; the register surfaces current `project impact` only.
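The row-level rules above can be checked mechanically before a row is appended. The following is a minimal sketch, not the project's tooling: the function name `row_problems`, the dictionary representation of a row, and the reading of `SRC-NNN` as exactly three digits are all assumptions made for illustration.

```python
import re

# Assumption: "SRC-NNN" means the literal prefix plus exactly three digits.
SRC_ID = re.compile(r"^SRC-\d{3}$")

# Fields the update rules say must be set from the entry's run-metadata block.
REQUIRED_FIELDS = ("evidence basis", "provenance", "review state")


def row_problems(row: dict[str, str], existing_ids: set[str]) -> list[str]:
    """Return human-readable problems with a proposed new inventory row."""
    problems = []
    row_id = row.get("id", "")
    if not SRC_ID.match(row_id):
        problems.append("id must look like SRC-NNN")
    elif row_id in existing_ids:
        # A new row for an id that already exists would fold two sources together.
        problems.append("id already used; one register id per distinct source")
    for field in REQUIRED_FIELDS:
        if not row.get(field, "").strip():
            problems.append(f"{field} is blank")
    return problems
```

An empty list means the row passes these checks only; it says nothing about whether the judgements in the row are correct.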
Versioning and reconciliation. Re-extracting a source under a new prompt version, or with a different agent, produces a new versioned entry file alongside the old one — the old file is never overwritten or deleted. Filename convention: stem = source slug, suffix = version/state, agent/model recorded inside the file’s run-metadata block (not in the filename):
- `<slug>.md`: original extraction
- `<slug>.v2.md`: re-extraction under prompt v2 (`.v3`, … as prompts advance)
- `<slug>.combined.md`: a reconciliation of two or more extractions of the same source
A re-extraction does not create a new register row. It is added to the existing source’s row under the same `SRC-NNN`: list all its files in the entry file cell, and tag the cell with a reconciliation state, one of `[single]`, `[multiple — unreconciled]`, or `[combined]` (see appendix). The file that is canonical for citation is the `.combined` file if one exists, else the latest version. Reconciling multiple extractions into a `.combined` file is done when useful; there is no obligation to reconcile immediately.
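The canonical-for-citation rule can be sketched in a few lines. This is an illustrative sketch only: the helper names `version_of` and `canonical_file` are hypothetical, and it assumes that "latest version" means the highest `.vN` suffix, with the unsuffixed original extraction counted as version 1.

```python
import re

# Matches the ".vN.md" suffix from the filename convention above.
_VERSION_SUFFIX = re.compile(r"\.v(\d+)\.md$")


def version_of(filename: str) -> int:
    """Prompt version encoded in the filename; the original extraction counts as 1."""
    match = _VERSION_SUFFIX.search(filename)
    return int(match.group(1)) if match else 1


def canonical_file(files: list[str]) -> str:
    """Pick the file to cite among all extraction files listed for one source."""
    combined = [name for name in files if name.endswith(".combined.md")]
    if combined:
        # A reconciliation takes precedence over any single extraction.
        return combined[0]
    return max(files, key=version_of)
```

For example, `canonical_file(["slug.md", "slug.v2.md"])` selects `slug.v2.md`, and adding `slug.combined.md` to the list makes the combined file canonical instead.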
The mechanics of projecting a reviewed entry into this register (assigning ids, versioning, cross-index maintenance, the update-log line) are specified in prompts/register-update-prompt.md.
This register was migrated from the flat working/reading-register.md. The earlier register stored everything analytical in a single free-text Notes column and did not record which agent/model produced each extraction. As a result:
- `provenance` is `not recorded` for every migrated row. It is populated going forward from the extraction prompt’s run-metadata block (v2).
- `evidence basis` and `signals / techniques` are migrated from the old one-line notes where those notes supported them; otherwise marked `see entry`.
- The framing-distance ledger and `threat types` are mostly `tbd — backfill from entry`: those fields live in the per-source entry files (`working/register-entries/`), which were not re-read during migration.
- `review state` is `migrated — review pending` for all rows: extraction quality cannot be judged from a one-line note.
Completing the register is a mechanical backfill pass: an agent reads each working/register-entries/<slug>.md, lifts evidence basis, signals / techniques, threat types, framing distance, and what it cannot show, and fills the stubs. Treat any cell marked tbd/see entry/not recorded as unverified until that pass runs.
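To see which cells the backfill pass still has to fill, the stub markers can be searched for in each inventory row. The sketch below is one possible approach under stated assumptions: the names `split_row` and `unverified_columns` are hypothetical, matching is a case-insensitive substring test, and cells are assumed not to contain escaped pipe characters.

```python
# Stub markers named in the text above; a cell containing any of them is unverified.
STUB_MARKERS = ("tbd", "see entry", "not recorded")


def split_row(row: str) -> list[str]:
    """Split one markdown table row into trimmed cell strings."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def unverified_columns(header: str, row: str) -> list[str]:
    """Return the column names whose cell in this row still holds a stub marker."""
    names = split_row(header)
    cells = split_row(row)
    return [
        name
        for name, cell in zip(names, cells)
        if any(marker in cell.lower() for marker in STUB_MARKERS)
    ]
```

Run against the header row and a migrated row, this returns the columns that should be treated as unverified until the entry file has been read.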
Extraction inventory
Columns: id · source · org / authors · year · evidence basis · operational proximity · signals / techniques · threat types · provenance (agent / model) · review state · project impact · entry file. Vocabulary for the ordinal/controlled columns is defined in the appendix.
Foundations
| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRC-001 | Automated Threats to Web Applications (project page) | OWASP | n.d. | taxonomy | n/a (taxonomy) | OAT category set | All OAT (taxonomy) | not recorded | migrated — review pending | OAT taxonomy spine for threat-type vocabulary; project page only, full Handbook queued | owasp-automated-threats-to-web-applications.md |
| SRC-027 | Automated Threat Handbook: Web Applications v1.3 | OWASP / Watson & Zaw | 2026 | taxonomy | n/a (taxonomy) | 21 OAT categories; countermeasure classes; symptoms; fingerprinting/reputation/rate/monitoring classes | All OAT (taxonomy) | not recorded; provisional draft | needs review | Full OAT Handbook taxonomy and countermeasure-class reference; provisional extraction produced without repo scope docs, so verify before citation | [single]; canonical: owasp-automated-threat-handbook-v1v3.md |
| SRC-028 | How to Get and Use Cookies in Playwright | armanabbasi / Medium | 2023 | capability-doc | capability | Playwright browser contexts; cookie extraction; session/authentication state capture | Not threat-specific | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Low-level foundations example showing browser automation can access and preserve cookie/session state | [single]; canonical: medium-playwright-cookies-source-extraction.md |
| SRC-041 | Top 15 Scraper Sites to Enhance Your Data Collection Skills | ScrapingBee | 2026 | capability-doc | capability (training / sandbox) | scraping practice sites; static HTML; pagination; authentication; cookies/sessions; JSON APIs; JavaScript rendering; proxy management; CAPTCHA handling | Not threat-specific; scraper skill-building and production-path context | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Low-priority foundations/context source for how scraping skills are taught and how vendors frame the move from sandbox practice to production scraping | [single]; canonical: scrapingbee-2026-scraper-test-sites(1).md |
| SRC-059 | Cross-Origin Resource Sharing (CORS) | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | CORS; same-origin policy; Origin header; preflight OPTIONS; Access-Control-* headers; credentialed cross-origin requests | Not threat-specific; browser-side cross-origin data access and CSRF-adjacent reasoning | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation reference for CORS; useful mainly to prevent treating CORS as a general anti-scraping defence | [single]; canonical: mdn-2026-cors(1).md |
| SRC-060 | HTTP caching | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | Cache-Control; private/shared/proxy/CDN caches; ETag; Last-Modified; If-None-Match; Vary; conditional requests | Not threat-specific; cache-aware crawling, origin-load interpretation, and shared-cache privacy risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation reference for caching and why repeated crawler/scraper requests may affect origin servers differently depending on cache behaviour | [single]; canonical: mdn-2026-http-caching(1).md |
| SRC-061 | HTTP authentication | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | 401/403/407; WWW-Authenticate; Authorization; Proxy-Authenticate; Proxy-Authorization; Basic auth; bearer tokens | Not threat-specific; credential-bearing requests, proxy authentication, and access-control boundary concepts | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation reference for separating HTTP authentication, proxy authentication, login sessions, and account-abuse concepts | [single]; canonical: mdn-2026-http-authentication(1).md |
| SRC-062 | Using HTTP cookies | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | Set-Cookie; Cookie; session IDs; session/permanent cookies; Secure; HttpOnly; SameSite; Domain/Path; session fixation | Not threat-specific; session state, tracking, account takeover, scraping behind login, and cookie-continuity detection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Core foundation source for explaining how stateless HTTP becomes stateful through cookies and sessions | [single]; canonical: mdn-2026-using-http-cookies(1).md |
| SRC-063 | User-Agent header | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | User-Agent strings; browser/crawler/tool identification; User-Agent reduction; Client Hints; Navigator.userAgent | Not threat-specific; crawler identification, spoofing, browser impersonation, and passive fingerprinting surface | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation entry for what User-Agent is and why it sits between compatibility, crawler identity, bot detection, and privacy | [single]; canonical: mdn-2026-user-agent-header(1).md |
| SRC-064 | HTTP headers | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | request/response/representation/payload headers; User-Agent; Accept-Language; Cookie; Authorization; CORS; cache; proxy headers | Not threat-specific; header-based detection, spoofing/mismatch checks, session/authentication-bearing requests, and proxy-aware analysis | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Neutral vocabulary bridge between basic HTTP mechanics and later sources on header-order checks, proxy headers, and browser impersonation | [single]; canonical: mdn-2026-http-headers(1).md |
| SRC-065 | Overview of HTTP | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | HTTP requests/responses; client-server model; user-agents; browser resource fetching; proxies; cookies/sessions; authentication | Not threat-specific; foundation for scraping, crawling, request-pattern detection, and session-based abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Plain-language foundation for browsers, crawlers, and scripts as user-agents making HTTP requests | [single]; canonical: mdn-2026-overview-of-http(1).md |
| SRC-069 | OAuth 2.0 authentication vulnerabilities | PortSwigger Web Security Academy | 2026 | methods-taxonomy | capability | OAuth grant types; authorization endpoints; redirect URI validation; state parameter; authorization codes; access tokens; scope validation; OpenID Connect | OAuth authentication bypass; token/code leakage; forced profile linking; third-party authentication abuse; account takeover | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | OAuth / third-party-authentication foundation entry; broadens login-abuse coverage beyond password forms | [single]; canonical: portswigger-2026-oauth-2-authentication-vulnerabilities(1).md |
| SRC-070 | How to secure your authentication mechanisms | PortSwigger Web Security Academy | 2026 | control-guidance | capability (defensive control guidance) | password strength checking; zxcvbn; generic errors; response-time equalisation; IP-based rate limiting; CAPTCHA; MFA/2FA; password reset/change flows | credential disclosure; username enumeration; brute-force login; password-reset abuse; weak MFA; credential-stuffing-adjacent login abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Defensive counterpart to login-vulnerability entries; useful for “what closes down easy routes” without treating controls as proven sufficient | [single]; canonical: portswigger-2026-secure-authentication-mechanisms(1).md |
| SRC-071 | Vulnerabilities in password-based login | PortSwigger Web Security Academy | 2026 | methods-taxonomy | capability | username enumeration; status-code/error-message/response-timing differences; brute-force wordlists; account locking; IP blocking; rate limiting; CAPTCHA; HTTP Basic Auth; Authorization header | brute-force login; credential stuffing; username enumeration; account takeover; basic-auth brute force | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Core foundation source for credential stuffing and password-login abuse mechanics; shows why simple IP/account-lock controls are partial | [single]; canonical: portswigger-2026-password-based-login-vulnerabilities(1).md |
| SRC-072 | Authentication vulnerabilities | PortSwigger Web Security Academy | 2026 | vulnerability-taxonomy | capability | broken authentication; brute force; authentication bypass; password-based login; MFA weaknesses; third-party authentication; OAuth | account takeover; brute-force login; authentication bypass; post-login attack-surface expansion; high-privilege account compromise | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation overview for authentication as an attack surface and how login abuse can lead to account takeover and follow-on exploitation | [single]; canonical: portswigger-2026-authentication-vulnerabilities(1).md |
| SRC-073 | Digital Identity Guidelines: Authentication and Authenticator Management | NIST / Temoshok et al. | 2025 | standards-reference / authentication-guidance / control-requirements | control | AAL; MFA; phishing resistance; passwords; credential-stuffing throttling; session secrets; fraud indicators; browser cookies | Credential stuffing; credential cracking; account takeover; session abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Standards foundation for authentication, sessions, rate limiting, and why fraud indicators do not replace authenticators | [single]; canonical: nist-2025-sp-800-63b-4-authentication-authenticator-management(1).md |
| SRC-090 | HTTP/2 and HTTP/3 protocol foundations | IETF / Thomson, Benfield & Bishop | 2022 | protocol-standard / technical specification | n/a (foundational protocol standard) | HTTP/2; HTTP/3; streams; multiplexing; HPACK; QPACK; QUIC; ALPN; binary framing; protocol identifiers | Not threat-specific; protocol foundation for HTTP fingerprinting and request-layer analysis | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Current protocol-standard foundation for HTTP/2 and HTTP/3; use RFC 9113 for current HTTP/2 claims rather than RFC 7540 | [single]; canonical: rfc-9113-9114-http2-http3-protocol-foundations(1).md |
| SRC-091 | Multi-process Architecture | Chromium Projects | n.d. | browser-architecture reference / design documentation | n/a (foundational browser architecture) | browser process; renderer process; Blink; Mojo; IPC; sandboxing; GPU/network/storage services; process isolation | Not threat-specific; browser-native automation and browser-security foundation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Foundation for explaining why modern browsers are not simple HTTP clients and why browser-native automation has a different threat surface | [single]; canonical: chromium-multi-process-architecture(1).md |
| SRC-092 | Hypertext Transfer Protocol Version 2 (HTTP/2) | IETF / Belshe, Peon & Thomson | 2015 | protocol-standard / technical specification | n/a (historical protocol standard) | HTTP/2; binary framing; streams; multiplexing; HPACK; ALPN; h2/h2c; server push | Not threat-specific; historical HTTP/2 protocol foundation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Historical HTTP/2 specification; keep for provenance/history, but use RFC 9113 / SRC-090 for current HTTP/2 wording | [single]; canonical: rfc-7540-http2(1).md |
| SRC-093 | Application Security Verification Standard, Version 5.0.0 | OWASP Foundation | 2025 | standards-reference / control-requirements / defensive-guidance | control | anti-automation; business logic; rate limiting; session management; authentication; authorization; HTTP validation; logging | Credential stuffing; scraping; scalping; sniping; account creation; denial of inventory; expediting; DoS; account aggregation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Defensive-control foundation connecting automated-abuse categories to verifiable application-security requirements | [single]; canonical: owasp-2025-application-security-verification-standard-5-0(1).md |
Vendor and industry
Treated as evidence of what the field claims, not independent proof. Efficacy/prevalence figures are vendor-measured.
| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRC-002 | Bot scores, JA3/JA4, Detection IDs, Web Bot Auth, custom rules (Bots docs) | Cloudflare | 2026 | capability-doc | capability | bot score 1–99, JA3/JA4, Detection IDs, Web Bot Auth, WAF rule fields | Not threat-specific | not recorded | migrated — review pending | Supports “Cloudflare exposes/uses X”, not “X works” | cloudflare-2026-bot-scores-detection-engines-ja3-ja4-web-bot-auth-custom-rules.md |
| SRC-003 | Bot Management documentation | Cloudflare | 2026 | capability-doc | capability | bot score; WAF custom rules; Workers; Bot Analytics; logs; verified bots; JavaScript detections; machine-learning model updates; endpoint-specific policy | bot traffic; login automation; application abuse; unwanted access to protected resources | not recorded; ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Primary Cloudflare entry for per-request scoring and endpoint-specific bot policy; legacy migrated row now has a v3 re-extraction attached | [multiple — unreconciled]; previous: cloudflare-2026-bot-management-docs.md; canonical: cloudflare-2026-bot-management(1).md |
| SRC-004 | Bot Protect, AI Detection Engine, 2025 Global Bot Security Report | DataDome | 2025–2026 | vendor-claim; threat-intel | observed (vendor-measured) | intent-based detection; signal-family taxonomy | tbd — backfill from entry | not recorded | migrated — review pending | Intent-based framing; 2.8% “fully protected” is vendor-measured; pair with SRC-015 for external evidence | datadome-2025-2026-bot-protect-ai-detection-global-bot-security-report.md |
| SRC-005 | Bot Management (product brochure) | Netacea | n.d. | vendor-claim | claimed | server-side / no-client-JS positioning; 2 case studies | tbd — backfill from entry | not recorded | migrated — review pending | Product-positioning evidence | netacea-bot-management-product-brochure.md |
| SRC-006 | Technical Showcase: ML in Advanced Bot Management | Netacea | n.d. | methods-taxonomy | capability | supervised/unsupervised, real-time/batch, general/specific; Intent Analytics | Not threat-specific | not recorded | migrated — review pending | ML-methods taxonomy; no reproducible detail | netacea-technical-showcase-machine-learning.md |
| SRC-007 | Death by a Billion Bots | Netacea | 2023 | survey | claimed (self-report) | 440-executive survey; $85.6M/company business-impact framing | tbd — backfill from entry | not recorded | migrated — review pending | Survey evidence; origin/geopolitical claims out of scope | netacea-2023-death-by-a-billion-bots.md |
| SRC-008 | Bot Manager, ACTIR, Agentic AI Security Report | Arkose Labs | 2023–2026 | vendor-claim; survey | claimed | dynamic challenges; attacker-cost framing; agentic-AI survey | tbd — backfill from entry | not recorded | migrated — review pending | Account-integrity + attacker-cost angle; several reports gated | arkose-2023-2026-bot-manager-actir-agentic-ai-reports.md |
| SRC-009 | Bot Defense, Adversarial Techniques, AI Agent Trust, 2026 Benchmark | Kasada | 2025–2026 | vendor-claim; threat-intel | observed (vendor-measured) | solver/proxy/CAPTCHA pricing; proof-of-execution; AI-agent governance | tbd — backfill from entry | not recorded | migrated — review pending | Strong attacker-economy angle | kasada-2025-2026-bot-defense-adversarial-retooling-ai-agent-trust.md |
| SRC-010 | Sightline, AI Agent Detection, OpenClaw, 2026 benchmark | HUMAN Security / PerimeterX | 2026 | vendor-claim; threat-intel | observed (vendor-measured) | cyberfraud-journey framing; AI-agent detection signal categories; OpenClaw observations | tbd — backfill from entry | not recorded | migrated — review pending | Concrete AI-agent detection signal categories | human-2026-sightline-bot-mitigation-ai-agent-detection-openclaw.md |
| SRC-034 | 2021 Credential Stuffing Report | F5 Labs / Vinberg & Overson | 2021 | threat-intel; empirical-operational | observed (vendor-measured) | credential-spill aggregation; login success-rate / password-reset / diurnal anomalies; browser automation; CAPTCHA-solving microwork; attacker sophistication tiers | credential stuffing; account takeover | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | Primary observed-use anchor for credential stuffing; combines spill-supply evidence with vendor-measured login abuse against large production sites | [single]; canonical: f5-2021-credential-stuffing-report.md |
| SRC-036 | OpenClaw in the wild: How autonomous agents can drive abuse at scale | HUMAN Security / Kaiserman & Cirlig | 2026 | empirical-operational | observed (vendor-measured) | autonomous-agent browser automation; exposed agent gateways; request bursts; referral UTM tagging; reconnaissance; directory/file probing | synthetic engagement; referral manipulation; reconnaissance; browser-native automation abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Observed-use evidence for agentic browser automation abuse, with explicit attribution caveats | [single]; canonical: human-2026-openclaw-in-the-wild(1).md |
| SRC-037 | Agentic Visibility: How to See AI Agents in Your Traffic | HUMAN Security / McArtney | 2026 | capability-doc | capability | AI-agent identification and classification; trust levels; HTTP Message Signatures and key directories; session and route analysis; dashboard visibility | Not threat-specific; AI-agent visibility and analytics contamination | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Product/capability framing for agentic visibility, trust classification, and the shift from visibility to control | [single]; canonical: human-2026-agentic-visibility-how-to-see-ai-agent-traffic(1).md |
| SRC-038 | State of Agentic Traffic – May 2026 | HUMAN Security / Kaiserman | 2026 | empirical-operational | observed (vendor-measured) | agentic-traffic telemetry; named agent/operator mix; sector distribution; page-route categorisation; blocking rate; policy controls | Not threat-specific; AI-agent traffic across product/search, account, authentication, content, and checkout routes | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Current vendor-telemetry snapshot of agentic traffic patterns and route exposure, not proof of malicious intent | [single]; canonical: human-2026-state-agentic-traffic-may(1).md |
| SRC-039 | 2026 Thales Bad Bot Report: Bad Bots in the Agentic Age | Thales / Imperva | 2026 | empirical-operational; threat-intel | observed (vendor-measured) | bot traffic classes; AI crawlers/fetchers; API endpoint targeting; browser impersonation; session consistency; residential/mobile proxies; CAPTCHA solving; headless automation; signed AI bots | account takeover; API abuse; scraping; scalping/inventory hoarding; SMS pumping; carding/payment-flow abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Broad vendor-measured snapshot of production bot, API, ATO, inventory, and AI-agent abuse; useful but not independent prevalence evidence | [single]; canonical: thales-2026-bad-bot-report(1).md |
| SRC-040 | AI-Empowered Botnets and API Visibility Gaps: Attack Trends in Financial Services | Akamai Security | 2026 | empirical-operational | observed (vendor-measured) | WAF/API alerts; API endpoint attack tracking; BOLA/BOPLA; shadow/zombie APIs; behavioural heuristics; user-risk telemetry; low-and-slow tactics; headless browsers; AI crawler classification | API abuse; scraping/AI crawler activity; bot evasion; financial-services web attacks; ATO/fraud adjacent | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Vendor telemetry source for financial-services API/bot abuse and the public-data boundary; alerts are not proof of successful attacks | [single]; canonical: akamai-2026-financial-services-security-trends(1).md |
| SRC-049 | How to Restore Fairness in Online Ticketing by Fighting Ticket Bots | DataDome / Falokun | 2026 | methods-taxonomy | claimed | ticket-bot lifecycle; account creation/takeover; rapid refresh; availability scraping; checkout automation; CAPTCHA bypass; virtual waiting rooms; intent-aware detection | ticket bots; scalping; queue abuse; limited-stock attacks; checkout automation; resale | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Vendor taxonomy for ticket bots / slot-sniping and scarce-inventory abuse; useful when paired with FTC legal-record evidence | [single]; canonical: datadome-2026-ticket-bots(1).md |
| SRC-051 | Comodo ModSecurity WAF Rules Update: The 2026 Solution / SBB-WAF-Rules | StopBadBots / sminozzi | 2025 | tooling-readme | capability | ModSecurity rules; WAF augmentation; user-agent blocklists; AI-crawler blocking; scanner detection; behavioural thresholds; WordPress hardening | bot blocking; scanner/reconnaissance; AI crawler blocking; web-application probing | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Defensive-tooling example for the simple WAF/blocklist/behavioural-threshold end of the control stack | [single]; canonical: stopbadbots-2025-sbb-waf-rules(1).md |
| SRC-054 | Block AI Bots - Cloudflare bot solutions docs | Cloudflare | 2026 | capability-doc | capability | verified AI crawler classification; unverified AI-like bot blocking; hostname-level controls; ad-hostname blocking; AI Crawl Control | AI crawler access; unverified AI-like crawling; content-access governance | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Current-trend source for AI crawler management as defensive product categorisation and publisher/content-access governance | [single]; canonical: cloudflare-2026-block-ai-bots(1).md |
| SRC-055 | Overview - Cloudflare Turnstile docs | Cloudflare | 2026 | capability-doc | capability | non-interactive JavaScript challenges; proof-of-work; proof-of-space; Web API probing; browser quirks; human-behaviour checks; pre-clearance cookie | automated scripts; non-human browser environments; protected form/login flows | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Challenge-system / CAPTCHA-alternative source showing the move from visual puzzles to browser/environment/human-like signal evaluation | [single]; canonical: cloudflare-2026-turnstile(1).md |
| SRC-056 | Detection IDs - Cloudflare bot solutions docs | Cloudflare | 2026 | capability-doc | capability | Detection IDs/tags; claimed-browser consistency; HTTP header order; heuristics; verified-bot and anomaly detections; Logpush; WAF custom rules | predictable bot behaviour; header-order mismatch; browser impersonation; endpoint-specific bot traffic | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Concrete Cloudflare source for coherence checks and turning detection signals into rules, analytics, and logging | [single]; canonical: cloudflare-2026-detection-ids(1).md |
| SRC-057 | Bot detection engines - Cloudflare bot solutions docs | Cloudflare | 2026 | methods-taxonomy | capability | heuristic checks; malicious fingerprints; JavaScript detections; headless-browser detection; headers; session characteristics; browser signals; supervised ML; bot score; anomaly detection | simple automation; headless-browser automation; sophisticated bots; malicious fingerprints | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Central Cloudflare defensive-methods taxonomy; useful vendor-side mirror of scraper-side evasion layers | [single]; canonical: cloudflare-2026-bot-detection-engines(1).md |
| SRC-058 | Overview - Cloudflare bot solutions docs | Cloudflare | 2026 | capability-doc | capability | Bot Fight Mode; Super Bot Fight Mode; Bot Analytics; firewall variables; WAF; Turnstile; API Shield; DDoS protection; defensive stack | automated traffic; known bot patterns; unwanted crawling; resource abuse; automated endpoint interaction | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Cloudflare defensive-stack overview bridging simple controls, challenge systems, per-request scoring, analytics, and endpoint-specific policy | [single]; canonical: cloudflare-2026-bot-solutions-overview(1).md |
| SRC-079 | DataDome Releases VM-Based Obfuscation: The Next Evolution in Client-Side Detection Security | DataDome / Vayno | 2026 | capability-doc / vendor product announcement / defensive architecture explanation | capability | VM obfuscation; client-side detection; browser detection; Device Check; Slider; WebAssembly; dynamic code regeneration; proprietary bytecode; anti-reverse-engineering | Bot detection code protection; client-side detection arms race | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Direct vendor source for VM-based obfuscation applied to commercial client-side bot-detection logic | [single]; canonical: datadome-2026-vm-based-obfuscation-client-side-detection(1).md |
| SRC-084 | Commercial CAPTCHA-solving API ecosystem | CapSolver; Hayes; HasData / Skakun | 2025–2026 | capability-doc / vendor marketing / tutorial ecosystem / vendor-adjacent benchmark | capability | CAPTCHA-solving APIs; reCAPTCHA; Turnstile; Geetest; AWS WAF CAPTCHA; token generation; solver markets; AI agents; automation workflows | CAPTCHA defeat; scraping; automated account management; price monitoring; SEO/SERP automation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Capability evidence that CAPTCHA-solving APIs are openly marketed and integrated into automation and AI-agent workflows | [single]; canonical: commercial-captcha-solving-api-ecosystem-2026(1).md |
| SRC-086 | OpenClaw: exposed AI-agent gateways and enterprise risk | Bitsight / Cruz | 2026 | empirical-exposure measurement / attack-surface analysis / threat-intelligence | observed (exposure measurement) | OpenClaw; exposed services; internet scanning; autonomous agents; integrations; prompt injection; RCE; credential exposure; WebSocket API; weak token | AI-agent exposure; exposed-agent attack surface; misconfiguration | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Complements HUMAN OpenClaw by measuring exposed gateways and configuration/blast-radius risk rather than traffic abuse | [single]; canonical: bitsight-2026-openclaw-exposed-ai-agent-gateways(1).md |
| SRC-087 | How the Peer-to-Business Model Redefines App Monetization | Infatica SDK Experts | 2025 | capability-doc / business-model description / vendor marketing | capability | peer-to-business SDK; residential proxies; idle bandwidth; opt-in peers; public web data; geo-restrictions; rate limits; CAPTCHA walls | Commercial scraping infrastructure; proxy supply-chain; access-barrier bypass | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Vendor business-model source for how residential/peer proxy supply can be built through SDK-enabled app users | [single]; canonical: infatica-2025-peer-to-business-app-monetization-sdk(1).md |
| SRC-088 | Residential Proxies: Definition, Use Cases, and Best Providers | Bright Data / Zanini | 2026 | capability-doc / vendor marketing / market map | capability | residential proxies; ISP proxies; rotating proxies; sticky sessions; geo-targeting; IP reputation; rate-limit and IP-ban avoidance | Web scraping; price monitoring; ad verification; sneaker and ticket purchasing; SEO monitoring; social-media management | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Commercial proxy-ecosystem source explaining residential proxy capabilities and use cases, including limited-stock purchasing | [single]; canonical: brightdata-2026-residential-proxies-definition-use-cases-best-providers(1).md |
| | | | | | | | | | | | [multiple — raw source; extraction pending]; canonical: X-Force Threat Intelligence Index 2026(1).txt; supporting: X-Force 2026 Threat Intelligence Index - Executive Summary _ IBM(1).html |
| SRC-099 | 2025 Cloud Threat Hunting and Defense Landscape | Recorded Future Insikt Group | 2026 | threat-intelligence synthesis / observed incidents / mitigations and detections | threat-intelligence synthesis | cloud/SaaS abuse; valid accounts; tokens/keys/service accounts; cloud APIs; CI/CD; backups/snapshots; SaaS functionality; LLM/ML service abuse | cloud abuse; SaaS abuse; credential abuse; account takeover; third-party compromise; cloud ransomware; supply-chain abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Cloud/SaaS adversarial-infrastructure anchor; useful for showing legitimate cloud functions and identities as attack infrastructure, not bot-specific telemetry | [single]; canonical: recordedfuture-2026-cloud-saas-abuse-adversarial-infrastructure(1).md |
| SRC-100 | Quarterly Threat Intelligence Report: Q1 2026 | KasadaIQ | 2026 | vendor telemetry / marketplace monitoring / threat-intelligence assessment | observed-vendor-telemetry | threat enablers; bots-as-a-service; automated checkout; account markets; verification/KYC/2FA bypass services; residential proxies; no-code/vibe-coded bots; AI-account demand | account takeover; automated checkout; scalping; credential markets; verification bypass; reselling communities; limited-inventory abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Main bot/automated-abuse source for SaaSification of adversarial infrastructure and market-enabled automation; use cautiously because raw telemetry and source lists are not reproducible | [single]; canonical: kasada-2026-q1-threat-enablers-saasification-adversarial-infrastructure(1).md |
| SRC-101 | Mythos and the cost of attacking | Summers / Netwrix | 2026 | opinion / strategic commentary / vendor blog | low | AI attacker economics; cost-of-attack framing; OODA loop; Pyramid of Pain; vulnerability discovery; phishing; command-and-control; intent detection | Not threat-specific; AI-enabled attacker cost framing | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Low-priority security-economics context for the claim that cheap AI can reduce attacker cost; do not use as empirical evidence | [single]; canonical: summers-2026-mythos-cost-of-attacking-ai-security-economics(1).md |

Academic and research

| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRC-011 | ML-Based Detection and Evasion Techniques for Advanced Web Bots (PhD thesis, Bournemouth) | Iliou, C. | 2022 | empirical-academic | capability | sophistication taxonomy (simple→advanced); web-log + mouse detection; RL & GAN evasion | tbd — backfill from entry | not recorded | migrated — review pending | Primary academic anchor; controlled/academic setting | iliou-2022-thesis-advanced-web-bots.md |
| SRC-012 | Towards a framework for detecting advanced web bots (ARES) | Iliou et al. | 2019 | empirical-academic | capability | advanced-bot AUC ~0.68 at low FPR; proxy labels | tbd — backfill from entry | not recorded | migrated — review pending | Cleanest source for “simple-bot results hide weak advanced-bot detection” | iliou-2019-ares-detecting-advanced-web-bots.md |
| SRC-013 | Web Bot Detection Evasion Using GANs (CSR) | Iliou et al. | 2021 | empirical-academic | capability | GAN evasion of CNN mouse/touch detectors; web mouse recall → ~0.45 | tbd — backfill from entry | not recorded | migrated — review pending | Adversarial framing | iliou-2021-csr-web-bot-detection-evasion-gans.md |
| SRC-014 | Web Bot Detection Evasion Using Deep RL (ARES) | Iliou et al. | 2022 | empirical-academic | capability | RL web-log evasion; detection/evasion as repeated game | tbd — backfill from entry | not recorded | migrated — review pending | PoC mechanism, not observed campaigns | iliou-2022-ares-web-bot-detection-evasion-deep-rl.md |
| SRC-015 | FP-Inconsistent (arXiv 2406.07647) | Venugopalan et al. | 2025 | empirical-operational | measured (honey-site) | purchased evasive bot traffic vs DataDome/BotD on a honey site; fingerprint inconsistency rules | impression / ad fraud | not recorded | migrated — review pending | Strongest operational academic anchor; external evidence on DataDome | venugopalan-2025-fp-inconsistent-fingerprint-inconsistencies-evasive-bot-traffic.md |
| SRC-016 | FP-Inspector (IEEE S&P) | Iqbal et al. | 2021 | empirical-academic | n/a (not bot-use) | detecting fingerprinting scripts (static + dynamic JS) | Not threat-specific | not recorded | migrated — review pending | Foundations for fingerprinting section; not direct bot-detection evidence | iqbal-2021-fingerprinting-the-fingerprinters-fp-inspector.md |
| SRC-017 | Browser fingerprints for web authentication (ACM TWEB) | Andriamilanto et al. | 2021 | empirical-academic | n/a (not bot-use) | fingerprint distinctiveness/stability at scale | Not threat-specific (auth context) | not recorded | migrated — review pending | Auth context not bots; 2016–17 data, needs replication caveat | andriamilanto-2021-large-scale-browser-fingerprints-web-authentication.md |
| SRC-018 | Detecting Bad Bots via TLS Fingerprints (arXiv 2602.09606) | Jarad & Bıçakcı | 2026 | empirical-academic | measured (weak labels) | JA4/TLS classification; XGBoost/CatBoost AUC ~0.998 | tbd — backfill from entry | not recorded | migrated — review pending | Strong headline metrics; labelling (“bot” in app field) is a real caveat | jarad-2026-handshakes-tell-truth-tls-fingerprints-ja4-bad-bots.md |
| SRC-019 | BeCAPTCHA-Mouse (arXiv 2005.00890) | Acien et al. | 2021 | empirical-academic | capability | mouse-dynamics detection; synthetic (function/GAN) trajectories; public benchmark | Not threat-specific | not recorded | migrated — review pending | Constrained point-and-click task | acien-2021-becaptcha-mouse-synthetic-mouse-trajectories.md |
| SRC-020 | Hacking reCAPTCHA v3 using RL (arXiv 1903.01003) | Akrout et al. | 2019 | empirical-academic | capability | RL mouse-movement vs reCAPTCHA v3 score | CAPTCHA defeat | not recorded | migrated — review pending | 2019 PoC against one setup; narrow, likely stale | akrout-2019-recaptcha-v3-reinforcement-learning.md |
| SRC-033 | FP-Agent: Fingerprinting AI Browsing Agents | Wang, Shafiq & Vekaria | 2026 | empirical-academic; empirical-operational | measured (honey-site) | browser fingerprints; behavioural fingerprints; typing latency; paste/change events; scroll and mouse movement; XGBoost/SHAP; Cloudflare free-tier case study | Not threat-specific; AI-agent detection on benign web tasks | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | Independent measured anchor for AI-agent detectability; external check of Cloudflare free-tier behaviour, not enterprise efficacy | [single]; canonical: wang-2026-fp-agent-fingerprinting-ai-browsing-agents.md |
| SRC-047 | Balancing Security and Privacy: Web Bot Detection, Privacy Challenges, and Regulatory Compliance under the GDPR and AI Act | Martínez Llamas et al. | 2025 | review / methods-taxonomy / legal-regulatory analysis | capability | network/request data; browser/device/TLS fingerprinting; behavioural biometrics; proxies; headless browsers; adversarial fingerprints; PETs; GDPR / AI Act controls | bot detection/evasion; credential stuffing; scraping; scalping; privacy/compliance risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Academic taxonomy and compliance anchor for detection signals, evasion classes, privacy risks, and GDPR / AI Act implications; re-extraction attached for review-paper framing | [multiple — unreconciled]; previous: martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act(1).md; canonical: martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act-review(1).md |
| SRC-048 | How long does it take to get owned? | Wardle | 2019 | empirical-academic | measured (honey identities) | honey identities; leaked credential publication; paste sites; login monitoring; 2FA alerts; honeytokens; IP/user-agent cautions | leaked-credential use; credential stuffing adjacent; account takeover risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Independent measured-use evidence for leaked credential use and honey-account methodology; small and dated but transparent | [single]; canonical: wardle-2019-how-long-does-it-take-to-get-owned(1).md |
| SRC-066 | Browser Fingerprinting: A survey | Laperdrix, Bielova, Baudry & Avoine | 2020 | review; methods-taxonomy; foundations | n/a (foundational survey) | browser/device fingerprinting; User-Agent; HTTP headers; JavaScript APIs; Canvas; WebGL; AudioContext; fonts; plugins; extensions; entropy; anonymity sets; fingerprinting defences | cross-site tracking; stateless device identification; browser/device re-identification; privacy loss; dual-use security/fraud signal | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Core browser-fingerprinting foundation source covering concepts, attributes, metrics, history, and defences; not observed bot-abuse evidence | [single]; canonical: laperdrix-2020-browser-fingerprinting-survey(1).md; source family notes ACM 2020 paper with arXiv v2 alternate |
| SRC-067 | How Unique is Whose Web Browser? The Role of Demographics in Browser Fingerprinting among US Users | Berke et al. | 2025 | empirical-measurement; dataset paper | measured (browser-attribute / demographic dataset) | browser fingerprinting; demographics; User-Agent; languages; timezone; screen resolution; platform; hardware concurrency; device memory; WebGL; entropy; anonymity sets; demographic inference | cross-site tracking; passive/active fingerprinting; re-identification risk; demographic inference; unequal privacy risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Empirical update to fingerprinting foundations; useful for unequal privacy-risk and demographic-inference framing, not bot-abuse prevalence | [single]; canonical: berke-2025-how-unique-whose-web-browser-demographics(1).md |
| SRC-068 | Battling bots and bad data: enhancing data quality in online surveys | Sudbury & Marks | 2026 | empirical-methods; review-informed case study | measured (online survey quality controls) | CAPTCHA/reCAPTCHA; open-ended bot checks; attention checks; consistency checks; quota screening; speed/page-time checks; IP/location/duplicate checks; Qualtrics fraud controls | online survey bots; bad data; low-quality responses; duplicate participation; quota-gaming; survey fraud | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Review-informed case study showing bot-like automation and inattentive humans can degrade online survey data; supports layered controls outside classic cybersecurity | [single]; canonical: sudbury-marks-2026-battling-bots-bad-data(1).md |
| SRC-074 | Secure Development of a Hooking-Based Deception Framework Against Keylogging Techniques | Sajid, Ahmed & Sosnoski | 2025 | empirical-method demonstration / preprint | measured-but-bounded | API hooking; runtime instrumentation; EasyHook; Microsoft Detours; decoy injection; input perturbation; anti-hooking resilience | Not bot-specific; keylogging deception and credential-theft context | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Smaller related entry for cyber-deception, runtime instrumentation, and credential-theft-adjacent defences | [single]; canonical: sajid-2025-hooking-based-deception-keylogging(1).md |
| SRC-075 | Protecting Client Browsers with a Principal-based Approach | Cao | 2014 | dissertation / architecture proposal / method demonstration | n/a (browser-security foundation) | browser principals; client-side isolation; Virtual Browser; JavaScript sandboxing; third-party JavaScript; JShield; XSS; postMessage | Not bot-specific; browser security and malicious-content detection foundation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Browser-security foundation for client-side isolation, JavaScript virtualisation, and principal boundaries | [single]; canonical: cao-2014-protecting-client-browsers-principal-based-approach(1).md |
| SRC-076 | Layered obfuscation: a taxonomy of software obfuscation techniques for layered security | Xu, Zhou, Ming & Lyu | 2020 | review / taxonomy / conceptual framework | n/a (foundation / taxonomy) | layered obfuscation; reverse engineering; control-flow obfuscation; data obfuscation; JavaScript obfuscation; application-layer obfuscation | Not bot-specific; anti-reverse-engineering and client-side protection background | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Low-priority academic foundation for layered obfuscation as risk management rather than one magic protection | [single]; canonical: xu-2020-layered-obfuscation-taxonomy-software-security(1).md |
| SRC-078 | Pushan: Trace-Free Deobfuscation of Virtualization-Obfuscated Binaries | Sudhir et al. | 2026 | empirical-method demonstration / deobfuscation research / preprint | measured-but-bounded | VM obfuscation; virtualization obfuscation; deobfuscation; VMProtect; Themida; Tigress; symbolic emulation; CFG recovery | Not bot-specific; reverse-engineering arms-race background | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Academic counterweight to VM-obfuscation claims; shows VM obfuscation is strong but actively attacked by deobfuscation research | [single]; canonical: sudhir-2026-pushan-trace-free-deobfuscation-vm-obfuscated-binaries(1).md |
| SRC-082 | AI/ML for cybersecurity and cyber-risk management | Kolhar & Sridevi | 2025–2026 | review / conceptual framework / governance overview | n/a (background) | AI cybersecurity; supervised and unsupervised learning; anomaly detection; UEBA; SOC; adversarial ML; XAI; governance | Broad cybersecurity detection and governance; not bot-specific | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Low-priority background source for ML/security governance language; use cautiously because it is broad and not bot-specific | [single]; canonical: kolhar-sridevi-2025-2026-ai-ml-cybersecurity-governance-background(1).md |
| SRC-083 | API Security Testing and Exploitation Techniques | Kolhar & Gundoor | 2026 | API-security taxonomy / testing-methods overview / defensive-guidance | control-and-capability | API security; BOLA; broken authentication; rate limiting; business-logic abuse; API scraping; OAuth2; OIDC; JWT; shadow APIs | API abuse; brute force; API scraping; business-logic abuse; API DoS | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Secondary overview for API abuse, API security testing, and business-logic risk; less authoritative than OWASP/NIST/PortSwigger | [single]; canonical: kolhar-gundoor-2026-api-security-testing-exploitation-techniques(1).md |
| SRC-085 | CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training | Chen et al. | 2026 | empirical-measurement / method demonstration / preprint | measured-but-bounded | CAPTCHA solving; GUI agents; VLMs; ReCAP; OCR; slider CAPTCHA; image grid; self-correction; reasoning-action traces | CAPTCHA defeat; AI-agent automation; challenge-response bypass capability | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Academic capability evidence that GUI agents can be trained for modern interactive CAPTCHA tasks, bounded by synthetic/benchmark setting | [single]; canonical: chen-2026-recap-captcha-native-gui-agents(1).md |
| SRC-089 | Understanding the Proxy Ecosystem: A Comparative Analysis of Residential and Open Proxies on the Internet | Choi et al. | 2020 | empirical-measurement / infrastructure-analysis | measured | open proxies; residential proxies; proxy geolocation; ASN; blacklists; IP reputation; malicious activity; evasion infrastructure | Proxy-enabled abuse infrastructure; not a specific web-abuse campaign | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Independent empirical foundation for residential/open proxy infrastructure and blacklist overlap | [single]; canonical: choi-2020-understanding-proxy-ecosystem(1).md |
| | | | | | | | | | | | [duplicate upload deduped]; canonical: anderson-2019-measuring-changing-cost-cybercrime(1).md; duplicate: anderson-2019-measuring-changing-cost-cybercrime(2).md |

Threat surface and territory
Capability / infrastructure evidence, not proof of malicious use or bypass success. README/marketing claims are not independent test results.
| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRC-021 | Official documentation (Playwright / Puppeteer / Selenium) | project maintainers | 2026 | capability-doc | capability | baseline browser-automation capability layer | Not threat-specific | not recorded | migrated — review pending | Capability, not intent; sits beneath stealth/cloud layers | playwright-puppeteer-selenium-2026-browser-automation-docs.md |
| SRC-022 | undetected-chromedriver (GitHub/PyPI) | ultrafunkamsterdam | 2021–2024 | tooling-readme | capability | Selenium ChromeDriver evasion layer; explicit IP-reputation caveat | Not threat-specific | not recorded | migrated — review pending | README claims, not independent tests | ultrafunkamsterdam-2024-undetected-chromedriver-docs-github.md |
| SRC-023 | puppeteer-extra-plugin-stealth (GitHub/npm) | berstend | 2018–2023 | tooling-readme | claimed | modular evasion catalogue: webdriver, plugins, codecs, WebGL | Not threat-specific | not recorded | migrated — review pending | “Passes public bot tests” ≠ production | berstend-2023-puppeteer-extra-plugin-stealth-docs-github.md |
| SRC-024 | Anti-scraping bypass, stealth, proxies, fingerprints, Cloudflare bypass | ScrapFly | 2025–2026 | capability-doc; vendor-claim | claimed | API-level bypass (asp); byte-perfect JA4/HTTP2/QUIC claims; names Nodriver/Camoufox/UC Mode | scraping | not recorded | migrated — review pending | Documents the attacker mental model for Cloudflare | scrapfly-2025-2026-anti-scraping-bypass-stealth-proxies-fingerprints.md |
| SRC-025 | Web Unlocker, Browser API, proxies, agentic web execution | Bright Data | 2026 | capability-doc; vendor-claim | claimed | managed proxies/fingerprints/CAPTCHA + cloud browsers; password entry disabled by default | scraping | not recorded | migrated — review pending | Compliance/public-data framing | brightdata-2026-web-unlocker-browser-api-proxies-anti-bot-bypass.md |
| SRC-026 | Cloud-browser & agent docs (Browserless / Browserbase / Hyperbrowser) | respective vendors | 2026 | capability-doc | capability | cloud browsers + AI-agent infra; stealth, proxies, CAPTCHA-solving, persistent sessions | Not threat-specific | not recorded | migrated — review pending | Bridges automation to agentic browsers | browserless-browserbase-hyperbrowser-2026-cloud-browser-agent-automation-docs.md |
| SRC-029 | How to Use Rnet: The Blazing-Fast Python HTTP Client | RoundProxies / Marius Bernard | 2025 | tooling-readme; capability-doc | capability | browser TLS/HTTP2 impersonation; JA3/custom fingerprints; header order; cookies; sticky sessions; proxies; WebSockets | scraping | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Scraper-side evidence that browser-like TLS/protocol impersonation is treated as a normal evasion capability | [single]; canonical: roundproxies-rnet-source-extraction.md |
| SRC-030 | How to Bypass Cloudflare Turnstile | ScrapFly / Hisham | 2026 | tooling-readme; vendor-claim; capability-doc | claimed | Turnstile modes; browser fingerprinting; canvas/WebGL; behavioural signals; JA3/JA4; proof-of-work/token handling; cloud browsers; residential proxies | scraping; CAPTCHA/challenge-response evasion | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Turnstile-specific scraper-side account of challenge-response signal families and bypass classes | [single]; canonical: scrapfly-cloudflare-turnstile-source-extraction.md |
| SRC-031 | How to Bypass Imperva Incapsula when Web Scraping in 2026 | ScrapFly / Bernardas Alisauskas | 2026 | tooling-readme; capability-doc; vendor-claim | claimed | Imperva/Incapsula block indicators; JA3/JA4; IP reputation; header order; JS/canvas/WebGL/audio fingerprinting; cookies/sessions; rate limiting; stealth browsers | scraping; API scraping; WAF/bot-protection evasion | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Imperva-specific scraper-side view of WAF/bot-protection detection surfaces and claimed evasion patterns | [single]; canonical: scrapfly-imperva-incapsula-source-extraction.md |
| SRC-032 | Quo vadis, crawlers? Progress and what’s next on safeguarding our infrastructure | Wikimedia Foundation | 2026 | empirical-operational; threat-intel | observed (first-party operator-reported) | AI crawler traffic; residential proxies; browser-identity spoofing; rate-limit circumvention; robot-policy updates; identification-tiered API limits; bot detection | scraping / aggressive crawling; infrastructure strain | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | First named-operator account in the register; strong operator-side evidence for AI-crawler pressure and residential-proxy evasion, but platform-specific | [single]; canonical: wikimedia-2026-quo-vadis-crawlers-infrastructure.md |
| SRC-035 | How we’re dealing with bots and the reselling of driving tests | DVSA / Ryder | 2023 | threat-intel | observed (platform-side) | appointment search/reservation automation; CAPTCHA; bot-protection measures; ADI-service monitoring; cancellation-rate and account-link controls | scarce-resource appointment abuse; slot-sniping; slot-resale; denial of inventory | Codex / GPT-5 / source-extraction-prompt v3 | needs review | Concrete public-sector appointment-abuse example for the booking-style worked example and scarce-resource abuse lane | [single]; canonical: dvsa-2023-bots-reselling-driving-tests.md |
| SRC-042 | Is Web Scraping Legal? Key Insights and Guidelines You Need to Know | ScrapingBee | 2026 | legal-explainer | n/a (legal context; not use evidence) | terms-of-service; copyright/fair-use risk; personal-data processing; GDPR/CCPA/CFAA framing; robots.txt; rate limiting; CAPTCHA/paywall/login/IP-block risk | scraping; unwanted automation; unauthorised-access risk; privacy-risky data collection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Scraper-side governance framing around the boundary between technical capability and permission; verify legal claims against primary/specialist sources before use | [single]; canonical: scrapingbee-2026-web-scraping-legal-guidelines(1).md |
| SRC-043 | Advanced Web Scraping: Hidden Techniques Pro Developers Actually Use | ScrapingBee | 2026 | bypass-guide | capability | async orchestration; multiprocessing; rate control; backoff/jitter; circuit breakers; recursive filtering; JavaScript rendering; AJAX/API discovery; proxy rotation; CAPTCHA solving | web scraping at scale; large-scale data extraction; pagination-limit circumvention; anti-blocking | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Public scraper-side engineering-pattern source for robust extraction at scale; high dual-use, so cite technique families only | [single]; canonical: scrapingbee-2026-advanced-web-scraping-hidden-techniques(1).md |
| SRC-044 | Best Price Scraping Tools for 2026: Top Services Compared | ScrapingBee | 2026 | capability-doc | capability | price scraping; scraping APIs; no-code scrapers; JavaScript rendering; headless browsers; proxy rotation; CAPTCHA handling; anti-bot reliability; AI extraction | ecommerce scraping; price intelligence; competitive-intelligence collection; unwanted automated price collection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Market-map evidence showing price scraping as packaged commercial service with anti-bot handling as a core buying criterion | [single]; canonical: scrapingbee-2026-price-scraping-tools(1).md |
| SRC-045 | How To Bypass PerimeterX Anti-Bot Protection System In 2026 | ScrapingBee / Krukowski | 2026 | bypass-guide | capability | IP reputation; TLS fingerprints; HTTP/2; header order; browser fingerprinting; cookies/tokens; session continuity; behavioural signals; residential/mobile proxies; stealth browsers | anti-bot evasion; web scraping against PerimeterX/HUMAN-protected sites; browser automation | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Public scraper-side evidence that named-defender bypass thinking is framed as multi-layer signal alignment across IP, TLS, HTTP, fingerprint, session, and behaviour | [single]; canonical: scrapingbee-2026-perimeterx-human-bypass(1).md |
| SRC-046 | Avoiding bot detection: How to scrape the web without getting blocked? / browser-fingerprinting | niespodd | n.d. / ongoing | tooling-readme | claimed | browser fingerprinting; anti-detection; stealth browsers; Puppeteer/Playwright/Selenium; residential proxies; CAPTCHA solving; TLS/JA3/JA4; WebGL/fonts/client hints/WebDriver | scraper-side evasion; anti-detection; proxy-assisted scraping; browser automation | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Public scraper-side/evasion mental model and tooling-ecosystem map; maintainer claims, not independent effectiveness evidence | [single]; canonical: niespodd-browser-fingerprinting(1).md |
| SRC-050 | FTC Brings First-Ever Cases Under the BOTS Act | Federal Trade Commission | 2021 | legal-record | observed (enforcement record) | automated ticket search/reservation; IP-address concealment; fictitious accounts; multiple credit cards; purchase-limit circumvention | ticket bots; scalping; limited-stock inventory capture; purchase-limit circumvention; resale-market abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | High-value observed-use / enforcement evidence for ticket-bot abuse and limited-stock automation; cite as FTC allegations/orders unless underlying records are checked | [single]; canonical: ftc-2021-first-bots-act-cases(1).md |
| SRC-052 | bad-asn-list: open-source ASN blocklist for cloud/hosting/colo traffic | Hamachek | n.d. (~2019–2020 unverified) | tooling-readme; empirical-operational | observed (first-party anecdotal) | datacenter/hosting/colo ASN blocklist; IP→ASN lookup; network-origin reputation; VPN/hosting egress; signup fraud scoring | fake account creation; signup abuse; datacenter-origin automation | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | Worked example of network-origin / ASN-reputation detection and the datacenter-blocking → residential-proxy arms-race baseline; anecdotal and dated | [single]; canonical: hamachek-bad-asn-list-datacenter-asn-blocklist.md |
| SRC-053 | The Best Web Scraping API to Avoid Getting Blocked | ScrapingBee | 2026 | capability-doc | capability | managed scraping API; headless Chrome; JavaScript rendering; selector waits; custom interactions; proxy rotation; residential/stealth proxies; AI extraction; LLM/RAG data ingestion | web scraping; commercial scraping infrastructure; anti-blocking abstraction; ecommerce and LLM data collection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Commercial capability source showing scraping-as-a-service packaging of browsers, proxies, extraction, geotargeting, and anti-blocking features | [single]; canonical: scrapingbee-2026-web-scraping-api(1).md |
| SRC-077 | On the Architecture of Bot Detection Services | Tschacher | 2021 | technical explainer / architecture analysis / attacker-aware commentary | n/a (architecture analysis) | passive detection; client-side detection; JavaScript fingerprinting; TLS/TCP/IP fingerprinting; HTTP headers; IP reputation; cookies; sessions; JavaScript obfuscation; JavaScript VMs | Bot detection architecture; browser automation detection context | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Architecture-level source connecting client/server signal layers, spoofability, and protection of client-side detection scripts | [single]; canonical: tschacher-2021-architecture-of-bot-detection-services(1).md |
| SRC-080 | Ticketmaster v. Prestige Entertainment West: ticket bots, dummy accounts, CAPTCHA, and legal remedies | Ticketmaster litigation; Proskauer summary; Ballon context | 2018–2019 | court pleadings/order / litigation allegations / settlement summary / legal analysis | observed (legal case / alleged conduct) | ticket bots; dummy accounts; CAPTCHA; access controls; CFAA; DMCA; BOTS Act; purchase limits; resale | Ticket scalping; automated ticket purchasing; purchase-limit circumvention; inventory capture | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Strong legal-case evidence for alleged ticket-bot activity, dummy accounts, CAPTCHA circumvention context, and settlement remedies | [single]; canonical: ticketmaster-prestige-2018-2019-ticket-bots-settlement(1).md |
| SRC-081 | U.S. Senate Ticketmaster / Taylor Swift case: scalper bots, Verified Fan, and live-event ticketing | Berchtold; Bradish; Guardian / U.S. Senate hearing | 2023 | public testimony / contested case account / secondary press reporting | observed-claim | Ticketmaster; Taylor Swift; Verified Fan; scalper bots; access-code servers; BOTS Act; secondary ticketing; live events | Ticket bots; queue pressure; access-code-server attack pressure; scalping; live-event ticketing abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | High-profile public-hearing source for ticket-bot pressure, with clear caveat that core bot-volume claims come from Live Nation/Ticketmaster | [single]; canonical: us-senate-2023-ticketmaster-taylor-swift-scalper-bots(1).md |
| SRC-094 | Detecting Post-Compromise Threat Activity in Microsoft Cloud Environments | CISA | 2021 | government advisory / detection guidance / incident-linked TTPs | observed-guidance (historical / archived) | Microsoft cloud identity; Azure AD; M365/O365; federated identity; forged tokens; OAuth/SAML; service principals; API access; Sparrow | cloud identity compromise; API-based persistence; post-compromise cloud activity; credential/service-account abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Supporting official source for cloud identity/API persistence after compromise; archived and historical, so not current trend evidence | [single]; canonical: cisa-2021-post-compromise-microsoft-cloud-identity-api-access(1).md |
| SRC-095 | Scattered Spider | FBI / CISA / RCMP / ASD ACSC / AFP / CCCS / NCSC-UK | 2025 | government advisory / investigation-derived TTPs / MITRE ATT&CK mapping | observed-investigative | helpdesk social engineering; SIM swap; MFA fatigue; OTP; valid accounts; SSO; RMM/remote-access tools; cloud discovery; ransomware | identity abuse; account takeover; helpdesk compromise; valid-account intrusion; legitimate-tool abuse; ransomware/extortion | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Strong official non-vendor evidence for identity/social-engineering/legitimate-tool abuse, but actor-specific and not bot-specific | [single]; canonical: cisa-fbi-2025-scattered-spider-identity-helpdesk-legitimate-tools(1).md |
| SRC-096 | Hiding in Plain Sight: Tracking Bulletproof Hosting and Abused RDP Infrastructure | Censys | 2026 | internet-scale scanning analysis / technical measurement / threat detection | measured infrastructure | bulletproof hosting; abused RDP; Windows hostnames; VM templates; ASNs; VPS; infrastructure clustering; takedown evasion | adversarial infrastructure; ransomware infrastructure context; hosting abuse; persistence/evasion | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Infrastructure-measurement source for adversarial hosting and abused RDP patterns; balances generic threat reports with internet-scale artifact analysis | [single]; canonical: censys-2026-bulletproof-hosting-abused-rdp-infrastructure(1).md |
| SRC-102 | Commercial automation cost stack 2026: scraping APIs, proxies, CAPTCHA solving, managed scraping, and SMS verification | Combined pricing / market source cluster | 2026 | vendor pricing pages / vendor-adjacent comparison / industry survey / pricing snapshots | market availability | scraping APIs; managed scraping; residential proxies; CAPTCHA solving; temporary SMS; browser rendering; proxy routing; account verification inputs | scraping; CAPTCHA defeat; account creation; credential-stuffing support; indirect scarce-resource automation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Cost-of-capability source showing key automation inputs are modular and purchasable; not abuse prevalence or effectiveness evidence | [single]; canonical: commercial-automation-cost-stack-2026-scraping-proxies-captcha-sms(1).md |
Framing-distance ledger
The project’s central analytical discipline: each source approximates the real problem differently and fails to represent it differently (EVIDENCE-REVIEW.md §5). The “what it cannot show” column below is migrated from the old register’s notes where present. Cells that have not yet been filled are marked tbd and will be backfilled from the entry files under working/register-entries/.
| id | source | what it approximates | what it fails to represent | what it cannot show (migrated) |
|---|---|---|---|---|
| SRC-001 | OWASP OAT | a shared vocabulary for automated-threat types | tbd — backfill from entry | Taxonomy/ontology only — no detection or prevalence evidence |
| SRC-002 | Cloudflare bots docs | what one major vendor’s control plane exposes and uses | tbd — backfill from entry | That any exposed signal actually works in production — only that Cloudflare uses/exposes it |
| SRC-003 | Cloudflare Bot Mgmt | the product surface area and WAF/Workers variables | tbd — backfill from entry | Efficacy; product structure ≠ detection performance |
| SRC-004 | DataDome | intent-based detection framing and signal families | tbd — backfill from entry | “Fully protected” exposure figures are vendor-measured, not independently verifiable |
| SRC-005 | Netacea brochure | server-side / no-client-JS detection positioning | tbd — backfill from entry | Case-study results are vendor-reported |
| SRC-006 | Netacea ML showcase | a taxonomy of ML approaches to bot detection | tbd — backfill from entry | No reproducible method detail; cannot validate any approach |
| SRC-007 | Netacea survey | executive-perceived business impact of bots | tbd — backfill from entry | Self-reported survey; not measured prevalence or efficacy |
| SRC-008 | Arkose | attacker-cost / dynamic-challenge framing; agentic-AI claims | tbd — backfill from entry | Several reports gated; survey/vendor evidence only |
| SRC-009 | Kasada | the attacker economy (solver/proxy/CAPTCHA pricing) | tbd — backfill from entry | Vendor/threat-intel framing; pricing claims not independently audited |
| SRC-010 | HUMAN/PerimeterX | AI-agent detection signal categories; OpenClaw observations | tbd — backfill from entry | Vendor/threat-intel; detection-category claims not externally verified |
| SRC-011 | Iliou thesis | advanced-bot detection in a controlled academic setting | tbd — backfill from entry | Controlled/academic setting; does not establish production behaviour |
| SRC-012 | Iliou 2019 | that simple-bot metrics hide weak advanced-bot detection | tbd — backfill from entry | Proxy labels; advanced-bot AUC ~0.68 at low FPR is the honest figure |
| SRC-013 | Iliou 2021 | GAN evasion of CNN mouse/touch detectors | tbd — backfill from entry | One adversarial strategy; recall drop to ~0.45 is not a worst-case bound |
| SRC-014 | Iliou 2022 RL | detection/evasion as a repeated game (RL evasion) | tbd — backfill from entry | PoC mechanism, not observed campaigns |
| SRC-015 | FP-Inconsistent | evasive bot traffic vs commercial detectors on a honey site | tbd — backfill from entry | Threat model is impression fraud; one honey site, not general production |
| SRC-016 | FP-Inspector | detecting fingerprinting scripts, not bots | tbd — backfill from entry | Not direct bot-detection evidence |
| SRC-017 | Andriamilanto | fingerprint distinctiveness/stability at scale (auth) | tbd — backfill from entry | Auth context not bots; 2016–17 data, needs replication |
| SRC-018 | Jarad TLS | JA4/TLS classification of bad bots | tbd — backfill from entry | Labelling (“bot” in app field) caveats the headline AUC ~0.998 |
| SRC-019 | BeCAPTCHA-Mouse | mouse-dynamics detection + synthetic trajectories | tbd — backfill from entry | Constrained point-and-click task; not free browsing |
| SRC-020 | Akrout reCAPTCHA | RL mouse movement vs one reCAPTCHA v3 setup | tbd — backfill from entry | 2019 PoC against one setup; likely stale |
| SRC-021 | automation docs | the baseline automation capability layer | tbd — backfill from entry | Capability, not intent or malicious use |
| SRC-022 | undetected-chromedriver | a Selenium evasion layer’s claimed capabilities | tbd — backfill from entry | README claims, not independent tests; maintainer’s own IP-reputation caveat |
| SRC-023 | puppeteer-stealth | a modular evasion catalogue | tbd — backfill from entry | Passing public bot tests ≠ evading production detection |
| SRC-024 | ScrapFly | the attacker mental model for Cloudflare bypass | tbd — backfill from entry | Byte-perfect-fingerprint claims are vendor claims, not verified bypass success |
| SRC-025 | Bright Data | managed bypass infrastructure + cloud browsers | tbd — backfill from entry | Compliance framing; capability claims not independently verified |
| SRC-026 | cloud-browser/agent docs | cloud-browser + AI-agent infrastructure features | tbd — backfill from entry | Feature availability, not evasion efficacy |
| SRC-027 | OWASP Handbook v1.3 | the defender’s naming layer for automated web-application threats and broad countermeasure classes | no prevalence, detection performance, algorithmic detail, production telemetry, or empirical AI-agent evidence; intent categories may not separate cleanly in observed traffic | Taxonomy and countermeasure suggestions only; cannot show detectability, prevalence, or efficacy |
| SRC-028 | Playwright cookies tutorial | the basic automation capability of reading and preserving cookie/session state from a browser context | no adversarial setting, no bot detection, no cookie replay at scale, no session binding or risk-scoring controls | Cannot show that cookies are sufficient to impersonate users or bypass bot detection |
| SRC-029 | RoundProxies Rnet | scraper-side HTTP-client evolution toward browser-like TLS/HTTP2/header/cookie/proxy behaviour | no defender-side logic, production traffic, behavioural JS challenges, account history, graph/entity signals, or verified bypass outcomes | Cannot show that Rnet is undetectable or that TLS matching is sufficient to bypass modern anti-bot systems |
| SRC-030 | ScrapFly Turnstile | scraper-side understanding of Cloudflare Turnstile challenge mechanisms and bypass classes | no verified Cloudflare internals, defender telemetry, independent success rates, false-positive handling, or broader Cloudflare decisioning | Cannot show the complete Turnstile signal set or that any bypass method works reliably across production sites |
| SRC-031 | ScrapFly Imperva | scraper-side view of websites protected by Imperva/Incapsula and the signal families scraper tooling believes matter | no Imperva internals, cross-customer telemetry, signal weightings, independent validation, durable success rates, or false-positive behaviour | Cannot show Imperva’s actual internal trust-score mechanism or that listed bypass approaches work reliably |
| SRC-032 | Wikimedia crawlers | operator-side view of AI-crawler pressure, residential-proxy evasion, and tiered API/rate-limit response | first-party blog; platform-specific; no methodology, false-positive rate, or independent verification; open knowledge platform differs from commercial booking/e-commerce targets | Cannot show prevalence outside Wikimedia, rigorous defence efficacy, or direct generality to credential stuffing, ATO, or scalping |
| SRC-033 | FP-Agent | current commercial AI browsing agents’ browser and behavioural fingerprints during realistic benign web tasks; includes a Cloudflare free-tier check | controlled honey site; benign tasks; closed-world known agents; narrow human population; point-in-time; not enterprise bot management | Cannot show production abuse prevalence, open-world detection, durability under adversarial humanisation, or general Cloudflare/vendor efficacy |
| SRC-034 | F5 credential stuffing | end-to-end credential-stuffing landscape from spill supply to login traffic against large consumer sites | vendor telemetry from large enterprise customers; disclosed spill data only; 2020-era tooling; no reproducible method | Cannot show web-wide prevalence, independent detection efficacy, current-era tooling coverage, or generality to smaller booking-style targets |
| SRC-035 | DVSA driving-test bots | public-sector appointment-slot abuse: automated monitoring, booking, holding, resale, and platform mitigation | platform-side public blog; no traffic counts, bot counts, detection logs, or false-positive rates; platform-specific | Cannot show prevalence, exact automation mechanism, share of bookings affected, all third-party service behaviour, or mitigation efficacy |
| SRC-036 | HUMAN OpenClaw | vendor-observed exposed autonomous-agent gateways producing browser-automation traffic across engagement manipulation and reconnaissance patterns | vendor telemetry; attribution uncertainty; no raw data, query method, full validation, or disclosed detection logic | Cannot show autonomous execution for every request, internet-wide prevalence, attack success, or harm magnitude |
| SRC-037 | HUMAN Agentic Visibility | a vendor capability model for identifying, classifying, measuring, and controlling AI-agent traffic | product explanation; dashboard screenshots; no independent evaluation or primary telemetry in this source | Cannot show traffic prevalence, classification accuracy, abuse prevalence, or effectiveness of the controls |
| SRC-038 | HUMAN State of Agentic Traffic May 2026 | monthly vendor telemetry on observed AI-agent mix, sector destinations, page-route categories, and blocking rates | HUMAN-visible traffic only; opaque classification; traffic is not necessarily malicious; one-month snapshot | Cannot show malicious intent, internet-wide prevalence, attack success, or accuracy of named-agent attribution |
| SRC-039 | Thales Bad Bot Report 2026 | production-facing bot-management view of automated traffic, API-first abuse, AI clients, ATO, and inventory abuse across protected customers | vendor-visible traffic; sampling/classification opaque; mixes telemetry, analyst interpretation, and product framing | Cannot show internet-wide prevalence, classification quality, attack success rates, comparative vendor efficacy, or AI causality |
| SRC-040 | Akamai financial-services trends | edge/WAF/API telemetry view of financial-services web attacks, API visibility gaps, AI-labelled bot traffic, and scraping/API targeting | Akamai customer/product visibility; finance-sector-specific; alerts not success; mixes telemetry, survey claims, and interpretation | Cannot show market-wide prevalence, classification accuracy, detection efficacy, AI causality, or generalisation beyond Akamai-protected finance traffic |
| SRC-041 | ScrapingBee scraper test sites | the educational pipeline from controlled scraper practice to production-style dynamic scraping | training-site article; not abuse, evasion success, prevalence, or harm | Cannot show real-world bot traffic, abuse, hostile scraping, or that practice sites cause misuse |
| SRC-042 | ScrapingBee legal guidelines | scraper-vendor compliance framing around lawful/risky/impermissible scraping | vendor legal explainer; not primary law, legal advice, or jurisdiction-specific analysis | Cannot determine legality, replace legal advice, establish lawful customer behaviour, or validate legal-risk reduction claims |
| SRC-043 | ScrapingBee advanced scraping techniques | public scraper-side engineering maturity: scaling, reliability, JavaScript rendering, pagination workarounds, and anti-blocking infrastructure | capability guide; no observed abuse, target-specific impact, independent validation, or success rates | Cannot show abusive use, prevalence, success against protected targets, or defender impact; high dual-use |
| SRC-044 | ScrapingBee price scraping tools | commercial packaging of price scraping as a normal business workflow with anti-bot handling as a feature | vendor comparison/marketing; not neutral benchmark or defender-side evidence | Cannot show independent tool performance, abuse prevalence, or that price-scraping activity is lawful or wanted by targets |
| SRC-045 | ScrapingBee PerimeterX/HUMAN bypass guide | the public evasion mental model for named commercial bot management as multi-layer signal alignment | scraper-vendor bypass narrative; no independent success evidence, raw tests, or defender confirmation | Cannot show PerimeterX/HUMAN weakness, bypass success rates, observed abuse, or target-specific effectiveness |
| SRC-046 | niespodd browser-fingerprinting | a public scraper-side anti-detection taxonomy and tooling ecosystem map | maintainer claims/tool catalogue; no raw tests, denominators, success rates, or independent validation | Cannot establish tool effectiveness, prevalence, legality, safety, or production bypass success |
| SRC-047 | Martínez Llamas et al. privacy/GDPR/AI Act review | academic synthesis of detection signal families, evasion classes, privacy risks, and regulatory controls | review/taxonomy, not production telemetry or empirical measurement | Cannot show abuse prevalence, detection efficacy, legal compliance in a specific deployment, or current production practice |
| SRC-048 | Wardle honey identities | independent measurement of leaked-credential use through honey identities and paste-site publication | small, dated, paste-site-only honey experiment; observes unauthorised access, not necessarily bots | Cannot show market-wide prevalence, automation share, modern credential-stuffing infrastructure, or vendor-control efficacy |
| SRC-049 | DataDome ticket bots | vendor taxonomy of ticket-bot activity across account preparation, sale/queue pressure, checkout, and resale | vendor explainer/product marketing; no primary telemetry, independent measurement, or product validation | Cannot show ticket-bot prevalence, DataDome efficacy, false-positive/negative rates, or that bots dominate resale prices |
| SRC-050 | FTC BOTS Act cases | enforcement-record proximity to alleged real ticket-bot use against scarce-inventory ticketing flows | legal press release; allegations/proposed orders, not a technical measurement study or prevalence estimate | Cannot show defence success rates, detection signals, full bot architecture, or generalisation beyond named enforcement cases |
| SRC-051 | StopBadBots SBB-WAF-Rules | small-site / self-managed-server defensive controls: WAF rules, blocklists, user-agent filters, scanner heuristics | maintainer claims attached to tooling; no independent evaluation, prevalence, or false-positive measurement | Cannot show general blocking efficacy, current blocklist quality, low false positives, or coverage of browser-native agents |
| SRC-052 | Hamachek bad-asn-list | a concrete operator account of datacenter/ASN blocking for signup abuse and a reusable defensive artifact | single-site anecdote; dated and no methodology/false-positive measurement | Cannot establish general efficacy, current usefulness, false-positive rates, or prevalence beyond one operator account |
| SRC-053 | ScrapingBee web scraping API | commercial scraping-as-a-service abstraction of browsers, proxies, extraction, geotargeting, and anti-blocking | vendor product page; no independent benchmark, target list, raw logs, or defender corroboration | Cannot show abuse, effectiveness, success rate validity, or customer legality |
| SRC-054 | Cloudflare Block AI Bots | site-owner control over AI crawler and AI-like crawler access | product-control documentation; no classification details, traffic counts, effectiveness, or harm evidence | Cannot show AI crawler prevalence, malice, false positives, effectiveness, or legal/policy sufficiency |
| SRC-055 | Cloudflare Turnstile | browser/form challenge systems that assess browser environment and behaviour before or instead of visual puzzles | vendor documentation; no independent bypass, accessibility, usability, or privacy assessment | Cannot show real-world challenge effectiveness, advanced-bot resistance, user impact, abuse prevalence, or legal sufficiency |
| SRC-056 | Cloudflare Detection IDs | operational bot detection where low-level signals are exposed for rule-making and troubleshooting | no full detection list, exact logic, performance data, or false-positive metrics | Cannot prove zero human overlap, adaptation resistance, prevalence, or correctness of a particular rule |
| SRC-057 | Cloudflare bot detection engines | layered production detection across heuristics, JavaScript detections, ML, anomaly detection, headers, sessions, and browser signals | product-method summary; not a model disclosure, benchmark, audit, or telemetry study | Cannot prove JS/ML effectiveness, data necessity, false-positive rates, or real-world attack prevalence |
| SRC-058 | Cloudflare bot solutions overview | the packaging of bot defence as a layered operational stack from simple challenges to enterprise scoring and analytics | vendor capability overview; no independent prevalence, error-rate, or enforcement-outcome data | Cannot show effectiveness, comparative performance, prevalence, or legal/governance sufficiency |
| SRC-059 | MDN CORS | the browser cross-origin sharing mechanism that later scraping/browser-security explanations rely on | neutral technical reference, not threat or efficacy evidence | Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality |
| SRC-060 | MDN HTTP caching | cache semantics that affect server load, repeated crawler requests, conditional requests, and shared-cache privacy/security issues | neutral technical reference, not threat or efficacy evidence | Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality |
| SRC-061 | MDN HTTP authentication | HTTP and proxy authentication concepts behind credential-bearing automated requests and access-control boundaries | neutral technical reference, not threat or efficacy evidence | Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality |
| SRC-062 | MDN cookies | cookie/session mechanics that support session continuity, login state, tracking, and bot-management cookie checks | neutral technical reference, not threat or efficacy evidence | Cannot show account-abuse prevalence, attacker use, detection effectiveness, or legality |
| SRC-063 | MDN User-Agent header | the basic identity/compatibility string used by browsers, crawlers, tools, and spoofing or reduction discussions | neutral technical reference, not threat or efficacy evidence | Cannot show spoofing prevalence, malicious use, detection effectiveness, or legality |
| SRC-064 | MDN HTTP headers | the vocabulary of HTTP request/response fields later used in header-based detection and scraper-side spoofing discussions | neutral technical reference, not threat or efficacy evidence | Cannot show malicious header use, prevalence, detection effectiveness, or legality |
| SRC-065 | MDN Overview of HTTP | the basic client-server/user-agent model behind browsers, crawlers, scripts, proxies, cookies, and request patterns | neutral technical reference, not threat or efficacy evidence | Cannot show abuse, prevalence, attacker use, detection effectiveness, or legality |
| SRC-066 | Laperdrix browser-fingerprinting survey | the browser/device fingerprinting layer used in tracking, fraud detection, bot detection, and privacy-invasive identification | survey/foundation source; not current production bot telemetry or observed abuse; browser/API surfaces have changed since publication | Cannot show current prevalence, modern anti-bot performance, abuse against a target, or legal compliance |
| SRC-067 | Berke et al. browser demographics | browser/device-attribute identifiability and unequal privacy risk across demographic groups | US Prolific sample; Dec 2023 device/browser snapshot; not bot traffic or anti-bot detection | Cannot show bot prevalence, detection performance, malicious use, or that a specific API change fixes demographic inference risk |
| SRC-068 | Sudbury & Marks survey bots | automated and low-quality participation in incentivised online surveys and layered survey-quality controls | online-survey domain; not commercial web scraping, credential stuffing, ticket bots, or vendor telemetry | Cannot prove all bad responses were bots, quantify internet-wide bot prevalence, or validate commercial bot-management controls |
| SRC-069 | PortSwigger OAuth vulnerabilities | abuse of third-party login and OAuth-based authentication flows, including account takeover and token/API misuse | educational vulnerability taxonomy; not measured prevalence, bot volume, or incident counts | Cannot prove OAuth is generally unsafe, quantify automation around OAuth, or replace OAuth/OIDC specifications and BCPs |
| SRC-070 | PortSwigger secure authentication | defensive hardening against automated login abuse and authentication bypass | practical guidance; not empirical control-effectiveness evidence | Cannot prove a control is sufficient, quantify CAPTCHA/rate-limit effectiveness, or replace formal standards such as ASVS/NIST |
| SRC-071 | PortSwigger password login vulnerabilities | automated login abuse through brute force, credential stuffing, username enumeration, and weak password-login controls | educational attack/defence taxonomy; not real-world attack-frequency or success-rate evidence | Cannot quantify credential-stuffing prevalence or prove CAPTCHA, rate limiting, IP blocking, or account locking works in the wild |
| SRC-072 | PortSwigger authentication vulnerabilities | authentication as an attack surface linking bot automation to account takeover and follow-on exploitation | educational overview/lab framing; not production telemetry or observed abuse evidence | Cannot show automation prevalence, economic harm, bot-detection effectiveness, or legal/regulatory status |
| SRC-073 | NIST SP 800-63B-4 | standards-backed authentication, session, throttling, and authenticator-management controls for account-abuse contexts | normative control guidance; no bot-abuse telemetry, no control-effectiveness data, and no vendor/tool comparison | Cannot show credential-stuffing prevalence, detection performance, or that a specific fraud indicator reliably separates bots from humans |
| SRC-074 | Sajid et al. hooking deception | runtime deception and API-hooking concepts that may inform anti-tamper/instrumentation thinking | endpoint keylogging domain, not web-bot detection or browser automation; preprint | Cannot show web-abuse prevalence, bot detection, or operational effectiveness in browser/client anti-bot systems |
| SRC-075 | Cao browser principals | browser principal boundaries, JavaScript virtualisation, and client-side isolation concepts | older browser-security dissertation; not current bot-detection telemetry or modern browser-agent evidence | Cannot show modern browser-automation detection, current anti-bot efficacy, or observed abuse |
| SRC-076 | Xu layered obfuscation | layered obfuscation as risk management and a taxonomy of obfuscation targets and layers | general software-security taxonomy, not bot-specific and not empirical bot evidence | Cannot show that client-side bot-detection obfuscation works, or how attackers respond in production |
| SRC-077 | Tschacher bot-detection architecture | architecture-level explanation of passive bot detection, spoofable client signals, and layered signal collection | public technical commentary, not telemetry, benchmark, or vendor-validated architecture | Cannot quantify prevalence, false positives, or the effectiveness of any specific detection service |
| SRC-078 | Pushan deobfuscation | reverse-engineering pressure against VM-obfuscated binaries and limits of obfuscation as a durable defence | binary deobfuscation research, not browser JavaScript bot detection; preprint | Cannot show that DataDome-style VM obfuscation is broken or effective in browser deployments |
| SRC-079 | DataDome VM obfuscation | commercial use of VM-based obfuscation to protect exposed client-side bot-detection logic | vendor product announcement; no independent measurement, no raw attack data, no performance or usability data | Cannot prove the protection works, quantify attacker cost increase, or show detection efficacy |
| SRC-080 | Ticketmaster Prestige litigation | legal-case evidence of alleged large-scale automated ticket purchasing, dummy accounts, and access-control circumvention | allegations and settlement context, not a full technical study or final trial finding on every claim | Cannot provide bot-code details, systematic prevalence, or independently measured detection/control performance |
| SRC-081 | U.S. Senate Ticketmaster hearing | high-profile public account of ticket-bot pressure and access-code-server attack claims around the Taylor Swift presale | contested public testimony; key technical claims come from Live Nation/Ticketmaster; antitrust frame is partly separate | Cannot independently verify bot volume, causality, or platform-specific failure mechanics |
| SRC-082 | Kolhar & Sridevi AI/ML cybersecurity | broad AI/ML cybersecurity and governance vocabulary for anomaly detection, SOC, UEBA, and human oversight | generic cybersecurity overview, not bot-specific and not empirical | Cannot support bot-specific claims, prevalence, or method effectiveness without stronger sources |
| SRC-083 | Kolhar & Gundoor API security | API abuse vocabulary: BOLA, broken auth, rate limits, business logic, API scraping, shadow APIs | secondary book-chapter overview; less authoritative than OWASP/NIST/PortSwigger and not observed abuse | Cannot quantify API abuse or validate specific API-security controls in production |
| SRC-084 | CAPTCHA-solving ecosystem | open commercial solver market and integration of CAPTCHA solving into automation and AI-agent workflows | vendor/tutorial/benchmark ecosystem; not independent prevalence or defender-side validation | Cannot show abuse volume, real-target success rates, or that a solver works against a given site |
| SRC-085 | Chen ReCAP | controlled evidence that GUI agents can be trained to solve modern interactive CAPTCHA variants | synthetic and benchmark-focused preprint; not live abuse or deployed solver telemetry | Cannot show operational CAPTCHA-bypass prevalence, target impact, or production reliability |
| SRC-086 | Bitsight OpenClaw exposure | internet-exposure measurement of AI-agent gateways and deployment-risk/blast-radius framing | stronger for exposure than abuse; vendor scan methodology and coverage require checking before using numbers | Cannot show confirmed abuse from each exposed agent or full internet-wide completeness |
| SRC-087 | Infatica P2B SDK | residential proxy supply-chain business model using SDK-enabled app users and idle bandwidth | vendor marketing; compliance and consent claims are not independently validated | Cannot prove opt-in quality, compliance, effectiveness, abuse prevalence, or end-use legitimacy |
| SRC-088 | Bright Data residential proxies | commercial explanation of residential/ISP proxy capabilities, rotation, sticky sessions, and limited-stock use cases | vendor market map; not independent measurement and not proof of effectiveness | Cannot establish provider quality, legal use, real-world success, or abuse prevalence |
| SRC-089 | Choi proxy ecosystem | independent empirical comparison of open and residential proxies and blacklist overlap | dataset vintage and prior residential dataset source limit how current the findings are; not a specific web-abuse campaign | Cannot show current proxy market structure, target-specific abuse, or detection performance in bot-management systems |
| SRC-090 | RFC 9113/9114 | current HTTP/2 and HTTP/3 protocol mechanics used as foundation for protocol-layer interpretation | protocol standards, not bot evidence or detection evidence | Cannot show abuse, fingerprint prevalence, or effectiveness of protocol-level detection |
| SRC-091 | Chromium multi-process architecture | browser architecture vocabulary for renderer/browser processes, sandboxing, IPC, and browser-native automation context | design reference, not bot/security telemetry | Cannot show browser-automation detection, abuse prevalence, or fingerprinting behaviour |
| SRC-092 | RFC 7540 | historical HTTP/2 protocol specification and original terminology | obsoleted by RFC 9113; of historical value only, not a basis for current HTTP/2 claims | Cannot support current HTTP/2 wording where RFC 9113 differs or supersedes it |
| SRC-093 | OWASP ASVS 5.0.0 | standards-backed application controls for anti-automation, business logic, auth, sessions, API validation, and logging | requirements standard, not threat taxonomy, telemetry, or modern bot-management method source | Cannot show prevalence, detection effectiveness, false-positive trade-offs, or AI/browser-agent trends |
| SRC-094 | CISA Microsoft cloud post-compromise | cloud identity compromise, forged token/OAuth/SAML abuse, and API-based persistence after compromise | historical SolarWinds/SVR-linked advisory; not bot-specific; archived; no raw telemetry or prevalence | Cannot show current 2026 cloud threat prevalence, bot-specific abuse, SaaS pricing, or detection efficacy |
| SRC-095 | Scattered Spider advisory | identity/helpdesk/social-engineering abuse using valid accounts, MFA manipulation, SSO, and legitimate remote tools | actor-specific; investigation-derived; not bot-specific; no neutral prevalence or raw victim logs | Cannot show bot automation prevalence, full campaign reconstruction, or general market-wide identity-abuse rates |
| SRC-096 | Censys bulletproof hosting / RDP | infrastructure measurement of abused hosting patterns, exposed RDP artifacts, and clustering signals | not bot-specific; intent uncertain for individual hosts; no full raw dataset; infrastructure evidence, not abuse outcome evidence | Cannot prove criminal intent for every host, quantify bot abuse, or show account/booking/scraping impacts |
| SRC-097 | Anderson cybercrime costs | security-economics framing separating criminal revenue, direct loss, indirect loss, defence cost, and social cost | 2019 synthesis; not bot-specific; not current automation pricing or 2026 telemetry | Cannot show current bot markets, current attacker costs, or effectiveness of specific controls |
| SRC-098 | IBM X-Force 2026 | current vendor threat-landscape framing around public-facing application exploitation, credential theft, supply chain/cloud dependencies, and AI-accelerated operations | raw IBM page/source rather than reviewed extraction; vendor perspective; broad cyber, not bot-specific | Cannot show bot-specific prevalence, detailed methodology from the supplied raw page, or independent validation |
| SRC-099 | Recorded Future cloud/SaaS abuse | cloud and SaaS functionality as adversarial infrastructure: valid accounts, APIs, CI/CD, storage, backups, LLM/ML services | threat-intelligence synthesis; many examples based on third-party reporting; not bot-specific; no raw telemetry for all cases | Cannot provide neutral prevalence, complete primary evidence, or scraping/proxy/CAPTCHA-specific evidence |
| SRC-100 | Kasada threat enablers | mature service economy around automated checkout, account markets, verification bypass, bots-as-a-service, and reselling communities | vendor telemetry and marketplace monitoring; source list/method not fully reproducible; product-sector lens | Cannot independently validate all figures, prove all account sales led to abuse, or isolate AI as causal |
| SRC-101 | Netwrix Mythos cost of attacking | strategic argument that AI lowers marginal attacker costs and compresses decision cycles | opinion/vendor commentary; speculative in places; not bot-specific and not empirical | Cannot verify Mythos claims, quantify attacker adoption, or support pricing/prevalence claims |
| SRC-102 | Commercial automation cost stack | modular purchasability and approximate cost of scraping APIs, proxies, CAPTCHA solving, managed scraping, and SMS verification | pricing/market snapshots; not observed abuse; effectiveness and terms vary; sources are vendor or vendor-adjacent | Cannot prove malicious use, success rates, legal compliance, or that any component works against a specific target |
Signals and techniques cross-index
The view the flat register lacked: which sources cover which technical material. Membership is migrated from the inventory’s signals / techniques column; treat as a starting index to be completed by the backfill pass.
| signal / technique family | sources |
|---|---|
| TLS / network fingerprints (JA3/JA4) | SRC-002, SRC-018, SRC-024, SRC-029, SRC-030, SRC-031, SRC-045, SRC-046, SRC-047, SRC-057, SRC-066, SRC-077, SRC-090, SRC-092, SRC-096, SRC-102 |
| Browser fingerprinting (JS / canvas / attributes / scripts) | SRC-016, SRC-017, SRC-023, SRC-024, SRC-030, SRC-031, SRC-033, SRC-034, SRC-039, SRC-040, SRC-045, SRC-046, SRC-047, SRC-055, SRC-057, SRC-063, SRC-066, SRC-067, SRC-075, SRC-077, SRC-079, SRC-091 |
| Behavioural — mouse / touch dynamics | SRC-011, SRC-013, SRC-019, SRC-020, SRC-030, SRC-033, SRC-034, SRC-039, SRC-045, SRC-046, SRC-047, SRC-049, SRC-055, SRC-068, SRC-084, SRC-085 |
| Behavioural — request/session timing and navigation patterns | SRC-027, SRC-030, SRC-031, SRC-032, SRC-033, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-051, SRC-055, SRC-057, SRC-068, SRC-073, SRC-077, SRC-080, SRC-081, SRC-083, SRC-084, SRC-088, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099, SRC-100 |
| ML detection methods (supervised/unsupervised, boosting, CNN) | SRC-006, SRC-011, SRC-018, SRC-019, SRC-033, SRC-047, SRC-057, SRC-067, SRC-082, SRC-083, SRC-085 |
| Adversarial evasion (GAN / RL) | SRC-011, SRC-013, SRC-014, SRC-020, SRC-047 |
| Fingerprint-inconsistency / evasion detection | SRC-015, SRC-024, SRC-030, SRC-031, SRC-033, SRC-034, SRC-039, SRC-040, SRC-045, SRC-046, SRC-047, SRC-056, SRC-057, SRC-077, SRC-079 |
| Browser automation & stealth layers | SRC-021, SRC-022, SRC-023, SRC-024, SRC-028, SRC-030, SRC-031, SRC-033, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-041, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-053, SRC-077, SRC-079, SRC-084, SRC-085, SRC-086, SRC-091 |
| HTTP headers / header order / protocol details | SRC-002, SRC-003, SRC-027, SRC-029, SRC-031, SRC-037, SRC-039, SRC-040, SRC-041, SRC-042, SRC-045, SRC-046, SRC-047, SRC-051, SRC-053, SRC-056, SRC-057, SRC-059, SRC-060, SRC-061, SRC-063, SRC-064, SRC-065, SRC-066, SRC-069, SRC-071, SRC-073, SRC-077, SRC-083, SRC-090, SRC-092, SRC-093, SRC-094, SRC-098, SRC-099 |
| Cookies / session persistence | SRC-002, SRC-028, SRC-029, SRC-031, SRC-034, SRC-039, SRC-041, SRC-045, SRC-047, SRC-055, SRC-061, SRC-062, SRC-065, SRC-066, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-075, SRC-077, SRC-083, SRC-091, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099 |
| Proxy / infrastructure / cloud browsers | SRC-024, SRC-025, SRC-026, SRC-029, SRC-030, SRC-031, SRC-032, SRC-033, SRC-034, SRC-036, SRC-039, SRC-040, SRC-041, SRC-043, SRC-044, SRC-045, SRC-046, SRC-047, SRC-052, SRC-053, SRC-061, SRC-065, SRC-086, SRC-087, SRC-088, SRC-089, SRC-096, SRC-099, SRC-100, SRC-102 |
| AI-agent signals & governance | SRC-008, SRC-009, SRC-010, SRC-026, SRC-032, SRC-033, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-047, SRC-049, SRC-053, SRC-054, SRC-058, SRC-082, SRC-084, SRC-085, SRC-086, SRC-098, SRC-099, SRC-100, SRC-101 |
| Intent / journey / score-based detection | SRC-002, SRC-003, SRC-004, SRC-009, SRC-010, SRC-027, SRC-030, SRC-031, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-047, SRC-049, SRC-057, SRC-058, SRC-079, SRC-080, SRC-081, SRC-093 |
| Challenge-response / CAPTCHA / proof-of-work | SRC-009, SRC-020, SRC-030, SRC-034, SRC-035, SRC-039, SRC-041, SRC-042, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-055, SRC-058, SRC-068, SRC-070, SRC-071, SRC-080, SRC-081, SRC-084, SRC-085, SRC-093, SRC-100, SRC-102 |
| Taxonomy / canonical threat vocabulary | SRC-001, SRC-027 |
| Countermeasure classes / symptoms | SRC-027, SRC-032, SRC-034, SRC-035, SRC-039, SRC-040, SRC-042, SRC-047, SRC-049, SRC-051, SRC-054, SRC-055, SRC-056, SRC-057, SRC-058, SRC-068, SRC-070, SRC-073, SRC-079, SRC-083, SRC-093 |
| API abuse / API endpoint visibility | SRC-039, SRC-040, SRC-041, SRC-053, SRC-069, SRC-083, SRC-093, SRC-094, SRC-098, SRC-099 |
| Credential / account-abuse signals | SRC-034, SRC-039, SRC-047, SRC-048, SRC-049, SRC-052, SRC-061, SRC-062, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-080, SRC-081, SRC-083, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099, SRC-100, SRC-102 |
| Scarce-resource / inventory-abuse signals | SRC-035, SRC-039, SRC-049, SRC-050, SRC-080, SRC-081, SRC-088, SRC-093, SRC-100, SRC-102 |
| Standards / canonical reference (standards / reference-doc) | SRC-059, SRC-060, SRC-061, SRC-062, SRC-063, SRC-064, SRC-065, SRC-066, SRC-073, SRC-090, SRC-091, SRC-092, SRC-093 |
| Scraper training / practice environments | SRC-041 |
| Commercial scraping / scraping-as-a-service | SRC-043, SRC-044, SRC-045, SRC-053, SRC-087, SRC-088, SRC-102 |
| Legal / governance boundary for scraping and bot detection | SRC-042, SRC-047, SRC-050, SRC-066, SRC-067, SRC-073, SRC-080, SRC-081, SRC-082, SRC-093, SRC-097 |
| Defensive tooling / WAF / blocklists | SRC-051, SRC-052, SRC-054, SRC-055, SRC-056, SRC-057, SRC-058, SRC-073, SRC-079, SRC-083, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099 |
| Honey accounts / honeypots / honeytokens | SRC-048 |
| Network-origin / IP / ASN reputation | SRC-045, SRC-047, SRC-052, SRC-087, SRC-088, SRC-089, SRC-096, SRC-100, SRC-102 |
| AI crawlers / content-access governance | SRC-032, SRC-039, SRC-054, SRC-058, SRC-086 |
| HTTP foundations / web basics | SRC-059, SRC-060, SRC-061, SRC-062, SRC-063, SRC-064, SRC-065, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-090, SRC-091, SRC-092, SRC-093 |
| CORS / browser security boundaries | SRC-059 |
| Caching / conditional requests | SRC-060 |
| Browser-fingerprinting surveys / privacy measurement | SRC-016, SRC-017, SRC-066, SRC-067, SRC-075, SRC-077 |
| Authentication / account-abuse foundations | SRC-061, SRC-062, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-083, SRC-093 |
| Survey/data-quality abuse and form-quality controls | SRC-068 |
| Client-side detection code protection / obfuscation | SRC-075, SRC-076, SRC-077, SRC-078, SRC-079 |
| Residential / peer-proxy ecosystem | SRC-087, SRC-088, SRC-089, SRC-100, SRC-102 |
| CAPTCHA-solving / solver ecosystem | SRC-084, SRC-085 |
| Legal / hearing evidence for ticket bots | SRC-050, SRC-080, SRC-081 |
| Browser architecture / browser-native automation foundations | SRC-075, SRC-077, SRC-091 |
| API security / business-logic controls | SRC-083, SRC-093, SRC-094, SRC-099 |
| Cloud / SaaS / identity abuse | SRC-094, SRC-095, SRC-098, SRC-099 |
| Threat infrastructure / bulletproof hosting / RDP | SRC-096, SRC-099 |
| Security economics / cost-of-abuse framing | SRC-097, SRC-100, SRC-101, SRC-102 |
| Commercial automation cost stack | SRC-100, SRC-102 |
| Government advisories / official TTP guidance | SRC-094, SRC-095 |
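Because the cross-index is a plain two-column pipe table, it can be inverted mechanically to answer the reverse question: which families does a given source sit under? The following is a minimal Python sketch, not project tooling. The function names `parse_cross_index` and `families_for` are illustrative assumptions, and the three rows are copied from the table above.

```python
from collections import defaultdict

# Three rows copied from the cross-index; a real run would read the whole table.
CROSS_INDEX = """\
| signal / technique family | sources |
|---|---|
| ML detection methods (supervised/unsupervised, boosting, CNN) | SRC-006, SRC-011, SRC-018, SRC-019, SRC-033, SRC-047, SRC-057, SRC-067, SRC-082, SRC-083, SRC-085 |
| Adversarial evasion (GAN / RL) | SRC-011, SRC-013, SRC-014, SRC-020, SRC-047 |
| CAPTCHA-solving / solver ecosystem | SRC-084, SRC-085 |
"""


def parse_cross_index(table: str) -> dict[str, list[str]]:
    """Map each family name to its list of source ids."""
    index: dict[str, list[str]] = {}
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[1].startswith("SRC-"):
            continue  # skips the header row, the separator row, and malformed rows
        index[cells[0]] = [s.strip() for s in cells[1].split(",")]
    return index


def families_for(index: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the index: source id -> families it appears under."""
    inverted: dict[str, list[str]] = defaultdict(list)
    for family, sources in index.items():
        for source in sources:
            inverted[source].append(family)
    return dict(inverted)
```

Calling `families_for(parse_cross_index(CROSS_INDEX))["SRC-011"]` lists both the ML detection and adversarial evasion families for that source. Rows that do not split into exactly two cells are skipped silently, so a malformed row shows up as a missing family rather than an error.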
Scarce-resource abuse index
Scarce-resource abuse is a cross-cutting tag family for sources about competition over a limited transactional resource. It is not a fifth category: sources still belong to foundations, vendor, academic, or threat-surface.
Rows are added only when a source concerns appointment, ticketing, reservation, product-drop, queueing, cancellation-monitoring, booking-flow, inventory-hoarding, or limited-inventory abuse. Otherwise these fields are not applicable.
| id | tags | scarce_resource_targeted | abuse_phase | website_facing_action | evidence_of_use | abuse_outcome |
|---|---|---|---|---|---|---|
| SRC-035 | scarce-resource-abuse; slot-sniping; limited-inventory; appointment-abuse; inventory-hoarding; booking-flow-abuse; availability-polling; cancellation-monitoring; fast-booking; auto-booking; slot-resale | appointment | monitoring / booking / holding / resale / cancellation exploitation | polling availability / completing booking / holding inventory / reselling | observed-use | ordinary users blocked / inventory unavailable / inflated resale price / degraded fairness / operational load |
| SRC-039 | scarce-resource-abuse; limited-inventory; inventory-hoarding; denial-of-inventory; scalping; booking-flow-abuse; availability-polling | booking / product / reservation | monitoring / booking / holding | polling availability / completing booking / holding inventory | vendor-measured | inventory unavailable / distorted metrics / operational load |
| SRC-049 | scarce-resource-abuse; ticketing-abuse; limited-inventory; scalping; queue-abuse; account-preparation; ticket-resale; booking-flow-abuse; availability-polling; fast-booking | ticket | account preparation / queue entry / monitoring / booking / resale | creating accounts / entering queue / polling availability / completing booking / reselling | capability-only | ordinary users blocked / inventory unavailable / inflated resale price / degraded fairness |
| SRC-050 | scarce-resource-abuse; ticketing-abuse; limited-inventory; inventory-hoarding; scalping; purchase-limit-circumvention; account-preparation; ticket-resale | ticket | account preparation / booking / holding / resale | automated search and reservation / completing booking / using accounts and payment identities / reselling | legal-record | inventory unavailable / inflated resale price / degraded fairness |
| SRC-100 | scarce-resource-abuse; limited-inventory; scalping; automated-checkout; account-preparation; reselling-communities; verification-bypass | product / ticket / booking | account preparation / monitoring / checkout / resale | using accounts / bypassing verification / completing checkout / reselling | vendor-measured / market-evidence | inventory unavailable / inflated resale price / degraded fairness |
| SRC-102 | scarce-resource-abuse; limited-inventory; indirect-cost-stack; CAPTCHA-solving; proxies; temporary-SMS; scraping-APIs | product / ticket / appointment / booking (indirect) | monitoring / account preparation / challenge solving / booking support | polling availability / solving challenge / account verification / completing booking support | market-evidence / capability-only | indirect capability support only; no specific abuse outcome evidenced |
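Each cell in this index packs several values: tags are separated by semicolons and the other fields by spaced slashes. A minimal Python sketch of splitting one row into a record follows. The separators are inferred from the rows above rather than from a documented rule, `parse_row` is a hypothetical helper, and parenthetical qualifiers such as "(indirect)" are left attached to the value they follow.

```python
# Column names taken from the table header above.
FIELDS = ["id", "tags", "scarce_resource_targeted", "abuse_phase",
          "website_facing_action", "evidence_of_use", "abuse_outcome"]

# The SRC-039 row, copied from the table.
ROW = ("| SRC-039 | scarce-resource-abuse; limited-inventory; inventory-hoarding; "
       "denial-of-inventory; scalping; booking-flow-abuse; availability-polling "
       "| booking / product / reservation | monitoring / booking / holding "
       "| polling availability / completing booking / holding inventory "
       "| vendor-measured | inventory unavailable / distorted metrics / operational load |")


def parse_row(row: str) -> dict[str, object]:
    """Split one index row into an id, a tag list, and multi-valued fields."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    record: dict[str, object] = {"id": cells[0]}
    # Assumption: tags are semicolon-separated, as in every row above.
    record["tags"] = [t.strip() for t in cells[1].split(";")]
    # Assumption: the remaining cells use ' / ' between values.
    for name, cell in zip(FIELDS[2:], cells[2:]):
        record[name] = [v.strip() for v in cell.split(" / ")]
    return record
```

Splitting on the spaced form `" / "` rather than a bare slash keeps a value such as `n/a` intact if it ever appears in a cell.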
Read and rejected
Recorded so they aren’t re-read (EVIDENCE-REVIEW.md §6). Both are title-collision retrieval artefacts — “Actions Speak Louder than Words” papers pulled in by string match, unrelated to bots.
| id | source | org / authors | year | reason rejected | entry file |
|---|---|---|---|---|---|
| SRC-R01 | Loyalty Program Building Blocks (Economics and Sociology) | Kwiatek et al. | 2018 | Marketing/consumer-perception study. No bot/abuse content. Out of scope; keep only if a loyalty-abuse adjacency is later added. | kwiatek-2018-actions-speak-louder-loyalty-program-building-blocks.md |
| SRC-R02 | Figurative Language & Gesturing in Entrepreneurial Pitches (AMJ) | Healey et al. | 2018 | Communication/persuasion study. No bot/abuse content. Out of scope; possible use only for a dissemination/communication note. | healey-2018-actions-speak-louder-than-words-entrepreneurial-pitches.md |
Queued
Not yet read or not yet extracted as a distinct source. This queue has been pruned so it no longer lists sources already represented by SRC-027, SRC-034, SRC-039, SRC-046, SRC-066, SRC-067, SRC-069–SRC-073, or SRC-079–SRC-093.
Highest-priority gaps
| source / area | category | why flagged |
|---|---|---|
| Remaining foundations primers: browser storage, DNS/TLS/CDN basics, IP/network identity | foundations | MDN HTTP/CORS/cookies/header basics are represented by SRC-059…SRC-065, PortSwigger authentication foundations by SRC-069…SRC-072, NIST/OWASP control standards by SRC-073 and SRC-093, and HTTP/2/3 RFCs by SRC-090/SRC-092; remaining need is DNS/TLS/CDN, browser storage, and IP/network identity foundations. |
| Independent in-the-wild bot, credential-stuffing, ad-fraud, fake-account, or scraping measurement studies | academic / threat-surface | Observed-use lane remains thinner than capability/vendor evidence; independent measurement is highest value. |
| Victim/operator engineering postmortems from platforms affected by scraping, credential stuffing, account creation, booking abuse, or crawler pressure | threat-surface | Balances vendor telemetry with first-party target/operator accounts. |
| Primary legal/enforcement records: BOTS Act complaints/orders, UK ticketing/consumer-law material, DVSA booking terms, regulator guidance where legal claims become load-bearing | legal-record / governance | Vendor legal explainers are not enough for legal claims. |
Useful but second-order
| source / area | category | why flagged |
|---|---|---|
| OpenAI agent / Operator documentation and safety material; Anthropic Claude computer-use / browser-use material; Browser Use and Skyvern docs | vendor / threat-surface | Needed to represent agent-builder framing rather than only defender-vendor framing. |
| Akamai, Imperva/Thales, F5 technical docs for bot management, WAF controls, API-security controls, and exposed rule/score/control-plane fields | vendor | Cloudflare control-plane docs are now represented by SRC-003, SRC-054…SRC-058; non-Cloudflare technical docs remain incomplete. |
| Anti-detect browsers and stealth tooling as distinct entries: Multilogin, GoLogin, Camoufox, Nodriver, SeleniumBase UC Mode | threat-surface | Currently mostly present through secondary mentions and catalogues, not their own source entries. |
| Browser-extension and userscript automation material: Tampermonkey/userscripts, browser add-ons used for page monitoring or form automation | threat-surface | Important for the “individual tool running inside the browser” / slot-sniping argument. |
| Public datasets for methodology investigations: ad-fraud, clickstream, login/session, credential-stuffing proxies, web-log, fraud graph datasets | methodology / academic | Needed to connect the written review to reproducible public-data investigations and to state framing distance clearly. |
| Additional Cloudflare Radar / AI crawler / bot traffic reports | vendor telemetry | Cloudflare product/capability docs are now represented; remaining Cloudflare gap is telemetry/prevalence. |
Recently resolved from the old queue
| old queued item | resolved by | note |
|---|---|---|
| OWASP Automated Threat Handbook full source | SRC-027 | Now extracted; still marked needs review because the extraction was provisional. |
| Imperva Bad Bot Report annual | SRC-039 | Covered via 2026 Thales / Imperva Bad Bot Report. |
| F5 Labs reports | SRC-034 | Credential-stuffing report extracted; further F5 technical docs can still be queued separately if needed. |
| niespodd/browser-fingerprinting GitHub catalogue | SRC-046 | Now extracted as its own threat-surface source. |
| Akamai financial-services report | SRC-040 | Vendor telemetry report extracted; technical docs remain queued separately. |
| Ticket-bot / scarce-resource enforcement example | SRC-050 | FTC BOTS Act source added; more primary legal records can still be useful. |
| MDN HTTP/CORS/cookies/header foundations | SRC-059…SRC-065 | Core MDN foundations now extracted; remaining foundations should focus on DNS/TLS/CDN, browser storage, and IP/network identity. |
| Cloudflare bot product/control-plane docs | SRC-003, SRC-054…SRC-058 | Capability layer now better represented; Cloudflare telemetry/Radar remains separate if needed. |
| Browser-fingerprinting survey / empirical update | SRC-066, SRC-067 | Laperdrix survey and Berke demographic-fingerprinting paper added; remaining fingerprinting work should be targeted rather than generic survey collection. |
| Web-bot detection privacy / methods review | SRC-047 | Re-extraction attached to existing Martínez Llamas row; use as methods/privacy/governance anchor, not observed-use evidence. |
| PortSwigger authentication foundations | SRC-069…SRC-072 | Worked authentication/OAuth/password-login sources added; remaining foundations should focus on DNS/TLS/CDN, browser storage, and IP/network identity. |
| NIST authentication/session standard | SRC-073 | SP 800-63B-4 added as authentication and session-management control foundation. |
| OWASP ASVS application-security controls | SRC-093 | ASVS 5.0 added as defensive-control foundation for anti-automation, business logic, auth, sessions, API validation, and logging. |
| HTTP/2 and HTTP/3 protocol standards | SRC-090, SRC-092 | RFC 9113/9114 added for current protocol claims; RFC 7540 kept as historical/obsolete HTTP/2 source. |
| Chromium browser architecture | SRC-091 | Multi-process architecture added for browser-native automation and browser architecture foundations. |
| Proxy ecosystem and residential proxy supply | SRC-087, SRC-088, SRC-089 | Commercial peer/residential proxy sources and independent proxy-ecosystem measurement added. |
| VM obfuscation and client-side bot-detection code protection | SRC-076, SRC-078, SRC-079 | General obfuscation taxonomy, deobfuscation counterweight, and DataDome VM-obfuscation source added. |
| Ticketmaster legal/hearing evidence | SRC-080, SRC-081 | Earlier Ticketmaster/Prestige legal case and 2023 Senate/Taylor Swift hearing source added. |
| CAPTCHA-solving ecosystem and GUI-agent CAPTCHA capability | SRC-084, SRC-085 | Commercial solver ecosystem and ReCAP GUI-agent CAPTCHA-solving paper added. |
| OpenClaw exposure measurement | SRC-086 | Bitsight exposure source added as complement to HUMAN OpenClaw traffic-abuse source. |
| API-security/business-logic overview | SRC-083 | Secondary API-security chapter added; OWASP/NIST/PortSwigger remain stronger control references. |
| Cloud/SaaS abuse and official identity-abuse advisories | SRC-094, SRC-095, SRC-098, SRC-099 | CISA Microsoft cloud, Scattered Spider, IBM X-Force, and Recorded Future entries added; still avoid treating these as bot-specific. |
| Bulletproof hosting / adversarial infrastructure measurement | SRC-096 | Censys RDP/BPH infrastructure source added. |
| Cybercrime economics and automation cost framing | SRC-097, SRC-101, SRC-102 | Anderson gives non-vendor economics framework; Summers is low-priority opinion; commercial cost stack gives availability/pricing context. |
| SaaSification of automated abuse infrastructure | SRC-100, SRC-102 | Kasada Q1 2026 and commercial automation cost stack added; cite as vendor/market evidence, not independent prevalence. |
Appendices
Register taxonomy
category — foundations / vendor / academic / threat-surface (EVIDENCE-REVIEW §2).
evidence basis — what kind of evidence the source actually is. This is the column that prevents a marketing claim being treated as equivalent to a study.
- empirical-academic — controlled study or dataset experiment in a research setting.
- empirical-operational — measurement against real or purchased traffic / a live honey site.
- survey — practitioner/executive survey; self-reported.
- vendor-claim — vendor marketing / efficacy / prevalence claims; vendor-measured, not independently verifiable.
- capability-doc — product or platform documentation describing what a system exposes or can do.
- tooling-readme — open-source tool README / docs; maintainer claims, not independent tests.
- bypass-guide — scraper-side or evasion-side guidance describing ways to avoid blocking or align detection surfaces. High dual-use; cite only at technique-family level, not as a recipe.
- methods-taxonomy — a categorisation of methods with no reproducible detail.
- taxonomy — a canonical categorisation of the field (e.g. OWASP OAT).
- threat-intel — vendor observation/threat reports.
- legal-record — court filings, indictments, or enforcement actions. Used for technique and operational-proximity evidence only, with actor/campaign attribution stripped (EVIDENCE-REVIEW.md §3).
- legal-explainer — non-authoritative legal/compliance explainer or guidance source. Treat as context only; load-bearing legal claims require primary law, regulator guidance, specialist legal analysis, or legal records.
- primary-law / regulator-guidance — primary legal text or regulator guidance used only under the regulatory-constraint lane. Treat as jurisdiction-bound, time-varying, non-load-bearing, and not legal advice.
- reference-doc — neutral technical reference documentation used for foundations/protocol concepts. Not evidence of abuse, prevalence, or defensive performance.
- standards-reference / protocol-standard — normative standards or specifications used for control/protocol foundations. Not evidence of abuse or control effectiveness.
- control-requirements / authentication-guidance / defensive-guidance — defensive requirements or implementation guidance. Useful for control vocabulary; not proof that controls work in a given deployment.
- browser-architecture reference / architecture analysis — architecture explanations used to understand browser/runtime/client-server boundaries. Not telemetry or detection evidence.
- empirical-method demonstration — method evaluation in a bounded research setting. Rigour may be high, but operational proximity remains limited unless it measures real-world abuse.
- empirical-exposure measurement / infrastructure-analysis — measurement of exposed services or infrastructure such as proxies, not necessarily measurement of confirmed abuse.
- vendor product announcement / tutorial ecosystem — vendor or adjacent ecosystem material documenting available capabilities or workflow culture; treat as capability evidence unless independently measured.
- review — literature review or survey source synthesising prior work. Use for academic/foundation surveys where the source is not itself measuring live abuse.
- empirical-measurement — original measurement or dataset paper that measures a relevant phenomenon but not necessarily bot abuse directly.
- dataset paper — source whose primary contribution is a dataset or dataset-linked measurement.
- empirical-methods — empirical methods paper or case study evaluating controls/processes in a bounded setting.
- review-informed case study — case study grounded in prior literature but not designed as a controlled comparative experiment.
- control-guidance — defensive implementation guidance or checklist. Useful for controls vocabulary, not proof of control efficacy.
- vulnerability-taxonomy — educational or reference taxonomy of weakness classes; not prevalence evidence.
operational proximity — how close the source sits to observed abuse against a real target. Orthogonal to evidence basis (which records source type/rigour); the two are tracked separately so a rigorous lab result and a vendor blog are not flattened onto one axis. Where a source mixes levels (e.g. a vendor report carrying both capability claims and production telemetry), the cell records the highest level the source independently supports, with a parenthetical caveat.
- capability — establishes only that a tool, technique, or capability exists or is feasible. Includes documentation, tool READMEs, and controlled academic PoCs — a lab demonstration that an evasion works is still not observation of real-target abuse; its rigour lives in evidence basis, not here.
- claimed — an interested party asserts the capability is used or works against targets, without independent observation (vendor “we stop X” claims, bypass-vendor “works against Y” claims, self-report surveys).
- observed — the activity has been seen against a real or realistic target, but not cleanly or independently quantified (vendor telemetry reports — vendor-measured; victim engineering postmortems; enforcement/legal records describing technique).
- measured — controlled or operational measurement quantifying the activity against a real or realistic target (honey-site experiments; in-the-wild measurement studies).
- n/a — the source is a taxonomy or a non-bot-use foundation; the axis does not apply.
- control — the source is control guidance or a requirements standard. It can say what should be built or verified, but not whether the control works against a specific live threat.
- measured-but-bounded — a controlled benchmark or method evaluation with quantitative results, but outside the project’s core live web-abuse setting.
- observed-claim / observed-exposure — public testimony, legal/hearing claims, or exposure scans. Useful as operational-proximity evidence, but weaker than independent measurement of confirmed abuse.
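The rule that a mixed source records the highest level it independently supports presumes an ordering of levels. A minimal Python sketch of that rule follows, assuming the four core levels rank in the order they are listed (capability lowest, measured highest). No rank is stated for n/a, control, measured-but-bounded, observed-claim, or observed-exposure, so the sketch refuses to rank them rather than guess. `highest_supported` is a hypothetical helper name.

```python
# Assumed ordering of the four core levels, lowest to highest.
CORE_LEVELS = ["capability", "claimed", "observed", "measured"]
RANK = {level: i for i, level in enumerate(CORE_LEVELS)}


def highest_supported(levels: list[str]) -> str:
    """Return the highest core proximity level among those a source supports."""
    if not levels:
        raise ValueError("at least one level is required")
    unranked = [lv for lv in levels if lv not in RANK]
    if unranked:
        # Values outside the four core levels have no defined rank here.
        raise ValueError(f"no defined rank for: {', '.join(unranked)}")
    return max(levels, key=RANK.__getitem__)
```

For a vendor report carrying both capability claims and production telemetry, `highest_supported(["capability", "observed"])` returns `"observed"`; the parenthetical caveat the register asks for still has to be written by hand.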
Migrated and pre-v3 rows carry provisional proximity values assigned from the one-line summaries; they inherit the row’s standing review state and are part of the same entry-file backfill, not yet reviewed.
Scarce-resource abuse tags — a cross-cutting tag family, not a top-level category. Apply scarce-resource-abuse as the umbrella tag when a source concerns scarce transactional resources, and add the more specific tags supported by the source:
- slot-sniping
- limited-inventory
- appointment-abuse
- reservation-abuse
- ticketing-abuse
- inventory-hoarding
- denial-of-inventory
- scalping
- queue-abuse
- booking-flow-abuse
- availability-polling
- cancellation-monitoring
- fast-booking
- auto-booking
- slot-resale
- ticket-resale
- reservation-resale
- booking-transfer
- account-preparation
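A row's tags can be compared against this list mechanically. The sketch below is a minimal Python illustration that separates listed tags from unlisted ones. Whether the list is closed is not stated here, so the helper (the hypothetical `split_tags`) only reports unlisted tags and does not reject them. The example input is the SRC-050 tag cell from the scarce-resource abuse index.

```python
# The umbrella tag plus the specific tags listed above.
VOCABULARY = {
    "scarce-resource-abuse",
    "slot-sniping", "limited-inventory", "appointment-abuse",
    "reservation-abuse", "ticketing-abuse", "inventory-hoarding",
    "denial-of-inventory", "scalping", "queue-abuse", "booking-flow-abuse",
    "availability-polling", "cancellation-monitoring", "fast-booking",
    "auto-booking", "slot-resale", "ticket-resale", "reservation-resale",
    "booking-transfer", "account-preparation",
}

# Tags recorded for SRC-050 in the scarce-resource abuse index.
SRC_050_TAGS = [
    "scarce-resource-abuse", "ticketing-abuse", "limited-inventory",
    "inventory-hoarding", "scalping", "purchase-limit-circumvention",
    "account-preparation", "ticket-resale",
]


def split_tags(tags: list[str]) -> tuple[list[str], list[str]]:
    """Return (listed, unlisted) tags, preserving the input order."""
    listed = [t for t in tags if t in VOCABULARY]
    unlisted = [t for t in tags if t not in VOCABULARY]
    return listed, unlisted
```

Here `split_tags(SRC_050_TAGS)` reports `purchase-limit-circumvention` as unlisted. That is a prompt to decide whether the tag should join the list or the row should be retagged, not evidence that the row is wrong.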
Scarce-resource abuse fields — conditional fields for sources tagged scarce-resource-abuse. They apply only when a source concerns appointment, ticketing, reservation, product-drop, queueing, cancellation-monitoring, booking-flow, inventory-hoarding, or limited-inventory abuse. Availability polling may be scraping-like, but the abuse pattern is competition for a scarce transactional resource, so do not collapse it into generic scraping.
- scarce_resource_targeted — appointment / ticket / reservation / product / booking / queue position / other.
- abuse_phase — monitoring / account preparation / queue entry / booking / holding / transfer / resale / no-show / cancellation exploitation.
- website_facing_action — polling availability / entering queue / solving challenge / completing booking / holding inventory / changing booking / transferring booking / reselling.
- evidence_of_use — measured-use / observed-use / vendor-measured / legal-record / regulatory-record / market-evidence / capability-only / controlled-PoC. This is the scarce-resource-specific use classification; it does not replace operational proximity, which remains the broader corpus-level capability-to-use axis.
- abuse_outcome — ordinary users blocked / inventory unavailable / inflated resale price / no-show / degraded fairness / distorted metrics / operational load.
Regulatory-constraint tag and fields — conditional vocabulary for sources admitted under EVIDENCE-REVIEW.md §2.6. This is a cross-cutting lane, not a category. Apply regulatory-constraint only when the source is being read for how a rule constrains a technique family.
- jurisdiction — UK / EU / US / Canada / Australia / other.
- currency — free text; must include an as-of date and the caveat “subject to change; not verified current”. Honest token if unknown: “as-of unknown — verify before use”.
- constrains_technique — the technique family the rule bears on, preferably matching an existing signals-and-techniques cross-index row such as browser fingerprinting, cookies/session persistence, behavioural signals, scraping/access control, or AI crawlers/content-access governance.
- operational proximity — always n/a for regulatory-constraint entries. These sources explain technique-deployment constraints; they are not evidence of abuse prevalence, bot behaviour, or control effectiveness.
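The two requirements on the currency field can be checked mechanically. A minimal Python sketch follows, under two assumptions that are not fixed by this register: the as-of date is written ISO-style after the words "as-of", and the honest token on its own is an acceptable value when the date is unknown. `currency_ok` is a hypothetical helper name.

```python
import re

CAVEAT = "subject to change; not verified current"
# The register's literal honest token; \u2014 is the long dash it contains.
UNKNOWN_TOKEN = "as-of unknown \u2014 verify before use"
# Assumption: dates are written ISO-style, e.g. "as-of 2026-06-06".
AS_OF_DATE = re.compile(r"\bas-of \d{4}-\d{2}-\d{2}\b")


def currency_ok(text: str) -> bool:
    """True if the cell has a dated as-of plus the caveat, or the honest token."""
    if UNKNOWN_TOKEN in text:
        return True
    return bool(AS_OF_DATE.search(text)) and CAVEAT in text
```

A cell such as `as-of 2026-06-06; subject to change; not verified current` passes, while a bare date without the caveat does not. The check validates form only; it cannot tell whether the rule is in fact still current.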
provenance — extraction agent and model, from the entry’s run-metadata block (e.g. Claude Code / Opus 4.8). The cell reads not recorded for rows migrated before the v2 prompt. Where a source has several extraction files (different prompt versions or agents), provenance lists each.
reconciliation — whether a source’s row points at one extraction or several. Tagged in the entry file cell.
- [single] — one extraction file.
- [multiple — unreconciled] — two or more extraction files of the same source (e.g. <slug>.md + <slug>.v2.md, or two agents) not yet reconciled. canonical for citation = latest version.
- [combined] — a <slug>.combined.md reconciliation exists. canonical for citation = the combined file; the source extractions are kept and listed.
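The three reconciliation states imply a simple rule for which file to cite. A minimal Python sketch follows, assuming file names use the `<slug>.md`, `<slug>.vN.md`, and `<slug>.combined.md` patterns. Treating an unsuffixed file as version 1 and comparing N numerically are assumptions, and the case of two agents extracting the same source without version suffixes is not handled. `canonical_for_citation` and the `example-source` slug are hypothetical.

```python
import re

# Matches a version suffix such as ".v2.md" at the end of a file name.
VERSION = re.compile(r"\.v(\d+)\.md$")


def version_of(name: str) -> int:
    """Version number from a '.vN.md' suffix; files without one count as 1."""
    match = VERSION.search(name)
    return int(match.group(1)) if match else 1


def canonical_for_citation(files: list[str]) -> str:
    """Pick the file to cite: the combined file if present, else the latest version."""
    if not files:
        raise ValueError("a row must point at at least one extraction file")
    combined = [f for f in files if f.endswith(".combined.md")]
    if combined:
        return combined[0]
    return max(files, key=version_of)
```

For example, `canonical_for_citation(["example-source.md", "example-source.v2.md"])` returns the `.v2.md` file, and adding `example-source.combined.md` to the list makes that file the result instead.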
review state
- solid — extraction reviewed; sufficient for register use and citation.
- conditional — usable for cautious register reference, but check the entry before quoting numbers, equations, or specific claims.
- needs review — do not use without reading the entry / source.
- migrated — review pending — carried over from the flat register; not yet reviewed against its entry file.
threat types — OWASP OAT categories where they map, else project vocabulary (scraping, credential stuffing, scalping, account takeover, click/ad fraud, carding). Not threat-specific for method/infrastructure-only sources.
Update log
Append-only. New entries at the bottom.
2026-06-02 — Register schema v2; migrated from working/reading-register.md. Replaced the flat four-column register (Reference / Status / Entry / Notes) with a structured projection of the extraction fields. Added: evidence basis, provenance, review state, and threat types columns to the inventory; a framing-distance ledger; a signals-and-techniques cross-index; and this controlled-vocabulary appendix. 26 in-scope sources (SRC-001…SRC-026) and 2 rejected (SRC-R01, SRC-R02) migrated. Provenance is not recorded for all migrated rows because the prior register did not track extraction agent/model; the v2 extraction prompt records it going forward. Framing-distance what it fails to represent, threat types, and several evidence basis/signals cells are stubbed tbd — backfill from entry pending a read of working/register-entries/.
2026-06-05 — Added SRC-027…SRC-031 from reviewed extraction entries. Added OWASP Automated Threat Handbook v1.3 (SRC-027, provenance not recorded, provisional draft), Medium Playwright cookies tutorial (SRC-028, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), RoundProxies Rnet tutorial (SRC-029, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), ScrapFly Cloudflare Turnstile post (SRC-030, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), and ScrapFly Imperva/Incapsula post (SRC-031, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2). Added corresponding framing-distance rows and cross-index memberships; removed the now-stale queued OWASP Handbook row after representing it as SRC-027. Review state for all five is needs review; SRC-027 specifically requires review because its entry was produced without access to the repo scope docs.
2026-06-06 — Schema v3: added operational proximity axis; legal-record evidence basis. Formalised the capability-vs-use distinction — previously implicit in framing-distance prose and the threat-surface table note — into a queryable ordinal (capability / claimed / observed / measured / n/a), orthogonal to evidence basis, positioned after it in every inventory table. Added legal-record as an evidence basis for enforcement/court sources, admitted under a strict technique-not-attribution rule and a dual-use no-recipe rule (EVIDENCE-REVIEW.md §3; editorial enforcement in GOVERNANCE.md §4/§7). The source-extraction-prompt (v3) now emits the proximity field; register-update-prompt projects it. Proximity values across SRC-001…031 were assigned provisionally from the one-line summaries and inherit each row’s review state. First pass: the corpus concentrates at capability / claimed; observed is vendor-measured telemetry only (SRC-004, SRC-009, SRC-010); measured is essentially SRC-015 (honey-site) plus SRC-018 (weak labels). The five sources added 2026-06-05 sit at capability (SRC-027 taxonomy → n/a, SRC-028, SRC-029) and claimed (SRC-030, SRC-031 — the named-defender bypass writeups). In short the register evidences capability and market existence far more strongly than real-world prevalence — closing that is the new observed-use reading lane (Queued; EVIDENCE-REVIEW.md §2.5, §4).
2026-06-06 — Schema v3 extension: added scarce-resource abuse tags and conditional fields. Added scarce-resource-abuse as a cross-cutting umbrella tag, the specific tag vocabulary for appointment/ticketing/reservation/product-drop/queueing/booking/inventory abuse, and a conditional scarce-resource abuse index carrying scarce_resource_targeted, abuse_phase, website_facing_action, evidence_of_use, and abuse_outcome. This is schema support only; no source rows were added or reclassified.
2026-06-06 — Added SRC-032…SRC-040 from reviewed v3 extraction entries. Added Wikimedia crawler infrastructure account (SRC-032, threat-surface, Claude / Claude Opus 4.8 / source-extraction-prompt v3), Wang et al. FP-Agent (SRC-033, academic, Claude / Claude Opus 4.8 / source-extraction-prompt v3), F5 Labs credential-stuffing report (SRC-034, vendor, Claude / Claude Opus 4.8 / source-extraction-prompt v3), DVSA driving-test bot/resale post (SRC-035, threat-surface, Codex / GPT-5 / source-extraction-prompt v3), HUMAN OpenClaw (SRC-036), HUMAN Agentic Visibility (SRC-037), HUMAN State of Agentic Traffic May 2026 (SRC-038), Thales / Imperva 2026 Bad Bot Report (SRC-039), and Akamai financial-services security trends (SRC-040) (the HUMAN/Thales/Akamai entries from ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3). All are new distinct sources and [single] extractions. Added framing-distance rows, updated cross-index memberships, introduced API abuse / API endpoint visibility, Credential / account-abuse signals, and Scarce-resource / inventory-abuse signals, and added scarce-resource rows for SRC-035 plus SRC-039. Flags: SRC-032 uses threat-surface as a least-bad category for first-party operator evidence; SRC-039 has scarce-resource coverage but no dedicated scarce-resource block in the entry, so review the projected scarce-resource fields before relying on them; several supplied filenames retain (1) suffixes from the uploaded extraction files.
2026-06-06 — Added SRC-041…SRC-053 from reviewed v3 extraction entries. Added ScrapingBee scraper test sites (SRC-041, foundations), ScrapingBee legal guidelines (SRC-042, threat-surface after normalising the entry’s non-schema governance category), ScrapingBee advanced scraping techniques (SRC-043), ScrapingBee price-scraping tools (SRC-044), ScrapingBee PerimeterX/HUMAN bypass guide (SRC-045), niespodd browser-fingerprinting / anti-detection README (SRC-046), Martínez Llamas et al. GDPR/AI Act bot-detection review (SRC-047), Wardle honey-identity leaked-credential experiment (SRC-048), DataDome ticket-bot explainer (SRC-049), FTC first BOTS Act enforcement cases (SRC-050), StopBadBots SBB-WAF-Rules (SRC-051), Hamachek bad-ASN-list (SRC-052), and ScrapingBee web-scraping API product page (SRC-053). Added corresponding framing-distance rows, cross-index memberships, and scarce-resource rows for SRC-049 plus SRC-050. Schema note: added legal-explainer as an evidence-basis token for non-authoritative legal/compliance explainers. Normalisation flags: SRC-041 proximity was entered as low and has been normalised to capability (training / sandbox); SRC-042 proximity was entered as context and has been normalised to n/a (legal context; not use evidence); SRC-052 remains observed but explicitly at the anecdotal/first-party floor. Several entries are high dual-use scraper-side sources and should be cited only at technique-family level, not as operational recipes.
2026-06-06 — Added SRC-054…SRC-065 from reviewed v3 extraction entries; updated SRC-003 as a re-extraction. Added Cloudflare Block AI Bots (SRC-054), Cloudflare Turnstile (SRC-055), Cloudflare Detection IDs (SRC-056), Cloudflare bot detection engines (SRC-057), Cloudflare bot solutions overview (SRC-058), MDN CORS (SRC-059), MDN HTTP caching (SRC-060), MDN HTTP authentication (SRC-061), MDN HTTP cookies (SRC-062), MDN User-Agent header (SRC-063), MDN HTTP headers (SRC-064), and MDN Overview of HTTP (SRC-065). Treated the new Cloudflare Bot Management entry as a re-extraction of existing SRC-003 rather than a new source row: the existing migrated row now lists both the legacy extraction file and the v3 extraction file as [multiple — unreconciled], with the v3 file as canonical pending reconciliation. Added framing-distance rows, cross-index memberships, and two new cross-index families (AI crawlers / content-access governance; HTTP foundations / web basics). Added reference-doc as an evidence-basis token for neutral technical foundation material. Normalisation flag: all MDN entries supplied foundational as proximity; normalised to n/a (foundational reference) because operational proximity is not applicable to non-abuse protocol references.
2026-06-06 — Added SRC-066…SRC-072 from reviewed v3 extraction entries; updated SRC-047 as a re-extraction. Added Laperdrix et al. browser-fingerprinting survey (SRC-066), Berke et al. browser-fingerprinting demographics/dataset paper (SRC-067), Sudbury & Marks online-survey bots / bad-data case study (SRC-068), and four PortSwigger Web Security Academy authentication/OAuth/password-login foundation entries (SRC-069…SRC-072). Attached martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act-review(1).md as a re-extraction of existing SRC-047 rather than creating a duplicate row. Normalisation flags: foundational in the Laperdrix entry is represented as n/a (foundational survey); PortSwigger educational/security guidance is placed under Foundations rather than Vendor because it is used as worked reference material, not vendor evidence. Updated framing ledger, cross-index, and resolved queue notes.
2026-06-07 — Added SRC-073…SRC-093 from reviewed extraction entries. Added NIST SP 800-63B-4 authentication/authenticator-management standard (SRC-073), Sajid et al. hooking-based deception preprint (SRC-074), Cao browser-principal dissertation (SRC-075), Xu et al. layered obfuscation taxonomy (SRC-076), Tschacher bot-detection architecture explainer (SRC-077), Sudhir et al. Pushan deobfuscation preprint (SRC-078), DataDome VM-based obfuscation announcement (SRC-079), Ticketmaster v. Prestige legal/settlement source family (SRC-080), U.S. Senate Ticketmaster/Taylor Swift hearing source family (SRC-081), Kolhar & Sridevi AI/ML cybersecurity background (SRC-082), Kolhar & Gundoor API security chapter (SRC-083), commercial CAPTCHA-solving API ecosystem (SRC-084), Chen et al. ReCAP GUI-agent CAPTCHA paper (SRC-085), Bitsight OpenClaw exposure measurement (SRC-086), Infatica P2B SDK residential proxy source (SRC-087), Bright Data residential proxy market source (SRC-088), Choi et al. proxy-ecosystem measurement (SRC-089), RFC 9113/9114 HTTP/2 and HTTP/3 protocol foundations (SRC-090), Chromium multi-process architecture (SRC-091), RFC 7540 historical HTTP/2 standard (SRC-092), and OWASP ASVS 5.0 (SRC-093). Added corresponding framing-distance rows, cross-index memberships, new cross-index families for client-side obfuscation, residential/peer proxies, CAPTCHA-solving, ticket-bot legal/hearing evidence, browser architecture, and API/business-logic controls, plus scarce-resource rows for SRC-080, SRC-081, SRC-088, and SRC-093. Normalisation flags: RFC 7540 is retained as historical/obsolete because RFC 9113 supersedes it for current HTTP/2 claims; broad AI/ML cybersecurity background (SRC-082) is low-priority and should not carry bot-specific claims; the two Ticketmaster rows are legal/hearing evidence and should be cited as allegations/testimony rather than independent technical measurement. 
2026-06-07 — Added SRC-094…SRC-102 from the cloud/SaaS, adversarial-infrastructure, cost-economics, and commercial automation batch. Added CISA Microsoft cloud post-compromise advisory (SRC-094), FBI/CISA Scattered Spider advisory (SRC-095), Censys bulletproof-hosting/RDP infrastructure measurement (SRC-096), Anderson et al. cybercrime-cost framework (SRC-097, duplicate upload deduped), IBM X-Force Threat Intelligence Index 2026 (SRC-098, raw TXT/HTML source family; extraction still needed), Recorded Future cloud/SaaS abuse landscape (SRC-099), Kasada Q1 2026 threat-enablers report (SRC-100), Summers/Netwrix AI attacker-cost commentary (SRC-101), and the commercial automation cost-stack cluster (SRC-102). Added framing-distance rows, cross-index updates, new cross-index families for cloud/SaaS identity abuse, threat infrastructure, official advisories, and security/cost economics, plus scarce-resource rows for SRC-100 and SRC-102. Normalisation flags: EVIDENCE-REVIEW(6).md was treated as a scope document rather than a source row; Anderson (1)/(2) are identical and were deduped; IBM TXT/HTML files were represented as one provisional raw-source row rather than a reviewed extraction entry.
2026-06-12 — Schema v3 extension: added regulatory-constraint lane vocabulary. Added regulatory-constraint as a cross-cutting tag for sources read only as constraints on technique deployment, plus conditional fields jurisdiction, currency, and constrains_technique. Added primary-law / regulator-guidance evidence-basis tokens for primary legal text and regulator guidance under this lane. Regulatory-constraint entries are always operational proximity: n/a, jurisdiction-bound, time-varying, non-load-bearing, and not legal advice.