Evidence Register

Structured, maintainer-facing register tracing extracted sources into openbotrisk’s site pages, signals taxonomy, and reading decisions.
Note

This is the project’s bibliographic memory: one row per source read, structured so that extracted evidence can be traced into site pages, the signals/techniques taxonomy, and reading decisions. It is not the narrative evidence review — that lives in the Foundations, Background, and Technical territory sections. Per GOVERNANCE.md §6 and EVIDENCE-REVIEW.md §6 the register is public, including sources read and rejected.

Purpose and scope

This register is the structured projection of the per-source extraction entries in working/register-entries/. It exists so that the analytical fields the extraction prompt produces — evidence basis, signals / techniques, threat types, framing distance, what it cannot show — are queryable across the whole corpus rather than buried in free text.

  • It tracks every source read, with provenance and review state.
  • It records what kind of evidence each source actually is (evidence basis), so a vendor marketing claim is never silently treated as equivalent to a controlled study.
  • It cross-indexes sources by signal/technique family, so a reader can find which sources cover JA3/JA4, mouse dynamics, residential proxies, GAN/RL evasion, etc.
  • It tracks scarce-resource abuse as a cross-cutting tag family where relevant, not as a separate evidence category.
  • It carries the framing-distance ledger — the project’s central analytical discipline (EVIDENCE-REVIEW.md §5).
  • It records sources read and rejected, so they aren’t re-read.
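The cross-indexing described in the list above can be pictured as an inverted index from signal/technique family to source ids. The following is a minimal sketch, not the project's tooling: the function name and the input shape (pairs of id and family list) are assumptions, and the three sample rows are reduced to one family each from their inventory rows.

```python
from collections import defaultdict


def build_cross_index(rows):
    """Map each signal/technique family to the sorted ids of sources covering it.

    `rows` is an iterable of (source_id, families) pairs; this shape is a
    hypothetical simplification of a register row.
    """
    index = defaultdict(list)
    for source_id, families in rows:
        for family in families:
            index[family].append(source_id)
    return {family: sorted(ids) for family, ids in sorted(index.items())}


# Sample rows reduced to a single family each, for illustration only.
sample_rows = [
    ("SRC-088", ["residential proxies"]),
    ("SRC-002", ["JA3/JA4"]),
    ("SRC-087", ["residential proxies"]),
]

for family, ids in build_cross_index(sample_rows).items():
    print(family, "->", ", ".join(ids))
```

Keeping the index as a derived view, rebuilt from the rows, matches the rule that the entry files remain the source of truth.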

The source of truth for any individual source is its entry file under working/register-entries/. This page does not re-read source material and does not infer beyond the entries.

How to update this register

Append-only by default.

  • One register id (SRC-NNN) per distinct source. Distinct sources from the same vendor stay as separate rows — they are never folded together. Selecting between overlapping sources is the page-writing step’s job, not the register’s.
  • Add one inventory row per new distinct source, in the relevant category sub-table.
  • Add the source to the framing-distance ledger and to any signals/techniques cross-index rows it belongs to.
  • If the source concerns scarce-resource abuse, add it to the scarce-resource abuse index and carry the conditional fields from the extraction entry.
  • Set evidence basis, provenance (agent + model), and review state from the entry’s run-metadata block — do not leave provenance blank.
  • Record sources read and rejected in the rejected table with the reason. Rejected entries are kept, not deleted — the row is the “don’t re-read” record.
  • Do not rewrite an existing judgement silently. If a judgement changes, add a dated note in the update log explaining why and preserve the prior context.
  • Keep current relevance separate from future relevance in the entry; the register surfaces current project impact only.
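Some of the rules above are mechanical enough to check before a row is appended. The sketch below is illustrative only: the function name, the dict-per-row shape, and the assumption that ids are exactly "SRC-" plus three digits are not taken from prompts/register-update-prompt.md.

```python
import re

# Assumption: ids are "SRC-" followed by exactly three digits, as in the inventory rows.
SRC_ID = re.compile(r"^SRC-\d{3}$")

# Fields the update rules say must be set from the entry's run-metadata block.
REQUIRED = ("evidence basis", "provenance", "review state")


def check_new_row(row: dict[str, str], existing_ids: set[str]) -> list[str]:
    """Return a list of problems with a proposed new row (empty list means OK)."""
    problems = []
    row_id = row.get("id", "")
    if not SRC_ID.match(row_id):
        problems.append(f"id {row_id!r} does not look like SRC-NNN")
    elif row_id in existing_ids:
        # One id per distinct source; re-extractions attach to the existing row.
        problems.append(f"{row_id} already exists; attach re-extractions to that row")
    for field in REQUIRED:
        if not row.get(field, "").strip():
            problems.append(f"{field} is blank")
    return problems
```

A row with an empty provenance cell would come back as `["provenance is blank"]`, which is the case the rules single out.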

Versioning and reconciliation. Re-extracting a source under a new prompt version, or with a different agent, produces a new versioned entry file alongside the old one — the old file is never overwritten or deleted. Filename convention: stem = source slug, suffix = version/state, agent/model recorded inside the file’s run-metadata block (not in the filename):

  • <slug>.md — original extraction
  • <slug>.v2.md — re-extraction under prompt v2 (.v3, … as prompts advance)
  • <slug>.combined.md — a reconciliation of two or more extractions of the same source
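The filename convention above can be read as a small grammar: a slug, an optional `.vN` or `.combined` suffix, then `.md`. A minimal sketch of a parser follows; the `EntryFile` type and the choice to report the original extraction as version 1 and a combined file as version 0 are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class EntryFile(NamedTuple):
    slug: str
    kind: str     # "original", "version", or "combined"
    version: int  # 1 for the original, N for .vN, 0 for a combined file (assumed encoding)


# Slug is matched lazily so a trailing ".vN" or ".combined" is read as the suffix.
_ENTRY = re.compile(r"^(?P<slug>.+?)(?:\.(?P<suffix>v\d+|combined))?\.md$")


def parse_entry_filename(name: str) -> EntryFile:
    """Split an entry filename into slug and version/state suffix."""
    match = _ENTRY.match(name)
    if match is None:
        raise ValueError(f"not an entry file: {name!r}")
    slug, suffix = match["slug"], match["suffix"]
    if suffix is None:
        return EntryFile(slug, "original", 1)
    if suffix == "combined":
        return EntryFile(slug, "combined", 0)
    return EntryFile(slug, "version", int(suffix[1:]))


print(parse_entry_filename("example-source.v2.md"))
```

Agent and model are deliberately absent from the parsed result, since the convention records them inside the file's run-metadata block rather than in the filename.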

A re-extraction does not create a new register row. It is added to the existing source’s row under the same SRC-NNN: list all its files in the entry file cell, and tag the cell with a reconciliation state, one of [single], [multiple — unreconciled], or [combined] (see appendix). The canonical file for citation is the .combined file if one exists, otherwise the latest version. Reconciling multiple extractions into a .combined file is done when useful; there is no obligation to reconcile immediately.
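The canonical-file rule can be sketched as a small function. This is a hedged illustration: the function name is hypothetical, the mapping from file count to state tag is an assumption (the authoritative definitions are in the appendix), and it only works when every file follows the `<slug>`, `.vN`, `.combined` naming convention. Rows whose files use different slugs need a manual decision.

```python
import re


def reconcile(files: list[str]) -> tuple[str, str]:
    """Return (state_tag, canonical_file) for the entry files of one source."""
    if not files:
        raise ValueError("a register row needs at least one entry file")

    # A combined file, if present, is canonical.
    combined = [name for name in files if name.endswith(".combined.md")]
    if combined:
        return "[combined]", combined[0]

    def version(name: str) -> int:
        # The original extraction has no suffix and is treated as version 1.
        match = re.search(r"\.v(\d+)\.md$", name)
        return int(match.group(1)) if match else 1

    latest = max(files, key=version)
    # Assumption: exactly one file means [single]; more than one without a
    # combined file means [multiple — unreconciled].
    state = "[single]" if len(files) == 1 else "[multiple — unreconciled]"
    return state, latest


print(reconcile(["example-source.md", "example-source.v2.md"]))
```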

The mechanics of projecting a reviewed entry into this register (assigning ids, versioning, cross-index maintenance, the update-log line) are specified in prompts/register-update-prompt.md.

Warning

Migration state (2026-06-02)

This register was migrated from the flat working/reading-register.md. The earlier register stored everything analytical in a single free-text Notes column and did not record which agent/model produced each extraction. As a result:

  • provenance is marked not recorded on every migrated row. It is populated going forward from the extraction prompt’s run-metadata block (v2).
  • evidence basis and signals / techniques are migrated from the old one-line notes where those notes supported them; otherwise marked see entry.
  • The framing-distance ledger and threat types are mostly tbd — backfill from entry: those fields live in the per-source entry files (working/register-entries/), which were not re-read during migration.
  • review state is migrated — review pending for all migrated rows: extraction quality cannot be judged from a one-line note.

Completing the register is a mechanical backfill pass: an agent reads each working/register-entries/<slug>.md, lifts evidence basis, signals / techniques, threat types, framing distance, and what it cannot show, and fills the stubs. Treat any cell marked tbd/see entry/not recorded as unverified until that pass runs.
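The "treat as unverified" rule lends itself to a mechanical check. A minimal sketch, assuming a row is held as a dict from column name to cell text and that a stub is any cell that begins with one of the three markers; both the function name and the prefix-matching rule are assumptions, not project tooling. The sample row uses four cells from SRC-004.

```python
# Markers named in the migration note; matched case-insensitively as prefixes.
STUB_MARKERS = ("tbd", "see entry", "not recorded")


def unverified_fields(row: dict[str, str]) -> list[str]:
    """Return the column names whose cells are still migration stubs."""
    return [
        column
        for column, cell in row.items()
        if cell.strip().lower().startswith(STUB_MARKERS)
    ]


# Four cells taken from the SRC-004 inventory row.
src_004 = {
    "evidence basis": "vendor-claim; threat-intel",
    "threat types": "tbd — backfill from entry",
    "provenance": "not recorded",
    "review state": "migrated — review pending",
}

print(unverified_fields(src_004))
```

Prefix matching is used rather than substring matching so that a sentence that merely mentions one of the markers is not flagged.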

Extraction inventory

Columns: id · source · org / authors · year · evidence basis · operational proximity · signals / techniques · threat types · provenance (agent / model) · review state · project impact · entry file. Vocabulary for the ordinal/controlled columns is defined in the appendix.

Foundations

| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRC-001 | Automated Threats to Web Applications (project page) | OWASP | n.d. | taxonomy | n/a (taxonomy) | OAT category set | All OAT (taxonomy) | not recorded | migrated — review pending | OAT taxonomy spine for threat-type vocabulary; project page only, full Handbook queued | owasp-automated-threats-to-web-applications.md |
| SRC-027 | Automated Threat Handbook: Web Applications v1.3 | OWASP / Watson & Zaw | 2026 | taxonomy | n/a (taxonomy) | 21 OAT categories; countermeasure classes; symptoms; fingerprinting/reputation/rate/monitoring classes | All OAT (taxonomy) | not recorded; provisional draft | needs review | Full OAT Handbook taxonomy and countermeasure-class reference; provisional extraction produced without repo scope docs, so verify before citation | [single]; canonical: owasp-automated-threat-handbook-v1v3.md |
| SRC-028 | How to Get and Use Cookies in Playwright | armanabbasi / Medium | 2023 | capability-doc | capability | Playwright browser contexts; cookie extraction; session/authentication state capture | Not threat-specific | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Low-level foundations example showing browser automation can access and preserve cookie/session state | [single]; canonical: medium-playwright-cookies-source-extraction.md |
| SRC-041 | Top 15 Scraper Sites to Enhance Your Data Collection Skills | ScrapingBee | 2026 | capability-doc | capability (training / sandbox) | scraping practice sites; static HTML; pagination; authentication; cookies/sessions; JSON APIs; JavaScript rendering; proxy management; CAPTCHA handling | Not threat-specific; scraper skill-building and production-path context | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Low-priority foundations/context source for how scraping skills are taught and how vendors frame the move from sandbox practice to production scraping | [single]; canonical: scrapingbee-2026-scraper-test-sites(1).md |
| SRC-059 | Cross-Origin Resource Sharing (CORS) | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | CORS; same-origin policy; Origin header; preflight OPTIONS; Access-Control-* headers; credentialed cross-origin requests | Not threat-specific; browser-side cross-origin data access and CSRF-adjacent reasoning | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation reference for CORS; useful mainly to prevent treating CORS as a general anti-scraping defence | [single]; canonical: mdn-2026-cors(1).md |
| SRC-060 | HTTP caching | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | Cache-Control; private/shared/proxy/CDN caches; ETag; Last-Modified; If-None-Match; Vary; conditional requests | Not threat-specific; cache-aware crawling, origin-load interpretation, and shared-cache privacy risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation reference for caching and why repeated crawler/scraper requests may affect origin servers differently depending on cache behaviour | [single]; canonical: mdn-2026-http-caching(1).md |
| SRC-061 | HTTP authentication | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | 401/403/407; WWW-Authenticate; Authorization; Proxy-Authenticate; Proxy-Authorization; Basic auth; bearer tokens | Not threat-specific; credential-bearing requests, proxy authentication, and access-control boundary concepts | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation reference for separating HTTP authentication, proxy authentication, login sessions, and account-abuse concepts | [single]; canonical: mdn-2026-http-authentication(1).md |
| SRC-062 | Using HTTP cookies | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | Set-Cookie; Cookie; session IDs; session/permanent cookies; Secure; HttpOnly; SameSite; Domain/Path; session fixation | Not threat-specific; session state, tracking, account takeover, scraping behind login, and cookie-continuity detection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Core foundation source for explaining how stateless HTTP becomes stateful through cookies and sessions | [single]; canonical: mdn-2026-using-http-cookies(1).md |
| SRC-063 | User-Agent header | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | User-Agent strings; browser/crawler/tool identification; User-Agent reduction; Client Hints; Navigator.userAgent | Not threat-specific; crawler identification, spoofing, browser impersonation, and passive fingerprinting surface | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation entry for what User-Agent is and why it sits between compatibility, crawler identity, bot detection, and privacy | [single]; canonical: mdn-2026-user-agent-header(1).md |
| SRC-064 | HTTP headers | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | request/response/representation/payload headers; User-Agent; Accept-Language; Cookie; Authorization; CORS; cache; proxy headers | Not threat-specific; header-based detection, spoofing/mismatch checks, session/authentication-bearing requests, and proxy-aware analysis | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Neutral vocabulary bridge between basic HTTP mechanics and later sources on header-order checks, proxy headers, and browser impersonation | [single]; canonical: mdn-2026-http-headers(1).md |
| SRC-065 | Overview of HTTP | MDN Web Docs | 2026 | reference-doc | n/a (foundational reference) | HTTP requests/responses; client-server model; user-agents; browser resource fetching; proxies; cookies/sessions; authentication | Not threat-specific; foundation for scraping, crawling, request-pattern detection, and session-based abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Plain-language foundation for browsers, crawlers, and scripts as user-agents making HTTP requests | [single]; canonical: mdn-2026-overview-of-http(1).md |
| SRC-069 | OAuth 2.0 authentication vulnerabilities | PortSwigger Web Security Academy | 2026 | methods-taxonomy | capability | OAuth grant types; authorization endpoints; redirect URI validation; state parameter; authorization codes; access tokens; scope validation; OpenID Connect | OAuth authentication bypass; token/code leakage; forced profile linking; third-party authentication abuse; account takeover | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | OAuth / third-party-authentication foundation entry; broadens login-abuse coverage beyond password forms | [single]; canonical: portswigger-2026-oauth-2-authentication-vulnerabilities(1).md |
| SRC-070 | How to secure your authentication mechanisms | PortSwigger Web Security Academy | 2026 | control-guidance | capability (defensive control guidance) | password strength checking; zxcvbn; generic errors; response-time equalisation; IP-based rate limiting; CAPTCHA; MFA/2FA; password reset/change flows | credential disclosure; username enumeration; brute-force login; password-reset abuse; weak MFA; credential-stuffing-adjacent login abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Defensive counterpart to login-vulnerability entries; useful for “what closes down easy routes” without treating controls as proven sufficient | [single]; canonical: portswigger-2026-secure-authentication-mechanisms(1).md |
| SRC-071 | Vulnerabilities in password-based login | PortSwigger Web Security Academy | 2026 | methods-taxonomy | capability | username enumeration; status-code/error-message/response-timing differences; brute-force wordlists; account locking; IP blocking; rate limiting; CAPTCHA; HTTP Basic Auth; Authorization header | brute-force login; credential stuffing; username enumeration; account takeover; basic-auth brute force | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Core foundation source for credential stuffing and password-login abuse mechanics; shows why simple IP/account-lock controls are partial | [single]; canonical: portswigger-2026-password-based-login-vulnerabilities(1).md |
| SRC-072 | Authentication vulnerabilities | PortSwigger Web Security Academy | 2026 | vulnerability-taxonomy | capability | broken authentication; brute force; authentication bypass; password-based login; MFA weaknesses; third-party authentication; OAuth | account takeover; brute-force login; authentication bypass; post-login attack-surface expansion; high-privilege account compromise | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Foundation overview for authentication as an attack surface and how login abuse can lead to account takeover and follow-on exploitation | [single]; canonical: portswigger-2026-authentication-vulnerabilities(1).md |
| SRC-073 | Digital Identity Guidelines: Authentication and Authenticator Management | NIST / Temoshok et al. | 2025 | standards-reference / authentication-guidance / control-requirements | control | AAL; MFA; phishing resistance; passwords; credential-stuffing throttling; session secrets; fraud indicators; browser cookies | Credential stuffing; credential cracking; account takeover; session abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Standards foundation for authentication, sessions, rate limiting, and why fraud indicators do not replace authenticators | [single]; canonical: nist-2025-sp-800-63b-4-authentication-authenticator-management(1).md |
| SRC-090 | HTTP/2 and HTTP/3 protocol foundations | IETF / Thomson, Benfield & Bishop | 2022 | protocol-standard / technical specification | n/a (foundational protocol standard) | HTTP/2; HTTP/3; streams; multiplexing; HPACK; QPACK; QUIC; ALPN; binary framing; protocol identifiers | Not threat-specific; protocol foundation for HTTP fingerprinting and request-layer analysis | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Current protocol-standard foundation for HTTP/2 and HTTP/3; use RFC 9113 for current HTTP/2 claims rather than RFC 7540 | [single]; canonical: rfc-9113-9114-http2-http3-protocol-foundations(1).md |
| SRC-091 | Multi-process Architecture | Chromium Projects | n.d. | browser-architecture reference / design documentation | n/a (foundational browser architecture) | browser process; renderer process; Blink; Mojo; IPC; sandboxing; GPU/network/storage services; process isolation | Not threat-specific; browser-native automation and browser-security foundation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Foundation for explaining why modern browsers are not simple HTTP clients and why browser-native automation has a different threat surface | [single]; canonical: chromium-multi-process-architecture(1).md |
| SRC-092 | Hypertext Transfer Protocol Version 2 (HTTP/2) | IETF / Belshe, Peon & Thomson | 2015 | protocol-standard / technical specification | n/a (historical protocol standard) | HTTP/2; binary framing; streams; multiplexing; HPACK; ALPN; h2/h2c; server push | Not threat-specific; historical HTTP/2 protocol foundation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Historical HTTP/2 specification; keep for provenance/history, but use RFC 9113 / SRC-090 for current HTTP/2 wording | [single]; canonical: rfc-7540-http2(1).md |
| SRC-093 | Application Security Verification Standard, Version 5.0.0 | OWASP Foundation | 2025 | standards-reference / control-requirements / defensive-guidance | control | anti-automation; business logic; rate limiting; session management; authentication; authorization; HTTP validation; logging | Credential stuffing; scraping; scalping; sniping; account creation; denial of inventory; expediting; DoS; account aggregation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Defensive-control foundation connecting automated-abuse categories to verifiable application-security requirements | [single]; canonical: owasp-2025-application-security-verification-standard-5-0(1).md |

Vendor and industry

Vendor and industry sources are treated as evidence of what the field claims, not as independent proof. Efficacy and prevalence figures are vendor-measured.

| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRC-002 | Bot scores, JA3/JA4, Detection IDs, Web Bot Auth, custom rules (Bots docs) | Cloudflare | 2026 | capability-doc | capability | bot score 1–99, JA3/JA4, Detection IDs, Web Bot Auth, WAF rule fields | Not threat-specific | not recorded | migrated — review pending | Supports “Cloudflare exposes/uses X”, not “X works” | cloudflare-2026-bot-scores-detection-engines-ja3-ja4-web-bot-auth-custom-rules.md |
| SRC-003 | Bot Management documentation | Cloudflare | 2026 | capability-doc | capability | bot score; WAF custom rules; Workers; Bot Analytics; logs; verified bots; JavaScript detections; machine-learning model updates; endpoint-specific policy | bot traffic; login automation; application abuse; unwanted access to protected resources | not recorded; ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Primary Cloudflare entry for per-request scoring and endpoint-specific bot policy; legacy migrated row now has a v3 re-extraction attached | [multiple — unreconciled]; previous: cloudflare-2026-bot-management-docs.md; canonical: cloudflare-2026-bot-management(1).md |
| SRC-004 | Bot Protect, AI Detection Engine, 2025 Global Bot Security Report | DataDome | 2025–2026 | vendor-claim; threat-intel | observed (vendor-measured) | intent-based detection; signal-family taxonomy | tbd — backfill from entry | not recorded | migrated — review pending | Intent-based framing; 2.8% “fully protected” is vendor-measured; pair with SRC-015 for external evidence | datadome-2025-2026-bot-protect-ai-detection-global-bot-security-report.md |
| SRC-005 | Bot Management (product brochure) | Netacea | n.d. | vendor-claim | claimed | server-side / no-client-JS positioning; 2 case studies | tbd — backfill from entry | not recorded | migrated — review pending | Product-positioning evidence | netacea-bot-management-product-brochure.md |
| SRC-006 | Technical Showcase: ML in Advanced Bot Management | Netacea | n.d. | methods-taxonomy | capability | supervised/unsupervised, real-time/batch, general/specific; Intent Analytics | Not threat-specific | not recorded | migrated — review pending | ML-methods taxonomy; no reproducible detail | netacea-technical-showcase-machine-learning.md |
| SRC-007 | Death by a Billion Bots | Netacea | 2023 | survey | claimed (self-report) | 440-executive survey; $85.6M/company business-impact framing | tbd — backfill from entry | not recorded | migrated — review pending | Survey evidence; origin/geopolitical claims out of scope | netacea-2023-death-by-a-billion-bots.md |
| SRC-008 | Bot Manager, ACTIR, Agentic AI Security Report | Arkose Labs | 2023–2026 | vendor-claim; survey | claimed | dynamic challenges; attacker-cost framing; agentic-AI survey | tbd — backfill from entry | not recorded | migrated — review pending | Account-integrity + attacker-cost angle; several reports gated | arkose-2023-2026-bot-manager-actir-agentic-ai-reports.md |
| SRC-009 | Bot Defense, Adversarial Techniques, AI Agent Trust, 2026 Benchmark | Kasada | 2025–2026 | vendor-claim; threat-intel | observed (vendor-measured) | solver/proxy/CAPTCHA pricing; proof-of-execution; AI-agent governance | tbd — backfill from entry | not recorded | migrated — review pending | Strong attacker-economy angle | kasada-2025-2026-bot-defense-adversarial-retooling-ai-agent-trust.md |
| SRC-010 | Sightline, AI Agent Detection, OpenClaw, 2026 benchmark | HUMAN Security / PerimeterX | 2026 | vendor-claim; threat-intel | observed (vendor-measured) | cyberfraud-journey framing; AI-agent detection signal categories; OpenClaw observations | tbd — backfill from entry | not recorded | migrated — review pending | Concrete AI-agent detection signal categories | human-2026-sightline-bot-mitigation-ai-agent-detection-openclaw.md |
| SRC-034 | 2021 Credential Stuffing Report | F5 Labs / Vinberg & Overson | 2021 | threat-intel; empirical-operational | observed (vendor-measured) | credential-spill aggregation; login success-rate / password-reset / diurnal anomalies; browser automation; CAPTCHA-solving microwork; attacker sophistication tiers | credential stuffing; account takeover | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | Primary observed-use anchor for credential stuffing; combines spill-supply evidence with vendor-measured login abuse against large production sites | [single]; canonical: f5-2021-credential-stuffing-report.md |
| SRC-036 | OpenClaw in the wild: How autonomous agents can drive abuse at scale | HUMAN Security / Kaiserman & Cirlig | 2026 | empirical-operational | observed (vendor-measured) | autonomous-agent browser automation; exposed agent gateways; request bursts; referral UTM tagging; reconnaissance; directory/file probing | synthetic engagement; referral manipulation; reconnaissance; browser-native automation abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Observed-use evidence for agentic browser automation abuse, with explicit attribution caveats | [single]; canonical: human-2026-openclaw-in-the-wild(1).md |
| SRC-037 | Agentic Visibility: How to See AI Agents in Your Traffic | HUMAN Security / McArtney | 2026 | capability-doc | capability | AI-agent identification and classification; trust levels; HTTP Message Signatures and key directories; session and route analysis; dashboard visibility | Not threat-specific; AI-agent visibility and analytics contamination | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Product/capability framing for agentic visibility, trust classification, and the shift from visibility to control | [single]; canonical: human-2026-agentic-visibility-how-to-see-ai-agent-traffic(1).md |
| SRC-038 | State of Agentic Traffic – May 2026 | HUMAN Security / Kaiserman | 2026 | empirical-operational | observed (vendor-measured) | agentic-traffic telemetry; named agent/operator mix; sector distribution; page-route categorisation; blocking rate; policy controls | Not threat-specific; AI-agent traffic across product/search, account, authentication, content, and checkout routes | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Current vendor-telemetry snapshot of agentic traffic patterns and route exposure, not proof of malicious intent | [single]; canonical: human-2026-state-agentic-traffic-may(1).md |
| SRC-039 | 2026 Thales Bad Bot Report: Bad Bots in the Agentic Age | Thales / Imperva | 2026 | empirical-operational; threat-intel | observed (vendor-measured) | bot traffic classes; AI crawlers/fetchers; API endpoint targeting; browser impersonation; session consistency; residential/mobile proxies; CAPTCHA solving; headless automation; signed AI bots | account takeover; API abuse; scraping; scalping/inventory hoarding; SMS pumping; carding/payment-flow abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Broad vendor-measured snapshot of production bot, API, ATO, inventory, and AI-agent abuse; useful but not independent prevalence evidence | [single]; canonical: thales-2026-bad-bot-report(1).md |
| SRC-040 | AI-Empowered Botnets and API Visibility Gaps: Attack Trends in Financial Services | Akamai Security | 2026 | empirical-operational | observed (vendor-measured) | WAF/API alerts; API endpoint attack tracking; BOLA/BOPLA; shadow/zombie APIs; behavioural heuristics; user-risk telemetry; low-and-slow tactics; headless browsers; AI crawler classification | API abuse; scraping/AI crawler activity; bot evasion; financial-services web attacks; ATO/fraud adjacent | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Vendor telemetry source for financial-services API/bot abuse and the public-data boundary; alerts are not proof of successful attacks | [single]; canonical: akamai-2026-financial-services-security-trends(1).md |
| SRC-049 | How to Restore Fairness in Online Ticketing by Fighting Ticket Bots | DataDome / Falokun | 2026 | methods-taxonomy | claimed | ticket-bot lifecycle; account creation/takeover; rapid refresh; availability scraping; checkout automation; CAPTCHA bypass; virtual waiting rooms; intent-aware detection | ticket bots; scalping; queue abuse; limited-stock attacks; checkout automation; resale | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Vendor taxonomy for ticket bots / slot-sniping and scarce-inventory abuse; useful when paired with FTC legal-record evidence | [single]; canonical: datadome-2026-ticket-bots(1).md |
| SRC-051 | Comodo ModSecurity WAF Rules Update: The 2026 Solution / SBB-WAF-Rules | StopBadBots / sminozzi | 2025 | tooling-readme | capability | ModSecurity rules; WAF augmentation; user-agent blocklists; AI-crawler blocking; scanner detection; behavioural thresholds; WordPress hardening | bot blocking; scanner/reconnaissance; AI crawler blocking; web-application probing | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Defensive-tooling example for the simple WAF/blocklist/behavioural-threshold end of the control stack | [single]; canonical: stopbadbots-2025-sbb-waf-rules(1).md |
| SRC-054 | Block AI Bots - Cloudflare bot solutions docs | Cloudflare | 2026 | capability-doc | capability | verified AI crawler classification; unverified AI-like bot blocking; hostname-level controls; ad-hostname blocking; AI Crawl Control | AI crawler access; unverified AI-like crawling; content-access governance | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Current-trend source for AI crawler management as defensive product categorisation and publisher/content-access governance | [single]; canonical: cloudflare-2026-block-ai-bots(1).md |
| SRC-055 | Overview - Cloudflare Turnstile docs | Cloudflare | 2026 | capability-doc | capability | non-interactive JavaScript challenges; proof-of-work; proof-of-space; Web API probing; browser quirks; human-behaviour checks; pre-clearance cookie | automated scripts; non-human browser environments; protected form/login flows | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Challenge-system / CAPTCHA-alternative source showing the move from visual puzzles to browser/environment/human-like signal evaluation | [single]; canonical: cloudflare-2026-turnstile(1).md |
| SRC-056 | Detection IDs - Cloudflare bot solutions docs | Cloudflare | 2026 | capability-doc | capability | Detection IDs/tags; claimed-browser consistency; HTTP header order; heuristics; verified-bot and anomaly detections; Logpush; WAF custom rules | predictable bot behaviour; header-order mismatch; browser impersonation; endpoint-specific bot traffic | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Concrete Cloudflare source for coherence checks and turning detection signals into rules, analytics, and logging | [single]; canonical: cloudflare-2026-detection-ids(1).md |
| SRC-057 | Bot detection engines - Cloudflare bot solutions docs | Cloudflare | 2026 | methods-taxonomy | capability | heuristic checks; malicious fingerprints; JavaScript detections; headless-browser detection; headers; session characteristics; browser signals; supervised ML; bot score; anomaly detection | simple automation; headless-browser automation; sophisticated bots; malicious fingerprints | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Central Cloudflare defensive-methods taxonomy; useful vendor-side mirror of scraper-side evasion layers | [single]; canonical: cloudflare-2026-bot-detection-engines(1).md |
| SRC-058 | Overview - Cloudflare bot solutions docs | Cloudflare | 2026 | capability-doc | capability | Bot Fight Mode; Super Bot Fight Mode; Bot Analytics; firewall variables; WAF; Turnstile; API Shield; DDoS protection; defensive stack | automated traffic; known bot patterns; unwanted crawling; resource abuse; automated endpoint interaction | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Cloudflare defensive-stack overview bridging simple controls, challenge systems, per-request scoring, analytics, and endpoint-specific policy | [single]; canonical: cloudflare-2026-bot-solutions-overview(1).md |
| SRC-079 | DataDome Releases VM-Based Obfuscation: The Next Evolution in Client-Side Detection Security | DataDome / Vayno | 2026 | capability-doc / vendor product announcement / defensive architecture explanation | capability | VM obfuscation; client-side detection; browser detection; Device Check; Slider; WebAssembly; dynamic code regeneration; proprietary bytecode; anti-reverse-engineering | Bot detection code protection; client-side detection arms race | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Direct vendor source for VM-based obfuscation applied to commercial client-side bot-detection logic | [single]; canonical: datadome-2026-vm-based-obfuscation-client-side-detection(1).md |
| SRC-084 | Commercial CAPTCHA-solving API ecosystem | CapSolver; Hayes; HasData / Skakun | 2025–2026 | capability-doc / vendor marketing / tutorial ecosystem / vendor-adjacent benchmark | capability | CAPTCHA-solving APIs; reCAPTCHA; Turnstile; Geetest; AWS WAF CAPTCHA; token generation; solver markets; AI agents; automation workflows | CAPTCHA defeat; scraping; automated account management; price monitoring; SEO/SERP automation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Capability evidence that CAPTCHA-solving APIs are openly marketed and integrated into automation and AI-agent workflows | [single]; canonical: commercial-captcha-solving-api-ecosystem-2026(1).md |
| SRC-086 | OpenClaw: exposed AI-agent gateways and enterprise risk | Bitsight / Cruz | 2026 | empirical-exposure measurement / attack-surface analysis / threat-intelligence | observed (exposure measurement) | OpenClaw; exposed services; internet scanning; autonomous agents; integrations; prompt injection; RCE; credential exposure; WebSocket API; weak token | AI-agent exposure; exposed-agent attack surface; misconfiguration | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Complements HUMAN OpenClaw by measuring exposed gateways and configuration/blast-radius risk rather than traffic abuse | [single]; canonical: bitsight-2026-openclaw-exposed-ai-agent-gateways(1).md |
| SRC-087 | How the Peer-to-Business Model Redefines App Monetization | Infatica SDK Experts | 2025 | capability-doc / business-model description / vendor marketing | capability | peer-to-business SDK; residential proxies; idle bandwidth; opt-in peers; public web data; geo-restrictions; rate limits; CAPTCHA walls | Commercial scraping infrastructure; proxy supply-chain; access-barrier bypass | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Vendor business-model source for how residential/peer proxy supply can be built through SDK-enabled app users | [single]; canonical: infatica-2025-peer-to-business-app-monetization-sdk(1).md |
| SRC-088 | Residential Proxies: Definition, Use Cases, and Best Providers | Bright Data / Zanini | 2026 | capability-doc / vendor marketing / market map | capability | residential proxies; ISP proxies; rotating proxies; sticky sessions; geo-targeting; IP reputation; rate-limit and IP-ban avoidance | Web scraping; price monitoring; ad verification; sneaker and ticket purchasing; SEO monitoring; social-media management | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Commercial proxy-ecosystem source explaining residential proxy capabilities and use cases, including limited-stock purchasing | [single]; canonical: brightdata-2026-residential-proxies-definition-use-cases-best-providers(1).md |
SRC-088 Residential Proxies: Definition, Use Cases, and Best Providers Bright Data / Zanini 2026 capability-doc / vendor marketing / market map capability residential proxies; ISP proxies; rotating proxies; sticky sessions; geo-targeting; IP reputation; rate-limit and IP-ban avoidance Web scraping; price monitoring; ad verification; sneaker and ticket purchasing; SEO monitoring; social-media management ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern needs review Commercial proxy-ecosystem source explaining residential proxy capabilities and use cases, including limited-stock purchasing [single]; canonical: brightdata-2026-residential-proxies-definition-use-cases-best-providers(1).md
| SRC-098 | X-Force Threat Intelligence Index 2026 | IBM X-Force | 2026 | threat-intelligence synthesis / vendor report | observed-vendor-threat-intel | public-facing application exploitation; supply-chain compromise; credential theft; AI-assisted social engineering; identity protection; weak authentication; misconfiguration | cloud/application exploitation; credential theft; supply-chain compromise; ransomware/extortion context; AI-assisted operations | not recorded; raw IBM TXT/HTML source supplied | needs extraction / review | Broad current threat-landscape context for identity, cloud/application exposure, AI-enabled acceleration, and basic security hygiene; not bot-specific evidence | [multiple — raw source; extraction pending]; canonical: X-Force Threat Intelligence Index 2026(1).txt; supporting: X-Force 2026 Threat Intelligence Index - Executive Summary _ IBM(1).html |
| SRC-099 | 2025 Cloud Threat Hunting and Defense Landscape | Recorded Future Insikt Group | 2026 | threat-intelligence synthesis / observed incidents / mitigations and detections | threat-intelligence synthesis | cloud/SaaS abuse; valid accounts; tokens/keys/service accounts; cloud APIs; CI/CD; backups/snapshots; SaaS functionality; LLM/ML service abuse | cloud abuse; SaaS abuse; credential abuse; account takeover; third-party compromise; cloud ransomware; supply-chain abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Cloud/SaaS adversarial-infrastructure anchor; useful for showing legitimate cloud functions and identities as attack infrastructure, not bot-specific telemetry | [single]; canonical: recordedfuture-2026-cloud-saas-abuse-adversarial-infrastructure(1).md |
| SRC-100 | Quarterly Threat Intelligence Report: Q1 2026 | KasadaIQ | 2026 | vendor telemetry / marketplace monitoring / threat-intelligence assessment | observed-vendor-telemetry | threat enablers; bots-as-a-service; automated checkout; account markets; verification/KYC/2FA bypass services; residential proxies; no-code/vibe-coded bots; AI-account demand | account takeover; automated checkout; scalping; credential markets; verification bypass; reselling communities; limited-inventory abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Main bot/automated-abuse source for SaaSification of adversarial infrastructure and market-enabled automation; use cautiously because raw telemetry and source lists are not reproducible | [single]; canonical: kasada-2026-q1-threat-enablers-saasification-adversarial-infrastructure(1).md |
| SRC-101 | Mythos and the cost of attacking | Summers / Netwrix | 2026 | opinion / strategic commentary / vendor blog | low | AI attacker economics; cost-of-attack framing; OODA loop; Pyramid of Pain; vulnerability discovery; phishing; command-and-control; intent detection | Not threat-specific; AI-enabled attacker cost framing | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Low-priority security-economics context for the claim that cheap AI can reduce attacker cost; do not use as empirical evidence | [single]; canonical: summers-2026-mythos-cost-of-attacking-ai-security-economics(1).md |

Academic and research

| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRC-011 | ML-Based Detection and Evasion Techniques for Advanced Web Bots (PhD thesis, Bournemouth) | Iliou, C. | 2022 | empirical-academic | capability | sophistication taxonomy (simple→advanced); web-log + mouse detection; RL & GAN evasion | tbd — backfill from entry | not recorded | migrated — review pending | Primary academic anchor; controlled/academic setting | iliou-2022-thesis-advanced-web-bots.md |
| SRC-012 | Towards a framework for detecting advanced web bots (ARES) | Iliou et al. | 2019 | empirical-academic | capability | advanced-bot AUC ~0.68 at low FPR; proxy labels | tbd — backfill from entry | not recorded | migrated — review pending | Cleanest source for “simple-bot results hide weak advanced-bot detection” | iliou-2019-ares-detecting-advanced-web-bots.md |
| SRC-013 | Web Bot Detection Evasion Using GANs (CSR) | Iliou et al. | 2021 | empirical-academic | capability | GAN evasion of CNN mouse/touch detectors; web mouse recall → ~0.45 | tbd — backfill from entry | not recorded | migrated — review pending | Adversarial framing | iliou-2021-csr-web-bot-detection-evasion-gans.md |
| SRC-014 | Web Bot Detection Evasion Using Deep RL (ARES) | Iliou et al. | 2022 | empirical-academic | capability | RL web-log evasion; detection/evasion as repeated game | tbd — backfill from entry | not recorded | migrated — review pending | PoC mechanism, not observed campaigns | iliou-2022-ares-web-bot-detection-evasion-deep-rl.md |
| SRC-015 | FP-Inconsistent (arXiv 2406.07647) | Venugopalan et al. | 2025 | empirical-operational | measured (honey-site) | purchased evasive bot traffic vs DataDome/BotD on a honey site; fingerprint inconsistency rules | impression / ad fraud | not recorded | migrated — review pending | Strongest operational academic anchor; external evidence on DataDome | venugopalan-2025-fp-inconsistent-fingerprint-inconsistencies-evasive-bot-traffic.md |
| SRC-016 | FP-Inspector (IEEE S&P) | Iqbal et al. | 2021 | empirical-academic | n/a (not bot-use) | detecting fingerprinting scripts (static + dynamic JS) | Not threat-specific | not recorded | migrated — review pending | Foundations for fingerprinting section; not direct bot-detection evidence | iqbal-2021-fingerprinting-the-fingerprinters-fp-inspector.md |
| SRC-017 | Browser fingerprints for web authentication (ACM TWEB) | Andriamilanto et al. | 2021 | empirical-academic | n/a (not bot-use) | fingerprint distinctiveness/stability at scale | Not threat-specific (auth context) | not recorded | migrated — review pending | Auth context not bots; 2016–17 data, needs replication caveat | andriamilanto-2021-large-scale-browser-fingerprints-web-authentication.md |
| SRC-018 | Detecting Bad Bots via TLS Fingerprints (arXiv 2602.09606) | Jarad & Bıçakcı | 2026 | empirical-academic | measured (weak labels) | JA4/TLS classification; XGBoost/CatBoost AUC ~0.998 | tbd — backfill from entry | not recorded | migrated — review pending | Strong headline metrics; labelling (“bot” in app field) is a real caveat | jarad-2026-handshakes-tell-truth-tls-fingerprints-ja4-bad-bots.md |
| SRC-019 | BeCAPTCHA-Mouse (arXiv 2005.00890) | Acien et al. | 2021 | empirical-academic | capability | mouse-dynamics detection; synthetic (function/GAN) trajectories; public benchmark | Not threat-specific | not recorded | migrated — review pending | Constrained point-and-click task | acien-2021-becaptcha-mouse-synthetic-mouse-trajectories.md |
| SRC-020 | Hacking reCAPTCHA v3 using RL (arXiv 1903.01003) | Akrout et al. | 2019 | empirical-academic | capability | RL mouse-movement vs reCAPTCHA v3 score | CAPTCHA defeat | not recorded | migrated — review pending | 2019 PoC against one setup; narrow, likely stale | akrout-2019-recaptcha-v3-reinforcement-learning.md |
| SRC-033 | FP-Agent: Fingerprinting AI Browsing Agents | Wang, Shafiq & Vekaria | 2026 | empirical-academic; empirical-operational | measured (honey-site) | browser fingerprints; behavioural fingerprints; typing latency; paste/change events; scroll and mouse movement; XGBoost/SHAP; Cloudflare free-tier case study | Not threat-specific; AI-agent detection on benign web tasks | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | Independent measured anchor for AI-agent detectability; external check of Cloudflare free-tier behaviour, not enterprise efficacy | [single]; canonical: wang-2026-fp-agent-fingerprinting-ai-browsing-agents.md |
| SRC-047 | Balancing Security and Privacy: Web Bot Detection, Privacy Challenges, and Regulatory Compliance under the GDPR and AI Act | Martínez Llamas et al. | 2025 | review / methods-taxonomy / legal-regulatory analysis | capability | network/request data; browser/device/TLS fingerprinting; behavioural biometrics; proxies; headless browsers; adversarial fingerprints; PETs; GDPR / AI Act controls | bot detection/evasion; credential stuffing; scraping; scalping; privacy/compliance risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Academic taxonomy and compliance anchor for detection signals, evasion classes, privacy risks, and GDPR / AI Act implications; re-extraction attached for review-paper framing | [multiple — unreconciled]; previous: martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act(1).md; canonical: martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act-review(1).md |
| SRC-048 | How long does it take to get owned? | Wardle | 2019 | empirical-academic | measured (honey identities) | honey identities; leaked credential publication; paste sites; login monitoring; 2FA alerts; honeytokens; IP/user-agent cautions | leaked-credential use; credential stuffing adjacent; account takeover risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Independent measured-use evidence for leaked credential use and honey-account methodology; small and dated but transparent | [single]; canonical: wardle-2019-how-long-does-it-take-to-get-owned(1).md |
| SRC-066 | Browser Fingerprinting: A survey | Laperdrix, Bielova, Baudry & Avoine | 2020 | review; methods-taxonomy; foundations | n/a (foundational survey) | browser/device fingerprinting; User-Agent; HTTP headers; JavaScript APIs; Canvas; WebGL; AudioContext; fonts; plugins; extensions; entropy; anonymity sets; fingerprinting defences | cross-site tracking; stateless device identification; browser/device re-identification; privacy loss; dual-use security/fraud signal | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Core browser-fingerprinting foundation source covering concepts, attributes, metrics, history, and defences; not observed bot-abuse evidence | [single]; canonical: laperdrix-2020-browser-fingerprinting-survey(1).md; source family notes ACM 2020 paper with arXiv v2 alternate |
| SRC-067 | How Unique is Whose Web Browser? The Role of Demographics in Browser Fingerprinting among US Users | Berke et al. | 2025 | empirical-measurement; dataset paper | measured (browser-attribute / demographic dataset) | browser fingerprinting; demographics; User-Agent; languages; timezone; screen resolution; platform; hardware concurrency; device memory; WebGL; entropy; anonymity sets; demographic inference | cross-site tracking; passive/active fingerprinting; re-identification risk; demographic inference; unequal privacy risk | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Empirical update to fingerprinting foundations; useful for unequal privacy-risk and demographic-inference framing, not bot-abuse prevalence | [single]; canonical: berke-2025-how-unique-whose-web-browser-demographics(1).md |
| SRC-068 | Battling bots and bad data: enhancing data quality in online surveys | Sudbury & Marks | 2026 | empirical-methods; review-informed case study | measured (online survey quality controls) | CAPTCHA/reCAPTCHA; open-ended bot checks; attention checks; consistency checks; quota screening; speed/page-time checks; IP/location/duplicate checks; Qualtrics fraud controls | online survey bots; bad data; low-quality responses; duplicate participation; quota-gaming; survey fraud | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Review-informed case study showing bot-like automation and inattentive humans can degrade online survey data; supports layered controls outside classic cybersecurity | [single]; canonical: sudbury-marks-2026-battling-bots-bad-data(1).md |
| SRC-074 | Secure Development of a Hooking-Based Deception Framework Against Keylogging Techniques | Sajid, Ahmed & Sosnoski | 2025 | empirical-method demonstration / preprint | measured-but-bounded | API hooking; runtime instrumentation; EasyHook; Microsoft Detours; decoy injection; input perturbation; anti-hooking resilience | Not bot-specific; keylogging deception and credential-theft context | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Smaller related entry for cyber-deception, runtime instrumentation, and credential-theft-adjacent defences | [single]; canonical: sajid-2025-hooking-based-deception-keylogging(1).md |
| SRC-075 | Protecting Client Browsers with a Principal-based Approach | Cao | 2014 | dissertation / architecture proposal / method demonstration | n/a (browser-security foundation) | browser principals; client-side isolation; Virtual Browser; JavaScript sandboxing; third-party JavaScript; JShield; XSS; postMessage | Not bot-specific; browser security and malicious-content detection foundation | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Browser-security foundation for client-side isolation, JavaScript virtualisation, and principal boundaries | [single]; canonical: cao-2014-protecting-client-browsers-principal-based-approach(1).md |
| SRC-076 | Layered obfuscation: a taxonomy of software obfuscation techniques for layered security | Xu, Zhou, Ming & Lyu | 2020 | review / taxonomy / conceptual framework | n/a (foundation / taxonomy) | layered obfuscation; reverse engineering; control-flow obfuscation; data obfuscation; JavaScript obfuscation; application-layer obfuscation | Not bot-specific; anti-reverse-engineering and client-side protection background | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Low-priority academic foundation for layered obfuscation as risk management rather than one magic protection | [single]; canonical: xu-2020-layered-obfuscation-taxonomy-software-security(1).md |
| SRC-078 | Pushan: Trace-Free Deobfuscation of Virtualization-Obfuscated Binaries | Sudhir et al. | 2026 | empirical-method demonstration / deobfuscation research / preprint | measured-but-bounded | VM obfuscation; virtualization obfuscation; deobfuscation; VMProtect; Themida; Tigress; symbolic emulation; CFG recovery | Not bot-specific; reverse-engineering arms-race background | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Academic counterweight to VM-obfuscation claims; shows VM obfuscation is strong but actively attacked by deobfuscation research | [single]; canonical: sudhir-2026-pushan-trace-free-deobfuscation-vm-obfuscated-binaries(1).md |
| SRC-082 | AI/ML for cybersecurity and cyber-risk management | Kolhar & Sridevi | 2025–2026 | review / conceptual framework / governance overview | n/a (background) | AI cybersecurity; supervised and unsupervised learning; anomaly detection; UEBA; SOC; adversarial ML; XAI; governance | Broad cybersecurity detection and governance; not bot-specific | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Low-priority background source for ML/security governance language; use cautiously because it is broad and not bot-specific | [single]; canonical: kolhar-sridevi-2025-2026-ai-ml-cybersecurity-governance-background(1).md |
| SRC-083 | API Security Testing and Exploitation Techniques | Kolhar & Gundoor | 2026 | API-security taxonomy / testing-methods overview / defensive-guidance | control-and-capability | API security; BOLA; broken authentication; rate limiting; business-logic abuse; API scraping; OAuth2; OIDC; JWT; shadow APIs | API abuse; brute force; API scraping; business-logic abuse; API DoS | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Secondary overview for API abuse, API security testing, and business-logic risk; less authoritative than OWASP/NIST/PortSwigger | [single]; canonical: kolhar-gundoor-2026-api-security-testing-exploitation-techniques(1).md |
| SRC-085 | CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training | Chen et al. | 2026 | empirical-measurement / method demonstration / preprint | measured-but-bounded | CAPTCHA solving; GUI agents; VLMs; ReCAP; OCR; slider CAPTCHA; image grid; self-correction; reasoning-action traces | CAPTCHA defeat; AI-agent automation; challenge-response bypass capability | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Academic capability evidence that GUI agents can be trained for modern interactive CAPTCHA tasks, bounded by synthetic/benchmark setting | [single]; canonical: chen-2026-recap-captcha-native-gui-agents(1).md |
| SRC-089 | Understanding the Proxy Ecosystem: A Comparative Analysis of Residential and Open Proxies on the Internet | Choi et al. | 2020 | empirical-measurement / infrastructure-analysis | measured | open proxies; residential proxies; proxy geolocation; ASN; blacklists; IP reputation; malicious activity; evasion infrastructure | Proxy-enabled abuse infrastructure; not a specific web-abuse campaign | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Independent empirical foundation for residential/open proxy infrastructure and blacklist overlap | [single]; canonical: choi-2020-understanding-proxy-ecosystem(1).md |
| SRC-097 | Measuring the Changing Cost of Cybercrime | Anderson et al. | 2019 | literature synthesis / measurement framework / cost analysis | framework | cost decomposition; criminal revenue; direct/indirect losses; defence costs; supporting infrastructure; botnets; pay-per-install; measurement bias | Not bot-specific; cybercrime economics and supporting-infrastructure cost framing | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Non-vendor balance source for cost/economics claims; useful to separate attacker revenue, victim loss, defence cost, and wider social cost | [duplicate upload deduped]; canonical: anderson-2019-measuring-changing-cost-cybercrime(1).md; duplicate: anderson-2019-measuring-changing-cost-cybercrime(2).md |

Threat surface and territory

Entries in this table are capability / infrastructure evidence, not proof of malicious use or bypass success. README/marketing claims are not independent test results.

| id | source | org / authors | year | evidence basis | operational proximity | signals / techniques | threat types | provenance | review state | project impact | entry file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRC-021 | Official documentation (Playwright / Puppeteer / Selenium) | project maintainers | 2026 | capability-doc | capability | baseline browser-automation capability layer | Not threat-specific | not recorded | migrated — review pending | Capability, not intent; sits beneath stealth/cloud layers | playwright-puppeteer-selenium-2026-browser-automation-docs.md |
| SRC-022 | undetected-chromedriver (GitHub/PyPI) | ultrafunkamsterdam | 2021–2024 | tooling-readme | capability | Selenium ChromeDriver evasion layer; explicit IP-reputation caveat | Not threat-specific | not recorded | migrated — review pending | README claims, not independent tests | ultrafunkamsterdam-2024-undetected-chromedriver-docs-github.md |
| SRC-023 | puppeteer-extra-plugin-stealth (GitHub/npm) | berstend | 2018–2023 | tooling-readme | claimed | modular evasion catalogue: webdriver, plugins, codecs, WebGL | Not threat-specific | not recorded | migrated — review pending | “Passes public bot tests” ≠ production | berstend-2023-puppeteer-extra-plugin-stealth-docs-github.md |
| SRC-024 | Anti-scraping bypass, stealth, proxies, fingerprints, Cloudflare bypass | ScrapFly | 2025–2026 | capability-doc; vendor-claim | claimed | API-level bypass (asp); byte-perfect JA4/HTTP2/QUIC claims; names Nodriver/Camoufox/UC Mode | scraping | not recorded | migrated — review pending | Documents the attacker mental model for Cloudflare | scrapfly-2025-2026-anti-scraping-bypass-stealth-proxies-fingerprints.md |
| SRC-025 | Web Unlocker, Browser API, proxies, agentic web execution | Bright Data | 2026 | capability-doc; vendor-claim | claimed | managed proxies/fingerprints/CAPTCHA + cloud browsers; password entry disabled by default | scraping | not recorded | migrated — review pending | Compliance/public-data framing | brightdata-2026-web-unlocker-browser-api-proxies-anti-bot-bypass.md |
| SRC-026 | Cloud-browser & agent docs (Browserless / Browserbase / Hyperbrowser) | respective vendors | 2026 | capability-doc | capability | cloud browsers + AI-agent infra; stealth, proxies, CAPTCHA-solving, persistent sessions | Not threat-specific | not recorded | migrated — review pending | Bridges automation to agentic browsers | browserless-browserbase-hyperbrowser-2026-cloud-browser-agent-automation-docs.md |
| SRC-029 | How to Use Rnet: The Blazing-Fast Python HTTP Client | RoundProxies / Marius Bernard | 2025 | tooling-readme; capability-doc | capability | browser TLS/HTTP2 impersonation; JA3/custom fingerprints; header order; cookies; sticky sessions; proxies; WebSockets | scraping | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Scraper-side evidence that browser-like TLS/protocol impersonation is treated as a normal evasion capability | [single]; canonical: roundproxies-rnet-source-extraction.md |
| SRC-030 | How to Bypass Cloudflare Turnstile | ScrapFly / Hisham | 2026 | tooling-readme; vendor-claim; capability-doc | claimed | Turnstile modes; browser fingerprinting; canvas/WebGL; behavioural signals; JA3/JA4; proof-of-work/token handling; cloud browsers; residential proxies | scraping; CAPTCHA/challenge-response evasion | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Turnstile-specific scraper-side account of challenge-response signal families and bypass classes | [single]; canonical: scrapfly-cloudflare-turnstile-source-extraction.md |
| SRC-031 | How to Bypass Imperva Incapsula when Web Scraping in 2026 | ScrapFly / Bernardas Alisauskas | 2026 | tooling-readme; capability-doc; vendor-claim | claimed | Imperva/Incapsula block indicators; JA3/JA4; IP reputation; header order; JS/canvas/WebGL/audio fingerprinting; cookies/sessions; rate limiting; stealth browsers | scraping; API scraping; WAF/bot-protection evasion | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2 | needs review | Imperva-specific scraper-side view of WAF/bot-protection detection surfaces and claimed evasion patterns | [single]; canonical: scrapfly-imperva-incapsula-source-extraction.md |
| SRC-032 | Quo vadis, crawlers? Progress and what’s next on safeguarding our infrastructure | Wikimedia Foundation | 2026 | empirical-operational; threat-intel | observed (first-party operator-reported) | AI crawler traffic; residential proxies; browser-identity spoofing; rate-limit circumvention; robot-policy updates; identification-tiered API limits; bot detection | scraping / aggressive crawling; infrastructure strain | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | First named-operator account in the register; strong operator-side evidence for AI-crawler pressure and residential-proxy evasion, but platform-specific | [single]; canonical: wikimedia-2026-quo-vadis-crawlers-infrastructure.md |
| SRC-035 | How we’re dealing with bots and the reselling of driving tests | DVSA / Ryder | 2023 | threat-intel | observed (platform-side) | appointment search/reservation automation; CAPTCHA; bot-protection measures; ADI-service monitoring; cancellation-rate and account-link controls | scarce-resource appointment abuse; slot-sniping; slot-resale; denial of inventory | Codex / GPT-5 / source-extraction-prompt v3 | needs review | Concrete public-sector appointment-abuse example for the booking-style worked example and scarce-resource abuse lane | [single]; canonical: dvsa-2023-bots-reselling-driving-tests.md |
| SRC-042 | Is Web Scraping Legal? Key Insights and Guidelines You Need to Know | ScrapingBee | 2026 | legal-explainer | n/a (legal context; not use evidence) | terms-of-service; copyright/fair-use risk; personal-data processing; GDPR/CCPA/CFAA framing; robots.txt; rate limiting; CAPTCHA/paywall/login/IP-block risk | scraping; unwanted automation; unauthorised-access risk; privacy-risky data collection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Scraper-side governance framing around the boundary between technical capability and permission; verify legal claims against primary/specialist sources before use | [single]; canonical: scrapingbee-2026-web-scraping-legal-guidelines(1).md |
| SRC-043 | Advanced Web Scraping: Hidden Techniques Pro Developers Actually Use | ScrapingBee | 2026 | bypass-guide | capability | async orchestration; multiprocessing; rate control; backoff/jitter; circuit breakers; recursive filtering; JavaScript rendering; AJAX/API discovery; proxy rotation; CAPTCHA solving | web scraping at scale; large-scale data extraction; pagination-limit circumvention; anti-blocking | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Public scraper-side engineering-pattern source for robust extraction at scale; high dual-use, so cite technique families only | [single]; canonical: scrapingbee-2026-advanced-web-scraping-hidden-techniques(1).md |
| SRC-044 | Best Price Scraping Tools for 2026: Top Services Compared | ScrapingBee | 2026 | capability-doc | capability | price scraping; scraping APIs; no-code scrapers; JavaScript rendering; headless browsers; proxy rotation; CAPTCHA handling; anti-bot reliability; AI extraction | ecommerce scraping; price intelligence; competitive-intelligence collection; unwanted automated price collection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Market-map evidence showing price scraping as packaged commercial service with anti-bot handling as a core buying criterion | [single]; canonical: scrapingbee-2026-price-scraping-tools(1).md |
| SRC-045 | How To Bypass PerimeterX Anti-Bot Protection System In 2026 | ScrapingBee / Krukowski | 2026 | bypass-guide | capability | IP reputation; TLS fingerprints; HTTP/2; header order; browser fingerprinting; cookies/tokens; session continuity; behavioural signals; residential/mobile proxies; stealth browsers | anti-bot evasion; web scraping against PerimeterX/HUMAN-protected sites; browser automation | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Public scraper-side evidence that named-defender bypass thinking is framed as multi-layer signal alignment across IP, TLS, HTTP, fingerprint, session, and behaviour | [single]; canonical: scrapingbee-2026-perimeterx-human-bypass(1).md |
| SRC-046 | Avoiding bot detection: How to scrape the web without getting blocked? / browser-fingerprinting | niespodd | n.d. / ongoing | tooling-readme | claimed | browser fingerprinting; anti-detection; stealth browsers; Puppeteer/Playwright/Selenium; residential proxies; CAPTCHA solving; TLS/JA3/JA4; WebGL/fonts/client hints/WebDriver | scraper-side evasion; anti-detection; proxy-assisted scraping; browser automation | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Public scraper-side/evasion mental model and tooling-ecosystem map; maintainer claims, not independent effectiveness evidence | [single]; canonical: niespodd-browser-fingerprinting(1).md |
| SRC-050 | FTC Brings First-Ever Cases Under the BOTS Act | Federal Trade Commission | 2021 | legal-record | observed (enforcement record) | automated ticket search/reservation; IP-address concealment; fictitious accounts; multiple credit cards; purchase-limit circumvention | ticket bots; scalping; limited-stock inventory capture; purchase-limit circumvention; resale-market abuse | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | High-value observed-use / enforcement evidence for ticket-bot abuse and limited-stock automation; cite as FTC allegations/orders unless underlying records are checked | [single]; canonical: ftc-2021-first-bots-act-cases(1).md |
| SRC-052 | bad-asn-list: open-source ASN blocklist for cloud/hosting/colo traffic | Hamachek | n.d. (~2019–2020 unverified) | tooling-readme; empirical-operational | observed (first-party anecdotal) | datacenter/hosting/colo ASN blocklist; IP→ASN lookup; network-origin reputation; VPN/hosting egress; signup fraud scoring | fake account creation; signup abuse; datacenter-origin automation | Claude (chat interface) / Claude Opus 4.8 / source-extraction-prompt v3 | needs review | Worked example of network-origin / ASN-reputation detection and the datacenter-blocking → residential-proxy arms-race baseline; anecdotal and dated | [single]; canonical: hamachek-bad-asn-list-datacenter-asn-blocklist.md |
| SRC-053 | The Best Web Scraping API to Avoid Getting Blocked | ScrapingBee | 2026 | capability-doc | capability | managed scraping API; headless Chrome; JavaScript rendering; selector waits; custom interactions; proxy rotation; residential/stealth proxies; AI extraction; LLM/RAG data ingestion | web scraping; commercial scraping infrastructure; anti-blocking abstraction; ecommerce and LLM data collection | ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3 | needs review | Commercial capability source showing scraping-as-a-service packaging of browsers, proxies, extraction, geotargeting, and anti-blocking features | [single]; canonical: scrapingbee-2026-web-scraping-api(1).md |
| SRC-077 | On the Architecture of Bot Detection Services | Tschacher | 2021 | technical explainer / architecture analysis / attacker-aware commentary | n/a (architecture analysis) | passive detection; client-side detection; JavaScript fingerprinting; TLS/TCP/IP fingerprinting; HTTP headers; IP reputation; cookies; sessions; JavaScript obfuscation; JavaScript VMs | Bot detection architecture; browser automation detection context | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Architecture-level source connecting client/server signal layers, spoofability, and protection of client-side detection scripts | [single]; canonical: tschacher-2021-architecture-of-bot-detection-services(1).md |
| SRC-080 | Ticketmaster v. Prestige Entertainment West: ticket bots, dummy accounts, CAPTCHA, and legal remedies | Ticketmaster litigation; Proskauer summary; Ballon context | 2018–2019 | court pleadings/order / litigation allegations / settlement summary / legal analysis | observed (legal case / alleged conduct) | ticket bots; dummy accounts; CAPTCHA; access controls; CFAA; DMCA; BOTS Act; purchase limits; resale | Ticket scalping; automated ticket purchasing; purchase-limit circumvention; inventory capture | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | Strong legal-case evidence for alleged ticket-bot activity, dummy accounts, CAPTCHA circumvention context, and settlement remedies | [single]; canonical: ticketmaster-prestige-2018-2019-ticket-bots-settlement(1).md |
| SRC-081 | U.S. Senate Ticketmaster / Taylor Swift case: scalper bots, Verified Fan, and live-event ticketing | Berchtold; Bradish; Guardian / U.S. Senate hearing | 2023 | public testimony / contested case account / secondary press reporting | observed-claim | Ticketmaster; Taylor Swift; Verified Fan; scalper bots; access-code servers; BOTS Act; secondary ticketing; live events | Ticket bots; queue pressure; access-code-server attack pressure; scalping; live-event ticketing abuse | ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern | needs review | High-profile public-hearing source for ticket-bot pressure, with clear caveat that core bot-volume claims come from Live Nation/Ticketmaster | [single]; canonical: us-senate-2023-ticketmaster-taylor-swift-scalper-bots(1).md |
SRC-094 Detecting Post-Compromise Threat Activity in Microsoft Cloud Environments CISA 2021 government advisory / detection guidance / incident-linked TTPs observed-guidance (historical / archived) Microsoft cloud identity; Azure AD; M365/O365; federated identity; forged tokens; OAuth/SAML; service principals; API access; Sparrow cloud identity compromise; API-based persistence; post-compromise cloud activity; credential/service-account abuse ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern needs review Supporting official source for cloud identity/API persistence after compromise; archived and historical, so not current trend evidence [single]; canonical: cisa-2021-post-compromise-microsoft-cloud-identity-api-access(1).md
SRC-095 Scattered Spider FBI / CISA / RCMP / ASD ACSC / AFP / CCCS / NCSC-UK 2025 government advisory / investigation-derived TTPs / MITRE ATT&CK mapping observed-investigative helpdesk social engineering; SIM swap; MFA fatigue; OTP; valid accounts; SSO; RMM/remote-access tools; cloud discovery; ransomware identity abuse; account takeover; helpdesk compromise; valid-account intrusion; legitimate-tool abuse; ransomware/extortion ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern needs review Strong official non-vendor evidence for identity/social-engineering/legitimate-tool abuse, but actor-specific and not bot-specific [single]; canonical: cisa-fbi-2025-scattered-spider-identity-helpdesk-legitimate-tools(1).md
SRC-096 Hiding in Plain Sight: Tracking Bulletproof Hosting and Abused RDP Infrastructure Censys 2026 internet-scale scanning analysis / technical measurement / threat detection measured infrastructure bulletproof hosting; abused RDP; Windows hostnames; VM templates; ASNs; VPS; infrastructure clustering; takedown evasion adversarial infrastructure; ransomware infrastructure context; hosting abuse; persistence/evasion ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern needs review Infrastructure-measurement source for adversarial hosting and abused RDP patterns; balances generic threat reports with internet-scale artifact analysis [single]; canonical: censys-2026-bulletproof-hosting-abused-rdp-infrastructure(1).md
SRC-102 Commercial automation cost stack 2026: scraping APIs, proxies, CAPTCHA solving, managed scraping, and SMS verification Combined pricing / market source cluster 2026 vendor pricing pages / vendor-adjacent comparison / industry survey / pricing snapshots market availability scraping APIs; managed scraping; residential proxies; CAPTCHA solving; temporary SMS; browser rendering; proxy routing; account verification inputs scraping; CAPTCHA defeat; account creation; credential-stuffing support; indirect scarce-resource automation ChatGPT / GPT-5.5 Thinking / newer register/source-extraction pattern needs review Cost-of-capability source showing key automation inputs are modular and purchasable; not abuse prevalence or effectiveness evidence [single]; canonical: commercial-automation-cost-stack-2026-scraping-proxies-captcha-sms(1).md

Framing-distance ledger

The project’s central analytical discipline: each source approximates the real problem differently and fails to represent it differently (EVIDENCE-REVIEW.md §5). The “what it cannot show” column below is migrated from the old register’s notes where present; every other unfilled cell is marked tbd and will be backfilled from its entry once working/register-entries/ has been read.

id source what it approximates what it fails to represent what it cannot show (migrated)
SRC-001 OWASP OAT a shared vocabulary for automated-threat types tbd — backfill from entry Taxonomy/ontology only — no detection or prevalence evidence
SRC-002 Cloudflare bots docs what one major vendor’s control plane exposes and uses tbd — backfill from entry That any exposed signal actually works in production — only that Cloudflare uses/exposes it
SRC-003 Cloudflare Bot Mgmt the product surface area and WAF/Workers variables tbd — backfill from entry Efficacy; product structure ≠ detection performance
SRC-004 DataDome intent-based detection framing and signal families tbd — backfill from entry “Fully protected” exposure figures are vendor-measured, not independently verifiable
SRC-005 Netacea brochure server-side / no-client-JS detection positioning tbd — backfill from entry Case-study results are vendor-reported
SRC-006 Netacea ML showcase a taxonomy of ML approaches to bot detection tbd — backfill from entry No reproducible method detail; cannot validate any approach
SRC-007 Netacea survey executive-perceived business impact of bots tbd — backfill from entry Self-reported survey; not measured prevalence or efficacy
SRC-008 Arkose attacker-cost / dynamic-challenge framing; agentic-AI claims tbd — backfill from entry Several reports gated; survey/vendor evidence only
SRC-009 Kasada the attacker economy (solver/proxy/CAPTCHA pricing) tbd — backfill from entry Vendor/threat-intel framing; pricing claims not independently audited
SRC-010 HUMAN/PerimeterX AI-agent detection signal categories; OpenClaw observations tbd — backfill from entry Vendor/threat-intel; detection-category claims not externally verified
SRC-011 Iliou thesis advanced-bot detection in a controlled academic setting tbd — backfill from entry Controlled/academic setting; does not establish production behaviour
SRC-012 Iliou 2019 that simple-bot metrics hide weak advanced-bot detection tbd — backfill from entry Proxy labels; advanced-bot AUC ~0.68 at low FPR is the honest figure
SRC-013 Iliou 2021 GAN evasion of CNN mouse/touch detectors tbd — backfill from entry One adversarial strategy; recall drop to ~0.45 is not a worst-case bound
SRC-014 Iliou 2022 RL detection/evasion as a repeated game (RL evasion) tbd — backfill from entry PoC mechanism, not observed campaigns
SRC-015 FP-Inconsistent evasive bot traffic vs commercial detectors on a honey site tbd — backfill from entry Threat model is impression fraud; one honey site, not general production
SRC-016 FP-Inspector detecting fingerprinting scripts, not bots tbd — backfill from entry Not direct bot-detection evidence
SRC-017 Andriamilanto fingerprint distinctiveness/stability at scale (auth) tbd — backfill from entry Auth context not bots; 2016–17 data, needs replication
SRC-018 Jarad TLS JA4/TLS classification of bad bots tbd — backfill from entry Labelling (“bot” in app field) caveats the headline AUC ~0.998
SRC-019 BeCAPTCHA-Mouse mouse-dynamics detection + synthetic trajectories tbd — backfill from entry Constrained point-and-click task; not free browsing
SRC-020 Akrout reCAPTCHA RL mouse movement vs one reCAPTCHA v3 setup tbd — backfill from entry 2019 PoC against one setup; likely stale
SRC-021 automation docs the baseline automation capability layer tbd — backfill from entry Capability, not intent or malicious use
SRC-022 undetected-chromedriver a Selenium evasion layer’s claimed capabilities tbd — backfill from entry README claims, not independent tests; maintainer’s own IP-reputation caveat
SRC-023 puppeteer-stealth a modular evasion catalogue tbd — backfill from entry Passing public bot tests ≠ evading production detection
SRC-024 ScrapFly the attacker mental model for Cloudflare bypass tbd — backfill from entry Byte-perfect-fingerprint claims are vendor claims, not verified bypass success
SRC-025 Bright Data managed bypass infrastructure + cloud browsers tbd — backfill from entry Compliance framing; capability claims not independently verified
SRC-026 cloud-browser/agent docs cloud-browser + AI-agent infrastructure features tbd — backfill from entry Feature availability, not evasion efficacy
SRC-027 OWASP Handbook v1.3 the defender’s naming layer for automated web-application threats and broad countermeasure classes no prevalence, detection performance, algorithmic detail, production telemetry, or empirical AI-agent evidence; intent categories may not separate cleanly in observed traffic Taxonomy and countermeasure suggestions only; cannot show detectability, prevalence, or efficacy
SRC-028 Playwright cookies tutorial the basic automation capability of reading and preserving cookie/session state from a browser context no adversarial setting, no bot detection, no cookie replay at scale, no session binding or risk-scoring controls Cannot show that cookies are sufficient to impersonate users or bypass bot detection
SRC-029 RoundProxies Rnet scraper-side HTTP-client evolution toward browser-like TLS/HTTP2/header/cookie/proxy behaviour no defender-side logic, production traffic, behavioural JS challenges, account history, graph/entity signals, or verified bypass outcomes Cannot show that Rnet is undetectable or that TLS matching is sufficient to bypass modern anti-bot systems
SRC-030 ScrapFly Turnstile scraper-side understanding of Cloudflare Turnstile challenge mechanisms and bypass classes no verified Cloudflare internals, defender telemetry, independent success rates, false-positive handling, or broader Cloudflare decisioning Cannot show the complete Turnstile signal set or that any bypass method works reliably across production sites
SRC-031 ScrapFly Imperva scraper-side view of websites protected by Imperva/Incapsula and the signal families scraper tooling believes matter no Imperva internals, cross-customer telemetry, signal weightings, independent validation, durable success rates, or false-positive behaviour Cannot show Imperva’s actual internal trust-score mechanism or that listed bypass approaches work reliably
SRC-032 Wikimedia crawlers operator-side view of AI-crawler pressure, residential-proxy evasion, and tiered API/rate-limit response first-party blog; platform-specific; no methodology, false-positive rate, or independent verification; open knowledge platform differs from commercial booking/e-commerce targets Cannot show prevalence outside Wikimedia, rigorous defence efficacy, or direct generality to credential stuffing, ATO, or scalping
SRC-033 FP-Agent current commercial AI browsing agents’ browser and behavioural fingerprints during realistic benign web tasks; includes a Cloudflare free-tier check controlled honey site; benign tasks; closed-world known agents; narrow human population; point-in-time; not enterprise bot management Cannot show production abuse prevalence, open-world detection, durability under adversarial humanisation, or general Cloudflare/vendor efficacy
SRC-034 F5 credential stuffing end-to-end credential-stuffing landscape from spill supply to login traffic against large consumer sites vendor telemetry from large enterprise customers; disclosed spill data only; 2020-era tooling; no reproducible method Cannot show web-wide prevalence, independent detection efficacy, current-era tooling coverage, or generality to smaller booking-style targets
SRC-035 DVSA driving-test bots public-sector appointment-slot abuse: automated monitoring, booking, holding, resale, and platform mitigation platform-side public blog; no traffic counts, bot counts, detection logs, or false-positive rates; platform-specific Cannot show prevalence, exact automation mechanism, share of bookings affected, all third-party service behaviour, or mitigation efficacy
SRC-036 HUMAN OpenClaw vendor-observed exposed autonomous-agent gateways producing browser-automation traffic across engagement manipulation and reconnaissance patterns vendor telemetry; attribution uncertainty; no raw data, query method, full validation, or disclosed detection logic Cannot show autonomous execution for every request, internet-wide prevalence, attack success, or harm magnitude
SRC-037 HUMAN Agentic Visibility a vendor capability model for identifying, classifying, measuring, and controlling AI-agent traffic product explanation; dashboard screenshots; no independent evaluation or primary telemetry in this source Cannot show traffic prevalence, classification accuracy, abuse prevalence, or effectiveness of the controls
SRC-038 HUMAN State of Agentic Traffic May 2026 monthly vendor telemetry on observed AI-agent mix, sector destinations, page-route categories, and blocking rates HUMAN-visible traffic only; opaque classification; traffic is not necessarily malicious; one-month snapshot Cannot show malicious intent, internet-wide prevalence, attack success, or accuracy of named-agent attribution
SRC-039 Thales Bad Bot Report 2026 production-facing bot-management view of automated traffic, API-first abuse, AI clients, ATO, and inventory abuse across protected customers vendor-visible traffic; sampling/classification opaque; mixes telemetry, analyst interpretation, and product framing Cannot show internet-wide prevalence, classification quality, attack success rates, comparative vendor efficacy, or AI causality
SRC-040 Akamai financial-services trends edge/WAF/API telemetry view of financial-services web attacks, API visibility gaps, AI-labelled bot traffic, and scraping/API targeting Akamai customer/product visibility; finance-sector-specific; alerts not success; mixes telemetry, survey claims, and interpretation Cannot show market-wide prevalence, classification accuracy, detection efficacy, AI causality, or generalisation beyond Akamai-protected finance traffic
SRC-041 ScrapingBee scraper test sites the educational pipeline from controlled scraper practice to production-style dynamic scraping training-site article; not abuse, evasion success, prevalence, or harm Cannot show real-world bot traffic, abuse, hostile scraping, or that practice sites cause misuse
SRC-042 ScrapingBee legal guidelines scraper-vendor compliance framing around lawful/risky/impermissible scraping vendor legal explainer; not primary law, legal advice, or jurisdiction-specific analysis Cannot determine legality, replace legal advice, establish lawful customer behaviour, or validate legal-risk reduction claims
SRC-043 ScrapingBee advanced scraping techniques public scraper-side engineering maturity: scaling, reliability, JavaScript rendering, pagination workarounds, and anti-blocking infrastructure capability guide; no observed abuse, target-specific impact, independent validation, or success rates Cannot show abusive use, prevalence, success against protected targets, or defender impact; high dual-use
SRC-044 ScrapingBee price scraping tools commercial packaging of price scraping as a normal business workflow with anti-bot handling as a feature vendor comparison/marketing; not neutral benchmark or defender-side evidence Cannot show independent tool performance, abuse prevalence, or that price-scraping activity is lawful or wanted by targets
SRC-045 ScrapingBee PerimeterX/HUMAN bypass guide the public evasion mental model for named commercial bot management as multi-layer signal alignment scraper-vendor bypass narrative; no independent success evidence, raw tests, or defender confirmation Cannot show PerimeterX/HUMAN weakness, bypass success rates, observed abuse, or target-specific effectiveness
SRC-046 niespodd browser-fingerprinting a public scraper-side anti-detection taxonomy and tooling ecosystem map maintainer claims/tool catalogue; no raw tests, denominators, success rates, or independent validation Cannot establish tool effectiveness, prevalence, legality, safety, or production bypass success
SRC-047 Martínez Llamas et al. privacy/GDPR/AI Act review academic synthesis of detection signal families, evasion classes, privacy risks, and regulatory controls review/taxonomy, not production telemetry or empirical measurement Cannot show abuse prevalence, detection efficacy, legal compliance in a specific deployment, or current production practice
SRC-048 Wardle honey identities independent measurement of leaked-credential use through honey identities and paste-site publication small, dated, paste-site-only honey experiment; observes unauthorised access, not necessarily bots Cannot show market-wide prevalence, automation share, modern credential-stuffing infrastructure, or vendor-control efficacy
SRC-049 DataDome ticket bots vendor taxonomy of ticket-bot activity across account preparation, sale/queue pressure, checkout, and resale vendor explainer/product marketing; no primary telemetry, independent measurement, or product validation Cannot show ticket-bot prevalence, DataDome efficacy, false-positive/negative rates, or that bots dominate resale prices
SRC-050 FTC BOTS Act cases enforcement-record proximity to alleged real ticket-bot use against scarce-inventory ticketing flows legal press release; allegations/proposed orders, not a technical measurement study or prevalence estimate Cannot show defence success rates, detection signals, full bot architecture, or generalisation beyond named enforcement cases
SRC-051 StopBadBots SBB-WAF-Rules small-site / self-managed-server defensive controls: WAF rules, blocklists, user-agent filters, scanner heuristics maintainer claims attached to tooling; no independent evaluation, prevalence, or false-positive measurement Cannot show general blocking efficacy, current blocklist quality, low false positives, or coverage of browser-native agents
SRC-052 Hamachek bad-asn-list a concrete operator account of datacenter/ASN blocking for signup abuse and a reusable defensive artifact single-site anecdote; dated and no methodology/false-positive measurement Cannot establish general efficacy, current usefulness, false-positive rates, or prevalence beyond one operator account
SRC-053 ScrapingBee web scraping API commercial scraping-as-a-service abstraction of browsers, proxies, extraction, geotargeting, and anti-blocking vendor product page; no independent benchmark, target list, raw logs, or defender corroboration Cannot show abuse, effectiveness, success rate validity, or customer legality
SRC-054 Cloudflare Block AI Bots site-owner control over AI crawler and AI-like crawler access product-control documentation; no classification details, traffic counts, effectiveness, or harm evidence Cannot show AI crawler prevalence, malice, false positives, effectiveness, or legal/policy sufficiency
SRC-055 Cloudflare Turnstile browser/form challenge systems that assess browser environment and behaviour before or instead of visual puzzles vendor documentation; no independent bypass, accessibility, usability, or privacy assessment Cannot show real-world challenge effectiveness, advanced-bot resistance, user impact, abuse prevalence, or legal sufficiency
SRC-056 Cloudflare Detection IDs operational bot detection where low-level signals are exposed for rule-making and troubleshooting no full detection list, exact logic, performance data, or false-positive metrics Cannot prove zero human overlap, adaptation resistance, prevalence, or correctness of a particular rule
SRC-057 Cloudflare bot detection engines layered production detection across heuristics, JavaScript detections, ML, anomaly detection, headers, sessions, and browser signals product-method summary; not a model disclosure, benchmark, audit, or telemetry study Cannot prove JS/ML effectiveness, data necessity, false-positive rates, or real-world attack prevalence
SRC-058 Cloudflare bot solutions overview the packaging of bot defence as a layered operational stack from simple challenges to enterprise scoring and analytics vendor capability overview; no independent prevalence, error-rate, or enforcement-outcome data Cannot show effectiveness, comparative performance, prevalence, or legal/governance sufficiency
SRC-059 MDN CORS the browser cross-origin sharing mechanism that later scraping/browser-security explanations rely on neutral technical reference, not threat or efficacy evidence Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality
SRC-060 MDN HTTP caching cache semantics that affect server load, repeated crawler requests, conditional requests, and shared-cache privacy/security issues neutral technical reference, not threat or efficacy evidence Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality
SRC-061 MDN HTTP authentication HTTP and proxy authentication concepts behind credential-bearing automated requests and access-control boundaries neutral technical reference, not threat or efficacy evidence Cannot show abuse, prevalence, attacker use, anti-bot effectiveness, or legality
SRC-062 MDN cookies cookie/session mechanics that support session continuity, login state, tracking, and bot-management cookie checks neutral technical reference, not threat or efficacy evidence Cannot show account-abuse prevalence, attacker use, detection effectiveness, or legality
SRC-063 MDN User-Agent header the basic identity/compatibility string used by browsers, crawlers, tools, and spoofing or reduction discussions neutral technical reference, not threat or efficacy evidence Cannot show spoofing prevalence, malicious use, detection effectiveness, or legality
SRC-064 MDN HTTP headers the vocabulary of HTTP request/response fields later used in header-based detection and scraper-side spoofing discussions neutral technical reference, not threat or efficacy evidence Cannot show malicious header use, prevalence, detection effectiveness, or legality
SRC-065 MDN Overview of HTTP the basic client-server/user-agent model behind browsers, crawlers, scripts, proxies, cookies, and request patterns neutral technical reference, not threat or efficacy evidence Cannot show abuse, prevalence, attacker use, detection effectiveness, or legality
SRC-066 Laperdrix browser-fingerprinting survey the browser/device fingerprinting layer used in tracking, fraud detection, bot detection, and privacy-invasive identification survey/foundation source; not current production bot telemetry or observed abuse; browser/API surfaces have changed since publication Cannot show current prevalence, modern anti-bot performance, abuse against a target, or legal compliance
SRC-067 Berke et al. browser demographics browser/device-attribute identifiability and unequal privacy risk across demographic groups US Prolific sample; Dec 2023 device/browser snapshot; not bot traffic or anti-bot detection Cannot show bot prevalence, detection performance, malicious use, or that a specific API change fixes demographic inference risk
SRC-068 Sudbury & Marks survey bots automated and low-quality participation in incentivised online surveys and layered survey-quality controls online-survey domain; not commercial web scraping, credential stuffing, ticket bots, or vendor telemetry Cannot prove all bad responses were bots, quantify internet-wide bot prevalence, or validate commercial bot-management controls
SRC-069 PortSwigger OAuth vulnerabilities abuse of third-party login and OAuth-based authentication flows, including account takeover and token/API misuse educational vulnerability taxonomy; not measured prevalence, bot volume, or incident counts Cannot prove OAuth is generally unsafe, quantify automation around OAuth, or replace OAuth/OIDC specifications and BCPs
SRC-070 PortSwigger secure authentication defensive hardening against automated login abuse and authentication bypass practical guidance; not empirical control-effectiveness evidence Cannot prove a control is sufficient, quantify CAPTCHA/rate-limit effectiveness, or replace formal standards such as ASVS/NIST
SRC-071 PortSwigger password login vulnerabilities automated login abuse through brute force, credential stuffing, username enumeration, and weak password-login controls educational attack/defence taxonomy; not real-world attack-frequency or success-rate evidence Cannot quantify credential-stuffing prevalence or prove CAPTCHA, rate limiting, IP blocking, or account locking works in the wild
SRC-072 PortSwigger authentication vulnerabilities authentication as an attack surface linking bot automation to account takeover and follow-on exploitation educational overview/lab framing; not production telemetry or observed abuse evidence Cannot show automation prevalence, economic harm, bot-detection effectiveness, or legal/regulatory status
SRC-073 NIST SP 800-63B-4 standards-backed authentication, session, throttling, and authenticator-management controls for account-abuse contexts normative control guidance; no bot-abuse telemetry, no control-effectiveness data, and no vendor/tool comparison Cannot show credential-stuffing prevalence, detection performance, or that a specific fraud indicator reliably separates bots from humans
SRC-074 Sajid et al. hooking deception runtime deception and API-hooking concepts that may inform anti-tamper/instrumentation thinking endpoint keylogging domain, not web-bot detection or browser automation; preprint Cannot show web-abuse prevalence, bot detection, or operational effectiveness in browser/client anti-bot systems
SRC-075 Cao browser principals browser principal boundaries, JavaScript virtualisation, and client-side isolation concepts older browser-security dissertation; not current bot-detection telemetry or modern browser-agent evidence Cannot show modern browser-automation detection, current anti-bot efficacy, or observed abuse
SRC-076 Xu layered obfuscation layered obfuscation as risk management and a taxonomy of obfuscation targets and layers general software-security taxonomy, not bot-specific and not empirical bot evidence Cannot show that client-side bot-detection obfuscation works, or how attackers respond in production
SRC-077 Tschacher bot-detection architecture architecture-level explanation of passive bot detection, spoofable client signals, and layered signal collection public technical commentary, not telemetry, benchmark, or vendor-validated architecture Cannot quantify prevalence, false positives, or the effectiveness of any specific detection service
SRC-078 Pushan deobfuscation reverse-engineering pressure against VM-obfuscated binaries and limits of obfuscation as a durable defence binary deobfuscation research, not browser JavaScript bot detection; preprint Cannot show that DataDome-style VM obfuscation is broken or effective in browser deployments
SRC-079 DataDome VM obfuscation commercial use of VM-based obfuscation to protect exposed client-side bot-detection logic vendor product announcement; no independent measurement, no raw attack data, no performance or usability data Cannot prove the protection works, quantify attacker cost increase, or show detection efficacy
SRC-080 Ticketmaster Prestige litigation legal-case evidence of alleged large-scale automated ticket purchasing, dummy accounts, and access-control circumvention allegations and settlement context, not a full technical study or final trial finding on every claim Cannot provide bot-code details, systematic prevalence, or independently measured detection/control performance
SRC-081 U.S. Senate Ticketmaster hearing high-profile public account of ticket-bot pressure and access-code-server attack claims around the Taylor Swift presale contested public testimony; key technical claims come from Live Nation/Ticketmaster; antitrust frame is partly separate Cannot independently verify bot volume, causality, or platform-specific failure mechanics
SRC-082 Kolhar & Sridevi AI/ML cybersecurity broad AI/ML cybersecurity and governance vocabulary for anomaly detection, SOC, UEBA, and human oversight generic cybersecurity overview, not bot-specific and not empirical Cannot support bot-specific claims, prevalence, or method effectiveness without stronger sources
SRC-083 Kolhar & Gundoor API security API abuse vocabulary: BOLA, broken auth, rate limits, business logic, API scraping, shadow APIs secondary book-chapter overview; less authoritative than OWASP/NIST/PortSwigger and not observed abuse Cannot quantify API abuse or validate specific API-security controls in production
SRC-084 CAPTCHA-solving ecosystem open commercial solver market and integration of CAPTCHA solving into automation and AI-agent workflows vendor/tutorial/benchmark ecosystem; not independent prevalence or defender-side validation Cannot show abuse volume, real-target success rates, or that a solver works against a given site
SRC-085 Chen ReCAP controlled evidence that GUI agents can be trained to solve modern interactive CAPTCHA variants synthetic and benchmark-focused preprint; not live abuse or deployed solver telemetry Cannot show operational CAPTCHA-bypass prevalence, target impact, or production reliability
SRC-086 Bitsight OpenClaw exposure internet-exposure measurement of AI-agent gateways and deployment-risk/blast-radius framing stronger for exposure than abuse; vendor scan methodology and coverage require checking before using numbers Cannot show confirmed abuse from each exposed agent or full internet-wide completeness
SRC-087 Infatica P2B SDK residential proxy supply-chain business model using SDK-enabled app users and idle bandwidth vendor marketing; compliance and consent claims are not independently validated Cannot prove opt-in quality, compliance, effectiveness, abuse prevalence, or end-use legitimacy
SRC-088 Bright Data residential proxies commercial explanation of residential/ISP proxy capabilities, rotation, sticky sessions, and limited-stock use cases vendor market map; not independent measurement and not proof of effectiveness Cannot establish provider quality, legal use, real-world success, or abuse prevalence
SRC-089 Choi proxy ecosystem independent empirical comparison of open and residential proxies and blacklist overlap dataset vintage and prior residential dataset source limit currentness; not a specific web-abuse campaign Cannot show current proxy market structure, target-specific abuse, or detection performance in bot-management systems
SRC-090 RFC 9113/9114 current HTTP/2 and HTTP/3 protocol mechanics used as foundation for protocol-layer interpretation protocol standards, not bot evidence or detection evidence Cannot show abuse, fingerprint prevalence, or effectiveness of protocol-level detection
SRC-091 Chromium multi-process architecture browser architecture vocabulary for renderer/browser processes, sandboxing, IPC, and browser-native automation context design reference, not bot/security telemetry Cannot show browser-automation detection, abuse prevalence, or fingerprinting behaviour
SRC-092 RFC 7540 historical HTTP/2 protocol specification and original terminology obsoleted by RFC 9113; historical only for current HTTP/2 claims Cannot support current HTTP/2 wording where RFC 9113 differs or supersedes it
SRC-093 OWASP ASVS 5.0.0 standards-backed application controls for anti-automation, business logic, auth, sessions, API validation, and logging requirements standard, not threat taxonomy, telemetry, or modern bot-management method source Cannot show prevalence, detection effectiveness, false-positive trade-offs, or AI/browser-agent trends
SRC-094 CISA Microsoft cloud post-compromise cloud identity compromise, forged token/OAuth/SAML abuse, and API-based persistence after compromise historical SolarWinds/SVR-linked advisory; not bot-specific; archived; no raw telemetry or prevalence Cannot show current 2026 cloud threat prevalence, bot-specific abuse, SaaS pricing, or detection efficacy
SRC-095 Scattered Spider advisory identity/helpdesk/social-engineering abuse using valid accounts, MFA manipulation, SSO, and legitimate remote tools actor-specific; investigation-derived; not bot-specific; no neutral prevalence or raw victim logs Cannot show bot automation prevalence, full campaign reconstruction, or general market-wide identity-abuse rates
SRC-096 Censys bulletproof hosting / RDP infrastructure measurement of abused hosting patterns, exposed RDP artifacts, and clustering signals not bot-specific; intent uncertain for individual hosts; no full raw dataset; infrastructure evidence, not abuse outcome evidence Cannot prove criminal intent for every host, quantify bot abuse, or show account/booking/scraping impacts
SRC-097 Anderson cybercrime costs security-economics framing separating criminal revenue, direct loss, indirect loss, defence cost, and social cost 2019 synthesis; not bot-specific; not current automation pricing or 2026 telemetry Cannot show current bot markets, current attacker costs, or effectiveness of specific controls
SRC-098 IBM X-Force 2026 current vendor threat-landscape framing around public-facing application exploitation, credential theft, supply chain/cloud dependencies, and AI-accelerated operations raw IBM page/source rather than reviewed extraction; vendor perspective; broad cyber, not bot-specific Cannot show bot-specific prevalence, detailed methodology from the supplied raw page, or independent validation
SRC-099 Recorded Future cloud/SaaS abuse cloud and SaaS functionality as adversarial infrastructure: valid accounts, APIs, CI/CD, storage, backups, LLM/ML services threat-intelligence synthesis; many examples based on third-party reporting; not bot-specific; no raw telemetry for all cases Cannot provide neutral prevalence, complete primary evidence, or scraping/proxy/CAPTCHA-specific evidence
SRC-100 Kasada threat enablers mature service economy around automated checkout, account markets, verification bypass, bots-as-a-service, and reselling communities vendor telemetry and marketplace monitoring; source list/method not fully reproducible; product-sector lens Cannot independently validate all figures, prove all account sales led to abuse, or isolate AI as causal
SRC-101 Netwrix Mythos cost of attacking strategic argument that AI lowers marginal attacker costs and compresses decision cycles opinion/vendor commentary; speculative in places; not bot-specific and not empirical Cannot verify Mythos claims, quantify attacker adoption, or support pricing/prevalence claims
SRC-102 Commercial automation cost stack modular purchasability and approximate cost of scraping APIs, proxies, CAPTCHA solving, managed scraping, and SMS verification pricing/market snapshots; not observed abuse; effectiveness and terms vary; sources are vendor or vendor-adjacent Cannot prove malicious use, success rates, legal compliance, or that any component works against a specific target

Signals and techniques cross-index

This is the view the flat register lacked: which sources cover which technical material. Membership is migrated from the inventory’s signals / techniques column; treat it as a starting index to be completed by the backfill pass.

signal / technique family sources
TLS / network fingerprints (JA3/JA4) SRC-002, SRC-018, SRC-024, SRC-029, SRC-030, SRC-031, SRC-045, SRC-046, SRC-047, SRC-057, SRC-066, SRC-077, SRC-090, SRC-092, SRC-096, SRC-102
Browser fingerprinting (JS / canvas / attributes / scripts) SRC-016, SRC-017, SRC-023, SRC-024, SRC-030, SRC-031, SRC-033, SRC-034, SRC-039, SRC-040, SRC-045, SRC-046, SRC-047, SRC-055, SRC-057, SRC-063, SRC-066, SRC-067, SRC-075, SRC-077, SRC-079, SRC-091
Behavioural — mouse / touch dynamics SRC-011, SRC-013, SRC-019, SRC-020, SRC-030, SRC-033, SRC-034, SRC-039, SRC-045, SRC-046, SRC-047, SRC-049, SRC-055, SRC-068, SRC-084, SRC-085
Behavioural — request/session timing and navigation patterns SRC-027, SRC-030, SRC-031, SRC-032, SRC-033, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-051, SRC-055, SRC-057, SRC-068, SRC-073, SRC-077, SRC-080, SRC-081, SRC-083, SRC-084, SRC-088, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099, SRC-100
ML detection methods (supervised/unsupervised, boosting, CNN) SRC-006, SRC-011, SRC-018, SRC-019, SRC-033, SRC-047, SRC-057, SRC-067, SRC-082, SRC-083, SRC-085
Adversarial evasion (GAN / RL) SRC-011, SRC-013, SRC-014, SRC-020, SRC-047
Fingerprint-inconsistency / evasion detection SRC-015, SRC-024, SRC-030, SRC-031, SRC-033, SRC-034, SRC-039, SRC-040, SRC-045, SRC-046, SRC-047, SRC-056, SRC-057, SRC-077, SRC-079
Browser automation & stealth layers SRC-021, SRC-022, SRC-023, SRC-024, SRC-028, SRC-030, SRC-031, SRC-033, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-041, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-053, SRC-077, SRC-079, SRC-084, SRC-085, SRC-086, SRC-091
HTTP headers / header order / protocol details SRC-002, SRC-003, SRC-027, SRC-029, SRC-031, SRC-037, SRC-039, SRC-040, SRC-041, SRC-042, SRC-045, SRC-046, SRC-047, SRC-051, SRC-053, SRC-056, SRC-057, SRC-059, SRC-060, SRC-061, SRC-063, SRC-064, SRC-065, SRC-066, SRC-069, SRC-071, SRC-073, SRC-077, SRC-083, SRC-090, SRC-092, SRC-093, SRC-094, SRC-098, SRC-099
Cookies / session persistence SRC-002, SRC-028, SRC-029, SRC-031, SRC-034, SRC-039, SRC-041, SRC-045, SRC-047, SRC-055, SRC-061, SRC-062, SRC-065, SRC-066, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-075, SRC-077, SRC-083, SRC-091, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099
Proxy / infrastructure / cloud browsers SRC-024, SRC-025, SRC-026, SRC-029, SRC-030, SRC-031, SRC-032, SRC-033, SRC-034, SRC-036, SRC-039, SRC-040, SRC-041, SRC-043, SRC-044, SRC-045, SRC-046, SRC-047, SRC-052, SRC-053, SRC-061, SRC-065, SRC-086, SRC-087, SRC-088, SRC-089, SRC-096, SRC-099, SRC-100, SRC-102
AI-agent signals & governance SRC-008, SRC-009, SRC-010, SRC-026, SRC-032, SRC-033, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-047, SRC-049, SRC-053, SRC-054, SRC-058, SRC-082, SRC-084, SRC-085, SRC-086, SRC-098, SRC-099, SRC-100, SRC-101
Intent / journey / score-based detection SRC-002, SRC-003, SRC-004, SRC-009, SRC-010, SRC-027, SRC-030, SRC-031, SRC-034, SRC-035, SRC-036, SRC-037, SRC-038, SRC-039, SRC-040, SRC-047, SRC-049, SRC-057, SRC-058, SRC-079, SRC-080, SRC-081, SRC-093
Challenge-response / CAPTCHA / proof-of-work SRC-009, SRC-020, SRC-030, SRC-034, SRC-035, SRC-039, SRC-041, SRC-042, SRC-043, SRC-045, SRC-046, SRC-047, SRC-049, SRC-055, SRC-058, SRC-068, SRC-070, SRC-071, SRC-080, SRC-081, SRC-084, SRC-085, SRC-093, SRC-100, SRC-102
Taxonomy / canonical threat vocabulary SRC-001, SRC-027
Countermeasure classes / symptoms SRC-027, SRC-032, SRC-034, SRC-035, SRC-039, SRC-040, SRC-042, SRC-047, SRC-049, SRC-051, SRC-054, SRC-055, SRC-056, SRC-057, SRC-058, SRC-068, SRC-070, SRC-073, SRC-079, SRC-083, SRC-093
API abuse / API endpoint visibility SRC-039, SRC-040, SRC-041, SRC-053, SRC-069, SRC-083, SRC-093, SRC-094, SRC-098, SRC-099
Credential / account-abuse signals SRC-034, SRC-039, SRC-047, SRC-048, SRC-049, SRC-052, SRC-061, SRC-062, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-080, SRC-081, SRC-083, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099, SRC-100, SRC-102
Scarce-resource / inventory-abuse signals SRC-035, SRC-039, SRC-049, SRC-050, SRC-080, SRC-081, SRC-088, SRC-093, SRC-100, SRC-102
Standards / canonical reference (standards / reference-doc) SRC-059, SRC-060, SRC-061, SRC-062, SRC-063, SRC-064, SRC-065, SRC-066, SRC-073, SRC-090, SRC-091, SRC-092, SRC-093
Scraper training / practice environments SRC-041
Commercial scraping / scraping-as-a-service SRC-043, SRC-044, SRC-045, SRC-053, SRC-087, SRC-088, SRC-102
Legal / governance boundary for scraping and bot detection SRC-042, SRC-047, SRC-050, SRC-066, SRC-067, SRC-073, SRC-080, SRC-081, SRC-082, SRC-093, SRC-097
Defensive tooling / WAF / blocklists SRC-051, SRC-052, SRC-054, SRC-055, SRC-056, SRC-057, SRC-058, SRC-073, SRC-079, SRC-083, SRC-093, SRC-094, SRC-095, SRC-098, SRC-099
Honey accounts / honeypots / honeytokens SRC-048
Network-origin / IP / ASN reputation SRC-045, SRC-047, SRC-052, SRC-087, SRC-088, SRC-089, SRC-096, SRC-100, SRC-102
AI crawlers / content-access governance SRC-032, SRC-039, SRC-054, SRC-058, SRC-086
HTTP foundations / web basics SRC-059, SRC-060, SRC-061, SRC-062, SRC-063, SRC-064, SRC-065, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-090, SRC-091, SRC-092, SRC-093
CORS / browser security boundaries SRC-059
Caching / conditional requests SRC-060
Browser-fingerprinting surveys / privacy measurement SRC-016, SRC-017, SRC-066, SRC-067, SRC-075, SRC-077
Authentication / account-abuse foundations SRC-061, SRC-062, SRC-069, SRC-070, SRC-071, SRC-072, SRC-073, SRC-083, SRC-093
Survey/data-quality abuse and form-quality controls SRC-068
Client-side detection code protection / obfuscation SRC-075, SRC-076, SRC-077, SRC-078, SRC-079
Residential / peer-proxy ecosystem SRC-087, SRC-088, SRC-089, SRC-100, SRC-102
CAPTCHA-solving / solver ecosystem SRC-084, SRC-085
Legal / hearing evidence for ticket bots SRC-050, SRC-080, SRC-081
Browser architecture / browser-native automation foundations SRC-075, SRC-077, SRC-091
API security / business-logic controls SRC-083, SRC-093, SRC-094, SRC-099
Cloud / SaaS / identity abuse SRC-094, SRC-095, SRC-098, SRC-099
Threat infrastructure / bulletproof hosting / RDP SRC-096, SRC-099
Security economics / cost-of-abuse framing SRC-097, SRC-100, SRC-101, SRC-102
Commercial automation cost stack SRC-100, SRC-102
Government advisories / official TTP guidance SRC-094, SRC-095
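
The cross-index is keyed by family. To ask the reverse question (which families does a given source touch), the table can be inverted. The following is a minimal sketch, assuming the rows have been loaded into a dict; `CROSS_INDEX` and `invert` are hypothetical names, and the excerpt holds only three rows copied from the table above, so the result is partial by construction.

```python
from collections import defaultdict

# Excerpt only: three rows copied from the cross-index (family -> source IDs).
CROSS_INDEX = {
    "Cloud / SaaS / identity abuse": ["SRC-094", "SRC-095", "SRC-098", "SRC-099"],
    "Threat infrastructure / bulletproof hosting / RDP": ["SRC-096", "SRC-099"],
    "Government advisories / official TTP guidance": ["SRC-094", "SRC-095"],
}

def invert(index):
    """Turn family -> sources into source -> sorted list of families."""
    by_source = defaultdict(list)
    for family, sources in index.items():
        for src in sources:
            by_source[src].append(family)
    return {src: sorted(families) for src, families in sorted(by_source.items())}

by_source = invert(CROSS_INDEX)
print(by_source["SRC-099"])
```

With the full table loaded, the same inversion gives a per-source list of families that can be compared against each inventory row during the backfill pass.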

Scarce-resource abuse index

Scarce-resource abuse is a cross-cutting tag family for sources about competition over a limited transactional resource. It is not a fifth category: sources still belong to foundations, vendor, academic, or threat-surface.

Rows are added only when a source concerns appointment, ticketing, reservation, product-drop, queueing, cancellation-monitoring, booking-flow, inventory-hoarding, or limited-inventory abuse. Otherwise these fields are not applicable.

id | tags | scarce_resource_targeted | abuse_phase | website_facing_action | evidence_of_use | abuse_outcome
SRC-035 | scarce-resource-abuse; slot-sniping; limited-inventory; appointment-abuse; inventory-hoarding; booking-flow-abuse; availability-polling; cancellation-monitoring; fast-booking; auto-booking; slot-resale | appointment | monitoring / booking / holding / resale / cancellation exploitation | polling availability / completing booking / holding inventory / reselling | observed-use | ordinary users blocked / inventory unavailable / inflated resale price / degraded fairness / operational load
SRC-039 | scarce-resource-abuse; limited-inventory; inventory-hoarding; denial-of-inventory; scalping; booking-flow-abuse; availability-polling | booking / product / reservation | monitoring / booking / holding | polling availability / completing booking / holding inventory | vendor-measured | inventory unavailable / distorted metrics / operational load
SRC-049 | scarce-resource-abuse; ticketing-abuse; limited-inventory; scalping; queue-abuse; account-preparation; ticket-resale; booking-flow-abuse; availability-polling; fast-booking | ticket | account preparation / queue entry / monitoring / booking / resale | creating accounts / entering queue / polling availability / completing booking / reselling | capability-only | ordinary users blocked / inventory unavailable / inflated resale price / degraded fairness
SRC-050 | scarce-resource-abuse; ticketing-abuse; limited-inventory; inventory-hoarding; scalping; purchase-limit-circumvention; account-preparation; ticket-resale | ticket | account preparation / booking / holding / resale | automated search and reservation / completing booking / using accounts and payment identities / reselling | legal-record | inventory unavailable / inflated resale price / degraded fairness
SRC-100 | scarce-resource-abuse; limited-inventory; scalping; automated-checkout; account-preparation; reselling-communities; verification-bypass | product / ticket / booking | account preparation / monitoring / checkout / resale | using accounts / bypassing verification / completing checkout / reselling | vendor-measured / market-evidence | inventory unavailable / inflated resale price / degraded fairness
SRC-102 | scarce-resource-abuse; limited-inventory; indirect-cost-stack; CAPTCHA-solving; proxies; temporary-SMS; scraping-APIs | product / ticket / appointment / booking (indirect) | monitoring / account preparation / challenge solving / booking support | polling availability / solving challenge / account verification / completing booking support | market-evidence / capability-only | indirect capability support only; no specific abuse outcome evidenced

Read and rejected

Recorded so they aren’t re-read (EVIDENCE-REVIEW.md §6). Both are title-collision retrieval artefacts — “Actions Speak Louder than Words” papers pulled in by string match, unrelated to bots.

id | source | org / authors | year | reason rejected | entry file
SRC-R01 | Loyalty Program Building Blocks (Economics and Sociology) | Kwiatek et al. | 2018 | Marketing/consumer-perception study. No bot/abuse content. Out of scope; keep only if a loyalty-abuse adjacency is later added. | kwiatek-2018-actions-speak-louder-loyalty-program-building-blocks.md
SRC-R02 | Figurative Language & Gesturing in Entrepreneurial Pitches (AMJ) | Healey et al. | 2018 | Communication/persuasion study. No bot/abuse content. Out of scope; possible use only for a dissemination/communication note. | healey-2018-actions-speak-louder-than-words-entrepreneurial-pitches.md

Queued

Not yet read or not yet extracted as a distinct source. This queue has been pruned so it no longer lists sources already represented by SRC-027, SRC-034, SRC-039, SRC-046, SRC-066, SRC-067, SRC-069–SRC-073, or SRC-079–SRC-093.

Highest-priority gaps

source / area | category | why flagged
Remaining foundations primers: browser storage, DNS/TLS/CDN basics, IP/network identity foundations | foundations | MDN HTTP/CORS/cookies/header basics are represented by SRC-059…SRC-065, PortSwigger authentication foundations by SRC-069…SRC-072, NIST/OWASP control standards by SRC-073 and SRC-093, and HTTP/2/3 RFCs by SRC-090/SRC-092; remaining need is DNS/TLS/CDN, browser storage, and IP/network identity foundations.
Independent in-the-wild bot, credential-stuffing, ad-fraud, fake-account, or scraping measurement studies | academic / threat-surface | Observed-use lane remains thinner than capability/vendor evidence; independent measurement is highest value.
Victim/operator engineering postmortems from platforms affected by scraping, credential stuffing, account creation, booking abuse, or crawler pressure | threat-surface | Balances vendor telemetry with first-party target/operator accounts.
Primary legal/enforcement records: BOTS Act complaints/orders, UK ticketing/consumer-law material, DVSA booking terms, regulator guidance where legal claims become load-bearing | legal-record / governance | Vendor legal explainers are not enough for legal claims.

Useful but second-order

source / area | category | why flagged
OpenAI agent / Operator documentation and safety material; Anthropic Claude computer-use / browser-use material; Browser Use and Skyvern docs | vendor / threat-surface | Needed to represent agent-builder framing rather than only defender-vendor framing.
Akamai, Imperva/Thales, F5 technical docs for bot management, WAF controls, API-security controls, and exposed rule/score/control-plane fields | vendor | Cloudflare control-plane docs are now represented by SRC-003, SRC-054…SRC-058; non-Cloudflare technical docs remain incomplete.
Anti-detect browsers and stealth tooling as distinct entries: Multilogin, GoLogin, Camoufox, Nodriver, SeleniumBase UC Mode | threat-surface | Currently mostly present through secondary mentions and catalogues, not their own source entries.
Browser-extension and userscript automation material: Tampermonkey/userscripts, browser add-ons used for page monitoring or form automation | threat-surface | Important for the “individual tool running inside the browser” / slot-sniping argument.
Public datasets for methodology investigations: ad-fraud, clickstream, login/session, credential-stuffing proxies, web-log, fraud graph datasets | methodology / academic | Needed to connect the written review to reproducible public-data investigations and to state framing distance clearly.
Additional Cloudflare Radar / AI crawler / bot traffic reports | vendor telemetry | Cloudflare product/capability docs are now represented; remaining Cloudflare gap is telemetry/prevalence.

Recently resolved from the old queue

old queued item resolved by note
OWASP Automated Threat Handbook full source SRC-027 Now extracted; still marked needs review because the extraction was provisional.
Imperva Bad Bot Report annual SRC-039 Covered via 2026 Thales / Imperva Bad Bot Report.
F5 Labs reports SRC-034 Credential-stuffing report extracted; further F5 technical docs can still be queued separately if needed.
niespodd/browser-fingerprinting GitHub catalogue SRC-046 Now extracted as its own threat-surface source.
Akamai financial-services report SRC-040 Vendor telemetry report extracted; technical docs remain queued separately.
Ticket-bot / scarce-resource enforcement example SRC-050 FTC BOTS Act source added; more primary legal records can still be useful.
MDN HTTP/CORS/cookies/header foundations SRC-059…SRC-065 Core MDN foundations now extracted; remaining foundations should focus on DNS/TLS/CDN, browser storage, and IP/network identity.
Cloudflare bot product/control-plane docs SRC-003, SRC-054…SRC-058 Capability layer now better represented; Cloudflare telemetry/Radar remains separate if needed.
Browser-fingerprinting survey / empirical update SRC-066, SRC-067 Laperdrix survey and Berke demographic-fingerprinting paper added; remaining fingerprinting work should be targeted rather than generic survey collection.
Web-bot detection privacy / methods review SRC-047 Re-extraction attached to existing Martínez Llamas row; use as methods/privacy/governance anchor, not observed-use evidence.
PortSwigger authentication foundations SRC-069…SRC-072 Worked authentication/OAuth/password-login sources added; remaining foundations should focus on DNS/TLS/CDN, browser storage, and IP/network identity.
NIST authentication/session standard SRC-073 SP 800-63B-4 added as authentication and session-management control foundation.
OWASP ASVS application-security controls SRC-093 ASVS 5.0 added as defensive-control foundation for anti-automation, business logic, auth, sessions, API validation, and logging.
HTTP/2 and HTTP/3 protocol standards SRC-090, SRC-092 RFC 9113/9114 added for current protocol claims; RFC 7540 kept as historical/obsolete HTTP/2 source.
Chromium browser architecture SRC-091 Multi-process architecture added for browser-native automation and browser architecture foundations.
Proxy ecosystem and residential proxy supply SRC-087, SRC-088, SRC-089 Commercial peer/residential proxy sources and independent proxy-ecosystem measurement added.
VM obfuscation and client-side bot-detection code protection SRC-076, SRC-078, SRC-079 General obfuscation taxonomy, deobfuscation counterweight, and DataDome VM-obfuscation source added.
Ticketmaster legal/hearing evidence SRC-080, SRC-081 Earlier Ticketmaster/Prestige legal case and 2023 Senate/Taylor Swift hearing source added.
CAPTCHA-solving ecosystem and GUI-agent CAPTCHA capability SRC-084, SRC-085 Commercial solver ecosystem and ReCAP GUI-agent CAPTCHA-solving paper added.
OpenClaw exposure measurement SRC-086 Bitsight exposure source added as complement to HUMAN OpenClaw traffic-abuse source.
API-security/business-logic overview SRC-083 Secondary API-security chapter added; OWASP/NIST/PortSwigger remain stronger control references.
Cloud/SaaS abuse and official identity-abuse advisories SRC-094, SRC-095, SRC-098, SRC-099 CISA Microsoft cloud, Scattered Spider, IBM X-Force, and Recorded Future entries added; still avoid treating these as bot-specific.
Bulletproof hosting / adversarial infrastructure measurement SRC-096 Censys RDP/BPH infrastructure source added.
Cybercrime economics and automation cost framing SRC-097, SRC-101, SRC-102 Anderson (SRC-097) gives a non-vendor economics framework; Summers (SRC-101, the Netwrix commentary) is low-priority opinion; the commercial cost stack (SRC-102) gives availability/pricing context.
SaaSification of automated abuse infrastructure SRC-100, SRC-102 Kasada Q1 2026 and commercial automation cost stack added; cite as vendor/market evidence, not independent prevalence.

Appendices

Register taxonomy

category — foundations / vendor / academic / threat-surface (EVIDENCE-REVIEW.md §2).

evidence basis — what kind of evidence the source actually is. This is the column that prevents a marketing claim being treated as equivalent to a study.

  • empirical-academic — controlled study or dataset experiment in a research setting.
  • empirical-operational — measurement against real or purchased traffic / a live honey site.
  • survey — practitioner/executive survey; self-reported.
  • vendor-claim — vendor marketing / efficacy / prevalence claims; vendor-measured, not independently verifiable.
  • capability-doc — product or platform documentation describing what a system exposes or can do.
  • tooling-readme — open-source tool README / docs; maintainer claims, not independent tests.
  • bypass-guide — scraper-side or evasion-side guidance describing ways to avoid blocking or align detection surfaces. High dual-use; cite only at technique-family level, not as a recipe.
  • methods-taxonomy — a categorisation of methods with no reproducible detail.
  • taxonomy — a canonical categorisation of the field (e.g. OWASP OAT).
  • threat-intel — vendor observation/threat reports.
  • legal-record — court filings, indictments, or enforcement actions. Used for technique and operational-proximity evidence only, with actor/campaign attribution stripped (EVIDENCE-REVIEW.md §3).
  • legal-explainer — non-authoritative legal/compliance explainer or guidance source. Treat as context only; load-bearing legal claims require primary law, regulator guidance, specialist legal analysis, or legal records.
  • primary-law / regulator-guidance — primary legal text or regulator guidance used only under the regulatory-constraint lane. Treat as jurisdiction-bound, time-varying, non-load-bearing, and not legal advice.
  • reference-doc — neutral technical reference documentation used for foundations/protocol concepts. Not evidence of abuse, prevalence, or defensive performance.
  • standards-reference / protocol-standard — normative standards or specifications used for control/protocol foundations. Not evidence of abuse or control effectiveness.
  • control-requirements / authentication-guidance / defensive-guidance — defensive requirements or implementation guidance. Useful for control vocabulary; not proof that controls work in a given deployment.
  • browser-architecture reference / architecture analysis — architecture explanations used to understand browser/runtime/client-server boundaries. Not telemetry or detection evidence.
  • empirical-method demonstration — method evaluation in a bounded research setting. Rigour may be high, but operational proximity remains limited unless it measures real-world abuse.
  • empirical-exposure measurement / infrastructure-analysis — measurement of exposed services or infrastructure such as proxies, not necessarily measurement of confirmed abuse.
  • vendor product announcement / tutorial ecosystem — vendor or adjacent ecosystem material documenting available capabilities or workflow culture; treat as capability evidence unless independently measured.
  • review — literature review or survey source synthesising prior work. Use for academic/foundation surveys where the source is not itself measuring live abuse.
  • empirical-measurement — original measurement or dataset paper that measures a relevant phenomenon but not necessarily bot abuse directly.
  • dataset paper — source whose primary contribution is a dataset or dataset-linked measurement.
  • empirical-methods — empirical methods paper or case study evaluating controls/processes in a bounded setting.
  • review-informed case study — case study grounded in prior literature but not designed as a controlled comparative experiment.
  • control-guidance — defensive implementation guidance or checklist. Useful for controls vocabulary, not proof of control efficacy.
  • vulnerability-taxonomy — educational or reference taxonomy of weakness classes; not prevalence evidence.

operational proximity — how close the source sits to observed abuse against a real target. Orthogonal to evidence basis (which records source type/rigour); the two are tracked separately so a rigorous lab result and a vendor blog are not flattened onto one axis. Where a source mixes levels (e.g. a vendor report carrying both capability claims and production telemetry), the cell records the highest level the source independently supports, with a parenthetical caveat.

  • capability — establishes only that a tool, technique, or capability exists or is feasible. Includes documentation, tool READMEs, and controlled academic PoCs — a lab demonstration that an evasion works is still not observation of real-target abuse; its rigour lives in evidence basis, not here.
  • claimed — an interested party asserts the capability is used or works against targets, without independent observation (vendor “we stop X” claims, bypass-vendor “works against Y” claims, self-report surveys).
  • observed — the activity has been seen against a real or realistic target, but not cleanly or independently quantified (vendor telemetry reports — vendor-measured; victim engineering postmortems; enforcement/legal records describing technique).
  • measured — controlled or operational measurement quantifying the activity against a real or realistic target (honey-site experiments; in-the-wild measurement studies).
  • n/a — the source is a taxonomy or a non-bot-use foundation; the axis does not apply.
  • control — the source is control guidance or a requirements standard. It can say what should be built or verified, but not whether the control works against a specific live threat.
  • measured-but-bounded — a controlled benchmark or method evaluation with quantitative results, but outside the project’s core live web-abuse setting.
  • observed-claim / observed-exposure — public testimony, legal/hearing claims, or exposure scans. Useful as operational-proximity evidence, but weaker than independent measurement of confirmed abuse.
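
The “highest level the source independently supports” rule can be sketched as a small ordinal. This is a minimal illustration, assuming the four core levels are ordered capability < claimed < observed < measured as the list above presents them; `Proximity`, `UNRANKED`, and `cell_value` are hypothetical names. The register does not place the remaining tokens on the ordinal, so the sketch leaves them unranked instead of guessing a position for them.

```python
from enum import IntEnum

class Proximity(IntEnum):
    """Core operational-proximity levels, lowest to highest."""
    CAPABILITY = 1
    CLAIMED = 2
    OBSERVED = 3
    MEASURED = 4

# Tokens the register defines without ranking them (assumption: kept off the ordinal).
UNRANKED = {"n/a", "control", "measured-but-bounded", "observed-claim", "observed-exposure"}

def cell_value(supported_levels):
    """Return the highest core level among the levels a source supports."""
    ranked = [Proximity[level.upper()] for level in supported_levels
              if level not in UNRANKED]
    if not ranked:
        raise ValueError("no ranked proximity level supplied")
    return max(ranked).name.lower()

# A vendor report carrying capability claims and production telemetry.
print(cell_value(["capability", "claimed", "observed"]))
```

The parenthetical caveat that accompanies a mixed-level cell is free text and is not modelled here.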

Migrated and pre-v3 rows carry provisional proximity values assigned from the one-line summaries. These values inherit the row’s standing review state, are covered by the same entry-file backfill, and have not yet been reviewed.

Scarce-resource abuse tags — a cross-cutting tag family, not a top-level category. Apply scarce-resource-abuse as the umbrella tag when a source concerns scarce transactional resources, and add the more specific tags supported by the source:

  • slot-sniping
  • limited-inventory
  • appointment-abuse
  • reservation-abuse
  • ticketing-abuse
  • inventory-hoarding
  • denial-of-inventory
  • scalping
  • queue-abuse
  • booking-flow-abuse
  • availability-polling
  • cancellation-monitoring
  • fast-booking
  • auto-booking
  • slot-resale
  • ticket-resale
  • reservation-resale
  • booking-transfer
  • account-preparation

Scarce-resource abuse fields — conditional fields for sources tagged scarce-resource-abuse. They apply only when a source concerns appointment, ticketing, reservation, product-drop, queueing, cancellation-monitoring, booking-flow, inventory-hoarding, or limited-inventory abuse. Availability polling may be scraping-like, but the abuse pattern is competition for a scarce transactional resource, so do not collapse it into generic scraping.

  • scarce_resource_targeted — appointment / ticket / reservation / product / booking / queue position / other.
  • abuse_phase — monitoring / account preparation / queue entry / booking / holding / transfer / resale / no-show / cancellation exploitation.
  • website_facing_action — polling availability / entering queue / solving challenge / completing booking / holding inventory / changing booking / transferring booking / reselling.
  • evidence_of_use — measured-use / observed-use / vendor-measured / legal-record / regulatory-record / market-evidence / capability-only / controlled-PoC. This is the scarce-resource-specific use classification; it does not replace operational proximity, which remains the broader corpus-level capability-to-use axis.
  • abuse_outcome — ordinary users blocked / inventory unavailable / inflated resale price / no-show / degraded fairness / distorted metrics / operational load.
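
Because these fields are controlled vocabularies and index cells combine tokens with “ / ”, a cell can be checked mechanically. The following is a minimal sketch, assuming cells are split on that separator; `unknown_tokens`, `ABUSE_PHASE`, and `EVIDENCE_OF_USE` are hypothetical names, and the two vocabularies are copied from the field definitions above.

```python
# Vocabularies copied from the abuse_phase and evidence_of_use definitions.
ABUSE_PHASE = {
    "monitoring", "account preparation", "queue entry", "booking", "holding",
    "transfer", "resale", "no-show", "cancellation exploitation",
}
EVIDENCE_OF_USE = {
    "measured-use", "observed-use", "vendor-measured", "legal-record",
    "regulatory-record", "market-evidence", "capability-only", "controlled-PoC",
}

def unknown_tokens(cell, vocabulary):
    """Return the tokens in a ' / '-separated cell that the vocabulary lacks."""
    tokens = [token.strip() for token in cell.split(" / ")]
    return [token for token in tokens if token and token not in vocabulary]

# abuse_phase cells as they appear in the index rows for SRC-035 and SRC-100.
src_035 = "monitoring / booking / holding / resale / cancellation exploitation"
src_100 = "account preparation / monitoring / checkout / resale"
print(unknown_tokens(src_035, ABUSE_PHASE))
print(unknown_tokens(src_100, ABUSE_PHASE))
```

For the SRC-100 cell the check reports `checkout`, which the abuse_phase vocabulary does not list. Whether to extend the vocabulary or normalise the cell is an editorial decision the sketch does not make.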

Regulatory-constraint tag and fields — conditional vocabulary for sources admitted under EVIDENCE-REVIEW.md §2.6. This is a cross-cutting lane, not a category. Apply regulatory-constraint only when the source is being read for how a rule constrains a technique family.

  • jurisdiction — UK / EU / US / Canada / Australia / other.
  • currency — free text; must include an as-of date and the caveat “subject to change; not verified current”. Honest token if unknown: “as-of unknown — verify before use”.
  • constrains_technique — the technique family the rule bears on, preferably matching an existing signals-and-techniques cross-index row such as browser fingerprinting, cookies/session persistence, behavioural signals, scraping/access control, or AI crawlers/content-access governance.
  • operational proximity — always n/a for regulatory-constraint entries. These sources explain technique-deployment constraints; they are not evidence of abuse prevalence, bot behaviour, or control effectiveness.

provenance — extraction agent and model, from the entry’s run-metadata block (e.g. Claude Code / Opus 4.8). Shown as “not recorded” for rows migrated before the v2 prompt. Where a source has several extraction files (different prompt versions or agents), provenance lists each.

reconciliation — whether a source’s row points at one extraction or several. Tagged in the entry file cell.

  • [single] — one extraction file.
  • [multiple — unreconciled] — two or more extraction files of the same source (e.g. <slug>.md + <slug>.v2.md, or two agents) not yet reconciled. Canonical for citation = latest version.
  • [combined] — a <slug>.combined.md reconciliation exists. Canonical for citation = the combined file; the source extractions are kept and listed.
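
The three reconciliation states and their canonical-citation rules can be sketched as one function. This is a hedged illustration, not project tooling: `reconcile` is a hypothetical name, the returned keys `single`, `multiple`, and `combined` are shorthand for the three tags above, and the sketch assumes versions are encoded as a `.vN.md` suffix with an unsuffixed file counting as version 1. It does not handle the two-agent case, where no version number distinguishes the files.

```python
import re

def reconcile(files):
    """Return (state, canonical file) for one source's extraction files."""
    if not files:
        raise ValueError("a register row needs at least one extraction file")
    combined = [name for name in files if name.endswith(".combined.md")]
    if combined:
        # A combined reconciliation exists: cite it, keep the others listed.
        return "combined", combined[0]
    if len(files) == 1:
        return "single", files[0]

    def version(name):
        match = re.search(r"\.v(\d+)\.md$", name)
        return int(match.group(1)) if match else 1  # assumption: no suffix means v1

    # Several unreconciled extractions: cite the latest version.
    return "multiple", max(files, key=version)

print(reconcile(["example-source.md", "example-source.v2.md"]))
```

The file names in the usage line are placeholders that follow the `<slug>.md` pattern and do not refer to real entries.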

review state

  • solid — extraction reviewed; sufficient for register use and citation.
  • conditional — usable for cautious register reference, but check the entry before quoting numbers, equations, or specific claims.
  • needs review — do not use without reading the entry / source.
  • migrated — review pending — carried over from the flat register; not yet reviewed against its entry file.

threat types — OWASP OAT categories where they map, else project vocabulary (scraping, credential stuffing, scalping, account takeover, click/ad fraud, carding). Not threat-specific for method/infrastructure-only sources.

Update log

Append-only. New entries at the bottom.

2026-06-02 — Register schema v2; migrated from working/reading-register.md. Replaced the flat four-column register (Reference / Status / Entry / Notes) with a structured projection of the extraction fields. Added: evidence basis, provenance, review state, and threat types columns to the inventory; a framing-distance ledger; a signals-and-techniques cross-index; and this controlled-vocabulary appendix. 26 in-scope sources (SRC-001…SRC-026) and 2 rejected (SRC-R01, SRC-R02) migrated. Provenance is not recorded for all migrated rows because the prior register did not track extraction agent/model; the v2 extraction prompt records it going forward. Framing-distance “what it fails to represent”, threat types, and several evidence basis/signals cells are stubbed “tbd — backfill from entry” pending a read of working/register-entries/.

2026-06-05 — Added SRC-027…SRC-031 from reviewed extraction entries. Added OWASP Automated Threat Handbook v1.3 (SRC-027, provenance not recorded, provisional draft), Medium Playwright cookies tutorial (SRC-028, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), RoundProxies Rnet tutorial (SRC-029, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), ScrapFly Cloudflare Turnstile post (SRC-030, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2), and ScrapFly Imperva/Incapsula post (SRC-031, ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v2). Added corresponding framing-distance rows and cross-index memberships; removed the now-stale queued OWASP Handbook row after representing it as SRC-027. Review state for all five is needs review; SRC-027 specifically requires review because its entry was produced without access to the repo scope docs.

2026-06-06 — Schema v3: added operational proximity axis; legal-record evidence basis. Formalised the capability-vs-use distinction — previously implicit in framing-distance prose and the threat-surface table note — into a queryable ordinal (capability / claimed / observed / measured / n/a), orthogonal to evidence basis, positioned after it in every inventory table. Added legal-record as an evidence basis for enforcement/court sources, admitted under a strict technique-not-attribution rule and a dual-use no-recipe rule (EVIDENCE-REVIEW.md §3; editorial enforcement in GOVERNANCE.md §4/§7). The source-extraction-prompt (v3) now emits the proximity field; register-update-prompt projects it. Proximity values across SRC-001…031 were assigned provisionally from the one-line summaries and inherit each row’s review state. First pass: the corpus concentrates at capability / claimed; observed is vendor-measured telemetry only (SRC-004, SRC-009, SRC-010); measured is essentially SRC-015 (honey-site) plus SRC-018 (weak labels). The five sources added 2026-06-05 sit at capability (SRC-027 taxonomy → n/a, SRC-028, SRC-029) and claimed (SRC-030, SRC-031 — the named-defender bypass writeups). In short the register evidences capability and market existence far more strongly than real-world prevalence — closing that is the new observed-use reading lane (Queued; EVIDENCE-REVIEW.md §2.5, §4).

2026-06-06 — Schema v3 extension: added scarce-resource abuse tags and conditional fields. Added scarce-resource-abuse as a cross-cutting umbrella tag, the specific tag vocabulary for appointment/ticketing/reservation/product-drop/queueing/booking/inventory abuse, and a conditional scarce-resource abuse index carrying scarce_resource_targeted, abuse_phase, website_facing_action, evidence_of_use, and abuse_outcome. This is schema support only; no source rows were added or reclassified.

2026-06-06 — Added SRC-032…SRC-040 from reviewed v3 extraction entries. Added Wikimedia crawler infrastructure account (SRC-032, threat-surface, Claude / Claude Opus 4.8 / source-extraction-prompt v3), Wang et al. FP-Agent (SRC-033, academic, Claude / Claude Opus 4.8 / source-extraction-prompt v3), F5 Labs credential-stuffing report (SRC-034, vendor, Claude / Claude Opus 4.8 / source-extraction-prompt v3), DVSA driving-test bot/resale post (SRC-035, threat-surface, Codex / GPT-5 / source-extraction-prompt v3), HUMAN OpenClaw (SRC-036), HUMAN Agentic Visibility (SRC-037), HUMAN State of Agentic Traffic May 2026 (SRC-038), Thales / Imperva 2026 Bad Bot Report (SRC-039), and Akamai financial-services security trends (SRC-040) (the HUMAN/Thales/Akamai entries from ChatGPT / GPT-5.5 Thinking / source-extraction-prompt v3). All are new distinct sources and [single] extractions. Added framing-distance rows, updated cross-index memberships, introduced API abuse / API endpoint visibility, Credential / account-abuse signals, and Scarce-resource / inventory-abuse signals, and added scarce-resource rows for SRC-035 plus SRC-039. Flags: SRC-032 uses threat-surface as a least-bad category for first-party operator evidence; SRC-039 has scarce-resource coverage but no dedicated scarce-resource block in the entry, so review the projected scarce-resource fields before relying on them; several supplied filenames retain (1) suffixes from the uploaded extraction files.

2026-06-06 — Added SRC-041…SRC-053 from reviewed v3 extraction entries. Added ScrapingBee scraper test sites (SRC-041, foundations), ScrapingBee legal guidelines (SRC-042, threat-surface after normalising the entry’s non-schema governance category), ScrapingBee advanced scraping techniques (SRC-043), ScrapingBee price-scraping tools (SRC-044), ScrapingBee PerimeterX/HUMAN bypass guide (SRC-045), niespodd browser-fingerprinting / anti-detection README (SRC-046), Martínez Llamas et al. GDPR/AI Act bot-detection review (SRC-047), Wardle honey-identity leaked-credential experiment (SRC-048), DataDome ticket-bot explainer (SRC-049), FTC first BOTS Act enforcement cases (SRC-050), StopBadBots SBB-WAF-Rules (SRC-051), Hamachek bad-ASN-list (SRC-052), and ScrapingBee web-scraping API product page (SRC-053). Added corresponding framing-distance rows, cross-index memberships, and scarce-resource rows for SRC-049 plus SRC-050. Schema note: added legal-explainer as an evidence-basis token for non-authoritative legal/compliance explainers. Normalisation flags: SRC-041 proximity was entered as low and has been normalised to capability (training / sandbox); SRC-042 proximity was entered as context and has been normalised to n/a (legal context; not use evidence); SRC-052 remains observed but explicitly at the anecdotal/first-party floor. Several entries are high dual-use scraper-side sources and should be cited only at technique-family level, not as operational recipes.

2026-06-06 — Added SRC-054…SRC-065 from reviewed v3 extraction entries; updated SRC-003 as a re-extraction. Added Cloudflare Block AI Bots (SRC-054), Cloudflare Turnstile (SRC-055), Cloudflare Detection IDs (SRC-056), Cloudflare bot detection engines (SRC-057), Cloudflare bot solutions overview (SRC-058), MDN CORS (SRC-059), MDN HTTP caching (SRC-060), MDN HTTP authentication (SRC-061), MDN HTTP cookies (SRC-062), MDN User-Agent header (SRC-063), MDN HTTP headers (SRC-064), and MDN Overview of HTTP (SRC-065). Treated the new Cloudflare Bot Management entry as a re-extraction of existing SRC-003 rather than a new source row: the existing migrated row now lists both the legacy extraction file and the v3 extraction file as [multiple — unreconciled], with the v3 file as canonical pending reconciliation. Added framing-distance rows, cross-index memberships, and two new cross-index families (AI crawlers / content-access governance; HTTP foundations / web basics). Added reference-doc as an evidence-basis token for neutral technical foundation material. Normalisation flag: all MDN entries supplied foundational as proximity; normalised to n/a (foundational reference) because operational proximity is not applicable to non-abuse protocol references.

2026-06-06 — Added SRC-066…SRC-072 from reviewed v3 extraction entries; updated SRC-047 as a re-extraction. Added Laperdrix et al. browser-fingerprinting survey (SRC-066), Berke et al. browser-fingerprinting demographics/dataset paper (SRC-067), Sudbury & Marks online-survey bots / bad-data case study (SRC-068), and four PortSwigger Web Security Academy authentication/OAuth/password-login foundation entries (SRC-069…SRC-072). Attached martinez-llamas-2025-web-bot-detection-privacy-gdpr-ai-act-review(1).md as a re-extraction of existing SRC-047 rather than creating a duplicate row. Normalisation flags: foundational in the Laperdrix entry is represented as n/a (foundational survey); PortSwigger educational/security guidance is placed under Foundations rather than Vendor because it is used as worked reference material, not vendor evidence. Updated framing ledger, cross-index, and resolved queue notes.

2026-06-07 — Added SRC-073…SRC-093 from reviewed extraction entries. Added NIST SP 800-63B-4 authentication/authenticator-management standard (SRC-073), Sajid et al. hooking-based deception preprint (SRC-074), Cao browser-principal dissertation (SRC-075), Xu et al. layered obfuscation taxonomy (SRC-076), Tschacher bot-detection architecture explainer (SRC-077), Sudhir et al. Pushan deobfuscation preprint (SRC-078), DataDome VM-based obfuscation announcement (SRC-079), Ticketmaster v. Prestige legal/settlement source family (SRC-080), U.S. Senate Ticketmaster/Taylor Swift hearing source family (SRC-081), Kolhar & Sridevi AI/ML cybersecurity background (SRC-082), Kolhar & Gundoor API security chapter (SRC-083), commercial CAPTCHA-solving API ecosystem (SRC-084), Chen et al. ReCAP GUI-agent CAPTCHA paper (SRC-085), Bitsight OpenClaw exposure measurement (SRC-086), Infatica P2B SDK residential proxy source (SRC-087), Bright Data residential proxy market source (SRC-088), Choi et al. proxy-ecosystem measurement (SRC-089), RFC 9113/9114 HTTP/2 and HTTP/3 protocol foundations (SRC-090), Chromium multi-process architecture (SRC-091), RFC 7540 historical HTTP/2 standard (SRC-092), and OWASP ASVS 5.0 (SRC-093). Added corresponding framing-distance rows, cross-index memberships, new cross-index families for client-side obfuscation, residential/peer proxies, CAPTCHA-solving, ticket-bot legal/hearing evidence, browser architecture, and API/business-logic controls, plus scarce-resource rows for SRC-080, SRC-081, SRC-088, and SRC-093. Normalisation flags: RFC 7540 is retained as historical/obsolete because RFC 9113 supersedes it for current HTTP/2 claims; broad AI/ML cybersecurity background (SRC-082) is low-priority and should not carry bot-specific claims; the two Ticketmaster rows are legal/hearing evidence and should be cited as allegations/testimony rather than independent technical measurement. 

2026-06-07 — Added SRC-094…SRC-102 from the cloud/SaaS, adversarial-infrastructure, cost-economics, and commercial automation batch. Added CISA Microsoft cloud post-compromise advisory (SRC-094), FBI/CISA Scattered Spider advisory (SRC-095), Censys bulletproof-hosting/RDP infrastructure measurement (SRC-096), Anderson et al. cybercrime-cost framework (SRC-097, duplicate upload deduped), IBM X-Force Threat Intelligence Index 2026 (SRC-098, raw TXT/HTML source family; extraction still needed), Recorded Future cloud/SaaS abuse landscape (SRC-099), Kasada Q1 2026 threat-enablers report (SRC-100), Summers/Netwrix AI attacker-cost commentary (SRC-101), and the commercial automation cost-stack cluster (SRC-102). Added framing-distance rows, cross-index updates, new cross-index families for cloud/SaaS identity abuse, threat infrastructure, official advisories, and security/cost economics, plus scarce-resource rows for SRC-100 and SRC-102. Normalisation flags: EVIDENCE-REVIEW(6).md was treated as a scope document rather than a source row; Anderson (1)/(2) are identical and were deduped; IBM TXT/HTML files were represented as one provisional raw-source row rather than a reviewed extraction entry.

2026-06-12 — Schema v3 extension: added regulatory-constraint lane vocabulary. Added regulatory-constraint as a cross-cutting tag for sources read only as constraints on technique deployment, plus conditional fields jurisdiction, currency, and constrains_technique. Added primary-law / regulator-guidance evidence-basis tokens for primary legal text and regulator guidance under this lane. Regulatory-constraint entries are always operational proximity: n/a, jurisdiction-bound, time-varying, non-load-bearing, and not legal advice.
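
The invariants in the entry above could be checked mechanically. The following is a minimal sketch, assuming each regulatory-constraint entry is held as a small record with free-text fields. The `RegulatoryConstraintEntry` class, the `validate` function, and the sample values are all hypothetical; only the field names `jurisdiction`, `currency`, and `constrains_technique` and the fixed n/a proximity come from the log.

```python
from dataclasses import dataclass


@dataclass
class RegulatoryConstraintEntry:
    """Hypothetical record for one regulatory-constraint row."""
    source_id: str
    jurisdiction: str
    currency: str              # assumed free text, e.g. an "as of" note
    constrains_technique: str
    proximity: str = "n/a"     # the log states this is always n/a


def validate(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    if entry.proximity != "n/a":
        problems.append("proximity must be n/a for regulatory-constraint entries")
    # The three conditional fields must all be filled in for this lane.
    for name in ("jurisdiction", "currency", "constrains_technique"):
        if not getattr(entry, name).strip():
            problems.append(f"missing conditional field: {name}")
    return problems


# Placeholder values only; "SRC-XXX" is not a real register row.
example = RegulatoryConstraintEntry(
    source_id="SRC-XXX",
    jurisdiction="EU",
    currency="as of 2026-06",
    constrains_technique="browser fingerprinting",
)
```

Calling `validate(example)` returns an empty list, while an entry with a blank jurisdiction or a proximity other than n/a returns one message per problem.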

 

Written content: CC BY 4.0  |  Code: MIT