SYSTEM ARCHITECTURE

INTEL DATA PIPELINE

Automated multi-stage intelligence pipeline running on Cloudflare's global edge. Collects, deduplicates, classifies, fact-checks, and indexes conflict news every hour.

8 STAGES

3 AI MODELS

1h CYCLE TIME

0.85 DEDUP SCORE

6 REST APIs

SYSTEM ARCHITECTURE

EXTERNAL SERVICES

Gemini 3.1 Pro

Google Search grounding

Brave Search

Independent web index

CLOUDFLARE EDGE

Workers

API + Cron runtime

Workflows

Durable pipeline execution

Workers AI

BGE + Llama 3.1-8B

D1 SQL

SQLite edge DB

Vectorize

384-dim vector index

FRONTEND

React 19 SPA

war.trackit.today

Polls /api/intel every 30s

localStorage cache

PWA offline fallback

PIPELINE FLOW

Each run is a durable Cloudflare Workflow instance. Steps checkpoint automatically — if a step fails mid-run, only that step retries (not the entire pipeline).

CRON TRIGGER

Cloudflare Workers

A scheduled trigger fires every hour at :00 via Cloudflare's built-in cron system. On-demand runs are also supported through a webhook (POST /api/cron/trigger with a Bearer token). Either path kicks off a new durable Workflow instance.

CRON: 0 * * * * · EVERY HOUR · WEBHOOK MANUAL

↗ OUT: Workflow start signal
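The Bearer-token check on the on-demand trigger could look roughly like the sketch below. The helper name `isAuthorizedTrigger` and the way the expected token is passed in are assumptions for illustration; the project's actual handler is not shown in this document.

```typescript
// Hypothetical helper: validates the Authorization header sent to
// POST /api/cron/trigger. Name and signature are illustrative assumptions.
function isAuthorizedTrigger(
  authHeader: string | null,
  expectedToken: string,
): boolean {
  if (!authHeader || expectedToken.length === 0) return false;
  const [scheme, token, ...rest] = authHeader.trim().split(/\s+/);
  if (!scheme || !token || rest.length > 0) return false;
  if (scheme.toLowerCase() !== "bearer") return false;
  if (token.length !== expectedToken.length) return false;
  // Compare every character instead of returning on the first mismatch.
  let diff = 0;
  for (let i = 0; i < token.length; i++) {
    diff |= token.charCodeAt(i) ^ expectedToken.charCodeAt(i);
  }
  return diff === 0;
}
```

A handler would call this with the incoming `Authorization` header and reject the request with a 401 when it returns false.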

NEWS COLLECTION

Brave Search + Gemini 3 Flash

Primary: 3–4 parallel Brave Search queries (including a dynamic query built from recent critical headlines) return real-time web results, which Gemini 3 Flash structures into a JSON article array. Fallback: if Brave returns empty, Gemini uses Google Search grounding directly. Articles are tagged sourceType 'brave' or 'gemini' for downstream routing.

BRAVE SEARCH: 3-4 PARALLEL QUERIES · GEMINI 3 FLASH (PRIMARY) · GEMINI 3.1 FLASH LITE (FALLBACK) · THINKING: LOW

↘ IN: Structured prompt
↗ OUT: ~14 raw articles
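The primary/fallback routing and the sourceType tagging described above can be sketched as follows. The base query strings, the choice of two headlines for the dynamic query, and the `Collector` abstraction (which stands in for "Brave results structured by Gemini" and "Gemini with Search grounding") are all assumptions made for illustration.

```typescript
type SourceType = "brave" | "gemini";

interface RawArticle {
  title: string;
  url: string;
  sourceType: SourceType;
}

// Placeholder queries: the real query strings are not given in this document.
const BASE_QUERIES = [
  "Iran Israel conflict news",
  "Strait of Hormuz shipping",
  "Iran nuclear talks",
];

// Three fixed queries plus one dynamic query when critical headlines exist.
function buildQueries(criticalHeadlines: string[]): string[] {
  const queries = [...BASE_QUERIES];
  const dynamic = criticalHeadlines.slice(0, 2).join(" ").trim();
  if (dynamic) queries.push(dynamic);
  return queries;
}

type Collector = (queries: string[]) => Promise<Array<{ title: string; url: string }>>;

// Try the Brave-backed collector first; fall back to Gemini only when it is empty.
async function collect(
  queries: string[],
  brave: Collector,
  gemini: Collector,
): Promise<RawArticle[]> {
  const fromBrave = await brave(queries);
  if (fromBrave.length > 0) {
    return fromBrave.map((a) => ({ ...a, sourceType: "brave" as const }));
  }
  const fromGemini = await gemini(queries);
  return fromGemini.map((a) => ({ ...a, sourceType: "gemini" as const }));
}
```

Tagging at collection time means later stages can route on `sourceType` without re-deriving where an article came from.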

EMBEDDING GENERATION

Workers AI — BGE-small-en-v1.5

Each article's title + first 500 characters are vectorized into a 384-dimensional dense embedding. Runs entirely on Cloudflare's edge inference — zero external network calls, sub-second latency for the full batch. Step timeout: 30 seconds.

@cf/baai/bge-small-en-v1.5 · 384-DIM VECTORS · BATCH INFERENCE

↘ IN: 14 raw articles
↗ OUT: 14 × 384-dim vectors
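A minimal sketch of the batch embedding step, under two assumptions: "first 500 characters" is read as the first 500 characters of the article body, and the title and body are joined with a newline. The `EmbeddingBinding` interface models only the part of the Workers AI binding used here, so the function can be exercised with a stub.

```typescript
interface Article {
  title: string;
  content: string;
}

// Narrow view of the Workers AI binding (assumption: only text embedding is modelled).
interface EmbeddingBinding {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}

const EMBED_MODEL = "@cf/baai/bge-small-en-v1.5";
const EMBED_DIMS = 384;

// Title plus the first 500 characters of the body; the separator is an assumption.
function embeddingText(a: Article): string {
  return `${a.title}\n${a.content.slice(0, 500)}`;
}

// One inference call for the whole batch, with a shape check on the result.
async function embedBatch(ai: EmbeddingBinding, articles: Article[]): Promise<number[][]> {
  if (articles.length === 0) return [];
  const { data } = await ai.run(EMBED_MODEL, { text: articles.map(embeddingText) });
  if (data.length !== articles.length || data.some((v) => v.length !== EMBED_DIMS)) {
    throw new Error("unexpected embedding shape");
  }
  return data;
}
```

Inside a Worker, `ai` would be the AI binding from the environment (for example `env.AI`, where the binding name is whatever the project configured).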

DEDUPLICATION

Cloudflare Vectorize

Each new embedding is queried against the persistent Vectorize index (growing over time). Articles scoring ≥ 0.85 cosine similarity to any previously seen article are silently dropped as duplicates. Only genuinely novel events advance. Step timeout: 15 seconds.

COSINE SIMILARITY · THRESHOLD: 0.85 · TOP-K: 1 QUERY

↘ IN: 14 articles + vectors
↗ OUT: 2–5 unique articles
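The threshold rule can be illustrated with a small in-memory stand-in for the index. This is a sketch of the decision logic only: the real pipeline queries Vectorize (an approximate nearest-neighbour index) rather than scanning an array, and the function names here are hypothetical.

```typescript
const DEDUP_THRESHOLD = 0.85;

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Returns the indexes of candidates whose best match in `seen` is below the threshold.
// Mirrors a top-K = 1 query: only the single closest stored vector matters.
function novelIndexes(candidates: number[][], seen: number[][]): number[] {
  const novel: number[] = [];
  candidates.forEach((vec, i) => {
    let best = -1;
    for (const s of seen) best = Math.max(best, cosine(vec, s));
    if (best < DEDUP_THRESHOLD) novel.push(i);
  });
  return novel;
}
```

For example, with one stored vector `[1, 0.1]`, the candidate `[1, 0]` scores about 0.995 and is dropped, while `[0, 1]` scores about 0.1 and advances.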

CLASSIFY + SUMMARIZE

Workers AI — Llama 3.1-8B Instruct

Each novel article is classified and summarized in English and Chinese in a single LLM call. The call returns: category (Military / Diplomatic / Nuclear / Economic), severity rating (1–5), tag array, English summary, Chinese summary (summary_zh), Chinese title (title_zh), and event_type. Step timeout: 5 minutes.

@cf/meta/llama-3.1-8b-instruct · TEMP: 0.2 · MAX 2048 TOKENS · JSON OUTPUT

↘ IN: Novel articles
↗ OUT: Classified + translated
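Because the classifier returns JSON from a small model, its output usually needs validating before use. The sketch below shows one way to do that against the fields listed above; the function name, the brace-extraction step, and the choice to reject (rather than repair) invalid output are assumptions, not the project's documented behaviour.

```typescript
const CATEGORIES = ["Military", "Diplomatic", "Nuclear", "Economic"] as const;
type Category = (typeof CATEGORIES)[number];

interface Classification {
  category: Category;
  severity: number; // integer 1-5
  tags: string[];
  summary: string;
  summary_zh: string;
  title_zh: string;
  event_type: string;
}

// Parses raw LLM text into a Classification, or returns null if it is unusable.
function parseClassification(raw: string): Classification | null {
  // Models sometimes wrap JSON in prose, so take the outermost braces.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start < 0 || end <= start) return null;
  let obj: Record<string, unknown>;
  try {
    obj = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
  const category = CATEGORIES.find((c) => c === obj.category);
  const severity = Number(obj.severity);
  if (!category || !Number.isInteger(severity) || severity < 1 || severity > 5) {
    return null;
  }
  const str = (v: unknown): string => (typeof v === "string" ? v : "");
  const tags = Array.isArray(obj.tags)
    ? obj.tags.filter((t): t is string => typeof t === "string")
    : [];
  return {
    category,
    severity,
    tags,
    summary: str(obj.summary),
    summary_zh: str(obj.summary_zh),
    title_zh: str(obj.title_zh),
    event_type: str(obj.event_type),
  };
}
```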

FACT CORROBORATION

Brave Search API

A targeted Brave Search query (article title + 'Iran Israel US military 2026') is executed for each article — but only for RSS and Gemini-sourced articles. Brave-sourced articles skip this step: querying Brave to corroborate an article that already came from Brave would be circular and waste API quota.

BRAVE SEARCH API · LIVE WEB INDEX · INDEPENDENT SOURCES

↘ IN: Article title + key terms
↗ OUT: Independent news sources
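The routing rule and the query construction can be sketched as below. The `count=8` and `freshness=pw` values are taken from the Brave Search API entry in the data-sources section of this page; the type and function names are hypothetical, and the request itself (which needs the API key in the `X-Subscription-Token` header) is left out.

```typescript
type SourceType = "brave" | "gemini" | "rss";

interface Candidate {
  title: string;
  sourceType: SourceType;
}

const QUERY_SUFFIX = "Iran Israel US military 2026";

// Brave-sourced articles skip corroboration to avoid a circular check.
function needsCorroboration(a: Candidate): boolean {
  return a.sourceType !== "brave";
}

// Builds the web-search URL: article title + fixed key terms, past week, up to 8 results.
function corroborationUrl(a: Candidate): string {
  const q = encodeURIComponent(`${a.title} ${QUERY_SUFFIX}`);
  return `https://api.search.brave.com/res/v1/web/search?q=${q}&count=8&freshness=pw`;
}
```

A caller would filter the batch with `needsCorroboration` first, so quota is spent only on RSS- and Gemini-sourced articles.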

FACT-CHECK

Workers AI — Llama 3.1-8B Instruct

LLM cross-references the original article against Brave Search results. Returns: status (verified / uncertain / disputed), confidence score (0–100), human-readable verification notes, and corroborating URLs. Disputed articles are silently dropped from the pipeline — never stored. Step timeout: 3 minutes.

VERIFIED | UNCERTAIN | DISPUTED · CONFIDENCE 0–100 · JSON OUTPUT

↘ IN: Article + web sources
↗ OUT: Verified articles only
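The drop rule for disputed articles amounts to a filter over the fact-check results. A minimal sketch follows; the type names are hypothetical, and clamping the confidence score to 0–100 is an added safeguard assumed here rather than something the source describes.

```typescript
type FactStatus = "verified" | "uncertain" | "disputed";

interface FactCheck {
  status: FactStatus;
  confidence: number; // 0-100
  notes: string;
  urls: string[];
}

interface Checked<T> {
  article: T;
  check: FactCheck;
}

// Keeps verified and uncertain results; disputed ones never reach storage.
function keepForStorage<T>(results: Checked<T>[]): Checked<T>[] {
  return results
    .filter((r) => r.check.status !== "disputed")
    .map((r) => ({
      ...r,
      check: {
        ...r.check,
        // Assumed safeguard: force the model's score into the documented range.
        confidence: Math.min(100, Math.max(0, Math.round(r.check.confidence))),
      },
    }));
}
```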

STORE + INDEX

D1 Database + Vectorize

Verified articles are written to D1 with all metadata. Embeddings are upserted to Vectorize so future runs can deduplicate against them. A live_events record is created (or replaced) for the Timeline view — INSERT OR REPLACE ensures re-processed articles get updated translations without leaving stale entries. Data is immediately available via the /api/intel REST endpoint. Step timeout: 30 seconds.

CLOUDFLARE D1 (SQLITE) · VECTORIZE UPSERT · LIVE EVENTS TABLE · /api/intel

↘ IN: Verified articles
↗ OUT: Persisted + indexed + served
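The INSERT OR REPLACE behaviour for live_events might be expressed as the statement below. The column list is an assumption based on the fields named for the /api/live-events endpoint (date, type, severity, summary) plus an `id` primary key; the real table schema is not shown in this document.

```typescript
interface LiveEvent {
  id: string; // assumed primary key, so a re-processed article replaces its old row
  date: string;
  type: string;
  severity: number;
  summary: string;
}

// SQLite replaces the existing row when the primary key already exists.
const UPSERT_LIVE_EVENT =
  "INSERT OR REPLACE INTO live_events (id, date, type, severity, summary) " +
  "VALUES (?, ?, ?, ?, ?)";

// Bind parameters in the same order as the column list above.
function liveEventParams(e: LiveEvent): [string, string, string, number, string] {
  return [e.id, e.date, e.type, e.severity, e.summary];
}

// With a D1 binding (binding name assumed to be DB), this would run roughly as:
//   await env.DB.prepare(UPSERT_LIVE_EVENT).bind(...liveEventParams(event)).run();
```

Keying the replace on a stable identifier is what lets updated translations overwrite the earlier entry instead of accumulating duplicates.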

TECHNOLOGY STACK

Cloudflare Workers

RUNTIME

Serverless compute at the edge. The entire backend runs as a single Workers bundle — zero server management, global distribution.

V8 Isolate · Edge CDN · < 1ms cold start · Cron + Webhook

Cloudflare Workflows

ORCHESTRATION

Long-running durable pipeline with automatic step-level retry and checkpointing. If a step fails, only that step retries — not the whole pipeline.

Durable execution · Auto-retry · Step checkpointing · Up to 1yr lifespan

Workers AI

EDGE INFERENCE

On-edge ML inference runs directly inside the Worker runtime. No external API calls for embeddings or classification — single-digit millisecond overhead.

On-edge ML · BGE embeddings · Llama 3.1-8B · Zero network hop

Cloudflare Vectorize

VECTOR DATABASE

Persistent vector database grows with every processed article. Enables semantic deduplication — not just exact-match but concept-level duplicate detection.

Persistent index · 384-dim BGE · Cosine similarity · ANN queries

Cloudflare D1

SQL DATABASE

Edge-native SQLite. Stores all verified articles, analytics events, live timeline events, and oil/market data snapshots.

SQLite-compatible · Edge-native SQL · SYD replica · REST API

Gemini 3.1 Pro

NEWS LLM

Google's frontier model with real-time web access for news collection. Thinking mode produces higher-quality article analysis and structured JSON output.

Google Search grounding · Thinking: HIGH · 50s timeout · 3-model fallback

Brave Search

FACT SEARCH

Independent search index (not Google/Bing-derived) used for fact corroboration. Provides unbiased, real-time corroborating sources for each article.

Independent index · No SEO spam · Real-time results · REST API

React 19 + Vite

FRONTEND

SPA polls /api/intel every 30 seconds. Articles cached in localStorage for offline resilience. PWA-enabled — installable as a native-like app.

React 19 · Vite 6 · Lazy routes · 30s polling
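The poll-then-cache behaviour of the frontend can be sketched independently of React. In the sketch below the storage and the fetch function are injected so the logic runs anywhere; the cache key and function names are hypothetical.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CACHE_KEY = "intel-cache"; // hypothetical key name

// Fetch fresh articles and cache them; on failure, serve the last cached copy.
async function loadArticles<T>(
  fetchJson: () => Promise<T[]>,
  store: KeyValueStore,
): Promise<{ articles: T[]; fromCache: boolean }> {
  try {
    const articles = await fetchJson();
    store.setItem(CACHE_KEY, JSON.stringify(articles));
    return { articles, fromCache: false };
  } catch {
    const cached = store.getItem(CACHE_KEY);
    return { articles: cached ? (JSON.parse(cached) as T[]) : [], fromCache: true };
  }
}
```

In the browser this would be driven by a 30-second timer, with `fetchJson` wrapping a request to /api/intel and `store` set to `localStorage`.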

ARTICLE DATA SCHEMA

D1 table articles — each row is a verified, deduplicated, AI-processed news article.

| FIELD | TYPE | DESCRIPTION |
| --- | --- | --- |
| `id` | TEXT (UUID) | Unique article identifier |
| `title` | TEXT | Original article headline |
| `content` | TEXT | Full article body text from source |
| `summary` | TEXT | AI-generated English summary (2–3 sentences) |
| `summary_zh` | TEXT | AI-generated Traditional Chinese summary |
| `title_zh` | TEXT | AI-translated Chinese headline |
| `category` | TEXT | Military \| Diplomatic \| Nuclear \| Economic |
| `event_type` | TEXT | airstrike \| missile \| diplomatic \| nuclear \| sanction \| ... |
| `tags` | JSON [] | Array of keyword tags (e.g. ['Iran', 'IRGC', 'nuclear']) |
| `severity` | INT 1–5 | Geopolitical significance score (5 = war-changing) |
| `source` | TEXT | News outlet name (e.g. AP News, Al Jazeera) |
| `source_url` | TEXT | Original article URL |
| `published_at` | DATETIME | Original publication timestamp |
| `fact_check_status` | TEXT | verified \| uncertain (disputed = dropped) |
| `fact_check_notes` | TEXT | AI verification reasoning and caveats |
| `brave_sources` | JSON [] | Corroborating URLs from Brave Search |
| `created_at` | DATETIME | Database insertion timestamp |
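For consumers of this table, the schema maps onto a pair of types: the row as stored, and the decoded form with the JSON columns parsed. The sketch assumes the JSON columns come back as text (usual for SQLite) and that timestamps are ISO strings; the type and function names are hypothetical.

```typescript
// Row shape as stored: JSON columns are assumed to arrive as serialized text.
interface ArticleRow {
  id: string;
  title: string;
  content: string;
  summary: string;
  summary_zh: string;
  title_zh: string;
  category: "Military" | "Diplomatic" | "Nuclear" | "Economic";
  event_type: string;
  tags: string;
  severity: number;
  source: string;
  source_url: string;
  published_at: string;
  fact_check_status: "verified" | "uncertain";
  fact_check_notes: string;
  brave_sources: string;
  created_at: string;
}

// Decoded shape handed to application code.
type Article = Omit<ArticleRow, "tags" | "brave_sources"> & {
  tags: string[];
  brave_sources: string[];
};

// Parses a JSON text column into a string array; malformed input yields [].
function parseStringArray(text: string): string[] {
  try {
    const value: unknown = JSON.parse(text);
    return Array.isArray(value)
      ? value.filter((v): v is string => typeof v === "string")
      : [];
  } catch {
    return [];
  }
}

function decodeArticle(row: ArticleRow): Article {
  return {
    ...row,
    tags: parseStringArray(row.tags),
    brave_sources: parseStringArray(row.brave_sources),
  };
}
```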

LIVE DATA SOURCES

Four external APIs feed this platform — news intelligence, financial signals, vessel tracking, and fact verification.

Yahoo Finance v8

FINANCIAL DATA

NO KEY · FREE

Provides Brent crude oil spot price (BZ=F) and 22 defense stock quotes. Uses the undocumented v8/finance/chart endpoint — no API key required. Each ticker is fetched in parallel via Promise.allSettled so a single failure doesn't block the rest. Oil price is sanity-checked ($20–$300 range). 8-second timeout per request.

ENDPOINT: query2.finance.yahoo.com/v8/finance/chart/{symbol}

NOTE: No API key · Unofficial endpoint · May be rate-limited
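The parallel fetch with Promise.allSettled and the oil-price sanity check can be sketched as follows. The per-symbol fetcher is injected (its name and signature are assumptions), and the 8-second per-request timeout is omitted for brevity.

```typescript
// Hypothetical per-symbol fetcher: resolves to the latest price or rejects.
type QuoteFetcher = (symbol: string) => Promise<number>;

// Fetch all symbols in parallel; a failed ticker is skipped, not fatal.
async function fetchQuotes(
  symbols: string[],
  fetchPrice: QuoteFetcher,
): Promise<Map<string, number>> {
  const settled = await Promise.allSettled(symbols.map((s) => fetchPrice(s)));
  const quotes = new Map<string, number>();
  settled.forEach((result, i) => {
    if (result.status === "fulfilled" && Number.isFinite(result.value)) {
      quotes.set(symbols[i], result.value);
    }
  });
  return quotes;
}

// Brent price sanity check from the description above: $20 to $300 inclusive.
function isSaneOilPrice(price: number): boolean {
  return Number.isFinite(price) && price >= 20 && price <= 300;
}
```

Unlike Promise.all, Promise.allSettled never rejects, which is why one bad ticker cannot block the other quotes.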

Brave Search API

FACT CORROBORATION

API KEY · FREE

Independent web search index (not derived from Google or Bing) used in Pipeline step 05. A targeted query is fired for each article — title + 'Iran Israel US military 2026' — with freshness:pw (past week). Returns up to 8 corroborating results with title, URL, and snippet. Disputed articles are silently dropped.

ENDPOINT: api.search.brave.com/res/v1/web/search

NOTE: Requires API key (BRAVE_API_KEY) · Free tier available

AISStream.io

VESSEL TRACKING

API KEY · FREE · FALLBACK

Real-time AIS (Automatic Identification System) vessel data via WebSocket. Subscribes to the Strait of Hormuz bounding box (25–27.5°N, 55–57.5°E), collects for 15 seconds, then returns unique vessels by MMSI. Browser CORS blocks direct access — data is fetched by the Cloudflare Worker backend. Frontend falls back to a deterministic simulation engine when live data is unavailable.

ENDPOINT: wss://stream.aisstream.io/v0/stream (WebSocket)

NOTE: Requires API key · Free tier · Backend-only (CORS blocked in browser)
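The bounding-box subscription and the "unique vessels by MMSI" step might look like the sketch below. The subscription field names (`APIKey`, `BoundingBoxes`) reflect my understanding of the AISStream protocol and should be checked against its documentation; the report shape and function names are hypothetical, and the 15-second WebSocket collection loop is not shown.

```typescript
interface VesselReport {
  mmsi: number;
  lat: number;
  lon: number;
}

// Strait of Hormuz box from the description above: 25-27.5°N, 55-57.5°E.
const HORMUZ = { minLat: 25, maxLat: 27.5, minLon: 55, maxLon: 57.5 };

// Subscription message sent after the socket opens (field names are an assumption).
function subscriptionMessage(apiKey: string) {
  return {
    APIKey: apiKey,
    BoundingBoxes: [
      [
        [HORMUZ.minLat, HORMUZ.minLon],
        [HORMUZ.maxLat, HORMUZ.maxLon],
      ],
    ],
  };
}

function inHormuz(r: VesselReport): boolean {
  return (
    r.lat >= HORMUZ.minLat && r.lat <= HORMUZ.maxLat &&
    r.lon >= HORMUZ.minLon && r.lon <= HORMUZ.maxLon
  );
}

// Collapse a window of reports to one entry per MMSI, keeping the latest position.
function uniqueVessels(reports: VesselReport[]): VesselReport[] {
  const byMmsi = new Map<number, VesselReport>();
  for (const r of reports) {
    if (inHormuz(r)) byMmsi.set(r.mmsi, r);
  }
  return Array.from(byMmsi.values());
}
```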

Gemini API (Google)

NEWS INTELLIGENCE

API KEY · PAID

Google's Gemini 3.1 Pro with real-time Google Search grounding is the primary news collection engine (Pipeline step 01). Thinking mode (HIGH) enables multi-step reasoning for better article synthesis. A 3-model fallback chain protects against outages: Gemini 3.1 Pro → 2.5 Flash (thinkingBudget 4096) → 2.0 Flash. Classification and bilingual summaries (step 04) do not go through this API; they run on Workers AI Llama 3.1-8B.

ENDPOINT: generativelanguage.googleapis.com/v1beta/models/{model}:generateContent

NOTE: Requires API key (GEMINI_API_KEY) · Paid · 50s timeout
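A fallback chain like the one described above reduces to trying each model in order until one succeeds. The sketch below is generic: the model identifiers are placeholders (the exact API model IDs are not given in this document), and the per-model call is injected.

```typescript
// Placeholder identifiers; the real model IDs used with the API may differ.
const MODEL_CHAIN = ["gemini-3.1-pro", "gemini-2.5-flash", "gemini-2.0-flash"];

// Try each model in order; return the first success along with the model that produced it.
async function generateWithFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
): Promise<{ model: string; result: T }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      return { model, result: await call(model) };
    } catch (e) {
      errors.push(`${model}: ${e instanceof Error ? e.message : String(e)}`);
    }
  }
  throw new Error(`all models failed: ${errors.join("; ")}`);
}
```

Calling `generateWithFallback(MODEL_CHAIN, callModel)` with a `callModel` that enforces the 50-second timeout would reproduce the described behaviour, with the returned `model` field recording which tier actually answered.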

REST API ENDPOINTS

Base URL: https://iran-war-intel.672rmwysbs.workers.dev

GET /api/intel

Latest verified articles (JSON array, sorted by created_at DESC)

GET /api/insights/oil

Brent crude oil price + Hormuz vessel count (Yahoo Finance v8 + Gemini)

GET /api/insights/market

Defense stock quotes (22 tickers: LMT, RTX, NOC, GD, BA, ...)

GET /api/live-events

Timeline events for the Timeline page (date, type, severity, summary)

POST /api/cron/trigger (AUTH)

Manually trigger the pipeline (returns article batch + market data)

GET /api/analytics/dashboard

Aggregated analytics stats (page views, events, top pages)