Floowed · Operations · Lex

The prospect warehouse, end to end

Every dataset enters through one door, lives in one database, gets enriched by waterfalls that remember what works, and exits through the channel it has earned. Rebuilt 2026-07-02 to 2026-07-03. Counts are live as of 2026-07-03.

01 · Sources
Anything can arrive; nothing skips the door.

Regulator scrapers

Central-bank & registry plugins, auto-discovered, run serially with per-source state.

78 sources · python -m agents.lex.sources

Dataset ingest

Drop a CSV/XLSX into data/lex/inbox/, or submit it via the CLI or API. Header-fingerprint cache: a format seen once is never asked about again.

276 header synonyms · LLM maps the residue
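A header-fingerprint cache of this kind can be sketched as below: normalize the header names, hash them in column order, and reuse the stored mapping when the same hash comes back. It is a sketch only; the synonym entries, the normalization rule, and the `ask_llm` callable (standing in for the LLM that maps the residue) are assumptions, not the real implementation.

```python
import hashlib
import re

# Illustrative subset of the synonym table; the real table has 276 entries.
SYNONYMS = {
    "company": "org_name",
    "institution name": "org_name",
    "country": "country",
    "e-mail": "email",
    "email address": "email",
}
MAPPING_CACHE: dict[str, dict[str, str]] = {}


def normalize(header: str) -> str:
    """Lower-case and collapse whitespace so cosmetic differences share a fingerprint."""
    return re.sub(r"\s+", " ", header.strip().lower())


def fingerprint(headers: list[str]) -> str:
    """Hash of the normalized header names, in column order."""
    joined = "\x1f".join(normalize(h) for h in headers)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def map_headers(headers: list[str], ask_llm) -> dict[str, str]:
    """Return {normalized header: canonical field}; only unknown columns reach ask_llm."""
    key = fingerprint(headers)
    if key in MAPPING_CACHE:
        return MAPPING_CACHE[key]  # a repeat format never re-asks
    mapping: dict[str, str] = {}
    residue: list[str] = []
    for header in map(normalize, headers):
        if header in SYNONYMS:
            mapping[header] = SYNONYMS[header]
        else:
            residue.append(header)
    if residue:
        mapping.update(ask_llm(residue))  # hypothetical LLM hook for leftover columns
    MAPPING_CACHE[key] = mapping
    return mapping


calls = []


def fake_llm(columns):
    calls.append(columns)
    return {c: "unmapped" for c in columns}


first = map_headers(["Company", "Country", "Loan Book (USD)"], fake_llm)
again = map_headers([" company ", "COUNTRY", "loan  book (usd)"], fake_llm)
print(first)       # {'company': 'org_name', 'country': 'country', 'loan book (usd)': 'unmapped'}
print(len(calls))  # 1
```

Keying the stored mapping by normalized header (rather than the raw text) is what lets a re-cased or re-spaced copy of the same file reuse it.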

Attio (CRM)

Real-time webhook plus a nightly full reconcile. Deals, people, companies, and notes mirror down.

webhook + 16:00 UTC full_sync

Platform & bookings

PostHog identities every 15 min; Vera/Atlas rosters via Lex API; Cal.com bookings via Make → Attio → here.

identity poll 15 min
one door
02 · Ingestion gates
Anything that reaches “ready” has passed every gate: that is the trust rule.
received: parse, sniff encoding, find the real header row
mapped: columns → canonical fields; ≥0.8 confidence auto-approves
normalized: names, countries, domains-from-email, free-mail guard
deduped: external id → domain → normalized name + country
classified: category · sub-category · lending products · ICP score
ready: QA report per batch; enrichment queued
rejected: counted per reason (no_name · name_fragment · junk_row · branch_not_hq · no_handle); kept in rejected.csv, never silently dropped
03 · Warehouse (Supabase)
Lex is the only writer. Agents read via one shared client or Lex’s API.

organizations

Canonical companies: taxonomy, ICP tier, soft-merge dedup, audit trail. 9.8k microfinance/rural · 9.8k NBFC · 3k banks · 2.6k fleet · 10.1k off-ICP.

35,306 rows · 116 countries

org_contacts

People with one canonical email verdict: valid · accept_all · risky · invalid · unverified.

22,503 rows · 6,699 valid / 1,230 invalid

segments

Persisted GTM slices — never orphan CSVs. Dynamic ones re-materialize nightly from their stored definition.

ph-rural-thrift-banks · 785 orgs · 124 in Attio
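A dynamic segment that "re-materializes from its stored definition" can be sketched as a stored filter that is re-run against current rows, so membership is recomputed rather than patched. The definition shape (equality filters plus a minimum ICP score), the `sub_category` value, and the sample rows are hypothetical; the real definition of ph-rural-thrift-banks is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDefinition:
    """Stored definition: equality filters plus a minimum ICP score (shape is hypothetical)."""
    slug: str
    filters: tuple[tuple[str, object], ...]
    min_icp_score: float = 0.0


def materialize(defn: SegmentDefinition, orgs: list[dict]) -> list[int]:
    """Recompute membership from the definition against the current rows."""
    return sorted(
        org["id"]
        for org in orgs
        if all(org.get(name) == value for name, value in defn.filters)
        and org.get("icp_score", 0.0) >= defn.min_icp_score
    )


ph_rural = SegmentDefinition(
    slug="ph-rural-thrift-banks",
    filters=(("country", "PH"), ("sub_category", "rural_thrift_bank")),  # illustrative filter
)
orgs = [
    {"id": 1, "country": "PH", "sub_category": "rural_thrift_bank", "icp_score": 0.7},
    {"id": 2, "country": "PH", "sub_category": "nbfc", "icp_score": 0.9},
    {"id": 3, "country": "PH", "sub_category": "rural_thrift_bank", "icp_score": 0.4},
]
print(materialize(ph_rural, orgs))  # [1, 3]
```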

scoring_strategies + org_scores

Strategy-versioned opportunity scores; change strategy → re-score, history kept.

active: lending-midsize-t23-v1 · 30,848 scores
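Strategy versioning with kept history can be sketched as append-only score rows tagged by strategy slug, with the "active" pointer deciding which rows count as current. The weighted-sum scoring rule, the feature names, and the `demo-strategy-v1/v2` slugs are hypothetical; the real weights of lending-midsize-t23-v1 are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Immutable scoring strategy; new weights mean a new versioned slug."""
    slug: str
    weights: tuple[tuple[str, float], ...]


def score(org: dict, strategy: Strategy) -> float:
    """Weighted sum of feature values (a hypothetical scoring rule)."""
    return round(sum(w * float(org.get(f, 0.0)) for f, w in strategy.weights), 4)


def rescore(orgs: list[dict], strategy: Strategy, org_scores: list, active: dict) -> None:
    """Append one row per org for the new strategy; older rows stay as history."""
    for org in orgs:
        org_scores.append((org["id"], strategy.slug, score(org, strategy)))
    active["strategy"] = strategy.slug


def current_scores(org_scores: list, active: dict) -> dict[int, float]:
    return {org_id: s for org_id, slug, s in org_scores if slug == active["strategy"]}


v1 = Strategy("demo-strategy-v1", (("icp_fit", 0.6), ("size_fit", 0.4)))
v2 = Strategy("demo-strategy-v2", (("icp_fit", 0.8), ("size_fit", 0.2)))
orgs = [{"id": 1, "icp_fit": 1.0, "size_fit": 0.5}]
org_scores: list = []
active: dict = {}
rescore(orgs, v1, org_scores, active)
rescore(orgs, v2, org_scores, active)
print(current_scores(org_scores, active))  # {1: 0.9}
print(len(org_scores))                     # 2
```

Because strategies are immutable and rows are only appended, comparing how a change of strategy moved any org is a filter on the slug, not a restore from backup.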

import_batches + source_runs

Every import and scraper run recorded: counts, rejects, QA, last-ok times.

state machine per batch

suppression + email_events

Bounces, unsubscribes, erasures. Fail-closed check before any send.

PDPA / DPA aware
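"Fail-closed" means the send proceeds only on an explicit "not suppressed" answer; an error or an unknown state blocks it. A minimal sketch, assuming the suppression store is reachable through a hypothetical `is_suppressed(address)` callable:

```python
def may_send(email: str, is_suppressed) -> bool:
    """Fail-closed pre-send check: only an explicit False from the store allows a send.

    is_suppressed(address) is a hypothetical lookup returning True or False; raising
    an error or returning anything else (e.g. None for an unknown state) blocks the send.
    """
    try:
        answer = is_suppressed(email.strip().lower())
    except Exception:
        return False  # store unreachable: block rather than risk a send
    return answer is False


suppressed = {"bounced@bank.example"}


def lookup(address):
    return address in suppressed


def store_down(address):
    raise ConnectionError("suppression store unreachable")


print(may_send("Bounced@Bank.example", lookup))  # False
print(may_send("cfo@bank.example", lookup))      # True
print(may_send("cfo@bank.example", store_down))  # False
```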
what’s missing gets hunted
04 · Enrichment waterfalls
Cheap → expensive. The queue is the database: quota walls pause work, nothing crashes.
Waterfall | Steps, in cost order | Target
W1 domain | contact-email backfill → registry → Brave/SerpAPI discovery → Claude fleet | 64% of orgs lack one
W2 email-find | website scrape → pattern-guess → Prospeo → Icypeas | 2–3 decision-makers per org (the target, not the gate)
W3 verify | free gates (MX, syntax, suppression) → Reoon → Scrubby catch-all resolve | one canonical verdict
W4 linkedin | site scrape → search → persona research | fallback channel
W5 people | registry rosters → leadership pages → persona search → Claude fleet | operator > president

Vendor memory (the guard on every step)

  • Never twice: a permanent miss on the same vendor + target is never retried; a changed input re-opens it.
  • Cooldowns: transient errors back off 24 h → 7 d.
  • Suspensions: a vendor below its hit-rate floor in a market gets benched 30 days, then probed.
  • Trials: new vendors earn their slot through a quota, measured against the incumbent.
  • Pre-seeded: MillionVerifier starts suspended in PH, because its “100% risky” history there is now recorded as data.
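The vendor-memory rules listed above can be sketched as a small guard object consulted before every vendor call. Assumptions: a permanent miss is keyed by a hash of the input (so a changed input re-opens it), the cooldown is 24 h after the first transient error and 7 d after any repeat, and the class, method, and vendor placeholder names (`vendor_a`, `vendor_b`, `vendor_c`) are hypothetical. Only the MillionVerifier / PH bench comes from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

COOLDOWNS = [timedelta(hours=24), timedelta(days=7)]  # first transient error, then any repeat
SUSPENSION = timedelta(days=30)


@dataclass
class VendorMemory:
    permanent_misses: set = field(default_factory=set)   # (vendor, target, input_hash)
    transient: dict = field(default_factory=dict)        # (vendor, target) -> (count, last_at)
    suspended_until: dict = field(default_factory=dict)  # (vendor, market) -> datetime

    def record_miss(self, vendor, target, input_hash):
        self.permanent_misses.add((vendor, target, input_hash))

    def record_transient(self, vendor, target, now):
        count, _ = self.transient.get((vendor, target), (0, now))
        self.transient[(vendor, target)] = (count + 1, now)

    def check_floor(self, vendor, market, hits, attempts, floor, now):
        """Bench a vendor in a market for 30 days when its hit rate falls below the floor."""
        if attempts and hits / attempts < floor:
            self.suspended_until[(vendor, market)] = now + SUSPENSION

    def may_call(self, vendor, market, target, input_hash, now) -> bool:
        if (vendor, target, input_hash) in self.permanent_misses:
            return False  # never twice; a changed input (new hash) re-opens it
        until = self.suspended_until.get((vendor, market))
        if until is not None and now < until:
            return False  # benched; the first call after `until` acts as the probe
        if (vendor, target) in self.transient:
            count, last_at = self.transient[(vendor, target)]
            if now < last_at + COOLDOWNS[min(count, len(COOLDOWNS)) - 1]:
                return False  # still cooling down
        return True


now = datetime(2026, 7, 3, 12, 0)
mem = VendorMemory()
mem.suspended_until[("MillionVerifier", "PH")] = now + SUSPENSION  # pre-seeded bench
mem.record_miss("vendor_a", "org-7", "h1")
mem.record_transient("vendor_b", "org-7", now)

print(mem.may_call("MillionVerifier", "PH", "org-7", "h1", now))               # False
print(mem.may_call("vendor_a", "PH", "org-7", "h1", now))                      # False
print(mem.may_call("vendor_a", "PH", "org-7", "h2", now))                      # True
print(mem.may_call("vendor_b", "PH", "org-7", "h1", now + timedelta(hours=25)))  # True
```

Trials are left out of the sketch; they would sit on top of the same records, comparing a new vendor's hit rate against the incumbent's.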
Claude fleet (throttled): headless research for decision-makers on priority orgs (341 pilot-gap orgs stamped). The dispatch ladder respects Michel’s idle time, the 5-hour and weekly token breakers, and a daily org cap.
every contact earns a tier
05 · Outreach readiness → channels
Ready = domain + ≥1 emailable decision-maker: 585 orgs today, 148 in the pilot segment.
T1

michel@ small-batch 1:1

Verdict = valid, vendor-verified, a real person (not a role inbox), senior title. The trust channel.

T2

EmailBison cold infra

Valid or accept-all with MX. Demotable per segment when bounce rates rise.

T3

The Memo (Buttondown)

Not invalid/disposable; role inboxes welcome. Tagged by segment — pilot: ~350 execs & credit/risk heads at PH rural + thrift banks, re-verified against the live warehouse at push time.

T4

LinkedIn queue (manual send)

No usable email but a profile — 147 senior contacts in the pilot alone.

Feedback loop: Gmail NDR + channel bounces → email_events → suppression + channel health. Every bounce also raises the false-positive count of the vendor that called the address valid, so bad verifiers demote themselves.
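The self-demoting verifier loop can be sketched as: each bounce is logged, the address is suppressed, and the vendor that verified it is charged one false positive; vendors whose false-positive rate exceeds a threshold are flagged. The `verified_by` lookup, the 2% `max_rate`, and the `vendor_a` / `vendor_b` placeholders are hypothetical.

```python
from collections import Counter


def record_bounce(email, events, suppression, verified_by, false_positives):
    """Log a bounce, suppress the address, and charge the vendor that called it valid."""
    address = email.strip().lower()
    events.append({"email": address, "type": "bounce"})
    suppression.add(address)
    vendor = verified_by.get(address)
    if vendor is not None:
        false_positives[vendor] += 1


def demoted_vendors(false_positives, valid_calls, max_rate=0.02):
    """Vendors whose 'valid' verdicts bounce more often than max_rate (a hypothetical threshold)."""
    return sorted(
        vendor for vendor, calls in valid_calls.items()
        if calls and false_positives[vendor] / calls > max_rate
    )


events, suppression, false_positives = [], set(), Counter()
verified_by = {"a1@bank.example": "vendor_a", "b1@bank.example": "vendor_b",
               "b2@bank.example": "vendor_b", "b3@bank.example": "vendor_b"}
valid_calls = Counter({"vendor_a": 100, "vendor_b": 100})
for address in ["A1@bank.example", "b1@bank.example", "b2@bank.example", "b3@bank.example"]:
    record_bounce(address, events, suppression, verified_by, false_positives)
print(demoted_vendors(false_positives, valid_calls))  # ['vendor_b']
```

Normalizing by each vendor's count of "valid" calls keeps a high-volume verifier from being demoted just for seeing more traffic.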
engaged prospects graduate
06 · Consumers
The warehouse feeds everything; nothing feeds around it.

Attio graduation

Prospects that move from cold to engaged are promoted to the CRM with their history attached.

Command Center

Lex dashboard: warehouse, enrichment, sync, email health, dedup, segments.

Outreach queues

1:1 builder, LinkedIn queue, EB campaigns — all select on readiness tiers.

Revenue sheet + Pulse

Nightly Google-Sheet refresh; PH call queue reads the leads view.

The nightly clock (times in UTC; SGT = UTC+8)
A declarative job table; any job can be moved with LEX_JOB_<NAME>=HH:MM.
UTC | Job | What it does
16:00 | attio_sync | Full CRM reconcile (companies, people, deals, notes)
16:15 | stage_sync | Deal stages → contact stage (fixed 2026-07-02: had been silently failing)
16:30 / 16:45 | enrichment batches | Org tier-0 pass, then Reoon contact verification
17:00 | scrubby_submit | Catch-all resolution queue (results polled 10:00 + 22:00)
17:30 / 18:00 | data_quality / retention | Quality scoring + decay flags; PDPA retention enforcement
Mon 18:30 | dedup | Duplicate-org detection into the review queue
19:30 | segments_refresh | Re-materialize active dynamic segments
21:00 | sheets_sync | Revenue tracker fresh before the 06:00 SGT brief
every 5–15 min | watchers | Ingest inbox scan · Gmail NDR bounce poll · PostHog identities
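The LEX_JOB_<NAME>=HH:MM override can be sketched as a default table merged with environment variables and validated as 24-hour times. Assumptions: `<NAME>` is the job name upper-cased, and the `DEFAULT_JOBS` dict and `resolve_schedule` function are hypothetical; the default times are copied from the table above.

```python
import os
import re
from datetime import time

# Default UTC slots for a few jobs from the nightly table; the dict itself is illustrative.
DEFAULT_JOBS = {
    "attio_sync": "16:00",
    "stage_sync": "16:15",
    "scrubby_submit": "17:00",
    "segments_refresh": "19:30",
    "sheets_sync": "21:00",
}
HHMM = re.compile(r"([01]\d|2[0-3]):([0-5]\d)")


def resolve_schedule(env=None) -> dict[str, time]:
    """Apply LEX_JOB_<NAME>=HH:MM overrides (NAME upper-cased here, an assumption)."""
    env = os.environ if env is None else env
    schedule = {}
    for name, default in DEFAULT_JOBS.items():
        var = f"LEX_JOB_{name.upper()}"
        raw = env.get(var, default)
        match = HHMM.fullmatch(raw)
        if match is None:
            raise ValueError(f"{var}={raw!r} is not HH:MM")  # reject bad overrides loudly
        schedule[name] = time(int(match.group(1)), int(match.group(2)))
    return schedule


print(resolve_schedule({"LEX_JOB_SHEETS_SYNC": "20:30"})["sheets_sync"])  # 20:30:00
```

Validating at load time means a typo such as `25:00` stops the scheduler at startup instead of silently skipping a job, the failure mode the stage_sync row above records.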