Every dataset enters through one door, lives in one database, gets enriched by waterfalls that remember what works, and exits through the channel it has earned. Rebuilt 2026-07-02→03. Counts are live as of 2026-07-03.
Central-bank & registry plugins, auto-discovered, run serially with per-source state.
78 sources · python -m agents.lex.sources
Drop a CSV/XLSX in data/lex/inbox/, or CLI / API. Header-fingerprint cache: a repeat format never re-asks.
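
One way to read the header-fingerprint cache is as a lookup keyed on a hash of the file's column headers, sketched below. The normalization (trim, lowercase, sort), the `MappingCache` class, and the mapping shape are all assumptions made for illustration, not the actual Lex implementation.

```python
import hashlib


def header_fingerprint(headers):
    """Hash a header row so the same format yields the same key.

    Assumption: column order, case, and surrounding whitespace are ignored.
    """
    normalized = sorted(h.strip().lower() for h in headers)
    joined = "\x1f".join(normalized)  # unit separator avoids accidental merges
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class MappingCache:
    """Hypothetical in-memory cache: fingerprint -> column mapping."""

    def __init__(self):
        self._by_fingerprint = {}

    def lookup(self, headers):
        # Returns the stored mapping, or None if this format is new.
        return self._by_fingerprint.get(header_fingerprint(headers))

    def remember(self, headers, mapping):
        self._by_fingerprint[header_fingerprint(headers)] = mapping


cache = MappingCache()
first_seen = cache.lookup(["Email", "Company"])  # new format: ask once
cache.remember(["Email", "Company"], {"email": "email", "company": "org_name"})
repeat = cache.lookup([" company ", "EMAIL"])  # same format: answered from cache
```

A persistent version would store the fingerprint and mapping in the warehouse rather than in memory, so a repeat file format is recognized across runs.
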
Webhook in real time + full reconcile nightly. Deals, people, companies, notes mirror down.
webhook + 16:00 UTC full_sync
PostHog identities every 15 min; Vera/Atlas rosters via Lex API; Cal.com bookings via Make → Attio → here.
identity poll 15 min
Canonical companies: taxonomy, ICP tier, soft-merge dedup, audit trail. 9.8k microfinance/rural · 9.8k NBFC · 3k banks · 2.6k fleet · 10.1k off-ICP.
35,306 rows · 116 countries
People with one canonical email verdict: valid · accept_all · risky · invalid · unverified.
22,503 rows · 6,699 valid / 1,230 invalid
Persisted GTM slices — never orphan CSVs. Dynamic ones re-materialize nightly from their stored definition.
ph-rural-thrift-banks · 785 orgs · 124 in Attio
Strategy-versioned opportunity scores; change strategy → re-score, history kept.
active: lending-midsize-t23-v1 · 30,848 scores
Every import and scraper run recorded: counts, rejects, QA, last-ok times.
state machine per batch
Bounces, unsubscribes, erasures. Fail-closed check before any send.
PDPA / DPA aware
| Waterfall | Steps, in cost order | Target |
|---|---|---|
| W1 domain | contact-email backfill → registry → Brave/SerpAPI discovery → Claude fleet | 64% of orgs lack one |
| W2 email-find | website scrape → pattern-guess → Prospeo → Icypeas | 2–3 decision-makers per org (the target, not the gate) |
| W3 verify | free gates (MX, syntax, suppression) → Reoon → Scrubby catch-all resolve | one canonical verdict |
| W4 linkedin | site scrape → search → persona research | fallback channel |
| W5 people | registry rosters → leadership pages → persona search → Claude fleet | operator > president |
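
The waterfalls in the table above can be sketched as an ordered list of steps, tried cheapest first and stopping at the first hit. The `Waterfall` class, the per-step win counter (one reading of "remember what works"), and the two stub steps are hypothetical; the real steps call registries and vendors.

```python
from dataclasses import dataclass, field


@dataclass
class Waterfall:
    name: str
    steps: list  # (step_name, callable) pairs, cheapest first
    wins: dict = field(default_factory=dict)  # step_name -> success count

    def run(self, record):
        """Try each step in cost order; stop at the first non-None result."""
        for step_name, step in self.steps:
            result = step(record)
            if result is not None:
                self.wins[step_name] = self.wins.get(step_name, 0) + 1
                return step_name, result
        return None, None


# Hypothetical stand-ins for the first two W1 steps.
def from_contact_email(org):
    email = org.get("contact_email")
    if email and "@" in email:
        # A real step would presumably skip free-mail domains.
        return email.split("@", 1)[1].lower()
    return None


def from_registry(org):
    return org.get("registry_domain")


w1 = Waterfall("W1 domain", [
    ("contact-email backfill", from_contact_email),
    ("registry", from_registry),
])

hit_a = w1.run({"contact_email": "ana@ruralbank.example"})
hit_b = w1.run({"registry_domain": "thrift.example"})
miss = w1.run({})
```

Keeping the win counts per step makes it possible to see which step is resolving records for a given segment, which is the kind of signal a reordering decision would need.
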
Verdict = valid, vendor-verified, a real person (not a role inbox), senior title. The trust channel.
Valid or accept-all with MX. Demotable per segment when bounce rates rise.
Not invalid/disposable; role inboxes welcome. Tagged by segment — pilot: ~350 execs & credit/risk heads at PH rural + thrift banks, re-verified against the live warehouse at push time.
No usable email but a profile — 147 senior contacts in the pilot alone.
Cold → engaged prospects promoted to the CRM with history attached.
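
The channel rules above can be read as a first-match classifier, sketched here. The tier labels (`trust`, `standard`, `broad`, `linkedin`, `hold`) and the `Contact` fields are hypothetical names, and "valid or accept-all with MX" is interpreted as "valid, or accept_all with an MX record", which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    verdict: str = "unverified"  # valid, accept_all, risky, invalid, unverified
    has_email: bool = True
    vendor_verified: bool = False
    role_inbox: bool = False
    senior_title: bool = False
    has_mx: bool = False
    disposable: bool = False
    linkedin_url: Optional[str] = None


def readiness(c: Contact) -> str:
    """Return the strictest channel tier the contact qualifies for."""
    if c.has_email:
        if (c.verdict == "valid" and c.vendor_verified
                and not c.role_inbox and c.senior_title):
            return "trust"
        if c.verdict == "valid" or (c.verdict == "accept_all" and c.has_mx):
            return "standard"
        if c.verdict != "invalid" and not c.disposable:
            return "broad"  # role inboxes are allowed here
    if c.linkedin_url:
        return "linkedin"  # no usable email, but a profile exists
    return "hold"


tiers = [
    readiness(Contact(verdict="valid", vendor_verified=True, senior_title=True)),
    readiness(Contact(verdict="accept_all", has_mx=True)),
    readiness(Contact(verdict="risky", role_inbox=True)),
    readiness(Contact(verdict="invalid", linkedin_url="https://example.com/in/x")),
    readiness(Contact(verdict="invalid")),
]
```

Per-segment demotion when bounce rates rise is not modeled; it would sit on top of this as an override keyed by segment.
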
Lex dashboard: warehouse, enrichment, sync, email health, dedup, segments.
1:1 builder, LinkedIn queue, EB campaigns — all select on readiness tiers.
Nightly Google-Sheet refresh; PH call queue reads the leads view.
| UTC | Job | What it does |
|---|---|---|
| 16:00 | attio_sync | Full CRM reconcile (companies, people, deals, notes) |
| 16:15 | stage_sync | Deal stages → contact stage (fixed 2026-07-02: had been silently failing) |
| 16:30 / 16:45 | enrichment batches | Org tier-0 pass, then Reoon contact verification |
| 17:00 | scrubby_submit | Catch-all resolution queue (results polled 10:00 + 22:00) |
| 17:30 / 18:00 | data_quality / retention | Quality scoring + decay flags; PDPA retention enforcement |
| Mon 18:30 | dedup | Duplicate-org detection into the review queue |
| 19:30 | segments_refresh | Re-materialize active dynamic segments |
| 21:00 | sheets_sync | Revenue tracker fresh before the 06:00 SGT brief |
| every 5–15 min | watchers | Ingest inbox scan · Gmail NDR bounce poll · PostHog identities |
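
The schedule is in UTC while the brief is quoted in SGT (UTC+8, no daylight saving). A small conversion helper, with `to_sgt` as a hypothetical name and an arbitrary reference date, confirms that the 21:00 UTC sheets_sync lands one hour before the 06:00 SGT brief:

```python
from datetime import datetime, timedelta, timezone

SGT = timezone(timedelta(hours=8), "SGT")


def to_sgt(hhmm: str) -> str:
    """Convert an HH:MM UTC schedule time to SGT, flagging a day rollover."""
    hour, minute = map(int, hhmm.split(":"))
    # The date is arbitrary; the offset is fixed all year.
    utc = datetime(2026, 7, 3, hour, minute, tzinfo=timezone.utc)
    local = utc.astimezone(SGT)
    rolled = local.date() > utc.date()
    return local.strftime("%H:%M") + (" (+1 day)" if rolled else "")


sheets_sync_sgt = to_sgt("21:00")  # "05:00 (+1 day)"
attio_sync_sgt = to_sgt("16:00")   # "00:00 (+1 day)"
```
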