A production system where three AI agents coordinate autonomous tasks — signal ingestion, triage, enrichment, and structured output — across platforms without human relay.
Autonomous triage agent. Reads per-stage reference files (SKILL-intake.md, SKILL-enrich.md, SKILL-triage-1-filter.md, SKILL-triage-2-assess.md, SKILL-triage-3-decide.md), which were split out of the original 2,300-line SKILL.md to prevent context-window overflow. Applies 13 calibration rules, scores incoming roles, and updates the Notion job tracker. Four batches are defined (5:30, 12:00, 16:00, 20:00), two of which are currently active (Noon, Evening). Each batch runs the 5 pipeline stages in order, staggered 15–30 min apart. Triage refuses to process any item where JD Enriched ≠ true.
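A minimal sketch of how the batch schedule and the enrichment gate could fit together. The stage names follow the SKILL-*.md files; the specific gaps between stages, the mapping of Noon and Evening to 12:00 and 20:00, and the flattened `item["JD Enriched"]` field are assumptions for illustration, not the actual cron config.

```javascript
// Hypothetical sketch: stage names mirror the SKILL-*.md files; the gap
// values are assumptions picked from the stated 15-30 minute range.
const STAGES = ["intake", "enrich", "triage-1-filter", "triage-2-assess", "triage-3-decide"];
const GAPS_MIN = [15, 30, 15, 15]; // assumed minutes between consecutive stages

// All four defined batch start times; only two are active (assumed: 12:00 and 20:00).
const BATCHES = [
  { start: "05:30", active: false },
  { start: "12:00", active: true },
  { start: "16:00", active: false },
  { start: "20:00", active: true },
];

function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

function toHHMM(totalMin) {
  const h = Math.floor(totalMin / 60) % 24;
  const m = totalMin % 60;
  return `${String(h).padStart(2, "0")}:${String(m).padStart(2, "0")}`;
}

// Expand one batch start time into a run time for each of the 5 stages.
function stageSchedule(batchStart) {
  let t = toMinutes(batchStart);
  return STAGES.map((stage, i) => {
    if (i > 0) t += GAPS_MIN[i - 1];
    return { stage, at: toHHMM(t) };
  });
}

// Enrichment gate: triage skips anything not explicitly marked enriched.
// Assumes the Notion checkbox has been flattened to item["JD Enriched"].
function canTriage(item) {
  return item["JD Enriched"] === true;
}

for (const batch of BATCHES.filter((b) => b.active)) {
  console.log(batch.start, stageSchedule(batch.start).map((s) => `${s.stage}@${s.at}`).join(", "));
}
```

The strict `=== true` comparison mirrors the "≠ true" rule: a missing property or a string like `"true"` is treated as not enriched.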
Handles strategic work — CV adaptation, cover letter writing, interview prep, pipeline decisions. Reads the Agent Coordination Board at session start for context handoff.
Infrastructure agent. Debugs OpenClaw config, maintains SKILL.md and the split reference files, monitors service health, and runs pipeline diagnostics. References OPENCLAW-PIPELINE.md, the state document shared by all three agents.
Coordination: a shared Notion board for async communication, using a structured message format (agent name, timestamp, task/question). Routine operations need no human relay; Luka intervenes only for judgment calls tagged 🔴. OPENCLAW-PIPELINE.md serves as shared state for all three agents: architecture map, key IDs, known-good states, and fixes log. Every agent can read and update it, and it references SKILL.md sections rather than duplicating them.
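One way to keep board messages machine-readable is a fixed one-line format that agents can both write and parse. The helpers below (`makeMessage`, `render`, `parse`, `needsHuman`) and the line layout are hypothetical; only the fields (agent name, timestamp, task/question) and the 🔴 escalation tag come from the description above.

```javascript
// Hypothetical sketch: field names and the one-line rendering are assumptions;
// only agent name, timestamp, task/question and the 🔴 tag come from the text.
function makeMessage(agent, body, { urgent = false, now = new Date() } = {}) {
  if (!agent || !body) throw new Error("agent and body are required");
  return { agent, timestamp: now.toISOString(), body, urgent };
}

// Render as a single board line: "[agent] <ISO timestamp>: <task/question>",
// prefixed with "🔴 " when the message needs a human judgment call.
function render(msg) {
  const tag = msg.urgent ? "🔴 " : "";
  return `${tag}[${msg.agent}] ${msg.timestamp}: ${msg.body}`;
}

// Parse a rendered line back into fields; returns null for free-form text.
function parse(line) {
  const m = /^(🔴 )?\[([^\]]+)\] (\S+): (.*)$/u.exec(line);
  if (!m) return null;
  return { agent: m[2], timestamp: m[3], body: m[4], urgent: Boolean(m[1]) };
}

// Only 🔴 judgment calls go to Luka; everything else stays agent-to-agent.
function needsHuman(msg) {
  return msg.urgent === true;
}
```

Keeping escalation as a separate boolean, rather than searching message bodies for 🔴, means an emoji quoted inside a body does not by itself route the message to Luka.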
🔴 INCIDENT — 2026-04-23 Zombie Cron Race Condition
WHAT HAPPENED
When triage was decomposed from SKILL-triage.md into 3
sub-stages, new cron jobs were created and the legacy monolithic
triage job was set to enabled: false in jobs.json. It was not
deleted.
OpenClaw's scheduler doesn't reliably respect enabled: false
when state.nextRunAtMs is still set in the job definition. The
scheduler saw a valid next-run timestamp and fired the job anyway.
RESULT
Legacy triage ran IN PARALLEL with the new 3-stage pipeline
every evening. The legacy job shortlisted a MYTRAFFIC "GTM
Engineer M/F" role at ⭐⭐⭐⭐⭐. The new pipeline would have
flagged "fluent in French" as a hard language-filter elimination,
but the legacy job wrote its verdict to Notion first. The item was
already marked Decided before triage-1 (filter) reached it.
A second zombie (legacy enrichment) was also firing, duplicating
work with the active enrichment stage.
DETECTION
Manual audit: MYTRAFFIC shortlisted with no language filter note.
Cross-referencing cron logs showed two triage processes writing
to the same Notion entries within the same time window.
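The cross-referencing step can be sketched as a scan over write logs that looks for the same Notion property being written by two different jobs within a short window. The log entry shape (`job`, `pageId`, `property`, `atMs`), the `Status` property name, and the 30-minute default window are assumptions. Keying on the property matters because the legitimate stages also touch the same page minutes apart.

```javascript
// Hypothetical sketch: entry shape and window are assumptions, not OpenClaw's log format.
function findConflictingWrites(entries, windowMs = 30 * 60 * 1000) {
  // Group writes by (page, property) so legitimate stages writing different
  // properties of the same page are not reported.
  const groups = new Map();
  for (const e of entries) {
    const key = `${e.pageId}::${e.property}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(e);
  }

  const conflicts = [];
  for (const writes of groups.values()) {
    writes.sort((a, b) => a.atMs - b.atMs);
    for (let i = 0; i < writes.length; i++) {
      for (let j = i + 1; j < writes.length; j++) {
        const gapMs = writes[j].atMs - writes[i].atMs;
        if (gapMs > windowMs) break; // sorted: later writes are even further apart
        if (writes[j].job !== writes[i].job) {
          conflicts.push({
            pageId: writes[i].pageId,
            property: writes[i].property,
            jobs: [writes[i].job, writes[j].job],
            gapMs,
          });
        }
      }
    }
  }
  return conflicts;
}
```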
ROOT CAUSE
```jsonc
// ❌ What we did (unreliable):
{
  "name": "[LEGACY] Evening Triage (9:00 PM)",
  "enabled": false,               // scheduler ignores this
  "state": {
    "nextRunAtMs": 1745528400000  // ...but honors this
  }
}

// ✅ What works:
// Job deleted from jobs.json entirely.
// No entry = no execution. No ambiguity.
```
FIX
Deleted both legacy jobs from jobs.json. Set MYTRAFFIC to
Eliminated, with a note explaining the false positive.
LESSON
"Delete, don't disable." In cron systems where job state persists
alongside config, a disabled flag can be silently overridden by
stale scheduler state. The only reliable decommission is removal.
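A small audit in the spirit of "delete, don't disable", assuming jobs.json parses to an array of objects shaped like the ROOT CAUSE example (OpenClaw's real schema may differ). `findZombieRisks` flags the disabled-but-scheduled combination; `decommission` removes a job outright instead of toggling `enabled`. Both helper names are hypothetical.

```javascript
// Hypothetical audit, assuming jobs.json parses to an array of job objects
// shaped like the ROOT CAUSE example; OpenClaw's actual schema may differ.

// Flag disabled jobs that still carry a scheduler timestamp: the exact
// combination that produced the zombie runs.
function findZombieRisks(jobs) {
  return jobs
    .filter((j) => j.enabled === false && j.state != null && j.state.nextRunAtMs != null)
    .map((j) => j.name);
}

// "Delete, don't disable": return a new list without the job rather than
// flipping a flag that stale state can override.
function decommission(jobs, name) {
  const kept = jobs.filter((j) => j.name !== name);
  if (kept.length === jobs.length) throw new Error(`No job named "${name}"`);
  return kept;
}
```

In practice the array could be read with `JSON.parse(fs.readFileSync(path, "utf8"))` and written back with `JSON.stringify(jobs, null, 2)`; the file I/O is left out so the logic stays testable.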
```javascript
// Load dedup data. HALT if either query fails.
const urlResult = await loadNotionUrls();
const comboResult = await loadNotionCompanyTitles();

if (!urlResult.ok || !comboResult.ok) {
  const errors = [];
  if (!urlResult.ok) errors.push(`URL dedup: ${urlResult.error}`);
  if (!comboResult.ok) errors.push(`Company+Title dedup: ${comboResult.error}`);
  console.error(`\n🛑 DEDUP QUERY FAILED — HALTING PIPELINE`);
  console.error(`Errors: ${errors.join('; ')}`);
  console.error(`Reason: Cannot proceed without dedup — would create duplicate entries.`);
  process.exit(1);
}
```
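The loader above only fetches the existing URLs and company+title pairs; the comparison itself might look like the sketch below. The normalization rules (dropping query string, fragment and trailing slash; case-folding and collapsing whitespace in company and title) and the `isDuplicate` helper are assumptions, not the processor's confirmed logic.

```javascript
// Hypothetical sketch: normalization rules are assumptions.
function normalizeUrl(raw) {
  const u = new URL(raw); // throws on malformed input
  const path = u.pathname.replace(/\/+$/, ""); // drop trailing slashes
  return `${u.origin}${path}`; // origin is lower-cased; query and fragment are dropped
}

function comboKey(company, title) {
  const norm = (s) => s.trim().toLowerCase().replace(/\s+/g, " ");
  return `${norm(company)}|${norm(title)}`;
}

// Returns which check matched, or null if the candidate is new.
// knownUrls / knownCombos are Sets built (with the same normalizers) only
// after both Notion queries succeeded.
function isDuplicate(candidate, knownUrls, knownCombos) {
  if (knownUrls.has(normalizeUrl(candidate.url))) return "url";
  if (knownCombos.has(comboKey(candidate.company, candidate.title))) return "company+title";
  return null;
}
```

Running both checks matters because the same role often appears under different URLs (an ATS page and a careers-site mirror), which only the company+title key catches.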
**Rule 6: Apply the "What Would Luka Actually DO All Day?" Test**
Before triaging, mentally simulate a typical day in the role.
If the answer is "build automations, connect APIs, deploy AI
workflows, improve internal tools" — that's a match regardless
of title. If the answer is "write SQL queries all day, build
dashboards in Looker, run A/B tests in Amplitude" — that's a
miss. The signal is in the verbs and tools in the
responsibilities section, not in the title or team name.
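Rule 6 is applied through the agent's judgment, but its core signal (verbs and tools in the responsibilities section, not the title) can be approximated with keyword counts. The signal lists below are hypothetical, seeded only from the examples in the rule; a real scorer would need much richer vocabularies.

```javascript
// Hypothetical keyword approximation of Rule 6; signal lists are assumptions
// seeded from the rule's own examples.
const BUILD_SIGNALS = [/\bautomat/i, /\bapis?\b/i, /\bintegrat/i, /\bai workflows?\b/i, /\binternal tools?\b/i, /\bdeploy/i];
const ANALYTICS_SIGNALS = [/\bsql\b/i, /\bdashboards?\b/i, /\blooker\b/i, /\ba\/b tests?\b/i, /\bamplitude\b/i];

// Takes only the responsibilities text: the title is deliberately ignored.
function dayInTheLife(responsibilities) {
  const lines = responsibilities.split("\n").filter((l) => l.trim());
  let build = 0;
  let analytics = 0;
  for (const line of lines) {
    // Count each responsibility line at most once per category.
    if (BUILD_SIGNALS.some((re) => re.test(line))) build++;
    if (ANALYTICS_SIGNALS.some((re) => re.test(line))) analytics++;
  }
  const verdict = build > analytics ? "match" : analytics > build ? "miss" : "unclear";
  return { build, analytics, verdict };
}
```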
The domain is personal automation, but the architecture patterns are general-purpose:
The system evolves through real failures — each rule and safeguard exists because something broke in production.
| Date | Change | Why |
|---|---|---|
| 2026-04-26 | Rule 13: Function-Modified Ops cap | CX Ops role shortlisted despite automation being secondary to departmental outcomes. Roles where "Ops" follows a department name are now capped at ⭐⭐⭐ unless automation IS the function. |
| 2026-04-23 | Triage decomposed into 3 sub-stages | Monolithic triage caused context overflow and made it impossible to retry individual phases. Splitting into filter → assess → decide allows targeted re-runs and cleaner failure modes. |
| 2026-04-23 | Legacy cron zombies deleted | Disabled jobs still executing due to stale nextRunAtMs timestamps — caused a race condition that shortlisted a role violating language rules. |
| 2026-03-30 | SearXNG replaces Brave as primary JD search | Brave-only search plus an ATS-only URL filter failed every time a company wasn't on one of the 5 whitelisted ATS platforms. SearXNG (self-hosted) is now Layer 1; Brave is the fallback. Any URL matching the company name is accepted. |
| 2026-03-30 | Halt-on-failure dedup | Notion query failures silently let duplicates through. Processor now exits entirely if dedup queries can't be verified — never assumes the pipeline is clean. |
| 2026-03-27 | Full autonomy: human-in-the-loop only for 🔴 decisions | Moved from semi-manual triage to a 4-batch cron schedule with Telegram summaries. Luka reviews only items escalated with 🔴 urgency. |
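Rule 13 from the first row reduces to a conditional cap. In this sketch the department list, the title regex, and the `automationIsFunction` flag (standing in for the judgment of whether automation IS the function) are assumptions.

```javascript
// Hypothetical department list; extend as new "<Dept> Ops" titles show up.
const DEPARTMENTS = ["CX", "Customer Success", "Support", "Sales", "Marketing", "Revenue", "People", "HR", "Finance"];
// Matches "<Department> Ops" or "<Department> Operations" anywhere in the title.
const FUNCTION_MODIFIED_OPS = new RegExp(`\\b(?:${DEPARTMENTS.join("|")})\\s+Op(?:s|erations)\\b`, "i");

// Cap at 3 stars when "Ops" follows a department name, unless automation IS the function.
function applyRule13(title, stars, automationIsFunction) {
  if (FUNCTION_MODIFIED_OPS.test(title) && !automationIsFunction) {
    return Math.min(stars, 3);
  }
  return stars;
}
```

Because the cap uses `Math.min`, it never raises a score: a two-star Sales Ops role stays at two.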