
Architecture

Passive Pipeline — Email Ingest
Job Alert Emails (Gmail triggers) → Processor (Node.js on Hetzner) → Dedup (Notion DB query) → Create Entry (Notion API)
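
The dedup-then-create step of this pipeline can be sketched as follows. This is a minimal illustration, not the actual Node.js processor: `query_existing` and `create_entry` are hypothetical callables standing in for the Notion DB query and the Notion API create call. The point it shows is the halt-on-failure behaviour: a failed dedup query stops the run instead of being read as "no duplicates".

```python
from typing import Callable, Iterable


class DedupUnverified(RuntimeError):
    """Raised when the dedup lookup could not be completed."""


def ingest(
    job_url: str,
    query_existing: Callable[[str], Iterable[str]],  # hypothetical Notion DB query wrapper
    create_entry: Callable[[str], None],             # hypothetical Notion API create wrapper
) -> str:
    """Create an entry only when the dedup query positively confirms it is new."""
    try:
        matches = list(query_existing(job_url))
    except Exception as exc:
        # Halt-on-failure: an unverified dedup check is treated as fatal,
        # never as "no duplicates found".
        raise DedupUnverified(f"dedup query failed for {job_url}") from exc

    if matches:
        return "duplicate"
    create_entry(job_url)
    return "created"
```

Letting `DedupUnverified` propagate to the top level is what makes the processor exit entirely rather than continue with an unchecked item.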
Active Pipeline — ATS Discovery
SearXNG (Layer 1 · self-hosted) → Brave Search (Layer 2 · fallback) → URL Match (any ATS platform) → Dedup (halt-on-failure) → Create Entry (Notion API)
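
A rough sketch of the two-layer discovery with URL matching. Each search layer is modelled as a hypothetical callable that takes a query and returns candidate URLs; the query string and the matching rule (company name reduced to letters and digits, found anywhere in the URL) are assumptions made for illustration, not the production logic.

```python
import re
from typing import Callable, Optional, Sequence

SearchFn = Callable[[str], Sequence[str]]  # hypothetical: query -> candidate URLs


def url_matches_company(url: str, company: str) -> bool:
    """Assumed rule: the company name, stripped to letters and digits,
    appears somewhere in the similarly stripped URL."""
    slug = re.sub(r"[^a-z0-9]", "", company.lower())
    return bool(slug) and slug in re.sub(r"[^a-z0-9]", "", url.lower())


def discover(company: str, layers: Sequence[SearchFn]) -> Optional[str]:
    """Try each search layer in order; fall through on errors or no match."""
    for search in layers:
        try:
            candidates = search(f"{company} careers")  # assumed query shape
        except Exception:
            continue  # this layer is down; try the next one
        for url in candidates:
            if url_matches_company(url, company):
                return url
    return None
```

Called as `discover("MyTraffic", [searxng_search, brave_search])`, the second layer is consulted only when the first raises or returns nothing that matches.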
JD Enrichment — Multi-Fallback Fetch
markdown.new (primary) → defuddle.md (fallback 1) → r.jina.ai (fallback 2) → Scrapling (fallback 3) → Write JD (Notion page)
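
The fallback chain can be sketched like this. The fetchers are passed in as hypothetical `(label, function)` pairs so that nothing is assumed about how each service is actually called; the `min_chars` threshold is likewise an assumption about what counts as a usable job description.

```python
from typing import Callable, Optional, Sequence, Tuple

Fetcher = Tuple[str, Callable[[str], str]]  # (label, hypothetical fetch function)


def fetch_jd(
    url: str, fetchers: Sequence[Fetcher], min_chars: int = 200
) -> Optional[Tuple[str, str]]:
    """Return (label, text) from the first fetcher that yields usable text."""
    for label, fetch in fetchers:
        try:
            text = fetch(url).strip()
        except Exception:
            continue  # fetcher failed; fall through to the next one
        # Assumed usability check: very short responses are treated as
        # blocked or empty pages rather than real job descriptions.
        if len(text) >= min_chars:
            return label, text
    return None
```

Returning the label alongside the text makes it possible to record which layer produced the JD that is written to the Notion page.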
5-Stage Cron Pipeline — Per Batch (2 active: Noon + Evening)
Intake (SKILL-intake.md) → 30 min → Enrich (SKILL-enrich.md) → 15 min → Triage-1 (Filter · hard rules) → 15 min → Triage-2 (Assess · rules 1–13) → 15 min → Triage-3 (Decide · Telegram)
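
As a worked example of the stagger, the sketch below turns a batch start time into per-stage run times. It assumes the batch time is when Intake starts and that each gap is measured from the start of the previous stage; the stage names are shortened labels, not actual job names.

```python
from datetime import datetime, timedelta

# Stage name and the gap (in minutes) before the NEXT stage starts.
STAGES = [
    ("intake", 30),
    ("enrich", 15),
    ("triage-1", 15),
    ("triage-2", 15),
    ("triage-3", 0),
]


def batch_schedule(start: str) -> list[tuple[str, str]]:
    """Return (stage, HH:MM) pairs for a batch starting at `start` (HH:MM)."""
    t = datetime.strptime(start, "%H:%M")
    out = []
    for name, gap in STAGES:
        out.append((name, t.strftime("%H:%M")))
        t += timedelta(minutes=gap)
    return out
```

Under these assumptions the Noon batch runs Intake at 12:00, Enrich at 12:30, and the three triage stages at 12:45, 13:00 and 13:15.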

Agents

Agent 1 — OpenClaw (Kimi K2.5)

Kimi K2.5 (Moonshot AI) · OpenAI fallback · Self-hosted via OpenClaw gateway

Autonomous triage agent. Reads per-stage reference files (SKILL-intake.md, SKILL-enrich.md, SKILL-triage-1-filter.md, SKILL-triage-2-assess.md, SKILL-triage-3-decide.md), which were split from the original 2,300-line SKILL.md to prevent context window overflow. Applies 13 calibration rules, scores incoming roles, and updates the Notion job tracker. Four batches are defined (5:30, 12:00, 16:00, 20:00); two are currently active (Noon, Evening). Each batch runs 5 pipeline stages staggered 15–30 minutes apart. Triage refuses to process any item where JD Enriched ≠ true.
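
The enrichment gate at the start of triage could look roughly like this. Items are modelled as plain dictionaries with a `"JD Enriched"` key, which is an assumption about how the Notion property is surfaced to the agent.

```python
from typing import Iterable, List, Mapping, Tuple

Item = Mapping[str, object]


def split_by_enrichment(items: Iterable[Item]) -> Tuple[List[Item], List[Item]]:
    """Separate items that may enter triage from those that must wait."""
    ready: List[Item] = []
    skipped: List[Item] = []
    for item in items:
        # Strict check: a missing value, None, "true" or 1 does not count
        # as enriched; only the boolean True does.
        if item.get("JD Enriched") is True:
            ready.append(item)
        else:
            skipped.append(item)
    return ready, skipped
```

Skipped items are simply left for a later batch, after the Enrich stage has had another chance to fetch the JD.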

Agent 2 — Claude

Claude Code · Anthropic API

Handles strategic work — CV adaptation, cover letter writing, interview prep, pipeline decisions. Reads the Agent Coordination Board at session start for context handoff.

Agent 3 — Claude Code on Server

Claude Code CLI · Anthropic API · CatClaw VPS

Infrastructure agent. Debugs OpenClaw config, maintains SKILL.md and the split reference files, monitors service health, and runs pipeline diagnostics. Uses OPENCLAW-PIPELINE.md as the shared state document for all three agents.

Coordination: Shared Notion board for async communication. Structured message format (agent name, timestamp, task/question). No human relay needed for routine operations. Luka intervenes only for judgment calls tagged 🔴. OPENCLAW-PIPELINE.md serves as shared state across all three agents — architecture map, key IDs, known-good states, and fixes log. All agents can read and update it. It references SKILL.md sections rather than duplicating them.
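
One way to picture the structured message format is the sketch below. The three fields named above (agent name, timestamp, task/question) come from the description; the `needs_human` flag for the 🔴 tag and the rendered layout are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BoardMessage:
    agent: str                 # e.g. "OpenClaw" or "Claude"
    timestamp: datetime        # timezone-aware
    kind: str                  # "task" or "question"
    body: str
    needs_human: bool = False  # assumed flag behind the 🔴 tag

    def render(self) -> str:
        """Format as a single board line; the layout is illustrative."""
        flag = "🔴 " if self.needs_human else ""
        stamp = self.timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return f"{flag}[{self.agent} | {stamp}] {self.kind.upper()}: {self.body}"
```

A fixed shape like this lets any agent scan the board at session start and pick out open questions or 🔴 items without free-text parsing.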

Production incident: zombie cron race condition (April 23, 2026)
🔴 INCIDENT — 2026-04-23 Zombie Cron Race Condition

WHAT HAPPENED
When triage was decomposed from SKILL-triage.md into 3
sub-stages, new cron jobs were created and the legacy monolithic
triage job was set to enabled: false in jobs.json. It was not
deleted.

OpenClaw's scheduler doesn't reliably respect enabled: false
when state.nextRunAtMs is still set in the job definition. The
scheduler saw a valid next-run timestamp and fired the job anyway.

RESULT
Legacy triage ran IN PARALLEL with the new 3-stage pipeline
every evening. The legacy job shortlisted a MYTRAFFIC "GTM
Engineer M/F" role at ⭐⭐⭐⭐⭐. The new pipeline would have
eliminated it under the hard language filter ("fluent in
French"), but the legacy job wrote its verdict to Notion first.
The item was already marked Decided before triage-1 (filter)
reached it.

A second zombie (legacy enrichment) was also firing, duplicating
work with the active enrichment stage.

DETECTION
Manual audit: MYTRAFFIC shortlisted with no language filter note.
Cross-referencing cron logs showed two triage processes writing
to the same Notion entries within the same time window.

ROOT CAUSE
// ❌ What we did (unreliable):
{
  "name": "[LEGACY] Evening Triage (9:00 PM)",
  "enabled": false,           // scheduler ignores this
  "state": {
    "nextRunAtMs": 1745528400000  // ...but honors this
  }
}

// ✅ What works:
// Job deleted from jobs.json entirely.
// No entry = no execution. No ambiguity.

FIX
Deleted both legacy jobs from jobs.json. Set MYTRAFFIC to
Eliminated with explanation of the false positive.

LESSON
"Delete, don't disable." In cron systems where job state persists
alongside config, a disabled flag can be silently overridden by
stale scheduler state. The only reliable decommission is removal.
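
A small audit along these lines could flag zombie candidates before they fire. It is a sketch that assumes jobs.json is a JSON array of job objects shaped like the one in ROOT CAUSE (`name`, `enabled`, `state.nextRunAtMs`); the real file layout may differ.

```python
import json
from typing import List


def find_zombie_candidates(jobs_json: str) -> List[str]:
    """Names of jobs that are disabled but still carry a next-run timestamp."""
    jobs = json.loads(jobs_json)  # assumed shape: a JSON array of job objects
    return [
        job.get("name", "<unnamed>")
        for job in jobs
        if job.get("enabled") is False
        and (job.get("state") or {}).get("nextRunAtMs") is not None
    ]
```

Any name this returns is a job to delete outright rather than leave disabled.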

Key Engineering Decisions

Metrics

20+ items processed / day
700+ items processed, 3 duplicates
13 calibration rules, evolved from 4
3 coordinating agents
21 cron jobs defined (11 active)
Autonomous, with human-in-the-loop

Tech Stack

Claude Code · Anthropic API · Kimi K2.5 (Moonshot) · OpenAI API · OpenClaw · Node.js · Bash · Python · Notion API · Brave Search API · Hetzner VPS · Cloudflare Tunnel · Cron Scheduling · SearXNG · Docker · systemd

Engineering Patterns

The domain is personal automation, but the architecture patterns are general-purpose.

Changelog

The system evolves through real failures — each rule and safeguard exists because something broke in production.

Date | Change | Why
2026-04-26 | Rule 13: Function-Modified Ops cap | A CX Ops role was shortlisted despite automation being secondary to departmental outcomes. Roles where "Ops" follows a department name are now capped at ⭐⭐⭐ unless automation IS the function.
2026-04-23 | Triage decomposed into 3 sub-stages | Monolithic triage caused context overflow and made it impossible to retry individual phases. Splitting into filter → assess → decide allows targeted re-runs and cleaner failure modes.
2026-04-23 | Legacy cron zombies deleted | Disabled jobs were still executing due to stale nextRunAtMs timestamps, which caused a race condition that shortlisted a role violating language rules.
2026-03-30 | SearXNG replaces Brave as primary JD search | The Brave-only search combined with an ATS-only filter failed 100% of the time when companies weren't on the 5 whitelisted platforms. SearXNG (self-hosted) is now Layer 1; Brave is the fallback. Any URL matching the company name is accepted.
2026-03-30 | Halt-on-failure dedup | Notion query failures silently let duplicates through. The processor now exits entirely if dedup queries can't be verified; it never assumes the pipeline is clean.
2026-03-27 | Full autonomy: human-in-the-loop only for 🔴 decisions | Moved from semi-manual triage to a 4-batch cron with Telegram summaries. Luka reviews only items escalated with 🔴 urgency.
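
To make the Rule 13 entry concrete, here is a hedged sketch of how such a cap might be expressed in code. The department list, the regex, and the `automation_is_function` flag are all assumptions; in the actual system the rule is applied by the triage agent from its reference files, not by a function like this.

```python
import re

# Assumed department names, for illustration only.
DEPARTMENTS = ("CX", "Sales", "Marketing", "Revenue", "People", "Finance")
_DEPT_OPS = re.compile(
    r"\b(" + "|".join(DEPARTMENTS) + r")\s+Op(eration)?s\b", re.IGNORECASE
)


def apply_rule_13(title: str, stars: int, automation_is_function: bool) -> int:
    """Cap department-modified Ops roles at 3 stars unless automation is the job."""
    if _DEPT_OPS.search(title) and not automation_is_function:
        return min(stars, 3)
    return stars
```

The judgment call of whether automation is the function stays outside the function as an input, which mirrors how the rule leaves that assessment to the triage stage.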