Operations: Observability & Emergencies

Page rules

Purpose and scope

SLIs/SLOs

Example SQL (SLIs):

-- Most recent price date per ISIN (XETR)
SELECT isin, MAX(date) AS last_price
FROM core.prices_daily
WHERE mic='XETR'
GROUP BY isin;

-- Most recent market-state date
SELECT MAX(date) AS last_market_state FROM core.market_state;

-- Job success rate (fraction of runs with status = 'ok', 0 to 1)
SELECT job_name,
  AVG(CASE WHEN status='ok' THEN 1 ELSE 0 END)::numeric(5,2) AS success_rate
FROM core.job_runs
GROUP BY job_name;
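A minimal sketch of how the results of the first and third SLI queries could be checked against SLO targets, e.g. in a small monitoring script. The thresholds are assumptions: the 3-day price age mirrors the stale-price helper query on this page, while the 95 % job success target and the function names are hypothetical. The rows are assumed to be loaded into dicts already (numeric columns typically arrive as `Decimal` from Python Postgres drivers).

```python
from datetime import date
from decimal import Decimal

# Hypothetical SLO targets; this page does not define concrete values.
MAX_PRICE_AGE_DAYS = 3                   # same cut-off as the stale-price helper query
MIN_JOB_SUCCESS_RATE = Decimal("0.95")   # assumed target; success_rate is numeric(5,2)


def price_freshness_breaches(last_prices: dict[str, date], today: date) -> list[str]:
    """ISINs whose latest XETR price date is more than MAX_PRICE_AGE_DAYS old.

    Equivalent to `last_price < today - 3 days` in the SQL helper query.
    """
    return sorted(
        isin
        for isin, last in last_prices.items()
        if (today - last).days > MAX_PRICE_AGE_DAYS
    )


def job_success_breaches(success_rates: dict[str, Decimal]) -> list[str]:
    """Jobs whose success rate (0 to 1) is below the SLO target."""
    return sorted(
        job for job, rate in success_rates.items() if rate < MIN_JOB_SUCCESS_RATE
    )
```

With `today = date(2024, 5, 10)`, a last price on 2024-05-07 (3 days) still counts as fresh, while 2024-05-06 (4 days) is reported, matching the strict `<` in the SQL.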

Monitoring/Metrics/Dashboards

Logging (sources, structure, retention)

Alerting/On-Call/Thresholds

Maintenance windows/Routine tasks

Example:

# Run VACUUM ANALYZE inside the Postgres container
docker exec -it n8n-n8n-postgres-1 psql -U n8n -d mf_app -c "VACUUM ANALYZE;"
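The command above is meant for manual use: `-it` requests an interactive terminal, and under cron or a CI job, where none is attached, `docker exec -it` typically aborts with "the input device is not a TTY". A minimal Python sketch that builds the same `docker exec` / `psql` call without `-i`/`-t`; container, user and database names are copied from the example above, while the helper names and the use of `ON_ERROR_STOP` are assumptions.

```python
import shlex
import subprocess

# Taken from the manual example above; adjust if the deployment differs.
CONTAINER = "n8n-n8n-postgres-1"
DB_USER = "n8n"
DB_NAME = "mf_app"


def psql_argv(sql: str, container: str = CONTAINER,
              user: str = DB_USER, db: str = DB_NAME) -> list[str]:
    """Build a non-interactive `docker exec ... psql -c` command (no -i/-t, no TTY needed)."""
    return [
        "docker", "exec", container,
        "psql", "-U", user, "-d", db,
        "-v", "ON_ERROR_STOP=1",  # psql variable: stop at the first failing statement
        "-c", sql,
    ]


def run_sql(sql: str) -> None:
    """Run one maintenance statement; check=True raises CalledProcessError on a non-zero exit code."""
    argv = psql_argv(sql)
    print("running:", shlex.join(argv))
    subprocess.run(argv, check=True)
```

A scheduled job would then call `run_sql("VACUUM ANALYZE;")`. Passing a list instead of a shell string avoids quoting problems with the SQL text.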

Performance/Capacity management

Runbooks (common tasks)

Incident response (outage, escalation, post-mortem)

Helper SQL:

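-- Stale XETR prices: latest price older than 3 days
-- Securities without any XETR price row get MAX(p.date) = NULL and are not returned.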
SELECT s.isin, s.name, MAX(p.date) AS last_price
FROM core.securities s
LEFT JOIN core.prices_daily p ON p.isin=s.isin AND p.mic='XETR'
GROUP BY s.isin, s.name
HAVING MAX(p.date) < (CURRENT_DATE - INTERVAL '3 day');

-- Failed or partially failed job runs, newest first
SELECT * FROM core.job_runs WHERE status IN ('fail','partial') ORDER BY started_at DESC;
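The query above lists individual failed runs; for escalation it can help to know how often a job has failed in a row. A minimal sketch, assuming the rows come from `SELECT job_name, status FROM core.job_runs ORDER BY started_at DESC` (all statuses, not only failures) and that 'ok', 'fail' and 'partial' are the relevant status values, as in the queries on this page. The threshold of 2 consecutive failures and the function names are hypothetical.

```python
from collections.abc import Iterable

FAILURE_STATUSES = {"fail", "partial"}  # as in the failed-jobs query
ESCALATE_AFTER = 2  # hypothetical: escalate after 2 failed runs in a row


def consecutive_failures(runs: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count failed/partial runs per job since its most recent 'ok' run.

    `runs` must be ordered newest first (ORDER BY started_at DESC).
    Statuses other than 'ok', 'fail' and 'partial' (e.g. still running) are skipped.
    """
    streak: dict[str, int] = {}
    closed: set[str] = set()  # jobs for which an 'ok' run has already been seen
    for job, status in runs:
        if job in closed:
            continue  # older runs before the last success do not count
        if status == "ok":
            closed.add(job)
            streak.setdefault(job, 0)
        elif status in FAILURE_STATUSES:
            streak[job] = streak.get(job, 0) + 1
    return streak


def jobs_to_escalate(runs: Iterable[tuple[str, str]], threshold: int = ESCALATE_AFTER) -> list[str]:
    """Jobs whose current failure streak reaches the threshold."""
    return sorted(job for job, n in consecutive_failures(runs).items() if n >= threshold)
```

Counting streaks rather than single failures keeps one-off 'partial' runs from paging on-call, at the cost of reacting one run later.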

DR/Restore smoke tests (results/schedule)