Camunda Incidents vs Errors vs Failures – Complete Guide for Developers
Camunda incidents, errors, and failures are often confused. This guide explains the differences between them, how retries behave, and best practices for handling workflow issues.
If you're working with Camunda, you’ve likely encountered terms like Incidents, Errors, and Failures.
But many developers confuse them.
👉 Are they the same?
👉 When does each occur?
👉 How should you handle them properly?
Let’s break it down clearly.
🔹 1. What is a Failure in Camunda?
A Failure occurs when:
A service task throws an exception
An external task worker reports a failure
A job execution fails
👉 Camunda automatically:
Retries the job (default 3 times)
Example:
API call fails
Database connection issue
👉 A Failure is typically a temporary issue
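To make the retry behavior concrete, here is a minimal, self-contained Java sketch (Java 16+ for records). It does not use the Camunda API: `FailureRetrySketch`, `runWithRetries`, and the simulated payment call are hypothetical names that only model the rule described above. Every failed attempt consumes one retry, and the job runs again while retries remain.

```java
/**
 * Hypothetical model of job retries (NOT the Camunda API).
 * Each failure decrements the retry counter; the job runs again while retries remain.
 */
final class FailureRetrySketch {

    /** Outcome of running a job until it succeeds or runs out of retries. */
    record Result(boolean succeeded, int attempts, int retriesLeft) {}

    /** Runs the task, retrying on any RuntimeException while retries > 0. */
    static Result runWithRetries(Runnable task, int retries) {
        int attempts = 0;
        while (retries > 0) {
            attempts++;
            try {
                task.run();
                return new Result(true, attempts, retries);
            } catch (RuntimeException e) {
                retries--; // a failure consumes one retry
                System.out.println("Attempt " + attempts + " failed: " + e.getMessage()
                        + " (retries left: " + retries + ")");
            }
        }
        // retries == 0: this is the point where the engine would create an incident
        return new Result(false, attempts, 0);
    }

    public static void main(String[] args) {
        // Hypothetical payment call that times out once, then succeeds.
        int[] calls = {0};
        Result r = runWithRetries(() -> {
            if (++calls[0] == 1) {
                throw new IllegalStateException("payment API timeout");
            }
        }, 3);
        System.out.println(r); // Result[succeeded=true, attempts=2, retriesLeft=2]
    }
}
```

With the default of 3 retries, a task that keeps failing is attempted three times before the sketch gives up.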
🔹 2. What is an Incident?
An Incident is created when:
👉 All retries are exhausted
Key points:
No more automatic retries
Manual intervention required
Visible in Camunda Cockpit
Example:
Permanent failure
Wrong configuration
Broken integration
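The incident lifecycle can be modeled the same way. In this hypothetical sketch, the `Job` class and its methods are illustrative and are not Camunda classes: the incident opens once retries reach 0, and only a manual `setRetries` call clears it. In Camunda 7 the comparable engine operation is `ManagementService#setJobRetries`.

```java
/**
 * Hypothetical model of an incident lifecycle (NOT the Camunda API):
 * retries reach 0 -> incident opened; an operator fixes the cause and
 * sets retries again, which resolves the incident so the job can run.
 */
final class IncidentSketch {

    static final class Job {
        int retries;
        String incident; // null while no incident is open

        Job(int retries) { this.retries = retries; }

        /** Called when an execution attempt fails. */
        void fail(String reason) {
            if (retries > 0) retries--;
            if (retries == 0 && incident == null) {
                incident = "failedJob: " + reason; // no further automatic retries
            }
        }

        /** Manual intervention after the root cause has been fixed. */
        void setRetries(int newRetries) {
            retries = newRetries;
            if (newRetries > 0) incident = null; // incident resolved
        }

        /** The engine only picks up jobs that still have retries. */
        boolean executable() { return retries > 0; }
    }

    public static void main(String[] args) {
        Job job = new Job(3);
        for (int i = 0; i < 3; i++) job.fail("wrong endpoint URL");
        System.out.println(job.incident);     // failedJob: wrong endpoint URL
        System.out.println(job.executable()); // false
        // Operator fixes the configuration, then grants one more attempt.
        job.setRetries(1);
        System.out.println(job.executable()); // true
    }
}
```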
🔹 3. What is a BPMN Error?
A BPMN Error is:
👉 A business-level error, not technical
Used when:
Business rule fails
Validation fails
Expected error scenario
Example:
“Customer not eligible”
“Insufficient balance”
👉 Handled using:
Error boundary events
Error end events
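The routing idea behind BPMN Errors can be sketched without the engine. Here `BusinessError` is a hypothetical stand-in for Camunda's `BpmnError` (in real Camunda 7 code you would throw `org.camunda.bpm.engine.delegate.BpmnError` with an error code), and a plain map plays the role of the error boundary events. The sketch shows that a business error selects an alternative path and consumes no retries.

```java
import java.util.Map;

/**
 * Hypothetical sketch of BPMN error routing (NOT the Camunda API).
 * BusinessError mimics BpmnError: it carries an error code that an
 * error boundary event on the task can catch.
 */
final class BpmnErrorSketch {

    static final class BusinessError extends RuntimeException {
        final String errorCode;

        BusinessError(String errorCode, String message) {
            super(message);
            this.errorCode = errorCode;
        }
    }

    /** Error boundary events attached to the task: error code -> next activity. */
    static final Map<String, String> BOUNDARY_EVENTS = Map.of(
            "NOT_ELIGIBLE", "Notify Customer",
            "INSUFFICIENT_BALANCE", "Request Top-Up");

    /**
     * Runs the task and returns the next activity.
     * Any other RuntimeException propagates unchanged: that is the
     * technical failure / retry path, not the business path.
     */
    static String runTask(Runnable task) {
        try {
            task.run();
            return "Next Task (happy path)";
        } catch (BusinessError e) {
            String target = BOUNDARY_EVENTS.get(e.errorCode);
            if (target != null) {
                return target; // caught by the matching boundary event
            }
            // Uncaught error codes are out of scope for this sketch; just rethrow.
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(runTask(() -> {
            throw new BusinessError("NOT_ELIGIBLE", "Customer not eligible");
        })); // Notify Customer
        System.out.println(runTask(() -> {})); // Next Task (happy path)
    }
}
```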
🔹 4. Key Differences
| Type | Nature | Retry | Handling |
|---|---|---|---|
| Failure | Technical | ✅ Yes | Automatic |
| Incident | Technical | ❌ No | Manual |
| Error | Business | ❌ No | BPMN Flow |
🔹 5. When to Use What?
👉 Use Failure:
Temporary issues
Retryable errors
👉 Use Incident:
System failure after retries
Requires manual fix
👉 Use BPMN Error:
Business logic issues
Expected scenarios
🔹 6. Best Practices
✔ Use BPMN Error for business logic
✔ Let Failures handle retries automatically
✔ Monitor Incidents in Cockpit
✔ Avoid mixing technical & business errors
🔹 7. Common Mistakes
❌ Throwing generic exceptions for business errors instead of BPMN Errors
❌ Not configuring retries
❌ Ignoring incidents
🔹 8. Summary
Failure → temporary technical issue (auto retry)
Incident → retries exhausted (manual action)
Error → business flow handling
👉 Understanding this distinction is critical for production-ready workflows
🔹 Real-world Production Scenarios
Understanding how Incidents, Errors, and Failures behave in real production systems is critical when working with Camunda.
✅ Scenario 1: External API Failure (Payment Service Down)
- A service task calls a payment API
- API is temporarily unavailable (timeout / 503)
👉 Best Handling: Failure
- Use retries (`retries > 0`)
- Camunda will retry automatically
- No manual intervention needed initially
✅ Scenario 2: Invalid Business Input (Validation Issue)
- User submits incorrect data (e.g., invalid email, missing document)
👉 Best Handling: BPMN Error
- Throw a business error (`BpmnError`)
- Catch it using an error boundary event
- Redirect workflow (e.g., “Fix Data” task)
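A hypothetical validation step for Scenario 2 might look like the following. The error code `INVALID_INPUT`, the deliberately simple email pattern, and the class names are assumptions for illustration; in a Camunda 7 `JavaDelegate` the equivalent would be `throw new BpmnError("INVALID_INPUT")`, caught by an error boundary event that leads to the “Fix Data” task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical validation step (NOT the Camunda API). Invalid input is
 * reported as a business error that a boundary event would route to "Fix Data".
 */
final class ValidationSketch {

    static final class BusinessError extends RuntimeException {
        final String errorCode;

        BusinessError(String errorCode, String message) {
            super(message);
            this.errorCode = errorCode;
        }
    }

    // Intentionally simple check; real email validation is more involved.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    /** Throws BusinessError("INVALID_INPUT", ...) listing every problem found. */
    static void validate(String email, boolean documentUploaded) {
        List<String> problems = new ArrayList<>();
        if (email == null || !EMAIL.matcher(email).matches()) problems.add("invalid email");
        if (!documentUploaded) problems.add("missing document");
        if (!problems.isEmpty()) {
            throw new BusinessError("INVALID_INPUT", String.join(", ", problems));
        }
    }

    /** Mimics the model: continue on the happy path, or go to the "Fix Data" task. */
    static String nextStep(String email, boolean documentUploaded) {
        try {
            validate(email, documentUploaded);
            return "Continue Process";
        } catch (BusinessError e) {
            return "Fix Data: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(nextStep("jane@example.com", true));  // Continue Process
        System.out.println(nextStep("jane.example.com", false)); // Fix Data: invalid email, missing document
    }
}
```

Collecting all problems before throwing lets the user fix everything in a single pass through the “Fix Data” task.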
✅ Scenario 3: Unexpected System Exception (NullPointerException / Bug)
- Code throws a runtime exception
- No retry logic or fallback available
👉 Result: Incident
- Camunda creates an incident
- Requires manual resolution (fix + retry)
✅ Scenario 4: Downstream System Intermittent Failure
- Sending a message to Kafka / RabbitMQ fails intermittently
👉 Best Handling: Failure → Incident
- Retry first (Failure)
- If retries exhausted → Incident created
✅ Scenario 5: Business Rule Violation (Approval Rejected)
- Loan rejected due to credit score
👉 Best Handling: BPMN Error (Business Flow)
- Not a technical failure
- Route to rejection flow
🔹 When to Use Incident vs Error vs Failure
Choosing the correct mechanism ensures resilient and maintainable workflows.
🔸 Use Failure (Retries) when:
- Temporary technical issues
- External system downtime
- Network/API failures
✅ Example:
- REST API timeout
- Database connection issue
👉 Goal: Automatic recovery
🔸 Use BPMN Error when:
- Business logic condition fails
- Expected alternative path exists
✅ Example:
- Validation failure
- Business rule violation
👉 Goal: Controlled workflow routing
🔸 Use Incident when:
- Unexpected technical issue
- Retries exhausted
- No defined fallback
✅ Example:
- Code bug
- Misconfiguration
- Permanent failure
👉 Goal: Manual intervention + monitoring
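The guidance above can be condensed into a small decision helper. This is a hypothetical sketch, not part of any Camunda API: it assumes business-rule violations are signaled with a dedicated exception type and treats `IOException` (which includes `SocketTimeoutException`) as the only transient category. Everything else goes straight to an incident, which mirrors an external task worker reporting a failure with 0 retries.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/**
 * Hypothetical decision helper (NOT the Camunda API): given what went wrong,
 * pick BPMN Error, Failure (retry), or Incident.
 */
final class HandlingDecisionSketch {

    enum Handling { BPMN_ERROR, FAILURE_WITH_RETRIES, INCIDENT }

    /** Business rule violation: an expected alternative path exists in the model. */
    static final class BusinessRuleViolation extends RuntimeException {
        BusinessRuleViolation(String message) { super(message); }
    }

    static Handling decide(Throwable problem, int retriesLeft) {
        if (problem instanceof BusinessRuleViolation) {
            return Handling.BPMN_ERROR;           // controlled workflow routing
        }
        // Assumption: only I/O problems (timeouts, dropped connections) are transient.
        boolean transientIssue = problem instanceof IOException;
        if (transientIssue && retriesLeft > 0) {
            return Handling.FAILURE_WITH_RETRIES; // automatic recovery
        }
        return Handling.INCIDENT;                 // bug, misconfiguration, or retries exhausted
    }

    public static void main(String[] args) {
        System.out.println(decide(new BusinessRuleViolation("credit score too low"), 3)); // BPMN_ERROR
        System.out.println(decide(new SocketTimeoutException("payment API timeout"), 3)); // FAILURE_WITH_RETRIES
        System.out.println(decide(new SocketTimeoutException("payment API timeout"), 0)); // INCIDENT
        System.out.println(decide(new NullPointerException(), 3));                        // INCIDENT
    }
}
```

Sending unknown exceptions straight to an incident, rather than retrying them, is a design choice: retrying a NullPointerException usually just repeats the bug. If you prefer to let the engine retry every technical failure, return `FAILURE_WITH_RETRIES` for any non-business exception while retries remain.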