
Camunda 7 External Task Retry Not Working – Root Cause & Fix (Production Guide)

External Tasks are widely used in Camunda 7 to decouple business logic from the process engine.
However, a very common production issue teams face is:

External Task retries are not working as expected.

In this blog, we’ll cover why retries fail, real root causes, and how to fix them safely in production.


1️⃣ How External Task Retry Works in Camunda 7

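In Camunda 7, retries are controlled by:

  • retries count (set by the worker when it reports a failure; null until the first failure)

  • retryTimeout (how long the engine waits before the task can be fetched again after a failure)

  • lockDuration (how long a fetched task stays locked to one worker)

  • lockExpirationTime (the moment the current lock ends; after it, any worker can fetch the task again)

  • Failure handling logic in the worker

A counted retry happens only when the worker explicitly reports failure with handleFailure(). The engine does not decrement retries on its own: the worker passes the new value, and when that value reaches 0 the engine creates an incident and stops handing the task out.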
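The interplay of these settings can be pictured with a toy, in-memory model. The sketch below is not Camunda code: `ToyExternalTask` and its methods are hypothetical and only mimic the behavior described in this post (a fetched task stays locked until its lock expires, a reported failure with retries > 0 makes it fetchable again after the retry timeout, and retries = 0 raises an incident). Times are plain milliseconds for simplicity.

```java
// Toy model of Camunda 7 external-task retry state (illustration only, not engine code).
class ToyExternalTask {
    private Integer retries = null;      // null until the worker reports the first failure
    private long lockExpirationTime = 0; // millis; a value <= now means "not locked"
    private long availableFrom = 0;      // earliest time a fetch may succeed (retryTimeout)
    private boolean incident = false;

    /** Fetch-and-lock: succeeds only if unlocked, past any retryTimeout, and without incident. */
    boolean fetchAndLock(long now, long lockDuration) {
        if (!isFetchable(now)) {
            return false;
        }
        lockExpirationTime = now + lockDuration;
        return true;
    }

    boolean isFetchable(long now) {
        return !incident && now >= lockExpirationTime && now >= availableFrom;
    }

    /** Reported failure: the worker, not the engine, decides the new retries value. */
    void handleFailure(int newRetries, long retryTimeout, long now) {
        retries = newRetries;
        lockExpirationTime = 0;           // the lock is released
        if (newRetries <= 0) {
            incident = true;              // no more fetches until retries are reset
        } else {
            availableFrom = now + retryTimeout;
        }
    }

    Integer getRetries() { return retries; }
    boolean hasIncident() { return incident; }
}

class ToyExternalTaskDemo {
    public static void main(String[] args) {
        ToyExternalTask task = new ToyExternalTask();
        task.fetchAndLock(0, 10_000);                 // worker locks the task for 10 s
        System.out.println(task.isFetchable(5_000));  // false: still locked
        System.out.println(task.isFetchable(10_000)); // true: lock expired, retries still null
        task.fetchAndLock(10_000, 10_000);
        task.handleFailure(2, 60_000, 12_000);        // failure reported, 2 retries left
        System.out.println(task.isFetchable(30_000)); // false: waiting out retryTimeout
        System.out.println(task.isFetchable(72_000)); // true
    }
}
```

Note how an expired lock alone makes the task fetchable again without touching the retries value; only handleFailure() changes it.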


2️⃣ Most Common Reasons Retries Don’t Work

❌ 1. handleFailure() Not Called Correctly

Wrong implementation (very common):

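try {
    // business logic
} catch (Exception e) {
    throw e; // ❌ This does NOT trigger a retry
}

Correct implementation (inside the catch block):

externalTaskService.handleFailure(
    externalTask,
    "Processing failed", // errorMessage
    e.getMessage(),      // errorDetails
    retries,             // new retries value, computed by the worker
    retryTimeout         // ms to wait before the task can be fetched again
);

👉 If neither complete() nor handleFailure() is called, Camunda does not assume success. The task simply stays locked until its lock expires and is then fetched again, with the retries count unchanged and no incident.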
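One way to make sure no code path forgets to report back is to wrap the business logic so that exactly one of complete() or handleFailure() is always called. A minimal sketch, assuming a hypothetical `TaskReporter` interface that stands in for the real ExternalTaskService (whose methods also take the ExternalTask as their first argument); `ReportingRunner` is likewise an illustrative name.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical stand-in for the two calls a handler must choose between. */
interface TaskReporter {
    void complete();
    void handleFailure(String errorMessage, String errorDetails, int retries, long retryTimeout);
}

final class ReportingRunner {
    private ReportingRunner() {}

    /** Runs the business logic and reports exactly one outcome: complete() or handleFailure(). */
    static void run(Runnable businessLogic, TaskReporter reporter, int retries, long retryTimeout) {
        try {
            businessLogic.run();
        } catch (RuntimeException e) {
            StringWriter details = new StringWriter();
            e.printStackTrace(new PrintWriter(details)); // full stack trace as errorDetails
            reporter.handleFailure(String.valueOf(e.getMessage()), details.toString(), retries, retryTimeout);
            return; // do not rethrow: the failure has already been reported
        }
        reporter.complete();
    }
}
```

Calling complete() outside the try block is deliberate: if completing itself fails, that error is not misreported as a business failure. In a real worker, the two TaskReporter methods would delegate to externalTaskService.complete(externalTask) and externalTaskService.handleFailure(externalTask, ...).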


❌ 2. Retries Count Is Already Zero

If retries = 0, the engine has created an incident and will not hand the task to any worker again until retries are set back above 0.

Check in:

  • Cockpit → External Tasks → Retries column

Fix:

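externalTaskService.setRetries(externalTaskId, 3); // engine Java API: org.camunda.bpm.engine.ExternalTaskService

A task with 0 retries is never fetched, so this call cannot come from the worker's own handler; run it from engine-side code or through the REST API.

Or manually reset retries from Cockpit. Setting retries above 0 resolves the incident and makes the task fetchable again.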
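Besides Cockpit, the Camunda 7 REST API exposes a "set retries" operation, PUT /external-task/{id}/retries with a JSON body such as {"retries": 3}. The sketch below only builds that request with the JDK's java.net.http client; the base URL http://localhost:8080/engine-rest and the task ID are assumptions to replace with your own values, and `RetriesResetRequest` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RetriesResetRequest {
    private RetriesResetRequest() {}

    /** Builds PUT {baseUrl}/external-task/{id}/retries with body {"retries": n}. */
    static HttpRequest build(String baseUrl, String externalTaskId, int retries) {
        if (retries < 1) {
            // Resetting to 0 would leave the task in its incident state.
            throw new IllegalArgumentException("retries must be >= 1 to make the task fetchable again");
        }
        String body = "{\"retries\": " + retries + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/external-task/" + externalTaskId + "/retries"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending it is one more call, e.g. HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding()); a successful update should come back as 204 No Content.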


❌ 3. Lock Duration Too Short

If lockDuration is too small:

  • Task unlocks before worker finishes

  • Another worker may pick it up

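  • The same task can be processed twice, and the first worker's later complete() or handleFailure() call can be rejected because the lock now belongs to another worker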

Fix:

client.subscribe("topic")
    .lockDuration(10000) // 10 seconds or more
    .handler(...)
    .open();             // the subscription only starts once open() is called

Rule of thumb:
👉 lockDuration > max execution time
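The rule of thumb can be turned into a small helper. The policy here (twice the slowest observed run, never less than the 10 seconds used in the subscription above) is an assumption, not a Camunda default; `LockDurations` is a hypothetical name.

```java
import java.time.Duration;

final class LockDurations {
    private LockDurations() {}

    // Assumed policy, not a Camunda default: 2x the slowest observed run, with a 10 s floor.
    private static final Duration FLOOR = Duration.ofSeconds(10);

    static Duration recommended(Duration slowestObservedRun) {
        Duration doubled = slowestObservedRun.multipliedBy(2);
        return doubled.compareTo(FLOOR) > 0 ? doubled : FLOOR;
    }

    /** True when the lock could expire while the handler is still running. */
    static boolean riskOfDoubleProcessing(Duration lockDuration, Duration slowestObservedRun) {
        return lockDuration.compareTo(slowestObservedRun) <= 0;
    }
}
```

The subscription builder takes milliseconds, so the result would be passed as .lockDuration(LockDurations.recommended(slowestRun).toMillis()).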


❌ 4. Worker Crashes Before Reporting Failure

If the worker:

  • Crashes

  • Gets killed

  • Loses network

Then:

  • handleFailure() is never called

  • Retry is not scheduled

What happens instead?

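  • Task becomes available again only after the lock expires

  • The retries count is unchanged and no incident is created, so a worker that keeps crashing can pick up the same task again and again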

👉 This is expected behavior, not a bug.


3️⃣ Retry Timeout Misunderstanding

Many developers think retries are immediate.

But:

retryTimeout = 60000; // 1 minute

Means:

  • Camunda waits 1 minute

  • Then the task becomes fetchable again

If retryTimeout is large, it looks like retries are not working.
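To see how long a full retry cycle really takes, the sketch below lists when each attempt would start. It assumes every attempt fails after `handlerMillis`, the worker reports one fewer retry on each failure until it reaches 0, and the task is fetched again the instant its retryTimeout elapses (real workers also add their polling delay). `RetrySchedule` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class RetrySchedule {
    private RetrySchedule() {}

    /**
     * Start times (ms after the first attempt) of every attempt when the worker reports
     * firstReportedRetries, firstReportedRetries - 1, ..., 0 on consecutive failures.
     */
    static List<Long> attemptStarts(int firstReportedRetries, long retryTimeout, long handlerMillis) {
        List<Long> starts = new ArrayList<>();
        long t = 0;
        for (int attempt = 0; attempt <= firstReportedRetries; attempt++) {
            starts.add(t);
            t += handlerMillis + retryTimeout; // failure, then wait out retryTimeout
        }
        return starts;
    }
}
```

With 3 retries and retryTimeout = 60000, attemptStarts(3, 60000, 0) gives [0, 60000, 120000, 180000]: the last attempt starts three minutes after the first, so an incident cannot appear earlier than that.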


4️⃣ BPMN Error vs External Task Retry (Wrong Choice)

❌ Using BPMN Error for technical failures:

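externalTaskService.handleBpmnError(externalTask, "ERROR_CODE");

(In an external task worker, a BPMN error is reported with handleBpmnError(); throw new BpmnError(...) is the idiom for JavaDelegates running inside the engine.)

This:

  • Skips the retry mechanism entirely

  • Hands control to the error handling modeled in BPMN (e.g. an error boundary event), so the process moves forward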

✅ Use retries for:

  • Network errors

  • DB timeouts

  • Temporary failures

Use BPMN Error only for business exceptions.
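The split between the two mechanisms can be captured in one decision point. A minimal sketch, assuming a hypothetical `BusinessRuleViolation` exception type for business outcomes; everything else is treated as technical and retried. `FailureAction` and `FailureClassifier` are also illustrative names.

```java
/** Hypothetical business exception carrying a BPMN error code defined in the model. */
class BusinessRuleViolation extends RuntimeException {
    final String errorCode;

    BusinessRuleViolation(String errorCode, String message) {
        super(message);
        this.errorCode = errorCode;
    }
}

enum FailureAction { REPORT_BPMN_ERROR, RETRY }

final class FailureClassifier {
    private FailureClassifier() {}

    static FailureAction classify(Throwable t) {
        if (t instanceof BusinessRuleViolation) {
            return FailureAction.REPORT_BPMN_ERROR; // business outcome: let the BPMN model handle it
        }
        return FailureAction.RETRY;                 // network errors, DB timeouts, other transient failures
    }
}
```

In the handler, REPORT_BPMN_ERROR would map to externalTaskService.handleBpmnError(externalTask, e.errorCode) and RETRY to handleFailure(...). Defaulting unknown exceptions to RETRY means an unexpected error ends in a visible incident rather than silently sending the process down a business path.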


5️⃣ External Task Topic Subscription Issues

Check:

  • Topic name matches BPMN exactly

  • Worker is actually subscribed

  • Worker is polling continuously

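Enable DEBUG logging for the org.camunda.bpm.client logger, for example in a Spring Boot application.properties:

logging.level.org.camunda.bpm.client=DEBUG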
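A quick way to rule out near-miss topic names is to compare the string from the BPMN model with the one passed to client.subscribe(). The hypothetical helper below treats anything other than an exact match as a problem and names the most common causes (stray whitespace, different letter case).

```java
import java.util.Locale;

final class TopicCheck {
    private TopicCheck() {}

    /** Returns null when the topics match exactly, otherwise a short reason for the mismatch. */
    static String mismatchReason(String bpmnTopic, String subscribedTopic) {
        if (bpmnTopic.equals(subscribedTopic)) {
            return null;
        }
        if (bpmnTopic.trim().equals(subscribedTopic.trim())) {
            return "leading or trailing whitespace differs";
        }
        if (bpmnTopic.toLowerCase(Locale.ROOT).equals(subscribedTopic.toLowerCase(Locale.ROOT))) {
            return "letter case differs";
        }
        return "different topic names";
    }
}
```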

6️⃣ How to Debug Retry Issues (Checklist)

✅ Check retries value in Cockpit
✅ Check worker logs
✅ Verify handleFailure() is called
✅ Confirm lockDuration exceeds the worst-case handler time
✅ Validate retry timeout
✅ Ensure worker is alive


7️⃣ Production-Safe Retry Strategy (Recommended)

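Integer retries = externalTask.getRetries(); // null until the first failure is reported
int retriesLeft = (retries == null) ? 3 : Math.max(retries - 1, 0);

externalTaskService.handleFailure(
    externalTask,
    "Temporary failure", // errorMessage
    errorMessage,        // errorDetails
    retriesLeft,         // 0 => incident, no further attempts
    30000                // retryTimeout: wait 30 s before the next attempt
);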

This ensures:

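  • A fixed budget: 3 retries after the first failure (4 attempts in total)

  • No infinite loops: once retries reach 0, the engine raises an incident instead of retrying

  • Safe recovery: a 30-second pause between attempts gives transient problems time to clear, and the error message is visible in Cockpit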
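The arithmetic of this strategy can be checked without an engine. The sketch applies the rule from the snippet above, with a clamp at 0 added as a safety net, to a handler that always fails; `RetryBudget` is a hypothetical name.

```java
final class RetryBudget {
    private RetryBudget() {}

    /** 3 on the first failure, then one less on each failure, never below 0. */
    static int nextRetries(Integer currentRetries) {
        return currentRetries == null ? 3 : Math.max(currentRetries - 1, 0);
    }

    /** Counts attempts of an always-failing handler until retries reach 0 (incident). */
    static int attemptsUntilIncident() {
        Integer retries = null; // a new external task has no retries value yet
        int attempts = 0;
        while (true) {
            attempts++;
            retries = nextRetries(retries);
            if (retries == 0) {
                return attempts;
            }
        }
    }
}
```

The result is 4: the original attempt plus 3 retries, after which the engine raises an incident.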


🔑 Key Takeaway

External Task retries do not stop working on their own; when they misbehave, the cause is almost always an implementation or configuration issue.

Most problems are caused by:

  • Missing handleFailure()

  • Zero retries

  • Short lock duration

  • Wrong exception strategy


💼 Professional Support Available

If you are:

  • Facing Camunda 7 External Task retry issues

  • Debugging production failures

  • Designing robust retry strategies

I provide paid consulting, production debugging, and Camunda training.

📧 Contact : ishikhanirankari@gmail.com, info@realtechnologiesindia.com

🌐 Website : IT Trainings | Digital metal podium

