Camunda 7 Multi-Instance Loop Not Completing – Root Causes & Fixes

Introduction

One of the most frustrating production issues in Camunda 7 is when a Multi-Instance (MI) activity never completes.

Typical symptoms:

  • All instances appear finished, but the process never moves forward

  • The parent task stays active indefinitely

  • No error in logs

  • Process instance is stuck in production

This is not a Camunda bug in most cases.
It is almost always caused by modeling or data issues.

This blog explains:

  • How Multi-Instance works internally

  • Why loops get stuck

  • The most common root causes

  • How to fix and prevent them


How Multi-Instance Works in Camunda 7 (Quick Refresher)

A Multi-Instance activity:

  • Creates multiple executions

  • Tracks how many instances are:

    • created

    • completed

  • Completes only when the completion condition is met

Execution formula (simplified)

Completed Instances == Total Instances

If even one instance never completes, the loop never ends.
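The rule above can be sketched as a small toy model. This is not engine code: the class name `MiCounters` is hypothetical, and only the field names mirror the MI variables Camunda 7 exposes (`nrOfInstances`, `nrOfActiveInstances`, `nrOfCompletedInstances`). It assumes a parallel MI where all instances are created up front.

```java
/** Toy model of the counters behind a parallel multi-instance body; not engine code. */
final class MiCounters {
    final int nrOfInstances;      // fixed when the MI body starts
    int nrOfActiveInstances;      // instances still running
    int nrOfCompletedInstances;   // instances that have finished

    MiCounters(int nrOfInstances) {
        this.nrOfInstances = nrOfInstances;
        this.nrOfActiveInstances = nrOfInstances;
    }

    /** Called when one inner instance finishes. */
    void completeOne() {
        if (nrOfActiveInstances == 0) {
            throw new IllegalStateException("no active instance left");
        }
        nrOfActiveInstances--;
        nrOfCompletedInstances++;
    }

    /** Default rule: the MI body ends once every created instance has completed. */
    boolean isDone() {
        return nrOfCompletedInstances == nrOfInstances;
    }
}
```

With three instances, `isDone()` stays false after two calls to `completeOne()` and only flips to true after the third, which is why a single stuck instance blocks the whole activity.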


🔷 Multi-Instance BPMN – Diagram

Visual cues

  • Three vertical bars → parallel MI

  • Three horizontal bars → sequential MI

  • Loop characteristics define:

    • Collection

    • Element variable

    • Completion condition


Root Cause #1: Wrong or Empty Collection Variable

The problem

The MI collection expression evaluates to:

  • null

  • Empty list

  • Unexpected type

${users}

But at runtime:

users = null

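Result

  • A null value or a non-collection type typically makes the engine throw an exception when it enters the MI activity; behind an async boundary this shows up as a failed job or incident rather than a visible error

  • An empty list creates zero instances, so the work you expected never happens

  • A list with missing or wrong elements creates instances that wait on data that never arrives, and the process never moves forward

✅ Fix

✔ Always validate the collection before the MI activity

✔ Be careful with an inline guard such as

${users != null ? users : []}

because the [] list literal may not be accepted by the expression language in your Camunda 7 version; test it before relying on it

✔ Prefer preparing the data in a preceding step, so that ${users} always resolves to a real list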
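One way to prepare the data is a small helper called from a listener or service task just before the MI activity. The sketch below is plain Java with no Camunda dependency; `MiCollections` and `toSafeList` are hypothetical names, and how the result is written back to the `users` variable is left to your own delegate code.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical helper: normalises whatever is stored in the collection variable. */
final class MiCollections {

    /**
     * Returns an immutable copy of the collection.
     * null becomes an empty list; anything that is not a Collection is rejected early.
     * Note: List.copyOf also rejects null elements with a NullPointerException.
     */
    static List<Object> toSafeList(Object raw) {
        if (raw == null) {
            return List.of();
        }
        if (raw instanceof Collection<?>) {
            return List.copyOf((Collection<?>) raw);
        }
        throw new IllegalArgumentException(
                "Expected a Collection but got " + raw.getClass().getName());
    }
}
```

Failing fast on an unexpected type is a deliberate choice here: a clear exception before the MI activity is easier to diagnose than a process that stops somewhere inside it.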


Root Cause #2: One Instance Never Completes

The problem

One MI instance:

  • Waits at a user task

  • Fails in a service task

  • Is blocked by an external system

Result

  • MI parent waits forever

  • No error thrown

✅ Fix

✔ Ensure every path completes
✔ Add boundary events:

  • Error boundary

  • Timer boundary

✔ Handle retries properly
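Inside a service task, the same idea applies to calls into external systems: bound the wait so the instance either finishes or fails visibly. A minimal sketch in plain Java follows; `BoundedCall` is a hypothetical helper, and what you do with an empty result (for example raising a BPMN error that an error boundary event catches) is an assumption about your model.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper: never wait on a remote call longer than the given timeout. */
final class BoundedCall {

    /** Returns the call's result, or Optional.empty() on timeout or failure. */
    static <T> Optional<T> call(Supplier<T> remoteCall, Duration timeout) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(remoteCall);
        try {
            return Optional.ofNullable(
                    future.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            // Stop waiting; the underlying call may still finish in the background.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        }
    }
}
```

The caller decides what an empty result means; the point is only that the MI instance is never left waiting without a deadline.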


Root Cause #3: Incorrect Completion Condition

The problem

${nrOfCompletedInstances == nrOfInstances}

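But:

  • One instance is skipped or cancelled, so it never counts as completed

  • One instance fails silently and stays active

Result

The condition never becomes true. Note that this expression only restates the default behavior: without any completion condition, the MI activity already ends once every instance has completed.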

✅ Fix

✔ Avoid custom completion conditions unless required
✔ Let Camunda handle default completion
✔ If used, guard against edge cases
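If a custom condition is really needed, one way to guard the edge cases is to move the logic into a small bean and keep the expression trivial. The sketch below is an illustration only: `MiCompletionGuard` and the bean name `miGuard` are hypothetical, and it assumes your setup can resolve the bean in an expression such as ${miGuard.isSatisfied(nrOfInstances, nrOfCompletedInstances, 2)}.

```java
/** Hypothetical bean that evaluates an "N instances are enough" completion rule. */
final class MiCompletionGuard {

    /**
     * True once at least `required` instances have completed.
     * Uses >= instead of == so an overshoot cannot skip past the target,
     * and clamps `required` so it can never exceed the number of instances.
     */
    boolean isSatisfied(int nrOfInstances, int nrOfCompletedInstances, int required) {
        if (nrOfInstances <= 0) {
            return true; // nothing to wait for
        }
        int target = Math.min(Math.max(required, 1), nrOfInstances);
        return nrOfCompletedInstances >= target;
    }
}
```

For example, asking for 3 approvals when only 2 reviewers exist is clamped to 2, so the condition can still become true.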


Root Cause #4: Modifying the Collection During Execution

The problem

The MI collection is changed while instances are running.

Example:

  • Items removed from the list

  • Collection replaced

Result

  • The instance count was fixed when the MI started, so it no longer matches the collection

  • Loop never finishes

✅ Fix

✔ Treat MI collection as immutable
✔ Never modify it after MI starts
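The effect can be reproduced with an ordinary loop whose length is captured once, which is an analogy for the instance count being fixed at MI start and not a copy of the engine's logic. `FixedCountLoop` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Analogy only: a loop whose length is captured up front, like the MI instance count. */
final class FixedCountLoop {

    /**
     * Visits indexes 0..count-1 of `items`. `mutateAfterFirst` runs after the
     * first element to simulate another step changing the list mid-loop.
     */
    static List<String> run(List<String> items, Runnable mutateAfterFirst) {
        int count = items.size();           // captured once, before the loop
        List<String> visited = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            visited.add(items.get(i));      // throws if the list shrank meanwhile
            if (i == 0) {
                mutateAfterFirst.run();
            }
        }
        return visited;
    }
}
```

Running the loop over the live list while an element is removed ends in an IndexOutOfBoundsException, whereas running it over `List.copyOf(liveList)` taken beforehand visits every element regardless of later changes to the original.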


Root Cause #5: Asynchronous Boundaries in the Wrong Place

The problem

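Async Before / Async After is configured on the wrong level of the MI task. In Camunda 7 the flags on the activity itself apply to the multi-instance body as a whole, while the flags on the loop characteristics apply to each inner instance.

Result

  • Every async boundary becomes a job; if that job keeps failing until its retries are exhausted, the instance sits on an incident

  • Completion signal never reaches parent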

✅ Fix

✔ Use async carefully
✔ Avoid async after MI unless required
✔ Test async behavior under load


Root Cause #6: External Task Workers Not Completing

The problem

External workers:

  • Fetch tasks

  • Execute logic

  • But never complete the task

Result

MI instance remains active forever.

✅ Fix

✔ Ensure complete() is always called
✔ Handle worker crashes
✔ Add retries and timeouts
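A worker handler can be structured so that every code path reports back exactly once. The sketch below uses a hypothetical `TaskCallbacks` interface as a stand-in for the real client; it is not the Camunda API. In an actual worker the two callbacks would map to the `complete` and `handleFailure` calls of the external task client you use.

```java
/** Sketch of a worker handler that always reports an outcome. */
final class SafeWorker {

    /** Hypothetical stand-in for the client's reporting calls; not the Camunda API. */
    interface TaskCallbacks {
        void complete();
        void handleFailure(String message, int retriesLeft);
    }

    /** Runs the work and invokes exactly one callback, whatever happens. */
    static void handle(Runnable work, TaskCallbacks callbacks, int retriesLeft) {
        try {
            work.run();
        } catch (RuntimeException e) {
            // Report the failure with one retry fewer, never below zero.
            callbacks.handleFailure(String.valueOf(e.getMessage()),
                    Math.max(retriesLeft - 1, 0));
            return;
        }
        // Kept outside the try block so an error while completing
        // is not reported a second time as a business failure.
        callbacks.complete();
    }
}
```

Worker crashes are a separate concern: they are covered by the lock duration on the task, after which the engine makes it available to be fetched again.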


Root Cause #7: Sequential MI + Error Handling

The problem

Sequential MI stops on first failure.

Result

  • Remaining instances never run

  • MI never completes

✅ Fix

✔ Add error boundary events
✔ Decide explicitly:

  • Stop on first error

  • Continue on error
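The two strategies can be made explicit in code before they are modeled. This is a plain Java sketch of the decision, not a Camunda construct: `SequentialRunner` and `OnError` are hypothetical names, and in BPMN the same choice is expressed by where the error boundary event is attached and where its outgoing flow leads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: process items one by one with an explicit error policy. */
final class SequentialRunner {

    enum OnError { STOP, CONTINUE }

    /** Processes items in order and returns the items whose step failed. */
    static <T> List<T> run(List<T> items, Consumer<T> step, OnError policy) {
        List<T> failed = new ArrayList<>();
        for (T item : items) {
            try {
                step.accept(item);
            } catch (RuntimeException e) {
                failed.add(item);
                if (policy == OnError.STOP) {
                    break; // remaining items are intentionally not processed
                }
            }
        }
        return failed;
    }
}
```

Either policy ends the loop in a defined state; what must be avoided is the third, implicit option where a failure leaves the loop neither finished nor aborted.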


How to Debug MI Issues in Production

Step-by-step checklist

  1. Check runtime execution tree (Cockpit)

  2. Verify:

    • nrOfInstances

    • nrOfCompletedInstances

  3. Identify stuck execution

  4. Inspect:

    • Variables

    • External tasks

    • User tasks

  5. Check async job retries


Production-Safe Multi-Instance Checklist

✔ Collection is never null
✔ Collection is immutable
✔ Every instance completes or errors cleanly
✔ External tasks always call complete()
✔ Boundary events exist
✔ Completion condition is simple or default
✔ Sequential MI has clear error strategy


Common Anti-Patterns 🚨

❌ Modifying the MI collection at runtime
❌ Using complex completion conditions
❌ No timeout or error handling
❌ Blocking external calls
❌ Assuming MI completes automatically


Interview Question (Very Common)

Q: Why does a Multi-Instance loop never complete even when tasks look finished?
A: Because at least one execution is still active or the completion condition is never satisfied.


Conclusion

A Multi-Instance loop that never completes is almost always a modeling or data problem, not an engine bug.

In most cases, the cause is:

  • Bad data

  • One stuck instance

  • Unsafe completion logic

  • Missing error handling

If you treat Multi-Instance as distributed execution, not a simple loop, these issues disappear.

👉 Model defensively. Assume failures. Complete every instance.

