Java Production Issue Troubleshooting Guide (Memory Leak, CPU Spikes & Thread Dumps)

 Production issues in Java applications can silently impact business operations before users even report them.

This guide covers real-world Java production troubleshooting techniques for diagnosing:

  • Java memory leaks
  • High CPU usage
  • Thread contention
  • Deadlocks
  • Application hangs
  • JVM performance bottlenecks

Whether you're working on Spring Boot microservices, enterprise applications, or cloud-native Java systems, this troubleshooting guide will help you debug production incidents efficiently.


🚨 Common Java Production Issues

| Issue | Symptoms |
|---|---|
| Memory Leak | Increasing heap usage, frequent GC, OutOfMemoryError |
| CPU Spike | 100% CPU usage, slow APIs |
| Thread Deadlock | Application freeze/hang |
| GC Thrashing | High Full GC frequency |
| DB Connection Leak | Request timeout |
| Blocking Threads | APIs become unresponsive |
| Metaspace Overflow | ClassLoader memory issue |


🧠 Understanding JVM Memory Areas

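Before debugging production issues, understand the main JVM memory regions:

  • Heap: objects and arrays, managed by the garbage collector
  • Thread stacks: one per thread, holding method frames and local variables
  • Metaspace: class metadata, allocated in native memory
  • Direct memory: off-heap buffers, such as those created by ByteBuffer.allocateDirect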

Java memory issues often originate from incorrect object lifecycle management or excessive object retention.


🔥 1. Java Memory Leak Troubleshooting

Common Symptoms

  • Heap usage continuously increases
  • Frequent Full GC
  • Application slowdown
  • OutOfMemoryError
  • Kubernetes pod restart
  • Container OOMKill


✅ Common Causes of Java Memory Leaks

1. Static Collections

private static final List<User> cache = new ArrayList<>();

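A static collection stays reachable for as long as its class is loaded, so every object added to it and never removed cannot be garbage collected.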


2. Unclosed Resources

InputStream stream = new FileInputStream(file);

A stream that is never closed keeps its file descriptor and any associated buffers open. Always close streams using try-with-resources.
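As a minimal sketch of the try-with-resources pattern, the example below counts the bytes in a file. The class name StreamReadExample and the temporary file are illustrative assumptions, not part of any real application.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class StreamReadExample {

    // Reads the whole file and returns its size in bytes.
    // The stream is closed automatically when the try block exits,
    // whether it completes normally or throws.
    static long countBytes(File file) throws IOException {
        try (InputStream stream = new FileInputStream(file)) {
            long total = 0;
            byte[] buffer = new byte[8192];
            int read;
            while ((read = stream.read(buffer)) != -1) {
                total += read;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demo.
        File tmp = File.createTempFile("demo", ".txt");
        Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(countBytes(tmp)); // prints 5
        Files.delete(tmp.toPath());
    }
}
```

The same pattern applies to any AutoCloseable, including JDBC connections and sockets.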


3. ThreadLocal Leaks

private static final ThreadLocal<User> userHolder = new ThreadLocal<>();

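In a thread pool, worker threads are reused, so a value that is set and never cleared stays reachable for the life of the thread. Always call remove() after usage, ideally in a finally block.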
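One way to make the cleanup hard to forget is to wrap the set/remove pair in a helper. This is a sketch only: UserContext and runAs are hypothetical names, and String stands in for the User type above.

```java
class UserContext {

    // Hypothetical holder; String stands in for the article's User type.
    private static final ThreadLocal<String> userHolder = new ThreadLocal<>();

    static String currentUser() {
        return userHolder.get();
    }

    // Binds the user to the current thread for the duration of the task
    // and always clears it afterwards, even if the task throws.
    static void runAs(String user, Runnable task) {
        userHolder.set(user);
        try {
            task.run();
        } finally {
            userHolder.remove();
        }
    }

    public static void main(String[] args) {
        runAs("alice", () -> System.out.println("inside: " + currentUser())); // inside: alice
        System.out.println("after: " + currentUser());                        // after: null
    }
}
```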


4. Improper Cache Usage

Unbounded local caches, and local near-caches of Redis or Hazelcast data, can exhaust the heap when entries are large or never evicted.


📌 Enable Automatic Heap Dump

Use JVM flags:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/logs/heapdump.hprof

Heap dumps are essential for identifying retained objects.


📌 Capture Heap Dump Manually

jmap -dump:live,format=b,file=heap.hprof <PID>

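The live option dumps only reachable objects and forces a full GC first. The application pauses while the dump is written, so expect a latency hit on large heaps.

Find the PID: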

jps -l


🛠️ Best Tools for Heap Dump Analysis

| Tool | Purpose |
|---|---|
| Eclipse MAT | Heap dump analysis |
| VisualVM | JVM monitoring |
| JProfiler | Advanced profiling |
| YourKit | CPU + memory profiling |
| HeapHero | Leak analysis |

🔥 2. CPU Spike Troubleshooting in Java

High CPU usage is one of the most common production issues.



✅ Common Reasons for CPU Spikes

  • Infinite loops
  • Excessive logging
  • Recursive methods
  • Large JSON serialization
  • High garbage collection activity
  • Thread contention
  • DB retry loops
  • Blocking synchronization

📌 Identify High CPU Java Process

Linux command:

top -H -p <PID>

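Find the thread with the highest CPU usage. In this view, the PID column shows the OS thread ID.

Convert the thread ID to hex:

printf "%x\n" <THREAD_ID>

Search the thread dump for the hex value, where it typically appears as nid=0x<HEX>.

This helps correlate high-CPU threads with Java stack traces.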
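The conversion can also be done in Java, for example inside a small diagnostic utility. The sketch below assumes the thread dump prints the native ID as nid=0x followed by lowercase hex. Some newer JDK releases may print the OS thread ID in decimal instead, so check the format of your own dump first. ThreadIdHex and toNid are hypothetical names.

```java
class ThreadIdHex {

    // Formats a decimal OS thread ID (as shown by top -H) in the
    // nid=0x<hex> form, assuming the dump uses lowercase hex.
    static String toNid(long osThreadId) {
        return "nid=0x" + Long.toHexString(osThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toNid(12345)); // prints nid=0x3039
    }
}
```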


📌 Generate Thread Dump

jstack -l <PID> > threaddump.txt

Alternative:

kill -3 <PID>

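kill -3 sends SIGQUIT. The JVM does not exit; it prints the thread dump to its standard output, so look for it in the application's console log.

Thread dumps are critical for diagnosing hangs, deadlocks, and CPU bottlenecks.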
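When shell access to the host is limited, a similar snapshot can be taken from inside the JVM with the standard ThreadMXBean. This is a rough sketch, not a replacement for jstack: the output format is simplified, and InProcessThreadDump is a hypothetical class name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class InProcessThreadDump {

    // Builds a simplified text dump of all live threads in this JVM.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only request lock details the running JVM actually supports.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Exposing this behind an authenticated admin endpoint is one option, but that design choice depends on your security requirements.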


🔥 3. Thread Dump Analysis Guide

Thread dumps provide JVM thread state snapshots.



✅ Important Thread States

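| State | Meaning |
|---|---|
| NEW | Created but not yet started |
| RUNNABLE | Executing in the JVM, or ready to run (may be waiting on the OS, for example for I/O) |
| BLOCKED | Waiting to acquire a monitor lock |
| WAITING | Waiting indefinitely for another thread |
| TIMED_WAITING | Waiting with timeout |
| TERMINATED | Completed |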

📌 Identify Deadlocks

Look for this line in the thread dump:

Found one Java-level deadlock:

A deadlock occurs when two or more threads each hold a lock that another one needs, so none of them can proceed.
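Deadlocks can also be detected from inside the application using the standard ThreadMXBean, for example from a periodic health check. The following is a sketch under that assumption; DeadlockCheck and deadlockedThreadNames are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class DeadlockCheck {

    // Returns the names of threads currently involved in a deadlock,
    // or an empty list when none is detected.
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads also covers java.util.concurrent locks,
        // but is only available when synchronizer usage is supported.
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();

        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names; // null means no deadlock was found
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadNames());
    }
}
```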


📌 Detect Blocking Threads

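Example:

java.lang.Thread.State: BLOCKED

A thread is BLOCKED only while it waits to enter a synchronized block or method. Usually caused by:

  • synchronized methods or blocks that hold the lock for too long
  • many threads contending for the same monitor

Threads waiting on java.util.concurrent locks appear as WAITING or TIMED_WAITING instead, and threads waiting on a database lock usually appear as RUNNABLE inside a socket read.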

📌 Multiple Similar Stack Traces

If many threads show identical stack traces, it usually points to one of:

  • a bottleneck in a shared code path
  • a slow downstream dependency
  • contention on a shared resource lock

This is one of the fastest ways to identify production bottlenecks.
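Grouping can be automated. The sketch below counts threads by their top stack frames, taking the map returned by Thread.getAllStackTraces() as input. StackGrouping, groupByTopFrames, and the depth of 3 frames are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StackGrouping {

    // Groups threads by their top `depth` frames and returns
    // signature -> thread count, with the largest group first.
    static Map<String, Integer> groupByTopFrames(
            Map<Thread, StackTraceElement[]> traces, int depth) {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : traces.values()) {
            if (stack.length == 0) {
                continue; // thread has no Java frames to compare
            }
            StringBuilder key = new StringBuilder();
            for (int i = 0; i < Math.min(depth, stack.length); i++) {
                key.append(stack[i]).append('\n');
            }
            counts.merge(key.toString(), 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> groups = groupByTopFrames(Thread.getAllStackTraces(), 3);
        groups.forEach((frames, count) ->
                System.out.println(count + " thread(s) at:\n" + frames));
    }
}
```

A group that holds most of the request-handling threads is usually the first place to look.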


🔥 4. GC Troubleshooting

Garbage Collection issues directly affect latency.



📌 Enable GC Logs

Java 9+ (unified logging):

-Xlog:gc*

Java 8 and earlier:

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:<file>

🚨 Signs of GC Problems

| Problem | Indicator |
|---|---|
| GC Thrashing | Frequent Full GC |
| Memory Leak | Heap never drops |
| High Pause Time | Slow APIs |
| Excessive Object Creation | Young GC spike |
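GC activity can also be read at runtime through the standard GarbageCollectorMXBean, which is useful when GC logs are not enabled. The sketch below reports collector counts and a rough GC overhead figure. GcStats and gcOverheadPercent are hypothetical names, and the overhead formula (total GC time divided by JVM uptime) is a simple approximation rather than an official metric.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {

    // Share of elapsed time spent in GC, as a percentage.
    static double gcOverheadPercent(long gcTimeMs, long uptimeMs) {
        return uptimeMs <= 0 ? 0.0 : 100.0 * gcTimeMs / uptimeMs;
    }

    // One line per collector with cumulative count and time since JVM start.
    static String summary() {
        StringBuilder out = new StringBuilder();
        long totalGcMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long timeMs = Math.max(gc.getCollectionTime(), 0); // -1 means undefined
            totalGcMs += timeMs;
            out.append(gc.getName())
               .append(": collections=").append(gc.getCollectionCount())
               .append(", totalTimeMs=").append(timeMs)
               .append('\n');
        }
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        out.append("approx GC overhead %: ")
           .append(gcOverheadPercent(totalGcMs, uptimeMs))
           .append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

A steadily rising overhead figure alongside a heap that never drops is consistent with the memory leak pattern in the table above.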

🔥 5. Production Incident Troubleshooting Workflow

Step 1 — Check System Metrics

  • CPU
  • Memory
  • Load average
  • Disk IO
  • Network

Step 2 — Capture JVM Data

  • Thread dump
  • Heap dump
  • GC logs

Step 3 — Correlate Metrics

Compare:

  • CPU spike timing
  • GC timing
  • thread states
  • API latency

Step 4 — Identify Root Cause

Possible findings:

  • memory leak
  • deadlock
  • slow DB query
  • lock contention
  • excessive object creation

📌 Essential Linux Commands for Java Troubleshooting

JVM Process List

jps -l

Thread Dump

jstack -l <PID>

Heap Histogram

jmap -histo <PID>

Heap Dump

jmap -dump:live,format=b,file=heap.hprof <PID>

CPU Monitoring

top -H -p <PID>

🚀 Best Practices to Avoid Production Issues

✅ Use Monitoring Tools

  • Prometheus
  • Grafana
  • ELK Stack
  • New Relic
  • Datadog

✅ Enable JVM Metrics

Track:

  • heap usage
  • GC pauses
  • thread count
  • CPU utilization
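Several of these values are available from the JVM's built-in management beans without extra dependencies. The sketch below reads heap usage and live thread count. JvmSnapshot and usedPercent are hypothetical names, and a real setup would more likely export these through a metrics library to the monitoring stack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class JvmSnapshot {

    // Heap used as a percentage of the maximum, or -1 when the
    // maximum is undefined (the JVM reports max as -1 in that case).
    static double usedPercent(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? -1.0 : 100.0 * usedBytes / maxBytes;
    }

    static double currentHeapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return usedPercent(heap.getUsed(), heap.getMax());
    }

    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used %: " + currentHeapUsedPercent());
        System.out.println("live threads: " + liveThreadCount());
    }
}
```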

✅ Configure Alerts

Create alerts for:

  • high heap
  • Full GC
  • thread pool exhaustion
  • API latency

✅ Use Proper Thread Pools

Avoid:

Executors.newCachedThreadPool()

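newCachedThreadPool() creates threads with no upper limit, so a burst of slow tasks can exhaust memory or CPU. Prefer bounded thread pools with a bounded queue.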
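A minimal sketch of a bounded pool using the standard ThreadPoolExecutor follows. The sizes (4 threads, 100 queued tasks) and the CallerRunsPolicy rejection handler are illustrative assumptions. The right values depend on your workload, and BoundedPool is a hypothetical class name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPool {

    // Fixed number of threads plus a bounded queue. When both are full,
    // CallerRunsPolicy runs the task on the submitting thread, which
    // slows producers down instead of queueing without limit.
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4, 100);
        for (int i = 0; i < 10; i++) {
            final int n = i;
            pool.execute(() -> System.out.println(
                    "task " + n + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```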


✅ Prevent Resource Leaks

Always close:

  • DB connections
  • streams
  • sockets
  • Kafka consumers

🔥 Real Production Example

A Spring Boot microservice showed:

  • increasing heap
  • high GC
  • pod restart

Root cause:

Map<String, Object> cache = new HashMap<>();

Objects continuously accumulated without eviction.

Solution:

  • Introduced Redis cache TTL
  • Limited in-memory cache size
  • Enabled heap dump monitoring
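The "limited in-memory cache size" fix could look roughly like the sketch below, which uses LinkedHashMap in access order to evict the least recently used entry. This illustrates the technique and is not the code from the incident. BoundedCache is a hypothetical name, and the class is not thread-safe on its own (wrap it with Collections.synchronizedMap or use a dedicated caching library for concurrent access).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order (LRU)
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes most recently used
        cache.put("c", 3);  // evicts "b"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```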

Memory leaks can still happen in Java despite automatic garbage collection.


📊 Recommended Production Monitoring Stack

| Layer | Tool |
|---|---|
| Metrics | Prometheus |
| Visualization | Grafana |
| Logs | ELK Stack |
| APM | New Relic |
| JVM Analysis | VisualVM |
| Heap Analysis | Eclipse MAT |

🎯 Final Thoughts

Java production troubleshooting requires a combination of:

  • JVM knowledge
  • monitoring
  • thread analysis
  • heap analysis
  • Linux diagnostics

The fastest engineers in production incidents are usually the ones who:

  • capture thread dumps quickly
  • analyze heap growth patterns
  • correlate CPU + GC metrics
  • identify blocking threads early

Mastering these troubleshooting techniques is essential for enterprise Java developers, DevOps engineers, and platform architects.


📚 Recommended Articles

French version: https://shikhanirankari.blogspot.com/2026/05/guide-de-depannage-des-problemes-java.html


📢 Need help with Java, workflows, or backend systems?

I help teams design scalable, high-performance, production-ready applications and solve critical real-world issues.

Services:

  • Java & Spring Boot development
  • Camunda Training / consulting
  • Alfresco Training / consulting
  • Workflow architecture guidance
  • Workflow implementation (Camunda, Flowable – BPMN, DMN)
  • Backend & API integrations (REST, microservices)
  • Document management & ECM integrations (Alfresco)
  • Performance optimization & production issue resolution

🔗 https://shikhanirankari.blogspot.com/p/professional-services.html

📩 Email: ishikhanirankari@gmail.com | info@realtechnologiesindia.com
🌐 https://realtechnologiesindia.com

✔ Available for quick consultations
✔ Response within 24 hours
