Java Production Issue Troubleshooting Guide (Memory Leak, CPU Spikes & Thread Dumps)
Production issues in Java applications can silently impact business operations before users even report them.
This guide covers real-world Java production troubleshooting techniques for diagnosing:
- Java memory leaks
- High CPU usage
- Thread contention
- Deadlocks
- Application hangs
- JVM performance bottlenecks
Whether you're working on Spring Boot microservices, enterprise applications, or cloud-native Java systems, this troubleshooting guide will help you debug production incidents efficiently.
🚨 Common Java Production Issues
| Issue | Symptoms |
|---|---|
| Memory Leak | Increasing heap usage, frequent GC, OutOfMemoryError |
| CPU Spike | 100% CPU usage, slow APIs |
| Thread Deadlock | Application freeze/hang |
| GC Thrashing | High Full GC frequency |
| DB Connection Leak | Connection pool exhaustion, request timeouts |
| Blocking Threads | APIs become unresponsive |
| Metaspace Overflow | OutOfMemoryError: Metaspace, often from class loader leaks |
🧠 Understanding JVM Memory Areas
Before debugging production issues, understand JVM memory regions:
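- Heap: objects and arrays, managed by the garbage collector
- Thread stacks: one per thread, holding method frames and local variables
- Metaspace: class metadata, allocated in native memory
- Direct memory: off-heap buffers such as those from ByteBuffer.allocateDirect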
Java memory issues often originate from incorrect object lifecycle management or excessive object retention.
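To see these regions on a running JVM, the standard java.lang.management API exposes each memory pool. The sketch below is only an illustration: the class name MemoryAreaReport is made up for this example, and pool names vary by collector and JDK version. Thread stacks and direct buffers are not reported as memory pools, so they do not appear in this output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the memory pools the running JVM reports.
class MemoryAreaReport {

    /** One line per memory pool: name, type (HEAP or NON_HEAP), used and max size. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined upper bound
            String max = usage.getMax() < 0
                    ? "unbounded"
                    : (usage.getMax() / (1024 * 1024)) + " MiB";
            lines.add(String.format("%-35s %-9s used=%d MiB max=%s",
                    pool.getName(), pool.getType().name(),
                    usage.getUsed() / (1024 * 1024), max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Metaspace and the code cache show up as NON_HEAP pools, which is a quick way to confirm that a growing process footprint is not coming from the heap.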
🔥 1. Java Memory Leak Troubleshooting
Common Symptoms
- Heap usage continuously increases
- Frequent Full GC
- Application slowdown
- OutOfMemoryError
- Kubernetes pod restart
- Container OOMKill
✅ Common Causes of Java Memory Leaks
1. Static Collections
private static final List<User> cache = new ArrayList<>();
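Objects added to a static collection and never removed stay reachable, so the garbage collector cannot reclaim them for as long as the class is loaded.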
2. Unclosed Resources
InputStream stream = new FileInputStream(file);
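An unclosed stream keeps its file descriptor open, and under load the process can run out of descriptors. Always close streams with try-with-resources.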
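A minimal sketch of the try-with-resources form, assuming a helper class named SafeFileReader (hypothetical, not from any library). The stream is closed automatically whether the read loop finishes or throws.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper used only to illustrate try-with-resources.
class SafeFileReader {

    /** Counts the bytes in a file; the stream is closed even if read() throws. */
    static long countBytes(File file) {
        try (InputStream stream = new FileInputStream(file)) {
            long total = 0;
            byte[] buffer = new byte[8192];
            int read;
            while ((read = stream.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException
            throw new UncheckedIOException(e);
        }
    }
}
```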
3. ThreadLocal Leaks
private static final ThreadLocal<User> userHolder = new ThreadLocal<>();
On pooled threads the value outlives the request that set it, so call remove() in a finally block once the work is done.
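One way to make the cleanup hard to forget is to wrap set and remove around the work itself. This sketch uses a String instead of the User type above so it stays self-contained; RequestContext and runAs are names invented for the example.

```java
// Hypothetical wrapper: binds a value to the current thread only for the duration of a task.
class RequestContext {

    private static final ThreadLocal<String> userHolder = new ThreadLocal<>();

    static String currentUser() {
        return userHolder.get();
    }

    /** Runs the task with the user bound to the current thread, then always clears it. */
    static void runAs(String user, Runnable task) {
        userHolder.set(user);
        try {
            task.run();
        } finally {
            // Pooled threads are reused, so a value left behind would leak into the next request
            userHolder.remove();
        }
    }
}
```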
4. Improper Cache Usage
Unbounded local caches, embedded Hazelcast maps, or large values fetched from Redis and kept in memory can exhaust the heap.
📌 Enable Automatic Heap Dump
Use JVM flags:
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/logs/heapdump.hprof
Heap dumps are essential for identifying retained objects.
📌 Capture Heap Dump Manually
jmap -dump:live,format=b,file=heap.hprof <PID>
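The live option dumps only reachable objects and forces a full GC first, so expect a pause on large heaps. To find the PID: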
jps -l
🛠️ Best Tools for Heap Dump Analysis
| Tool | Purpose |
|---|---|
| Eclipse MAT | Heap dump analysis |
| VisualVM | JVM monitoring |
| JProfiler | Advanced profiling |
| YourKit | CPU + memory profiling |
| HeapHero | Leak analysis |
🔥 2. CPU Spike Troubleshooting in Java
High CPU usage is one of the most common production issues.
✅ Common Reasons for CPU Spikes
- Infinite loops
- Excessive logging
- Recursive methods
- Large JSON serialization
- High garbage collection activity
- Thread contention
- DB retry loops
- Blocking synchronization
📌 Identify High CPU Java Process
Linux command:
top -H -p <PID>
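Note the ID of the thread using the most CPU (with -H, the PID column shows thread IDs).
Convert the thread ID to hexadecimal:
printf "%x\n" <THREAD_ID>
Search for the hex value in the thread dump, where it typically appears as nid=0x<hex> on the thread's header line.
This correlates the high-CPU thread with its Java stack trace.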
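The same lookup can be scripted. The sketch below assumes the dump prints native thread IDs as nid=0x followed by hex digits, which is the format the conversion above targets; some JDK builds print the ID differently, so check a header line in your own dump first. HotThreadFinder and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: maps a thread ID from `top -H` to its header line in a jstack dump.
class HotThreadFinder {

    /** Formats a decimal OS thread ID as the nid token assumed to appear in the dump. */
    static String toNid(long osThreadId) {
        return "nid=0x" + Long.toHexString(osThreadId);
    }

    /** Returns the thread header lines whose nid token matches the given OS thread ID. */
    static List<String> findThreadHeaders(List<String> dumpLines, long osThreadId) {
        String nid = toNid(osThreadId);
        return dumpLines.stream()
                // jstack thread headers start with the quoted thread name
                .filter(line -> line.startsWith("\""))
                // compare whole tokens so 0x1a2b does not match 0x1a2bc
                .filter(line -> Arrays.asList(line.split("\\s+")).contains(nid))
                .collect(Collectors.toList());
    }
}
```

With the dump saved to threaddump.txt, the lines can be loaded with Files.readAllLines and passed in together with the ID taken from top.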
📌 Generate Thread Dump
jstack -l <PID> > threaddump.txt
Alternative (the dump is written to the JVM's standard output, not to a file):
kill -3 <PID>
Thread dumps are critical for diagnosing hangs, deadlocks, and CPU bottlenecks.
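When shell access to the host is not available, a dump can also be produced from inside the application with the standard ThreadMXBean. This is a rough sketch, not a replacement for jstack: InProcessThreadDump is a made-up class name, and the output format is simplified.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a simplified thread dump from within the JVM.
class InProcessThreadDump {

    /** Returns the name, state and full stack trace of every live thread as text. */
    static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true = also collect locked monitors and locked ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

The frames are printed by hand because ThreadInfo.toString() shortens long stack traces.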
🔥 3. Thread Dump Analysis Guide
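A thread dump is a snapshot of every JVM thread's state and stack trace at one instant. Take several dumps a few seconds apart to tell a stuck thread from a busy one.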
✅ Important Thread States
| State | Meaning |
|---|---|
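| NEW | Created but not yet started |
| RUNNABLE | Executing, or ready to run and waiting for CPU or I/O |
| BLOCKED | Waiting to acquire a monitor lock (synchronized) |
| WAITING | Waiting indefinitely for another thread, e.g. wait(), join(), park() |
| TIMED_WAITING | Waiting with a timeout, e.g. sleep() or a timed wait() |
| TERMINATED | Finished executing |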
📌 Identify Deadlocks
Look for:
Found one Java-level deadlock:
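A deadlock occurs when two or more threads each hold a lock that another one needs, so none of them can proceed. jstack follows this line with the threads and locks involved.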
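The JVM can report the same information programmatically through ThreadMXBean.findDeadlockedThreads(), which could back a health check or an alert. A minimal sketch, with DeadlockCheck as a hypothetical class name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: asks the JVM which threads are currently deadlocked.
class DeadlockCheck {

    /** Returns one description per deadlocked thread, or an empty list if there is no deadlock. */
    static List<String> findDeadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers monitor locks and ownable synchronizers; returns null when nothing is deadlocked
        long[] ids = threads.findDeadlockedThreads();
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // a thread may have died between the two calls
                result.add(info.getThreadName() + " waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }
}
```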
📌 Detect Blocking Threads
Example:
java.lang.Thread.State: BLOCKED
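Usually caused by:
- synchronized methods or blocks
- contention on a shared monitor
Threads waiting on database locks are not BLOCKED in the JVM; they typically appear as RUNNABLE inside a socket read in the JDBC driver.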
📌 Multiple Similar Stack Traces
If many threads show identical stack traces, the shared frames usually point to one of:
- a bottleneck in that code path
- a slow downstream dependency
- a contended shared resource or lock
Spotting this pattern is one of the fastest ways to identify production bottlenecks.
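Counting threads by their top frames makes that pattern stand out. The sketch below groups the stack traces of live threads taken from Thread.getAllStackTraces(); StackTraceGrouper and the depth parameter are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: finds the stack traces shared by the most threads.
class StackTraceGrouper {

    /** Groups threads by their top `depth` frames and returns the counts, largest group first. */
    static Map<String, Long> countByTopFrames(Map<Thread, StackTraceElement[]> traces, int depth) {
        Map<String, Long> counts = traces.values().stream()
                .filter(stack -> stack.length > 0) // skip threads with no Java frames
                .map(stack -> Arrays.stream(stack)
                        .limit(depth)
                        .map(StackTraceElement::toString)
                        .collect(Collectors.joining("\n\tat ")))
                .collect(Collectors.groupingBy(key -> key, Collectors.counting()));
        // Re-collect into a LinkedHashMap so iteration order follows the sorted counts
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (first, second) -> first, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        countByTopFrames(Thread.getAllStackTraces(), 5).forEach((frames, count) ->
                System.out.println(count + " thread(s) at\n\tat " + frames + "\n"));
    }
}
```

A group with dozens of threads parked in the same JDBC or HTTP client call is a strong hint that the dependency behind it is the bottleneck.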
🔥 4. GC Troubleshooting
Garbage Collection issues directly affect latency.
📌 Enable GC Logs
Java 9+ (unified logging):
-Xlog:gc*
Java 8 and earlier:
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
🚨 Signs of GC Problems
| Problem | Indicator |
|---|---|
| GC Thrashing | Frequent Full GC |
| Memory Leak | Heap usage after each Full GC keeps rising |
| High Pause Time | Slow APIs |
| Excessive Object Creation | Young GC spike |
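Some of these indicators can be sampled from inside the JVM with the standard GarbageCollectorMXBean, for example to export them as metrics. The sketch below is an approximation under stated assumptions: GcSnapshot is a hypothetical class, collector names differ between GC algorithms, and for concurrent collectors the reported collection time is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: summarizes GC activity since JVM start.
class GcSnapshot {

    /** Returns {total collections, accumulated collection time in ms} across all collectors. */
    static long[] totals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Reported collection time as a percentage of JVM uptime. */
    static double overheadPercent() {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMs <= 0) {
            return 0.0;
        }
        return 100.0 * totals()[1] / uptimeMs;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.printf("GC overhead: %.2f%% of uptime%n", overheadPercent());
    }
}
```

Sampling these totals at a fixed interval and looking at the difference between samples gives a rate that is easier to alert on than the raw counters.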
🔥 5. Production Incident Troubleshooting Workflow
Step 1 — Check System Metrics
- CPU
- Memory
- Load average
- Disk IO
- Network
Step 2 — Capture JVM Data
- Thread dump
- Heap dump
- GC logs
Step 3 — Correlate Metrics
Compare:
- CPU spike timing
- GC timing
- thread states
- API latency
Step 4 — Identify Root Cause
Possible findings:
- memory leak
- deadlock
- slow DB query
- lock contention
- excessive object creation
📌 Essential Linux Commands for Java Troubleshooting
JVM Process List
jps -l
Thread Dump
jstack -l <PID>
Heap Histogram
jmap -histo <PID>
Heap Dump
jmap -dump:live,format=b,file=heap.hprof <PID>
CPU Monitoring
top -H -p <PID>
🚀 Best Practices to Avoid Production Issues
✅ Use Monitoring Tools
- Prometheus
- Grafana
- ELK Stack
- New Relic
- Datadog
✅ Enable JVM Metrics
Track:
- heap usage
- GC pauses
- thread count
- CPU utilization
✅ Configure Alerts
Create alerts for:
- high heap
- Full GC
- thread pool exhaustion
- API latency
✅ Use Proper Thread Pools
Avoid:
Executors.newCachedThreadPool()
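It creates new threads without limit whenever all existing ones are busy. Prefer a pool with a fixed maximum size and a bounded queue.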
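A bounded pool can be built directly with ThreadPoolExecutor. The sizes and the rejection policy below are illustrative assumptions, not recommendations; BoundedPools is a hypothetical class name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory: a thread pool with a fixed thread count and a bounded queue.
class BoundedPools {

    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                 // core and maximum size are the same
                60L, TimeUnit.SECONDS,            // keep-alive; only matters if core timeout is enabled
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself,
                // which slows producers down instead of adding threads or dropping work
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

CallerRunsPolicy is one choice among several; a service that would rather fail fast could use AbortPolicy and handle the resulting RejectedExecutionException.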
✅ Prevent Resource Leaks
Always close:
- DB connections
- streams
- sockets
- Kafka consumers
🔥 Real Production Example
A Spring Boot microservice showed:
- increasing heap
- high GC
- pod restart
Root cause:
Map<String, Object> cache = new HashMap<>();
Objects continuously accumulated without eviction.
Solution:
- Introduced Redis cache TTL
- Limited in-memory cache size
- Enabled heap dump monitoring
Memory leaks can still happen in Java despite automatic garbage collection.
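For the in-memory size limit, one simple option is an LRU map built on LinkedHashMap. This is a sketch of the idea rather than the fix used in that incident: BoundedCache is a hypothetical class, it is not thread-safe on its own, and a caching library may be a better fit for production use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: evicts the least recently used entry once the limit is exceeded.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, which gives LRU behavior
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true removes the eldest entry
        return size() > maxEntries;
    }
}
```

For concurrent access, wrap the instance with Collections.synchronizedMap, because in access-order mode even get() modifies the map.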
📊 Recommended Production Monitoring Stack
| Layer | Tool |
|---|---|
| Metrics | Prometheus |
| Visualization | Grafana |
| Logs | ELK Stack |
| APM | New Relic |
| JVM Analysis | VisualVM |
| Heap Analysis | Eclipse MAT |
🎯 Final Thoughts
Java production troubleshooting requires a combination of:
- JVM knowledge
- monitoring
- thread analysis
- heap analysis
- Linux diagnostics
The fastest engineers in production incidents are usually the ones who:
- capture thread dumps quickly
- analyze heap growth patterns
- correlate CPU + GC metrics
- identify blocking threads early
Mastering these troubleshooting techniques is essential for enterprise Java developers, DevOps engineers, and platform architects.
French version: https://shikhanirankari.blogspot.com/2026/05/guide-de-depannage-des-problemes-java.html