OpenText Network Operations Management (NOM) — Event & Incident Management

Blog Series: OpenText NOM — Part 3

➡ Part 1 — SNMP Explained
➡ Part 2 — Network Discovery & Monitoring

After discovery and monitoring, the next critical layer is Event & Incident Management.

Monitoring tells you what happened.
Event management tells you why it happened.
Incident management ensures it gets resolved.

Together, these form the core of any enterprise NOC (network operations center).


📌 What is Event Management?

An event is any detectable occurrence in the network:

  • Link Down

  • High CPU

  • Device unreachable

  • Interface errors

Not every event is an incident: many are informational or clear on their own.


🖼️ Event Flow in Monitoring Systems


Event Lifecycle in NOM

1️⃣ Event Generated (polling or trap)
2️⃣ Event Normalized
3️⃣ Correlation Applied
4️⃣ Alarm Created
5️⃣ Operator Notified
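
The five stages above can be sketched as a small pipeline. This is a minimal illustration, not NOM's actual implementation: the `Event` schema, the severity mapping, and all function names are hypothetical, and "correlation" is reduced to collapsing repeats of the same event from the same device.

```python
from dataclasses import dataclass

# Hypothetical severity mapping; a real product defines its own.
SEVERITY_BY_KIND = {"link_down": "critical", "high_cpu": "major", "interface_errors": "minor"}


@dataclass
class Event:
    source: str    # device name or IP
    kind: str      # normalized event type
    severity: str
    origin: str    # "poll" or "trap"


def normalize(raw: dict) -> Event:
    """Stage 2: map a raw poll/trap record onto one common schema."""
    kind = raw["type"].strip().lower().replace(" ", "_")
    return Event(
        source=raw["device"],
        kind=kind,
        severity=SEVERITY_BY_KIND.get(kind, "warning"),
        origin=raw.get("origin", "poll"),
    )


def correlate(events: list) -> list:
    """Stage 3 (simplified): keep the first event per (source, kind) pair."""
    first_seen = {}
    for ev in events:
        first_seen.setdefault((ev.source, ev.kind), ev)
    return list(first_seen.values())


def create_alarms(events: list) -> list:
    """Stage 4: turn the surviving events into open alarm records."""
    return [{"id": i, "event": ev, "state": "open"} for i, ev in enumerate(events, start=1)]


def notify(alarms: list) -> list:
    """Stage 5: format one operator-facing message per alarm."""
    return [f"[{a['event'].severity.upper()}] {a['event'].source}: {a['event'].kind}" for a in alarms]


# Stage 1: raw events as they might arrive from polling or traps (made-up data).
raw_events = [
    {"device": "sw-core-1", "type": "Link Down", "origin": "trap"},
    {"device": "sw-core-1", "type": "Link Down", "origin": "poll"},
    {"device": "rtr-edge-2", "type": "High CPU"},
]
messages = notify(create_alarms(correlate([normalize(r) for r in raw_events])))
for line in messages:
    print(line)
```

The two "Link Down" records for `sw-core-1` collapse into one alarm, so the operator sees two messages instead of three.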


📌 What is Incident Management?

An incident is a service-impacting event requiring action.

Examples:

  • Core switch failure

  • Firewall outage

  • WAN link failure

Incident management includes:

✔ Ticket creation
✔ Assignment
✔ Escalation
✔ SLA tracking


🖼️ Incident Lifecycle


Event vs Incident

Event | Incident
Raw alert | Business impact
System generated | Requires action
May auto-clear | Needs resolution

Event Correlation (Root Cause Analysis)

In large networks, a single failure can generate hundreds of alerts.

Example:

Core switch down →
10 Access switches down →
200 servers unreachable →
Applications failing

Without correlation = 211 alarms
With correlation = 1 root cause alarm
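
The 211-to-1 reduction can be reproduced with a simple topology-based rule: a down node is a root cause only if its upstream parent is not also down. The sketch below assumes a strict tree topology and hypothetical device names; real correlation engines also weigh timing, redundancy, and service dependencies.

```python
def build_topology() -> dict:
    """Map each node to its upstream parent: 1 core, 10 access switches, 200 servers."""
    parent = {"core-sw": None}
    for a in range(1, 11):
        access = f"access-sw-{a}"
        parent[access] = "core-sw"
        for s in range(1, 21):
            parent[f"server-{a}-{s}"] = access
    return parent


def root_causes(down: set, parent: dict) -> set:
    """Keep only the down nodes whose parent is up (or that have no parent)."""
    return {node for node in down if parent.get(node) not in down}


topology = build_topology()

# Core switch fails: every node behind it is reported down.
all_down = set(topology)
print(len(all_down), "raw alarms ->", root_causes(all_down, topology))

# A single access switch fails: only its 20 servers go with it.
partial = {"access-sw-3"} | {f"server-3-{s}" for s in range(1, 21)}
print(len(partial), "raw alarms ->", root_causes(partial, topology))
```

In both scenarios the symptom alarms are suppressed and a single root cause alarm remains.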


🖼️ Root Cause Correlation


Noise Reduction Techniques

✔ Alarm suppression
✔ Duplicate filtering
✔ Threshold tuning
✔ Maintenance window configuration

These techniques reduce alert fatigue in NOC teams.
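
Two of these techniques, duplicate filtering and maintenance windows, are easy to illustrate. In the sketch below the 5-minute deduplication window, the device names, and the maintenance schedule are all assumed values chosen for the example.

```python
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(minutes=5)  # assumed value; tune per environment

# Hypothetical maintenance schedule: device -> (start, end)
MAINTENANCE = {
    "fw-dmz-1": (datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 11, 2, 0)),
}


def in_maintenance(device: str, ts: datetime) -> bool:
    """True if the device has a maintenance window covering this timestamp."""
    window = MAINTENANCE.get(device)
    return window is not None and window[0] <= ts < window[1]


def filter_noise(events: list) -> list:
    """Filter (timestamp, device, kind) tuples that are sorted by timestamp."""
    last_seen = {}
    kept = []
    for ts, device, kind in events:
        if in_maintenance(device, ts):
            continue  # suppressed: planned work on this device
        key = (device, kind)
        previous = last_seen.get(key)
        last_seen[key] = ts  # updated on every occurrence, so a steady stream stays suppressed
        if previous is not None and ts - previous < DEDUP_WINDOW:
            continue  # duplicate of a recent event
        kept.append((ts, device, kind))
    return kept


t0 = datetime(2024, 1, 10, 23, 0)
events = [
    (t0, "sw-access-4", "link_down"),
    (t0 + timedelta(minutes=1), "sw-access-4", "link_down"),        # duplicate
    (t0 + timedelta(minutes=2), "fw-dmz-1", "device_unreachable"),  # in maintenance
    (t0 + timedelta(minutes=10), "sw-access-4", "link_down"),       # outside the window
]
kept = filter_noise(events)
print(len(events), "events in,", len(kept), "kept")
```

Of the four events, only the first and the last reach the operator.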


SLA & Escalation Policies

Enterprise environments define:

  • Severity levels (Critical, Major, Minor)

  • Response time targets

  • Escalation matrix

Example:

Severity 1 → Escalate in 15 minutes
Severity 2 → Escalate in 1 hour
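
A policy like this reduces to a timer check per open incident. The sketch below uses the two targets from the example; the function name and the rule that an acknowledged incident is not escalated are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Escalation targets from the example above, keyed by severity number.
ESCALATE_AFTER = {
    1: timedelta(minutes=15),
    2: timedelta(hours=1),
}


def needs_escalation(severity: int, opened_at: datetime, now: datetime,
                     acknowledged: bool = False) -> bool:
    """Return True when an unacknowledged incident has outlived its target.

    Assumption: severities without a configured target are never escalated.
    """
    limit = ESCALATE_AFTER.get(severity)
    if limit is None or acknowledged:
        return False
    return now - opened_at >= limit


opened = datetime(2024, 1, 10, 10, 0)
print(needs_escalation(1, opened, datetime(2024, 1, 10, 10, 20)))  # 20 minutes elapsed
print(needs_escalation(2, opened, datetime(2024, 1, 10, 10, 20)))  # still within 1 hour
```

A scheduler could run this check every minute over all open incidents and page the next tier whenever it returns `True`.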

Integration with ITSM Tools

NOM integrates with:

  • ServiceNow

  • Remedy

  • Jira

Event → Ticket auto-creation → Assignment → Closure.
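
The event-to-ticket step is essentially a field mapping. The sketch below is tool-agnostic: the field names, priority numbers, and assignment groups are hypothetical and do not reflect the actual schema of ServiceNow, Remedy, Jira, or NOM.

```python
# Hypothetical mappings; each ITSM tool has its own fields and values.
PRIORITY_BY_SEVERITY = {"critical": 1, "major": 2, "minor": 3}
GROUP_BY_KIND = {"link_down": "network-ops", "firewall_outage": "security-ops"}


def alarm_to_ticket(alarm: dict) -> dict:
    """Build a new ticket record from an alarm, including its assignment."""
    return {
        "short_description": f"{alarm['kind']} on {alarm['device']}",
        "priority": PRIORITY_BY_SEVERITY.get(alarm["severity"], 4),
        "assignment_group": GROUP_BY_KIND.get(alarm["kind"], "noc-l1"),
        "state": "new",
        "correlation_id": alarm["id"],  # lets later updates find the same ticket
    }


def close_ticket(ticket: dict, resolution: str) -> dict:
    """Return a closed copy of the ticket with its resolution note."""
    return {**ticket, "state": "closed", "resolution": resolution}


alarm = {"id": "ALM-1001", "device": "wan-rtr-1", "kind": "link_down", "severity": "critical"}
ticket = alarm_to_ticket(alarm)
closed = close_ticket(ticket, "Carrier restored the circuit")
print(ticket)
print(closed["state"])
```

Carrying a correlation ID from the alarm into the ticket is one common way to avoid opening a second ticket when the same alarm fires again.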


Real-World Example

Scenario:

Bandwidth spike on WAN link.

Flow:

  1. Threshold exceeded

  2. Event generated

  3. Incident created

  4. Ticket assigned

  5. Root cause identified

  6. Incident resolved

  7. Post-incident review
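
The first part of this flow, deciding that a bandwidth spike deserves an event, can be sketched as a utilization calculation plus a sustained-threshold check. The 80% threshold and the three-consecutive-polls rule are assumed values; the utilization formula is the usual one based on interface octet counters.

```python
def utilization_percent(delta_octets: int, interval_s: float, link_bps: float) -> float:
    """Link utilization from the change in an octet counter between two polls."""
    return (delta_octets * 8) / (interval_s * link_bps) * 100


def threshold_breached(samples: list, threshold: float = 80.0, consecutive: int = 3) -> bool:
    """True once `consecutive` samples in a row exceed the threshold.

    Requiring several polls in a row avoids raising an event on a brief burst.
    """
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


# 3.375 GB transferred in a 5-minute poll on a 100 Mbit/s link: roughly 90% utilization.
sample = utilization_percent(3_375_000_000, 300, 100_000_000)
print(round(sample, 1))

print(threshold_breached([70, 85, 90, 75, 88, 91, 95]))  # three in a row at the end
print(threshold_breached([85, 90, 70, 95]))              # never three in a row
```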


🖼️ Event to Incident Flow


Best Practices

✔ Define severity clearly
✔ Implement correlation rules
✔ Avoid alert storms
✔ Use automation
✔ Track MTTR


Key Metrics

Metric | Meaning
MTTR | Mean Time to Repair
MTBF | Mean Time Between Failures
Event Volume | Total alerts
False Positive Rate | Noise level
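
These metrics can be computed directly from incident timestamps and alert counts. The sketch below uses made-up incident data and assumes MTBF is measured from the end of one failure to the start of the next; some teams measure it start to start instead.

```python
from datetime import datetime


def mttr_hours(incidents: list) -> float:
    """Mean time to repair: average of (resolved - opened), in hours."""
    total = sum((resolved - opened).total_seconds() for opened, resolved in incidents)
    return total / len(incidents) / 3600


def mtbf_hours(incidents: list) -> float:
    """Mean time between failures: average gap from one repair to the next failure."""
    ordered = sorted(incidents)
    gaps = [
        (ordered[i + 1][0] - ordered[i][1]).total_seconds()
        for i in range(len(ordered) - 1)
    ]
    return sum(gaps) / len(gaps) / 3600


def false_positive_rate(total_alerts: int, false_alerts: int) -> float:
    """Share of alerts that did not correspond to a real problem."""
    return false_alerts / total_alerts


# Made-up (opened, resolved) pairs for one device.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 12, 0)),
    (datetime(2024, 1, 6, 12, 0), datetime(2024, 1, 6, 15, 0)),
]
print("MTTR (h):", mttr_hours(incidents))
print("MTBF (h):", mtbf_hours(incidents))
print("False positive rate:", false_positive_rate(500, 125))
```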

🎯 Conclusion

Discovery gives visibility.
Monitoring gives metrics.
Event management gives intelligence.
Incident management ensures resolution.

This completes the operational backbone of enterprise network monitoring.


💼 Professional support available

If you run into problems on real-world projects involving enterprise backend development or workflow automation, I offer paid consulting, production debugging, project support, and targeted training.

Technologies covered include Java, Spring Boot, PL/SQL, Azure, CMS, and workflow automation (jBPM, Camunda BPM, RHPAM), DMN/Drools.

📧 Contact: ishikhanirankari@gmail.com | info@realtechnologiesindia.com

🌐 Website: IT Trainings | Digital lectern | Digital rostrum | Digital metal podium     

