OSM Fault Management

Latest revision as of 17:17, 17 February 2021

THIS PAGE IS DEPRECATED. OSM User Guide has been moved to a new location: https://osm.etsi.org/docs/user-guide/

---

This documentation now corresponds to Release FIVE; previous documentation related to Fault Management has been deprecated.

Basic functionality

Logs & Events

As of Release 5.0.0, logs can be monitored on a per-container basis via command line, like this:

docker logs <container id or name>

For example:

docker logs osm_lcm.1.tkb8yr6v762d28ird0edkunlv
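
When a container produces a lot of output, it can help to narrow it down. The sketch below wraps docker logs with its --since and --tail options and keeps only lines that look like errors. The function name osm_recent_errors and the ERROR|CRITICAL|Traceback pattern are illustrative assumptions, not part of OSM; adjust the pattern to the log format of the component you are inspecting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OSM): show recent error-like log lines
# for one container.
# Usage: osm_recent_errors <container id or name> [since] [tail]
osm_recent_errors() {
  local container="$1"
  local since="${2:-10m}"   # how far back to look (docker logs --since)
  local tail="${3:-500}"    # maximum number of lines to scan (docker logs --tail)

  if [ -z "$container" ]; then
    echo "usage: osm_recent_errors <container id or name> [since] [tail]" >&2
    return 2
  fi

  # docker logs replays both stdout and stderr of the container, so merge
  # them before filtering. The pattern is an assumption about the log format.
  docker logs --since "$since" --tail "$tail" "$container" 2>&1 \
    | grep -E 'ERROR|CRITICAL|Traceback'
}
```

For instance, osm_recent_errors osm_lcm.1.tkb8yr6v762d28ird0edkunlv 1h would scan the last hour of the LCM container shown above. The function returns a non-zero status when no line matches, which makes it usable in scripts.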

Logs can also be found on the host filesystem. With Docker's default json-file logging driver, the path is: /var/lib/docker/containers/[container-id]/[container-id]-json.log

Furthermore, there are some important events flowing between components through the Kafka bus, which can be monitored on a per-topic basis by external tools.
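
As a rough illustration of per-topic monitoring, the sketch below reads one topic with the console consumer that ships with Kafka. Several details are assumptions rather than facts from this page: that the Kafka container name contains "osm_kafka", that kafka-console-consumer.sh is on the PATH inside that container, and that the broker listens on localhost:9092 there. The function name kafka_tail_topic is hypothetical, and the topic name must be supplied by you.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OSM). Assumptions: the Kafka container
# name contains "osm_kafka", kafka-console-consumer.sh is on its PATH, and
# the broker listens on localhost:9092 inside the container.
kafka_tail_topic() {
  local topic="$1"
  local container

  if [ -z "$topic" ]; then
    echo "usage: kafka_tail_topic <topic>" >&2
    return 2
  fi

  # Pick the first running container whose name matches the filter.
  container="$(docker ps --filter name=osm_kafka --quiet | head -n 1)"
  if [ -z "$container" ]; then
    echo "no running Kafka container found" >&2
    return 1
  fi

  # Print every message on the topic, starting from the oldest one retained.
  docker exec "$container" kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 --topic "$topic" --from-beginning
}
```

If the script is not on the PATH in your Kafka image, replace it with its full path inside the container.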

Alarm Manager for Metrics

As of Release FIVE, MON includes a new module called 'mon-evaluator'. The only use case supported today by this module is the configuration of alarms and evaluation of thresholds related to metrics, for the Policy Manager module (POL) to take actions such as auto-scaling.

Whenever a threshold is crossed and an alarm is triggered, MON generates a notification and puts it on the Kafka bus so that other components can consume it. Today this event is logged by both MON (which generates the notification) and POL (which consumes it for its auto-scaling action).

By default, threshold evaluation occurs every 30 seconds. This value can be changed by setting an environment variable, for example:

docker service update --env-add OSMMON_EVALUATOR_INTERVAL=15 osm_mon
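
If the interval is changed from scripts, a small wrapper can reject invalid values before they reach the service. This is only a sketch: the function name set_evaluator_interval is hypothetical, and the "positive integer" check is an assumption about what the variable accepts. The docker command itself is the one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OSM): validate the interval before
# applying it to the MON service.
set_evaluator_interval() {
  local seconds="$1"

  # Accept digits only (assumption: the variable expects whole seconds).
  case "$seconds" in
    ''|*[!0-9]*)
      echo "interval must be a positive integer" >&2
      return 2
      ;;
  esac
  if [ "$seconds" -eq 0 ]; then
    echo "interval must be greater than zero" >&2
    return 2
  fi

  # Same command as in the text above, with the value parameterized.
  docker service update --env-add "OSMMON_EVALUATOR_INTERVAL=${seconds}" osm_mon
}
```

Calling set_evaluator_interval 15 is equivalent to the example above.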

Further information on how to configure alarms through VNFDs for the supported use case can be found in the auto-scaling documentation.

Reference diagram:

Diagram of OSM FM and ELK Experimental add-ons

Experimental functionality

As in the previous release, an optional 'OSM ELK' stack is available for event visualization. It consists of the following tools:

  • Elasticsearch - scalable search engine and event database.
  • Filebeat & Metricbeat - part of Elastic 'Beats', which take over from the former Logstash component to provide generic log and metric collection, respectively.
  • Kibana - graphical tool for exploring all the collected events and generating customized views and dashboards.

Enabling the OSM ELK Stack

If you want to install OSM along with the ELK stack, run the installer as follows:

./install_osm.sh --elk_stack

If you just want to add the ELK stack to an existing OSM installation, run the installer as follows:

 ./install_osm.sh -o elk_stack

This will install four additional Docker containers (Elasticsearch, Filebeat, Metricbeat and Kibana) and download a Docker image for an auxiliary tool named Curator (bobrik/curator).

If you need to remove the stack at some point, run the following command:

docker stack rm osm_elk

To deploy the stack again after it has been removed:

docker stack deploy -c /etc/osm/docker/osm_elk/docker-compose.yml osm_elk
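
After deploying, it may be useful to confirm that all services of the stack are actually running. The sketch below relies on docker stack services and its --format option; the function name stack_is_ready is hypothetical and not part of the OSM installer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OSM): succeed only when every service in
# the stack reports all of its replicas as running.
stack_is_ready() {
  local stack="${1:-osm_elk}"
  local name replicas rest running desired
  local seen=0

  # "docker stack services" prints one line per service; the Go template
  # limits the output to the service name and its "running/desired" count.
  while read -r name replicas rest; do
    [ -n "$name" ] || continue
    seen=1
    running="${replicas%%/*}"
    desired="${replicas##*/}"
    if [ "$running" != "$desired" ]; then
      echo "not ready: $name ($replicas)" >&2
      return 1
    fi
  done <<EOF
$(docker stack services --format '{{.Name}} {{.Replicas}}' "$stack")
EOF

  # An empty listing means the stack is not deployed at all.
  [ "$seen" -eq 1 ]
}
```

A typical use is to poll it after a deploy, for example: until stack_is_ready osm_elk; do sleep 5; done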

IMPORTANT: As time passes and more events are generated in your system, and depending on your configured searches, views and dashboards, the Elasticsearch database can become very large, which may not be desirable in testing environments. To delete old data periodically, you can launch a Curator container that deletes the saved indices and frees the associated disk space.

For example, to delete all data older than one day:

docker run --rm --name curator --net host --entrypoint curator_cli bobrik/curator:5.5.4 --host localhost delete_indices --filter_list '[{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":1}]'

Or to delete the data older than 2 hours:

docker run --rm --name curator --net host --entrypoint curator_cli bobrik/curator:5.5.4 --host localhost delete_indices --filter_list '[{"filtertype":"age","source":"creation_date","direction":"older","unit":"hours","unit_count":2}]'
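
The two commands differ only in the unit and unit_count fields of the filter. A minimal sketch that parameterizes them is shown below; the function names curator_age_filter and purge_old_indices are hypothetical, while the docker run arguments are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OSM).

# Build the --filter_list JSON for "indices older than <count> <unit>".
curator_age_filter() {
  local unit="$1"
  local count="$2"

  # Time units accepted by Curator's age filter.
  case "$unit" in
    seconds|minutes|hours|days|weeks|months|years) ;;
    *) echo "unsupported unit: $unit" >&2; return 2 ;;
  esac
  case "$count" in
    ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 2 ;;
  esac

  printf '[{"filtertype":"age","source":"creation_date","direction":"older","unit":"%s","unit_count":%s}]\n' \
    "$unit" "$count"
}

# Run Curator once with the generated filter (same command as above).
purge_old_indices() {
  local filter
  filter="$(curator_age_filter "$1" "$2")" || return $?
  docker run --rm --name curator --net host --entrypoint curator_cli \
    bobrik/curator:5.5.4 --host localhost delete_indices --filter_list "$filter"
}
```

For example, purge_old_indices days 1 reproduces the first command, and purge_old_indices hours 2 the second. To make the cleanup periodic, the call could be scheduled with cron or a similar tool.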

Testing the OSM ELK Stack

  1. Download the sample dashboards to your desktop from this link (right click, save link as): https://osm-download.etsi.org/ftp/osm-4.0-four/4th-hackfest/other/osm_kibana_dashboards.json
  2. Visit Kibana at http://[OSM_IP]:5601 and:
    1. Go to "Management" --> "Saved Objects" --> "Import" and select the downloaded file.
    2. Go to "Dashboard" and select the "OSM System Dashboard", which links to three other sub-dashboards. (You may need to set "filebeat-*" as the default index pattern by selecting it, marking the star and revisiting the dashboards.)
    3. Metrics (from Metricbeat) and logs (from Filebeat) should appear in the corresponding visualizations.
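
Before opening the browser, you may want to check from a shell that Kibana is answering. The sketch below queries Kibana's /api/status endpoint with curl; the function names kibana_url and kibana_is_up are hypothetical, and the host argument stands for [OSM_IP] in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OSM).

# Base URL of Kibana on the OSM host (port 5601, as in the steps above).
kibana_url() {
  printf 'http://%s:5601\n' "$1"
}

# Succeed when Kibana's status endpoint answers with HTTP 200.
kibana_is_up() {
  local code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    "$(kibana_url "$1")/api/status")" || code="000"
  [ "$code" = "200" ]
}
```

Kibana can take some time to start after the stack is deployed, so a failed check shortly after installation does not necessarily indicate a problem.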


OSM Kibana Sample Dashboard

Your feedback is most welcome!
You can send us your comments and questions to OSM_TECH@list.etsi.org
Or join the OpenSourceMANO Slack Workplace
See hereafter some best practices to report issues on OSM