Monitoring Network Services and VNF Instances

Performance Management

VNF performance management

OSM automatically monitors the status of every VM running in the VIM account. In addition, OSM can collect VM resource consumption metrics such as CPU usage, memory usage, disk usage, and I/O packet rates. For resource consumption metrics to be collected, your VIM must support a Telemetry system. Currently, the collection of VM resource consumption metrics in OSM works with:

  • OpenStack telemetry services: VIM-legacy (ceilometer-based), Gnocchi-based or Prometheus.

  • Microsoft Azure.

  • Google Cloud Platform.

  • VMware vCloud Director with vRealizeOperations.

The next step is to activate metrics collection in your VNFDs. Every metric to be collected from the VIM for each VDU has to be described first at the VDU level and then at the VNF level. For example:

vdu:
- id: hackfest_basic_metrics-VM
  ...
  monitoring-parameter:
  - id: vnf_cpu_util
    name: vnf_cpu_util
    performance-metric: cpu_utilization
  - id: vnf_memory_util
    name: vnf_memory_util
    performance-metric: average_memory_utilization
  - id: vnf_packets_sent
    name: vnf_packets_sent
    performance-metric: packets_sent
  - id: vnf_packets_received
    name: vnf_packets_received
    performance-metric: packets_received

As you can see, a list of “NFVI metrics” is defined first at the VDU level. Each entry contains an ID and the corresponding normalized metric name (in this case, cpu_utilization, average_memory_utilization, packets_sent and packets_received). Normalized metric names are: cpu_utilization, average_memory_utilization, disk_read_ops, disk_write_ops, disk_read_bytes, disk_write_bytes, packets_received, packets_sent, packets_out_dropped, packets_in_dropped.
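As a quick sanity check before onboarding, the metric names used in a VDU can be compared against the normalized list above. The following is a minimal sketch, assuming the descriptor has already been loaded into a Python dict (for instance with a YAML parser). The helper `unknown_metrics` and the `vnf_typo` entry are hypothetical and not part of OSM:

```python
# Normalized metric names, copied from the list in the text above.
NORMALIZED_METRICS = {
    "cpu_utilization",
    "average_memory_utilization",
    "disk_read_ops",
    "disk_write_ops",
    "disk_read_bytes",
    "disk_write_bytes",
    "packets_received",
    "packets_sent",
    "packets_out_dropped",
    "packets_in_dropped",
}


def unknown_metrics(vdu):
    """Return (id, performance-metric) pairs whose metric is not normalized."""
    return [
        (param.get("id"), param.get("performance-metric"))
        for param in vdu.get("monitoring-parameter", [])
        if param.get("performance-metric") not in NORMALIZED_METRICS
    ]


# A VDU as it might look after loading the descriptor into Python objects.
vdu = {
    "id": "hackfest_basic_metrics-VM",
    "monitoring-parameter": [
        {"id": "vnf_cpu_util", "name": "vnf_cpu_util",
         "performance-metric": "cpu_utilization"},
        {"id": "vnf_typo", "name": "vnf_typo",
         "performance-metric": "cpu_utilisation"},  # deliberate typo (hypothetical)
    ],
}

print(unknown_metrics(vdu))
```

Only the misspelled entry is reported. The check says nothing about whether a given VIM type can actually provide the metric.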

Not all metrics can be collected from all types of VIMs. The following table shows which metrics are supported by each type of VIM:

Metric OpenStack Azure GCP
cpu_utilization X X X
average_memory_utilization X X
disk_read_ops X X X
disk_write_ops X X X
disk_read_bytes X X X
disk_write_bytes X X X
packets_in_dropped X
packets_out_dropped X
packets_received X X
packets_sent X X

Available attributes and values can be directly explored at the OSM Information Model. A complete VNFD example can be downloaded from here.

VMware vCD specific notes (OLD)

From REL6 onwards, MON collects all the normalized metrics, with the following exceptions:

  • packets_in_dropped is not available and will always return 0.

  • packets_received cannot be measured. Instead, the number of bytes received on all interfaces is returned.

  • packets_sent cannot be measured. Instead, the number of bytes sent on all interfaces is returned.

The rolling average for vROPS metrics is always 5 minutes. The collection interval is also 5 minutes by default and can be changed. However, vROPS will still report the rolling average for the past 5 minutes, only updated according to the new collection interval. See https://kb.vmware.com/s/article/67792 for more information.
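The effect of shortening the collection interval can be illustrated with a small simulation. This is only a sketch of the behaviour described above, assuming one raw sample per minute (an assumption made for illustration, not a statement about how vROPS samples internally). The function name and the CPU values are made up:

```python
from collections import deque

WINDOW_MINUTES = 5  # fixed rolling-average window described above


def rolling_reports(samples, collection_interval):
    """Yield (minute, average) every `collection_interval` minutes.

    `samples` is assumed to hold one raw value per minute.
    """
    window = deque(maxlen=WINDOW_MINUTES)
    for minute, value in enumerate(samples, start=1):
        window.append(value)
        if minute % collection_interval == 0:
            yield minute, sum(window) / len(window)


cpu = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # made-up per-minute values

# Default behaviour: one report every 5 minutes.
print(list(rolling_reports(cpu, collection_interval=5)))

# More frequent collection: more reports, each still a 5-minute average.
print(list(rolling_reports(cpu, collection_interval=1))[-2:])
```

With a 1-minute interval the reports arrive more often, but each value is still smoothed over the last five samples, so short spikes remain averaged out.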

Although it is not recommended, if a more frequent interval is desired, the following procedure can be used to change the collection interval:

  • Log into vROPS as an admin.

  • Navigate to Administration and expand Configuration.

  • Select Inventory Explorer.

  • Expand the Adapter Instances and select vCenter Server.

  • Edit the vCenter Server instance and expand the Advanced Settings.

  • Edit the Collection Interval (Minutes) value and set to the desired value.

  • Click OK to save the change.

Infrastructure Status Collection

OSM MON automatically collects “status metrics” for:

  • VIMs: for each VIM that OSM establishes contact with, the metric is stored under the name osm_vim_status in the TSDB.

  • VMs: for the VMs of each VDU that OSM has instantiated, the metric is stored under the name osm_vm_status in the TSDB.

Metrics will be “1” or “0” depending on the element availability.
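Because the status metrics are plain 0/1 values, finding unavailable elements reduces to filtering an instant-query result from the Prometheus HTTP API. The sketch below works on an already decoded JSON response and assumes that 0 marks an unavailable element. The label name `vim_account_id` and its values are made up for the example, and the labels OSM actually attaches may differ:

```python
def unavailable_elements(response):
    """Return the label sets of all series whose current value is 0."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # instant vectors carry one sample
        if float(value) == 0:
            down.append(series["metric"])
    return down


# Hand-written sample in the shape of an instant-query response.
# The label "vim_account_id" and its values are hypothetical.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "osm_vim_status", "vim_account_id": "vim-a"},
             "value": [1568977549.065, "1"]},
            {"metric": {"__name__": "osm_vim_status", "vim_account_id": "vim-b"},
             "value": [1568977549.065, "0"]},
        ],
    },
}

print(unavailable_elements(sample))
```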

System Metrics

OSM collects system-wide metrics directly using Prometheus exporters. The way these metrics are collected is highly dependent on how OSM was installed:

OSM on Kubernetes:

  • Components: Prometheus Operator Chart / Other charts: MongoDB, MySQL and Kafka exporters.

  • Implements: multiple Grafana dashboards for a comprehensive health check of the system.

OSM on Docker Swarm:

  • Components: Node exporter / cAdvisor exporter.

  • Implements: a single Grafana dashboard with the most important system metrics.

The names under which these metrics are stored in Prometheus also depend on the installation, so Grafana dashboards that already show these metrics are provided by default. Please note that the Kubernetes installation requires the optional Monitoring stack.

Screenshot of OSM System Metrics at Grafana

Retrieving OSM metrics from Prometheus TSDB

Once the metrics are being collected, they are stored in the Prometheus Time-Series DB with an ‘osm_’ prefix, and there are a number of ways in which you can retrieve them.

1) Visualizing metrics in Prometheus UI

Prometheus TSDB includes its own UI, which you can visit at http://[OSM_IP]:9091.

From there, you can:

  • Type any metric name (e.g. osm_cpu_utilization) in the ‘expression’ field and see its current value or a graph over time.

  • Visit the Status → Targets menu to monitor the connection status between Prometheus and MON (through mon-exporter).

Screenshot of OSM Prometheus UI

2) Visualizing metrics in Grafana

Starting in Release 7, OSM includes its own Grafana installation by default (deprecating the former experimental pm_stack).

Access Grafana with its default credentials (admin / admin) at http://[OSM_IP_address]:3000. Clicking the ‘Manage’ option in the ‘Dashboards’ menu (on the left) shows a sample dashboard containing two graphs for VIM metrics and two graphs for VNF metrics. You can easily change them or add more, as desired.

Screenshot of OSM Grafana UI

Dashboard Automation

Starting in Release 7, Grafana dashboards are created by default in OSM. This is done by the “dashboarder” service in MON, which provisions Grafana following changes in the common DB.

Updates in the following elements automate these dashboards:

  • OSM installation: System Metrics, Admin Project-scoped.

  • OSM Projects: Project-scoped.

  • OSM Network Services: NS-scoped sample dashboard.

3) Querying metrics through OSM SOL005-based NBI

For collecting metrics through the NBI, the following URL format should be followed:

https://<host-ip>:<nbi-port>/osm/nspm/v1/pm_jobs/<project-id>/reports/<network-service-id>

Where:

  • <host-ip>: The IP address of the machine where OSM is installed.

  • <nbi-port>: The NBI port, e.g. 9999.

  • <project-id>: Currently, it can be any string.

  • <network-service-id>: The NS ID obtained after instantiating the network service.

Please note that a token must be obtained first in order to query a metric. More information on this can be found in the OSM NBI Documentation.
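The URL format above can be assembled programmatically. The sketch below only builds the request pieces and does not contact OSM. The helper names, the example address 192.0.2.10 and the placeholder token are hypothetical, and passing the token as a Bearer `Authorization` header is an assumption to be checked against the OSM NBI Documentation:

```python
from urllib.parse import quote


def pm_report_url(host_ip, nbi_port, project_id, ns_id):
    """Build the performance report URL in the format given above."""
    return (
        f"https://{host_ip}:{nbi_port}/osm/nspm/v1/pm_jobs/"
        f"{quote(str(project_id), safe='')}/reports/{quote(str(ns_id), safe='')}"
    )


def auth_headers(token):
    """Headers for the request; Bearer token usage is an assumption."""
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}


url = pm_report_url("192.0.2.10", 9999, "admin", "my-ns-id")
headers = auth_headers("<token>")  # placeholder, obtain a real token first
print(url)
print(headers)
```

The resulting URL and headers can then be passed to any HTTP client.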

In response, you would get a list of the available VNF metrics, for example:

   performanceMetric: osm_cpu_utilization
   performanceValue:
       performanceValue:
           performanceValue: '0.9563615332000001'
           vduName: test_fet7912-2-ubuntuvnf2vdu1-1
           vnfMemberIndex: '2'
       timestamp: 1568977549.065
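The nesting of performanceValue in this response makes it awkward to consume directly. A minimal sketch of flattening one entry into a simple record follows, assuming the decoded response has exactly the structure of the example above. The function name and the output keys are hypothetical:

```python
def flatten_entry(entry):
    """Flatten one metric entry shaped like the example response above."""
    outer = entry["performanceValue"]
    inner = outer["performanceValue"]
    return {
        "metric": entry["performanceMetric"],
        "value": float(inner["performanceValue"]),  # delivered as a string
        "vdu_name": inner["vduName"],
        "vnf_member_index": inner["vnfMemberIndex"],
        "timestamp": outer["timestamp"],
    }


# The example response above, written as a Python dict.
entry = {
    "performanceMetric": "osm_cpu_utilization",
    "performanceValue": {
        "performanceValue": {
            "performanceValue": "0.9563615332000001",
            "vduName": "test_fet7912-2-ubuntuvnf2vdu1-1",
            "vnfMemberIndex": "2",
        },
        "timestamp": 1568977549.065,
    },
}

print(flatten_entry(entry))
```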

4) Interacting with Prometheus directly through its API

The Prometheus HTTP API is always directly available to gather any metrics. A couple of examples are shown below:

Example with Date range query

curl 'http://localhost:9091/api/v1/query_range?query=osm_cpu_utilization&start=2018-12-03T14:10:00.000Z&end=2018-12-03T14:20:00.000Z&step=15s'

Example with Instant query

curl 'http://localhost:9091/api/v1/query?query=osm_cpu_utilization&time=2018-12-03T14:14:00.000Z'

Further examples and API calls can be found at the Prometheus HTTP API documentation.
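The same range query can be prepared and post-processed from a script. The sketch below builds the query_range URL with proper encoding and converts a decoded response into numeric points. The helper names are hypothetical, the sample response is hand-written in the shape the Prometheus HTTP API uses for range queries, and the `vdu_name` label is made up:

```python
from urllib.parse import urlencode


def query_range_url(base, query, start, end, step):
    """Build a Prometheus range-query URL with URL-encoded parameters."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{params}"


def series_points(response):
    """Return (labels, [(timestamp, value), ...]) for each series."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return [
        (series["metric"], [(ts, float(v)) for ts, v in series["values"]])
        for series in response["data"]["result"]
    ]


url = query_range_url(
    "http://localhost:9091",
    "osm_cpu_utilization",
    "2018-12-03T14:10:00.000Z",
    "2018-12-03T14:20:00.000Z",
    "15s",
)

# Hand-written sample response; the label "vdu_name" is hypothetical.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "osm_cpu_utilization", "vdu_name": "vdu-1"},
             "values": [[1543846200, "0.5"], [1543846215, "0.75"]]},
        ],
    },
}

print(url)
print(series_points(sample))
```

Sample values arrive as strings in the JSON response, which is why they are converted with `float` before use.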

5) Interacting directly with MON Collector

Prometheus TSDB stores metrics by periodically querying Prometheus ‘exporters’, which are set as ‘targets’. Exporters expose current metrics in a specific format that Prometheus can understand. More information can be found here.

OSM MON features a “mon-exporter” module that exports current metrics through port 8000. Please note that this port is not exposed outside the OSM Docker network by default.

A tool that understands Prometheus ‘exporters’ (for example, Elastic Metricbeat) can be plugged in to integrate directly with “mon-exporter”. To get an idea of what metrics look like in this particular format, you could:

1. Get into MON console
docker exec -ti osm_mon.1.[id] bash
2. Install curl
apt -y install curl
3. Use curl to get the current metrics list
curl localhost:8000

Please note that as long as the Prometheus container is up, it will continue retrieving and storing metrics in addition to any other tool/DB you connect to mon-exporter.
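For a custom integration, the text returned by mon-exporter can be parsed without any Prometheus library. The following is a simplified sketch of a parser for the Prometheus text exposition format. It handles the common `name{labels} value` lines and skips comments, but it is not a complete implementation of the format. The sample text and its label names are made up and do not reproduce actual mon-exporter output:

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Made-up sample; real label names may differ.
text = """\
# HELP osm_cpu_utilization OSM metric
# TYPE osm_cpu_utilization gauge
osm_cpu_utilization{ns_id="ns-1",vdu_name="vdu-1"} 0.95
osm_vim_status 1.0
"""

for sample in parse_exposition(text):
    print(sample)
```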

6) Using your own TSDB

OSM MON integrates Prometheus through a plugin/backend model, so other backends can be developed if desired. If you are interested in contributing such an option, you can ask for details in our Slack #service-assurance channel or through the OSM Tech mailing list.

Fault Management

Reference diagram:

Diagram of OSM FM and ELK Experimental add-ons

Basic functionality

Logs & Events

Logs can be monitored on a per-container basis via command line, like this:

docker logs <container id or name>

For example:

docker logs osm_lcm.1.tkb8yr6v762d28ird0edkunlv

Logs can also be found in the corresponding directory of the host filesystem. With Docker's default json-file logging driver, this is /var/lib/docker/containers/[container-id]/[container-id]-json.log

Furthermore, there are some important events flowing between components through the Kafka bus, which can be monitored on a per-topic basis by external tools.

Alarm Manager for Metrics

As of Release FIVE, MON includes a new module called ‘mon-evaluator’. The only use case supported today by this module is the configuration of alarms and evaluation of thresholds related to metrics, for the Policy Manager module (POL) to take actions such as auto-scaling.

Whenever a threshold is crossed and an alarm is triggered, MON generates a notification and puts it on the Kafka bus so that other components, such as POL, can consume it. Today, this event is logged by both MON (which generates the notification) and POL (which consumes the notification for its auto-scaling or webhook actions).

By default, threshold evaluation occurs every 30 seconds. This value can be changed by setting an environment variable, for example:

docker service update --env-add OSMMON_EVALUATOR_INTERVAL=15 osm_mon

To configure alarms that send webhooks to a web service, add the following to the VNF descriptor:

vdu:
-   alarm:
    -   alarm-id: alarm-1
        operation: LT
        value: 20
        actions:
          alarm:
            - url: https://webhook.site/1111
          ok:
            - url: https://webhook.site/2222
          insufficient-data:
            - url: https://webhook.site/3333
        vnf-monitoring-param-ref: vnf_cpu_util
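The way such a descriptor maps a metric sample to a webhook can be sketched as follows. This only illustrates the semantics of the fields shown above and is not the code used by MON or POL. Only the LT operation appears in the example, so the other operator names in the table, as well as the function names, are assumptions:

```python
import operator

# Only "LT" is taken from the descriptor above; the rest are assumed.
OPERATIONS = {
    "LT": operator.lt,
    "LE": operator.le,
    "GT": operator.gt,
    "GE": operator.ge,
    "EQ": operator.eq,
}


def alarm_state(alarm, metric_value):
    """Return 'alarm', 'ok' or 'insufficient-data' for one metric sample."""
    if metric_value is None:  # no sample available for the metric
        return "insufficient-data"
    crossed = OPERATIONS[alarm["operation"]](metric_value, alarm["value"])
    return "alarm" if crossed else "ok"


def webhook_urls(alarm, metric_value):
    """Return the webhook URLs configured for the resulting state."""
    state = alarm_state(alarm, metric_value)
    return [action["url"] for action in alarm["actions"].get(state, [])]


# The alarm from the descriptor above, written as a Python dict.
alarm = {
    "alarm-id": "alarm-1",
    "operation": "LT",
    "value": 20,
    "actions": {
        "alarm": [{"url": "https://webhook.site/1111"}],
        "ok": [{"url": "https://webhook.site/2222"}],
        "insufficient-data": [{"url": "https://webhook.site/3333"}],
    },
    "vnf-monitoring-param-ref": "vnf_cpu_util",
}

print(webhook_urls(alarm, 10))    # below the threshold of 20
print(webhook_urls(alarm, 35))    # above the threshold
print(webhook_urls(alarm, None))  # no data
```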

To configure alarms through VNFDs for the auto-scaling use case, follow the auto-scaling documentation.