# Monitoring Network Services and VNF Instances

### Performance Management

#### VNF performance management

OSM automatically monitors the status of every VM running in the VIM account. In addition, OSM can collect VM resource consumption metrics such as CPU usage, memory usage, disk usage, and I/O packet rates.

For resource consumption metrics to be collected, your VIM must support a telemetry system. Currently, the collection of VM resource consumption metrics in OSM works with:

- OpenStack telemetry services: VIM-legacy (Ceilometer-based), Gnocchi-based or Prometheus.
- Microsoft Azure.
- Google Cloud Platform.
- VMware vCloud Director with vRealize Operations.

The next step is to activate metric collection in your VNFDs. Every metric to be collected from the VIM for each VDU has to be described both at the VDU level and at the VNF level. For example:

```yaml
vdu:
  id: hackfest_basic_metrics-VM
  ...
  monitoring-parameter:
    - id: vnf_cpu_util
      name: vnf_cpu_util
      performance-metric: cpu_utilization
    - id: vnf_memory_util
      name: vnf_memory_util
      performance-metric: average_memory_utilization
    - id: vnf_packets_sent
      name: vnf_packets_sent
      performance-metric: packets_sent
    - id: vnf_packets_received
      name: vnf_packets_received
      performance-metric: packets_received
```

As you can see, a list of "NFVI metrics" is defined first at the VDU level. Each entry contains an ID, a name, and the corresponding normalized metric name (in this case, `cpu_utilization` and `average_memory_utilization`).
Normalized metric names are: `cpu_utilization`, `average_memory_utilization`, `disk_read_ops`, `disk_write_ops`, `disk_read_bytes`, `disk_write_bytes`, `packets_received`, `packets_sent`, `packets_out_dropped` and `packets_in_dropped`.

Not all metrics can be collected from all types of VIMs. The following table shows which metrics are supported by each type of VIM:

| Metric | OpenStack | Azure | GCP |
| ------ |:---------:|:-----:|:---:|
| cpu_utilization | X | X | X |
| average_memory_utilization | X | | X |
| disk_read_ops | X | X | X |
| disk_write_ops | X | X | X |
| disk_read_bytes | X | X | X |
| disk_write_bytes | X | X | X |
| packets_in_dropped | X | | |
| packets_out_dropped | X | | |
| packets_received | X | | X |
| packets_sent | X | | X |

Available attributes and values can be directly explored in the [OSM Information Model](../../reference/information-model.md). A complete VNFD example can be downloaded from [here](https://osm.etsi.org/gitlab/vnf-onboarding/osm-packages/-/blob/master/hackfest_basic_metrics_vnf).

##### VMware vCD specific notes (OLD)

From REL6 onwards, MON collects all the normalized metrics, with the following exceptions:

- `packets_in_dropped` is not available and will always return 0.
- `packets_received` cannot be measured. Instead, the number of bytes received across all interfaces is returned.
- `packets_sent` cannot be measured. Instead, the number of bytes sent across all interfaces is returned.

The rolling average for vROPS metrics is always 5 minutes. The collection interval is also 5 minutes and can be changed; however, vROPS will still report the rolling average for the past 5 minutes, just updated according to the collection interval.

Although it is not recommended, if a more frequent interval is desired, the following procedure can be used to change the collection interval:

- Log into vROPS as an admin.
- Navigate to Administration and expand Configuration.
- Select Inventory Explorer.
- Expand the Adapter Instances and select vCenter Server.
- Edit the vCenter Server instance and expand the Advanced Settings.
- Edit the Collection Interval (Minutes) value and set it to the desired value.
- Click OK to save the change.

#### Infrastructure Status Collection

OSM MON automatically collects "status metrics" for:

- VIMs: for each VIM that OSM establishes contact with, the metric is reflected with the name `osm_vim_status` in the TSDB.
- VMs: for the VMs of each VDU that OSM has instantiated, the metric is reflected with the name `osm_vm_status` in the TSDB.

Metrics will be "1" or "0" depending on the element's availability.

#### System Metrics

OSM collects system-wide metrics directly using Prometheus exporters. The way these metrics are collected is highly dependent on how OSM was installed:

| | OSM on Kubernetes | OSM on Docker Swarm |
|:----:|:-----------------:|:-------------------:|
| Components | Prometheus Operator chart; other charts: MongoDB, MySQL and Kafka exporters | Node exporter / cAdvisor exporter |
| Implements | Multiple Grafana dashboards for a comprehensive health check of the system | Single Grafana dashboard with the most important system metrics |

The name with which these metrics are stored in Prometheus also depends on the installation, so Grafana dashboards, already showing these metrics, are available by default. Please note that the Kubernetes installation requires the optional Monitoring stack.

![Screenshot of OSM System Metrics at Grafana](../../assets/800px-OSM_system_metrics.png)

#### Retrieving OSM metrics from Prometheus TSDB

Once the metrics are being collected, they are stored in the Prometheus Time-Series DB **with an `osm_` prefix**, and there are a number of ways in which you can retrieve them.

##### 1) Visualizing metrics in Prometheus UI

Prometheus TSDB includes its own UI, which you can visit at `http://[OSM_IP]:9091`. From there, you can:

- Type any metric name (e.g. `osm_cpu_utilization`) in the 'Expression' field and see its current value or a histogram.
- Visit the Status --> Targets menu to monitor the connection status between Prometheus and MON (through `mon-exporter`).

![Screenshot of OSM Prometheus UI](../../assets/800px-Osm_prometheus_rel5.png)

##### 2) Visualizing metrics in Grafana

Starting in Release 7, OSM includes by default its own Grafana installation (deprecating the former experimental `pm_stack`).

Access Grafana with its default credentials (admin / admin) at `http://[OSM_IP_address]:3000`. By clicking the 'Manage' option in the 'Dashboards' menu (to the left), you will find a sample dashboard containing two graphs for VIM metrics and two graphs for VNF metrics. You can easily change them or add more, as desired.

![Screenshot of OSM Grafana UI](../../assets/800px-Osm_grafana_rel5.png)

###### Dashboard Automation

Starting in Release 7, Grafana dashboards are created by default in OSM. This is done by the "dashboarder" service in MON, which provisions Grafana following changes in the common DB.

| Updates in | Automates these dashboards |
|:----------:|:--------------------------:|
| OSM installation | System Metrics, Admin Project-scoped |
| OSM Projects | Project-scoped |
| OSM Network Services | NS-scoped sample dashboard |

##### 3) Querying metrics through OSM SOL005-based NBI

For collecting metrics through the NBI, the following URL format should be followed:

`https://<osm_ip>:<nbi_port>/osm/nspm/v1/pm_jobs/<pm_job_id>/reports/<ns_id>`

Where:

- `<osm_ip>`: the machine where OSM is installed.
- `<nbi_port>`: the NBI port, i.e. 9999.
- `<pm_job_id>`: currently it can be any string.
- `<ns_id>`: the NS ID obtained after instantiation of the network service.

Please note that a token should be obtained first in order to query a metric.
More information on this can be found in the [OSM NBI Documentation](../../reference/nbi-api.md).

In response, you would get a list of the available VNF metrics, for example:

```yaml
performanceMetric: osm_cpu_utilization
performanceValue:
  performanceValue:
    performanceValue: '0.9563615332000001'
    vduName: test_fet7912-2-ubuntuvnf2vdu1-1
    vnfMemberIndex: '2'
  timestamp: 1568977549.065
```

##### 4) Interacting with Prometheus directly through its API

The [Prometheus HTTP API](https://prometheus.io/docs/prometheus/latest/querying/api/) is always directly available to gather any metrics. A couple of examples are shown below.

Example with a date range query:

```bash
curl 'http://localhost:9091/api/v1/query_range?query=osm_cpu_utilization&start=2018-12-03T14:10:00.000Z&end=2018-12-03T14:20:00.000Z&step=15s'
```

Example with an instant query:

```bash
curl 'http://localhost:9091/api/v1/query?query=osm_cpu_utilization&time=2018-12-03T14:14:00.000Z'
```

Further examples and API calls can be found in the [Prometheus HTTP API documentation](https://prometheus.io/docs/prometheus/latest/querying/api/).

##### 5) Interacting directly with MON Collector

Prometheus TSDB stores metrics by periodically querying Prometheus 'exporters', which are set as 'targets'. Exporters expose current metrics in a specific format that Prometheus can understand; more information can be found [here](https://prometheus.io/docs/instrumenting/exporters/).

OSM MON features a `mon-exporter` module that exports **current metrics** through port 8000. Please note that this port is, by default, not exposed outside the OSM Docker network.

A tool that understands Prometheus 'exporters' (for example, Elastic Metricbeat) can be plugged in to integrate directly with `mon-exporter`. To get an idea of what metrics look like in this format, you could:

###### 1. Get into the MON console

```bash
docker exec -ti osm_mon.1.[id] bash
```

###### 2. Install curl

```bash
apt -y install curl
```

###### 3. Use curl to get the current metrics list

```bash
curl localhost:8000
```

Please note that as long as the Prometheus container is up, it will continue retrieving and storing metrics, in addition to any other tool/DB you connect to `mon-exporter`.

##### 6) Using your own TSDB

OSM MON integrates Prometheus through a plugin/backend model, so, if desired, other backends can be developed. If you are interested in contributing such an option, you can ask for details in our Slack #service-assurance channel or through the OSM Tech mailing list.

### Fault Management

Reference diagram:

![Diagram of OSM FM and ELK Experimental add-ons](../../assets/800px-Osm_fm_rel5.png)

#### Basic functionality

##### Logs & Events

Logs can be monitored on a per-container basis via the command line, like this:

```bash
docker logs <container_name>
```

For example:

```bash
docker logs osm_lcm.1.tkb8yr6v762d28ird0edkunlv
```

Logs can also be found in the corresponding volume of the host filesystem: `/var/lib/containers/[container-id]/[container-id].json.log`

Furthermore, there are some important events flowing between components through the Kafka bus, which can be monitored on a per-topic basis by external tools.

##### Alarm Manager for Metrics

As of Release FIVE, MON includes a module called `mon-evaluator`. The only use case supported today by this module is the configuration of alarms and the evaluation of thresholds related to metrics, so that the Policy Manager module (POL) can take actions such as [auto-scaling](#autoscaling).

Whenever a threshold is crossed and an alarm is triggered, MON generates a notification and puts it on the Kafka bus so that other components, like POL, can consume it. This event is logged today by both MON (which generates the notification) and POL (which consumes it, for its auto-scaling or webhook actions).

By default, threshold evaluation occurs every 30 seconds.
This value can be changed by setting an environment variable, for example:

```bash
docker service update --env-add OSMMON_EVALUATOR_INTERVAL=15 osm_mon
```

To configure alarms that send webhooks to a web service, add the following to the VNF descriptor:

```yaml
vdu:
  - alarm:
      - alarm-id: alarm-1
        operation: LT
        value: 20
        actions:
          alarm:
            - url: https://webhook.site/1111
          ok:
            - url: https://webhook.site/2222
          insufficient-data:
            - url: https://webhook.site/3333
        vnf-monitoring-param-ref: vnf_cpu_util
```

Regarding how to configure alarms through VNFDs for the auto-scaling use case, follow the [auto-scaling documentation](#autoscaling).