Automatically heal the VNF when it fails. The recovery policy specified during deployment controls the recovery.
Network Service Healing
Proposers
Subhankar Pal (subhankar.pal@altran.com), Atul Agarwal (atul.agarwal@altran.com)
Type
Feature
Target MDG/TF
MON, POL, LCM, RO
Description
The healing procedure recovers a failed VNF by bringing its unresponsive or dead VDUs back to life.
Healing can be performed in one of the following ways:
1- Auto Healing: The VNF is continuously monitored and, when it becomes unresponsive, is healed automatically according to the defined healing policy.
2- Manual Healing: Healing is triggered manually through the command line.
The auto-healing procedure will use the VNF metric "OSM_VM_STATUS". When OSM_VM_STATUS = SHUTOFF, a "VNF Failure" is reported.
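The detection rule above can be illustrated with a minimal sketch. It assumes metric readings arrive as simple per-VDU records; the `MetricSample` structure and the `detect_vnf_failures` function are hypothetical names used only for illustration and are not part of any existing OSM module.

```python
from dataclasses import dataclass

FAILURE_METRIC = "OSM_VM_STATUS"  # metric name taken from the proposal
FAILURE_VALUE = "SHUTOFF"         # value that signals a failed VM


@dataclass
class MetricSample:
    """One metric reading for a VDU (hypothetical structure)."""
    vnf_id: str
    vdu_name: str
    metric: str
    value: str


def detect_vnf_failures(samples):
    """Return the sorted ids of VNFs with at least one VDU reporting SHUTOFF."""
    failed = {
        s.vnf_id
        for s in samples
        if s.metric == FAILURE_METRIC and s.value == FAILURE_VALUE
    }
    return sorted(failed)
```

A VNF is treated as failed as soon as any one of its VDUs reports SHUTOFF; readings of other metrics are ignored.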
The recovery policy specified during deployment controls the recovery process. This ensures high availability of network services.
The recovery actions available in the current scope are:
1- REBOOT_ONLY: Attempts to reboot the VM.
2- ALERT: An alert is sent to an external entity by invoking a webhook.
Additional options may be considered in the future but remain out of scope for now:
3- REBOOT_THEN_REDEPLOY: First attempts to reboot the affected VDU; if this fails, attempts to redeploy the affected VNFD (on the same host).
4- REDEPLOY_ONLY: Only attempts to redeploy the VM.
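The ordering implied by these recovery actions can be sketched as a small dispatcher. This is an illustration under assumptions, not the proposed implementation: the `RecoveryAction` enum and `run_recovery` function are hypothetical, and the `reboot`, `redeploy` and `alert` callables stand in for whatever the orchestrator would actually invoke, each returning True on success.

```python
from enum import Enum


class RecoveryAction(Enum):
    REBOOT_ONLY = "REBOOT_ONLY"
    ALERT = "ALERT"
    REBOOT_THEN_REDEPLOY = "REBOOT_THEN_REDEPLOY"  # future option
    REDEPLOY_ONLY = "REDEPLOY_ONLY"                # future option


def run_recovery(action, reboot, redeploy, alert):
    """Run the steps for `action` and return True if recovery succeeded.

    `reboot`, `redeploy` and `alert` are caller-supplied callables
    (hypothetical hooks) that return True on success.
    """
    if action is RecoveryAction.REBOOT_ONLY:
        return reboot()
    if action is RecoveryAction.ALERT:
        return alert()
    if action is RecoveryAction.REBOOT_THEN_REDEPLOY:
        # Redeploy is attempted only when the reboot did not succeed.
        return reboot() or redeploy()
    if action is RecoveryAction.REDEPLOY_ONLY:
        return redeploy()
    raise ValueError(f"unknown recovery action: {action!r}")
```

For REBOOT_THEN_REDEPLOY the short-circuit `or` expresses the fallback: the redeploy hook is not called when the reboot succeeds.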
The auto-healing policy can be registered using one of two procedures:
1- At runtime, from the OSM client, after the NS and VNFs are already instantiated.
2- At deploy time, using auto-healing information defined in the VNF descriptor (not part of the current scope).
For option 1, the OSM client will need the following information:
1- Operation: set (default), get-all or get
2- Type: auto-heal (default) or manual-heal
3- Level: VNF (default), NS or project
4- Entity: VNF Id or NS Id
5- Metric: OSM_VM_STATUS (default)
6- Event: SHUTOFF (default)
7- heal-action: reboot (default), reboot-and-redeploy or redeploy
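As a rough sketch of how these client inputs and their defaults might be modelled and validated, consider the following. It assumes the first value listed for each parameter is its default; the `HealingPolicyRequest` class, its field names and the `ALLOWED` table are hypothetical and do not correspond to an existing OSM client API.

```python
from dataclasses import dataclass

# Permitted values per parameter, as listed in the proposal.
ALLOWED = {
    "operation": ("set", "get-all", "get"),
    "policy_type": ("auto-heal", "manual-heal"),
    "level": ("VNF", "NS", "project"),
    "heal_action": ("reboot", "reboot-and-redeploy", "redeploy"),
}


@dataclass
class HealingPolicyRequest:
    """Inputs for registering a healing policy at runtime (hypothetical)."""
    entity: str                      # VNF Id or NS Id, always required
    operation: str = "set"
    policy_type: str = "auto-heal"   # "Type" in the proposal
    level: str = "VNF"
    metric: str = "OSM_VM_STATUS"
    event: str = "SHUTOFF"
    heal_action: str = "reboot"

    def __post_init__(self):
        # Reject empty entities and values outside the listed options.
        if not self.entity:
            raise ValueError("entity (VNF Id or NS Id) is required")
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(
                    f"{field_name}={value!r} is not one of {allowed}"
                )
```

With this shape, `HealingPolicyRequest(entity="<vnf-id>")` yields a request that sets an auto-heal policy at VNF level, triggered by OSM_VM_STATUS = SHUTOFF, with reboot as the heal action.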
Possible future extensions to this feature are:
1- Healing at network-service and project level.
2- Healing based on failure prediction using machine learning.
Demo or definition of done
Further design details are posted in the pad. https://osm.etsi.org/pad/p/VnfAutoHealing