feat(LCM/RO/N2VC): Event driven LCM/RO
Event driven LCM / RO
Proposer
Fabián Bravo (Whitestack), Eduardo Sousa (Canonical)
Type
Feature
Target MDG/TF
NBI, LCM, RO, N2VC
Description
The main goal is to provide a complete API and state machine for each operation that LCM performs, so that an operator has complete control over an instantiation, with tools to execute, retry, roll back, or debug the complete lifecycle of a VNF.
Current status
LCM provides a reduced set of operations that map one-to-one to what the user can do with OSM. From a technical standpoint, this is not the best approach if we want to give OSM's users full control, especially when a failure happens: the only option left for the user right now is to retry the complete operation, which is expensive given the time it takes and the urgency to complete it.
Proposal
Taking a different approach, in which we provide a feature-rich API that lets the user fully control the lifecycle of an NS or VNF, as well as the operations themselves, will enable better disaster recovery options and better debugging and fixing of failures caused outside the reach of OSM, for example in VIM resources.
The main idea is to have smaller operations, each of which satisfies the following requirements:
- It has a set of conditions it needs in order to execute the operation
- It has a set of conditions that check the operation has completed successfully
- It is idempotent
- It has a rollback procedure that cleans up all side effects
- It has a list of dependencies (can be empty)
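The requirements above can be illustrated with a minimal sketch. The `Operation` class, its field names, and the dictionary-based `state` are hypothetical and are not part of the current LCM code; they only show one way that preconditions, postconditions, idempotency, rollback, and dependencies could fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

State = Dict[str, object]  # hypothetical stand-in for deployment state


@dataclass
class Operation:
    name: str
    precondition: Callable[[State], bool]   # must hold before executing
    postcondition: Callable[[State], bool]  # must hold after executing
    execute: Callable[[State], None]
    rollback: Callable[[State], None]       # cleans up side effects
    dependencies: List[str] = field(default_factory=list)

    def run(self, state: State, done: Set[str]) -> str:
        missing = [d for d in self.dependencies if d not in done]
        if missing:
            raise RuntimeError(f"{self.name}: unmet dependencies {missing}")
        # Idempotency: if the result is already in place, do nothing.
        if self.postcondition(state):
            done.add(self.name)
            return "skipped"
        if not self.precondition(state):
            raise RuntimeError(f"{self.name}: precondition not satisfied")
        try:
            self.execute(state)
            if not self.postcondition(state):
                raise RuntimeError(f"{self.name}: postcondition not satisfied")
        except Exception:
            self.rollback(state)  # undo partial side effects, then re-raise
            raise
        done.add(self.name)
        return "executed"


create_network = Operation(
    name="create_network",
    precondition=lambda s: "vim_account" in s,
    postcondition=lambda s: "network" in s,
    execute=lambda s: s.update(network="net-1"),
    rollback=lambda s: s.pop("network", None),
)

create_vm = Operation(
    name="create_vm",
    precondition=lambda s: "network" in s,
    postcondition=lambda s: "vm" in s,
    execute=lambda s: s.update(vm="vm-1"),
    rollback=lambda s: s.pop("vm", None),
    dependencies=["create_network"],
)
```

In this sketch, running `create_network` a second time returns `"skipped"` instead of creating a second network, and `create_vm` refuses to run until `create_network` has been recorded as done.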
This will need the cooperation of other modules like RO and N2VC, which would need to provide richer APIs with the following requirements:
- Unit operations
- Clear and uniform contracts (parameters, return values, sync/async operation)
- Kafka communication where possible, avoiding HTTP requests between modules
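As a rough illustration of a clear and uniform contract, the sketch below defines one request envelope and one result envelope that every unit operation could share. All names here (`OperationRequest`, `OperationResult`, `unit_operation`, the `create_network` action, and the field names) are assumptions made for the example, not existing RO or N2VC interfaces. The encoded bytes are meant as the payload a Kafka producer would send; the actual Kafka client calls are deliberately left out.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class OperationRequest:
    target: str                 # hypothetical module name, e.g. "ro" or "n2vc"
    action: str                 # name of the unit operation
    params: Dict[str, Any]
    op_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "OperationRequest":
        return cls(**json.loads(raw.decode("utf-8")))


@dataclass
class OperationResult:
    op_id: str                  # correlates the async reply with its request
    status: str                 # "COMPLETED" or "FAILED"
    detail: Optional[Dict[str, Any]] = None

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "OperationResult":
        return cls(**json.loads(raw.decode("utf-8")))


HANDLERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {}


def unit_operation(action: str):
    """Register a function as the handler for one unit operation."""
    def register(func):
        HANDLERS[action] = func
        return func
    return register


@unit_operation("create_network")
def create_network(params: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder body; a real handler would talk to the VIM.
    return {"network_id": "net-" + params["name"]}


def handle(raw: bytes) -> bytes:
    """Consume one request message and produce one result message."""
    request = OperationRequest.from_bytes(raw)
    try:
        detail = HANDLERS[request.action](request.params)
        result = OperationResult(request.op_id, "COMPLETED", detail)
    except Exception as exc:
        result = OperationResult(request.op_id, "FAILED", {"error": repr(exc)})
    return result.to_bytes()
```

Because every reply carries the `op_id` of its request, the caller can match asynchronous results to pending operations without holding an HTTP connection open.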
Finally, taking into account all these changes, we will provide a better experience and control for end users, who will have a complete set of tools to know what's happening in their systems and take action in case of failure.
Definition of done
LCM is capable of executing, stopping, pausing, and rolling back any task at any moment without system instability or data corruption, allowing users to effectively control the complete lifecycle of their NS/VNFs. New tests will be added to verify these characteristics: stopping an instantiation midway and checking the state of the VIM and other side effects, then rolling back and checking again. The goal is to achieve consistency between deployments and disaster recovery.
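The kind of test described above could look roughly like the following sketch. The `Instantiation` class, the `make_step` helper, the state names, and the use of a plain `set` as a fake VIM are all assumptions made for illustration; a real test would run against an actual or emulated VIM.

```python
from typing import Callable, List, Optional, Set, Tuple

# (resource name, apply function, undo function); hypothetical step shape
Step = Tuple[str, Callable[[Set[str]], None], Callable[[Set[str]], None]]


def make_step(resource: str) -> Step:
    """Build a step that creates one resource in the fake VIM and can undo it."""
    return (
        resource,
        lambda vim: vim.add(resource),
        lambda vim: vim.discard(resource),
    )


class Instantiation:
    def __init__(self, steps: List[Step]):
        self.steps = steps
        self.completed: List[Step] = []
        self.state = "PENDING"

    def run(self, vim: Set[str], stop_after: Optional[int] = None) -> None:
        """Run the remaining steps, optionally stopping after N completed steps."""
        self.state = "RUNNING"
        for step in self.steps[len(self.completed):]:
            if stop_after is not None and len(self.completed) >= stop_after:
                self.state = "STOPPED"
                return
            step[1](vim)
            self.completed.append(step)
        self.state = "COMPLETED"

    def roll_back(self, vim: Set[str]) -> None:
        """Undo completed steps in reverse order."""
        while self.completed:
            _, _, undo = self.completed.pop()
            undo(vim)
        self.state = "ROLLED_BACK"


# Stop midway, inspect the fake VIM, then roll back and inspect again.
vim: Set[str] = set()
inst = Instantiation([make_step("network"), make_step("vdu-1"), make_step("vdu-2")])
inst.run(vim, stop_after=2)
assert inst.state == "STOPPED" and vim == {"network", "vdu-1"}
inst.roll_back(vim)
assert inst.state == "ROLLED_BACK" and vim == set()
```

Undoing steps in reverse order is a design choice in this sketch: it means a resource is removed only after everything created on top of it has been removed.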