# OSM platform resiliency to single component failure #

- Gerardo Garcia (Telefonica)
- Alfonso Tierno (Telefonica)
- Francisco Javier Ramon (Telefonica)

**This feature obsoletes feature #666:
https://osm.etsi.org/gerrit/#/c/666/**

The NFV Orchestrator becomes a critical component for the operator in a
production environment. As such, it should be capable of recovering from
unexpected failures of its components, via a combination of techniques. In
particular, it should be possible to keep the system alive if a single
component fails (e.g. through active-standby redundancy).

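As an illustration of the active-standby idea, the following is a minimal sketch and not OSM code. The `Replica` and `ActiveStandbyPair` classes, the replica names, and the in-memory `alarms` list are all hypothetical; a real deployment would rely on external health checks and a proper alarm channel.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One instance of a component (hypothetical model, not an OSM class)."""
    name: str
    healthy: bool = True


@dataclass
class ActiveStandbyPair:
    """Sends work to the active replica and promotes the standby on failure."""
    active: Replica
    standby: Replica
    alarms: list = field(default_factory=list)  # stand-in for a real alarm channel

    def handle(self, request: str) -> str:
        # If the active replica has failed, record an alarm and swap roles.
        if not self.active.healthy:
            self.alarms.append(
                f"{self.active.name} failed, promoting {self.standby.name}"
            )
            self.active, self.standby = self.standby, self.active
        # A double failure is outside the single-component-failure assumption.
        if not self.active.healthy:
            raise RuntimeError("no healthy replica available")
        return f"{self.active.name} handled {request}"


pair = ActiveStandbyPair(Replica("component-a"), Replica("component-b"))
pair.handle("first request")       # served by component-a
pair.active.healthy = False        # simulate an abrupt failure
pair.handle("second request")      # served by component-b, one alarm recorded
```

The sketch only covers a single failure: once the standby has been promoted, a second failure raises an error instead of being masked.
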
As part of this resilience strategy, it might be useful to identify:
- Which sub-components (inside each of the current modules) are intended to
store permanent information (databases, repositories) or should be considered
stateful, and devise specific HA strategies for them.
- Which sub-components are stateless (or can efficiently recover their state
from databases or stateful components) and devise bootstrap and, if applicable,
load balancing or active-standby strategies for them.

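The classification above could be captured in a small inventory, sketched below under stated assumptions. The `SubComponent` fields, the `Strategy` names, the `supports_parallel_instances` flag, and the example sub-component names are hypothetical and serve only to make the two categories concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    SPECIFIC_HA = "specific HA strategy for persistent or stateful data"
    LOAD_BALANCING = "bootstrap plus load balancing"
    ACTIVE_STANDBY = "bootstrap plus active-standby"


@dataclass(frozen=True)
class SubComponent:
    name: str
    stores_permanent_data: bool  # e.g. databases, repositories
    stateful: bool               # state cannot be rebuilt efficiently from elsewhere
    supports_parallel_instances: bool = False  # assumption: safe to run N copies


def choose_strategy(component: SubComponent) -> Strategy:
    """Map a sub-component to one of the strategy families named in the list."""
    if component.stores_permanent_data or component.stateful:
        return Strategy.SPECIFIC_HA
    if component.supports_parallel_instances:
        return Strategy.LOAD_BALANCING
    return Strategy.ACTIVE_STANDBY


# Illustrative inventory; the names are placeholders, not actual OSM modules.
inventory = [
    SubComponent("database", stores_permanent_data=True, stateful=True),
    SubComponent("api-frontend", False, False, supports_parallel_instances=True),
    SubComponent("lifecycle-worker", False, False),
]

for component in inventory:
    print(f"{component.name}: {choose_strategy(component).value}")
```

Stateless sub-components default to active-standby here unless they are explicitly marked as safe to run in parallel, which is a conservative choice made for this sketch rather than something the proposal prescribes.
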
## Demo or definition of done ##
In a running OSM system with an instantiated NS, an abrupt power-off is forced
in the container where one OSM component is running. After that event, the OSM
system keeps working and can continue operating the running NS. This
process should fire an alarm. This abrupt power-off might potentially affect