Release5/OSM_platform_resiliency_to_single_component_failure.md

# OSM platform resiliency to single component failure #

## Proposer ##
- Gerardo Garcia (Telefonica)
- Alfonso Tierno (Telefonica)
- Francisco Javier Ramon (Telefonica)

## Type ##
**Feature**

## Target MDG/TF ##
SO, RO, VCA, UI

## Description ##
**This feature obsoletes feature #666:
https://osm.etsi.org/gerrit/#/c/666/**

In a production environment, the NFV Orchestrator is a critical component for
the operator. As such, it should be able to recover from unexpected failures of
its components through a combination of techniques. Specifically, this feature
requires that the system stay alive when a single component fails (e.g. by
means of active-standby redundancy).

As part of this resiliency strategy, it might be useful to identify:
- Which sub-components (inside each of the current modules) store persistent
information (databases, repositories) or should otherwise be considered
stateful, so that specific HA strategies can be devised for them.
- Which sub-components are stateless (or can efficiently recover their state
from databases or stateful components), so that bootstrap and, if applicable,
load-balancing or active-standby strategies can be devised for them.
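
The classification above could be captured in a small inventory. The following
is only a minimal sketch: the sub-component names (`example-database`,
`example-engine`), their classification, and the helper names `ha_strategy` and
`plan` are hypothetical placeholders, not actual OSM sub-components or APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    STATEFUL = "stateful"
    STATELESS = "stateless"


@dataclass(frozen=True)
class SubComponent:
    module: str  # one of the target modules, e.g. "RO"
    name: str    # hypothetical sub-component name
    kind: Kind


def ha_strategy(sub: SubComponent) -> str:
    """Map a sub-component to a candidate resiliency strategy."""
    if sub.kind is Kind.STATEFUL:
        return "replicate state (dedicated HA strategy)"
    return "re-bootstrap, optionally load-balanced or active-standby"


# Placeholder inventory: names and classification are illustrative only.
INVENTORY = [
    SubComponent("RO", "example-database", Kind.STATEFUL),
    SubComponent("RO", "example-engine", Kind.STATELESS),
]


def plan(inventory):
    """Return a {"module/name": strategy} mapping for the whole inventory."""
    return {f"{s.module}/{s.name}": ha_strategy(s) for s in inventory}
```

Calling `plan(INVENTORY)` yields one candidate strategy per sub-component, which
could serve as a checklist when reviewing each module.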

## Demo or definition of done ##
In a running OSM system with an instantiated NS, an abrupt power-off is forced
on the container where one OSM component is running. After that event, the OSM
system keeps working and can continue operating the running NS. The event
should raise an alarm. The abrupt power-off may affect any OSM component.
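
The acceptance criteria above can be illustrated with an in-memory simulation
of an active-standby pair. This is a sketch under stated assumptions, not the
OSM implementation: the `Instance` and `ActiveStandbyPair` classes, the
`run_demo` helper, and the instance names are all hypothetical, and only a
single failure is considered.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    alive: bool = True


@dataclass
class ActiveStandbyPair:
    active: Instance
    standby: Instance
    alarms: list = field(default_factory=list)

    def check(self) -> Instance:
        """Health check: if the active instance died, raise an alarm and
        promote the standby. Returns the instance now serving requests."""
        if not self.active.alive:
            self.alarms.append(
                f"{self.active.name} failed; {self.standby.name} promoted"
            )
            self.active, self.standby = self.standby, self.active
        return self.active


def run_demo():
    """Simulate the demo: power off the active instance, then health-check."""
    pair = ActiveStandbyPair(Instance("component-a"), Instance("component-b"))
    pair.active.alive = False  # simulated abrupt power-off
    serving = pair.check()
    return serving.name, serving.alive, pair.alarms
```

In a real deployment, the health check and the alarm would be provided by the
platform's monitoring and failover mechanisms; the simulation only shows the
expected outcome: service continues on the surviving instance and exactly one
alarm is recorded.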