# OSM platform recovery after major failure #

## Proposer ##
- Gerardo Garcia (Telefonica)
- Alfonso Tierno (Telefonica)
- Francisco Javier Ramon (Telefonica)

## Type ##
**Feature**

## Target MDG/TF ##
SO, RO, VCA, UI

## Description ##
**This feature obsoletes feature #666:
https://osm.etsi.org/gerrit/#/c/666/**

The NFV Orchestrator is a critical component for the operator in a
production environment. As such, it should be capable of recovering from
unexpected failures of its components. In case of a major failure, such as an
unexpected system shutdown, it should be able to restore its last known
internal state.

As part of this recovery strategy, it would be useful to identify:
- Which sub-components (inside each of the current modules) are intended to
store permanent information (databases, repositories) or should otherwise be
considered stateful.
- Which sub-components are stateless (or can efficiently recover their state
from databases or stateful components), so that bootstrap procedures can be
devised for them.
- The flow by which databases and the state of the different components should
be recovered after a crash.
- If applicable, the restart sequence of the modules after a crash, i.e. whether
they need to be started in a specific order.
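
The last item, a restart sequence that respects ordering constraints, can be
sketched as a topological sort over a dependency map. The module names below
come from the target list of this feature, but the dependency edges are purely
hypothetical and only illustrate the idea; the real constraints would have to
be identified as part of this work.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each key lists the modules that must be
# running before it starts. These edges are assumptions for illustration,
# not the actual OSM dependencies.
DEPENDS_ON = {
    "RO": set(),
    "VCA": set(),
    "SO": {"RO", "VCA"},
    "UI": {"SO"},
}


def restart_order(depends_on):
    """Return a start order in which every module follows its dependencies.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = restart_order(DEPENDS_ON)
print(order)
```

If no ordering constraints exist, the map has no edges and any order is valid,
which matches the "if applicable" wording of the item above.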

## Demo or definition of done ##
In a running OSM system with an instantiated NS, an abrupt shutdown of all the
OSM components is forced (e.g. by shutting down the VM/host). The system is
then restarted and recovers its last known state, allowing OSM to operate the
pre-existing NS instance again.
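
One way to check this criterion is to capture the NS records before the forced
shutdown and compare them with the records reported after the restart. The
sketch below assumes the records are available as plain dictionaries keyed by
NS instance id; the record fields and the way they are obtained from OSM are
hypothetical and not part of this proposal.

```python
def diff_ns_state(before, after):
    """Compare NS records captured before the shutdown and after the restart.

    Both arguments map an NS instance id to its record (a plain dict).
    Returns (missing, changed): ids absent after the restart, and ids whose
    record differs from the one captured before the shutdown.
    """
    missing = sorted(set(before) - set(after))
    changed = sorted(
        ns_id for ns_id in before.keys() & after.keys()
        if before[ns_id] != after[ns_id]
    )
    return missing, changed


# Hypothetical records; the "status" field is an assumption for illustration.
before = {"ns-1": {"status": "running"}, "ns-2": {"status": "running"}}
after = {"ns-1": {"status": "running"}}

missing, changed = diff_ns_state(before, after)
print(missing, changed)
```

Under these assumptions, the demo would count as passed when both returned
lists are empty and the pre-existing NS instance can be operated again.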