From 560bd41d9ae28ba566710739b3e0ac6338576143 Mon Sep 17 00:00:00 2001
From: garciadeblas
Date: Wed, 29 Mar 2017 23:56:58 +0200
Subject: [PATCH] OSM platform resiliency to single component failure

Change-Id: I281c3faa344071b77352c2c6643f61f589a4b81c
Signed-off-by: garciadeblas
---
 ..._resiliency_to_single_component_failure.md | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 Release5/OSM_platform_resiliency_to_single_component_failure.md

diff --git a/Release5/OSM_platform_resiliency_to_single_component_failure.md b/Release5/OSM_platform_resiliency_to_single_component_failure.md
new file mode 100644
index 0000000..b48264e
--- /dev/null
+++ b/Release5/OSM_platform_resiliency_to_single_component_failure.md
@@ -0,0 +1,37 @@
+# OSM platform resiliency to single component failure #
+
+## Proposer ##
+- Gerardo Garcia (Telefonica)
+- Alfonso Tierno (Telefonica)
+- Francisco Javier Ramon (Telefonica)
+
+## Type ##
+**Feature**
+
+## Target MDG/TF ##
+SO, RO, VCA, UI
+
+## Description ##
+**This feature obsoletes feature #666:
+https://osm.etsi.org/gerrit/#/c/666/**
+
+The NFV Orchestrator becomes a critical component for the operator in a
+production environment. As such, it should be capable of recovering from
+unexpected failures of its components, via a combination of techniques. In this
+case, it should be possible to keep the system alive in case of failure of a
+single component (e.g. active-standby redundancy).
+
+As part of this resilience strategy, it might be useful to identify:
+- Which sub-components (inside each of the current modules) are intended to
+store permanent information (databases, repositories) or should be considered
+stateful, and devise specific HA strategies for them.
+- Which sub-components are stateless (or can efficiently recover their state
+from databases or stateful components) and devise bootstrap and, if applicable,
+load balancing or active-standby strategies for them.
+
+## Demo or definition of done ##
+In a running OSM system with an instantiated NS, an abrupt power-off is forced
+in the container where one OSM component is running. After that event, the OSM
+system keeps working and can continue the operation of the running NS. This
+event should fire an alarm. The abrupt power-off might affect any OSM
+component.
-- 
2.25.1