Technical FAQ (Release THREE)

"Instantiation failed", but VMs and networks were successfully created

Q. After trying to instantiate, I got the message that the instantiation failed without much information about the reason. After checking the logs, it seems to be a timeout issue. However, I am seeing that the VMs and networks were created at the VIM.

A. First check in the RO that there is an IP address in the management interface of each VNF of the NS.

lxc exec RO --env OPENMANO_TENANT=osm -- openmano instance-scenario-list                      # to identify the running scenarios in the RO
lxc exec RO --env OPENMANO_TENANT=osm -- openmano instance-scenario-list <id> -vvv | grep ip  # to get verbose information on a specific scenario in the RO

If no IP address is present in the management interface of each VNF, then you are hitting a SO-RO timeout issue. The reason is typically a wrong configuration of the VIM. The way management IP addresses are assigned to the VNFs changes from one VIM to another. In all cases, the recommendation is the following:

  • Pre-provision a management network in the VIM, with DHCP enabled. You can see, for instance, the instructions for the case of Openstack (https://osm.etsi.org/wikipub/index.php/Openstack_configuration_(Release_TWO) ), and a sketch after this list.
  • Then make sure that, at instantiation time, you specify a mapping between the management network in the NS and the VIM network name that you pre-provisioned at the VIM.
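
For Openstack, a management network with DHCP enabled can be pre-provisioned with the standard openstack CLI. This is only a minimal sketch: the network name, subnet name and address range are placeholders, so adjust them to your deployment.

openstack network create mgmt
openstack subnet create --network mgmt --subnet-range 192.168.100.0/24 --dhcp mgmt-subnet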

If the IP address is present in the management interface, then you are probably hitting a SO-VCA timeout, caused by the VNF configuration via Juju charms taking too long. To confirm it, connect to the VCA container and check "juju status":

lxc exec VCA -- juju status

Then, if you see an error, you should debug the VNF charm or ask the people providing that VNF package.
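
To start debugging the charm, the following Juju commands can be run from the VCA container. This is a sketch: the application name is a placeholder to be taken from the "juju status" output.

lxc exec VCA -- juju show-status-log <application-name>/0                      # history of status changes for the unit
lxc exec VCA -- juju debug-log --replay --include unit-<application-name>-0    # agent log for that unit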

"cannot load cookies: file locked for too long" where charms are not loaded

Depending on the remainder of the error message, this most likely means that a condition on the server hosting OSM is not allowing the charm to be loaded.

For instance, the following error indicates that the VCA container is full and cannot host new containers for Juju, so the charms cannot be loaded: "ERROR cannot load cookies: file locked for too long; giving up: cannot acquire lock: open /root/.local/share/juju/cookies/osm.json.lock: no space left on device".

To solve that, the containers and data created through Juju should be removed. Check the connection to the OSM Juju config agent account in the dashboard: is it red/unavailable? Check whether the service is running on port 17070 inside the Juju controller container (which itself runs inside the VCA container); if it is not, restore it as described below.

# Access VCA container
$ lxc exec VCA bash

# Check the Juju status. The command may stall if the service is not running in the Juju controller
root@VCA:~# juju status

# Check if Juju is running on the Juju controller
# First, find the name of the LXC running the Juju controller
# In this example, the controller IP is juju_controller_instance_id = 10.44.127.136

root@VCA:~# lxc list | grep "${juju_controller_instance_id}"
| juju-f050fc-0   | RUNNING | 10.44.127.136 (eth0) |      | PERSISTENT | 0         |

# Check if the service for the agent account is running. It is not running in this case
root@VCA:~# lxc exec juju-f050fc-0 -- netstat -apen | grep 17070
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)

# Check if disk is completely filled in VCA
root@VCA:~# df -h

# If it is, remove some of the LXCs belonging to the failed machines
# Note: keep the Juju controller container! (here, the last row)
# The controller IP is shown in the configuration section, under "Accounts", in the OSM dashboard

root@VCA:~# lxc list
+-----------------+---------+----------------------+------+------------+-----------+
|      NAME       |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-260 | RUNNING | 10.44.127.190 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-269 | RUNNING | 10.44.127.69 (eth0)  |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-272 | RUNNING | 10.44.127.118 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-277 | RUNNING | 10.44.127.128 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-278 | RUNNING | 10.44.127.236 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-282 | RUNNING | 10.44.127.61 (eth0)  |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-283 | RUNNING | 10.44.127.228 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-f050fc-0   | RUNNING | 10.44.127.136 (eth0) |      | PERSISTENT | 0         |   # <- Do not remove the Juju controller!
+-----------------+---------+----------------------+------+------------+-----------+

# Example: lxc stop juju-ed3163-260; lxc delete juju-ed3163-260
lxc stop ${name}; lxc delete ${name}

# Clean up the Juju status by removing the machines, units and apps whose LXCs were removed above
root@VCA:~# juju status
Model    Controller  Cloud/Region         Version  SLA
default  osm         localhost/localhost  2.2.2    unsupported

App                      Version  Status       Scale  Charm      Store  Rev  OS      Notes
flhf-testcf-flhfilter-b           active         0/1  fl7filter  local   20  ubuntu
flhf-testd-flhfilter-b            active         0/1  fl7filter  local   22  ubuntu
flhf-testdc-flhfilter-b           active         0/1  fl7filter  local   25  ubuntu
flhf-va-flhfilter-b               active         0/1  fl7filter  local   21  ubuntu
ids-test-ac-ids-b                 active         0/1  ids        local    1  ubuntu
lala-dpi-b                        maintenance    0/1  dpi        local    4  ubuntu
lhbcdf-lcdfilter-b                maintenance    0/1  l23filter  local   20  ubuntu

Unit                       Workload     Agent   Machine  Public address  Ports  Message
flhf-testcf-flhfilter-b/0  unknown      lost    260      10.44.127.190          agent lost, see 'juju show-status-log flhf-testcf-flhfilter-b/0'
flhf-testd-flhfilter-b/1   unknown      lost    278      10.44.127.236          agent lost, see 'juju show-status-log flhf-testd-flhfilter-b/1'
flhf-testdc-flhfilter-b/2  unknown      lost    282      10.44.127.61           agent lost, see 'juju show-status-log flhf-testdc-flhfilter-b/2'
flhf-va-flhfilter-b/0      unknown      lost    272      10.44.127.118          agent lost, see 'juju show-status-log flhf-va-flhfilter-b/0'
ids-test-ac-ids-b/0        unknown      lost    277      10.44.127.128          agent lost, see 'juju show-status-log ids-test-ac-ids-b/0'
lala-dpi-b/1               maintenance  failed  283      10.44.127.228          installing charm software
lhbcdf-lcdfilter-b/0       unknown      lost    269      10.44.127.69           agent lost, see 'juju show-status-log lhbcdf-lcdfilter-b/0'

Machine  State  DNS            Inst id          Series  AZ  Message
260      down   10.44.127.190  juju-ed3163-260  trusty      Running
269      down   10.44.127.69   juju-ed3163-269  trusty      Running
272      down   10.44.127.118  juju-ed3163-272  trusty      Running
277      down   10.44.127.128  juju-ed3163-277  trusty      Running
278      down   10.44.127.236  juju-ed3163-278  trusty      Running
282      down   10.44.127.61   juju-ed3163-282  trusty      Running
283      down   10.44.127.228  juju-ed3163-283  trusty      Running

# Example: juju remove-machine 260 --force
juju remove-machine ${machine_number} --force    # the machine whose IP / Inst id matches the LXC removed above
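
If applications remain in "juju status" with all their units lost after the machines are removed, they can be cleaned up as well. A sketch, using one of the application names from the status output above:

juju remove-application flhf-testcf-flhfilter-b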

# Start Juju in the juju controller
root@VCA:~# lxc exec juju-f050fc-0 bash
# The script runs in the foreground; run it in the background (&) or cancel with Ctrl+C once the service is up
root@juju-f050fc-0:~# /var/lib/juju/init/jujud-machine-0/exec-start.sh &
^C
# Verify that the process is running
root@juju-f050fc-0:~# sudo netstat -apen | grep 17070
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:58402         127.0.0.1:17070         ESTABLISHED 0          22936276    359/jujud
tcp        0      0 10.44.127.136:56746     10.44.127.136:17070     ESTABLISHED 0          22936277    359/jujud
tcp        0      0 10.44.127.136:56740     10.44.127.136:17070     ESTABLISHED 0          22940759    359/jujud
tcp        0      0 127.0.0.1:58432         127.0.0.1:17070         ESTABLISHED 0          22940767    359/jujud
tcp6       0      0 :::17070                :::*                    LISTEN      0          22940744    359/jujud
tcp6       0      0 127.0.0.1:17070         127.0.0.1:58432         ESTABLISHED 0          22930280    359/jujud
tcp6       0      0 10.44.127.136:17070     10.44.127.136:56740     ESTABLISHED 0          22939104    359/jujud
tcp6       0      0 10.44.127.136:17070     10.44.127.136:56746     ESTABLISHED 0          22936278    359/jujud
tcp6       0      0 127.0.0.1:17070         127.0.0.1:58402         ESTABLISHED 0          22940756    359/jujud

# Go to the SO-ub container and restart the SO service so that it reconnects to the Juju controller
$ lxc exec SO-ub bash
root@SO-ub:~# service launchpad restart
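
Optionally, verify that the SO service came back up before returning to the dashboard (a minimal check):

root@SO-ub:~# service launchpad status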

After this process, access the OSM dashboard and check the connectivity again from the "Accounts" tab. It should be green, and any newly instantiated NS should correctly load its associated charm.

"Instantiation failed" and VMs and network were not created at VIM

Q. After trying to instantiate, I got the message that the instantiation failed without much information about the reason. I connected to the VIM and checked that the VMs and networks were not created.

A. You are hitting a SO-RO timeout, caused either by a lack of communication between the RO and the VIM, or because the creation of VMs and networks at the VIM takes too long.
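
To narrow down the cause, you can check the RO logs and the RO's view of the VIM from the RO container. This is a sketch: the log path and the vim-net-list command are assumptions based on a default Release THREE install, and the datacenter name is a placeholder.

lxc exec RO -- bash
root@RO:~# tail -n 100 /var/log/osm/openmano.log                     # assumed default RO log location
root@RO:~# export OPENMANO_TENANT=osm
root@RO:~# openmano datacenter-list                                  # VIMs known to the RO
root@RO:~# openmano vim-net-list --datacenter <datacenter-name>      # checks that the RO can reach the VIM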

SO connection error: not possible to contact OPENMANO-SERVER (openmanod)

Q. The NS operational data of an instantiated NS in the SO CLI shows "Connection error: not possible to contact OPENMANO-SERVER (openmanod)"

A. Please check the connectivity from the SO-ub container to the RO container. Can you ping the RO IP address (the one configured in the SO) from the SO-ub container? If you cannot, fix the connectivity between the containers. Then make sure that the osm-ro service is up and running in the RO container:

$ lxc exec RO -- bash
root@RO:~# service osm-ro status
root@RO:~# OPENMANO_TENANT=osm openmano datacenter-list
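
To check the SO-to-RO connectivity itself, a minimal sketch (the RO IP address is the one configured in the SO):

$ lxc list RO                                    # shows the IP address of the RO container
$ lxc exec SO-ub -- ping -c 3 <RO-IP-address>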

Deployment fails with the error message "Not possible to get_networks_list from VIM: AuthorizationFailure: Authorization Failed: The resource could not be found. (HTTPS: 404)"

Q. During instantiation, I got the following error message: "Not possible to get_networks_list from VIM: AuthorizationFailure: Authorization Failed: The resource could not be found. (HTTPS: 404)".

A. The cause is that Openstack has not been properly added to openmano with the right credentials.

Go to the RO container:

lxc exec RO bash

Install the python-openstackclient package, in case it is not already installed:

apt install -y python-openstackclient

Execute the following commands, with the appropriate substitutions, to check that your Openstack is reachable and that you can perform specific actions:

openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug network list 
openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug host list 
openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug flavor list 
openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug server list 
# Provide the same URL and credentials that you provide to openmano. The --debug option shows more information, including the IPs/ports the client tries to access

This is to ensure that you have access to the openstack endpoints and that you are using the right credentials.

Case 1. If any of the previous commands does not work, then either you do not have connectivity to the Openstack endpoints or you are not using the right parameters. Please check internally and debug until the previous commands work. Check also the guidelines here: OSM_Release_TWO#Openstack_site.

Case 2. If all of them worked, then follow these guidelines:

  • Use "v2" authorization URL. "v3" is currently experimental in the master branch and is not recommended yet.
  • If https (instead of http) is used for the authorization URL, you can either use the insecure option at datacenter-create (see Openstack_configuration_(Release_TWO)#Add_openstack_at_OSM), or install the certificate in the RO container, e.g. by putting a .crt (not .pem) certificate in /usr/local/share/ca-certificates and running update-ca-certificates (see the sketch after this list).
  • Check the parameters you used to create and attach the datacenter by running the following commands:
export OPENMANO_TENANT=osm
openmano datacenter-list
openmano datacenter-list <DATACENTER_NAME> -vvv
  • If everything seems right, the password may still have been wrong. Try to detach and delete the datacenter, and then create and attach it again with the right password:
openmano datacenter-detach openstack-site
openmano datacenter-delete openstack-site
openmano datacenter-create openstack-site http://10.10.10.11:5000/v2.0 --type openstack --description "OpenStack site"
openmano datacenter-attach openstack-site --user=admin --password=userpwd --vim-tenant-name=admin
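
For the https case mentioned in the list above, the certificate can be installed in the RO container as follows. This is a sketch: the certificate file name is a placeholder, and the final restart of the osm-ro service is an assumption so that the RO picks up the updated CA bundle.

lxc exec RO -- bash
root@RO:~# cp <your-vim-ca>.crt /usr/local/share/ca-certificates/    # must be a .crt, not a .pem
root@RO:~# update-ca-certificates
root@RO:~# service osm-ro restart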

Deployment fails with the error message "VIM Exception vimconnUnexpectedResponse Unauthorized: The request you have made requieres authentication. (HTTP 401)"

Follow the same instructions as for the error "Not possible to get_networks_list from VIM: AuthorizationFailure: Authorization Failed: The resource could not be found. (HTTPS: 404)"

OSM RO service fails to start with a message "DATABASE wrong version"

Q. OSM RO service (osm-ro service in RO container) fails to start and logs show "DATABASE wrong version"

2016-11-02T17:19:51 CRITICAL  openmano openmanod:268 DATABASE wrong version '0.15'.
Try to upgrade/downgrade to version '0.16' with 'database_utils/migrate_mano_db.sh'

A. The reason is that the RO has been upgraded to a new version that requires a newer database version. To upgrade the database, run database_utils/migrate_mano_db.sh and provide credentials if needed (by default, the database user is 'mano' and the database password is 'manopw').
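
A minimal sketch of the upgrade, assuming the RO code tree is installed under /opt/openmano in the RO container (adjust the path to your install), followed by a restart of the osm-ro service:

lxc exec RO -- bash
cd /opt/openmano/database_utils       # assumed install path of the RO code tree
./migrate_mano_db.sh                  # default DB user 'mano', password 'manopw'
service osm-ro restart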

Deployment fails with the message "Error creating image at VIM 'xxxx': Cannot create image without location"

The reason for the failure is that there is a mismatch between the image name (and checksum) in the OSM VNFD and at the VIM. Basically, the image is not present at the VIM.

To fix it, you should add the image at the VIM and ensure that it is visible with the VIM credentials provided to the RO. In the RO container, you can easily list the VIM images using these credentials with the command:

openmano vim-image-list --datacenter <xxxxx>
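
For an Openstack VIM, the missing image can be uploaded with the standard openstack CLI. This is a sketch: the file name, formats and image name are placeholders, and the image name must match exactly the one referenced in the VNFD.

openstack image create --file ubuntu16.04.qcow2 --disk-format qcow2 --container-format bare ubuntu16.04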