Technical FAQ (Release THREE)
"Instantiation failed", but VMs and networks were successfully created
Q. After trying to instantiate, I got the message that the instantiation failed without much information about the reason. After checking the logs, it seems to be a timeout issue. However, I am seeing that the VMs and networks were created at the VIM.
A. First check in the RO that there is an IP address in the management interface of each VNF of the NS.
lxc exec RO --env OPENMANO_TENANT=osm openmano instance-scenario-list              # to identify the running scenarios in the RO
lxc exec RO --env OPENMANO_TENANT=osm openmano instance-scenario-list <id> -vvv | grep ip   # to get verbose information on a specific scenario in the RO
If no IP address is present in the management interface of each VNF, then you are hitting a SO-RO timeout issue. The reason is typically a wrong configuration of the VIM. The way management IP addresses are assigned to VNFs changes from one VIM to another. In all cases, the recommendation is the following:
- Pre-provision a management network in the VIM, with DHCP enabled (see the sketch after this list). You can see, for instance, the instructions for Openstack (https://osm.etsi.org/wikipub/index.php/Openstack_configuration_(Release_TWO) ).
- Then make sure that, at instantiation time, you specify a mapping between the management network in the NS and the VIM network name that you pre-provisioned at the VIM.
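For reference, a minimal sketch of the pre-provisioning step on an Openstack VIM; the network and subnet names ("mgmt", "mgmt-subnet") and the address range are only examples, adapt them to your environment:
# Sketch only: names and subnet range are examples
openstack network create mgmt
openstack subnet create --network mgmt --subnet-range 192.168.100.0/24 --dhcp mgmt-subnet
# At instantiation time, map the NS management network to the VIM network name "mgmt"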
If the IP address is present in the management interface, then you are probably hitting a SO-VCA timeout, caused by the VNF configuration via Juju charms taking too long. To confirm, connect to the VCA container and check "juju status".
lxc exec VCA -- juju status
Then, if you see an error, you should debug the VNF charm or ask the people providing that VNF package.
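As a starting point for that debugging, you can inspect the Juju logs of the failing unit from the VCA container; the unit name below is just a placeholder, use the one reported in error by "juju status":
# Sketch only: replace <app-name>/0 with the failing unit
lxc exec VCA -- juju show-status-log <app-name>/0      # history of status changes for the unit
lxc exec VCA -- juju debug-log --include <app-name>/0  # logs of the unit agent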
"cannot load cookies: file locked for too long" where charms are not loaded
Depending on the remainder of the error message, this most likely means that a condition on the server hosting OSM is not allowing the charm to be loaded.
For instance, the following error indicates that the VCA container is full and cannot host new containers for Juju: "Cannot load charms due to "ERROR cannot load cookies: file locked for too long; giving up: cannot acquire lock: open /root/.local/share/juju/cookies/osm.json.lock: no space left on device".
To solve it, the containers and data created through Juju should be removed. Check the connection to the OSM Juju config agent account in the OSM dashboard: is it red/unavailable? Then check if the service is running on port 17070 inside the Juju controller container (which runs inside the VCA container) and, if it is not, restore it as follows.
# Access the VCA container
$ lxc exec VCA bash

# Check the Juju status. The command may get stalled if the service is not running in the Juju controller
root@VCA:~# juju status

# Check if Juju is running on the Juju controller
# First, check the name of the LXC container with the Juju controller
# In this case, juju_controller_instance_id = 10.44.127.136
root@VCA:~# lxc list | grep "${juju_controller_instance_id}"
| juju-f050fc-0   | RUNNING | 10.44.127.136 (eth0) |      | PERSISTENT | 0         |

# Check if the service for the agent account is running. It is not running in this case
root@VCA:~# lxc exec juju-f050fc-0 -- netstat -apen | grep 17070
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)

# Check if the disk is completely filled in VCA
root@VCA:~# df -h

# If it is, remove some LXC containers belonging to the failed machines
# Note: keep the Juju controller container! (here, the last row)
# The controller IP is available in the configuration section, under "Accounts" in the OSM dashboard
root@VCA:~# lxc list
+-----------------+---------+----------------------+------+------------+-----------+
|      NAME       |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-260 | RUNNING | 10.44.127.190 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-269 | RUNNING | 10.44.127.69 (eth0)  |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-272 | RUNNING | 10.44.127.118 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-277 | RUNNING | 10.44.127.128 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-278 | RUNNING | 10.44.127.236 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-282 | RUNNING | 10.44.127.61 (eth0)  |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-ed3163-283 | RUNNING | 10.44.127.228 (eth0) |      | PERSISTENT | 0         |
+-----------------+---------+----------------------+------+------------+-----------+
| juju-f050fc-0   | RUNNING | 10.44.127.136 (eth0) |      | PERSISTENT | 0         | # <- Do not remove the Juju controller!
+-----------------+---------+----------------------+------+------------+-----------+

# Example: lxc stop juju-ed3163-260; lxc delete juju-ed3163-260
lxc stop ${name}; lxc delete ${name}

# Clean the status in Juju by removing the machines, units and apps whose LXC containers were removed before
root@VCA:~# juju status
Model    Controller  Cloud/Region         Version  SLA
default  osm         localhost/localhost  2.2.2    unsupported

App                      Version  Status       Scale  Charm      Store  Rev  OS      Notes
flhf-testcf-flhfilter-b           active         0/1  fl7filter  local   20  ubuntu
flhf-testd-flhfilter-b            active         0/1  fl7filter  local   22  ubuntu
flhf-testdc-flhfilter-b           active         0/1  fl7filter  local   25  ubuntu
flhf-va-flhfilter-b               active         0/1  fl7filter  local   21  ubuntu
ids-test-ac-ids-b                 active         0/1  ids        local    1  ubuntu
lala-dpi-b                        maintenance    0/1  dpi        local    4  ubuntu
lhbcdf-lcdfilter-b                maintenance    0/1  l23filter  local   20  ubuntu

Unit                       Workload     Agent   Machine  Public address  Ports  Message
flhf-testcf-flhfilter-b/0  unknown      lost    260      10.44.127.190          agent lost, see 'juju show-status-log flhf-testcf-flhfilter-b/0'
flhf-testd-flhfilter-b/1   unknown      lost    278      10.44.127.236          agent lost, see 'juju show-status-log flhf-testd-flhfilter-b/1'
flhf-testdc-flhfilter-b/2  unknown      lost    282      10.44.127.61           agent lost, see 'juju show-status-log flhf-testdc-flhfilter-b/2'
flhf-va-flhfilter-b/0      unknown      lost    272      10.44.127.118          agent lost, see 'juju show-status-log flhf-va-flhfilter-b/0'
ids-test-ac-ids-b/0        unknown      lost    277      10.44.127.128          agent lost, see 'juju show-status-log ids-test-ac-ids-b/0'
lala-dpi-b/1               maintenance  failed  283      10.44.127.228          installing charm software
lhbcdf-lcdfilter-b/0       unknown      lost    269      10.44.127.69           agent lost, see 'juju show-status-log lhbcdf-lcdfilter-b/0'

Machine  State  DNS            Inst id          Series  AZ  Message
260      down   10.44.127.190  juju-ed3163-260  trusty      Running
269      down   10.44.127.69   juju-ed3163-269  trusty      Running
272      down   10.44.127.118  juju-ed3163-272  trusty      Running
277      down   10.44.127.128  juju-ed3163-277  trusty      Running
278      down   10.44.127.236  juju-ed3163-278  trusty      Running
282      down   10.44.127.61   juju-ed3163-282  trusty      Running
283      down   10.44.127.228  juju-ed3163-283  trusty      Running

# Example: juju remove-machine 260
juju remove-machine ${machine whose IP/Inst id corresponds to the removed LXC container} --force

# Start Juju in the Juju controller
root@VCA:~# lxc exec juju-f050fc-0 bash
# Cancel if needed or run in background
root@juju-f050fc-0:~# /var/lib/juju/init/jujud-machine-0/exec-start.sh &
^C
# Verify that the process is running
root@juju-f050fc-0:~# sudo netstat -apen | grep 17070
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:58402         127.0.0.1:17070         ESTABLISHED 0          22936276    359/jujud
tcp        0      0 10.44.127.136:56746     10.44.127.136:17070     ESTABLISHED 0          22936277    359/jujud
tcp        0      0 10.44.127.136:56740     10.44.127.136:17070     ESTABLISHED 0          22940759    359/jujud
tcp        0      0 127.0.0.1:58432         127.0.0.1:17070         ESTABLISHED 0          22940767    359/jujud
tcp6       0      0 :::17070                :::*                    LISTEN      0          22940744    359/jujud
tcp6       0      0 127.0.0.1:17070         127.0.0.1:58432         ESTABLISHED 0          22930280    359/jujud
tcp6       0      0 10.44.127.136:17070     10.44.127.136:56740     ESTABLISHED 0          22939104    359/jujud
tcp6       0      0 10.44.127.136:17070     10.44.127.136:56746     ESTABLISHED 0          22936278    359/jujud
tcp6       0      0 127.0.0.1:17070         127.0.0.1:58402         ESTABLISHED 0          22940756    359/jujud

# Go to the SO-ub container and restart the SO service to connect again to the Juju controller
$ lxc exec SO-ub bash
root@SO-ub:~# service launchpad restart
After this process, access the OSM dashboard and check again the connectivity from the "Accounts" tab. It should be green, and now any new NS instantiated should correctly load its associated charm.
"Instantiation failed" and VMs and network were not created at VIM
Q. After trying to instantiate, I got the message that the instantiation failed without much information about the reason. I connected to the VIM and checked that the VMs and networks were not created.
A. You are hitting a SO-RO timeout, caused either by the lack of communication from the RO to the VIM or because the creation of VMs and networks from the RO to the VIMs takes too long.
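A quick way to tell the two causes apart is to check RO-VIM connectivity directly from the RO container; a sketch using commands that appear elsewhere in this FAQ (the datacenter name is a placeholder):
# Sketch only: <datacenter-name> is the name used when attaching the VIM to the RO
lxc exec RO -- bash
export OPENMANO_TENANT=osm
openmano datacenter-list                                 # confirm the VIM is attached
openmano vim-image-list --datacenter <datacenter-name>   # fails if the RO cannot reach the VIM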
SO connection error: not possible to contact OPENMANO-SERVER (openmanod)
Q. The NS operational data of an instantiated NS in the SO CLI shows "Connection error: not possible to contact OPENMANO-SERVER (openmanod)"
A. Please check connectivity from the SO-ub container to the RO container. Can you ping the RO IP address (configured in SO) from the SO-ub container? If not, then make sure that the osm-ro service is up and running on the RO container.
$ lxc exec RO -- bash
root@RO:~# service osm-ro status
root@RO:~# OPENMANO_TENANT=osm openmano datacenter-list
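For the connectivity check itself, a minimal sketch run from the OSM host (the RO IP address is a placeholder, use the one configured in the SO):
# Sketch only: replace <RO_IP> with the RO address configured in the SO
lxc exec SO-ub -- ping -c 3 <RO_IP>
# If the ping works but the error persists, restart the osm-ro service in the RO container
lxc exec RO -- service osm-ro restart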
Deployment fails with the error message "Not possible to get_networks_list from VIM: AuthorizationFailure: Authorization Failed: The resource could not be found. (HTTPS: 404)"
Q. During instantiation, I got the following error message: "Not possible to get_networks_list from VIM: AuthorizationFailure: Authorization Failed: The resource could not be found. (HTTPS: 404)".
A. The cause is that Openstack has not been properly added to openmano with the right credentials.
Go to the RO container:
lxc exec RO bash
Install the package python-openstackclient, in case it is not already installed:
apt install -y python-openstackclient
Execute the following commands, with the appropriate substitutions, to check that your Openstack is reachable and that you can perform specific actions:
openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug network list
openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug host list
openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug flavor list
openstack --os-project-name <project-name> --os-auth-url <auth-url> --os-username <auth-username> --os-password <auth-password> --debug server list
# Provide the same URL and credentials that you provided to openmano. The --debug option shows more info, so you will see the IPs/ports it tries to access
This is to ensure that you have access to the openstack endpoints and that you are using the right credentials.
Case 1. If any of the previous commands does not work, then either you do not have connectivity to the Openstack endpoints or you are not using the right parameters. Please check internally and debug until the previous commands work. Check also the guidelines here: OSM_Release_TWO#Openstack_site.
Case 2. If all of them worked, then follow these guidelines:
- Use "v2" authorization URL. "v3" is currently experimental in the master branch and is not recommended yet.
- If https (instead of http) is used for the authorization URL, you can either use the insecure option at datacenter-create (see Openstack_configuration_(Release_TWO)#Add_openstack_at_OSM), or install the certificate at the RO container, e.g. by putting a .crt (not .pem) certificate at /usr/local/share/ca-certificates and running update-ca-certificates (see the certificate sketch after this list).
- Check the parameters you used to create and attach the datacenter by running the following commands:
export OPENMANO_TENANT=osm
openmano datacenter-list
openmano datacenter-list <DATACENTER_NAME> -vvv
- If all seems right, maybe the password was wrong. Try to detach and delete the datacenter, and then create and attach it again with the right password.
openmano datacenter-detach openstack-site
openmano datacenter-delete openstack-site
openmano datacenter-create openstack-site http://10.10.10.11:5000/v2.0 --type openstack --description "OpenStack site"
openmano datacenter-attach openstack-site --user=admin --password=userpwd --vim-tenant-name=admin
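Regarding the https case mentioned above, a minimal sketch of installing the certificate inside the RO container (the certificate file name is a placeholder):
# Sketch only: <your-vim-ca>.crt is a placeholder for your VIM CA certificate
lxc exec RO -- bash
cp <your-vim-ca>.crt /usr/local/share/ca-certificates/
update-ca-certificates
service osm-ro restart   # restart the RO service so that it picks up the new certificate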
Deployment fails with the error message "VIM Exception vimconnUnexpectedResponse Unauthorized: The request you have made requieres authentication. (HTTP 401)"
Follow the same instructions as for the error "Not possible to get_networks_list from VIM: AuthorizationFailure: Authorization Failed: The resource could not be found. (HTTPS: 404)"
OSM RO service fails to start with a message "DATABASE wrong version"
Q. OSM RO service (osm-ro service in RO container) fails to start and logs show "DATABASE wrong version"
2016-11-02T17:19:51 CRITICAL openmano openmanod:268 DATABASE wrong version '0.15'. Try to upgrade/downgrade to version '0.16' with 'database_utils/migrate_mano_db.sh'
A. The reason is that the RO has been upgraded to a new version that requires a new database version. To upgrade the database version, run database_utils/migrate_mano_db.sh and provide credentials if needed (by default, the database user is 'mano' and the database password is 'manopw').
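A minimal sketch of the procedure, assuming the RO code is deployed under /opt/openmano inside the RO container (adjust the path to your installation):
# Sketch only: the /opt/openmano path is an assumption, adapt it to where the RO code lives
lxc exec RO -- bash
cd /opt/openmano
./database_utils/migrate_mano_db.sh   # provide the DB credentials if asked (default user 'mano', password 'manopw')
service osm-ro start                  # start the RO service again once the database is migrated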
Deployment fails with the message "Error creating image at VIM 'xxxx': Cannot create image without location"
The reason for the failure is that there is a mismatch between the image name (and checksum) at the OSM VNFD and at the VIM. Basically, the image is not present at the VIM.
To fix it, you should add the image at the VIM and ensure that it is visible for the VIM credentials provided to the RO. At the RO container you can easily list the VIM images available with these credentials with the command:
openmano vim-image-list --datacenter <xxxxx>
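If the VIM is Openstack, a minimal sketch of adding the missing image; the file name, disk format and image name are placeholders, and the image name must match the one referenced in the VNFD:
# Sketch only: placeholders for the image file and name
openstack image create --file <image-file>.qcow2 --disk-format qcow2 --container-format bare <image-name-in-vnfd>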