Common issues and troubleshooting
THIS PAGE IS DEPRECATED. OSM User Guide has been moved to a new location: https://osm.etsi.org/docs/user-guide/
---
Installation
RECOMMENDATION: save a log of your installation:
$ ./install_osm.sh 2>&1 | tee osm_install_log.txt
Add user to groups
Add the non-root user used for installation to the sudo, lxd and docker groups.
This avoids the following error:
Finished installation of juju
Password:
sg: failed to crypt password with previous salt: Invalid argument
ERROR No controllers registered.
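A minimal way to do this, assuming the installation user is the one currently logged in, is with usermod; note that you must log out and log back in before the new group membership takes effect:
sudo usermod -aG sudo,lxd,docker $USER
# log out and log back in so that the sudo, lxd and docker group memberships are picked up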
Docker
Were all docker images successfully built?
Although controlled by the installer, you can check that the following images exist:
$ docker image ls
REPOSITORY               TAG      IMAGE ID       CREATED         SIZE
osm/light-ui             latest   1988aa262a97   18 hours ago    710MB
osm/lcm                  latest   c9ad59bf96aa   46 hours ago    667MB
osm/ro                   latest   812c987fcb16   46 hours ago    791MB
osm/nbi                  latest   584b4e0084a7   46 hours ago    497MB
osm/pm                   latest   1ad1e4099f52   46 hours ago    462MB
osm/mon                  latest   b17efa3412e3   46 hours ago    725MB
wurstmeister/kafka       latest   7cfc4e57966c   10 days ago     293MB
mysql                    5        0d16d0a97dd1   2 weeks ago     372MB
mongo                    latest   14c497d5c758   3 weeks ago     366MB
wurstmeister/zookeeper   latest   351aa00d2fe9   18 months ago   478MB
Are all processes/services running?
$ docker stack ps osm |grep -i running
10 docker containers should be running.
All the 10 services should have at least 1 replica: 1/1
$ docker service ls
ID             NAME            MODE         REPLICAS   IMAGE                            PORTS
yuyiqh8ty8pv   osm_kafka       replicated   1/1        wurstmeister/kafka:latest        *:9092->9092/tcp
y585906h5vy5   osm_lcm         replicated   1/1        osm/lcm:latest
pcdi5vb86nt9   osm_light-ui    replicated   1/1        osm/light-ui:latest              *:80->80/tcp
i56jhl5k6re4   osm_mon         replicated   1/1        osm/mon:latest                   *:8662->8662/tcp
p5wyjtne93hp   osm_mongo       replicated   1/1        mongo:latest
iz5uncfdzu23   osm_nbi         replicated   1/1        osm/nbi:latest                   *:9999->9999/tcp
4ttw2v4z2g57   osm_pm          replicated   1/1        osm/pm:latest
xbg6bclp2anw   osm_ro          replicated   1/1        osm/ro:latest                    *:9090->9090/tcp
sf7rayfolncu   osm_ro-db       replicated   1/1        mysql:5
5bl73dhj1xl0   osm_zookeeper   replicated   1/1        wurstmeister/zookeeper:latest
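If any service shows 0/1 replicas, you can list the failed tasks and their error messages by filtering out the running ones; a quick sketch using standard Docker Swarm commands:
docker stack ps osm --no-trunc | grep -vi running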
Docker image failed to build
Err:1 http://archive.ubuntu.com/ubuntu xenial InRelease
In some cases, DNS resolution works on the host but fails when building the Docker container. This happens when Docker cannot automatically determine the DNS server to use.
Check if the following works:
docker run busybox nslookup archive.ubuntu.com
If it does not work, you have to configure Docker to use the available DNS.
# Get the IP address you're using for DNS:
nmcli dev show | grep 'IP4.DNS'
# Create a new file, /etc/docker/daemon.json, that contains the following (replace the DNS IP address with the output from the previous step):
{
    "dns": ["192.168.24.10"]
}
# Restart docker
sudo service docker restart
# Re-run
docker run busybox nslookup archive.ubuntu.com
# Now you should be able to re-run the installer and move past the DNS issue.
TypeError: unsupported operand type(s) for -=: 'Retry' and 'int'
In some cases, an MTU mismatch between the host and Docker interfaces will cause this error while running pip. You can check this by running `ifconfig` and comparing the MTU of your host interface with that of the docker_gwbridge interface (a quick way to compare them is shown after the fix below).
# Create a new file, /etc/docker/daemon.json, that contains the following (replace the MTU value with that of your host interface from the previous step):
{
    "mtu": 1458
}
# Restart docker
sudo service docker restart
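For reference, a quick way to compare both MTUs without ifconfig; ens3 is a hypothetical host interface name, replace it with yours:
# ens3 is a placeholder for your host interface
ip link show ens3 | grep -o 'mtu [0-9]*'
ip link show docker_gwbridge | grep -o 'mtu [0-9]*'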
Problem deploying stack osm
network netosm could not be found
The error is: network "netosm" is declared as external, but could not be found. You need to create a swarm-scoped network before the stack is deployed.
It usually happens when "docker system prune" is run while the stack is stopped, which removes the network. The following script will re-create it:
#!/bin/bash
# Create OSM Docker Network ...
[ -z "$OSM_STACK_NAME" ] && OSM_STACK_NAME=osm
OSM_NETWORK_NAME=net${OSM_STACK_NAME}
echo Creating OSM Docker Network
DEFAULT_INTERFACE=$(route -n | awk '$1~/^0.0.0.0/ {print $8}')
DEFAULT_MTU=$(ip addr show $DEFAULT_INTERFACE | perl -ne 'if (/mtu\s(\d+)/) {print $1;}')
echo \# OSM_STACK_NAME = $OSM_STACK_NAME
echo \# OSM_NETWORK_NAME = $OSM_NETWORK_NAME
echo \# DEFAULT_INTERFACE = $DEFAULT_INTERFACE
echo \# DEFAULT_MTU = $DEFAULT_MTU
sg docker -c "docker network create --driver=overlay --attachable \
    --opt com.docker.network.driver.mtu=${DEFAULT_MTU} \
    ${OSM_NETWORK_NAME}"
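After the script has run, a quick check (assuming the default stack name osm, so the network is called netosm) confirms that the network exists again before redeploying the stack:
docker network ls | grep netosm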
Juju
Bootstrap hangs
If the Juju bootstrap takes a long time, stuck at this status...
Installing Juju agent on bootstrap instance
Fetching Juju GUI 2.14.0
Waiting for address
Attempting to connect to 10.71.22.78:22
Connected to 10.71.22.78
Running machine configuration script...
...it usually indicates that the LXD container with the Juju controller is having trouble connecting to the internet.
Get the name of the LXD container. It will begin with 'juju-' and end with '-0'.
lxc list
+-----------------+---------+---------------------+------+------------+-----------+
| NAME            | STATE   | IPV4                | IPV6 | TYPE       | SNAPSHOTS |
+-----------------+---------+---------------------+------+------------+-----------+
| juju-0383f2-0   | RUNNING | 10.195.8.57 (eth0)  |      | PERSISTENT |           |
+-----------------+---------+---------------------+------+------------+-----------+
Next, tail the output of cloud-init to see where the bootstrap is stuck.
lxc exec juju-0383f2-0 -- tail -f /var/log/cloud-init-output.log
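As an additional check, you can test connectivity from inside the container directly; this uses the example container name from the listing above, so adjust it to your own:
lxc exec juju-0383f2-0 -- ping -c 3 archive.ubuntu.com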
Is Juju running?
If running, you should see something like this:
$ juju status
Model    Controller  Cloud/Region         Version  SLA
default  osm         localhost/localhost  2.3.7    unsupported
ERROR controller osm already exists
Did the OSM installation fail during the Juju installation with an error like "ERROR controller osm already exists"?
$ ./install_osm.sh
...
ERROR controller "osm" already exists
ERROR try was stopped
### Jum Agu 24 15:19:33 WIB 2018 install_juju: FATAL error: Juju installation failed
BACKTRACE:
### FATAL /usr/share/osm-devops/jenkins/common/logging 39
### install_juju /usr/share/osm-devops/installers/full_install_osm.sh 564
### install_lightweight /usr/share/osm-devops/installers/full_install_osm.sh 741
### main /usr/share/osm-devops/installers/full_install_osm.sh 1033
Try to destroy the Juju controller and run the installation again:
$ juju destroy-controller osm --destroy-all-models -y
$ ./install_osm.sh
If that does not work, you can destroy the Juju container and run the installation again:
#Destroy the Juju container
lxc stop juju-*
lxc delete juju-*
#Unregister the controller since we've manually freed the resources associated with it
juju unregister -y osm
#Verify that there are no controllers
juju list-controllers
#Run the installation again
./install_osm.sh
LXD
ERROR profile default: /etc/default/lxd-bridge has IPv6 enabled
Make sure that you follow the instructions in the Quickstart.
When asked whether you want to proceed with the installation and configuration of LXD, Juju, Docker CE and the initialization of a local Docker swarm as pre-requirements, please answer "y".
When dialog messages related to the LXD configuration are shown, please answer in the following way (if LXD was already initialized with IPv6 enabled, see the note after this list):
- Do you want to configure the LXD bridge? Yes
- Do you want to setup an IPv4 subnet? Yes
- << Default values apply for next questions >>
- Do you want to setup an IPv6 subnet? No
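If LXD was already initialized with IPv6 enabled on the bridge, one possible way to re-run the bridge configuration dialog and answer "No" to the IPv6 question is sketched below; this assumes the deb-packaged LXD that ships /etc/default/lxd-bridge, so it may not apply to snap-based installations:
sudo dpkg-reconfigure -p medium lxd
# answer "No" when asked about the IPv6 subnet, then re-run the installer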
Configuration
VIMs
Is the VIM URL reachable and operational?
When there are problems accessing the VIM URL, an error message similar to the following is shown after attempting to instantiate network services:
Error: "VIM Exception vimmconnConnectionException ConnectFailure: Unable to establish connection to <URL>"
- In order to debug potential issues with the connection, in the case of an OpenStack VIM, you can install the OpenStack client in the OSM VM and run some basic tests, e.g.:
$ # Install the OpenStack client
$ sudo apt-get install python-openstackclient
$ # Load your OpenStack credentials. For instance, if your credentials are saved in a file named 'myVIM-openrc.sh', you can load them with:
$ source myVIM-openrc.sh
$ # Test if the VIM API is operational with a simple command. For instance:
$ openstack image list
If the openstack client works, then make sure that you can reach the VIM from the RO docker:
$ docker exec -it osm_ro.1.xxxxx bash
$ curl <URL_CONTROLLER>
In some cases, the errors come from the fact that the VIM was added to OSM using names in the URL that are not Fully Qualified Domain Names (FQDN).
When adding a VIM to OSM, you must always use FQDNs or IP addresses. Note that "controller" or similar names are not proper FQDNs (the domain suffix should be added). Non-FQDN names might be interpreted by Docker's dnsmasq as a Docker container name to be resolved, which is not the case. In addition, all the VIM endpoints should also be FQDNs or IP addresses, thus guaranteeing that all subsequent API calls can reach the appropriate endpoint.
Think of an NFV infrastructure with tens of VIMs: first you would have to use a different name for each controller (controller1, controller2, etc.), and then you would have to add all those entries to the /etc/hosts file of every machine that interacts with the different VIMs, not only OSM. This is bad practice.
However, it is useful to have a means of working with lab environments that use non-FQDN names. There are three options; you are probably looking for the third one, but we recommend the first:
- Option 1. Change the admin URL and/or public URL of the endpoints to use an IP address or an FQDN. You might find this interesting if you want to bring your Openstack setup to production.
- Option 2. Modify /etc/hosts in the docker RO container. This is not persistent after reboots or restarts of the osm docker stack.
- Option 3. Modify /etc/osm/docker/docker-compose.yaml in the host, adding extra_hosts in the ro section with the entries that you want to add to /etc/hosts in the RO docker:
ro:
  extra_hosts:
    controller: 1.2.3.4
Then restart the stack:
docker stack rm osm
docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm
This is persistent after reboots and restarts of the osm docker stack.
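To confirm that the RO container actually picks up the new entry, you can check name resolution from inside it; this is a quick check that reuses the container name pattern from the log examples at the end of this page and the example hostname "controller" used above:
docker exec -it $(docker ps -aqf "name=osm_ro.1" -n 1) getent hosts controller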
Authentication
What should I check if the VIM authentication is failing?
Typically, you will get the following error message:
Error: "VIM Exception vimconnUnexpectedResponse Unauthorized: The request you have made requieres authentication. (HTTP 401)"
If your OpenStack URL is based on HTTPS, OSM will check by default the authenticity of your VIM using the appropriate public certificate. The recommended way to solve this is by modifying /etc/osm/docker/docker-compose.yaml in the host, sharing the host file (e.g. /home/ubuntu/cafile.crt) by adding a volume to the ro section as follows:
ro:
  ...
  volumes:
    - /home/ubuntu/cafile.crt:/etc/osm/cafile.crt
Then, when creating the VIM, you should use the config option "ca_cert" as follows:
$ # Create the VIM with all the usual options, and add the config option to specify the certificate
$ osm vim-create VIM-NAME ... --config '{ca_cert: /etc/osm/cafile.crt}'
For casual testing, when adding the VIM account to OSM, you can use 'insecure: True' (without quotes) as part of the VIM config parameters:
$ osm vim-create VIM-NAME ... --config '{insecure: True}'
Is the VIM management network reachable from OSM (e.g. via ssh, port 22)?
The simplest check would consist of deploying a VM attached to the management network and trying to access it via e.g. ssh from the OSM host.
For instance, in the case of an OpenStack VIM you could try something like this:
$ openstack server create --image ubuntu --flavor m1.small --network mgmtnet test
If this does not work, typically it is due to one of these issues:
- Security group policy in your VIM is blocking your traffic (contact your admin to fix it; a quick check is shown after this list)
- IP address space in the management network is not routable from outside (or in the reverse direction, for the ACKs).
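If you suspect the security group, its rules can be inspected, and an SSH rule added if missing, with the same OpenStack client used above; this sketch assumes the VM uses the security group named "default":
openstack security group rule list default
openstack security group rule create --proto tcp --dst-port 22 default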
Operational issues
Running out of disk space
If you upgrade your OSM installation frequently, you might find that your disk is running out of space. The reason is that old Docker containers and images might still be consuming disk space. Running the following two commands should be enough to clean up your Docker setup:
docker system prune
docker image prune
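Before and after pruning, you can check how much space images, containers and volumes are actually using with a standard Docker command (not OSM-specific):
docker system df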
If you are still experiencing disk space issues, the logs of one of the containers could be the cause. Check which containers are consuming the most space (typically kafka-exporter):
du -sk /var/lib/docker/containers/* | sort -n
docker ps | grep <CONTAINER_ID>
Then, remove the stack and redeploy it again after doing a prune:
docker stack rm osm_metrics
docker system prune
docker image prune
docker stack deploy -c /etc/osm/docker/osm_metrics/docker-compose.yml osm_metrics
VCA (juju)
Status is not coherent with running NS
In extraordinary situations, the output of "juju status" could show pending units that should have been removed when deleting an NS. In those situations, you can clean up the VCA by following the procedure below:
juju status -m <NS_ID>
juju remove-application -m <NS_ID> <application>
juju resolved -m <NS_ID> <unit> --no-retry   # You will likely have to run this several times, as there will probably be an error in the next queued hook. Once the last hook is marked resolved, the charm will continue its removal.
The following page also shows how to remove different Juju objects: https://docs.jujucharms.com/2.1/en/charms-destroy
Dump Juju Logs
To dump the Juju debug logs, run one of these commands:
juju debug-log --replay --no-tail > juju-debug.log
juju debug-log --replay --no-tail -m <NS_ID>
juju debug-log --replay --no-tail -m <NS_ID> --include <UNIT>
Manual recovery of Juju
If Juju gets into a corrupt state and you cannot run `juju status` or contact the Juju controller, you might need to manually remove the controller and register it again, making OSM aware of the new controller.
# Stop and delete all juju containers, then unregister the controller
lxc list
lxc stop juju-*    #replace "*" by the right values
lxc delete juju-*  #replace "*" by the right values
juju unregister -y osm
# Create the controller again
sg lxd -c "juju bootstrap --bootstrap-series=xenial localhost osm"
# Get controller IP and update it in relevant OSM env files
controller_ip=$(juju show-controller osm|grep api-endpoints|awk -F\' '{print $2}'|awk -F\: '{print $1}')
sudo sed -i 's/^OSMMON_VCA_HOST.*$/OSMMON_VCA_HOST='$controller_ip'/' /etc/osm/docker/mon.env
sudo sed -i 's/^OSMLCM_VCA_HOST.*$/OSMLCM_VCA_HOST='$controller_ip'/' /etc/osm/docker/lcm.env
#Get juju password and feed it to OSM env files
function parse_juju_password {
   password_file="${HOME}/.local/share/juju/accounts.yaml"
   local controller_name=$1
   local s='[[:space:]]*' w='[a-zA-Z0-9_-]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $password_file |
   awk -F$fs -v controller=$controller_name '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         if (match(vn,controller) && match($2,"password")) {
             printf("%s",$3);
         }
      }
   }'
}
juju_password=$(parse_juju_password osm)
sudo sed -i 's/^OSMMON_VCA_SECRET.*$/OSMMON_VCA_SECRET='$juju_password'/' /etc/osm/docker/mon.env
sudo sed -i 's/^OSMLCM_VCA_SECRET.*$/OSMLCM_VCA_SECRET='$juju_password'/' /etc/osm/docker/lcm.env
juju_pubkey=$(cat $HOME/.local/share/juju/ssh/juju_id_rsa.pub)
sudo sed -i 's/^OSMLCM_VCA_PUBKEY.*$/OSMLCM_VCA_PUBKEY='$juju_pubkey'/' /etc/osm/docker/mon.env
sudo sed -i 's/^OSMLCM_VCA_PUBKEY.*$/OSMLCM_VCA_PUBKEY='$juju_pubkey'/' /etc/osm/docker/lcm.env
#Restart OSM stack
docker stack rm osm
docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm
#Reset iptable rules
#Delete all rules listed with this command:
sudo iptables -t nat -L | grep 17070
sudo iptables -t nat -A PREROUTING -p tcp -m tcp -d <source> --dport 17070 -j DNAT --to-destination <destination>
#Create new iptable rule
OSM_VCA_HOST=`sg lxd -c "juju show-controller osm"|grep api-endpoints|awk -F\' '{print $2}'|awk -F\: '{print $1}'`
DEFAULT_IF=`route -n |awk '$1~/^0.0.0.0/ {print $8}'`
DEFAULT_IP=`ip -o -4 a |grep ${DEFAULT_IF}|awk '{split($4,a,"/"); print a[1]}'`
sudo iptables -t nat -A PREROUTING -p tcp -m tcp -d $DEFAULT_IP --dport 17070 -j DNAT --to-destination $OSM_VCA_HOST
Slow deployment of charms
You can make the deployment of charms quicker by:
- Upgrading your LXD installation to use ZFS: LXD configuration for OSM Release FIVE
- After LXD re-installation, you might need to reinstall the juju controller: Reinstall Juju controller
- Preventing Juju from running apt-get update && apt-get upgrade when starting a machine: Disable OS upgrades in charms
- Building periodically a custom image that will be used as base image for all the charms: Custom base image for charms
Instantiation Errors
File juju_id_rsa.pub not found
- ERROR: ERROR creating VCA model name 'xxxx': Traceback (most recent call last): File "/usr/lib/python3/dist-packages/osm_lcm/ns.py", line 822, in instantiate await ... [Errno 2] No such file or directory: '/root/.local/share/juju/ssh/juju_id_rsa.pub'
- CAUSE: Normally, a migration from Release FIVE does not properly set the environment for LCM.
- SOLUTION: Ensure the variable OSMLCM_VCA_PUBKEY is properly set in the file /etc/osm/docker/lcm.env. Its value must match the output of the command cat $HOME/.local/share/juju/ssh/juju_id_rsa.pub. If not, add or change it, then restart OSM, or just the LCM service with docker service update osm_lcm --force --env-add OSMLCM_VCA_PUBKEY="<value>" (see the example commands below).
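For convenience, the check and the update can be done with the commands below; <value> stands for the public key printed by the cat command:
grep OSMLCM_VCA_PUBKEY /etc/osm/docker/lcm.env
cat $HOME/.local/share/juju/ssh/juju_id_rsa.pub
docker service update osm_lcm --force --env-add OSMLCM_VCA_PUBKEY="<value>"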
NBI Errors
Cannot login after migration to 6.0.2
- ERROR: NBI always returns "UNAUTHORIZED". You cannot log in with either the UI or the CLI. The CLI shows the error "can't find a default project for this user" or "project admin not allowed for this user".
- CAUSE: After a migration to release 6.0.2, there is a slight incompatibility with users created in older versions.
- SOLUTION: Delete the admin user and restart NBI so that a new compatible user is created, by running these commands:
curl --insecure https://localhost:9999/osm/test/db-clear/users
docker service update osm_nbi --force
Checking the logs
You can check the logs of any container with the following commands:
docker logs $(docker ps -aqf "name=osm_mon" -n 1)
docker logs $(docker ps -aqf "name=osm_pol" -n 1)
docker logs $(docker ps -aqf "name=osm_lcm" -n 1)
docker logs $(docker ps -aqf "name=osm_nbi" -n 1)
docker logs $(docker ps -aqf "name=osm_light-ui" -n 1)
docker logs $(docker ps -aqf "name=osm_ro.1" -n 1)
docker logs $(docker ps -aqf "name=osm_ro-db" -n 1)
docker logs $(docker ps -aqf "name=osm_mongo" -n 1)
docker logs $(docker ps -aqf "name=osm_kafka" -n 1)
docker logs $(docker ps -aqf "name=osm_zookeeper" -n 1)
docker logs $(docker ps -aqf "name=osm_keystone.1" -n 1)
docker logs $(docker ps -aqf "name=osm_keystone-db" -n 1)
docker logs $(docker ps -aqf "name=osm_prometheus" -n 1)
For each container, logs can be found under:
/var/lib/docker/containers/DOCKER_ID/DOCKER_ID-json.log
The DOCKER_ID can be obtained this way, e.g. for MON:
docker ps -aqf "name=osm_mon" -n 1 --no-trunc
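Alternatively, in a Swarm-based deployment you can usually read the same logs through the service rather than the container; this is a standard Docker command and assumes the default json-file logging driver:
docker service logs osm_lcm --tail 100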