Commit b6d87b0c authored by Gerardo García

Updated troubleshooting guide: improved structure, added instructions for getting logs

parent fd4ec1d9
@@ -27,7 +27,9 @@ dpkg -l python3-osmclient
ii  python3-osmclient       8.0.0-1            all
```

## Troubleshooting installation

### Recommended installation to facilitate troubleshooting

It is highly recommended to save a log of your installation:

@@ -35,106 +37,45 @@ It is highly recommended saving a log of your installation:
./install_osm.sh 2>&1 | tee osm_install_log.txt
```

### Recommended checks after installation

#### Checking whether all processes/services are running in docker swarm

```bash
docker stack ps osm | grep -i running
```

All the services should have at least 1 replica: 1/1

```bash
$ docker service ls

ID                  NAME                      MODE                REPLICAS            IMAGE                           PORTS
paxqvnwwubcf        osm_grafana               replicated          1/1                 grafana/grafana:latest          *:3000->3000/tcp
xkn3jr7ipibf        osm_kafka                 replicated          1/1                 wurstmeister/kafka:latest       *:30002->9092/tcp
px2xfetg68z1        osm_keystone              replicated          1/1                 opensourcemano/keystone:8       *:5000->5000/tcp
62yljr0s97vv        osm_lcm                   replicated          1/1                 opensourcemano/lcm:8
lwtfoh29sb95        osm_light-ui              replicated          1/1                 opensourcemano/light-ui:8       *:80->80/tcp
xjl2vx9t6ogz        osm_mon                   replicated          1/1                 opensourcemano/mon:8            *:8662->8662/tcp
t6r9wjjxqy1v        osm_mongo                 replicated          1/1                 mongo:latest
rmuhwvl5gkgo        osm_mysql                 replicated          1/1                 mysql:5
vjyee8af3a8r        osm_nbi                   replicated          1/1                 opensourcemano/nbi:8            *:9999->9999/tcp
ihdjxn68aa4p        osm_pol                   replicated          1/1                 opensourcemano/pol:8
tnk91kubxfvk        osm_prometheus            replicated          1/1                 prom/prometheus:latest          *:9091->9090/tcp
4e5c49m9x0by        osm_prometheus-cadvisor   replicated          1/1                 google/cadvisor:latest          *:8080->8080/tcp
m1cxap6wkxmf        osm_ro                    replicated          1/1                 opensourcemano/ro:8             *:9090->9090/tcp
97r6t2zrs4ho        osm_zookeeper             replicated          1/1                 wurstmeister/zookeeper:latest
```
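
As a quick sanity check, the `REPLICAS` column can be scanned automatically. The snippet below is a minimal sketch, not part of the official tooling: it assumes the default `docker service ls` layout (REPLICAS as the fourth column) and uses an illustrative sample; on a live host you would pipe the real command into the same `awk` filter.

```shell
# Sketch: flag services that are missing replicas. Assumes the default
# "docker service ls" layout, where REPLICAS ("running/desired") is the
# 4th whitespace-separated column. The sample text is illustrative; on a
# live host, pipe the real command:  docker service ls | awk 'NR>1 {...}'
sample='ID            NAME      MODE        REPLICAS  IMAGE
abc123        osm_lcm   replicated  0/1       opensourcemano/lcm:8
def456        osm_nbi   replicated  1/1       opensourcemano/nbi:8'

missing=$(printf '%s\n' "$sample" |
  awk 'NR>1 { split($4, r, "/"); if (r[1]+0 < r[2]+0) print $2 " is missing replicas (" $4 ")" }')
echo "$missing"
```

An empty result means every service has all its desired replicas running.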

#### Checking whether all processes/services are running in K8s

```bash
kubectl -n osm get all
```

### Issues on standard installation

#### Docker Swarm

##### `network netosm could not be found`

@@ -159,9 +100,9 @@ It usually happens when a `docker system prune` is done with the stack stopped.
                ${OSM_NETWORK_NAME}"
```

#### Juju

##### Juju bootstrap hangs

If the Juju bootstrap takes a long time, stuck at this status...

@@ -193,7 +134,7 @@ Next, tail the output of cloud-init to see where the bootstrap is stuck.
lxc exec juju-0383f2-0 -- tail -f /var/log/cloud-init-output.log
```

##### Is Juju running?

If running, you should see something like this:

@@ -204,7 +145,7 @@ Model Controller Cloud/Region Version SLA
default  osm         localhost/localhost  2.3.7    unsupported
```

##### ERROR controller osm already exists

Did the OSM installation fail during juju installation with an error like "ERROR controller osm already exists"?

@@ -243,9 +184,17 @@ juju list-controllers
./install_osm.sh
```

##### No controllers registered

The following error appears when the user used for installation does not belong to some groups:

_Finished installation of juju_ Password: **sg: failed to crypt password with previous salt: Invalid argument** ERROR No controllers registered.

To fix it, add the non-root user used for installation to the *sudo*, *lxd* and *docker* groups.

#### LXD

##### ERROR profile default: `/etc/default/lxd-bridge` has IPv6 enabled

Make sure that you follow the instructions in the [Quickstart](01-quickstart.md).

@@ -258,50 +207,72 @@ When dialog messages related to LXD configuration are shown, please answer in th
- << Default values apply for next questions >>
- **Do you want to setup an IPv6 subnet? No**

### Issues on advanced installation (manual build of docker images)

#### Manual build of images. Were all docker images successfully built?

Although controlled by the installer, you can check that the following images exist:

```bash
$ docker image ls

REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
osm/light-ui             latest              1988aa262a97        18 hours ago        710MB
osm/lcm                  latest              c9ad59bf96aa        46 hours ago        667MB
osm/ro                   latest              812c987fcb16        46 hours ago        791MB
osm/nbi                  latest              584b4e0084a7        46 hours ago        497MB
osm/pm                   latest              1ad1e4099f52        46 hours ago        462MB
osm/mon                  latest              b17efa3412e3        46 hours ago        725MB
wurstmeister/kafka       latest              7cfc4e57966c        10 days ago         293MB
mysql                    5                   0d16d0a97dd1        2 weeks ago         372MB
mongo                    latest              14c497d5c758        3 weeks ago         366MB
wurstmeister/zookeeper   latest              351aa00d2fe9        18 months ago       478MB
```

#### Docker image failed to build

##### Err:1 `http://archive.ubuntu.com/ubuntu xenial InRelease`

In some cases, DNS resolution works on the host but fails when building the Docker container. This happens when Docker does not automatically determine the DNS server to use.

Check if the following works:

```bash
docker run busybox nslookup archive.ubuntu.com
```

If it does not work, you have to configure Docker to use the available DNS:

```bash
# Get the IP address you are using for DNS:
nmcli dev show | grep 'IP4.DNS'
# Create a new file, /etc/docker/daemon.json, containing the following (replace the DNS IP address with the output from the previous step):
{
   "dns": ["192.168.24.10"]
}
# Restart docker
sudo service docker restart
# Re-run the check
docker run busybox nslookup archive.ubuntu.com
# Now you should be able to re-run the installer and move past the DNS issue.
```

##### TypeError: `unsupported operand type(s) for -=: 'Retry' and 'int'`

In some cases, an MTU mismatch between the host and docker interfaces will cause this error while running pip. You can check this by running `ifconfig` and comparing the MTU of your host interface with that of the `docker_gwbridge` interface.

```bash
# Create a new file, /etc/docker/daemon.json, containing the following (replace the MTU value with that of your host interface):
{
   "mtu": 1458
}
# Restart docker
sudo service docker restart
```
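
To spot an MTU mismatch quickly, you can extract the MTU of every interface and compare the host interface against `docker_gwbridge`. This is a minimal sketch under illustrative assumptions: the interface names and the sample `ip -o link show` output below are made up, and on a live host you would pipe the real command into the same `awk` filter.

```shell
# Sketch: print "interface mtu" pairs from "ip -o link show"-style output
# so the host interface MTU can be compared against docker_gwbridge.
# Sample lines are illustrative; live host:  ip -o link show | awk '...'
sample='2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
5: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default'

mtus=$(printf '%s\n' "$sample" |
  awk '{ iface=$2; sub(":$", "", iface); for (i=1; i<=NF; i++) if ($i=="mtu") print iface, $(i+1) }')
echo "$mtus"
```

If `docker_gwbridge` reports a larger MTU than your host interface, set the `"mtu"` key in `/etc/docker/daemon.json` to the host value and restart docker.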

## Common issues with VIMs

### Is the VIM URL reachable and operational?

When there are problems accessing the VIM URL, an error message similar to the following is shown after attempting to instantiate network services:

@@ -354,7 +325,7 @@ docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm

This is persistent after reboots and restarts of the osm docker stack.

### VIM authentication

**What should I check if the VIM authentication is failing?**

@@ -384,6 +355,8 @@ For casual testing, when adding the VIM account to OSM, you can use `'insecure:
$ osm vim-create VIM-NAME ... --config '{insecure: True}'
```

### Issues when trying to access a VM from OSM

**Is the VIM management network reachable from OSM (e.g. via ssh, port 22)?**

The simplest check consists of deploying a VM attached to the management network and trying to access it (e.g. via ssh) from the OSM host.
@@ -399,36 +372,9 @@ If this does not work, typically it is due to one of these issues:
- Security group policy in your VIM is blocking your traffic (contact your admin to fix it)
- IP address space in the management network is not routable from outside (or in the reverse direction, for the ACKs).

## Common issues with VCA/Juju

### Status is not coherent with running NS

In extraordinary situations, the output of `juju status` could show pending units that should have been removed when deleting a NS. In those situations, you can clean up VCA by following the procedure below:

@@ -440,7 +386,7 @@ juju resolved -m <NS_ID> <unit> --no-retry # You'll likely have to run it

The following page also shows [how to remove different Juju objects](https://docs.jujucharms.com/2.1/en/charms-destroy)

### Dump Juju Logs

To dump the Juju debug-logs, run this command:

@@ -450,7 +396,7 @@ juju debug-log --replay --no-tail -m <NS_ID>
juju debug-log --replay --no-tail -m <NS_ID> --include <UNIT>
```

### Manual recovery of Juju

If juju gets into a corrupt state and you cannot run `juju status` or contact the juju controller, you might need to manually remove the controller and register it again, making OSM aware of the new controller.

@@ -502,7 +448,7 @@ docker stack rm osm
docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm
```

### Slow deployment of charms

You can make deployment of charms quicker by:

@@ -511,17 +457,54 @@ You can make deployment of charms quicker by:
- Preventing Juju from running `apt-get update && apt-get upgrade` when starting a machine: [Disable OS upgrades in charms](14-advanced-charm-development.md#disable-os-upgrades)
- Building periodically a custom image that will be used as base image for all the charms: [Custom base image for charms](14-advanced-charm-development.md#build-a-custom-cloud-image)

## Common instantiation errors

### File juju_id_rsa.pub not found

- **ERROR**: `ERROR creating VCA model name 'xxxx': Traceback (most recent call last): File "/usr/lib/python3/dist-packages/osm_lcm/ns.py", line 822, in instantiate await ... [Errno 2] No such file or directory: '/root/.local/share/juju/ssh/juju_id_rsa.pub'`
- **CAUSE**: Normally, a migration from release FIVE does not properly set the environment for LCM
- **SOLUTION**: Ensure the variable **OSMLCM_VCA_PUBKEY** is properly set in the file `/etc/osm/docker/lcm.env`. The value must match the output of the command `cat $HOME/.local/share/juju/ssh/juju_id_rsa.pub`. If not, add or change it. Restart OSM, or just the LCM service with `docker service update osm_lcm --force --env-add OSMLCM_VCA_PUBKEY=""`

## Common issues when interacting with NBI

### SSL certificate problem

By default, OSM installer uses a self-signed certificate for HTTPS. That might lead to the error '_SSL certificate problem: self signed certificate_' on the client side. For testing environments, you might want to ignore this error just by using the appropriate options to skip certificate validation (e.g. `--insecure` for curl, `--no-check-certificate` for wget, etc.). However, for more stable setups you might prefer to address this issue by installing the appropriate certificate in your client system.

These are the steps to install NBI certificate on the client side (tested for Ubuntu):

1. Get the certificate file `cert.pem` by any of these means:
  - From running docker container:
    ```bash
    docker ps | grep nbi
    docker cp <docker-id>:/app/NBI/osm_nbi/http/cert.pem .
    ```
  - From source code: NBI-folder/osm_nbi/http/cert.pem
  - From ETSI's git:
    ```bash
    wget -O cert.pem "https://osm.etsi.org/gitweb/?p=osm/NBI.git;a=blob_plain;f=osm_nbi/http/cert.pem;hb=refs/heads/v8.0"
    ```
2. Then, you should install this certificate:
   ```bash
     sudo cp cert.pem /usr/local/share/ca-certificates/osm_nbi_cert.pem.crt
     sudo update-ca-certificates
     # 1 added, 0 removed; done
   ```
3. Add to the list of `/etc/hosts` a host called "nbi" with the IP address where OSM is running.
   - It can be `localhost` if client and server are the same machine.
   - For localhost, you would need to add (or edit) these lines:
     ```text
       127.0.0.1     localhost       nbi
       OSM-ip        nbi
     ```
4. Finally, for the URL, use `nbi` as the host name (i.e. <https://nbi:9999/osm>).
   - Do not use `localhost` or `127.0.0.1`.
   - You can run a quick test with `curl` by:
     ```bash
     curl https://nbi:9999/osm/version
     ```
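
Step 3 above can be scripted so that repeated runs do not duplicate the entry. The following is a hedged sketch: `OSM_IP` is a placeholder for the address where OSM is running, and `HOSTS_FILE` points at a scratch file here, on a real system you would target `/etc/hosts` (with sudo).

```shell
# Sketch: add an "nbi" alias idempotently. OSM_IP is a placeholder for
# the address where OSM is running; HOSTS_FILE is a scratch file here --
# on a real system use /etc/hosts (with sudo).
OSM_IP=127.0.0.1
HOSTS_FILE=./hosts.sample

touch "$HOSTS_FILE"
# Append only if no "nbi" entry exists yet (word match):
grep -qw nbi "$HOSTS_FILE" || printf '%s\tnbi\n' "$OSM_IP" >> "$HOSTS_FILE"
cat "$HOSTS_FILE"
```

Running the snippet a second time leaves the file unchanged, since the `grep -qw` guard detects the existing entry.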

### Cannot login after migration to 6.0.2

- **ERROR**: NBI always returns "UNAUTHORIZED". You cannot log in with either the UI or the CLI. The CLI shows the error "`can't find a default project for this user`" or "`project admin not allowed for this user`".
- **CAUSE**: Normally this happens after a migration to release 6.0.2. There is a slight incompatibility with users created in older versions.
@@ -532,6 +515,35 @@ curl --insecure https://localhost:9999/osm/test/db-clear/users
docker service update  osm_nbi --force
```

## Other operational issues

### Running out of disk space

If you upgrade your OSM installation frequently, you might find that your disk is running out of space. The reason is that old containers and docker images might still be consuming disk space. Running the following two commands should be enough to clean up your docker setup:

```bash
docker system prune
docker image prune
```

If you are still experiencing issues with disk space, logs in one of the containers could be the cause. Check which containers are consuming the most space (typically kafka-exporter):

```bash
du -sk /var/lib/docker/containers/* |sort -n
docker ps |grep <CONTAINER_ID>
```

Then, remove the stack and redeploy it again after doing a prune:

```bash
docker stack rm osm_metrics
docker system prune
docker image prune
docker stack deploy -c /etc/osm/docker/osm_metrics/docker-compose.yml osm_metrics
```
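
The two `du`/`docker ps` steps above can be combined so the largest container is identified in one pass. This is a minimal sketch with an illustrative `du -sk` sample (paths and sizes are made up); on a live host you would run the real `du` command shown above instead.

```shell
# Sketch: pick the largest container directory from "du -sk" output and
# print its container ID (the last path component). The sample is
# illustrative; live host:
#   du -sk /var/lib/docker/containers/* | sort -n | tail -n 1
sample='1024 /var/lib/docker/containers/aaa111
524288 /var/lib/docker/containers/bbb222'

biggest=$(printf '%s\n' "$sample" | sort -n | tail -n 1 | awk -F/ '{print $NF}')
echo "$biggest"   # then inspect it with: docker ps | grep <this ID>
```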

## Logs

### Checking the logs

You can check the logs of any container with the following commands:
@@ -552,6 +564,24 @@ docker logs $(docker ps -aqf "name=osm_keystone-db" -n 1)
docker logs $(docker ps -aqf "name=osm_prometheus" -n 1)
```

For live debugging, the following commands can be useful to save the log output to a file while showing it on the screen:

```bash
docker logs -f $(docker ps -aqf "name=osm_mon.1" -n 1) 2>&1 | tee mon-log.txt
docker logs -f $(docker ps -aqf "name=osm_pol" -n 1) 2>&1 | tee pol-log.txt
docker logs -f $(docker ps -aqf "name=osm_lcm" -n 1) 2>&1 | tee lcm-log.txt
docker logs -f $(docker ps -aqf "name=osm_nbi" -n 1) 2>&1 | tee nbi-log.txt
docker logs -f $(docker ps -aqf "name=osm_light-ui" -n 1) 2>&1 | tee light-log.txt
docker logs -f $(docker ps -aqf "name=osm_ro.1" -n 1) 2>&1 | tee ro-log.txt
docker logs -f $(docker ps -aqf "name=osm_ro-db" -n 1) 2>&1 | tee rodb-log.txt
docker logs -f $(docker ps -aqf "name=osm_mongo" -n 1) 2>&1 | tee mongo-log.txt
docker logs -f $(docker ps -aqf "name=osm_kafka" -n 1) 2>&1 | tee kafka-log.txt
docker logs -f $(docker ps -aqf "name=osm_zookeeper" -n 1) 2>&1 | tee zookeeper-log.txt
docker logs -f $(docker ps -aqf "name=osm_keystone.1" -n 1) 2>&1 | tee keystone-log.txt
docker logs -f $(docker ps -aqf "name=osm_keystone-db" -n 1) 2>&1 | tee keystonedb-log.txt
docker logs -f $(docker ps -aqf "name=osm_prometheus" -n 1) 2>&1 | tee prometheus-log.txt
```
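
The repetitive list above can also be generated from the service names, which helps when the set of services changes between releases. A small sketch (the service list is the one shown above and may differ in your installation); it only prints the commands, so you can review them before redirecting the output to a helper script:

```shell
# Sketch: emit one "docker logs | tee" command per OSM service. Note that
# "-f" follows the log, so run the generated lines individually rather
# than piping them all to "sh" at once.
cmds=$(for svc in mon pol lcm nbi light-ui ro ro-db mongo kafka zookeeper keystone keystone-db prometheus; do
  printf 'docker logs -f $(docker ps -aqf "name=osm_%s" -n 1) 2>&1 | tee %s-log.txt\n' "$svc" "$svc"
done)
echo "$cmds"
```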

For each container, logs can be found under:

```bash