Common issues and troubleshooting

From OSM Public Wiki

Revision as of 09:06, 18 June 2019

Installation

RECOMMENDATION: save a log of your installation:

$ ./install_osm.sh 2>&1 | tee osm_install_log.txt


Docker

Were all docker images successfully built?

Although controlled by the installer, you can check that the following images exist:

$ docker image ls

REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
osm/light-ui             latest              1988aa262a97        18 hours ago        710MB
osm/lcm                  latest              c9ad59bf96aa        46 hours ago        667MB
osm/ro                   latest              812c987fcb16        46 hours ago        791MB
osm/nbi                  latest              584b4e0084a7        46 hours ago        497MB
osm/pm                   latest              1ad1e4099f52        46 hours ago        462MB
osm/mon                  latest              b17efa3412e3        46 hours ago        725MB
wurstmeister/kafka       latest              7cfc4e57966c        10 days ago         293MB
mysql                    5                   0d16d0a97dd1        2 weeks ago         372MB
mongo                    latest              14c497d5c758        3 weeks ago         366MB
wurstmeister/zookeeper   latest              351aa00d2fe9        18 months ago       478MB


Are all processes/services running?

$ docker stack ps osm |grep -i running

10 docker containers should be running.

All 10 services should show at least one running replica (1/1):

$ docker service ls

ID                  NAME                MODE                REPLICAS            IMAGE                           PORTS
yuyiqh8ty8pv        osm_kafka           replicated          1/1                 wurstmeister/kafka:latest       *:9092->9092/tcp
y585906h5vy5        osm_lcm             replicated          1/1                 osm/lcm:latest
pcdi5vb86nt9        osm_light-ui        replicated          1/1                 osm/light-ui:latest             *:80->80/tcp
i56jhl5k6re4        osm_mon             replicated          1/1                 osm/mon:latest                  *:8662->8662/tcp
p5wyjtne93hp        osm_mongo           replicated          1/1                 mongo:latest
iz5uncfdzu23        osm_nbi             replicated          1/1                 osm/nbi:latest                  *:9999->9999/tcp
4ttw2v4z2g57        osm_pm              replicated          1/1                 osm/pm:latest
xbg6bclp2anw        osm_ro              replicated          1/1                 osm/ro:latest                   *:9090->9090/tcp
sf7rayfolncu        osm_ro-db           replicated          1/1                 mysql:5
5bl73dhj1xl0        osm_zookeeper       replicated          1/1                 wurstmeister/zookeeper:latest
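
The replica check above can also be scripted. The following is a minimal sketch, not part of the OSM installer: the helper name unhealthy_services is hypothetical, and it assumes the input is one "NAME REPLICAS" pair per line, as produced by the --format option of docker service ls.

```shell
# Print every service whose running replica count differs from the desired count.
# Reads "NAME REPLICAS" pairs (for example "osm_ro 1/1") from standard input.
unhealthy_services() {
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Typical use on the OSM host (needs access to the docker daemon):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | unhealthy_services
```

An empty result means every listed service has reached its desired replica count.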

Docker image failed to build

Err:1 http://archive.ubuntu.com/ubuntu xenial InRelease

In some cases, DNS resolution works on the host but fails when building the Docker container. This happens when Docker cannot automatically determine which DNS server to use.

Check if the following works:

docker run busybox nslookup archive.ubuntu.com

If it does not work, you have to configure Docker to use the available DNS.

# Get the IP address you’re using for DNS:
nmcli dev show | grep 'IP4.DNS'
# Create a new file, /etc/docker/daemon.json, that contains the following (replace the DNS IP address with the output from the previous step):
{
   "dns": ["192.168.24.10"]
}
# Restart docker
sudo service docker restart
# Re-run
docker run busybox nslookup archive.ubuntu.com
# Now you should be able to re-run the installer and move past the DNS issue.
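
As a sketch of the file-creation step above, the JSON body can be generated from one or more DNS addresses. The helper name dns_daemon_json is hypothetical, and the snippet assumes that /etc/docker/daemon.json does not exist yet; if it does, merge the "dns" key by hand instead of overwriting the file.

```shell
# Print a minimal daemon.json body for the given DNS server addresses.
dns_daemon_json() {
    list=""
    for ip in "$@"; do
        list="${list:+$list, }\"$ip\""
    done
    printf '{\n   "dns": [%s]\n}\n' "$list"
}

# Typical use (assumption: no daemon.json exists yet, so nothing is lost):
#   dns_daemon_json 192.168.24.10 | sudo tee /etc/docker/daemon.json
#   sudo service docker restart
```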

Problem deploying stack osm

network netosm could not be found

The error is network "netosm" is declared as external, but could not be found. You need to create a swarm-scoped network before the stack is deployed

This usually happens when "docker system prune" is run while the stack is stopped, which removes the unused network. The following script recreates it:

 #!/bin/bash
 # Create OSM Docker Network ...
 [ -z "$OSM_STACK_NAME" ] && OSM_STACK_NAME=osm
 OSM_NETWORK_NAME=net${OSM_STACK_NAME}
 echo Creating OSM Docker Network
 DEFAULT_INTERFACE=$(route -n | awk '$1~/^0.0.0.0/ {print $8}')
 DEFAULT_MTU=$(ip addr show $DEFAULT_INTERFACE | perl -ne 'if (/mtu\s(\d+)/) {print $1;}')
 echo \# OSM_STACK_NAME = $OSM_STACK_NAME
 echo \# OSM_NETWORK_NAME = $OSM_NETWORK_NAME
 echo \# DEFAULT_INTERFACE = $DEFAULT_INTERFACE
 echo \# DEFAULT_MTU = $DEFAULT_MTU
 sg docker -c "docker network create --driver=overlay --attachable \
                --opt com.docker.network.driver.mtu=${DEFAULT_MTU} \
                ${OSM_NETWORK_NAME}"
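
Before running the script above, it can help to confirm that the network is really missing. This is a minimal sketch; the helper name network_exists is hypothetical, and it relies only on docker network ls with a name-only format.

```shell
# Succeed if a docker network with exactly the given name exists.
network_exists() {
    docker network ls --format '{{.Name}}' | grep -Fqx "$1"
}

# Typical use:
#   network_exists netosm && echo "netosm is present" || echo "netosm is missing"
```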

Juju

Bootstrap hangs

If the Juju bootstrap takes a long time, stuck at this status...

Installing Juju agent on bootstrap instance
Fetching Juju GUI 2.14.0
Waiting for address
Attempting to connect to 10.71.22.78:22
Connected to 10.71.22.78
Running machine configuration script...

...it usually indicates that the LXD container with the Juju controller is having trouble connecting to the internet.

Get the name of the LXD container. It will begin with 'juju-' and end with '-0'.

lxc list
+-----------------+---------+---------------------+------+------------+-----------+
|      NAME       |  STATE  |        IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+-----------------+---------+---------------------+------+------------+-----------+
| juju-0383f2-0   | RUNNING | 10.195.8.57 (eth0)  |      | PERSISTENT |           |
+-----------------+---------+---------------------+------+------------+-----------+

Next, tail the output of cloud-init to see where the bootstrap is stuck.

lxc exec juju-0383f2-0 -- tail -f /var/log/cloud-init-output.log
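
The container lookup described above can be sketched as a small filter. The helper name juju_machine0_containers is hypothetical. It applies the naming rule from the text (begins with 'juju-' and ends with '-0'), so machine 0 of other Juju models may match as well.

```shell
# Keep only container names that begin with "juju-" and end with "-0".
# Reads one container name per line from standard input.
juju_machine0_containers() {
    grep -E '^juju-.*-0$'
}

# Typical use on the OSM host:
#   lxc list --format csv -c n | juju_machine0_containers
#   lxc exec <NAME> -- tail -f /var/log/cloud-init-output.log
```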


Is Juju running?

If running, you should see something like this:

$ juju status

Model    Controller  Cloud/Region         Version  SLA
default  osm         localhost/localhost  2.3.7    unsupported


ERROR controller osm already exists

Did the OSM installation fail during the juju installation with an error like "ERROR controller osm already exists"?


$ ./install_osm.sh
...
ERROR controller "osm" already exists
ERROR try was stopped

### Jum Agu 24 15:19:33 WIB 2018 install_juju: FATAL error: Juju installation failed
BACKTRACE:
### FATAL /usr/share/osm-devops/jenkins/common/logging 39
### install_juju /usr/share/osm-devops/installers/full_install_osm.sh 564
### install_lightweight /usr/share/osm-devops/installers/full_install_osm.sh 741
### main /usr/share/osm-devops/installers/full_install_osm.sh 1033

Try to destroy the Juju controller and run the installation again:

$ juju destroy-controller osm --destroy-all-models -y
$ ./install_osm.sh

If that does not work, you can destroy the Juju container and run the installation again:

#Destroy the Juju container
lxc stop juju-*
lxc delete juju-*
#Unregister the controller since we’ve manually freed the resources associated with it
juju unregister -y osm
#Verify that there are no controllers
juju list-controllers
#Run the installation again
./install_osm.sh

LXD

ERROR profile default: /etc/default/lxd-bridge has IPv6 enabled

Make sure that you follow the instructions in the Quickstart.

When asked whether you want to proceed with the installation and configuration of LXD, juju, docker CE and the initialization of a local docker swarm as prerequisites, answer "y".

When dialog messages related to LXD configuration are shown, please answer in the following way:

  • Do you want to configure the LXD bridge? Yes
  • Do you want to setup an IPv4 subnet? Yes
  • << Default values apply for next questions >>
  • Do you want to setup an IPv6 subnet? No

Configuration

VIMs

Is the VIM URL reachable and operational?

When the VIM URL cannot be reached, an error message similar to the following is shown after attempts to instantiate network services:

Error: "VIM Exception vimconnConnectionException ConnectFailure: Unable to establish connection to <URL>"
  • To debug connection issues with an OpenStack VIM, you can install the OpenStack client in the OSM VM and run some basic tests, for example:
$ # Install the OpenStack client
$ sudo apt-get install python-openstackclient
$ # Load your OpenStack credentials. For instance, if your credentials are saved in a file named 'myVIM-openrc.sh', you can load them with:
$ source myVIM-openrc.sh
$ # Test if the VIM API is operational with a simple command. For instance:
$ openstack image list

If the openstack client works, then make sure that you can reach the VIM from the RO docker:

$ docker exec -it osm_ro.1.xxxxx bash
$ curl <URL_CONTROLLER>

In some cases, the errors come from the fact that the VIM was added to OSM using names in the URL that are not Fully Qualified Domain Names (FQDN).

When adding a VIM to OSM, you must always use an FQDN or an IP address. Note that “controller” and similar names are not proper FQDNs (the domain suffix must be added). Docker’s dnsmasq might interpret a non-FQDN name as a docker container name to be resolved, which it is not. In addition, all the VIM endpoints should also be FQDNs or IP addresses, which guarantees that all subsequent API calls can reach the appropriate endpoint.
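
A quick way to spot a non-FQDN VIM URL before adding it is sketched below. The helper names url_host and looks_qualified are hypothetical, and the check is only a heuristic: it treats any host that contains a dot as acceptable and does not handle bracketed IPv6 addresses.

```shell
# Extract the host part of a URL such as http://controller:5000/v3
url_host() {
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
                             -e 's|[/?#].*$||' \
                             -e 's|^.*@||' \
                             -e 's|:[0-9]*$||'
}

# Succeed when the host contains at least one dot (an FQDN or an IPv4 address).
looks_qualified() {
    case "$(url_host "$1")" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Typical use:
#   looks_qualified "http://controller:5000/v3" || echo "use an FQDN or an IP address"
```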

Consider an NFV infrastructure with tens of VIMs. First, you would have to use a different name for each controller (controller1, controller2, etc.). Then you would have to add all those entries to the /etc/hosts file of every machine that interacts with the VIMs, not only OSM. This is bad practice.

However, it is useful to have a means of working with lab environments that use non-FQDN names. There are three options. You are probably looking for the third one, but we recommend the first:

  • Option 1. Change the admin URL and/or public URL of the endpoints to use an IP address or an FQDN. You might find this interesting if you want to bring your Openstack setup to production.
  • Option 2. Modify /etc/hosts in the docker RO container. This is not persistent after reboots or restarts of the osm docker stack.
  • Option 3. Modify /etc/osm/docker/docker-compose.yaml in the host, adding extra_hosts in the ro section with the entries that you want to add to /etc/hosts in the RO docker:
ro:
  extra_hosts:
    controller: 1.2.3.4

Then restart the stack:

docker stack rm osm
docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm

This is persistent after reboots and restarts of the osm docker stack.

Authentication

What should I check if the VIM authentication is failing?

Typically, you will get the following error message:

Error: "VIM Exception vimconnUnexpectedResponse Unauthorized: The request you have made requires authentication. (HTTP 401)"

If your OpenStack URL is based on HTTPS, OSM will check by default the authenticity of your VIM using the appropriate public certificate. The recommended way to solve this is by modifying /etc/osm/docker/docker-compose.yaml in the host, sharing the host file (e.g. /home/ubuntu/cafile.crt) by adding a volume to the ro section as follows:

 ro:
   ...
   volumes:
     - /home/ubuntu/cafile.crt:/etc/osm/cafile.crt

Then, when creating the VIM, you should use the config option "ca_cert" as follows:

$ # Create the VIM with all the usual options, and add the config option to specify the certificate
$ osm vim-create VIM-NAME ... --config '{ca_cert: /etc/osm/cafile.crt}'

For casual testing, when adding the VIM account to OSM, you can use 'insecure: True' (without quotes) as part of the VIM config parameters:

$ osm vim-create VIM-NAME ... --config '{insecure: True}'   

Is the VIM management network reachable from OSM (e.g. via ssh, port 22)?

The simplest check consists of deploying a VM attached to the management network and trying to access it, e.g. via ssh, from the OSM host.

For instance, in the case of an OpenStack VIM you could try something like this:

$ openstack server create --image ubuntu --flavor m1.small --nic net-id=mgmtnet test

If this does not work, typically it is due to one of these issues:

  • Security group policy in your VIM is blocking your traffic (contact your admin to fix it)
  • IP address space in the management network is not routable from outside (or in the reverse direction, for the ACKs).

Operational issues

Running out of disk space

If you upgrade your OSM installation frequently, your disk may run out of space, because old containers and docker images still consume disk space. Running the following two commands should be enough to clean up your docker setup:

docker system prune
docker image prune

If you are still experiencing issues with disk space, the logs of one of the containers could be the cause. Check which containers consume the most space (typically kafka-exporter):

du -sk /var/lib/docker/containers/* |sort -n
docker ps |grep <CONTAINER_ID>

Then remove the stack, prune, and redeploy it:

docker stack rm osm_metrics
docker system prune
docker image prune
docker stack deploy -c /etc/osm/docker/osm_metrics/docker-compose.yml osm_metrics
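
To identify the largest container directory from the du output used above, a small filter can be applied. This is a sketch; the helper name largest_container_id is hypothetical, and it assumes the usual layout in which each directory under /var/lib/docker/containers is named after the full container ID.

```shell
# Read "du -sk" output (size, then path) and print the short ID (first 12
# characters of the directory name) of the largest entry.
largest_container_id() {
    sort -n | tail -n 1 | awk '{ n = split($2, p, "/"); print substr(p[n], 1, 12) }'
}

# Typical use (root is needed to read /var/lib/docker):
#   sudo sh -c 'du -sk /var/lib/docker/containers/*' | largest_container_id
#   docker ps --filter "id=<SHORT_ID>"
```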

VCA (juju)

Status is not coherent with running NS

In extraordinary situations, the output of "juju status" could show pending units that should have been removed when deleting a NS. In those situations, you can clean up VCA by following the procedure below:

juju status
juju remove-application <application>
juju resolved <unit> --no-retry        # You will likely have to run this several times, because the next queued hook will probably fail as well. Once the last hook is marked resolved, the charm continues its removal.

The following page also shows how to remove different Juju objects

Dump Juju Logs

To dump the Juju debug-logs, run this command:

juju debug-log --replay --no-tail > juju-debug.log

Manual recovery of Juju

If juju gets into a corrupt state and you cannot run `juju status` or contact the juju controller, you might need to remove the controller manually and bootstrap it again, making OSM aware of the new controller.

# Stop and delete all juju containers, then unregister the controller
lxc list
lxc stop juju-*          #replace "*" by the right values
lxc delete juju-*        #replace "*" by the right values
juju unregister -y osm

# Create the controller again 
sg lxd -c "juju bootstrap --bootstrap-series=xenial localhost osm"

# Get controller IP and update it in relevant OSM env files
controller_ip=$(juju show-controller osm|grep api-endpoints|awk -F\' '{print $2}'|awk -F\: '{print $1}')
sudo sed -i 's/^OSMMON_VCA_HOST.*$/OSMMON_VCA_HOST='$controller_ip'/' /etc/osm/docker/mon.env
sudo sed -i 's/^OSMLCM_VCA_HOST.*$/OSMLCM_VCA_HOST='$controller_ip'/' /etc/osm/docker/lcm.env
 
#Get juju password and feed it to OSM env files
function parse_juju_password {
   password_file="${HOME}/.local/share/juju/accounts.yaml"
   local controller_name=$1
   local s='[[:space:]]*' w='[a-zA-Z0-9_-]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $password_file |
   awk -F$fs -v controller=$controller_name '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         if (match(vn,controller) && match($2,"password")) {
             printf("%s",$3);
         }
      }
   }'
}
juju_password=$(parse_juju_password osm)
sudo sed -i 's/^OSMMON_VCA_SECRET.*$/OSMMON_VCA_SECRET='$juju_password'/' /etc/osm/docker/mon.env
sudo sed -i 's/^OSMLCM_VCA_SECRET.*$/OSMLCM_VCA_SECRET='$juju_password'/' /etc/osm/docker/lcm.env

#Restart OSM stack
docker stack rm osm
docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm
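
The sed substitutions in the procedure above break if a value contains characters such as "/" or "&". A more defensive alternative is sketched below. The helper name set_env_value is hypothetical, and unlike the sed commands it appends the key when it is not yet present in the file.

```shell
# Replace the value of KEY in an env file, or append KEY=VALUE if it is absent.
# Usage: set_env_value FILE KEY VALUE
set_env_value() {
    env_file=$1
    tmp_file=$(mktemp) || return 1
    # Pass key and value through the environment so that awk does not
    # interpret backslashes or other special characters in them.
    ENV_KEY=$2 ENV_VALUE=$3 awk '
        BEGIN { FS = "="; k = ENVIRON["ENV_KEY"]; v = ENVIRON["ENV_VALUE"]; found = 0 }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$env_file" > "$tmp_file" && cat "$tmp_file" > "$env_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Typical use (run as root, since the OSM env files are edited with sudo above):
#   set_env_value /etc/osm/docker/lcm.env OSMLCM_VCA_HOST "$controller_ip"
```

Writing back with cat instead of mv keeps the ownership and permissions of the original file.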

Slow deployment of charms

You can make deployment of charms quicker by:

Checking the logs

You can check the logs of any container with the following commands:

docker logs $(docker ps -aqf "name=osm_mon" -n 1)
docker logs $(docker ps -aqf "name=osm_pol" -n 1)
docker logs $(docker ps -aqf "name=osm_lcm" -n 1)
docker logs $(docker ps -aqf "name=osm_nbi" -n 1)
docker logs $(docker ps -aqf "name=osm_light-ui" -n 1)
docker logs $(docker ps -aqf "name=osm_ro.1" -n 1)
docker logs $(docker ps -aqf "name=osm_ro-db" -n 1)
docker logs $(docker ps -aqf "name=osm_mongo" -n 1)
docker logs $(docker ps -aqf "name=osm_kafka" -n 1)
docker logs $(docker ps -aqf "name=osm_zookeeper" -n 1)
docker logs $(docker ps -aqf "name=osm_keystone.1" -n 1)
docker logs $(docker ps -aqf "name=osm_keystone-db" -n 1)
docker logs $(docker ps -aqf "name=osm_prometheus" -n 1)
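
When several logs are needed at once, for example to attach them to a bug report, they can be collected in a loop. This is a sketch that uses docker service logs instead of the per-container docker logs calls above; the helper name save_service_logs and the 200-line limit are arbitrary choices.

```shell
# Save the last 200 log lines of each given service to <dir>/<service>.log.
# Usage: save_service_logs DIR SERVICE...
save_service_logs() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for svc in "$@"; do
        docker service logs --tail 200 "$svc" > "$outdir/$svc.log" 2>&1
    done
}

# Typical use:
#   save_service_logs ./osm-logs osm_lcm osm_nbi osm_ro osm_mon
```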