Commit b6d87b0c authored by Gerardo García

Updated troubleshooting guide: improved structure, added instructions for getting logs

parent fd4ec1d9
@@ -27,7 +27,9 @@ dpkg -l python3-osmclient
ii  python3-osmclient       8.0.0-1            all
```

## Troubleshooting installation

### Recommended installation to facilitate troubleshooting

It is highly recommended to save a log of your installation:

@@ -35,106 +37,45 @@ It is highly recommended saving a log of your installation:
./install_osm.sh 2>&1 | tee osm_install_log.txt
```

### Recommended checks after installation

#### Checking whether all processes/services are running in docker swarm

```bash
docker stack ps osm | grep -i running
```

All the services should have at least 1 replica: 1/1

```bash
$ docker service ls

ID                  NAME                      MODE                REPLICAS            IMAGE                           PORTS
paxqvnwwubcf        osm_grafana               replicated          1/1                 grafana/grafana:latest          *:3000->3000/tcp
xkn3jr7ipibf        osm_kafka                 replicated          1/1                 wurstmeister/kafka:latest       *:30002->9092/tcp
px2xfetg68z1        osm_keystone              replicated          1/1                 opensourcemano/keystone:8       *:5000->5000/tcp
62yljr0s97vv        osm_lcm                   replicated          1/1                 opensourcemano/lcm:8
lwtfoh29sb95        osm_light-ui              replicated          1/1                 opensourcemano/light-ui:8       *:80->80/tcp
xjl2vx9t6ogz        osm_mon                   replicated          1/1                 opensourcemano/mon:8            *:8662->8662/tcp
t6r9wjjxqy1v        osm_mongo                 replicated          1/1                 mongo:latest
rmuhwvl5gkgo        osm_mysql                 replicated          1/1                 mysql:5
vjyee8af3a8r        osm_nbi                   replicated          1/1                 opensourcemano/nbi:8            *:9999->9999/tcp
ihdjxn68aa4p        osm_pol                   replicated          1/1                 opensourcemano/pol:8
tnk91kubxfvk        osm_prometheus            replicated          1/1                 prom/prometheus:latest          *:9091->9090/tcp
4e5c49m9x0by        osm_prometheus-cadvisor   replicated          1/1                 google/cadvisor:latest          *:8080->8080/tcp
m1cxap6wkxmf        osm_ro                    replicated          1/1                 opensourcemano/ro:8             *:9090->9090/tcp
97r6t2zrs4ho        osm_zookeeper             replicated          1/1                 wurstmeister/zookeeper:latest
```
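
As a quick sanity check, the `REPLICAS` column can be scanned automatically. The snippet below is a minimal sketch, not part of the official tooling: it assumes the default `docker service ls` layout (REPLICAS as the fourth column) and uses an illustrative sample; on a live host you would pipe the real command into the same `awk` filter.

```shell
# Sketch: flag services that are missing replicas. Assumes the default
# "docker service ls" layout, where REPLICAS ("running/desired") is the
# 4th whitespace-separated column. The sample text is illustrative; on a
# live host, pipe the real command:  docker service ls | awk 'NR>1 {...}'
sample='ID            NAME      MODE        REPLICAS  IMAGE
abc123        osm_lcm   replicated  0/1       opensourcemano/lcm:8
def456        osm_nbi   replicated  1/1       opensourcemano/nbi:8'

missing=$(printf '%s\n' "$sample" |
  awk 'NR>1 { split($4, r, "/"); if (r[1]+0 < r[2]+0) print $2 " is missing replicas (" $4 ")" }')
echo "$missing"
```

An empty result means every service has all its desired replicas running.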

#### Checking whether all processes/services are running in K8s

```bash
kubectl -n osm get all
```

### Issues on standard installation

#### Docker Swarm

##### `network netosm could not be found`

@@ -159,9 +100,9 @@ It usually happens when a `docker system prune` is done with the stack stopped.
                ${OSM_NETWORK_NAME}"
```

#### Juju

##### Juju bootstrap hangs

If the Juju bootstrap takes a long time, stuck at this status...

@@ -193,7 +134,7 @@ Next, tail the output of cloud-init to see where the bootstrap is stuck.
lxc exec juju-0383f2-0 -- tail -f /var/log/cloud-init-output.log
```

##### Is Juju running?

If running, you should see something like this:

@@ -204,7 +145,7 @@ Model Controller Cloud/Region Version SLA
default  osm         localhost/localhost  2.3.7    unsupported
```

##### ERROR controller osm already exists

Did the OSM installation fail during juju installation with an error like "ERROR controller osm already exists"?

@@ -243,9 +184,17 @@ juju list-controllers
./install_osm.sh
```

##### No controllers registered

The following error appears when the user used for installation does not belong to some groups:

_Finished installation of juju_ Password: **sg: failed to crypt password with previous salt: Invalid argument** ERROR No controllers registered.

To fix it, add the non-root user used for installation to the *sudo*, *lxd* and *docker* groups.

#### LXD

##### ERROR profile default: `/etc/default/lxd-bridge` has IPv6 enabled

Make sure that you follow the instructions in the [Quickstart](01-quickstart.md).

@@ -258,50 +207,72 @@ When dialog messages related to LXD configuration are shown, please answer in th
- << Default values apply for next questions >>
- **Do you want to setup an IPv6 subnet? No**

### Issues on advanced installation (manual build of docker images)

#### Manual build of images. Were all docker images successfully built?

Although controlled by the installer, you can check that the following images exist:

```bash
$ docker image ls

REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
osm/light-ui             latest              1988aa262a97        18 hours ago        710MB
osm/lcm                  latest              c9ad59bf96aa        46 hours ago        667MB
osm/ro                   latest              812c987fcb16        46 hours ago        791MB
osm/nbi                  latest              584b4e0084a7        46 hours ago        497MB
osm/pm                   latest              1ad1e4099f52        46 hours ago        462MB
osm/mon                  latest              b17efa3412e3        46 hours ago        725MB
wurstmeister/kafka       latest              7cfc4e57966c        10 days ago         293MB
mysql                    5                   0d16d0a97dd1        2 weeks ago         372MB
mongo                    latest              14c497d5c758        3 weeks ago         366MB
wurstmeister/zookeeper   latest              351aa00d2fe9        18 months ago       478MB
```

#### Docker image failed to build

##### Err:1 `http://archive.ubuntu.com/ubuntu xenial InRelease`

In some cases, DNS resolution works on the host but fails when building the Docker container. This happens when Docker does not automatically determine the DNS server to use.

Check if the following works:

```bash
docker run busybox nslookup archive.ubuntu.com
```

If it does not work, you have to configure Docker to use the available DNS:

```bash
# Get the IP address you are using for DNS:
nmcli dev show | grep 'IP4.DNS'
# Create a new file, /etc/docker/daemon.json, containing the following (replace the DNS IP address with the output from the previous step):
{
   "dns": ["192.168.24.10"]
}
# Restart docker
sudo service docker restart
# Re-run the check
docker run busybox nslookup archive.ubuntu.com
# Now you should be able to re-run the installer and move past the DNS issue.
```

##### TypeError: `unsupported operand type(s) for -=: 'Retry' and 'int'`

In some cases, an MTU mismatch between the host and docker interfaces will cause this error while running pip. You can check this by running `ifconfig` and comparing the MTU of your host interface with that of the `docker_gwbridge` interface.

```bash
# Create a new file, /etc/docker/daemon.json, containing the following (replace the MTU value with that of your host interface):
{
   "mtu": 1458
}
# Restart docker
sudo service docker restart
```
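
To spot an MTU mismatch quickly, you can extract the MTU of every interface and compare the host interface against `docker_gwbridge`. This is a minimal sketch under illustrative assumptions: the interface names and the sample `ip -o link show` output below are made up, and on a live host you would pipe the real command into the same `awk` filter.

```shell
# Sketch: print "interface mtu" pairs from "ip -o link show"-style output
# so the host interface MTU can be compared against docker_gwbridge.
# Sample lines are illustrative; live host:  ip -o link show | awk '...'
sample='2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
5: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default'

mtus=$(printf '%s\n' "$sample" |
  awk '{ iface=$2; sub(":$", "", iface); for (i=1; i<=NF; i++) if ($i=="mtu") print iface, $(i+1) }')
echo "$mtus"
```

If `docker_gwbridge` reports a larger MTU than your host interface, set the `"mtu"` key in `/etc/docker/daemon.json` to the host value and restart docker.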

## Common issues with VIMs

### Is the VIM URL reachable and operational?

When there are problems accessing the VIM URL, an error message similar to the following is shown after attempting to instantiate network services:

@@ -354,7 +325,7 @@ docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm

This is persistent after reboots and restarts of the osm docker stack.

### VIM authentication

**What should I check if the VIM authentication is failing?**

@@ -384,6 +355,8 @@ For casual testing, when adding the VIM account to OSM, you can use `'insecure:
$ osm vim-create VIM-NAME ... --config '{insecure: True}'
```

### Issues when trying to access a VM from OSM

**Is the VIM management network reachable from OSM (e.g. via ssh, port 22)?**

The simplest check consists of deploying a VM attached to the management network and trying to access it (e.g. via ssh) from the OSM host.
@@ -399,36 +372,9 @@ If this does not work, typically it is due to one of these issues:
- Security group policy in your VIM is blocking your traffic (contact your admin to fix it)
- IP address space in the management network is not routable from outside (or in the reverse direction, for the ACKs).

## Common issues with VCA/Juju

### Status is not coherent with running NS

In extraordinary situations, the output of `juju status` could show pending units that should have been removed when deleting a NS. In those situations, you can clean up VCA by following the procedure below:

@@ -440,7 +386,7 @@ juju resolved -m <NS_ID> <unit> --no-retry # You'll likely have to run it

The following page also shows [how to remove different Juju objects](https://docs.jujucharms.com/2.1/en/charms-destroy)

### Dump Juju Logs

To dump the Juju debug-logs, run this command:

@@ -450,7 +396,7 @@ juju debug-log --replay --no-tail -m <NS_ID>
juju debug-log --replay --no-tail -m <NS_ID> --include <UNIT>
```

### Manual recovery of Juju

If juju gets into a corrupt state and you cannot run `juju status` or contact the juju controller, you might need to manually remove the controller and register it again, making OSM aware of the new controller.

@@ -502,7 +448,7 @@ docker stack rm osm
docker stack deploy -c /etc/osm/docker/docker-compose.yaml osm
```

### Slow deployment of charms

You can make deployment of charms quicker by:

@@ -511,17 +457,54 @@ You can make deployment of charms quicker by:
- Preventing Juju from running `apt-get update && apt-get upgrade` when starting a machine: [Disable OS upgrades in charms](14-advanced-charm-development.md#disable-os-upgrades)
- Building periodically a custom image that will be used as base image for all the charms: [Custom base image for charms](14-advanced-charm-development.md#build-a-custom-cloud-image)

## Common instantiation errors

### File juju_id_rsa.pub not found

- **ERROR**: `ERROR creating VCA model name 'xxxx': Traceback (most recent call last): File "/usr/lib/python3/dist-packages/osm_lcm/ns.py", line 822, in instantiate await ... [Errno 2] No such file or directory: '/root/.local/share/juju/ssh/juju_id_rsa.pub'`
- **CAUSE**: Normally, a migration from release FIVE does not properly set the environment for LCM
- **SOLUTION**: Ensure the variable **OSMLCM_VCA_PUBKEY** is properly set in the file `/etc/osm/docker/lcm.env`. The value must match the output of the command `cat $HOME/.local/share/juju/ssh/juju_id_rsa.pub`. If not, add or change it. Restart OSM, or just the LCM service with `docker service update osm_lcm --force --env-add OSMLCM_VCA_PUBKEY=""`

## Common issues when interacting with NBI

### SSL certificate problem

By default, OSM installer uses a self-signed certificate for HTTPS. That might lead to the error '_SSL certificate problem: self signed certificate_' on the client side. For testing environments, you might want to ignore this error just by using the appropriate options to skip certificate validation (e.g. `--insecure` for curl, `--no-check-certificate` for wget, etc.). However, for more stable setups you might prefer to address this issue by installing the appropriate certificate in your client system.

These are the steps to install NBI certificate on the client side (tested for Ubuntu):

1. Get the certificate file `cert.pem` by any of these means:
  - From running docker container:
    ```bash
    docker ps | grep nbi
    docker cp <docker-id>:/app/NBI/osm_nbi/http/cert.pem .
    ```
  - From source code: NBI-folder/osm_nbi/http/cert.pem
  - From ETSI's git:
    ```bash
    wget -O cert.pem "https://osm.etsi.org/gitweb/?p=osm/NBI.git;a=blob_plain;f=osm_nbi/http/cert.pem;hb=refs/heads/v8.0"
    ```
2. Then, you should install this certificate:
   ```bash
     sudo cp cert.pem /usr/local/share/ca-certificates/osm_nbi_cert.pem.crt
     sudo update-ca-certificates
     # 1 added, 0 removed; done
   ```
3. Add to the list of `/etc/hosts` a host called "nbi" with the IP address where OSM is running.
   - It can be `localhost` if client and server are the same machine.
   - For localhost, you would need to add (or edit) these lines:
     ```text
       127.0.0.1     localhost       nbi
       OSM-ip        nbi
     ```
4. Finally, for the URL, use `nbi` as the host name (i.e. <https://nbi:9999/osm>).
   - Do not use `localhost` or `127.0.0.1`.
   - You can run a quick test with `curl` by:
     ```bash
     curl https://nbi:9999/osm/version
     ```
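
Step 3 above can be scripted so that repeated runs do not duplicate the entry. The following is a hedged sketch: `OSM_IP` is a placeholder for the address where OSM is running, and `HOSTS_FILE` points at a scratch file here, on a real system you would target `/etc/hosts` (with sudo).

```shell
# Sketch: add an "nbi" alias idempotently. OSM_IP is a placeholder for
# the address where OSM is running; HOSTS_FILE is a scratch file here --
# on a real system use /etc/hosts (with sudo).
OSM_IP=127.0.0.1
HOSTS_FILE=./hosts.sample

touch "$HOSTS_FILE"
# Append only if no "nbi" entry exists yet (word match):
grep -qw nbi "$HOSTS_FILE" || printf '%s\tnbi\n' "$OSM_IP" >> "$HOSTS_FILE"
cat "$HOSTS_FILE"
```

Running the snippet a second time leaves the file unchanged, since the `grep -qw` guard detects the existing entry.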

### Cannot login after migration to 6.0.2

- **ERROR**: NBI always returns "UNAUTHORIZED". You cannot log in with either the UI or the CLI. The CLI shows the error "`can't find a default project for this user`" or "`project admin not allowed for this user`".
- **CAUSE**: Normally this happens after a migration to release 6.0.2. There is a slight incompatibility with users created in older versions.
@@ -532,6 +515,35 @@ curl --insecure https://localhost:9999/osm/test/db-clear/users
docker service update  osm_nbi --force
```

## Other operational issues

### Running out of disk space

If you upgrade your OSM installation frequently, you might find that your disk is running out of space. The reason is that old containers and docker images might still be consuming disk space. Running the following two commands should be enough to clean up your docker setup:

```bash
docker system prune
docker image prune
```

If you are still experiencing issues with disk space, logs in one of the containers could be the cause. Check which containers are consuming the most space (typically kafka-exporter):

```bash
du -sk /var/lib/docker/containers/* |sort -n
docker ps |grep <CONTAINER_ID>
```

Then, remove the stack and redeploy it again after doing a prune:

```bash
docker stack rm osm_metrics
docker system prune
docker image prune
docker stack deploy -c /etc/osm/docker/osm_metrics/docker-compose.yml osm_metrics
```
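
The two `du`/`docker ps` steps above can be combined so the largest container is identified in one pass. This is a minimal sketch with an illustrative `du -sk` sample (paths and sizes are made up); on a live host you would run the real `du` command shown above instead.

```shell
# Sketch: pick the largest container directory from "du -sk" output and
# print its container ID (the last path component). The sample is
# illustrative; live host:
#   du -sk /var/lib/docker/containers/* | sort -n | tail -n 1
sample='1024 /var/lib/docker/containers/aaa111
524288 /var/lib/docker/containers/bbb222'

biggest=$(printf '%s\n' "$sample" | sort -n | tail -n 1 | awk -F/ '{print $NF}')
echo "$biggest"   # then inspect it with: docker ps | grep <this ID>
```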

## Logs

### Checking the logs

You can check the logs of any container with the following commands:
@@ -552,6 +564,24 @@ docker logs $(docker ps -aqf "name=osm_keystone-db" -n 1)
docker logs $(docker ps -aqf "name=osm_prometheus" -n 1)
```

For live debugging, the following commands can be useful to save the log output to a file while showing it on the screen:

```bash
docker logs -f $(docker ps -aqf "name=osm_mon.1" -n 1) 2>&1 | tee mon-log.txt
docker logs -f $(docker ps -aqf "name=osm_pol" -n 1) 2>&1 | tee pol-log.txt
docker logs -f $(docker ps -aqf "name=osm_lcm" -n 1) 2>&1 | tee lcm-log.txt
docker logs -f $(docker ps -aqf "name=osm_nbi" -n 1) 2>&1 | tee nbi-log.txt
docker logs -f $(docker ps -aqf "name=osm_light-ui" -n 1) 2>&1 | tee light-log.txt
docker logs -f $(docker ps -aqf "name=osm_ro.1" -n 1) 2>&1 | tee ro-log.txt
docker logs -f $(docker ps -aqf "name=osm_ro-db" -n 1) 2>&1 | tee rodb-log.txt
docker logs -f $(docker ps -aqf "name=osm_mongo" -n 1) 2>&1 | tee mongo-log.txt
docker logs -f $(docker ps -aqf "name=osm_kafka" -n 1) 2>&1 | tee kafka-log.txt
docker logs -f $(docker ps -aqf "name=osm_zookeeper" -n 1) 2>&1 | tee zookeeper-log.txt
docker logs -f $(docker ps -aqf "name=osm_keystone.1" -n 1) 2>&1 | tee keystone-log.txt
docker logs -f $(docker ps -aqf "name=osm_keystone-db" -n 1) 2>&1 | tee keystonedb-log.txt
docker logs -f $(docker ps -aqf "name=osm_prometheus" -n 1) 2>&1 | tee prometheus-log.txt
```
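
The repetitive list above can also be generated from the service names, which helps when the set of services changes between releases. A small sketch (the service list is the one shown above and may differ in your installation); it only prints the commands, so you can review them before redirecting the output to a helper script:

```shell
# Sketch: emit one "docker logs | tee" command per OSM service. Note that
# "-f" follows the log, so run the generated lines individually rather
# than piping them all to "sh" at once.
cmds=$(for svc in mon pol lcm nbi light-ui ro ro-db mongo kafka zookeeper keystone keystone-db prometheus; do
  printf 'docker logs -f $(docker ps -aqf "name=osm_%s" -n 1) 2>&1 | tee %s-log.txt\n' "$svc" "$svc"
done)
echo "$cmds"
```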

For each container, logs can be found under:

```bash