Installation of an Openvim Compute Node

Introduction

This article contains the general guidelines to configure a compute node for NFV, based on a 64-bit Linux OS with KVM, QEMU and libvirt (e.g. RHEL7.1, RHEL7.0, CentOS 7.1, Ubuntu Server 16.04).

This article applies to Linux systems in general and tries to gather all the configuration steps. These steps have not been thoroughly tested on all Linux distros, so there is no guarantee that they will be 100% accurate.

For the installation procedure of a specific distro, follow these links:

Note: the openvim controller has been tested with servers based on Intel Xeon E5 processors (Ivy Bridge architecture) and with Intel X520 NICs (based on the Intel 82599 controller). No tests have been carried out with the Intel Core i3, i5 and i7 families, so there is no guarantee that the integration will be seamless.

The configuration that must be applied to the compute node is the following:

  • BIOS setup
  • Install virtualization packages (kvm, qemu, libvirt, etc.)
  • Use a kernel with support of huge page TLB cache in IOMMU
  • Enable IOMMU
  • Enable 1G hugepages, and reserve enough hugepages for running the VNFs
  • Isolate CPUs so that the host OS is restricted to run on the first core of each NUMA node.
  • Enable SR-IOV
  • Enable all processor virtualization features in the BIOS
  • Enable hyperthreading in the BIOS (optional)
  • Deactivate KSM
  • Pre-provision Linux bridges
  • Additional configuration to allow access from Openvim Controller, including the configuration to access the image repository and the creation of appropriate folders for image on-boarding

A full description of this configuration is detailed below.

BIOS setup

  • Ensure that virtualization options are active. If they are active, the following command should give a non-empty output:
egrep "(vmx|svm)" /proc/cpuinfo
  • It is also recommended to activate hyperthreading. If it is active, the following command should give a non-empty output:
egrep ht /proc/cpuinfo
  • Ensure no power saving option is enabled.

Installation of virtualization packages

  • Install the following packages in your host OS: qemu-kvm libvirt-bin bridge-utils virt-viewer virt-manager
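
The exact package names and installation command depend on the distro. As a sketch, assuming Ubuntu Server 16.04 (the package names listed above) or a RHEL/CentOS system with the equivalent packages:

#Ubuntu/Debian (package names as listed above)
apt-get update
apt-get install -y qemu-kvm libvirt-bin bridge-utils virt-viewer virt-manager
#RHEL/CentOS equivalents
yum install -y qemu-kvm libvirt bridge-utils virt-viewer virt-manager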

IOMMU TLB cache support

  • Use a kernel with support for huge page TLB cache in IOMMU, for example RHEL7.1, Ubuntu 14.04, or a vanilla kernel 3.14 or higher. In case you are using a kernel without this support, you should update your kernel. For instance, you can use the following kernel for RHEL7.0 (not needed for RHEL7.1):
wget http://people.redhat.com/~mtosatti/qemu-kvm-take5/kernel-3.10.0-123.el7gig2.x86_64.rpm
rpm -Uvh kernel-3.10.0-123.el7gig2.x86_64.rpm --oldpackage

Enabling IOMMU

Enable IOMMU by adding the following to the grub command line:

intel_iommu=on 
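
How grub command line options are applied depends on the distro. A minimal sketch for a grub2-based system (the same procedure applies to the hugepages, isolcpus and modprobe.blacklist options described in the following sections):

#append the option to GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.:
#  GRUB_CMDLINE_LINUX="... intel_iommu=on"
#then regenerate the grub configuration and reboot:
update-grub                                #Debian/Ubuntu
grub2-mkconfig -o /boot/grub2/grub.cfg     #RHEL/CentOS
reboot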

Enabling 1G hugepages

Enable 1G hugepages by adding the following to the grub command line:

default_hugepagesz=1G hugepagesz=1G

There are several options to indicate the memory to reserve:

  • As a boot option, adding **hugepages=24** to the grub command line (reserves 24 GB)
  • With a hugetlb-gigantic-pages.service for modern kernels. For a RHEL-based Linux system, you need to create a configuration file /usr/lib/systemd/system/hugetlb-gigantic-pages.service with this content:
[Unit]
Description=HugeTLB Gigantic Pages Reservation
DefaultDependencies=no
Before=dev-hugepages.mount
ConditionPathExists=/sys/devices/system/node
ConditionKernelCommandLine=hugepagesz=1G
      
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/hugetlb-reserve-pages

[Install]
WantedBy=sysinit.target

Then set the hugepages at each NUMA node. For instance, in a system with 2 NUMA nodes, in case we want to reserve 4 GB for the host OS (2 GB on each NUMA node) and all remaining memory for hugepages:

totalmem=`dmidecode --type 17|grep Size |grep MB |gawk '{suma+=$2} END {print suma/1024}'`
hugepages=$(($totalmem-4))
echo $((hugepages/2)) > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo $((hugepages/2)) > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
  • Copy the last two lines into the /usr/lib/systemd/hugetlb-reserve-pages script for automatic execution after boot (a sketch of the resulting script is shown below)
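
A minimal sketch of the resulting /usr/lib/systemd/hugetlb-reserve-pages script, reusing the computation above (alternatively, hard-code the calculated values):

#!/bin/sh
#reserve all memory except 4GB (2GB per NUMA node) as 1G hugepages, as computed above
totalmem=`dmidecode --type 17|grep Size |grep MB |gawk '{suma+=$2} END {print suma/1024}'`
hugepages=$(($totalmem-4))
echo $((hugepages/2)) > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo $((hugepages/2)) > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

Make the script executable and enable the service so that it runs at boot:

chmod +x /usr/lib/systemd/hugetlb-reserve-pages
systemctl enable hugetlb-gigantic-pages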

CPU isolation

  • Isolate CPUs so that the host OS is restricted to run on the first core of each NUMA node, by adding the isolcpus field to the grub command line. For instance:
isolcpus=1-9,11-19,21-29,31-39
  • The exact CPU numbers might differ depending on the CPU numbering presented by the host OS. In the previous example, CPUs 0, 10, 20 and 30 are excluded because CPU 0 and its sibling 20 correspond to the first core of NUMA node 0, and CPU 10 and its sibling 30 correspond to the first core of NUMA node 1. Running this awk script will suggest the value to use in your compute node:
gawk 'BEGIN{pre=-2;} ($1=="processor"){pro=$3;} ($1=="core" && $4!=0){ if (pre+1==pro){endrange="-" pro} else{cpus=cpus endrange sep pro; sep=","; endrange="";}; pre=pro;} END{printf("isolcpus=%s\n",cpus endrange);}' /proc/cpuinfo
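
To double-check which logical CPUs are siblings of the same physical core and to which NUMA node they belong, you can inspect the CPU topology, for instance:

#list logical CPUs with their NUMA node, socket and physical core
lscpu --extended=CPU,NODE,SOCKET,CORE
#siblings of a given CPU (e.g. CPU 0) can also be checked in sysfs
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list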

Deactivating KSM

KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page. This page is then marked copy on write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest virtual machine.

KSM has a performance overhead which may be too large for certain environments or host physical machine systems.

KSM can be deactivated by stopping the ksmtuned and ksm services. Stopping the services deactivates KSM, but the change does not persist after restarting.

# service ksmtuned stop
Stopping ksmtuned:                                         [  OK  ]
# service ksm stop
Stopping ksm:                                              [  OK  ]

Persistently deactivate KSM with the chkconfig command. To turn off the services, run the following commands:

# chkconfig ksm off
# chkconfig ksmtuned off
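
On recent systemd-based distros where chkconfig is not available, the equivalent would be the following (assuming the ksm and ksmtuned services exist on your distro):

#stop and persistently disable KSM on a systemd-based system
systemctl stop ksmtuned ksm
systemctl disable ksmtuned ksm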

Check [RHEL 7 - THE KSM TUNING SERVICE](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/chap-KSM.html) for more information.

Enabling SR-IOV

We assume that you are using Intel X520 NICs (based on Intel 82599 controller) or Intel Fortville NICs. In case you are using other NICs, the configuration might be different.

  • Configure several virtual functions (e.g. 8 is an appropriate value) on each 10G network interface. A larger number can be configured if desired. (This procedure is provisional, since it does not always work for all NIC cards.)
for iface in `ifconfig -a | grep ": " | cut -f 1 -d":" | grep -v -e "_" -e "\." -e "lo" -e "virbr" -e "tap"`
do
  driver=`ethtool -i $iface| awk '($0~"driver"){print $2}'`
  if [ "$driver" == "i40e" -o "$driver" == "ixgbe" ]
    #Create 8 SR-IOV per PF
    echo 0 >  /sys/bus/pci/devices/`ethtool -i $iface | awk '($0~"bus-info"){print $2}'`/sriov_numvfs
    echo 8 >  /sys/bus/pci/devices/`ethtool -i $iface | awk '($0~"bus-info"){print $2}'`/sriov_numvfs
  fi
done
  • For Niantic X520 NICs the max_vfs parameter must be set to work around a bug in the ixgbe driver when managing VFs through the sysfs interface:
echo "options ixgbe max_vfs=8" >> /etc/modprobe.d/ixgbe.conf
  • Blacklist the ixgbevf module by adding the following to the grub command line. This driver is blacklisted because it causes the VLAN tag of broadcast packets not to be properly removed when they are received on an SR-IOV port.
modprobe.blacklist=ixgbevf
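
After applying the configuration (and rebooting if grub options were changed), you can verify that the virtual functions were created, for example:

#list the SR-IOV virtual functions exposed by the NICs
lspci | grep -i "Virtual Function"
#check the number of VFs configured on a given PF (0000:05:00.0 is just a placeholder PCI address)
cat /sys/bus/pci/devices/0000:05:00.0/sriov_numvfs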

Pre-provision of Linux bridges

Openvim relies on Linux bridges to interconnect VMs when there are no high performance requirements for I/O. This is the case of control plane VNF interfaces that are expected to carry a small amount of traffic.

A set of Linux bridges must be pre-provisioned on every host. Every Linux bridge must be attached to a physical host interface with a specific VLAN. In addition, an external switch must be used to interconnect those physical host interfaces. Bear in mind that the host interfaces used for data plane VM interfaces will be different from the host interfaces used for control plane VM interfaces.

For example, in RHEL7.0, to create a bridge associated to the physical "em1" interface, two files per bridge must be added in the /etc/sysconfig/network-scripts folder:

  • File with name ifcfg-virbrManX with the content:
DEVICE=virbrManX
TYPE=Bridge
ONBOOT=yes
DELAY=0
NM_CONTROLLED=no
USERCTL=no
  • File with name em1.200X (using VLAN tag 200X) with the content:
DEVICE=em1.200X
ONBOOT=yes
NM_CONTROLLED=no
USERCTL=no
VLAN=yes
BOOTPROTO=none
BRIDGE=virbrManX

The name of the bridge and the VLAN tag can be different. If you use a different name for the bridge, you must take it into account in 'openvimd.cfg'.
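
Once the network scripts are in place, you can check that the bridge and the VLAN sub-interface are up. A sketch, assuming the example names above and a RHEL7 system:

systemctl restart network      #reload the network scripts (RHEL7)
brctl show virbrManX           #the VLAN sub-interface should appear attached to the bridge
ip link show em1.200X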

Additional configuration to allow access from openvim

  • Uncomment the following lines in **/etc/libvirt/libvirtd.conf** to allow external connections to libvirtd:
unix_sock_group = "libvirt"
unix_sock_rw_perms = "0770"
unix_sock_dir = "/var/run/libvirt"
auth_unix_rw = "none"
  • Create and configure a user to access the compute node from openvim. The user must belong to group libvirt.
#create a new user
useradd -m -G libvirt <user>
#or modify an existing user
usermod -a -G libvirt <user>
  • Allow <user> to get root privileges without a password, for example by granting this to all members of the libvirt group:
sudo visudo # add the line:   %libvirt ALL=(ALL) NOPASSWD: ALL
       
  • Copy the ssh key of openvim into the compute node. From the machine where openvim is running (not from the compute node), run:
ssh-keygen  #needed to generate ssh keys if not done before
ssh-copy-id <user>@<compute host>
  • After that, ensure that you can log in from the openvim machine to the compute host without a password prompt:
ssh <user>@<compute host>
  • Configure access to the image repository
    • The way openvim deals with images is a bit different from other CMSs. Instead of copying the images during on-boarding, openvim assumes that images are locally accessible on each compute node in a local folder that is identical for all compute nodes. This does not mean that the images must be copied to each compute node disk.
    • Typically this is done by storing all images in a remote shared location accessible by all compute nodes (e.g. a NAS) and mounting the shared folder locally via NFS on the same local folder on each compute node (a mount sketch is shown at the end of this section).
    • VNF descriptors contain image paths pointing to a location in that folder. When doing the on-boarding, the image will be copied from the image path (accessible through NFS) to the on-boarding folder, whose configuration is described next.
  • Create a local folder for image on-boarding and grant access from openvim. A local folder for image on-boarding must be created on each compute node (in the default configuration, we assume that the folder is /opt/VNF/images). This folder must be created on a disk with enough space to store the images of the active VMs. If there is only a root partition in the server, the recommended procedure is to link the openvim required folder to the standard libvirt folder for holding images:
mkdir -p /opt/VNF/
ln -s /var/lib/libvirt/images /opt/VNF/images
chown -R <user>:nfvgroup /opt/VNF
chown -R root:nfvgroup /var/lib/libvirt/images
chmod g+rwx /var/lib/libvirt/images
  • In case there is a partition (e.g. "/home") that contains more disk space than the "/" partition, we suggest using that partition, although a soft link can be created anywhere else. As an example, this is what our script for automatic installation in RHEL7.0 does:
mkdir -p /home/<user>/VNF_images
rm -f /opt/VNF/images
mkdir -p /opt/VNF/
ln -s /home/<user>/VNF_images /opt/VNF/images
chown -R <user> /opt/VNF
  • In addition, on a SELinux system, access to that folder must be granted to the libvirt group:
# SElinux management
semanage fcontext -a -t virt_image_t "/home/<user>/VNF_images(/.*)?"
cat /etc/selinux/targeted/contexts/files/file_contexts.local |grep virt_image
restorecon -R -v /home/<user>/VNF_images
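
Finally, as mentioned above, a common way of making the image repository available in the same local folder on every compute node is mounting it via NFS. A minimal sketch, assuming a hypothetical NFS server nfs-server exporting /export/VNF_images:

#mount the shared image repository on the common local folder (hypothetical server and export path)
mount -t nfs nfs-server:/export/VNF_images /opt/VNF/images
#make the mount persistent across reboots
echo "nfs-server:/export/VNF_images /opt/VNF/images nfs defaults 0 0" >> /etc/fstab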

Compute node configuration in special cases

Datacenter with different types of compute nodes

In a datacenter with different types of compute nodes, it might happen that compute nodes use different interface naming schemes. In that case, you can take the most used interface naming scheme as the default one, and make an additional configuration in the compute nodes that do not follow the default naming scheme.

In order to do that, you should create a **hostinfo.yaml** file inside the local image folder (typically /opt/VNF/images). It contains entries with:

openvim-expected-name: local-iface-name

For example, if openvim contains a network using macvtap to the physical interface em1 (macvtap:em1) but in this compute node the interface is called eth1, create a **local-image-folder/hostinfo.yaml** file with this content:

em1: eth1

Configure compute node in 'developer' mode

In order to test a VM, it is not really required to have a full NFV environment with 10G data plane interfaces and Openflow switches. If the VM is able to run with virtio interfaces, you can configure a compute node in a simpler way and use the 'developer mode' in openvim. In that mode, during the instantiation phase, VMs are deployed without hugepages and with all data plane interfaces changed to virtio interfaces. It must be noted that openvim flavors do not change and remain identical (including all EPA attributes); openvim performs an intelligent translation during the instantiation phase.

The configuration of a compute node to be used in 'developer mode' removes the configuration that is not needed for testing purposes, that is:

  • IOMMU configuration is not required since no passthrough or SR-IOV interfaces will be used
  • Huge pages configuration is unnecessary. All memory will be assigned in 4KB pages, allowing oversubscription (as in traditional clouds).
  • No configuration of data plane interfaces (e.g. SR-IOV) is required.

A VNF developer will typically use the developer mode in order to test their VNF on their own computer. Although part of the configuration is not required, the rest of the compute node configuration is still necessary. In order to prepare your own computer or a separate one as a compute node for development purposes, you can use the script configure-compute-node-develop.sh, which can be found in the OSM/openvim repo, under the scripts folder.

In order to execute the script, just run this command:

sudo ./configure-compute-node-develop.sh <user> <iface>