
How to install OpenStack Kolla-ansible on top of a Bright Cluster with CentOS 8

** This document was tested on Bright 9.0 with CentOS 8.1

Preparing the Cluster

In this section, we will walk through the steps required to prepare the cluster to be a Kolla-Ansible OpenStack cluster.

The head node of the cluster will be used as the deployment host only. In theory, the head node could also be used as a controller/network/hypervisor/storage/etc. node, but this is not recommended, because Kolla-Ansible will create a new MariaDB instance for the OpenStack cluster, besides the other configuration changes required by the Docker containers on the controller/network nodes.

The ease of configuring and deploying Kolla-Ansible depends highly on the uniformity of the network interface names which will be used by OpenStack on the controller/network/hypervisor/storage/etc. nodes, so it’s highly recommended to unify the network interface names across all the nodes, including the head node if it will be used as the deployment node.

Unifying the network interface names on CentOS 8 can be done in several ways:

  1. Switch to traditional naming conventions for network interfaces:
  • For the compute nodes, this can be done by appending net.ifnames=0 to the kernel parameters in the software image:
# cmsh
% softwareimage use <image-name>
% append kernelparameters " net.ifnames=0"
% commit
  • For the head node(s), this can be done by updating the GRUB_CMDLINE_LINUX parameters in /etc/default/grub and re-generating /boot/grub2/grub.cfg:
# grep net /etc/default/grub
GRUB_CMDLINE_LINUX="vconsole.keymap=us crashkernel=auto vconsole.font=latarcyrheb-sun16 rd.driver.blacklist=nouveau net.ifnames=0"
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
  2. Specify the MAC address and device name in the corresponding ifcfg-* files, as sketched below.
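A minimal sketch of such an ifcfg file, assuming the interface should always come up as eth0 (the MAC address and BOOTPROTO values are placeholders; use the values that apply to your hardware). For compute nodes, the file should be edited inside the software image (e.g. under /cm/images/default-image/etc/sysconfig/network-scripts/):

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:11:22:33:44:55
BOOTPROTO=dhcp
ONBOOT=yes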

Kolla-ansible is installed via "python-pip", so it’s mandatory to update "pip" to the latest version:

# pip3 install -U pip

By default, Bright assumes that there is no important data stored on the compute nodes, so each time a compute node is rebooted, the software image is re-synced to the local drive of the compute node, adding any missing files and wiping out any files that don’t exist in the software image.

To avoid wiping out the data that will be introduced by the Kolla-ansible deployment, we will need to add the following paths to the sync/update exclude lists:

– /var/lib/docker
– /var/lib/docker/*
– /etc/kolla
– /etc/kolla/*

To avoid pulling back unnecessary data created by the Kolla-ansible deployment when grabbing the software image from a compute node, we will need to add the following paths to the grab exclude lists:

– /var/lib/docker
– /var/lib/docker/*
– /etc/kolla
– /etc/kolla/*

** Note: The exclude lists are better updated at the category level, as shown in the cmsh sketch below. If the setup will only include one compute node, then it may be better to update the exclude lists directly on the node.
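A minimal cmsh sketch, assuming the compute nodes are in the default category and that the standard sync/update/grab exclude-list properties are used (each set command opens an editor, in which the paths listed above can be appended):

# cmsh
% category use default
% set excludelistsyncinstall
% set excludelistupdate
% set excludelistgrab
% set excludelistgrabnew
% commit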

Installing Ansible

Kolla-ansible requires Ansible version 2.8 or higher, up to version 2.9, as of the writing of this document (June 2020):

yum install epel-release -y
yum install ansible -y

Configuring Ansible (optional)

Tuning the configuration of Ansible may improve the deployment time of Kolla-ansible, so it’s suggested to tune the Ansible configuration for better results:

  • Increase the value of the forks as you see necessary. The forks parameter controls how many hosts can be configured in parallel, so if you have the processing power on the head node to run several forks, then it’s a good practice to increase the number of forks.
  • Disable host_key_checking. Ansible has host key checking enabled by default. If a host is reinstalled and has a different key in ‘known_hosts’, then this will result in an error message until corrected. If a host is not initially in ‘known_hosts’ this will result in prompting for confirmation of the key, which will require an interactive intervention.
  • Enable pipelining. Enabling pipelining reduces the number of SSH operations required to execute a module on the remote server. This can result in a significant performance improvement when enabled. By default, this option is disabled.
# grep -E "forks|host_key_checking|pipelining" /etc/ansible/ansible.cfg | grep -v "^#"
forks          = 100
host_key_checking = False
pipelining = True

Installing Kolla-ansible

  • Kolla-ansible and its dependencies can be installed using python pip:
# pip install kolla-ansible
  • Create default configuration directory. By default, Kolla-ansible expects the global configurations to be stored under /etc/kolla on the deployment node.
# mkdir /etc/kolla
  • Copy the default globals.yml and passwords.yml templates from the Kolla-ansible installation directory:
# cp -r /usr/local/share/kolla-ansible/etc_examples/kolla/* /etc/kolla
  • Copy the all-in-one and multinode inventory files to the current directory:
# cp /usr/local/share/kolla-ansible/ansible/inventory/* .

Preparing initial Kolla-ansible configurations

The globals.yml is the main configuration file for Kolla-Ansible. There are a few parameters that are required to deploy Kolla-ansible:

  • Kolla-ansible provides container images for deploying OpenStack on several Linux distributions as the host system: CentOS 7/8, Ubuntu 18.04, Debian, and RHEL 7/8. For the purpose of this document, we will use CentOS 8 as the host system. Using the kolla_base_distro parameter, "centos" can be specified to refer to CentOS as the base distribution.
  • The type of installation can be either binary (using repositories like apt or yum) or source (using raw source archives, git repositories, or a local source directory). According to the official Kolla-ansible documentation, the source type has proven to be slightly more reliable. The install type can be specified using the kolla_install_type parameter.
  • The release of OpenStack to be deployed has to be specified using the openstack_release parameter. For the purpose of this document, we will use the OpenStack train.
  • Kolla-ansible requires two main networking options: network_interface, the default interface for multiple management-type networks (tenant networks), and neutron_external_interface, a dedicated network interface for Neutron external/public networks (provider networks), which can be vlan or flat, depending on how the networks are created. This interface should be active without an IP address on the controller/network nodes; otherwise, OpenStack instances won’t be accessible from the external networks. For the purpose of this document, we will use flat networks.
  • A floating IP for management traffic should be specified using kolla_internal_vip_address. This IP will be managed by keepalived to provide high availability and should be a free IP address in the internal management network that is connected to the same network_interface.
  • Additional services can be enabled. For the purpose of this document, we will enable the cinder service using enable_cinder and enable_cinder_backend_nfs parameters. The cinder service with NFS backend requires an additional configuration file that has to be placed under /etc/kolla/config/ with the name of nfs_shares. This file includes the name/IP of the NFS servers and the path on the server. The file can include several NFS servers and different paths:
# cat /etc/kolla/config/nfs_shares
10.141.0.1:/cinder
# grep -vE "^#|$^" /etc/kolla/globals.yml

kolla_base_distro: "centos"
kolla_install_type: "source"
openstack_release: "ussuri"
kolla_internal_vip_address: "10.141.255.245"
network_interface: "eth0"
neutron_external_interface: "eth1"
enable_cinder: "yes"
enable_cinder_backend_nfs: "yes"

** Please make sure that the NFS server can accept mount requests from the OpenStack nodes. Otherwise, OpenStack will fail to launch instances because it will fail to allocate storage for them.
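For example, a hypothetical /etc/exports entry on the NFS server (here assumed to be 10.141.0.1, exporting /cinder to the internal 10.141.0.0/16 network), followed by re-exporting the shares:

# cat /etc/exports
/cinder 10.141.0.0/16(rw,sync,no_root_squash)
# exportfs -ra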

** OpenStack release “train” is used with kolla-ansible 9.1.0. If a higher version of kolla-ansible is to be used, then another release for OpenStack should be used. For example, kolla-ansible version 10.0.0 can be used with OpenStack release “ussuri”.

The multinode template file is an ansible inventory file that has the main server groups for different OpenStack components that will be installed.

** DO NOT REMOVE THE CONTENT OF THE “multinode” FILE. WHAT IS SHOWN IN THE FOLLOWING SNIPPET IS JUST THE FIRST 20 LINES WHICH YOU NEED TO EDIT. THE REST WILL REMAIN THE SAME AND SHOULD NOT BE REMOVED.

# grep -vE "^#|$^" multinode | head -n 20
[control]
node001
node002
node003
[network]
node001
node002
node003
[compute]
node001
node002
node003
[monitoring]
node001
node002
node003
[storage]
localhost       ansible_connection=local
[deployment]
localhost       ansible_connection=local
[baremetal:children]
control
network
compute
storage
monitoring

Deploying Kolla-Ansible

  • Make sure that the nodes specified in the inventory are all reachable:
# ansible -i multinode all -m ping
  • The template file /etc/kolla/passwords.yml is a placeholder for all the passwords that will be used during the kolla-ansible deployment:
# grep -vE "^#|$^" /etc/kolla/passwords.yml | head

ceph_cluster_fsid:
ceph_rgw_keystone_password:
rbd_secret_uuid:
cinder_rbd_secret_uuid:
database_password:
mariadb_backup_database_password:
docker_registry_password:
opendaylight_password:
vmware_dvs_host_password:
  • All passwords in passwords.yml are blank and have to be filled either manually or by running a random password generator:
# kolla-genpwd
  • Bootstrap all nodes with Kolla deploy dependencies. This will include installing packages and writing out configuration files:
# kolla-ansible -i ./multinode bootstrap-servers
  • Run the prechecks for deployment for hosts. This will make sure that the packages have been installed successfully and the configuration files do exist in the expected location:
# kolla-ansible -i ./multinode prechecks
  • If the previous two steps did run without failures, then you can proceed to deploy OpenStack:
# kolla-ansible -i ./multinode deploy
  • After the deployment is successful, an openrc file, in which the admin user’s credentials are listed, can be generated. By running the following command, an admin-openrc.sh file will be generated under the /etc/kolla directory on the deployment (head) node:
# kolla-ansible post-deploy
  • Now the changes which have been applied to the compute nodes have to be grabbed back into the software image of the compute nodes to avoid losing them:
# cmsh
% device use node001
% grabimage -w
  • Make sure that SELinux is still disabled and not set to permissive:
# grep -w SELINUX /cm/images/default-image/etc/selinux/config | grep -v "^#"
SELINUX=disabled

Installing OpenStack CLI client

To start using the OpenStack cluster which has just been deployed, the OpenStack CLI needs to be installed first and the openrc file needs to be sourced:

# pip install python-openstackclient
# . /etc/kolla/admin-openrc.sh

Initializing the OpenStack environment for the first use

To start using OpenStack to create instances and access them from the external network, the /usr/local/share/kolla-ansible/init-runonce script has to be run. This script uses "10.0.2.0/24" as the external network. In most cases, init-runonce has to be adjusted to match the actual external network:

# This EXT_NET_CIDR is your public network,that you want to connect to the internet via.
ENABLE_EXT_NET=${ENABLE_EXT_NET:-1}
EXT_NET_CIDR=${EXT_NET_CIDR:-'10.0.2.0/24'}
EXT_NET_RANGE=${EXT_NET_RANGE:-'start=10.0.2.150,end=10.0.2.199'}
EXT_NET_GATEWAY=${EXT_NET_GATEWAY:-'10.0.2.1'}
[...]
if [[ $ENABLE_EXT_NET -eq 1 ]]; then
  openstack network create --external --provider-physical-network physnet1 \
      --provider-network-type flat public1
  openstack subnet create --no-dhcp \
      --allocation-pool ${EXT_NET_RANGE} --network public1 \
      --subnet-range ${EXT_NET_CIDR} --gateway ${EXT_NET_GATEWAY} public1-subnet
fi
openstack network create --provider-network-type vxlan demo-net
openstack subnet create --subnet-range 10.0.0.0/24 --network demo-net \
  --gateway 10.0.0.1 --dns-nameserver 8.8.8.8 demo-subnet
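Instead of editing the script in place, the same variables can be overridden from the environment, since the script only uses them as defaults. A sketch assuming the external flat network is 192.168.1.0/24 with its gateway at 192.168.1.1 (matching the floating IPs used later in this document):

# . /etc/kolla/admin-openrc.sh
# EXT_NET_CIDR='192.168.1.0/24' \
  EXT_NET_RANGE='start=192.168.1.150,end=192.168.1.199' \
  EXT_NET_GATEWAY='192.168.1.1' \
  bash /usr/local/share/kolla-ansible/init-runonce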

Typically, provider networks are directly associated with a physical network but that is not a requirement. Users create tenant networks for connectivity within projects. By default, these tenant networks are fully isolated and are not shared with other projects. OpenStack Networking supports the following types of network isolation and overlay technologies:

Flat

All instances reside on the same network, which can also be shared with the hosts. No VLAN tagging or other network segregation takes place.

VLAN

Networking allows users to create multiple provider or tenant networks using VLAN IDs (802.1Q tagged) that correspond to VLANs present in the physical network. This allows instances to communicate with each other across the environment. They can also communicate with dedicated servers, firewalls, load balancers, and other networking infrastructure on the same layer 2 VLAN.

GRE and VXLAN

VXLAN and GRE are encapsulation protocols that create overlay networks to activate and control communication between compute instances. A Networking router is required to allow traffic to flow outside of the GRE or VXLAN tenant network. A router is also required to connect directly-connected tenant networks with external networks, including the Internet. The router provides the ability to connect to instances directly from an external network using floating IP addresses.

For the purpose of this document, we will create a provider network of type flat. After adjusting the init-runonce script and running it, a provider network and a tenant network will be created, connected by a router. Next, we will deploy the first OpenStack instance and access it from the external network.

Deploying the first Instance

The OpenStack dashboard can be accessed via the Virtual IP (VIP) which was configured earlier in the process. The following steps show how to create the first instance and access it from the external network using the OpenStack CLI. Similar steps can be followed from the OpenStack dashboard to achieve similar results.

  • Create a demo2 instance and attach it to the demo-net created by the init-runonce script:
# openstack server create --image cirros --flavor m1.tiny --key-name mykey --network demo-net demo2
  • Allocate a floating IP on the public (provider) network created by the init-runonce script:
# openstack floating ip create public1
# openstack floating ip list -c "Floating IP Address" -c "Port"
+---------------------+--------------------------------------+
| Floating IP Address | Port                                 |
+---------------------+--------------------------------------+
| 192.168.1.183       | ab1fa4b7-3e8d-4bcc-8b48-a53ad3360fa2 |
| 192.168.1.158       | None                                 |
+---------------------+--------------------------------------+
  • Associate the floating IP with the demo2 instance created in the first step:
# openstack server add floating ip demo2 192.168.1.158
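  • Optionally, confirm that the floating IP now shows up on the instance:
# openstack server show demo2 -c addresses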
  • As a sanity check, make sure that the namespaces are updated correctly on the controller/network node:
# ip netns exec qrouter-21c38859-4375-4bb4-9207-c74e45e12602 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
      valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
16: qr-6f76c4b6-21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:8e:f1:7c brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 brd 10.0.0.255 scope global qr-6f76c4b6-21
      valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8e:f17c/64 scope link
      valid_lft forever preferred_lft forever
17: qg-3660ae4c-b8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:51:81:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.189/24 brd 192.168.1.255 scope global qg-3660ae4c-b8
      valid_lft forever preferred_lft forever
    inet 192.168.1.183/32 brd 192.168.1.183 scope global qg-3660ae4c-b8
      valid_lft forever preferred_lft forever
    inet 192.168.1.158/32 brd 192.168.1.158 scope global qg-3660ae4c-b8
      valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe51:81b3/64 scope link
      valid_lft forever preferred_lft forever
  • SSH into the OpenStack instance using the associated floating IP address from the external network:
# ssh -l cirros 192.168.1.158
Warning: Permanently added '192.168.1.158' (ECDSA) to the list of known hosts.
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
      valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast qlen 1000
    link/ether fa:16:3e:08:bd:30 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.67/24 brd 10.0.0.255 scope global eth0
      valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe08:bd30/64 scope link
      valid_lft forever preferred_lft forever

*** NEW SECTION ***

How to integrate Kolla-ansible with an external Ceph Storage

Using Ceph as backend storage for different OpenStack services greatly reduces network traffic and increases performance since Ceph can clone images/volumes/ephemeral disks instead of copying them. In addition, it makes migrating between OpenStack deployments much simpler.

Preparing the Ceph Storage Pools for OpenStack Services

Glance Service

By default, i.e. when not using Ceph, Glance images are stored locally on the controller nodes. When they are needed for creating a VM they are copied to the compute hosts (hypervisors). The hypervisors can then cache those images. But the images need to be copied again, every time an image is updated, or whenever a hypervisor undergoes a FULL install.

If Glance is configured with Ceph as the storage backend, things work differently. In such case, Glance stores images in Ceph, not on the head node. Depending on the format of the image, they might still be downloaded from Ceph to the hypervisor node (and get cached there).

If both Glance and Cinder are configured to use Ceph, it’s possible to avoid copying the image from Ceph into the Hypervisors. This can be done using ‘copy-on-write’ (CoW).

With CoW, the storage volume of a VM is thinly pre-provisioned (from the image) directly in the Ceph backend. In other words, the image does not have to be copied from glance into the hypervisor, nor from hypervisor to Cinder. 

If Ceph is available, we recommend using it for both Glance and Cinder so that the two services can make use of CoW. Note that for CoW to work, the format of the images stored in Glance needs to be "raw". Using CoW often results in the best VM creation times and the least load on the infrastructure. The drawback of using CoW is increased disk I/O latency if the VM ends up modifying existing files on its CoW-created volume, because the data needs to be copied the first time a unique block is written.
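Since CoW cloning requires raw images, a qcow2 image can be converted with qemu-img before being uploaded to Glance. A short sketch using the CirrOS image referenced later in this document (the file names are examples):

# qemu-img convert -f qcow2 -O raw /tmp/cirros-0.4.0-x86_64-disk.img /tmp/cirros-0.4.0-x86_64-disk.raw
# openstack image create --disk-format raw --container-format bare \
    --file /tmp/cirros-0.4.0-x86_64-disk.raw cirros-raw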

To configure Glance with Ceph, on the Ceph cluster, run the following commands:

  1. Create a pool for glance images:
# ceph osd pool create images 64
  2. Create a user for the glance service which has write privileges to the pool created in the previous step, and store its keyring:
# ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' -o /etc/ceph/ceph.client.glance.keyring
  3. Verify that the pool has been created properly and that the user has the required privileges:
# ceph osd pool ls
images
# rados lspools
images
# ceph auth ls
[...]
client.glance
        key: AQDniwRfNVzlJRAAKa9QDXlMGDerCPY7J8pAGw==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=images
[...]

** Note: How to choose Placement Group Number for a Ceph pool:

When creating a Ceph OSD pool, it is mandatory to specify a value for placement group number (pg_num) because it cannot be calculated automatically. From the Ceph online manual, we list here some recommendations:

  • For less than 5 OSDs, set pg_num to 128.
  • For between 5 and 10 OSDs, set pg_num to 512.
  • For between 10 and 50 OSDs, set pg_num to 4096.
  • If you have more than 50 OSDs, you need to understand the tradeoffs and calculate the pg_num value yourself.
  • For calculating the pg_num value yourself, the pgcalc tool can help.

As the number of OSDs increases, choosing the right value for pg_num becomes more important because it has a significant influence on the behavior of the Ceph cluster as well as the durability of the data when something goes wrong (i.e. the probability that a catastrophic event leads to data loss).
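As a worked example of the commonly cited rule of thumb, pg_num ≈ (number of OSDs × 100) / replica count, rounded up to the nearest power of two: a hypothetical cluster with 9 OSDs and a replica count of 3 gives

(9 × 100) / 3 = 300  →  next power of two = 512

which is in line with the recommendation above for 5 to 10 OSDs.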

Cinder Service

Cinder is the block storage service in OpenStack. It’s responsible for storing volumes and volume snapshots that can be attached to VMs or can be used to launch instances. In this document, we have demonstrated how to use NFS as a backend storage for Cinder service.

Cinder can be configured to use Ceph as its backend storage instead of NFS. 

On the Ceph cluster, run the following commands:

  1. Create a pool for cinder volumes:
# ceph osd pool create volumes 32
  2. Create a user for the cinder service which has write privileges to the pool created in the previous step, and store its keyring:
# ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images' -o /etc/ceph/ceph.client.cinder.keyring
  3. Verify that the pool has been created properly and that the user has the required privileges:
# ceph osd pool ls
images
volumes
# ceph auth ls
[...]
client.cinder
        key: AQAZeQRfpfx0DRAAigGXZf4jbas9jQNDMUcIog==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images
[...]

Nova Service

Nova is the compute service within OpenStack. By default, Nova stores the ephemeral disks associated with running VMs locally on the hypervisors, under /var/lib/nova/instances. There are a few drawbacks to using local storage on the hypervisors: large images can cause the filesystem to fill up, thus crashing compute nodes, and a disk crash on a hypervisor could cause the loss of the virtual disks, making VM recovery impossible. (If VMs are used with Cinder-managed volumes, those volumes are not stored on the hypervisors; they are stored in whatever is configured as the Cinder backend.)

Nova can be configured to use Ceph as its backend storage.

On the Ceph cluster, run the following commands:

  1. Create a pool for nova ephemeral disks:
# ceph osd pool create vms 128
  2. Create a user for the nova service which has write privileges to the pool created in the previous step, and store its keyring:
# ceph auth get-or-create client.nova mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=vms, allow rx pool=images' -o /etc/ceph/ceph.client.nova.keyring
  3. Verify that the pool has been created properly and that the user has the required privileges:
# ceph osd pool ls
vms
images
volumes
[...]
# ceph auth ls
[...]
client.nova
        key: AQB6dwRfgrhaGRAAxXB/6YKcXIW+eYdoiSSMlg==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=vms, allow rx pool=images
[...]

Preparing Kolla-ansible configurations to use Ceph Storage as backend storage for Glance/Cinder/Nova services

The globals.yml is the main configuration file for Kolla-Ansible. In addition to the parameters mentioned at the very beginning of this document, there are a few other parameters that are required to enable several OpenStack services to use Ceph as their backend storage:

  • The parameter ceph_{ service }_user is used to specify the Ceph user that will be used to write to the Ceph pool for a particular service. The variable "{ service }" should be replaced with the service name, like glance, cinder, or nova.
  • The parameter ceph_{ service }_pool_name is used to specify the name of the pool created in Ceph which will be used by a particular service. The variable “{ service }” should be replaced with the service name like glance, cinder, or nova.
  • For Ceph users to be able to read/write Ceph pools a keyring is required. The keyring is specified by ceph_{ service }_keyring which points to the location of the keyring file for specific service. The variable “{ service }” should be replaced with the service name like glance, cinder or nova.
  • The parameter { service }_backend_ceph is used to enable/disable a particular service to use Ceph as its backend storage. The variable “{ service }” should be replaced with the service name like glance, cinder, or nova.
# grep -vE "^#|$^" /etc/kolla/globals.yml
---
kolla_base_distro: "centos"
kolla_install_type: "source"
openstack_release: "train"
kolla_internal_vip_address: "10.141.255.245"
network_interface: "eth0"
neutron_external_interface: "eth1"
enable_cinder: "yes"
ceph_glance_keyring: ceph.client.glance.keyring
ceph_glance_user: glance
ceph_glance_pool_name: images
ceph_cinder_keyring: ceph.client.cinder.keyring
ceph_cinder_user: client.admin
ceph_cinder_pool_name: volumes
ceph_cinder_backup_keyring: ceph.client.cinder-backup.keyring
ceph_cinder_backup_user: client.admin
ceph_cinder_backup_pool_name: backups
ceph_nova_keyring: ceph.client.nova.keyring
ceph_nova_user: nova
ceph_nova_pool_name: vms
glance_backend_ceph: "yes"
cinder_backend_ceph: "yes"
nova_backend_ceph: "yes"

** Note: the "ceph_cinder_backup_pool_name" is used here as the same pool as the "ceph_cinder_pool_name"; however, it’s not strictly necessary for both pools to be the same. You can create a separate pool for cinder backups, as sketched below.
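If a dedicated backups pool is preferred, it can be created on the Ceph cluster in the same way as the other pools; a sketch assuming a separate cinder-backup user and keyring (the placement group count of 32 is only an example):

# ceph osd pool create backups 32
# ceph auth get-or-create client.cinder-backup mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=backups' -o /etc/ceph/ceph.client.cinder-backup.keyring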

Glance Configuration

The configuration files which will tell Glance how to use Ceph should be placed in the location where Kolla-ansible would expect to find them:

  1. Create a configuration directory for glance:
# mkdir -p /etc/kolla/config/glance
  2. Copy ceph.conf to the same directory created in the previous step:
# cp /etc/ceph/ceph.conf /etc/kolla/config/glance/
  3. Copy the keyring to the same directory:
# cp /etc/ceph/ceph.client.glance.keyring /etc/kolla/config/glance/
  4. Create glance-api.conf in the same directory with the following contents:
# cat /etc/kolla/config/glance/glance-api.conf
[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

Cinder Configuration

The configuration files which will tell cinder how to use Ceph should be placed in the location where Kolla-ansible would expect to find them:

  1. Create a configuration directory for cinder:
# mkdir -p /etc/kolla/config/cinder/{cinder-volume,cinder-backup}
  2. Copy ceph.conf to the same directory created in the previous step:
# cp /etc/ceph/ceph.conf /etc/kolla/config/cinder/ceph.conf
  3. Copy the keyring to the cinder-volume and cinder-backup subdirectories:
# cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-volume/ceph.client.cinder.keyring
# cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-backup/ceph.client.cinder.keyring
# cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-backup/ceph.client.cinder-backup.keyring
  4. Create cinder-volume.conf and cinder-backup.conf under the /etc/kolla/config/cinder directory with the following contents:
# cat /etc/kolla/config/cinder/cinder-volume.conf
[DEFAULT]
enabled_backends=rbd-1

[rbd-1]
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=cinder
backend_host=rbd:volumes
rbd_pool=volumes
volume_backend_name=rbd-1
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_secret_uuid = {{ cinder_rbd_secret_uuid }}

# cat /etc/kolla/config/cinder/cinder-backup.conf
[DEFAULT]
backup_ceph_conf=/etc/ceph/ceph.conf
backup_ceph_user=cinder
backup_ceph_chunk_size = 134217728
backup_ceph_pool=volumes
backup_driver = cinder.backup.drivers.ceph.CephBackupDriver
backup_ceph_stripe_unit = 0
backup_ceph_stripe_count = 0
restore_discard_excess_bytes = true

Note: {{ cinder_rbd_secret_uuid }} can be found in /etc/kolla/passwords.yml
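For example, the generated value can be looked up on the deployment node and substituted into cinder-volume.conf:

# grep cinder_rbd_secret_uuid /etc/kolla/passwords.yml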

Nova Configuration

The configuration files which will tell nova how to use Ceph should be placed in the location where Kolla-ansible would expect to find them:

  1. Create a configuration directory for nova:
# mkdir -p /etc/kolla/config/nova
  2. Copy ceph.conf to the same directory created in the previous step:
# cp /etc/ceph/ceph.conf /etc/kolla/config/nova/
  3. Copy the nova and cinder client keyrings to the same directory:
# cp /etc/ceph/ceph.client.nova.keyring /etc/kolla/config/nova/ceph.client.nova.keyring
# cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/nova/ceph.client.cinder.keyring
  4. Create nova-compute.conf in the same directory with the following contents:
# cat /etc/kolla/config/nova/nova-compute.conf
[libvirt]
images_rbd_pool=vms
images_type=rbd
images_rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=nova

Verify if OpenStack is writing to Ceph Backend Storage

After deploying kolla-ansible following the steps mentioned earlier, you can verify that OpenStack is writing data to Ceph by inspecting the Ceph storage itself:

  1. Test creating images (glance service)
# openstack image create --disk-format qcow2 --container-format bare --file /tmp/cirros-0.4.0-x86_64-disk.img cirros

# rados df
POOL_NAME   USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS     RD WR_OPS     WR USED COMPR UNDER COMPR
images    37 MiB       8      0     24                  0       0        0     90 70 KiB     21 12 MiB        0 B         0 B
vms          0 B       0      0      0                  0       0        0      0    0 B      0    0 B        0 B         0 B
volumes      0 B       0      0      0                  0       0        0      0    0 B      0    0 B        0 B         0 B

total_objects    8
total_used       3.0 GiB
total_avail      297 GiB
total_space      300 GiB

# rados -p images ls
rbd_data.179c773a98d0.0000000000000001
rbd_object_map.179c773a98d0.0000000000000004
rbd_directory
rbd_info
rbd_object_map.179c773a98d0
rbd_header.179c773a98d0
rbd_id.9223ba8b-ddbd-4b69-a5ea-db1fb31ee1e0
rbd_data.179c773a98d0.0000000000000000

As you can see in the previous output, after creating an image, it is present in the Ceph storage.

  2. Test creating a volume (cinder service)
# openstack volume create --size 10 --availability-zone nova mynewvolume
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2020-07-09T15:59:17.000000           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | 1ee10849-f948-46b0-ae33-d7bbd5733c6a |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | mynewvolume                          |
| properties          |                                      |
| replication_status  | None                                 |
| size                | 10                                   |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | __DEFAULT__                          |
| updated_at          | None                                 |
| user_id             | be4c99e41dab462a881b55585dbcfe15     |
+---------------------+--------------------------------------+

# rados df
POOL_NAME    USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS      RD WR_OPS     WR USED COMPR UNDER COMPR
images    112 MiB      20      0     60                  0       0        0    267 209 KiB     59 36 MiB        0 B         0 B
vms           0 B       0      0      0                  0       0        0      0     0 B      0    0 B        0 B         0 B
volumes   576 KiB       5      0     15                  0       0        0     36  27 KiB      6  6 KiB        0 B         0 B

# rados -p volumes ls
rbd_header.a554d675abc8
rbd_directory
rbd_object_map.a554d675abc8
rbd_id.volume-1ee10849-f948-46b0-ae33-d7bbd5733c6a
rbd_info

As you can see in the previous output, after creating a volume, it is present in the Ceph storage.

  3. Test creating a VM (nova service)
# openstack server create --image cirros --flavor m1.tiny --key-name mykey --network demo-net demo1
+-------------------------------------+-----------------------------------------------+
| Field                               | Value                                         |
+-------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                        |
| OS-EXT-AZ:availability_zone         |                                               |
| OS-EXT-SRV-ATTR:host                | None                                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                          |
| OS-EXT-SRV-ATTR:instance_name       |                                               |
| OS-EXT-STS:power_state              | NOSTATE                                       |
| OS-EXT-STS:task_state               | scheduling                                    |
| OS-EXT-STS:vm_state                 | building                                      |
| OS-SRV-USG:launched_at              | None                                          |
| OS-SRV-USG:terminated_at            | None                                          |
| accessIPv4                          |                                               |
| accessIPv6                          |                                               |
| addresses                           |                                               |
| adminPass                           | JqLnjPKb5Awj                                  |
| config_drive                        |                                               |
| created                             | 2020-07-09T16:07:47Z                          |
| flavor                              | m1.tiny (1)                                   |
| hostId                              |                                               |
| id                                  | a2fdb85d-d6bf-4150-ab96-dd22b7d4e9d4          |
| image                               | cirros (2de81533-2410-4c67-baa4-5260cfb9f1a8) |
| key_name                            | mykey                                         |
| name                                | demo1                                         |
| progress                            | 0                                             |
| project_id                          | b108a02fe6b24fffa88b7de24e05b685              |
| properties                          |                                               |
| security_groups                     | name='default'                                |
| status                              | BUILD                                         |
| updated                             | 2020-07-09T16:07:48Z                          |
| user_id                             | be4c99e41dab462a881b55585dbcfe15              |
| volumes_attached                    |                                               |
+-------------------------------------+-----------------------------------------------+

# openstack server list
+--------------------------------------+-------+--------+---------------------+--------+---------+
| ID                                   | Name  | Status | Networks            | Image  | Flavor  |
+--------------------------------------+-------+--------+---------------------+--------+---------+
| a2fdb85d-d6bf-4150-ab96-dd22b7d4e9d4 | demo1 | ACTIVE | demo-net=10.0.0.134 | cirros | m1.tiny |
+--------------------------------------+-------+--------+---------------------+--------+---------+

# rados df                                                                                                                                                                   
POOL_NAME    USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS      RD WR_OPS     WR USED COMPR UNDER COMPR                                                                            
images     37 MiB       8      0     24                  0       0        0    520  13 MiB    105 49 MiB        0 B         0 B                                                                            
vms       207 MiB     138      0    414                  0       0        0   1880  19 MiB    987 58 MiB        0 B         0 B                                                                            
volumes   576 KiB       5      0     15                  0       0        0    306 243 KiB      6  6 KiB        0 B         0 B                                                                            
                                                                                                                                                                                                           
total_objects    151                                                                                                                                                                                       
total_used       3.3 GiB                                                                                                                                                                                   
total_avail      297 GiB                                                                                                                                                                                   
total_space      300 GiB                                                                                                                                                              

# rados -p vms ls | head
rbd_data.a6b31261fa8f.0000000000000003
rbd_data.a6b31261fa8f.000000000000009c
rbd_data.a6b31261fa8f.000000000000007c
rbd_data.a6b31261fa8f.0000000000000028
rbd_data.a6b31261fa8f.0000000000000022
rbd_data.a6b31261fa8f.00000000000000c4
rbd_header.a6b31261fa8f
rbd_data.a6b31261fa8f.000000000000002a
rbd_data.a6b31261fa8f.00000000000000fe
rbd_data.a6b31261fa8f.0000000000000006

Troubleshooting

Issue: the "kolla-ansible deploy" step failed with the following error:

TASK [nova : Running Nova API bootstrap container] *********************************************************************************************************************************************************
fatal: [node002]: FAILED! => {"changed": false, "msg": "Unknown error message: open /var/lib/docker/tmp/GetImageBlob027787082: no space left on device"}
NO MORE HOSTS LEFT *****************************************************************************************************************************************************************************************
PLAY RECAP *************************************************************************************************************************************************************************************************
localhost                  : ok=48   changed=12   unreachable=0    failed=1    skipped=17   rescued=0    ignored=0
node001                    : ok=40   changed=7    unreachable=0    failed=0    skipped=7    rescued=0    ignored=0
node002                    : ok=202  changed=99   unreachable=0    failed=1    skipped=79   rescued=0    ignored=1 
node003                    : ok=78   changed=10   unreachable=0    failed=0    skipped=76   rescued=0    ignored=0

Resolution:

The OpenStack node(s) (controllers or hypervisors) ran out of disk space, so no more containers can be started. A bigger disk, at least 60 GB, should be used for the controller and hypervisor nodes.
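To confirm which filesystem filled up and how much of it Docker is using, something like the following can be run on the failing node (node002 in the log above):

# df -h /var/lib/docker
# docker system df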

Issue: kolla-ansible deploy fails after haproxy container fails to start

RUNNING HANDLER [Waiting for haproxy to start] ************************************************************************************
fatal: [node003]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 10.141.0.3:61313"}
RUNNING HANDLER [haproxy : Waiting for virtual IP to appear] **********************************************************************
NO MORE HOSTS LEFT ****************************************************************************************************************
PLAY RECAP ************************************************************************************************************************
localhost                  : ok=35   changed=0    unreachable=0    failed=0    skipped=7    rescued=0    ignored=0
node001                    : ok=35   changed=0    unreachable=0    failed=0    skipped=7    rescued=0    ignored=0
node002                    : ok=35   changed=0    unreachable=0    failed=0    skipped=7    rescued=0    ignored=0
node003                    : ok=71   changed=3    unreachable=0    failed=1    skipped=76   rescued=0    ignored=0
Command failed ansible-playbook -i ./multinode -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  -e kolla_action=deploy /usr/local/share/kolla-ansible/ansible/site.yml

Resolution:

This sort of issue usually happens when a newer kolla-ansible version is used with an older OpenStack release. Always make sure that openstack_release in globals.yml points to a release which is supported by the currently installed kolla-ansible version.

For example, kolla-ansible version 9.1.0 supports OpenStack "train", while kolla-ansible version 10.0.0 supports OpenStack "ussuri".
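The currently installed kolla-ansible version can be checked with pip before choosing the openstack_release value, for example:

# pip3 show kolla-ansible | grep -i '^version'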

Issue: I have rebooted all my OpenStack controller nodes at the same time and now the mariadb container is continuously failing.

Resolution:

It is highly likely that the database cluster has become inactive and thus a database recovery should be attempted:

# kolla-ansible -i ./multinode mariadb_recovery 

Tips and tricks

When running the kolla-ansible CLI, additional arguments may be passed to ansible-playbook via the EXTRA_OPTS environment variable.
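For example, to restrict a run to a single host (here assuming only node002 needs to be re-deployed), the ansible-playbook --limit option can be passed through EXTRA_OPTS as described above:

# EXTRA_OPTS='--limit node002' kolla-ansible -i ./multinode deploy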

  • kolla-ansible -i INVENTORY deploy is used to deploy and start all Kolla containers.
  • kolla-ansible -i INVENTORY destroy is used to clean up containers and volumes in the cluster.
  • kolla-ansible -i INVENTORY mariadb_recovery is used to recover a completely stopped mariadb cluster.
  • kolla-ansible -i INVENTORY prechecks is used to check that all requirements are met, before deploying, for each of the OpenStack services.
  • kolla-ansible -i INVENTORY post-deploy is used to run post-deployment tasks on the deployment node to generate the admin openrc file.
  • kolla-ansible -i INVENTORY pull is used to pull all images for containers.
  • kolla-ansible -i INVENTORY reconfigure is used to reconfigure OpenStack services.
  • kolla-ansible -i INVENTORY upgrade is used to upgrade an existing OpenStack environment.
  • kolla-ansible -i INVENTORY check is used to do post-deployment smoke tests.
  • kolla-ansible -i INVENTORY stop is used to stop running containers.
  • kolla-ansible -i INVENTORY deploy-containers is used to check and if necessary update containers, without generating configuration.
  • kolla-ansible -i INVENTORY prune-images is used to prune orphaned Docker images on hosts.
Updated on November 24, 2020
