How to install OpenStack Kolla-ansible on top of a Bright Cluster with CentOS 8
** This document was tested on Bright 9.0 with CentOS 8.1
Preparing the Cluster
In this section, we will walk through the steps required to prepare the cluster to be a Kolla-Ansible OpenStack cluster.
The head node of the cluster will be used as the deployment host only. In theory, the head node could also act as a controller/network/hypervisor/storage node, but this is not recommended: Kolla-Ansible creates a new MariaDB instance for the OpenStack cluster, besides the other configuration changes required by the Docker containers on the controller/network nodes.
Kolla-Ansible's ease of configuration and deployment depends heavily on uniform network interface names on the controller/network/hypervisor/storage nodes that will be used by OpenStack, so it is highly recommended to unify the network interface names across all the nodes, including the head node if it will be used as the deployment node.
Unifying the network interface names on CentOS 8 can be done in several ways:
- Switch to traditional naming conventions for network interfaces:
- For the compute nodes, this can be done by appending net.ifnames=0 to the kernel parameters in the software image:
# cmsh
% softwareimage use <image-name>
% append kernelparameters " net.ifnames=0"
% commit
- For the head node(s), this can be done by updating the GRUB_CMDLINE_LINUX parameters in /etc/default/grub and re-generating /boot/grub2/grub.cfg:
# grep net /etc/default/grub
GRUB_CMDLINE_LINUX="vconsole.keymap=us crashkernel=auto vconsole.font=latarcyrheb-sun16 rd.driver.blacklist=nouveau net.ifnames=0"
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
- Specify the MAC address and device name in the corresponding ifcfg-* files, as sketched below.
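As a sketch (the interface name, MAC address, and boot protocol below are placeholders for your own values), pinning a device name to a MAC address in an ifcfg file looks like this:
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=aa:bb:cc:dd:ee:ff
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=dhcp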
Kolla-ansible is installed via python-pip, so it is mandatory to update pip to the latest version first:
# pip3 install -U pip
By default, Bright assumes that there is no important data stored on the compute nodes, so each time a compute node is rebooted, the software image is re-synced to the local drive of the compute node. This adds any missing files and wipes out any files that do not exist in the software image.
To avoid wiping out the data that will be introduced by the Kolla-ansible deployment, we will need to add the following paths to the sync/update exclude lists:
- /var/lib/docker
- /var/lib/docker/*
- /etc/kolla
- /etc/kolla/*
To avoid pulling unnecessary data, created on the compute nodes by the Kolla-ansible deployment, back into the software image, we will need to add the following paths to the grab exclude lists:
- /var/lib/docker
- /var/lib/docker/*
- /etc/kolla
- /etc/kolla/*
** Note: The exclude lists are better updated at the category level. If the setup will only include one compute node, then it may be better to update the exclude lists directly on the node.
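As a sketch of the cmsh workflow (assuming the default node category; the exact exclude list property names may differ slightly between Bright versions), the category-level lists can be edited as follows. Each "set" command opens an editor in which the four entries listed above are appended:
# cmsh
% category use default
% set excludelistupdate
% set excludelistsyncinstall
% set excludelistgrab
% commit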
Installing Ansible
Kolla-ansible requires Ansible version 2.8 or higher, and up to version 2.9, as of the writing of this document (June 2020):
# yum install epel-release -y
# yum install ansible -y
Configuring Ansible (optional)
Tuning the configuration of Ansible may improve the deployment time of Kolla-ansible, so it is suggested to tune the Ansible configuration for better results:
- Increase the value of the forks as you see necessary. The forks parameter controls how many hosts can be configured in parallel, so if you have the processing power on the head node to run several forks, then it’s a good practice to increase the number of forks.
- Disable host_key_checking. Ansible has host key checking enabled by default. If a host is reinstalled and has a different key in ‘known_hosts’, then this will result in an error message until corrected. If a host is not initially in ‘known_hosts’ this will result in prompting for confirmation of the key, which will require an interactive intervention.
- Enable pipelining. Enabling pipelining reduces the number of SSH operations required to execute a module on the remote server. This can result in a significant performance improvement when enabled. By default, this option is disabled.
# grep -E "forks|host_key_checking|pipelining" /etc/ansible/ansible.cfg | grep -v "^#"
forks = 100
host_key_checking = False
pipelining = True
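Note that the grep output above hides the INI sections these settings belong to. In /etc/ansible/ansible.cfg, forks and host_key_checking go under the [defaults] section, while pipelining goes under [ssh_connection]:
[defaults]
forks = 100
host_key_checking = False

[ssh_connection]
pipelining = True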
Installing Kolla-ansible
- Kolla-ansible and its dependencies can be installed using python pip:
# pip install kolla-ansible
- Create default configuration directory. By default, Kolla-ansible expects the global configurations to be stored under /etc/kolla on the deployment node.
# mkdir /etc/kolla
- Copy the default globals.yml and passwords.yml templates from the Kolla-ansible installation directory:
# cp -r /usr/local/share/kolla-ansible/etc_examples/kolla/* /etc/kolla
- Copy all-in-one and multinode inventory files to the current directory.
# cp /usr/local/share/kolla-ansible/ansible/inventory/* .
Preparing initial Kolla-ansible configurations
The globals.yml is the main configuration file for Kolla-Ansible. There are a few parameters that are required to deploy Kolla-ansible:
- Kolla-ansible provides container images for deploying OpenStack on several Linux distributions as the host system: CentOS 7/8, Ubuntu 18.04, Debian, and RHEL 7/8. For the purpose of this document, we will use CentOS 8 as the host system. The kolla_base_distro parameter is set to "centos" to select CentOS as the base distribution.
- The type of installation can be either binary (using repositories like apt or yum) or source (using raw source archives, git repositories, or a local source directory). According to the official Kolla-ansible documentation, the source type has proven to be slightly more reliable. The install type can be specified using the kolla_install_type parameter.
- The release of OpenStack to be deployed has to be specified using the openstack_release parameter. For the purpose of this document, we will use the OpenStack "ussuri" release (see the note below on matching the release to the installed kolla-ansible version).
- Kolla-ansible requires two main networking options: network_interface, the default interface for the various management-type networks (tenant networks), and neutron_external_interface, a dedicated network interface for Neutron external/public networks (provider networks), which can be VLAN or flat, depending on how the networks are created. This interface should be active, but without an IP address, on the controller/network nodes; otherwise, OpenStack instances won't be accessible from the external networks. For the purpose of this document, we will use flat networks.
- A floating (virtual) IP for management traffic should be specified using kolla_internal_vip_address. This IP will be managed by keepalived to provide high availability, and it should be a free IP address in the internal management network that is connected to the same network_interface.
- Additional services can be enabled. For the purpose of this document, we will enable the Cinder service using the enable_cinder and enable_cinder_backend_nfs parameters. The Cinder service with an NFS backend requires an additional configuration file named nfs_shares, placed under /etc/kolla/config/. This file lists the name/IP of each NFS server and the exported path on that server; it can include several NFS servers and different paths:
# cat /etc/kolla/config/nfs_shares
10.141.0.1:/cinder

# grep -vE "^#|$^" /etc/kolla/globals.yml
---
kolla_base_distro: "centos"
kolla_install_type: "source"
openstack_release: "ussuri"
kolla_internal_vip_address: "10.141.255.245"
network_interface: "eth0"
neutron_external_interface: "eth1"
enable_cinder: "yes"
enable_cinder_backend_nfs: "yes"
** Please make sure that the NFS server can accept mount requests from the OpenStack nodes. Otherwise, OpenStack will fail to launch instances because it will fail to allocate storage for them.
** OpenStack release “train” is used with kolla-ansible 9.1.0. If a higher version of kolla-ansible is to be used, then another release for OpenStack should be used. For example, kolla-ansible version 10.0.0 can be used with OpenStack release “ussuri”.
The multinode template file is an ansible inventory file that has the main server groups for different OpenStack components that will be installed.
** DO NOT REMOVE THE CONTENT OF THE “multinode” FILE. WHAT IS SHOWN IN THE FOLLOWING SNIPPET IS JUST THE FIRST 20 LINES WHICH YOU NEED TO EDIT. THE REST WILL REMAIN THE SAME AND SHOULD NOT BE REMOVED.
# grep -vE "^#|$^" multinode | head -n 20
[control]
node001
node002
node003
[network]
node001
node002
node003
[compute]
node001
node002
node003
[monitoring]
node001
node002
node003
[storage]
localhost ansible_connection=local
[deployment]
localhost ansible_connection=local
[baremetal:children]
control
network
compute
storage
monitoring
Deploying Kolla-Ansible
- Make sure that the nodes specified in the inventory are all reachable:
# ansible -i multinode all -m ping
- The template file /etc/kolla/passwords.yml is a placeholder for all the passwords that will be used during the kolla-ansible deployment:
# grep -vE "^#|$^" /etc/kolla/passwords.yml | head
---
ceph_cluster_fsid:
ceph_rgw_keystone_password:
rbd_secret_uuid:
cinder_rbd_secret_uuid:
database_password:
mariadb_backup_database_password:
docker_registry_password:
opendaylight_password:
vmware_dvs_host_password:
- All passwords in passwords.yml are blank and have to be filled either manually or by running a random password generator:
# kolla-genpwd
- Bootstrap all nodes with Kolla deploy dependencies. This will include installing packages and writing out configuration files:
# kolla-ansible -i ./multinode bootstrap-servers
- Run the prechecks for deployment for hosts. This will make sure that the packages have been installed successfully and the configuration files do exist in the expected location:
# kolla-ansible -i ./multinode prechecks
- If the previous two steps ran without failures, then you can proceed to deploy OpenStack:
# kolla-ansible -i ./multinode deploy
- After a successful deployment, an openrc file, which lists the credentials of the admin user, should be generated. Running the following command generates an admin-openrc.sh file under the /etc/kolla directory on the deployment (head) node:
# kolla-ansible post-deploy
- Now the changes that have been applied to the compute nodes have to be grabbed back into the software image of the compute nodes to avoid losing them:
# cmsh
% device use node001
% grabimage -w
- Make sure that SELinux is still disabled and not set to permissive:
# grep -w SELINUX /cm/images/default-image/etc/selinux/config | grep -v "^#"
SELINUX=disabled
Installing OpenStack CLI client
To start using the OpenStack cloud that has just been deployed, the OpenStack CLI needs to be installed first, and the openrc file needs to be sourced:
# pip install python-openstackclient
# . /etc/kolla/admin-openrc.sh
Initializing the OpenStack environment for the first use
To start using OpenStack to create instances and access them from the external network, the /usr/local/share/kolla-ansible/init-runonce script has to be run. This script uses 10.0.2.0/24 as the external network; in most cases, init-runonce has to be adjusted to match the actual external network:
# This EXT_NET_CIDR is your public network, that you want to connect to the internet via.
ENABLE_EXT_NET=${ENABLE_EXT_NET:-1}
EXT_NET_CIDR=${EXT_NET_CIDR:-'10.0.2.0/24'}
EXT_NET_RANGE=${EXT_NET_RANGE:-'start=10.0.2.150,end=10.0.2.199'}
EXT_NET_GATEWAY=${EXT_NET_GATEWAY:-'10.0.2.1'}
[...]
if [[ $ENABLE_EXT_NET -eq 1 ]]; then
  openstack network create --external --provider-physical-network physnet1 \
    --provider-network-type flat public1
  openstack subnet create --no-dhcp \
    --allocation-pool ${EXT_NET_RANGE} --network public1 \
    --subnet-range ${EXT_NET_CIDR} --gateway ${EXT_NET_GATEWAY} public1-subnet
fi
openstack network create --provider-network-type vxlan demo-net
openstack subnet create --subnet-range 10.0.0.0/24 --network demo-net \
  --gateway 10.0.0.1 --dns-nameserver 8.8.8.8 demo-subnet
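Because the script reads these variables with shell parameter defaults, the external network settings can also be overridden from the environment instead of editing the script. For example (a sketch; the 192.168.1.0/24 values are placeholders that should match your own external network):
# EXT_NET_CIDR='192.168.1.0/24' \
  EXT_NET_RANGE='start=192.168.1.150,end=192.168.1.199' \
  EXT_NET_GATEWAY='192.168.1.1' \
  /usr/local/share/kolla-ansible/init-runonce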
Typically, provider networks are directly associated with a physical network but that is not a requirement. Users create tenant networks for connectivity within projects. By default, these tenant networks are fully isolated and are not shared with other projects. OpenStack Networking supports the following types of network isolation and overlay technologies:
Flat
All instances reside on the same network, which can also be shared with the hosts. No VLAN tagging or other network segregation takes place.
VLAN
Networking allows users to create multiple provider or tenant networks using VLAN IDs (802.1Q tagged) that correspond to VLANs present in the physical network. This allows instances to communicate with each other across the environment. They can also communicate with dedicated servers, firewalls, load balancers, and other networking infrastructure on the same layer 2 VLAN.
GRE and VXLAN
VXLAN and GRE are encapsulation protocols that create overlay networks to activate and control communication between compute instances. A Networking router is required to allow traffic to flow outside of the GRE or VXLAN tenant network. A router is also required to connect tenant networks with external networks, including the Internet. The router provides the ability to connect to instances directly from an external network using floating IP addresses.
For the purpose of this document, we will create a provider network of type flat. After adjusting the init-runonce script and running it, a provider network and a tenant network will have been created and connected by a router. Next, we will deploy the first OpenStack instance and access it from the external network.
Deploying the first Instance
The OpenStack dashboard can be accessed via the virtual IP (VIP) that was configured earlier in the process. The following steps show how to create the first instance and access it from the external network using the OpenStack CLI. Similar steps can be followed from the OpenStack dashboard to achieve the same results.
- Create a demo2 instance and attach it to the demo-net created by the init-runonce script:
# openstack server create --image cirros --flavor m1.tiny --key-name mykey --network demo-net demo2
- Allocate a floating IP on the public (provider) network created by the init-runonce script:
# openstack floating ip create public1
- Associate the floating IP with the demo2 instance created in the first step:
# openstack server add floating ip demo2 192.168.1.158
- As a sanity check, make sure that the namespaces are updated correctly on the controller/network node:
# ip netns exec qrouter-21c38859-4375-4bb4-9207-c74e45e12602 ip a
- SSH into the OpenStack instance using the associated floating IP address from the external network:
# ssh -l cirros 192.168.1.158
How to integrate Kolla-ansible with an external Ceph Storage
Using Ceph as backend storage for different OpenStack services greatly reduces network traffic and increases performance since Ceph can clone images/volumes/ephemeral disks instead of copying them. In addition, it makes migrating between OpenStack deployments much simpler.
Preparing the Ceph Storage Pools for OpenStack Services
Glance Service
By default, i.e. when not using Ceph, Glance images are stored locally on the controller nodes. When they are needed for creating a VM, they are copied to the compute hosts (hypervisors). The hypervisors can then cache those images, but the images need to be copied again every time an image is updated, or whenever a hypervisor undergoes a FULL install.
If Glance is configured with Ceph as the storage backend, things work differently. In such case, Glance stores images in Ceph, not on the head node. Depending on the format of the image, they might still be downloaded from Ceph to the hypervisor node (and get cached there).
If both Glance and Cinder are configured to use Ceph, it’s possible to avoid copying the image from Ceph into the Hypervisors. This can be done using ‘copy-on-write’ (CoW).
With CoW, the storage volume of a VM is thinly pre-provisioned (from the image) directly in the Ceph backend. In other words, the image does not have to be copied from glance into the hypervisor, nor from hypervisor to Cinder.
If Ceph is available, we recommend using it for both Glance and Cinder so that the two services can make use of CoW. Note that for CoW to work, the format of the images stored in Glance needs to be "raw". Using CoW often results in the best VM creation times and the least load on the infrastructure. The drawback of using CoW is increased disk I/O latency if the VM ends up modifying existing files on its CoW-created volume, because blocks shared with the parent image have to be copied the first time they are written to.
To configure Glance with Ceph, on the Ceph cluster, run the following commands:
- Create a pool for glance images:
# ceph osd pool create images 64
- Create a user for the glance service which has write privileges to the pool created in the previous step, and store its keyring:
# ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' -o /etc/ceph/ceph.client.glance.keyring
- Verify that the pool has been created properly and that the user has the required privileges:
# ceph osd pool ls
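The pool listing only confirms that the pool exists; to also inspect the capabilities of the glance user, the keyring entry can be printed on the Ceph cluster:
# ceph auth get client.glance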
** Note: How to choose Placement Group Number for a Ceph pool:
When creating a Ceph OSD pool, it is mandatory to specify a value for placement group number (pg_num) because it cannot be calculated automatically. From the Ceph online manual, we list here some recommendations:
- Less than 5 OSDs set pg_num to 128.
- Between 5 and 10 OSDs set pg_num to 512.
- Between 10 and 50 OSDs set pg_num to 4096.
- If you have more than 50 OSDs, you need to understand the tradeoffs and how to calculate the pg_num value by yourself.
- For calculating the pg_num value by yourself, please use the pgcalc tool.
As the number of OSDs increases, choosing the right value for pg_num becomes more important because it has a significant influence on the behavior of the Ceph cluster as well as the durability of the data when something goes wrong (i.e. the probability that a catastrophic event leads to data loss).
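As a rough worked example of the rule of thumb commonly used by the pgcalc tool: a cluster with 20 OSDs and a replica count of 3 gives (20 × 100) / 3 ≈ 667 placement groups in total, which is then rounded to a nearby power of two (512 or 1024) and divided over the pools that will be created.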
Cinder Service
Cinder is the block storage service in OpenStack. It is responsible for storing volumes and volume snapshots that can be attached to VMs or can be used to launch instances. In this document, we have demonstrated how to use NFS as the backend storage for the Cinder service.
Cinder can be configured to use Ceph as its backend storage instead of NFS.
On the Ceph cluster, run the following commands:
- Create a pool for cinder volumes:
# ceph osd pool create volumes 32
- Create a user for the cinder service which has write privileges to the pool created in the previous step, and store its keyring:
# ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images' -o /etc/ceph/ceph.client.cinder.keyring
- Verify that the pool has been created properly and that the user has the required privileges:
# ceph osd pool ls
Nova Service
Nova is the compute service within OpenStack. By default, Nova stores the ephemeral disks associated with running VMs locally on the hypervisors, under /var/lib/nova/instances. There are a few drawbacks to using local storage on the hypervisors: large images can cause the filesystem to fill up, thus crashing compute nodes, and a disk crash on a hypervisor could cause the loss of the virtual disks, making VM recovery impossible. (If VMs are used with Cinder-managed volumes, those volumes are not stored on the hypervisors; they are stored in whatever is configured as the Cinder backend.)
Nova can be configured to use Ceph as its backend storage.
On the Ceph cluster, run the following commands:
- Create a pool for nova ephemeral disks:
# ceph osd pool create vms 128
- Create a user for the nova service which has write privileges to the pool created in the previous step, and store its keyring:
# ceph auth get-or-create client.nova mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=vms, allow rx pool=images' -o /etc/ceph/ceph.client.nova.keyring
- Verify that the pool has been created properly and that the user has the required privileges:
# ceph osd pool ls
Preparing Kolla-ansible configurations to use Ceph Storage as backend storage for Glance/Cinder/Nova services
The globals.yml is the main configuration file for Kolla-Ansible. In addition to the parameters mentioned at the very beginning of this document, there are a few other parameters that are required to enable several OpenStack services to use Ceph as their backend storage:
- The parameter ceph_{ service }_user is used to specify the Ceph user that will be used to write to the Ceph pool of a particular service. The variable "{ service }" should be replaced with the service name, like glance, cinder, or nova.
- The parameter ceph_{ service }_pool_name is used to specify the name of the pool created in Ceph which will be used by a particular service. The variable "{ service }" should be replaced with the service name, like glance, cinder, or nova.
- For the Ceph users to be able to read/write the Ceph pools, a keyring is required. The keyring is specified by ceph_{ service }_keyring, which points to the location of the keyring file for a specific service. The variable "{ service }" should be replaced with the service name, like glance, cinder, or nova.
- The parameter { service }_backend_ceph is used to enable/disable the use of Ceph as the backend storage for a particular service. The variable "{ service }" should be replaced with the service name, like glance, cinder, or nova.
# grep -vE "^#|$^" /etc/kolla/globals.yml
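The output of the grep above is not reproduced here. Based on the parameters described above and the pools and users created earlier in this document, the Ceph-related additions to /etc/kolla/globals.yml would look roughly like the following sketch (adjust the names if different pools or users were chosen):
glance_backend_ceph: "yes"
ceph_glance_user: "glance"
ceph_glance_keyring: "ceph.client.glance.keyring"
ceph_glance_pool_name: "images"
cinder_backend_ceph: "yes"
ceph_cinder_user: "cinder"
ceph_cinder_keyring: "ceph.client.cinder.keyring"
ceph_cinder_pool_name: "volumes"
ceph_cinder_backup_user: "cinder"
ceph_cinder_backup_keyring: "ceph.client.cinder.keyring"
ceph_cinder_backup_pool_name: "volumes"
nova_backend_ceph: "yes"
ceph_nova_user: "nova"
ceph_nova_keyring: "ceph.client.nova.keyring"
ceph_nova_pool_name: "vms"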
** Note: the “ceph_cinder_backup_pool_name” is the same as the “ceph_cinder_pool_name”, however, it’s not strictly necessary to have both pools the same. You can create a separate pool for cinder_backup_pool.
Glance Configuration
The configuration files which will tell Glance how to use Ceph should be placed in the location where Kolla-ansible would expect to find them:
- Create a configuration directory for glance:
# mkdir -p /etc/kolla/config/glance
- Copy ceph.conf to the same directory created in the previous step:
# cp /etc/ceph/ceph.conf /etc/kolla/config/glance/
- Copy the keyring to the same directory:
# cp /etc/ceph/ceph.client.glance.keyring /etc/kolla/config/glance/
- Create glance-api.conf in the same directory with the following contents:
# cat /etc/kolla/config/glance/glance-api.conf
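The file contents are not reproduced above. A minimal glance-api.conf for an RBD backend, assuming the images pool and the glance user created earlier (show_image_direct_url enables the copy-on-write behavior described at the beginning of this section), would look roughly like:
[DEFAULT]
show_image_direct_url = True

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf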
Cinder Configuration
The configuration files which will tell cinder how to use Ceph should be placed in the location where Kolla-ansible would expect to find them:
- Create a configuration directory for cinder:
# mkdir -p /etc/kolla/config/cinder/{cinder-volume,cinder-backup}
- Copy ceph.conf to the same directory created in the previous step:
# cp /etc/ceph/ceph.conf /etc/kolla/config/cinder/ceph.conf
- Copy the keyring to the same directory:
# cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/cinder/cinder-volume/ceph.client.cinder.keyring
- Create cinder-volume.conf and cinder-backup.conf under /etc/kolla/config/cinder directory with the following contents:
# cat /etc/kolla/config/cinder/cinder-volume.conf
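The file contents are not reproduced above. As a sketch that follows the choices made in this document (the cinder user and the volumes pool for both the volume and the backup service; note that the backup container will also expect the cinder keyring under /etc/kolla/config/cinder/cinder-backup/), cinder-volume.conf could contain:
[DEFAULT]
enabled_backends = rbd-1

[rbd-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-1
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = {{ cinder_rbd_secret_uuid }}
and cinder-backup.conf could contain:
[DEFAULT]
backup_driver = cinder.backup.drivers.ceph.CephBackupDriver
backup_ceph_conf = /etc/ceph/ceph.conf
backup_ceph_user = cinder
backup_ceph_pool = volumes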
Note: {{ cinder_rbd_secret_uuid }} can be found in /etc/kolla/passwords.yml
Nova Configuration
The configuration files which will tell Nova how to use Ceph should be placed in the location where Kolla-ansible expects to find them:
- Create a configuration directory for nova:
# mkdir -p /etc/kolla/config/nova
- Copy ceph.conf to the same directory created in the previous step:
# cp /etc/ceph/ceph.conf /etc/kolla/config/nova/
- Copy the nova and cinder client keyrings to the same directory:
# cp /etc/ceph/ceph.client.nova.keyring /etc/kolla/config/nova/ceph.client.nova.keyring
# cp /etc/ceph/ceph.client.cinder.keyring /etc/kolla/config/nova/ceph.client.cinder.keyring
- Create nova-compute.conf in the same directory with the following contents:
# cat /etc/kolla/config/nova/nova-compute.conf
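The file contents are not reproduced above. A minimal nova-compute.conf, assuming the vms pool and the nova user created earlier, would look roughly like:
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = nova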
Verify if OpenStack is writing to Ceph Backend Storage
After deploying kolla-ansible following the steps mentioned earlier, you can verify if OpenStack is writing data to Ceph by inspecting Ceph storage itself:
- Test creating images (glance service)
# openstack image create --disk-format qcow2 --container-format bare --file /tmp/cirros-0.4.0-x86_64-disk.img cirros
After creating the image, it should now be stored in the Ceph images pool; see the verification commands after this list.
- Test creating a volume (cinder service)
# openstack volume create --size 10 --availability-zone nova mynewvolume
After creating the volume, it should now be stored in the Ceph volumes pool; see the verification commands after this list.
- Test creating a VM (nova service)
# openstack server create --image cirros --flavor m1.tiny --key-name mykey --network demo-net demo1
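Since the command outputs are not reproduced above, the easiest way to confirm that the objects really land in Ceph is to list the contents of the pools on the Ceph cluster after each test; the image, the volume, and the ephemeral disk of the VM should appear in the images, volumes, and vms pools, respectively:
# rbd -p images ls
# rbd -p volumes ls
# rbd -p vms ls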
Troubleshoot
Issue: The "kolla-ansible deploy" step failed with the following error:
TASK [nova : Running Nova API bootstrap container] *********************************************************************************************************************************************************
Resolution:
The OpenStack node(s) (controllers or hypervisors) ran out of disk space, so no more containers can be started. A bigger disk, at least 60G, should be used for the controller and hypervisor nodes.
Issue: kolla-ansible deploy fails after haproxy container fails to start
RUNNING HANDLER [Waiting for haproxy to start] ************************************************************************************
Resolution:
This sort of issue usually happens when a higher kolla-ansible version is used with an older OpenStack release. Always make sure that the openstack_release in globals.yml points to a release which is supported by the currently installed kolla-ansible version.
For example, kolla-ansible version 9.1.0 supports OpenStack "train", while kolla-ansible version 10.0.0 supports OpenStack "ussuri".
Issue: I have rebooted all my OpenStack controller nodes at the same time and now mariadb container is continuously failing.
Resolution:
It is highly likely that the database cluster has become inactive and thus a database recovery should be attempted:
# kolla-ansible -i ./multinode mariadb_recovery
Tips and tricks
When running the kolla-ansible CLI, additional arguments may be passed to ansible-playbook via the EXTRA_OPTS environment variable.
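For example (a sketch; the node names are placeholders), a reconfigure run can be limited to two hosts by passing --limit through EXTRA_OPTS:
# EXTRA_OPTS='--limit node001,node002' kolla-ansible -i ./multinode reconfigure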
- kolla-ansible -i INVENTORY deploy is used to deploy and start all Kolla containers.
- kolla-ansible -i INVENTORY destroy is used to clean up containers and volumes in the cluster.
- kolla-ansible -i INVENTORY mariadb_recovery is used to recover a completely stopped mariadb cluster.
- kolla-ansible -i INVENTORY prechecks is used to check if all requirements are met before deploying each of the OpenStack services.
- kolla-ansible -i INVENTORY post-deploy is used to run post-deployment tasks on the deployment node and generate the admin openrc file.
- kolla-ansible -i INVENTORY pull is used to pull all images for containers.
- kolla-ansible -i INVENTORY reconfigure is used to reconfigure OpenStack services.
- kolla-ansible -i INVENTORY upgrade is used to upgrade an existing OpenStack environment.
- kolla-ansible -i INVENTORY check is used to do post-deployment smoke tests.
- kolla-ansible -i INVENTORY stop is used to stop running containers.
- kolla-ansible -i INVENTORY deploy-containers is used to check and if necessary update containers, without generating configuration.
- kolla-ansible -i INVENTORY prune-images is used to prune orphaned Docker images on hosts.