Problems Encountered Deploying OpenStack Mitaka with kolla mitaka-eol

For an experiment I needed to deploy an OpenStack Mitaka environment with kolla. Since Mitaka is a two-year-old release, the process hit a number of pitfalls, which are recorded below.

System environment

Operating system: CentOS Linux release 7.2.1511 (Core)
Kernel: 3.10.0-327.28.3.el7.x86_64
kolla version: mitaka-eol
Docker version: Docker version 1.13.1, build 092cba3
Docker images: official tag 2.0.2 (corresponding to OpenStack Mitaka)

Problem 1: the openvswitch_db container fails to run

Problem description

When deploying OpenStack with kolla-ansible deploy, the openvswitch_db service consistently fails to start:

TASK: [neutron | Waiting the openvswitch_db service to be ready] ************** 
failed: [localhost] => {"attempts": 30, "changed": false, "cmd": ["docker", "exec", "openvswitch_db", "ovs-vsctl", "--no-wait", "show"], "delta": "0:00:00.032518", "end": "2018-07-09 07:33:12.680647", "failed": true, "rc": 1, "start": "2018-07-09 07:33:12.648129", "stdout_lines": [], "warnings": []}
stderr: Error response from daemon: Container 0cec739aabe06805aa0e1624318ac052d9f8fb176078df3d20a13c4df304fa7a is restarting, wait until the container is running
msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting

The container logs show the following error:

INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Writing out command to execute
Running command: '/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log'
ovsdb-server: I/O error: open: /var/lib/openvswitch/conf.db failed (No such file or directory)

Troubleshooting

1. Start an openvswitch_db container manually and enter an interactive shell

docker run -it kolla/centos-source-openvswitch-db-server:2.0.2 /bin/bash

2. Examine the startup script

# cd /usr/local/bin
# vi kolla_start

#!/bin/bash
set -o errexit

# Wait for the log socket
if [[ ! "${!SKIP_LOG_SETUP[@]}" && -e /var/lib/kolla/heka ]]; then
    while [[ ! -S /var/lib/kolla/heka/log ]]; do
        sleep 1
    done
fi

# Processing /var/lib/kolla/config_files/config.json as root.  This is necessary
# to permit certain files to be controlled by the root user which should
# not be writable by the dropped-privileged user, especially /run_command
sudo -E kolla_set_configs
CMD=$(cat /run_command)
ARGS=""

if [[ ! "${!KOLLA_SKIP_EXTEND_START[@]}" ]]; then
    # Run additional commands if present
    source kolla_extend_start
fi

echo "Running command: '${CMD}${ARGS:+ $ARGS}'"
exec ${CMD} ${ARGS}

Note the call to the kolla_extend_start script.

3. Examine kolla_extend_start

# vi kolla_extend_start

#!/bin/bash

mkdir -p "/run/openvswitch"
if [[ ! -e "/etc/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/etc/openvswitch/conf.db"
fi

On startup the container first runs kolla_start; if the KOLLA_SKIP_EXTEND_START variable is not set, it then runs kolla_extend_start to do some initialization. The problem lies in how conf.db is created: kolla_extend_start creates it at /etc/openvswitch/conf.db, while the startup command is passed /var/lib/openvswitch/conf.db, so ovsdb-server fails because /var/lib/openvswitch/conf.db does not exist.
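The mismatch can also be confirmed without an interactive session. A minimal check (a sketch, assuming the official 2.0.2 image and the default /etc/kolla layout used above):

# docker run --rm kolla/centos-source-openvswitch-db-server:2.0.2 cat /usr/local/bin/kolla_extend_start
# grep conf.db /etc/kolla/openvswitch-db-server/config.json

The first command shows the path the extend script creates (/etc/openvswitch/conf.db); the second shows the path the rendered startup command expects (/var/lib/openvswitch/conf.db).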

Fix

There are two possible fixes: change where the kolla_extend_start script inside the openvswitch_db image creates conf.db, or change the startup command of the openvswitch_db container. To keep the image intact and for convenience I chose the second option.
Edit /etc/kolla/openvswitch-db-server/config.json as follows:

{
    "command": "/usr/sbin/ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

A config file edited this way will be wiped by the clean-host command, so we need to find where /etc/kolla/openvswitch-db-server/config.json is generated. It comes from /usr/share/kolla/ansible/roles/neutron/templates/openvswitch-db-server.json.j2: during deploy, openvswitch-db-server.json.j2 is rendered into openvswitch-db-server.json and copied to the corresponding location under /etc/kolla/, so it is enough to modify openvswitch-db-server.json.j2.
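If you ever need to re-locate which template a rendered config comes from, grepping the role templates works (a sketch using the install path above):

# grep -rl ovsdb-server /usr/share/kolla/ansible/roles/neutron/templates/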

# vim /usr/share/kolla/ansible/roles/neutron/templates/openvswitch-db-server.json.j2

{
    "command": "/usr/sbin/ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

Verification

# /root/kolla/tools/cleanup-containers
# /root/kolla/tools/cleanup-host
# kolla-ansible deploy

Problem 2: ansible version mismatch in the kolla_toolbox image

Problem description

After fixing problem 1, deploy still fails near the end:

TASK: [horizon | Creating the _member_ role] ********************************** 
failed: [localhost] => {"attempts": 10, "changed": false, "cmd": ["docker", "exec", "-t", "kolla_toolbox", "/usr/bin/ansible", "localhost", "-m", "os_keystone_role", "-a", "name=_member_ auth={# openstack_horizon_auth #}", "-e", "{'openstack_horizon_auth':{'username': 'admin', 'project_name': 'admin', 'password': 'admin', 'auth_url': 'http://172.16.15.115:35357'}}"], "delta": "0:00:01.223878", "end": "2018-07-09 09:18:24.813123", "failed": true, "rc": 2, "start": "2018-07-09 09:18:23.589245", "stdout_lines": ["localhost | FAILED! => {", "    \"failed\": true, ", "    \"msg\": \"The module os_keystone_role was not found in configured module paths. Additionally, core modules are missing. If this is a checkout, run 'git submodule update --init --recursive' to correct this problem.\"", "}"], "warnings": []}
stdout: localhost | FAILED! => {
    "failed": true, 
    "msg": "The module os_keystone_role was not found in configured module paths. Additionally, core modules are missing. If this is a checkout, run 'git submodule update --init --recursive' to correct this problem."
}
msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting

Running git submodule update --init --recursive as the message suggests still produces the same error.

Troubleshooting

Some searching shows this is a known bug; for details see: Hitting ansible error "The module os_keystone_role was not found in configured module paths"

1. Check the ansible version

# docker exec -ti kolla_toolbox /usr/bin/ansible --version

ansible 2.1.0
  config file = /home/ansible/.ansible.cfg
  configured module search path = /usr/share/ansible
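To confirm that the module really is absent from the search path rather than failing for some other reason, a quick look inside the container helps (a sketch; the second path is where a pip-installed ansible would normally live and is an assumption):

# docker exec kolla_toolbox find /usr/share/ansible /usr/lib/python2.7/site-packages/ansible -name 'os_keystone_role*' 2>/dev/null

No output means the os_keystone_role module is simply not shipped with this ansible 2.1.0 install.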

2. Try to update ansible

# docker exec -ti kolla_toolbox sudo pip install ansible==2.1.1.0

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for ansible: 

This requires root privileges, but the container runs as the ansible user, which has no password set, so ansible cannot be updated in place. The next attempt is to build the kolla_toolbox image manually.

3. Build the image manually from the kolla_toolbox Dockerfile

When building the image, many of the preconfigured package repositories no longer exist or are unreachable, so I decided to stop using the images pulled from the official registry, point the repositories at working mirrors by hand, and rebuild all the images. This also takes care of problem 2.

Problem 3: rebuilding the kolla images

Problem description

The OpenStack Mitaka images need to be rebuilt, but because of the project's age many repository URLs inside the images are no longer valid. Replace them with working mirrors by hand, then rebuild.

Steps

1. Change the repository URLs used by kolla

The following files are COPYed into the containers during the build, so they can be edited directly:

  • kibana.yum.repo
Switch to the latest 6.x version; the 4.4 repository is no longer reachable.
vi /usr/share/kolla/docker/base/kibana.yum.repo

-[kibana-4.4]
-name=Kibana repository for 4.4.x packages
-baseurl=http://packages.elastic.co/kibana/4.4/centos
-gpgcheck=1
-gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
-enabled=1

+[kibana-6.x]
+name=Kibana repository for 6.x packages
+baseurl=https://artifacts.elastic.co/packages/6.x/yum
+gpgcheck=0
+enabled=1
  • elasticsearch.repo
For compatibility with Kibana, switch the elasticsearch repository to the 6.x version as well.

# vi /usr/share/kolla/docker/base/elasticsearch.repo

-[elasticsearch-2.x]
-name=Elasticsearch repository for 2.x packages
-baseurl=http://packages.elastic.co/elasticsearch/2.x/centos
-gpgcheck=1
-gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
-enabled=1

+[elasticsearch-6.x]
+name=Elasticsearch repository for 6.x packages
+baseurl=https://artifacts.elastic.co/packages/6.x/yum
+gpgcheck=0
+enabled=1   # the elasticsearch repository is hosted on AWS and is not very reliable to reach from within China

2. Change the ceph, openstack, and QEMU-EV repository URLs

The ceph, openstack, and QEMU-EV repositories are generated automatically when centos-release-ceph-hammer, centos-release-openstack-mitaka, and centos-release-qemu-ev are installed, so the corresponding Dockerfile needs extra steps that rewrite the repository URLs after installation.

# vim /usr/share/kolla/docker/base/Dockerfile.j2

RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 \
    && yum install -y \
         epel-release \
-        centos-release-openstack-mitaka \   # the extras repo no longer ships centos-release-openstack-mitaka, install the rpm directly instead
+        http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/centos-release-openstack-mitaka-1-5.el7.noarch.rpm \
         yum-plugin-priorities \
         centos-release-ceph-hammer \
         centos-release-qemu-ev \
+   && sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-Ceph-Hammer.repo \
+   && sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-OpenStack-mitaka.repo \
+   && sed -i s#mirror.centos.org/\$contentdir/#mirror.neu.edu.cn/centos/#g /etc/yum.repos.d/CentOS-QEMU-EV.repo \
+   && rm -rf /etc/yum.repos.d/CentOS-Base.repo && curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo \
    && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 \
    && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage \
    && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization \
    && yum clean all

3. Use the Aliyun and Douban mirrors to speed things up

# vim /usr/share/kolla/docker/base/Dockerfile.j2

RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 \
    && yum install -y \
         epel-release \
         http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/centos-release-openstack-mitaka-1-5.el7.noarch.rpm \
         yum-plugin-priorities \
         centos-release-ceph-hammer \
         centos-release-qemu-ev \
    && sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-Ceph-Hammer.repo \
    && sed -i s/mirror.centos.org/mirror.neu.edu.cn/g /etc/yum.repos.d/CentOS-OpenStack-mitaka.repo \
    && sed -i s#mirror.centos.org/\$contentdir/#mirror.neu.edu.cn/centos/#g /etc/yum.repos.d/CentOS-QEMU-EV.repo \
+   && rm -rf /etc/yum.repos.d/CentOS-Base.repo && curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo \
+   && rm -rf /etc/yum.repos.d/epel.* && curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo \
    && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 \
    && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage \
    && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization \
    && yum clean all
    
+#use douban source
+RUN mkdir ~/.pip \
+     && > ~/.pip/pip.conf \
+     && echo "[global]" > ~/.pip/pip.conf \
+     && echo "index-url = http://pypi.douban.com/simple" >> ~/.pip/pip.conf \
+     && echo "[install]" >> ~/.pip/pip.conf \
+     && echo "trusted-host = pypi.douban.com" >> ~/.pip/pip.conf

4. Modify kolla_toolbox

Testing shows that when the kolla_toolbox image is built, pip pulls in the latest OpenStack client packages, which require requests>=2.14.2. That dependency in turn upgrades chardet, but chardet is a system package inside the container and cannot be upgraded there. The workaround is to pin the OpenStack client versions by hand, taking the versions from the OpenStack Projects Release Notes.

# vim /usr/share/kolla/docker/kolla_toolbox/Dockerfile.j2

RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
    && python get-pip.py \
    && rm get-pip.py \
    && pip --no-cache-dir install \
+       openstacksdk==0.9.0 \
+       osc-lib==0.4.0 \
+       oslo.config==3.13.0 \
+       oslo.i18n==3.8.0 \
+       oslo.serialization==2.11.0 \
+       oslo.utils==3.16.0 \
+       python-cinderclient==1.8.0 \
+       python-glanceclient==2.2.0 \
+       python-heatclient==1.3.0 \
+       python-ironicclient==1.5.0 \
+       python-keystoneclient==3.2.0 \
+       python-neutronclient==4.2.0 \
+       python-novaclient==5.0.0 \
+       python-openstackclient==2.6.0 \
+       python-swiftclient==3.0.0 \
+       python-troveclient==2.3.0 \
+       stevedore==1.16.0 \
+       debtcollector==1.6.0 \
+       keystoneauth1==2.9.0 \
+       cliff==2.1.0 \
+       cmd2==0.6.8 \
+       pbr==1.10.0 \
+       requests==2.10.0 \
        ansible==2.1.1.0 \
        MySQL-python \
        os-client-config==1.16.0 \
        pyudev \
        shade==1.4.0

5. Build the images manually

# Build only the essential images
# kolla-build --base centos -t binary horizon cinder heat nova neutron glance keystone ironic rabbitmq keepalived haproxy heka kolla_toolbox mariadb memcached cron openvswitch
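Once the build finishes, the kolla_toolbox fixes from step 4 can be spot-checked (a sketch; the 2.0.4 tag is what this build produces locally):

# docker run --rm kolla/centos-binary-kolla-toolbox:2.0.4 /usr/bin/ansible --version
# docker run --rm kolla/centos-binary-kolla-toolbox:2.0.4 pip freeze | grep -iE '^(ansible|requests)=='

The first command should now report ansible 2.1.1.0, and requests should stay at the pinned 2.10.0.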

6. Check the built images

# docker images

REPOSITORY                                      TAG                 IMAGE ID            CREATED             SIZE
kolla/centos-binary-openvswitch-db-server       2.0.4               58e8a1cdc387        10 minutes ago       379 MB
kolla/centos-binary-openvswitch-vswitchd        2.0.4               cb85d198f02c        10 minutes ago       379 MB
kolla/centos-binary-openvswitch-base            2.0.4               0baee06b57a4        10 minutes ago       379 MB
kolla/centos-binary-kolla-toolbox               2.0.4               d3f9e86e1292        43 minutes ago      631 MB
kolla/centos-binary-base                        2.0.4               4481fe643afa        About an hour ago   344 MB
kolla/centos-binary-openvswitch-base            2.0.4               09ab40d1a684        6 hours ago         380 MB
kolla/centos-binary-ironic-inspector            2.0.4               0a692e9b679e        10 hours ago        641 MB
kolla/centos-binary-ironic-api                  2.0.4               087a20a24a84        10 hours ago        635 MB
kolla/centos-binary-ironic-conductor            2.0.4               35450cd6b73b        10 hours ago        662 MB
kolla/centos-binary-nova-compute-ironic         2.0.4               d0a5a0a85ab7        10 hours ago        1.07 GB
kolla/centos-binary-ironic-pxe                  2.0.4               d505622b6982        10 hours ago        639 MB
kolla/centos-binary-elasticsearch               2.0.4               63064a9d79d1        10 hours ago        692 MB
kolla/centos-binary-kibana                      2.0.4               d52426e4d06f        10 hours ago        874 MB
kolla/centos-binary-ironic-base                 2.0.4               09a4902a06be        10 hours ago        612 MB
kolla/centos-binary-nova-libvirt                2.0.4               81f523f3d656        11 hours ago        1.11 GB
kolla/centos-binary-nova-compute                2.0.4               5b70975fe56d        11 hours ago        1.11 GB
kolla/centos-binary-cinder-volume               2.0.4               acc66141a640        11 hours ago        859 MB
kolla/centos-binary-cinder-api                  2.0.4               5ff44bfe4063        11 hours ago        850 MB
kolla/centos-binary-cinder-rpcbind              2.0.4               d241518b407c        11 hours ago        838 MB
kolla/centos-binary-cinder-backup               2.0.4               70e279b6adc7        11 hours ago        808 MB
kolla/centos-binary-cinder-scheduler            2.0.4               d4c8e5140be4        11 hours ago        808 MB
kolla/centos-binary-glance-api                  2.0.4               92ed32bb6344        11 hours ago        732 MB
kolla/centos-binary-nova-conductor              2.0.4               c7f752689dfc        11 hours ago        671 MB
kolla/centos-binary-nova-consoleauth            2.0.4               5bb4f725b42d        11 hours ago        671 MB
kolla/centos-binary-nova-scheduler              2.0.4               00ddedda23c4        11 hours ago        671 MB
kolla/centos-binary-glance-registry             2.0.4               e59b0948281e        11 hours ago        732 MB
kolla/centos-binary-nova-ssh                    2.0.4               01c404afdc8b        11 hours ago        672 MB
kolla/centos-binary-nova-api                    2.0.4               d29b3451c045        11 hours ago        671 MB
kolla/centos-binary-nova-network                2.0.4               a5cead8aee0b        11 hours ago        672 MB
kolla/centos-binary-neutron-openvswitch-agent   2.0.4               36226e14b4d7        11 hours ago        670 MB
kolla/centos-binary-neutron-linuxbridge-agent   2.0.4               e9ad4cb7c6cc        11 hours ago        670 MB
kolla/centos-binary-nova-novncproxy             2.0.4               3689a51a5db8        11 hours ago        672 MB
kolla/centos-binary-nova-spicehtml5proxy        2.0.4               55fc9d8a62b5        11 hours ago        672 MB
kolla/centos-binary-neutron-metadata-agent      2.0.4               45f0f090cf38        11 hours ago        646 MB
kolla/centos-binary-cinder-base                 2.0.4               dd0c5b78af7b        11 hours ago        808 MB
kolla/centos-binary-neutron-server              2.0.4               171b7bab73ab        11 hours ago        646 MB
kolla/centos-binary-heat-api                    2.0.4               f334caf10d5a        11 hours ago        633 MB
kolla/centos-binary-horizon                     2.0.4               88ceecbc8cf8        11 hours ago        763 MB
kolla/centos-binary-heat-engine                 2.0.4               a53651463235        11 hours ago        633 MB
kolla/centos-binary-heat-api-cfn                2.0.4               7ec6cdd4b04b        11 hours ago        633 MB
kolla/centos-binary-neutron-l3-agent            2.0.4               d4744d180b09        11 hours ago        646 MB
kolla/centos-binary-neutron-dhcp-agent          2.0.4               d4744d180b09        11 hours ago        646 MB
kolla/centos-binary-glance-base                 2.0.4               f78fa9de5c7b        11 hours ago        732 MB
kolla/centos-binary-nova-base                   2.0.4               aa7ae5ae4818        11 hours ago        648 MB
kolla/centos-binary-neutron-base                2.0.4               ed5f4f60a6f4        11 hours ago        646 MB
kolla/centos-binary-heat-base                   2.0.4               4798734eb0d4        11 hours ago        610 MB
kolla/centos-binary-keystone                    2.0.4               55cf2686b33a        11 hours ago        644 MB
kolla/centos-binary-openstack-base              2.0.4               9d511be689b7        22 hours ago        572 MB
kolla/centos-binary-mariadb                     2.0.4               cb4c65a6a637        22 hours ago        682 MB
kolla/centos-binary-openvswitch-vswitchd        2.0.4               0179076733aa        22 hours ago        380 MB
kolla/centos-binary-openvswitch-db-server       2.0.4               a9e0e1bd0968        22 hours ago        380 MB
kolla/centos-binary-rabbitmq                    2.0.4               cbaeb0b64930        22 hours ago        438 MB
kolla/centos-binary-memcached                   2.0.4               11aa506130a6        22 hours ago        403 MB
kolla/centos-binary-heka                        2.0.4               89c723045d40        22 hours ago        420 MB
kolla/centos-binary-cron                        2.0.4               4b0b36b058a1        22 hours ago        366 MB
kolla/centos-binary-keepalived                  2.0.4               7fbc06505ddb        23 hours ago        411 MB
kolla/centos-binary-haproxy                     2.0.4               7f65eba43909        23 hours ago        367 MB
centos                                          latest              49f7960eb7e4        5 weeks ago         200 MB

Problem 4: the openvswitch_db container fails to start

Problem description

With the rebuilt images, deploy fails again with the following error:

TASK: [neutron | Waiting the openvswitch_db service to be ready] ************** 
failed: [localhost] => {"attempts": 30, "changed": false, "cmd": ["docker", "exec", "openvswitch_db", "ovs-vsctl", "--no-wait", "show"], "delta": "0:00:00.057827", "end": "2018-07-12 08:04:09.663118", "failed": true, "rc": 1, "start": "2018-07-12 08:04:09.605291", "stdout_lines": [], "warnings": []}
stderr: Error response from daemon: Container 03426314b560db08c762a8f9aebdb4423571a29ba1c22862e3415ac913289c21 is restarting, wait until the container is running
msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting

The error is identical to problem 1. Check the container logs:

INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Writing out command to execute
Running command: '/usr/sbin/ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log'
ovsdb-server: I/O error: open: /etc/openvswitch/conf.db failed (No such file or directory)

Troubleshooting

The images used previously were the official ones pulled straight from Docker Hub with tag 2.0.2, while the kolla code I am deploying from is 2.0.4 (the mitaka-eol tag). Comparing the openvswitch_db code between the two versions makes the problem obvious.

tag:2.0.2

kolla/docker/openvswitch/openvswitch-db-server/extend_start.sh

#!/bin/bash

mkdir -p "/run/openvswitch"
if [[ ! -e "/etc/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/etc/openvswitch/conf.db"
fi
tag:mitaka-eol

kolla/docker/openvswitch/openvswitch-db-server/extend_start.sh

mkdir -p "/run/openvswitch"
if [[ ! -e "/var/lib/openvswitch/conf.db" ]]; then
    ovsdb-tool create "/var/lib/openvswitch/conf.db"
fi

The 2.0.2 image creates /etc/openvswitch/conf.db, while the startup command used by the 2.0.4 (mitaka-eol) version is:

{
    "command": "/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

This mismatch is what caused problem 1 in the first place. Since I have now rebuilt the images, the container creates /var/lib/openvswitch/conf.db at startup, but because I changed the startup command back in problem 1, it now fails to find /etc/openvswitch/conf.db. The fix is to revert the change made for problem 1:

# vim /usr/share/kolla/ansible/roles/neutron/templates/openvswitch-db-server.json.j2

{
    "command": "/usr/sbin/ovsdb-server /var/lib/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/db.sock --log-file=/var/log/openvswitch/ovsdb-server.log",
    "config_files": []
}

Verification

Redeploying after this change completes successfully:

# kolla-ansible deploy

……
TASK: [manila | Creating Manila database] ************************************* 
skipping: [localhost]

TASK: [manila | Reading json from variable] *********************************** 
skipping: [localhost]

TASK: [manila | Creating Manila database user and setting permissions] ******** 
skipping: [localhost]

TASK: [manila | Running Manila bootstrap container] *************************** 
skipping: [localhost]

TASK: [manila | Starting manila-api container] ******************************** 
skipping: [localhost]

TASK: [manila | Starting manila-scheduler container] ************************** 
skipping: [localhost]

TASK: [manila | Starting manila-share container] ****************************** 
skipping: [localhost]

PLAY RECAP ******************************************************************** 
localhost                  : ok=311  changed=123  unreachable=0    failed=0   

Problem 5: the nova_compute and nova_libvirt containers fail to start

Problem description

Continuing from problem 4: although the deployment succeeds, checking the container status shows the two nova containers stuck restarting:

# docker ps

648c226f0980        kolla/centos-binary-nova-compute:2.0.4                "kolla_start"            3 days ago          Restarting (0) 21 hours ago                       nova_compute
c492de413c81        kolla/centos-binary-nova-libvirt:2.0.4                "kolla_start"            3 days ago          Restarting (6) 21 hours ago                       nova_libvirt

Check the logs of the affected containers. The nova_compute container:

# docker logs 648c226f0980
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Removing existing destination: /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/nova.conf to /etc/nova/nova.conf
INFO:__main__:Setting permissions for /etc/nova/nova.conf
INFO:__main__:Writing out command to execute
Running command: 'nova-compute'
/usr/lib/python2.7/site-packages/pkg_resources/__init__.py:187: RuntimeWarning: You have iterated over the result of pkg_resources.parse_version. This is a legacy behavior which is inconsistent with the new version class introduced in setuptools 8.0. In most cases, conversion to a tuple is unnecessary. For comparison of versions, sort the Version instances directly. If you have another use case requiring the tuple, please file a bug with the setuptools project describing that need.
  stacklevel=1,
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/queue.py", line 118, in switch
    self.greenlet.switch(value)
  File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 683, in run_service
    raise SystemExit(1)
SystemExit: 1

The nova_libvirt container:

# docker logs c492de413c81

INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Removing existing destination: /etc/libvirt/libvirtd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/libvirtd.conf to /etc/libvirt/libvirtd.conf
INFO:__main__:Setting permissions for /etc/libvirt/libvirtd.conf
INFO:__main__:Removing existing destination: /etc/libvirt/qemu.conf
INFO:__main__:Copying /var/lib/kolla/config_files/qemu.conf to /etc/libvirt/qemu.conf
INFO:__main__:Setting permissions for /etc/libvirt/qemu.conf
INFO:__main__:Writing out command to execute
Running command: '/usr/sbin/libvirtd --listen'

Troubleshooting

1. Search for the error

Some searching shows this is a kolla-ansible bug; see Fix nova-libvirt and nova-compute fails to deploy for details.

2. Apply the fix described there

# vim /usr/share/kolla/ansible/roles/nova/templates/libvirtd.conf.j2

+listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
ca_file = ""
log_level = 3
log_outputs = "3:file:/var/log/kolla/libvirt/libvirtd.log"
listen_addr = "{{ hostvars[inventory_hostname]['ansible_' + api_interface]['ipv4']['address'] }}"

Verification

After cleaning up the previous containers and redeploying, nova_compute and nova_libvirt run normally:

# /root/kolla/tools/cleanup-containers
# /root/kolla/tools/cleanup-host
# kolla-ansible deploy

# docker ps

......

cffc9fc2774b        kolla/centos-binary-nova-compute:2.0.4                "kolla_start"            5 minutes ago       Up 5 minutes                            nova_compute
5f27c8052238        kolla/centos-binary-nova-libvirt:2.0.4                "kolla_start"            5 minutes ago       Up 5 minutes                            nova_libvirt

Problem 6: the dashboard is unreachable

Problem description

After all the steps above, the dashboard turns out to be unreachable. The port is listening, but the browser returns "504 Gateway Time-out".

Troubleshooting

1. Check the dashboard logs

# docker exec -it heka bash
(heka)[heka@allinone /]$ tail -50f /var/log/kolla/horizon/horizon.log
[Mon Jul 16 05:25:13.065432 2018] [core:error] [pid 41] [client 172.16.15.246:59248] End of script output before headers: django.wsgi
[Mon Jul 16 05:31:23.408902 2018] [core:error] [pid 43] [client 172.16.15.227:57733] End of script output before headers: django.wsgi
[Mon Jul 16 05:31:33.443843 2018] [core:error] [pid 40] [client 172.16.15.246:36708] End of script output before headers: django.wsgi

2. A known dashboard access problem

I ran into the same dashboard access problem before when installing the Mitaka release with packstack. For details see Openstack Mitaka: can not access dashboard(internal server 500).

3. Edit the dashboard config and add the following

# vim /etc/kolla/horizon/horizon.conf

    WSGIScriptReloading On
    WSGIDaemonProcess horizon-http processes=5 threads=1 user=horizon group=horizon display-name=%{GROUP} python-path=/usr/lib/python2.7/site-packages
    WSGIProcessGroup horizon-http
+   WSGIApplicationGroup %{GLOBAL}

So that the fix also survives a cleanup and redeploy, the template needs the same change:

# vim /usr/share/kolla/ansible/roles/horizon/templates/horizon.conf.j2

    WSGIScriptReloading On
    WSGIDaemonProcess horizon-http processes=5 threads=1 user=horizon group=horizon display-name=%{GROUP} python-path={{ python_path }}
    WSGIProcessGroup horizon-http
+   WSGIApplicationGroup %{GLOBAL}

Verification

Restart the container and verify that the dashboard opens:

# docker restart horizon
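A quick way to confirm that horizon answers again instead of timing out (a sketch; substitute the address your deployment serves the dashboard on, the internal address 172.16.15.115 seen in the logs above is only an assumption):

# curl -sS -o /dev/null -w '%{http_code}\n' -L http://172.16.15.115/

Anything other than the earlier 504 (typically a 200 once the login page loads) means the WSGIApplicationGroup fix took effect.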

Problem 7: errors after logging in to the dashboard

Problem description

The dashboard now opens normally, but the following error appears after logging in.

Troubleshooting

1. Check the dashboard logs

# docker exec -it heka bash

(heka)[heka@allinone /]$ tail -50f /var/log/kolla/horizon/horizon.log
……
[Mon Jul 16 09:29:07.368340 2018] [:error] [pid 19]   File "/usr/lib/python2.7/site-packages/django/template/base.py", line 905, in render
[Mon Jul 16 09:29:07.368353 2018] [:error] [pid 19]     bit = self.render_node(node, context)
[Mon Jul 16 09:29:07.368373 2018] [:error] [pid 19]   File "/usr/lib/python2.7/site-packages/django/template/base.py", line 919, in render_node
[Mon Jul 16 09:29:07.368387 2018] [:error] [pid 19]     return node.render(context)
[Mon Jul 16 09:29:07.368399 2018] [:error] [pid 19]   File "/usr/lib/python2.7/site-packages/django/templatetags/i18n.py", line 145, in render
[Mon Jul 16 09:29:07.368412 2018] [:error] [pid 19]     result = translation.ungettext(singular, plural, count)
[Mon Jul 16 09:29:07.368425 2018] [:error] [pid 19]   File "/usr/lib/python2.7/site-packages/django/utils/translation/__init__.py", line 88, in ungettext
[Mon Jul 16 09:29:07.368438 2018] [:error] [pid 19]     return _trans.ungettext(singular, plural, number)
[Mon Jul 16 09:29:07.368451 2018] [:error] [pid 19]   File "/usr/lib/python2.7/site-packages/django/utils/translation/trans_real.py", line 381, in ungettext
[Mon Jul 16 09:29:07.368464 2018] [:error] [pid 19]     return do_ntranslate(singular, plural, number, 'ungettext')
[Mon Jul 16 09:29:07.368506 2018] [:error] [pid 19]   File "/usr/lib/python2.7/site-packages/django/utils/translation/trans_real.py", line 358, in do_ntranslate
[Mon Jul 16 09:29:07.368550 2018] [:error] [pid 19]     return getattr(t, translation_function)(singular, plural, number)
[Mon Jul 16 09:29:07.368571 2018] [:error] [pid 19]   File "/usr/lib64/python2.7/gettext.py", line 411, in ungettext
[Mon Jul 16 09:29:07.368585 2018] [:error] [pid 19]     tmsg = self._catalog[(msgid1, self.plural(n))]
[Mon Jul 16 09:29:07.368597 2018] [:error] [pid 19] AttributeError: DjangoTranslation instance has no attribute 'plural'

2. Check the dashboard locale directories

The zh_CN and other locale directories are empty. Comparing with a healthy environment shows that each should contain an LC_MESSAGES directory holding the compiled translation files.

# docker exec -it horizon bash

(horizon)[root@allinone ~]# ls /usr/lib/python2.7/site-packages/openstack_dashboard/locale/zh_CN

3. Start a container and replay the horizon image build

Create a container from the kolla/centos-binary-openstack-base:2.0.4 image:

# docker run -it kolla/centos-binary-openstack-base:2.0.4 /bin/bash

Inside it, run the install steps from the horizon Dockerfile:

# vim /usr/share/kolla/docker/horizon/Dockerfile.j2

RUN yum -y install \
        openstack-dashboard \
        httpd \
        mod_wsgi \
        gettext \
    && yum clean all \
    && useradd --user-group horizon \
    && sed -i -r 's,^(Listen 80),#\1,' /etc/httpd/conf/httpd.conf \
    && ln -s /usr/share/openstack-dashboard/openstack_dashboard /usr/lib/python2.7/site-packages/openstack_dashboard \
    && ln -s /usr/share/openstack-dashboard/static /usr/lib/python2.7/site-packages/static \
    && chown -R horizon: /etc/openstack-dashboard /usr/share/openstack-dashboard \
    && chown -R apache: /usr/share/openstack-dashboard/static
……

It turns out that the very first step, installing openstack-dashboard with yum, does not create the LC_MESSAGES directories under /usr/share/openstack-dashboard/openstack_dashboard/locale.
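A sketch of that check, run inside the throw-away container from step 3:

# yum -y install openstack-dashboard
# ls /usr/share/openstack-dashboard/openstack_dashboard/locale/zh_CN/

After the plain yum install there is no LC_MESSAGES here, unlike after the rpm install in the next step.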

4. Install with the rpm command

Installing the package directly with rpm does generate the locale files:

()[root@e06f6d94adba ~]# yum remove openstack-dashboard -y
()[root@e06f6d94adba ~]# rpm -ivh  http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/openstack-dashboard-9.0.1-1.el7.noarch.rpm
()[root@e06f6d94adba ~]# ls /usr/share/openstack-dashboard/openstack_dashboard/locale/zh_CN/
LC_MESSAGES

5. Modify the horizon Dockerfile

First install openstack-dashboard with yum to resolve its dependencies, then remove it and reinstall it with rpm:

# vim /usr/share/kolla/docker/horizon/Dockerfile.j2

RUN yum -y install \
        openstack-dashboard \
        httpd \
        mod_wsgi \
        gettext \
+   && rpm -e  openstack-dashboard && rm -rf /etc/openstack-dashboard/local_settings.rpmsave \
+   && rpm -ivh  http://mirror.neu.edu.cn/centos/7/cloud/x86_64/openstack-mitaka/openstack-dashboard-9.0.1-1.el7.noarch.rpm \
    && yum clean all \

6. Rebuild the horizon image

# kolla-build --base centos -t binary horizon

Verification

After redeploying, the dashboard opens normally and renders correctly after login.
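As a final spot check, the locale data should now be present inside the rebuilt horizon container (a sketch, using the path examined earlier):

# docker exec -it horizon ls /usr/lib/python2.7/site-packages/openstack_dashboard/locale/zh_CN/

LC_MESSAGES should now show up here, matching what the rpm install produced in step 4.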
