Quickly Installing a K8S Cluster with the kubeadm Tool

Preparation

  • Prepare two CentOS 7.8 virtual machines

  • Install Docker Engine

    1. Remove any old Docker installation (if Docker has never been installed, this step is unnecessary)

      sudo yum remove docker \
      docker-client \
      docker-client-latest \
      docker-common \
      docker-latest \
      docker-latest-logrotate \
      docker-logrotate \
      docker-engine
    2. Install yum-utils, which provides the yum-config-manager utility

      sudo yum install -y yum-utils

    3. Add the Docker yum repository
      sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

    4. Install Docker Engine
      sudo yum install docker-ce docker-ce-cli containerd.io

    5. Start the Docker service
      sudo systemctl start docker

    6. Enable Docker to start automatically on boot
      sudo systemctl enable docker
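    7. (Optional) A quick sanity check that Docker works; the hello-world image below is pulled from Docker Hub
      docker --version
      sudo docker run --rm hello-world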

  • Disable the firewall
systemctl disable firewalld 
systemctl stop firewalld

Alternatively, instead of disabling the firewall, you can open the ports used by each component. The default ports are listed below (a firewall-cmd sketch follows the table):

Component            Default ports
API Server           8080 (HTTP, insecure), 6443 (HTTPS, secure)
Controller Manager   10252
Scheduler            10251
kubelet              10250, 10255 (read-only)
etcd                 2379 (client access), 2380 (etcd peer communication)
Cluster DNS          53 (UDP), 53 (TCP)
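If you keep firewalld running, a minimal sketch for opening the ports from the table above on the Master node (adjust to the components that actually run on each host):

firewall-cmd --permanent --add-port=6443/tcp --add-port=8080/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250-10252/tcp --add-port=10255/tcp
firewall-cmd --permanent --add-port=53/tcp --add-port=53/udp
firewall-cmd --reload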
  • It is recommended to disable SELinux on the hosts
    Edit /etc/sysconfig/selinux and change SELINUX=enforcing to SELINUX=disabled so that containers can read the host filesystem.
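    A minimal sketch (setenforce only affects the current boot; the file edit persists after a reboot):
      setenforce 0    # switch SELinux to permissive mode for the current session
      sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/sysconfig/selinux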

Installing kubeadm

  • Configure the official Kubernetes yum repository by editing /etc/yum.repos.d/kubernetes.repo with the following content:
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
  • Run yum install to install kubeadm, kubelet, and kubectl:

    yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
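    If you want every node to run exactly the same version (for example the v1.22.3 that appears later in this article), you can pin the package versions; the exact version strings depend on what the repository currently provides:

    yum install -y kubelet-1.22.3 kubeadm-1.22.3 kubectl-1.22.3 --disableexcludes=kubernetes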

  • Start and enable kubelet

    systemctl start kubelet
    systemctl enable kubelet
  • kubeadm requires the Linux swap area to be disabled

    swapoff -a only disables swap until the next reboot. To disable it permanently (recommended), comment out the swap line in /etc/fstab and then reboot.
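    A minimal sketch of both steps (double-check /etc/fstab afterwards):

    swapoff -a                                                        # off only until the next reboot
    sed -ri 's@^([^#].*[[:space:]]swap[[:space:]])@#\1@' /etc/fstab   # comment out the swap entry permanently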

Modifying kubeadm's Default Configuration

View the default initialization configuration, also known as the control plane (Control Plane) configuration, and the node-join configuration.

kubeadm config print init-defaults shows the default parameters of the kubeadm init command

init defaults configuration:

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: 1.22.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}

kubeadm config print join-defaults shows the default parameters of the kubeadm join command

join defaults configuration:
    
apiVersion: kubeadm.k8s.io/v1beta3
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: kube-apiserver:6443
    token: abcdef.0123456789abcdef
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: abcdef.0123456789abcdef
kind: JoinConfiguration
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: node01
  taints: null
  
  

For now we will stick with the default configuration.
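If you do want to change these defaults later (for example advertiseAddress or imageRepository), a typical workflow is to dump them to a file, edit it, and feed it back to kubeadm; the file name below is arbitrary:

kubeadm config print init-defaults > kubeadm-config.yaml
# edit kubeadm-config.yaml as needed, then:
kubeadm init --config kubeadm-config.yaml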

Downloading the Kubernetes Images

The required images need to be pulled in advance; list them with the command kubeadm config images list:

kubeadm config images list

k8s.gcr.io/kube-apiserver:v1.22.3
k8s.gcr.io/kube-controller-manager:v1.22.3
k8s.gcr.io/kube-scheduler:v1.22.3
k8s.gcr.io/kube-proxy:v1.22.3
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4

Then run kubeadm config images pull; if you created a config file, add the --config=/path/config.yaml parameter.
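If k8s.gcr.io is not reachable from your network, kubeadm can pull the same images from a mirror repository instead; the Aliyun mirror below is just one commonly used example, substitute whichever mirror you actually use:

kubeadm config images pull \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.22.3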

Running kubeadm init to Install the Master Node

First run the preflight check, kubeadm init phase preflight, to make sure the host meets the installation requirements, and then run kubeadm init to install the K8S Master node.

One thing to note: Kubernetes defaults the cgroup driver (cgroupdriver) to systemd, while the Docker service's cgroup driver defaults to cgroupfs, so it is recommended to change Docker's driver to systemd. Edit the Docker daemon configuration file (default location: /etc/docker/daemon.json); if it does not exist, create a new daemon.json and add the following configuration:

{
"exec-opts":["native.cgroupdriver=systemd"]
}

Running the preflight check shows:

[preflight] Running pre-flight checks
[WARNING Hostname]: hostname "node01" could not be reached
[WARNING Hostname]: hostname "node01": lookup node01 on 114.114.114.114:53: no such host
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[ERROR Mem]: the system RAM (972 MB) is less than the minimum 1700 MB
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1

Of the errors above, the first two are caused by the VM not having enough resources; increase it to 2 CPU cores and 2 GB of RAM. The last error means that /proc/sys/net/bridge/bridge-nf-call-iptables is not set to 1, which can be fixed with echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables.
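echo only sets the value until the next reboot. To make it persistent, a common approach is a sysctl drop-in file (the file name below is arbitrary):

cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system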

Run the preflight check again; now there are only warnings and no errors, so run the initialization command. It fails to start:

[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.

Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

The kubelet did not start. Check the log with tail /var/log/messages:

Nov  1 15:04:51 node01 kubelet: I1101 15:04:51.235877    8809 docker_service.go:242] "Hairpin mode is set" hairpinMode=hairpin-veth
Nov 1 15:04:51 node01 kubelet: I1101 15:04:51.235947 8809 cni.go:239] "Unable to update cni config" err="no networks found in /etc/cni/net.d"
Nov 1 15:04:51 node01 kubelet: I1101 15:04:51.242254 8809 cni.go:239] "Unable to update cni config" err="no networks found in /etc/cni/net.d"
Nov 1 15:04:51 node01 kubelet: I1101 15:04:51.242306 8809 docker_service.go:257] "Docker cri networking managed by the network plugin" networkPluginName="cni"
Nov 1 15:04:51 node01 kubelet: I1101 15:04:51.242331 8809 cni.go:239] "Unable to update cni config" err="no networks found in /etc/cni/net.d"
Nov 1 15:04:51 node01 kubelet: I1101 15:04:51.249360 8809 docker_service.go:264] "Docker Info" dockerInfo=&{ID:ZB4Z:FUQW:IXZR:H3XP:E4PL:WXGO:4ODH:A72V:BDIY:D4AJ:F6S7:M2J2 Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:7 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:false Debug:false NFd:25 OomKillDisable:true NGoroutines:34 SystemTime:2021-11-01T15:04:51.242765302+08:00 LoggingDriver:json-file CgroupDriver:cgroupfs CgroupVersion:1 NEventsListener:0 KernelVersion:3.10.0-1160.45.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSVersion:7 OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc0003fa070 NCPU:2 MemTotal:2093301760 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:node01 Labels:[] ExperimentalBuild:false ServerVersion:20.10.10 ClusterStore: ClusterAdvertise: Runtimes:map[io.containerd.runc.v2:{Path:runc Args:[] Shim:<nil>} io.containerd.runtime.v1.linux:{Path:runc Args:[] Shim:<nil>} runc:{Path:runc Args:[] Shim:<nil>}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:5b46e404f6b9f661a205e28d59c982d3634148f8 Expected:5b46e404f6b9f661a205e28d59c982d3634148f8} RuncCommit:{ID:v1.0.2-0-g52b36a2 Expected:v1.0.2-0-g52b36a2} InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[] Warnings:[WARNING: bridge-nf-call-ip6tables is disabled]}
Nov 1 15:04:51 node01 kubelet: E1101 15:04:51.249384 8809 server.go:294] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
Nov 1 15:04:51 node01 systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 1 15:04:51 node01 systemd: Unit kubelet.service entered failed state.
Nov 1 15:04:51 node01 systemd: kubelet.service failed.

It turns out the K8S and Docker cgroup drivers are different: the daemon.json file configured earlier never took effect because Docker was not restarted. Restart the Docker service so the configuration takes effect. Run:

systemctl daemon-reload && systemctl restart docker
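After the restart you can confirm the driver actually changed:

docker info --format '{{.CgroupDriver}}'   # should print: systemd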

Run kubeadm reset to roll back the files produced by the kubeadm init operation, then run kubeadm init again.

The following output indicates that the Master node's control plane was installed successfully:

[mark-control-plane] Marking the node node01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 810man.kfqm2lvq8mfqq2k7
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.170.111:6443 --token 810man.kfqm2lvq8mfqq2k7 \
--discovery-token-ca-cert-hash sha256:e89417229adccca09076f811a36cb50c0e19702c3d2bd0b1a4808c7f68ea1785

As the output instructs, kubectl access (the admin kubeconfig) needs to be configured. If you are the root user, just run export KUBECONFIG=/etc/kubernetes/admin.conf, and then you can operate the cluster with kubectl.

But this only lasts for the current session; after closing the terminal and logging in again, you will see an error like the following:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

So you should run mkdir -p $HOME/.kube and cp -i /etc/kubernetes/admin.conf $HOME/.kube/config (as shown in the output above), after which kubectl keeps working.

At this point the Master node is working, but there are no Worker nodes yet, and the cluster currently has no Pod network, so the network still needs to be configured.
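A quick check that kubectl can reach the API server:

kubectl cluster-info
kubectl get pods -n kube-system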

Joining a New Node

  • Prepare the yum repositories; the steps are the same as on the Master.

  • Install kubeadm and kubelet

    kubectl does not need to be installed on Worker nodes

    yum install -y kubelet kubeadm --disableexcludes=kubernetes

    Start the kubelet service and enable it to start on boot

    systemctl start kubelet && systemctl enable kubelet

  • kubeadm requires the Linux swap area to be disabled

    swapoff -a

  • Change the Docker cgroup driver configuration on the new node as well, in /etc/docker/daemon.json:

    {
    "exec-opts":["native.cgroupdriver=systemd"]
    }
  • Join the cluster with the kubeadm join command; you can copy the join command from the output shown above when the Master installation finished:

    kubeadm join 192.168.170.111:6443 --token 810man.kfqm2lvq8mfqq2k7 --discovery-token-ca-cert-hash sha256:e89417229adccca09076f811a36cb50c0e19702c3d2bd0b1a4808c7f68ea1785

    At the end a log line appears: This node has joined the cluster ...

    [preflight] Running pre-flight checks
    [WARNING Hostname]: hostname "node02" could not be reached
    [WARNING Hostname]: hostname "node02": lookup node02 on 114.114.114.114:53: no such host
    [preflight] Reading configuration from the cluster...
    [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Starting the kubelet
    [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.

    Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

    If kubeadm join fails, you can run kubeadm reset to restore the node to its original state and then rerun the join command. The same applies to kubeadm init.

    As prompted, run kubectl get nodes on the Master node to list the nodes currently in the cluster:

    [root@node01 docker]# kubectl get nodes
    NAME     STATUS     ROLES                  AGE     VERSION
    node01   NotReady   control-plane,master   67m     v1.22.3
    node02   NotReady   <none>                 5m38s   v1.22.3

    At this point there is no network yet, so the nodes are all in the NotReady state, and /var/log/messages keeps reporting network errors:

    Nov  1 16:18:50 node01 kubelet: I1101 16:18:50.622393   13344 cni.go:239] "Unable to update cni config" err="no networks found in /etc/cni/net.d"
    Nov 1 16:18:52 node01 kubelet: E1101 16:18:52.110886 13344 kubelet.go:2337] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"

    So next, we need to install the CNI network plugin.

    Optionally, the Master node can also take on the Node role and run regular workloads. This is done by removing the node-role.kubernetes.io/master taint from it, with the command below (note the trailing '-', which removes the taint):

    kubectl taint nodes --all node-role.kubernetes.io/master-
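    After removing the taint you can confirm it is gone:

    kubectl describe node node01 | grep -i taints   # should show <none>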

Installing the CNI Network Plugin

For the CNI network plugin we choose the Calico CNI plugin. On the Master node, install it with a single command:

kubectl apply -f 'https://docs.projectcalico.org/manifests/calico.yaml'

After the installation, run kubectl get nodes again; the nodes are now in the Ready state.

Verifying That the Kubernetes Cluster Works

Run the Pod listing command to verify that the cluster's system Pods were created successfully and are running normally:

kubectl get pods --all-namespaces, which outputs:

[root@node01 docker]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-75f8f6cc59-cfwt7   1/1     Running   0          6m19s
kube-system   calico-node-2gb5b                          1/1     Running   0          6m19s
kube-system   calico-node-5786l                          1/1     Running   0          6m19s
kube-system   coredns-78fcd69978-774r8                   1/1     Running   0          93m
kube-system   coredns-78fcd69978-dgpq4                   1/1     Running   0          93m
kube-system   etcd-node01                                1/1     Running   1          93m
kube-system   kube-apiserver-node01                      1/1     Running   1          93m
kube-system   kube-controller-manager-node01             1/1     Running   1          93m
kube-system   kube-proxy-59xsr                           1/1     Running   0          31m
kube-system   kube-proxy-kdrt9                           1/1     Running   0          93m
kube-system   kube-scheduler-node01                      1/1     Running   1          93m
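Optionally, a small end-to-end smoke test (the deployment name below is arbitrary) confirms that new Pods get scheduled and reach the Running state:

kubectl create deployment nginx-test --image=nginx
kubectl get pods -o wide                 # the Pod should be scheduled onto a node and become Running
kubectl delete deployment nginx-test     # clean up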

At this point, a working K8S cluster has been installed successfully. However, this cluster is not fault tolerant: once the Master node goes down, the whole cluster collapses, so for production you should still build a highly available cluster.

Additional Notes

The token has an expiry time, 24h by default. Once it expires, new nodes can no longer join and errors such as the following appear:

I1103 13:57:58.187573   18232 checks.go:403] checking whether the given node name is valid and reachable using net.LookupHost
I1103 13:57:58.187665 18232 checks.go:618] validating kubelet version
I1103 13:57:58.222792 18232 checks.go:132] validating if the "kubelet" service is enabled and active
I1103 13:57:58.228624 18232 checks.go:205] validating availability of port 10250
I1103 13:57:58.228884 18232 checks.go:282] validating the existence of file /etc/kubernetes/pki/ca.crt
I1103 13:57:58.228894 18232 checks.go:432] validating if the connectivity type is via proxy or direct
I1103 13:57:58.228917 18232 join.go:475] [preflight] Discovering cluster-info
I1103 13:57:58.228939 18232 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "192.168.170.111:6443"
I1103 13:57:58.241805 18232 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "810man", will try again
I1103 13:58:04.154621 18232 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "810man", will try again
I1103 13:58:10.576176 18232 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "810man", will try again
I1103 13:58:16.580191 18232 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "810man", will try again

To generate a new token, run the following command on the Master node (the control plane node):

kubeadm token create --print-join-command

It prints a new kubeadm join command. You can also add the --ttl=0 parameter to make the token never expire. The output looks like this:

kubeadm join 192.168.170.111:6443 --token cyvp4j.05wpywxo03q178v4 --discovery-token-ca-cert-hash sha256:e89417229adccca09076f811a36cb50c0e19702c3d2bd0b1a4808c7f68ea1785

Run that command on the new node, then run kubectl get nodes on the Master node again to check whether the new node has joined.
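You can also list the existing tokens, and if you have lost the original join command, recompute the CA certificate hash yourself with the standard openssl pipeline documented for kubeadm:

kubeadm token list
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'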
