limx 2020-08-19
IP               Hostname  Role    OS
192.168.157.130  master    master  CentOS 7
192.168.157.131  node1     node    CentOS 7
192.168.157.132  node2     node    CentOS 7
systemctl stop firewalld.service
systemctl stop iptables.service
systemctl disable firewalld.service
systemctl disable iptables.service
Set SELinux to permissive mode for the running system, and disable it in the config file so that it stays off after a reboot
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
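The sed expression only rewrites a line that reads exactly SELINUX=enforcing, so a file already set to permissive is left alone. A minimal sketch that tries the same substitution on a scratch file first; the function name disable_selinux_in and the sample file content are made up for illustration and stand in for /etc/selinux/config:

```shell
#!/usr/bin/env bash
# Sketch: try the SELinux substitution on a scratch copy first.
# The sample content is a hypothetical stand-in for /etc/selinux/config.

disable_selinux_in() {
  # $1: path to an SELinux config file; the file is edited in place (GNU sed).
  sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$1"
}

scratch=$(mktemp)
printf '%s\n' '# sample config' 'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$scratch"

disable_selinux_in "$scratch"

# Show the resulting SELINUX= line (SELINUXTYPE= does not match this pattern).
selinux_line=$(grep '^SELINUX=' "$scratch")
echo "$selinux_line"
rm -f "$scratch"
```

Once the scratch result looks right, the same function can be pointed at the real file as root.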
Add the Docker yum repository
tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
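The heredoc delimiter is quoted so that the shell leaves $releasever untouched and yum can expand it later. A small sketch of the difference, writing to variables instead of a repo file; the value assigned to releasever here is purely illustrative:

```shell
#!/usr/bin/env bash
# Sketch: quoted vs. unquoted heredoc delimiters.
releasever="EXPANDED-BY-SHELL"   # illustrative value, not a real release

# Quoted delimiter: the body is taken literally.
quoted=$(cat <<'EOF'
baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/
EOF
)

# Unquoted delimiter: the shell substitutes $releasever before cat sees it.
unquoted=$(cat <<EOF
baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/
EOF
)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

Only the quoted form keeps the literal $releasever that yum needs in the repo file.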
Install Docker
yum install -y docker-engine
Start Docker
systemctl start docker
Enable Docker at boot
systemctl enable docker
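kubeadm: initializes the cluster.
kubelet: runs on every node in the cluster and is responsible for starting pods and containers.
kubectl: the Kubernetes command-line tool. With kubectl you can deploy and manage applications, inspect resources, and create, delete, and update components.
Install these three tools. To avoid network-unreachable errors, use the Aliyun mirror instead of the Google repository.
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
Install with yum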
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
Enable kubelet at boot and start it
systemctl enable kubelet && systemctl start kubelet
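Note: --pod-network-cidr=10.244.0.0/16 is configuration required by the Kubernetes network plugin; it is the address range from which each node is assigned a subnet. The network plugin used here is flannel, and 10.244.0.0/16 is the value its default configuration expects.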
kubeadm init --pod-network-cidr=10.244.0.0/16
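As an aside on the CIDR: the value passed to kubeadm has to agree with the Network field in flannel's net-conf.json, which is 10.244.0.0/16 in the stock manifest. A minimal sketch that extracts the field from manifest text and compares it; the function names are hypothetical, and the inline sample is a trimmed excerpt rather than the full manifest:

```shell
#!/usr/bin/env bash
# Sketch: check that the pod CIDR matches flannel's "Network" setting.

flannel_network() {
  # stdin: manifest text; prints the first "Network" value found.
  sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

cidr_matches() {
  # $1: CIDR passed to kubeadm init; stdin: manifest text.
  [ "$(flannel_network)" = "$1" ]
}

# Trimmed excerpt of the net-conf.json section (illustrative, not the whole file).
sample='  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }'

if printf '%s\n' "$sample" | cidr_matches "10.244.0.0/16"; then
  echo "pod CIDR matches flannel Network"
else
  echo "pod CIDR differs from flannel Network"
fi
```

With a downloaded manifest, the check would read cidr_matches 10.244.0.0/16 < kube-flannel.yml.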
Running kubeadm init produces the following errors:
[ ~]# kubeadm init --pod-network-cidr=10.244.0.0/16
W0108 16:59:41.129123 31192 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W0108 16:59:41.129373 31192 version.go:102] falling back to the local client version: v1.17.0
W0108 16:59:41.129733 31192 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0108 16:59:41.129757 31192 validation.go:28] Cannot validate kubelet config - no validator is available
[init] Using Kubernetes version: v1.17.0
[preflight] Running pre-flight checks
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 17.05.0-ce. Latest validated version: 19.03
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v1/_ping: dial tcp 108.177.97.82:443: getsockopt: connection refused , error: exit status 1
    [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v1/_ping: dial tcp 108.177.97.82:443: getsockopt: connection refused , error: exit status 1
    [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v1/_ping: dial tcp 74.125.203.82:443: getsockopt: connection refused , error: exit status 1
    [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v1/_ping: dial tcp 74.125.203.82:443: getsockopt: connection refused , error: exit status 1
    [ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.1: output: Error response from daemon: Get https://k8s.gcr.io/v1/_ping: dial tcp 74.125.203.82:443: getsockopt: connection refused , error: exit status 1
    [ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.4.3-0: output: Error response from daemon: Get https://k8s.gcr.io/v1/_ping: dial tcp 74.125.203.82:443: getsockopt: connection refused , error: exit status 1
    [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.6.5: output: Error response from daemon: Get https://k8s.gcr.io/v1/_ping: dial tcp 74.125.203.82:443: getsockopt: connection refused , error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
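These errors occur because gcr.io (Google's own container image registry) cannot be reached from mainland China. The fix is to pull the images from the Aliyun mirror and re-tag them.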
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.17.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.17.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.17.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.4.3-0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.6.5
Re-tag the images with their original k8s.gcr.io names
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.17.0 k8s.gcr.io/kube-apiserver:v1.17.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.17.0 k8s.gcr.io/kube-controller-manager:v1.17.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.17.0 k8s.gcr.io/kube-scheduler:v1.17.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.0 k8s.gcr.io/kube-proxy:v1.17.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.4.3-0 k8s.gcr.io/etcd:3.4.3-0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.6.5 k8s.gcr.io/coredns:1.6.5
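The pull and tag commands follow one pattern per image, so they can be generated from a single list. A sketch under that assumption; the function name mirror_commands is made up, and the script only prints the commands (a dry run) rather than calling docker:

```shell
#!/usr/bin/env bash
# Sketch: generate the pull/tag pairs from one list instead of typing them.
MIRROR="registry.cn-hangzhou.aliyuncs.com/google_containers"
TARGET="k8s.gcr.io"

# Image names and versions as reported by kubeadm v1.17.0 in this post.
IMAGES="kube-apiserver:v1.17.0
kube-controller-manager:v1.17.0
kube-scheduler:v1.17.0
kube-proxy:v1.17.0
pause:3.1
etcd:3.4.3-0
coredns:1.6.5"

mirror_commands() {
  # stdin: one image name per line; prints a "docker pull" and a "docker tag" line for each.
  while IFS= read -r image; do
    [ -n "$image" ] || continue          # skip blank lines
    echo "docker pull $MIRROR/$image"
    echo "docker tag $MIRROR/$image $TARGET/$image"
  done
}

# Dry run: print the commands only.
printf '%s\n' "$IMAGES" | mirror_commands
```

Piping the printed lines into sh would execute them; reviewing the dry-run output first is the safer order.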
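With the images in place, run the init command again and kubeadm completes successfully. On success, kubeadm prints the command that other nodes must run to join the cluster, along with the token they need. Write this down.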
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.169.134:6443 --token nl8oq0.ly7ti8ibkg47t3kj --discovery-token-ca-cert-hash sha256:42b7132d686c233cc931628043dbe7ea277284a967e959521ee03d0872ff64df
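Following the prompt above, create a new regular user on the master node and run these commands as that user (the user kubernetes below must already exist and be allowed to use sudo)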
su - kubernetes
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
For convenience, enable auto-completion for the kubectl command.
echo "source <(kubectl completion bash)" >> ~/.bashrc
Kubernetes supports several networking solutions; here we use flannel. Deploy it with the following command:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
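On my virtual machine the kubectl apply command failed with "Network file descriptor is not connected". I opened the URL on the host machine instead, saved the contents, and uploaded them to the server as kube-flannel.yml, since -f accepts a file name. Applying the file still failed with the following error: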
[ ~]# kubectl apply -f kube-flannel.yml
The connection to the server localhost:8080 was refused - did you specify the right host or port?
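Cause: kubectl on this machine has not been pointed at the Kubernetes master. No kubeconfig was set up after the cluster was initialized, so kubectl falls back to localhost:8080.
Fix: run the following command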
export KUBECONFIG=/etc/kubernetes/admin.conf
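/etc/kubernetes/admin.conf is the administrator kubeconfig file generated when the cluster is initialized; it holds the API server address and the credentials that kubectl uses to connect.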
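To see why the localhost:8080 error appears, it helps to check which kubeconfig kubectl would pick up. A simplified sketch of that lookup (the function name which_kubeconfig is made up; real kubectl also honours a --kubeconfig flag and treats KUBECONFIG as a colon-separated list, both ignored here):

```shell
#!/usr/bin/env bash
# Sketch: simplified version of kubectl's kubeconfig lookup order.

which_kubeconfig() {
  # Prints the file kubectl would read, or "none" if neither location is set up.
  if [ -n "${KUBECONFIG:-}" ]; then
    echo "$KUBECONFIG"                   # the environment variable wins
  elif [ -f "$HOME/.kube/config" ]; then
    echo "$HOME/.kube/config"            # per-user default location
  else
    echo "none"
  fi
}

result=$(which_kubeconfig)
if [ "$result" = "none" ]; then
  echo "no kubeconfig found: kubectl would try localhost:8080"
else
  echo "kubectl would use: $result"
fi
```

The export only lasts for the current shell session; copying admin.conf to $HOME/.kube/config, as kubeadm suggests, avoids having to repeat it.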
Run the following on each of the two worker nodes
kubeadm join 192.168.169.134:6443 --token nl8oq0.ly7ti8ibkg47t3kj --discovery-token-ca-cert-hash sha256:42b7132d686c233cc931628043dbe7ea277284a967e959521ee03d0872ff64df
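A mistyped token or hash is an easy way for the join to fail, so the values can be sanity-checked before use. A sketch that validates the two formats (a bootstrap token is six characters, a dot, then sixteen characters, all lowercase letters or digits; the hash is sha256: followed by 64 hex digits) and prints the assembled command; the function name build_join_command is made up:

```shell
#!/usr/bin/env bash
# Sketch: validate join parameters and print the resulting command.

build_join_command() {
  # $1: API server endpoint (host:port), $2: bootstrap token, $3: CA cert hash.
  local endpoint=$1 token=$2 hash=$3
  if ! printf '%s' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "bad token format" >&2
    return 1
  fi
  if ! printf '%s' "$hash" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "bad CA cert hash format" >&2
    return 1
  fi
  echo "kubeadm join $endpoint --token $token --discovery-token-ca-cert-hash $hash"
}

# Values copied from the kubeadm init output earlier in this post.
build_join_command 192.168.169.134:6443 nl8oq0.ly7ti8ibkg47t3kj \
  sha256:42b7132d686c233cc931628043dbe7ea277284a967e959521ee03d0872ff64df
```

Bootstrap tokens expire after 24 hours by default; if the recorded one no longer works, kubeadm token create --print-join-command on the master prints a fresh join command.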
The following output means the node has joined the cluster successfully
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
On every worker node, install three images: quay.io/coreos/flannel:v0.11.0-amd64 (which can be pulled directly), k8s.gcr.io/pause, and k8s.gcr.io/kube-proxy. To find the exact versions of the latter two, run kubeadm config images list:
[ ~]# kubeadm config images list
W0109 10:25:18.538307 10045 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W0109 10:25:18.539340 10045 version.go:102] falling back to the local client version: v1.17.0
W0109 10:25:18.540594 10045 validation.go:28] Cannot validate kubelet config - no validator is available
W0109 10:25:18.540636 10045 validation.go:28] Cannot validate kube-proxy config - no validator is available
k8s.gcr.io/kube-apiserver:v1.17.0
k8s.gcr.io/kube-controller-manager:v1.17.0
k8s.gcr.io/kube-scheduler:v1.17.0
k8s.gcr.io/kube-proxy:v1.17.0
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5
Use the same approach as before: pull from Aliyun, then re-tag.
docker pull quay.io/coreos/flannel:v0.11.0-amd64
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.0 k8s.gcr.io/kube-proxy:v1.17.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
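Rather than reading the versions off the list by eye, the two images a worker node needs can be filtered out of the kubeadm config images list output. A small sketch; the function name node_images is made up, and the sample list is the output shown in this post:

```shell
#!/usr/bin/env bash
# Sketch: pick the worker-node images out of the kubeadm image list.

node_images() {
  # stdin: output of "kubeadm config images list";
  # prints only the kube-proxy and pause image references.
  grep -E '^k8s\.gcr\.io/(kube-proxy|pause):'
}

# Image lines as printed by kubeadm v1.17.0 in this post.
sample_list='k8s.gcr.io/kube-apiserver:v1.17.0
k8s.gcr.io/kube-controller-manager:v1.17.0
k8s.gcr.io/kube-scheduler:v1.17.0
k8s.gcr.io/kube-proxy:v1.17.0
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5'

printf '%s\n' "$sample_list" | node_images
```

On a real node this would be kubeadm config images list 2>/dev/null | node_images, with stderr discarded to hide the version warnings.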
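Run kubectl get nodes on the master node to check node status. If a node is NotReady, that is because each node has to start several components, all of which run in Pods, and this takes some time.
Check the Pod status: CrashLoopBackOff, ContainerCreating, Init:0/1 and similar all mean the Pod is not ready; only Running means it is ready.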
kubectl get pod --all-namespaces
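With many Pods it is easier to list only the ones that are not yet Running. A sketch that filters the table on its STATUS column (the fourth column when --all-namespaces is used); the function name not_running and the sample rows, including the pod names, are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: list pods whose STATUS column is anything other than Running.

not_running() {
  # stdin: "kubectl get pod --all-namespaces" output.
  # Skips the header row and prints "namespace/name status" for non-Running pods.
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2, $4 }'
}

# Hypothetical sample output; pod names are invented.
sample='NAMESPACE     NAME                          READY   STATUS              RESTARTS   AGE
kube-system   coredns-aaaa                  0/1     ContainerCreating   0          2m
kube-system   kube-flannel-ds-amd64-bbbb    0/1     Init:0/1            0          1m
kube-system   kube-proxy-cccc               1/1     Running             0          2m'

printf '%s\n' "$sample" | not_running
```

Against a live cluster this would be kubectl get pod --all-namespaces | not_running; empty output means every Pod reports Running.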
Inspect a specific Pod in detail
kubectl describe pod <Pod Name> --namespace=kube-system
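Once all Pods are in the Running state, all nodes will be Ready as well. The Kubernetes cluster is now set up.
Appendix:
Running kubeadm reset on a node detaches it from the cluster, after which it can join again.
Running kubeadm reset on the master allows kubeadm init to be run again.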