微微一笑 2019-12-26
Checking the official documentation, the upgrade requires attention to the points below. Given the current situation, the cluster must first be upgraded to 2.6.5+, and only then to 3.x.
2019/12/25
Current environment: Calico data is stored via the etcdv2 API.
[ kubelet]# which etcdv2
alias etcdv2='export ETCDCTL_API=2; /bin/etcdctl --ca-file /etc/etcd/ssl/etcd-root-ca.pem --cert-file /etc/etcd/ssl/etcd.pem --key-file /etc/etcd/ssl/etcd-key.pem --endpoints https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379'
[ kubelet]# etcdv2 ls /calico/ipam/v2/assignment/ipv4
/calico/ipam/v2/assignment/ipv4/block
[ kubelet]# etcdv2 ls /calico/ipam/v2/assignment/ipv4/block
/calico/ipam/v2/assignment/ipv4/block/10.20.134.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.253.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.28.192-26
/calico/ipam/v2/assignment/ipv4/block/10.20.51.128-26
/calico/ipam/v2/assignment/ipv4/block/10.20.78.0-26
/calico/ipam/v2/assignment/ipv4/block/10.20.112.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.15.128-26
/calico/ipam/v2/assignment/ipv4/block/10.20.235.0-26
/calico/ipam/v2/assignment/ipv4/block/10.20.53.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.72.128-26
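Since the migration later targets the v3 API, it can be convenient to keep v2 and v3 helpers side by side. A minimal sketch, using shell functions instead of aliases (aliases are off by default in non-interactive shells); the cert paths and endpoints are copied from the alias above. Note that etcdctl's v3 API renames the TLS flags to --cacert/--cert/--key:

```shell
# Wrappers around etcdctl for the v2 and v3 APIs; paths/endpoints are the
# ones used in this environment.
ETCD_ENDPOINTS="https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379"

etcdv2() {
    ETCDCTL_API=2 /bin/etcdctl \
        --ca-file /etc/etcd/ssl/etcd-root-ca.pem \
        --cert-file /etc/etcd/ssl/etcd.pem \
        --key-file /etc/etcd/ssl/etcd-key.pem \
        --endpoints "$ETCD_ENDPOINTS" "$@"
}

# The v3 API uses --cacert/--cert/--key instead of --ca-file/--cert-file/--key-file.
etcdv3() {
    ETCDCTL_API=3 /bin/etcdctl \
        --cacert /etc/etcd/ssl/etcd-root-ca.pem \
        --cert /etc/etcd/ssl/etcd.pem \
        --key /etc/etcd/ssl/etcd-key.pem \
        --endpoints "$ETCD_ENDPOINTS" "$@"
}
```

After the migration, something like `etcdv3 get /calico --prefix --keys-only` can be used to inspect the migrated keys.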
According to the documentation, upgrading to 3.0 requires at least 2.6.5, and some manual steps are involved, because 3.x uses the etcdv3 API while 2.6.x uses etcdv2.
The cluster currently runs 2.6.1, so first upgrade it to 2.6.5+.
Download the calico.yaml file:
[ v2.6]# wget https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/rbac.yaml
[ v2.6]# wget https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/hosted/calico.yaml
# modify the configuration in calico.yaml
[ v2.6]# sh -x modify_calico_yaml.sh
[ v2.6]# grep image calico.yaml
          image: quay.io/calico/node:v2.6.12
          image: quay.io/calico/cni:v1.11.8
          image: quay.io/calico/kube-controllers:v1.0.5
          image: quay.io/calico/kube-controllers:v1.0.5
The documentation describes a staged upgrade (upgrade calico-kube-controllers first, then the calico-node DaemonSet); here we simply apply the new resource manifest directly.
Note that calico.yaml does not include Calico's RBAC resources.
[ v2.6]# k239 apply -f calico.yaml
configmap "calico-config" unchanged
secret "calico-etcd-secrets" unchanged
daemonset "calico-node" configured
deployment "calico-kube-controllers" configured
deployment "calico-policy-controller" configured
serviceaccount "calico-kube-controllers" unchanged
serviceaccount "calico-node" unchanged
After applying, the calico-node DaemonSet pods were not replaced; delete the pods to force them to update.
[ v2.6]# kubectl -n kube-system get pod -o wide |grep calico
calico-kube-controllers-6768b96c5f-rdbjp   1/1   Running             0   4m   10.111.32.243   k8s-4.geotmt.com
calico-node-45lnh                          0/1   ContainerCreating   0   4h   10.111.32.241   k8s-2.geotmt.com
calico-node-49mq7                          1/1   Running             1   5h   10.111.32.243   k8s-4.geotmt.com
calico-node-m86hr                          1/1   Running             0   5h   10.111.32.244   k8s-5.geotmt.com
calico-node-mm5fz                          0/1   ContainerCreating   0   4h   10.111.32.239   k8s-1.geotmt.com
calico-node-shrfw                          1/1   Running             0   4h   10.111.32.242   k8s-3.geotmt.com
calico-node-xx8hk                          1/1   Running             0   5h   10.111.32.245   k8s-6.geotmt.com
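The pods do not roll on `apply` because DaemonSets of this era default to the OnDelete update strategy, so each pod must be deleted before the controller recreates it from the new template. A hypothetical sketch of doing this one node at a time; the `KUBECTL` variable and `ROLL_WAIT` pause are illustrative, not part of the original procedure:

```shell
# Delete calico-node pods one by one so the DaemonSet controller recreates
# them from the updated template. KUBECTL and ROLL_WAIT are assumptions for
# illustration; a real run would use plain kubectl and a readiness check.
KUBECTL="${KUBECTL:-kubectl}"

roll_calico_node() {
    local pod
    for pod in $("$KUBECTL" -n kube-system get pod \
            -l k8s-app=calico-node \
            -o jsonpath='{.items[*].metadata.name}'); do
        "$KUBECTL" -n kube-system delete pod "$pod"
        # crude pause between deletions; better: poll the replacement pod
        # on the same node until it reports Ready
        sleep "${ROLL_WAIT:-30}"
    done
}
# roll_calico_node
```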
Here is one of them as an example; the new calico-node pod has two containers.
[ v2.6]# kubectl -n kube-system get pod -o wide |grep calico |grep k8s-6
calico-node-fj4t8   2/2   Running   0   25s   10.111.32.245   k8s-6.geotmt.com
Pinging pods on other nodes works:
bash-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: : <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: : <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 6e:20:a3:45:42:49 brd ff:ff:ff:ff:ff:ff
    inet 10.20.235.12/32 scope global eth0
       valid_lft forever preferred_lft forever
bash-4.4# ping 10.20.15.135
PING 10.20.15.135 (10.20.15.135): 56 data bytes
64 bytes from 10.20.15.135: seq=0 ttl=62 time=1.133 ms
64 bytes from 10.20.15.135: seq=1 ttl=62 time=0.631 ms
This version still requires manually adding a toleration so that pods can be scheduled on master nodes.
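For reference, a sketch of the kind of toleration block that has to be added by hand to the calico-node DaemonSet pod spec, assuming the masters carry the usual node-role.kubernetes.io/master:NoSchedule taint (adjust to whatever taint your masters actually use):

```yaml
spec:
  template:
    spec:
      tolerations:
        # assumed master taint; not part of the original manifest shown above
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        # allow running when the node is marked critical-addons-only
        - key: CriticalAddonsOnly
          operator: Exists
```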
Upgrade to 2.6.12 complete.
Both of the prerequisites above are satisfied.
[ net.d]# etcdctl version
etcdctl version: 3.3.11
API version: 3.3
Download the calico-upgrade tool and distribute it to the nodes:
[ ansible]# wget https://github.com/projectcalico/calico-upgrade/releases/download/v1.0.5/calico-upgrade
[ k8s_239]# ansible-playbook install_calico-upgrade.yml
Run a dry-run test first:
[ calico-upgrade]# calico-upgrade dry-run --output-dir=tmp --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg
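The two apiconfig files tell calico-upgrade how to reach the old and new datastores. The originals are not shown; a sketch of what they might look like for this environment, using calicoctl's etcd configuration fields and the endpoints and cert paths seen earlier (values are assumptions):

```yaml
# /etc/calico/apiconfigv1.cfg -- v1 client config (etcdv2 datastore)
apiVersion: v1
kind: calicoApiConfig
spec:
  datastoreType: "etcdv2"
  etcdEndpoints: "https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379"
  etcdCACertFile: "/etc/etcd/ssl/etcd-root-ca.pem"
  etcdCertFile: "/etc/etcd/ssl/etcd.pem"
  etcdKeyFile: "/etc/etcd/ssl/etcd-key.pem"
---
# /etc/calico/apiconfigv3.cfg -- v3 client config (etcdv3 datastore)
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
spec:
  datastoreType: "etcdv3"
  etcdEndpoints: "https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379"
  etcdCACertFile: "/etc/etcd/ssl/etcd-root-ca.pem"
  etcdCertFile: "/etc/etcd/ssl/etcd.pem"
  etcdKeyFile: "/etc/etcd/ssl/etcd-key.pem"
```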
[ calico-upgrade]# calico-upgrade start --ignore-v3-data --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg
Preparing reports directory
* creating report directory if it does not exist
* validating permissions and removing old reports
Checking Calico version is suitable for migration
* determined Calico version of: v2.6.12
* the v1 API data can be migrated to the v3 API
Validating conversion of v1 data to v3
* handling FelixConfiguration (global) resource
* handling ClusterInformation (global) resource
* handling FelixConfiguration (per-node) resources
* handling BGPConfiguration (global) resource
* handling Node resources
* handling BGPPeer (global) resources
* handling BGPPeer (node) resources
* handling HostEndpoint resources
* handling IPPool resources
* handling GlobalNetworkPolicy resources
* handling Profile resources
* handling WorkloadEndpoint resources
* data conversion successful
Data conversion validated successfully
Validating the v3 datastore
* the v3 datastore is not empty
-------------------------------------------------------------------------------
Successfully validated v1 to v3 conversion.

You are about to start the migration of Calico v1 data format to Calico v3 data
format. During this time and until the upgrade is completed Calico networking
will be paused - which means no new Calico networked endpoints can be created.
No Calico configuration should be modified using calicoctl during this time.

Type "yes" to proceed (any other input cancels): yes
Pausing Calico networking
* successfully paused Calico networking in the v1 configuration
Calico networking is now paused - waiting for 15s
Querying current v1 snapshot and converting to v3
* handling FelixConfiguration (global) resource
* handling ClusterInformation (global) resource
* handling FelixConfiguration (per-node) resources
* handling BGPConfiguration (global) resource
* handling Node resources
* handling BGPPeer (global) resources
* handling BGPPeer (node) resources
* handling HostEndpoint resources
* handling IPPool resources
* handling GlobalNetworkPolicy resources
* handling Profile resources
* handling WorkloadEndpoint resources
* data converted successfully
Storing v3 data
* Storing resources in v3 format
* success: resources stored in v3 datastore
Migrating IPAM data
* listing and converting IPAM allocation blocks
* listing and converting IPAM affinity blocks
* listing IPAM handles
* storing IPAM data in v3 format
* IPAM data migrated successfully
Data migration from v1 to v3 successful
* check the output for details of the migrated resources
* continue by upgrading your calico/node versions to Calico v3.x
-------------------------------------------------------------------------------
Successfully migrated Calico v1 data to v3 format.

Follow the detailed upgrade instructions available in the release documentation
to complete the upgrade. This includes:
* upgrading your calico/node instances and orchestrator plugins (e.g. CNI) to
  the required v3.x release
* running 'calico-upgrade complete' to complete the upgrade and resume Calico
  networking

See report(s) below for details of the migrated data.
Reports:
- name conversion: /root/calico-upgrade/calico-upgrade-report/convertednames
[ v3.0]# wget https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/rbac.yaml
[ v3.0]# wget https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/calico.yaml
For the changes in 3.0, see the 3.0 release notes.
Pre-pull the required images:
[ v3.0]# grep image calico.yaml
          image: quay.io/calico/node:v3.0.12
          image: quay.io/calico/cni:v3.0.12
          image: quay.io/calico/kube-controllers:v3.0.12
[ v3.0]# k239 apply -f calico.yaml
configmap "calico-config" configured
secret "calico-etcd-secrets" unchanged
daemonset "calico-node" configured
deployment "calico-kube-controllers" configured
serviceaccount "calico-kube-controllers" unchanged
serviceaccount "calico-node" unchanged
This time the pods are restarted in a rolling fashion. Once all pods have finished upgrading, complete the migration:
[ calico-upgrade]# calico-upgrade complete --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg
You are about to complete the upgrade process to Calico v3. At this point, the
v1 format data should have been successfully converted to v3 format, and all
calico/node instances and orchestrator plugins (e.g. CNI) should be running
Calico v3.x.

Type "yes" to proceed (any other input cancels): yes
Completing upgrade
Enabling Calico networking for v3
* successfully resumed Calico networking in the v3 configuration (updated ClusterInformation)
Upgrade completed successfully
-------------------------------------------------------------------------------
Successfully completed the upgrade process.
If the command above is not run, errors like the following appear:
E1225 19:56:04.837028    3281 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "demo-deployment-6f4c6779b-b8zqq_default" network: Calico is currently not ready to process requests
E1225 19:56:04.837049    3281 kuberuntime_manager.go:647] createPodSandbox for pod "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "demo-deployment-6f4c6779b-b8zqq_default" network: Calico is currently not ready to process requests
E1225 19:56:04.837167    3281 pod_workers.go:186] Error syncing pod 1dd28cf0-270d-11ea-bd6c-c6a864ab864a ("demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)"), skipping: failed to "CreatePodSandbox" for "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" with CreatePodSandboxError: "CreatePodSandbox for pod \"demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"demo-deployment-6f4c6779b-b8zqq_default\" network: Calico is currently not ready to process requests"
Upgrade to 3.0.12 successful.
According to the 3.11 "Upgrading Calico on Kubernetes" guide, the upgrade only requires applying the new resource manifest (this environment does not involve Application Layer Policy).
This version of Calico has full support for the Kubernetes API datastore, so when upgrading, make sure the manifest you download matches your environment.
This environment uses the etcd datastore version.
[ v3.11]# wget https://docs.projectcalico.org/v3.11/manifests/calico-etcd.yaml
# modify the etcd-related configuration
[ v3.11]# bash -x modify_calico_yaml.sh
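The modify_calico_yaml.sh script itself is not shown in the original. A hypothetical sketch of what such a script might do: fill in the etcd endpoints and the in-pod cert paths that the stock calico-etcd.yaml ConfigMap leaves empty (the placeholder strings match the stock v3.11 manifest; the TLS material in the calico-etcd-secrets Secret still has to be filled in separately with base64-encoded file contents):

```shell
# Point the ConfigMap at our etcd cluster and at the paths where the
# calico-etcd-secrets Secret is mounted (/calico-secrets in the stock manifest).
ENDPOINTS="https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379"

configure_etcd() {
    local yaml="$1"
    sed -i \
        -e "s#etcd_endpoints: \".*\"#etcd_endpoints: \"${ENDPOINTS}\"#" \
        -e 's#etcd_ca: ""#etcd_ca: "/calico-secrets/etcd-ca"#' \
        -e 's#etcd_cert: ""#etcd_cert: "/calico-secrets/etcd-cert"#' \
        -e 's#etcd_key: ""#etcd_key: "/calico-secrets/etcd-key"#' \
        "$yaml"
}
# configure_etcd calico-etcd.yaml
```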
[ v3.11]# grep image calico-etcd.yaml
          image: calico/cni:v3.11.1
          image: calico/pod2daemon-flexvol:v3.11.1
          image: calico/node:v3.11.1
          image: calico/kube-controllers:v3.11.1
[ v3.11]# k239 apply -f calico-etcd.yaml
secret "calico-etcd-secrets" unchanged
configmap "calico-config" configured
clusterrole "calico-kube-controllers" configured
clusterrolebinding "calico-kube-controllers" configured
clusterrole "calico-node" configured
clusterrolebinding "calico-node" configured
daemonset "calico-node" configured
serviceaccount "calico-node" unchanged
deployment "calico-kube-controllers" configured
serviceaccount "calico-kube-controllers" unchanged
Looking at the new pods, each has only one container: this version moves install-cni and flexvol-driver (the latter did not exist in older versions) into initContainers, so only a single container remains long-running.
[ ~]# k239 -n kube-system get pod -o wide |grep calico
calico-kube-controllers-85dc4fd46b-4wnmt   1/1   Running   0   1m    10.111.32.243   k8s-4.geotmt.com
calico-node-4bgkc                          1/1   Running   0   59s   10.111.32.241   k8s-2.geotmt.com
calico-node-5jg2t                          1/1   Running   0   31s   10.111.32.244   k8s-5.geotmt.com
calico-node-9fn6r                          1/1   Running   0   43s   10.111.32.245   k8s-6.geotmt.com
calico-node-9n7dn                          1/1   Running   0   1m    10.111.32.243   k8s-4.geotmt.com
calico-node-fxr46                          1/1   Running   0   1m    10.111.32.239   k8s-1.geotmt.com
calico-node-pgh5c                          1/1   Running   0   1m    10.111.32.242   k8s-3.geotmt.com
Test cross-host pod communication:
[ ~]# kubectl exec -it demo-deployment-6f4c6779b-b8zqq /bin/bash
bash-4.4# ping 10.20.235.12
PING 10.20.235.12 (10.20.235.12): 56 data bytes
64 bytes from 10.20.235.12: seq=0 ttl=62 time=1.232 ms
^C
--- 10.20.235.12 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.232/1.232/1.232 ms
bash-4.4# ping 10.20.253.80
PING 10.20.253.80 (10.20.253.80): 56 data bytes
64 bytes from 10.20.253.80: seq=0 ttl=62 time=1.730 ms
64 bytes from 10.20.253.80: seq=1 ttl=62 time=1.385 ms
^C
--- 10.20.253.80 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 1.385/1.557/1.730 ms
bash-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: : <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: : <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether fa:d1:55:42:ab:6c brd ff:ff:ff:ff:ff:ff
    inet 10.20.15.163/32 scope global eth0
       valid_lft forever preferred_lft forever
Test that a recreated pod gets an IP address assigned — success:
[ ~]# kubectl delete pod nginx-deployment-7b66d98974-2rh87
pod "nginx-deployment-7b66d98974-2rh87" deleted
[ ~]# kubectl get pod nginx-deployment-7b66d98974-nd8h7 -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE
nginx-deployment-7b66d98974-nd8h7   1/1     Running   0          1m    10.20.253.86   k8s-4.geotmt.com
Calico upgraded from 3.0.12 to 3.11.1 successfully.