[Original] Uncle's Experience Notes (98): mesos slave fails to start

81103054 2020-01-10

The mesos slave failed to start. Its status was as follows:

# systemctl status mesos-slave
● mesos-slave.service - Mesos Slave
   Loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Sat 2019-12-28 21:41:50 CST; 13s ago
  Process: 15627 ExecStart=/usr/bin/mesos-init-wrapper slave (code=exited, status=1/FAILURE)
 Main PID: 15627 (code=exited, status=1/FAILURE)

Dec 28 21:41:50 test-003 systemd[1]: Unit mesos-slave.service entered failed state.
Dec 28 21:41:50 test-003 systemd[1]: mesos-slave.service failed.

The mesos-slave log shows:

# journalctl -u mesos-slave -f -n 300
...
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.604262 15978 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.604274 15978 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.602254 15961 slave.cpp:615] Agent resources: [{"name":"ports","ranges":{"range":[{"begin":80,"end":60000}]},"type":"RANGES"},{"name":"cpus","scalar":{"value":8.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":30987.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":95544.0},"type":"SCALAR"}]
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.605845 15961 slave.cpp:623] Agent attributes: [  ]
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.605868 15961 slave.cpp:632] Agent hostname: test003
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.605935 15977 task_status_update_manager.cpp:181] Pausing sending task status updates
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.606037 15982 detector.cpp:152] Detected a new leader: (id='79')
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.606160 15981 group.cpp:700] Trying to get '/mesos/json.info_0000000079' in ZooKeeper
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.607014 15975 state.cpp:66] Recovering state from '/var/lib/mesos/meta'
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.607070 15975 state.cpp:742] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.607249 15981 zookeeper.cpp:262] A new leading master (:5050) is detected
Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.646075 15979 slave.cpp:6951] Finished recovering checkpointed state from '/var/lib/mesos/meta', beginning agent recovery
Dec 28 21:42:10 test-003 mesos-slave[15974]: E1228 21:42:10.649549 15979 slave.cpp:7311] EXIT with status 1: Failed to perform recovery: Incompatible agent info detected.
Dec 28 21:42:10 test-003 mesos-slave[15974]: ecovery
Dec 28 21:42:10 test-003 mesos-slave[15974]: ------------------------------------------------------------
Dec 28 21:42:10 test-003 mesos-slave[15974]: Old agent info:
Dec 28 21:42:10 test-003 mesos-slave[15974]: hostname: "test003"
Dec 28 21:42:10 test-003 mesos-slave[15974]: resources {
Dec 28 21:42:10 test-003 mesos-slave[15974]:   name: "ports"
Dec 28 21:42:10 test-003 mesos-slave[15974]:   type: RANGES
Dec 28 21:42:10 test-003 mesos-slave[15974]:   ranges {
Dec 28 21:42:10 test-003 mesos-slave[15974]:     range {
Dec 28 21:42:10 test-003 mesos-slave[15974]:       begin: 80
Dec 28 21:42:10 test-003 mesos-slave[15974]:       end: 60000
Dec 28 21:42:10 test-003 mesos-slave[15974]:     }
Dec 28 21:42:10 test-003 mesos-slave[15974]:   }
Dec 28 21:42:10 test-003 mesos-slave[15974]: }
Dec 28 21:42:10 test-003 mesos-slave[15974]: resources {
Dec 28 21:42:10 test-003 mesos-slave[15974]:   name: "cpus"
Dec 28 21:42:10 test-003 mesos-slave[15974]:   type: SCALAR
Dec 28 21:42:10 test-003 mesos-slave[15974]:   scalar {
Dec 28 21:42:10 test-003 mesos-slave[15974]:     value: 8
Dec 28 21:42:10 test-003 mesos-slave[15974]:   }
Dec 28 21:42:10 test-003 mesos-slave[15974]: }
Dec 28 21:42:10 test-003 mesos-slave[15974]: resources {
Dec 28 21:42:10 test-003 systemd[1]: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Dec 28 21:42:10 test-003 systemd[1]: Unit mesos-slave.service entered failed state.
Dec 28 21:42:10 test-003 systemd[1]: mesos-slave.service failed.

Note the key lines:

Dec 28 21:42:10 test-003 mesos-slave[15974]: I1228 21:42:10.646075 15979 slave.cpp:6951] Finished recovering checkpointed state from '/var/lib/mesos/meta', beginning agent recovery
Dec 28 21:42:10 test-003 mesos-slave[15974]: E1228 21:42:10.649549 15979 slave.cpp:7311] EXIT with status 1: Failed to perform recovery: Incompatible agent info detected.
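
In a longer journal, lines like the second one can be isolated by their severity prefix. The sketch below assumes the journal line layout seen above, where the message starts with a severity letter (E or F) followed by the date and timestamp; the function name glog_errors is made up for illustration.

```shell
# Hypothetical helper: keep only error/fatal log lines from journal output.
# Assumes the "...]: E1228 21:42:10.649549 ..." layout seen in the log above.
glog_errors() {
    grep -E '\]: [EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ '
}
```

It reads from standard input, so it could be used as: journalctl -u mesos-slave -n 300 --no-pager | glog_errors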

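The agent tried to recover its checkpointed state from /var/lib/mesos/meta, the recovery failed, and the process exited. "Incompatible agent info" means the agent info saved on disk no longer matches what the agent is configured with now. Clearing that directory gets past the failed recovery: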

# rm -rf /var/lib/mesos/meta/*

After the contents of the meta directory were deleted, the mesos slave started successfully and the problem was resolved.
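
A slightly more cautious variant is to move the checkpointed state aside rather than delete it, so it can still be inspected later. This is only a sketch: the function name backup_and_clear_meta and the backup naming scheme are assumptions, and the default path is simply the one from the log above.

```shell
# Hypothetical helper: move the agent's meta directory to a backup location
# and leave an empty directory in its place. Prints the backup path.
backup_and_clear_meta() {
    meta_dir="${1:-/var/lib/mesos/meta}"
    # Backup name is an assumption: original path plus a timestamp suffix.
    backup_dir="${2:-${meta_dir}.bak.$(date +%Y%m%d%H%M%S)}"

    if [ ! -d "$meta_dir" ]; then
        echo "no such directory: $meta_dir" >&2
        return 1
    fi
    if [ -e "$backup_dir" ]; then
        echo "backup already exists: $backup_dir" >&2
        return 1
    fi

    mv "$meta_dir" "$backup_dir" || return 1
    mkdir -p "$meta_dir" || return 1
    echo "$backup_dir"
}
```

It would be run between systemctl stop mesos-slave and systemctl start mesos-slave. Either way, discarding the checkpointed state means the agent cannot recover what it was running before the restart.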
