Apache Mesos Master

The mesos-master daemon is responsible for pooling resources, delivering tasks to each mesos-slave, and integrating and communicating with frameworks.

Prerequisites

Install ZooKeeper on all masters first: the masters use it to elect a leader, so a fault-tolerant cluster is not possible without it.

yum install java-1.7.0-openjdk zookeeper rpm python-setuptools
echo 1 | sudo tee /var/lib/zookeeper/myid >/dev/null #each master needs a unique id (1, 2, 3) that matches its server.N line in zoo.cfg
zookeeper-server-initialize --myid=1 --force #use the same id as above
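Because each master's myid has to match its own server.N entry in /etc/zookeeper/conf/zoo.cfg, it can help to look the id up from that file rather than typing it by hand on each machine. This is a sketch that assumes zoo.cfg uses the plain server.N=HOST:2888:3888 form shown on this page; the function name zk_myid_for_ip is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the N of the server.N=HOST:PORT:PORT line in a
# zoo.cfg whose HOST equals the given IP; exit non-zero if there is no match.
# Assumes entries without spaces around "=", as in the zoo.cfg on this page.
#   zk_myid_for_ip CONFIG IP
zk_myid_for_ip() {
    awk -F'=' -v ip="$2" '
        $1 ~ /^server\.[0-9]+$/ {
            split($2, hostport, ":")       # HOST:2888:3888 -> HOST, 2888, 3888
            if (hostport[1] == ip) {
                print substr($1, 8)        # drop the 7-char "server." prefix
                found = 1
                exit
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example, on a master whose address is 10.0.0.12 (a placeholder): `id=$(zk_myid_for_ip /etc/zookeeper/conf/zoo.cfg 10.0.0.12) && echo "$id" | sudo tee /var/lib/zookeeper/myid >/dev/null`.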

vim /etc/zookeeper/conf/zoo.cfg #All master servers should have the exact same zoo.cfg
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=10.0.0.[Z1]:2888:3888
server.2=10.0.0.[Z2]:2888:3888
server.3=10.0.0.[Z3]:2888:3888
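Since every master needs an identical zoo.cfg, one option is to generate the file once from the list of master IPs and copy that same output to each master. A minimal sketch, assuming the settings shown above; gen_zoo_cfg is a hypothetical name and the IPs in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical generator: print a zoo.cfg with the settings above plus one
# server.N line per master IP, numbered from 1 in the order given.
#   gen_zoo_cfg IP1 IP2 IP3 ...
gen_zoo_cfg() {
    cat <<'EOF'
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
EOF
    n=1
    for ip in "$@"; do
        echo "server.$n=$ip:2888:3888"
        n=$((n + 1))
    done
}
```

Running `gen_zoo_cfg 10.0.0.11 10.0.0.12 10.0.0.13 > zoo.cfg` once and copying the result to /etc/zookeeper/conf/zoo.cfg on every master keeps the ids and addresses consistent; `md5sum /etc/zookeeper/conf/zoo.cfg` on each master is a quick way to confirm the copies match.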


vim /usr/local/bin/start_zookeeper

#!/bin/bash
##Start Zookeeper
/usr/bin/zookeeper-server start

#Make it executable (chmod 744 /usr/local/bin/start_zookeeper) and add the same line to /usr/local/bin/start_services.sh

Install

cd /opt
#EL7:
rpm -Uvh http://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm
#OR, on EL6 (used in the wxdappa environment):
rpm -Uvh http://repos.mesosphere.io/el/6/noarch/RPMS/mesosphere-el-repo-6-2.noarch.rpm
#On Master
yum install docker jpackage-utils
yum install mesos chronos marathon
#On Slave
yum install mesos
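Choosing between the EL7 and EL6 repository RPMs can be scripted from /etc/redhat-release. This sketch assumes the release file contains "release <major>.<minor>" (as on RHEL and CentOS); the function names are hypothetical, and the URLs are the two listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the EL major version found in a release file,
# e.g. "CentOS Linux release 7.2.1511 (Core)" -> 7.
el_major_version() {
    sed -n 's/.*release \([0-9][0-9]*\).*/\1/p' "${1:-/etc/redhat-release}"
}

# Hypothetical helper: map an EL major version to the Mesosphere repo RPM
# URLs used on this page; fail for anything else.
mesosphere_repo_url() {
    case "$1" in
        7) echo "http://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm" ;;
        6) echo "http://repos.mesosphere.io/el/6/noarch/RPMS/mesosphere-el-repo-6-2.noarch.rpm" ;;
        *) echo "unsupported EL release: '$1'" >&2; return 1 ;;
    esac
}
```

Usage: `url=$(mesosphere_repo_url "$(el_major_version)") && rpm -Uvh "$url"`.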

Running Mesos Master

  1. Start all masters at roughly the same time so that they can elect a leading master through ZooKeeper.

See the Mesos configuration documentation for all configuration options.

vim /usr/local/bin/start_mesos #change --ip and --hostname to match the local machine

#!/bin/sh

##Start Mesos
#note: on EL7 /var/run points to the tmpfs /run, so a work_dir there (including the master's replicated log) does not survive a reboot
/usr/sbin/mesos-master --work_dir=/var/run/mesos --ip=10.0.0.[Z1] \
   --hostname=mesos01 --zk=zk://10.0.0.[Z1]:2181,10.0.0.[Z2]:2181,10.0.0.[Z3]:2181/mesos \
   --cluster=Modeling --quorum=2 >/dev/null 2>&1 &
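The --quorum value has to be a strict majority of the masters: with the three masters here that is 2, which lets the cluster keep working with one master down (five masters would need 3). A small sketch that derives both --quorum and the --zk URL from one list of master IPs, so the two stay consistent as masters are added; the helper names are hypothetical, and port 2181 and the /mesos path are taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper: smallest strict majority of N masters, floor(N/2) + 1.
mesos_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: build zk://IP1:2181,IP2:2181,.../mesos from master IPs.
mesos_zk_url() {
    hosts=""
    for ip in "$@"; do
        hosts="${hosts:+$hosts,}$ip:2181"   # comma-separate after the first
    done
    echo "zk://$hosts/mesos"
}
```

With placeholder IPs, the master line would then use `--zk="$(mesos_zk_url 10.0.0.11 10.0.0.12 10.0.0.13)" --quorum="$(mesos_quorum 3)"`.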

vim /usr/local/bin/start_services.sh ##add the same mesos-master line here so that all services start at once at boot

chmod 744 /usr/local/bin/start_services.sh
vim /etc/rc.local
+ /usr/local/bin/start_services.sh &
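Because start_services.sh and /etc/rc.local are edited by hand on every machine, it is easy to add the same line twice. A minimal sketch of an idempotent append; append_once is a hypothetical helper. On EL7, /etc/rc.local is a symlink to /etc/rc.d/rc.local, and that file is only run at boot if it is executable (chmod +x /etc/rc.d/rc.local).

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line (fixed-string, whole-line match).
# Creates FILE if it does not exist.
#   append_once FILE LINE
append_once() {
    file=$1
    line=$2
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Usage: `append_once /etc/rc.local '/usr/local/bin/start_services.sh &'`; running it again leaves the file unchanged.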

Running Mesos Slave

On wxdappa01-a04 I installed Mesos from the RPM.

vim /usr/local/bin/start_services.sh

#!/bin/sh

#Start Mesos-slave
/usr/sbin/mesos-slave --master=zk://10.0.0.[Z1]:2181,10.0.0.[Z2]:2181,10.0.0.[Z3]:2181/mesos \
   >/dev/null 2>&1 &

Then add it to /etc/rc.local:

chmod 744 /usr/local/bin/start_services.sh
vim /etc/rc.local
+ /usr/local/bin/start_services.sh &

Start Services

For each piece of Mesos that I install, I create a /usr/local/bin/start_servicename script and append the same command to /usr/local/bin/start_services.sh. I do this because of how these programs have to be invoked: if you run them directly from the command line, they run under your shell session rather than as a service under the root process tree, and if your session closes, the service ends with it. These start scripts avoid the problem by “bouncing” the service out to run under the root process tree.

  • /usr/local/bin/start_services.sh #starts all the Mesos-related services installed on the machine; used only at boot (from /etc/rc.local), not by CFEngine
  • /usr/local/bin/start_zookeeper #starts only the ZooKeeper service on the masters; CFEngine runs it to restart the process automatically if it stops
  • /usr/local/bin/start_mesos #starts only the Mesos master or slave service; CFEngine runs it to restart the process automatically if it stops
  • /usr/local/bin/start_marathon #starts only the Marathon service on the masters; CFEngine runs it to restart the process automatically if it stops
  • /usr/local/bin/start_chronos #starts only the Chronos service on the masters; CFEngine runs it to restart the process automatically if it stops
  • /usr/local/bin/start_singularity #starts only the Singularity service on the masters; CFEngine runs it to restart the process automatically if it stops
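The “bounce” these start scripts rely on is backgrounding the daemon so that it no longer belongs to the login session. A slightly more defensive sketch, assuming util-linux setsid is installed: it starts the daemon in a new session (no controlling terminal), records its PID, and does nothing if that PID is still alive, so CFEngine can call it as often as it likes. start_detached and any pidfile path you pass it are hypothetical; a stale pidfile whose PID has been reused by another process would make it skip the start.

```shell
#!/bin/sh
# Hypothetical helper for the start_* scripts:
#   start_detached PIDFILE COMMAND [ARGS...]
# Starts COMMAND detached from the calling session unless the PID recorded in
# PIDFILE is still running. Meant to be called from a non-interactive script,
# where the background job is not a process-group leader, so setsid execs
# COMMAND in place and $! is the daemon's own PID.
start_detached() {
    pidfile=$1
    shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0    # already running: nothing to do
    fi
    setsid "$@" </dev/null >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}
```

A start_mesos script could then end with `start_detached /var/run/mesos-master.pid /usr/sbin/mesos-master ...` using the same flags as above.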

These start scripts are no longer necessary: the Mesosphere packages install systemd start scripts, which make the services much easier to administer.
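With the Mesosphere packages, the flags from start_mesos move into configuration files. To my knowledge the packages' init wrapper reads the ZooKeeper URL from /etc/mesos/zk and treats each file in /etc/mesos-master/ as a flag named after the file, with the file's contents as its value; treat that layout as an assumption and check it against the installed package. The sketch writes the values used on this page; write_mesos_master_conf and the ROOT prefix (for trying it in a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write mesos-master settings as one file per flag,
# assuming the Mesosphere packages' layout (/etc/mesos/zk plus
# /etc/mesos-master/<flag>). Set ROOT to a scratch directory to try it
# without touching /etc.
#   write_mesos_master_conf ZK_URL QUORUM IP HOSTNAME CLUSTER
write_mesos_master_conf() {
    zk=$1 quorum=$2 ip=$3 hostname=$4 cluster=$5
    mkdir -p "${ROOT:-}/etc/mesos" "${ROOT:-}/etc/mesos-master"
    echo "$zk"       > "${ROOT:-}/etc/mesos/zk"
    echo "$quorum"   > "${ROOT:-}/etc/mesos-master/quorum"
    echo "$ip"       > "${ROOT:-}/etc/mesos-master/ip"
    echo "$hostname" > "${ROOT:-}/etc/mesos-master/hostname"
    echo "$cluster"  > "${ROOT:-}/etc/mesos-master/cluster"
}
```

After writing the files (with real IPs in place of the [Z1]..[Z3] placeholders), `systemctl enable mesos-master` and `systemctl restart mesos-master` apply them.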

CFEngine and Monit

For the Mesos Masters group, CFEngine has a policy that ensures the mesos-master and zookeeper-server daemons are always running and starts them if they are not.
I decided that CFEngine, while great at what it does, was too slow at restarting processes (especially as we move to a more HA environment), so I needed something that would catch failures and respond in less than 5 minutes. Monit appears to be the answer. Step 2 of this rollout is to have CFEngine monitor Monit, which answers the “Who watches the watchmen?” concern.
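A Monit configuration for the two daemons CFEngine already watches could look like the following. It is a sketch: the check names, the match patterns (the mesos-master binary path, and ZooKeeper's QuorumPeerMain main class), the /etc/monit.d include directory (included by default in the EPEL monit package's monitrc, as far as I know) and the helper name write_monit_checks are assumptions; the start programs reuse the start scripts from this page. Monit checks every `set daemon` interval (seconds, set in the main monitrc), which is what lets it react faster than a CFEngine run.

```shell
#!/bin/sh
# Hypothetical helper: write Monit checks for mesos-master and ZooKeeper
# into DIR (e.g. /etc/monit.d), restarting them via the start scripts.
#   write_monit_checks DIR
write_monit_checks() {
    dir=$1
    cat > "$dir/mesos-master" <<'EOF'
check process mesos-master matching "/usr/sbin/mesos-master"
    start program = "/usr/local/bin/start_mesos"
    stop program  = "/usr/bin/pkill -f /usr/sbin/mesos-master"
EOF
    cat > "$dir/zookeeper" <<'EOF'
check process zookeeper matching "QuorumPeerMain"
    start program = "/usr/local/bin/start_zookeeper"
    stop program  = "/usr/bin/zookeeper-server stop"
EOF
}
```

After writing the checks, `monit -t` validates the configuration syntax and `monit reload` picks up the new checks.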

API

 

See my page on the separate frameworks and APIs that I use as part of Apache Mesos.

Zookeeper Maintenance

ZooKeeper accumulates snapshots and transaction logs over time, and they use up disk space very quickly, so I set up a daily cron job that keeps only the 3 most recent snapshots and their transaction logs. Do this on every mesos-master machine.

 01 01 * * *  /bin/java -Dlog4j.configuration=file:///etc/zookeeper/conf.dist/log4j.properties -cp /usr/lib/zookeeper/zookeeper.jar:/usr/lib/zookeeper/lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:conf org.apache.zookeeper.server.PurgeTxnLog /var/lib/zookeeper/ /var/lib/zookeeper/ -n 3
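ZooKeeper 3.4.0 and later can also do this cleanup itself: autopurge.snapRetainCount (minimum 3) sets how many snapshots and their transaction logs to keep, and autopurge.purgeInterval sets how often to purge, in hours (0, the default, disables it). A sketch that sets both in zoo.cfg as an alternative to the cron job; enable_zk_autopurge is hypothetical, it relies on GNU sed -i, and ZooKeeper has to be restarted to pick up the change. Since every master should keep an identical zoo.cfg, run it on all of them.

```shell
#!/bin/sh
# Hypothetical helper: enable ZooKeeper's built-in purge in a zoo.cfg.
#   enable_zk_autopurge CFG [KEEP] [HOURS]   (defaults: keep 3, every 24 hours)
# Existing autopurge.* lines are removed first, so reruns do not duplicate them.
enable_zk_autopurge() {
    cfg=$1
    keep=${2:-3}
    hours=${3:-24}
    sed -i '/^autopurge\./d' "$cfg"
    printf 'autopurge.snapRetainCount=%s\nautopurge.purgeInterval=%s\n' "$keep" "$hours" >> "$cfg"
}
```

Usage: `enable_zk_autopurge /etc/zookeeper/conf/zoo.cfg`, then restart ZooKeeper on each master.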