Getting High with Lenny
The aim here is to set up some highly available services on Debian Lenny (at the time of writing still due to be released).
Most of the documentation I found on the net for such a setup is based on Xen, but I prefer to use Vserver for the “virtualisation” because of its configurability, shared memory and CPU resources, and basically the raw speed.
DRBD8 and Heartbeat should take care of the availability magic in case a machine shuts down unexpectedly.
The partitioning looks as follows
Name     Flags   Part Type   FS Type                Size (MB)
c0d0p1   Boot    Primary     Linux ext3              10001.95
c0d0p5           Logical     Linux swap / Solaris     1003.49
c0d0p6           Logical     Linux                  280325.77
For this setup we go for 1 single DRBD partition, node1 is primary and node2 secondary.
To avoid confusing ourselves we follow the naming scheme below.
(For an almost identical setup (not tested) with 2 DRBD disks, 1 primary on each node, see ha-hosting-setup-vserver-double-drbd.)
<note>
machine1 will use the following names
- hostname = node1
- IP number = 192.168.1.100
- is primary for r0 on disk c0d0p6
- physical volume on r0 is /dev/drbd0
- volume group on /dev/drbd0 is called drbdvg0
</note>
<note>
machine2 will use the following names
- hostname = node2
- IP number = 192.168.1.200
- is secondary for r0 on disk c0d0p6
- physical volume on r0 is /dev/drbd0
- volume group on /dev/drbd0 is called drbdvg0
</note>
Get secure
Loadbalance-Failover the network cards
http://manpages.songshu.org/manpages/lenny/en/man8/ethtool.8.html
http://manpages.songshu.org/manpages/lenny/en/man8/mii-tool.8.html
Another great article by Carla Schroder with more details can be found here http://www.enterprisenetworkingplanet.com/nethub/article.php/3696561
We need both mii-tool and ethtool.
apt-get install ethtool ifenslave-2.6
nano /etc/modprobe.d/arch/i386
To load the modules with the correct options at boot time.
alias bond0 bonding
options bond0 mode=balance-alb miimon=100
And set the interfaces eth0 and eth1 as slaves to bond0, also eth2 is set here for the crossover cable for the DRBD connection to the fail over machine.
nano /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto bond0
iface bond0 inet static
address 123.123.123.100
netmask 255.255.255.0
network 123.123.123.0
broadcast 123.123.123.255
gateway 123.123.123.1
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers 123.123.123.45
dns-search example.com
up /sbin/ifenslave bond0 eth0 eth1
down /sbin/ifenslave -d bond0 eth0 eth1
auto eth2
iface eth2 inet static
address 192.168.1.100
netmask 255.255.255.0
<note>
This way the system needs to be rebooted before the changes take effect. Otherwise you would have to load the drivers and ifdown eth0 and eth1 before running ifup bond0, but I'm planning to install a new kernel in the next step anyway.
</note>
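After the reboot it is worth checking that both slaves really joined the bond. A minimal sketch, assuming the bonding driver exposes its state in /proc/net/bonding/bond0 with one "Slave Interface:" line per NIC; the bond_slaves helper name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the slave interfaces of a bond by reading the
# status file the bonding driver exposes under /proc/net/bonding/.
# The optional argument lets you point it at another bond or a saved copy.
bond_slaves() {
    status_file="${1:-/proc/net/bonding/bond0}"
    # Each enslaved NIC is reported on a line like "Slave Interface: eth0"
    sed -n 's/^Slave Interface: //p' "$status_file"
}

# Only query the live system when the bond actually exists.
if [ -r /proc/net/bonding/bond0 ]; then
    bond_slaves
fi
```

With the configuration above you would expect eth0 and eth1 to be listed; a missing name suggests that interface was not enslaved.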
Install the Vserver packages
http://manpages.songshu.org/manpages/lenny/en/man2/vserver.2.html
apt-get install linux-image-2.6-vserver-686-bigmem util-vserver vserver-debiantools
As usual a reboot is needed to boot this kernel.
<note>
With Etch I found that the Vserver kernel often ended up second in the grub list. Not so in Lenny, but to be safe check the kernel stanza in /boot/grub/menu.lst, especially when doing this from a remote location.
</note>
Install DRBD8, LVM2, Heartbeat and quota
http://manpages.songshu.org/manpages/lenny/en/man8/drbd.8.html
apt-get install drbd8-modules-2.6-vserver-686-bigmem drbd8-module-source lvm2 heartbeat-2 quotatool quota
<note>
Not sure about this, but DRBD always needed to be compiled against the running kernel. Is this still the case with the kernel-specific modules? I did not check.
</note>
Build DRBD8
m-a a-i drbd8
depmod -ae
modprobe drbd
Configure DRBD8
mv /etc/drbd.conf /etc/drbd.conf.original
nano /etc/drbd.conf
global {
usage-count no;
}
common {
syncer { rate 100M; }
}
resource r0 {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
}
startup {
degr-wfc-timeout 120; # 2 minutes.
}
disk {
on-io-error detach;
}
net {
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
rate 100M;
al-extents 257;
}
on node1 {
device /dev/drbd0;
disk /dev/cciss/c0d0p6;
address 192.168.1.100:7788;
meta-disk internal;
}
on node2 {
device /dev/drbd0;
disk /dev/cciss/c0d0p6;
address 192.168.1.200:7788;
meta-disk internal;
}
}
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta
On both nodes
node1
drbdadm create-md r0
node2
drbdadm create-md r0
node1
drbdadm up r0
node2
drbdadm up r0
<note warning>
The following should be done on the node that will be the primary
</note>
On node1
drbdadm -- --overwrite-data-of-peer primary r0
watch cat /proc/drbd should show you something like this
version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by phil@fat-tyre, 2008-08-04 15:28:07
0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
ns:62059328 nr:0 dw:3298052 dr:58770141 al:2102 bm:3641 lo:1 pe:261 ua:251 ap:0
[===>................] sync'ed: 22.1% (208411/267331)M
finish: 4:04:44 speed: 14,472 (12,756) K/sec
resync: used:1/61 hits:4064317 misses:5172 starving:0 dirty:0 changed:5172
act_log: used:0/257 hits:822411 misses:46655 starving:110 dirty:44552 changed:2102
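If you would rather script the wait than watch the screen, the progress figure can be pulled out of that output. A minimal sketch, assuming the "sync'ed: NN.N%" line looks like the sample above; drbd_sync_percent is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the resync percentage found in /proc/drbd-style
# text read from stdin (prints nothing when no resync is in progress).
drbd_sync_percent() {
    # Matches the progress line, e.g. "[===>....] sync'ed: 22.1% (208411/267331)M"
    sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p"
}

# Only read the live status when the DRBD module is loaded.
if [ -r /proc/drbd ]; then
    drbd_sync_percent < /proc/drbd
fi
```

An empty result means no progress line was found, which is what you should see once both sides report UpToDate.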
Configure LVM2
http://manpages.songshu.org/manpages/lenny/en/man8/lvm.8.html
<note important>
LVM will normally scan all available devices under /dev, but since /dev/cciss/c0d0p6 and /dev/drbd0 are basically the same this will lead to errors where LVM reads and writes the same data to both devices.
So to limit it to scan /dev/drbd devices only we do the following on both nodes.
</note>
cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.original
nano /etc/lvm/lvm.conf
#filter = [ "a/.*/" ]
filter = [ "a|/dev/drbd|", "r|.*|" ]
to re-scan with the new settings on both nodes
vgscan
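To see what the new filter line does with a given device path, here is a small illustration. It only mimics the two patterns used above with grep and is not how LVM itself is implemented; lvm_filter_verdict is a made-up name.

```shell
#!/bin/sh
# Illustration only: mimic how the filter above classifies device paths.
# LVM tries the patterns in order and the first one that matches decides;
# here "a|/dev/drbd|" accepts and the catch-all "r|.*|" rejects the rest.
lvm_filter_verdict() {
    if printf '%s\n' "$1" | grep -q '/dev/drbd'; then
        echo accept
    else
        echo reject
    fi
}

for dev in /dev/drbd0 /dev/cciss/c0d0p6; do
    echo "$dev: $(lvm_filter_verdict "$dev")"
done
```

The backing partition is rejected, so LVM only ever sees the volume through /dev/drbd0.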
Create the Physical Volume
The following only needs to be done on the node that is the primary!!
On node1
pvcreate /dev/drbd0
Create the Volume Group
On node1
vgcreate drbdvg0 /dev/drbd0
Create the Logical Volume
In this example it is about 50GB, which leaves plenty of space to expand the volumes or to add extra volumes later on.
On node1
lvcreate -L50000 -n web drbdvg0
Then we put a file system on the logical volumes
mkfs.ext3 /dev/drbdvg0/web
create the directory where we want to mount the Vservers
mkdir -p /VSERVERS/web
and mount the logical volume on the mount point
mount -t ext3 /dev/drbdvg0/web /VSERVERS/web/
Get informed
Of course we want to be informed by heartbeat in case a node goes down, so we install postfix to send the mail.
http://manpages.songshu.org/manpages/lenny/en/man1/postfix.1.html
apt-get install postfix mailx
http://manpages.songshu.org/manpages/lenny/en/man1/mailx.1.html
and go for the defaults, “internet site” and “node1.example.com”
We don't want postfix to listen to all interfaces,
nano /etc/postfix/main.cf
and change the line at the bottom to read like this
inet_interfaces = loopback-only
Heartbeat
http://manpages.songshu.org/manpages/lenny/en/man8/heartbeat.8.html
Add the other node in the hosts file of both nodes.
so for node1 do
nano /etc/hosts
and add node2
192.168.1.200 node2
nano /etc/ha.d/ha.cf
autojoin none
#crm on #enables heartbeat2 cluster manager - we want that!
use_logd on
logfacility syslog
keepalive 1
deadtime 10
warntime 10
udpport 694
auto_failback on #resources move back once node is back online
mcast bond0 239.0.0.43 694 1 0
bcast eth2
node node1 #hostnames of the nodes
node node2
nano /etc/ha.d/authkeys
## "auth 3" selects key number 3, which is signed with md5
## "failover" is just a string, enter what you want!
auth 3
3 md5 failover
chmod 600 /etc/ha.d/authkeys
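Instead of a hand-typed string you could generate the shared secret. A minimal sketch that keeps the same key number and md5 method as above; the make_authkeys name and the scratch-file default are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file with a random key instead of a
# hand-typed string. The first argument is the target file; the default is a
# scratch file so nothing under /etc/ha.d is touched by accident.
make_authkeys() {
    target="${1:-./authkeys.new}"
    # 16 random bytes rendered as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Subshell so the restrictive umask does not leak into the caller
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}

# Example call: make_authkeys /etc/ha.d/authkeys
```

The key must be identical on both nodes, so copy the resulting file across rather than generating it twice.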
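Set up some keys on both boxes (defaults, no passphrase)
ssh-keygen
then copy each node's public key to the other node: run the first command on node2 and the second on node1. Note that this overwrites any existing authorized_keys file on the target.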
scp /root/.ssh/id_rsa.pub 192.168.1.100:/root/.ssh/authorized_keys
scp /root/.ssh/id_rsa.pub 192.168.1.200:/root/.ssh/authorized_keys
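If the target already has an authorized_keys file you want to keep, appending is safer than copying over it. A minimal sketch, assuming the public key has first been copied to the other node under some temporary name; add_authorized_key and the paths in the example call are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file only
# if that exact line is not there yet, so existing keys are preserved.
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from file
    grep -qxFf "$pubkey_file" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
}

# Example call on the receiving node:
#   add_authorized_key /root/node1.pub /root/.ssh/authorized_keys
```

Running it twice with the same key leaves only one copy in the file.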
<note>
We will be using heartbeat R1-style configuration here simply because I don't understand the R2 XML-based syntax.
</note>
If both nodes can find each other through their hosts files, we only need to edit the configuration on node1 and can push it to node2 like so
/usr/lib/heartbeat/ha_propagate
nano /etc/ha.d/haresources
node1 drbddisk::r0 LVM::drbdvg0 Filesystem::/dev/drbdvg0/web::/VSERVERS/web::ext3 Vserver-web SendArp::123.123.123.125/bond0
The above makes node1 the default node for the Vserver named web and specifies the mount point. Heartbeat uses the Vserver-web script to start and stop the Vserver, and SendArp notifies the network that this IP can now be found somewhere else than before. (I have added an extra SendArp call in the script below for better results.)
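To make the layout of a haresources line easier to read, the sketch below splits one into its parts. It is only an illustration of the R1 syntax, using the r0 and drbdvg0 names from the naming scheme at the top of this page; explain_haresources is a made-up name.

```shell
#!/bin/sh
# Illustration only: break an R1-style haresources line into its parts.
# Field 1 is the preferred node; every further field is a resource script,
# with "::" separating the script name from its arguments.
explain_haresources() {
    set -- $1            # split the line on whitespace
    echo "preferred node: $1"
    shift
    for res in "$@"; do
        script=${res%%::*}
        case "$res" in
            *::*) args=$(printf '%s' "${res#*::}" | sed 's/::/ /g') ;;
            *)    args="" ;;
        esac
        echo "resource: $script${args:+ ($args)}"
    done
}

explain_haresources "node1 drbddisk::r0 LVM::drbdvg0 Filesystem::/dev/drbdvg0/web::/VSERVERS/web::ext3 Vserver-web SendArp::123.123.123.125/bond0"
```

Heartbeat starts the resources from left to right and stops them in the reverse order, which is why the DRBD disk comes first and the Vserver after the filesystem.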
The Vserver-web script is basically a demolished version of the original R2 style agent by Martin Fick from here http://www.theficks.name/bin/lib/ocf/VServer.
What I did was comment out the sensible top part and replace “$OCF_RESKEY_vserver” with the specific Vserver name. I also added an extra
/etc/ha.d/resource.d/SendArp 123.123.123.125/bond0 start
to the start part because I got inconsistent results when this was done by Heartbeat.
nano /etc/ha.d/resource.d/Vserver-web
#!/bin/sh
#
# License: GNU General Public License (GPL)
# Author: Martin Fick <mogulguy@yahoo.com>
# Date: 04/19/07
# Version: 1.1
#
# This script manages a VServer instance
#
# It can start or stop a VServer
#
# usage: $0 {start|stop|status|monitor|meta-data}
#
#
# OCF parameters are as below
# OCF_RESKEY_vserver
#
#######################################################################
# Initialization:
#
#. /usr/lib/heartbeat/ocf-shellfuncs
#
USAGE="usage: $0 {start|stop|status|monitor|meta-data}";
#
#######################################################################
#
#
#meta_data() {
# cat <<END
#<?xml version="1.0"?>
#<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
#<resource-agent name="VServer">
# <version>1.0</version>
# <longdesc lang="en">
#This script manages a VServer instance.
#It can start or stop a VServer.
# </longdesc>
# <shortdesc lang="en">OCF Resource Agent compliant VServer script.</shortdesc>
#
# <parameters>
#
# <parameter name="vserver" unique="1" required="1">
# <longdesc lang="en">
#The vserver name is the name as found under /etc/vservers
# </longdesc>
# <shortdesc lang="en">VServer Name</shortdesc>
# <content type="string" default="" />
# </parameter>
#
# </parameters>
#
# <actions>
# <action name="start" timeout="2m" />
# <action name="stop" timeout="1m" />
# <action name="monitor" depth="0" timeout="1m" interval="5s" start-delay="2m" />
# <action name="status" depth="0" timeout="1m" interval="5s" start-delay="2m" />
# <action name="meta-data" timeout="1m" />
# </actions>
#</resource-agent>
#END
#}
vserver_reload() {
vserver_stop || return
vserver_start
}
vserver_stop() {
#
# Is the VServer already stopped?
#
vserver_status
[ $? -ne 0 ] && return 0
/usr/sbin/vserver "web" "stop"
vserver_status
[ $? -ne 0 ] && return 0
return 1
}
vserver_start() {
vserver_status
[ $? -eq 0 ] && return 0
/usr/sbin/vserver "web" "start"
vserver_status
/etc/ha.d/resource.d/SendArp 123.123.123.125/bond0 start
}
vserver_status() {
/usr/sbin/vserver "web" "status"
rc=$?
if [ $rc -eq 0 ]; then
echo "running"
return 0
elif [ $rc -eq 3 ]; then
echo "stopped"
else
echo "unknown"
fi
return 7
}
vserver_monitor() {
vserver_status
}
vserver_usage() {
echo $USAGE >&2
}
vserver_info() {
cat - <<!INFO
Abstract=VServer Instance takeover
Argument=VServer Name
Description:
A Vserver is a simulated server which is fairly hardware independent
so it can be easily setup to run on several machines.
Please rerun with the meta-data command for a list of
valid arguments and their defaults.
!INFO
}
#
# Start or Stop the given VServer...
#
if [ $# -ne 1 ] ; then
vserver_usage
exit 2
fi
case "$1" in
start|stop|status|monitor|reload|info|usage) vserver_$1 ;;
meta-data) exit 3 ;; # meta_data() is commented out above, so report "unimplemented"
validate-all|notify|promote|demote) exit 3 ;;
*) vserver_usage ; exit 2 ;;
esac
chmod a+x /etc/ha.d/resource.d/Vserver-web
Add a modification to the drbddisk resource, as pointed out by Christian Balzer on the Vserver mailing list http://list.linux-vserver.org/archive?mss:835:200803:cgehldioambmojimggpf
nano /etc/ha.d/resource.d/drbddisk
stop)
# Kill off any vserver mounts that might hog this
VNSPACE=/usr/sbin/vnamespace
for CTX in `/usr/sbin/vserver-stat | tail -n +2 | awk '{print $1}'`
do
MPOINT="`$VNSPACE -e $CTX cat /proc/mounts | grep $RES | awk '{print $2}'`"
echo Unmounting mount point $MPOINT from within context $CTX
### MOUNT POINT IS COMPULSORY. DEVICE NAME DOES NOT WORK!!!
$VNSPACE -e $CTX /bin/umount $MPOINT || continue;
done
# exec, so the exit code of drbdadm propagates
exec $DRBDADM secondary $RES
Create a Vserver
Note that we already mounted the LVM volume on /VSERVERS/web in an earlier step. We are going to place both the Vserver's /var and /etc directories on that mount point and symlink to them, so that the complete Vserver and its config are available on the other node once the volume is mounted there.
mkdir -p /VSERVERS/web/etc
mkdir -p /VSERVERS/web/barrier/var
When making the Vserver it will be in the default location /var/lib/vservers/web and its config in /etc/vservers/web
<pre>
newvserver --hostname web --domain example.com --ip 123.123.123.125/24 --dist etch --mirror http://123.123.123.81:3142/debian.apt-get.eu/debian --interface bond0
</pre>
<pre>
enter the root password
</pre>
<pre>
Create a normal user account now?
<No>
</pre>
<pre>
Choose software to install:
<Ok>
</pre>
On node1 we move the Vserver directories to the LVM volume on the DRBD disks and make symlinks from the normal locations.
On node1
mv /etc/vservers/web/* /VSERVERS/web/etc/
rmdir /etc/vservers/web/
ln -s /VSERVERS/web/etc /etc/vservers/web
mv /var/lib/vservers/web/* /VSERVERS/web/barrier/var
rmdir /var/lib/vservers/web/
ln -s /VSERVERS/web/barrier/var /var/lib/vservers/web
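The move-and-symlink steps follow the same pattern twice, so they can be wrapped in a small function. A minimal sketch; relocate_dir is a made-up name, and unlike the plain "mv dir/*" above it also moves dotfiles in case the directory contains any.

```shell
#!/bin/sh
# Hypothetical helper: move the contents of directory $1 into directory $2
# (including dotfiles, which "mv dir/*" skips) and leave a symlink behind.
relocate_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for entry in "$src"/* "$src"/.[!.]* "$src"/..?*; do
        # Unmatched glob patterns stay literal; skip them
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        mv "$entry" "$dst"/ || return 1
    done
    rmdir "$src" && ln -s "$dst" "$src"
}

# Example calls matching the steps above:
#   relocate_dir /etc/vservers/web /VSERVERS/web/etc
#   relocate_dir /var/lib/vservers/web /VSERVERS/web/barrier/var
```

Because rmdir only succeeds on an empty directory, the symlink is not created if anything was left behind.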
We need to set the same symlinks on node2, but then we need the Vserver directories to be available there first.
The mounting should be handled by heartbeat by now, so we make our resources move to the other machine.
On node1
/etc/init.d/heartbeat stop
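Before continuing on node2 it may help to confirm that DRBD has actually been promoted there. A minimal sketch, assuming the "st:" field layout of the /proc/drbd output shown earlier on this page; drbd_local_role is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the local role of DRBD minor 0 from
# /proc/drbd-style text on stdin, using the "st:Local/Peer" field.
drbd_local_role() {
    sed -n 's/^ *0: .* st:\([A-Za-z]*\)\/.*/\1/p'
}

# Only read the live status when the DRBD module is loaded.
if [ -r /proc/drbd ]; then
    drbd_local_role < /proc/drbd
fi
```

Run on node2 after the failover, this should print Primary; if it still prints Secondary, check the heartbeat logs before creating the symlinks.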
On node2
ln -s /VSERVERS/web/etc /etc/vservers/web
ln -s /VSERVERS/web/barrier/var /var/lib/vservers/web
On node1
/etc/init.d/heartbeat start
vserver web start
and enjoy!