
Getting High with Lenny

February 15, 2009

The aim here is to set up some highly available services on Debian Lenny (at the time of writing still due to be released).
Most of the documentation I found on the net for such a setup is based on Xen, but I prefer to use Vserver for the “virtualisation” because of its configurability, its shared memory and CPU resources, and basically its raw speed.
DRBD8 and Heartbeat should take care of the availability magic in case a machine shuts down unexpectedly.

The partitioning looks as follows

      Name               Flags             Part Type       FS Type                                          Size (MB)
      c0d0p1             Boot              Primary         Linux ext3                                        10001.95
      c0d0p5                               Logical         Linux swap / Solaris                               1003.49
      c0d0p6                               Logical         Linux                                            280325.77

For this setup we go for a single DRBD partition; node1 is primary and node2 secondary.
To avoid confusing ourselves we follow the naming scheme below.

(For a similar but untested setup with 2 DRBD disks, 1 primary on each node, see ha-hosting-setup-vserver-double-drbd.)
<note>
machine1 will use the following names

  • hostname = node1
  • IP number = 192.168.1.100
  • is primary for r0 on disk c0d0p6
  • physical volume on r0 is /dev/drbd0
  • volume group on /dev/drbd0 is called drbdvg0

</note>

<note>
machine2 will use the following names

  • hostname = node2
  • IP number = 192.168.1.200
  • is secondary for r0 on disk c0d0p6
  • physical volume on r0 is /dev/drbd0
  • volume group on /dev/drbd0 is called drbdvg0

</note>

Get secure

Loadbalance-Failover the network cards

http://manpages.songshu.org/manpages/lenny/en/man8/ethtool.8.html
http://manpages.songshu.org/manpages/lenny/en/man8/mii-tool.8.html

Another great article by Carla Schroder with more details can be found here http://www.enterprisenetworkingplanet.com/nethub/article.php/3696561

We need both mii-tool and ethtool.

apt-get install ethtool ifenslave-2.6
nano /etc/modprobe.d/arch/i386

Add the following so the bonding module is loaded with the correct options at boot time.

alias bond0 bonding
options bond0 mode=balance-alb miimon=100 

Then set the interfaces eth0 and eth1 as slaves to bond0. eth2 is also configured here; it carries the crossover cable for the DRBD connection to the failover machine.

nano /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto bond0
iface bond0 inet static
        address 123.123.123.100
        netmask 255.255.255.0
        network 123.123.123.0
        broadcast 123.123.123.255
        gateway 123.123.123.1
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 123.123.123.45
        dns-search example.com
        up /sbin/ifenslave bond0 eth0 eth1
        down /sbin/ifenslave -d bond0 eth0 eth1

auto eth2
iface eth2 inet static
        address 192.168.1.100
        netmask 255.255.255.0

<note>
Done this way, the system needs to be rebooted before the changes take effect. Otherwise you would have to load the drivers and ifdown eth0 and eth1 before ifup bond0, but I am planning to install a new kernel in the next step anyway.
</note>
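
After the reboot, the bonding driver reports its state in /proc/net/bonding/bond0. The following is a minimal sketch for checking that both slaves were enslaved and have link. The helper name bond_slaves is made up for this example, and the parsing assumes the usual "Slave Interface:" / "MII Status:" lines of that file.

```shell
#!/bin/sh
# Print "<slave> <mii-status>" for every slave listed in a bonding status file.
# bond_slaves is a hypothetical helper name; the default path assumes bond0.
bond_slaves() {
    awk '
        # Remember the slave name, then report the MII status that follows it.
        /^Slave Interface:/ { slave = $3 }
        # The first "MII Status" line belongs to the bond itself and is skipped.
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Calling bond_slaves without arguments reads the live file; passing a path lets you try it on a saved copy first.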

Install the Vserver packages

http://manpages.songshu.org/manpages/lenny/en/man2/vserver.2.html

apt-get install linux-image-2.6-vserver-686-bigmem util-vserver vserver-debiantools

As usual a reboot is needed to boot this kernel.

<note>
With Etch I found that the Vserver kernel often ended up second in the grub list. This is not so in Lenny, but to be safe check the kernel stanza in /boot/grub/menu.lst, especially when doing this from a remote location.
</note>

Install DRBD8, LVM2, Heartbeat and quota

http://manpages.songshu.org/manpages/lenny/en/man8/drbd.8.html

apt-get install drbd8-modules-2.6-vserver-686-bigmem drbd8-module-source lvm2 heartbeat-2 quotatool quota

<note>
I am not sure about this: DRBD always needed to be compiled against the running kernel. Is this still the case with the kernel-specific module packages? I did not check.
</note>

Build DRBD8

m-a a-i drbd8
depmod -ae
modprobe drbd

Configure DRBD8

mv /etc/drbd.conf /etc/drbd.conf.original
nano /etc/drbd.conf
global {
        usage-count no;
}

common {
  syncer { rate 100M; }
}

resource r0 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 100M;
    al-extents 257;
  }

        on node1 {
                device     /dev/drbd0;
                disk       /dev/cciss/c0d0p6;
                address    192.168.1.100:7788;
                meta-disk  internal;
        }

        on node2 {
                device     /dev/drbd0;
                disk       /dev/cciss/c0d0p6;
                address    192.168.1.200:7788;
                meta-disk  internal;
        }
}
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta

On both nodes

node1

drbdadm create-md r0

node2

drbdadm create-md r0

node1

drbdadm up r0

node2

drbdadm up r0

<note warning>
The following should be done on the node that will be the primary
</note>
On node1

drbdadm -- --overwrite-data-of-peer primary r0

watch cat /proc/drbd should show you something like this

version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe build by phil@fat-tyre, 2008-08-04 15:28:07
 0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
    ns:62059328 nr:0 dw:3298052 dr:58770141 al:2102 bm:3641 lo:1 pe:261 ua:251 ap:0
	[===>................] sync'ed: 22.1% (208411/267331)M
	finish: 4:04:44 speed: 14,472 (12,756) K/sec
	resync: used:1/61 hits:4064317 misses:5172 starving:0 dirty:0 changed:5172
	act_log: used:0/257 hits:822411 misses:46655 starving:110 dirty:44552 changed:2102
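
Instead of watching the output by eye, the connection and disk state can be pulled out of /proc/drbd in a script. This is only a sketch: drbd_state and drbd_in_sync are hypothetical helper names, and the parsing assumes the DRBD 8.0 field layout (cs:, st:, ds:) shown above.

```shell
#!/bin/sh
# Print "<connection-state> <disk-states>" for one DRBD minor, for example
# "SyncSource UpToDate/Inconsistent". drbd_state is a hypothetical helper;
# the optional second argument replaces /proc/drbd for testing.
drbd_state() {
    awk -v minor="$1" '
        $1 == (minor ":") {
            cs = ""; ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print cs, ds
        }
    ' "${2:-/proc/drbd}"
}

# Succeeds once the resource is connected and both disks are up to date.
drbd_in_sync() {
    [ "$(drbd_state "$1" "${2:-/proc/drbd}")" = "Connected UpToDate/UpToDate" ]
}
```

A loop such as `until drbd_in_sync 0; do sleep 60; done` would then block until the initial sync of minor 0 has finished.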

Configure LVM2

http://manpages.songshu.org/manpages/lenny/en/man8/lvm.8.html

<note important>
LVM will normally scan all available devices under /dev, but since /dev/cciss/c0d0p6 and /dev/drbd0 are basically the same device, it would find the same physical volume twice and could end up reading and writing the backing disk directly, bypassing DRBD.
So to limit it to scanning /dev/drbd devices only, we do the following on both nodes.

</note>

cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.original
nano /etc/lvm/lvm.conf
    #filter = [ "a/.*/" ]
    filter = [ "a|/dev/drbd|", "r|.*|" ]
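
To see why this filter hides the backing partition, it helps to walk the rules the way LVM does: rules are tried in order, the first matching one decides, and a device that matches nothing is accepted. The sketch below only emulates that logic for rules written with "|" as delimiter. lvm_filter_verdict is a made-up name, and grep -E is used as a stand-in for LVM's own regex engine.

```shell
#!/bin/sh
# Emulate how an LVM filter list is applied to one device path.
# Each rule is "a|regex|" (accept) or "r|regex|" (reject); the first match
# wins, and a path that matches no rule is accepted.
lvm_filter_verdict() {
    path=$1
    shift
    for rule in "$@"; do
        action=${rule%%|*}      # leading "a" or "r"
        regex=${rule#?|}        # drop the action and the opening delimiter
        regex=${regex%|}        # drop the closing delimiter
        if printf '%s\n' "$path" | grep -Eq -- "$regex"; then
            if [ "$action" = a ]; then echo accept; else echo reject; fi
            return 0
        fi
    done
    echo accept
}
```

With the two rules from lvm.conf, /dev/drbd0 is accepted by the first rule, while /dev/cciss/c0d0p6 falls through to the catch-all reject.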

To re-scan with the new settings, run the following on both nodes

vgscan

Create the Physical Volume

The following only needs to be done on the node that is the primary!!

On node1

pvcreate /dev/drbd0

Create the Volume Group

On node1

vgcreate drbdvg0 /dev/drbd0

Create the Logical Volume

In this example the volume is about 50GB (-L50000 means 50000 megabytes), which leaves plenty of space to expand the volume or to add extra volumes later on.
On node1

lvcreate -L50000 -n web drbdvg0

Then we put a file system on the logical volume

mkfs.ext3 /dev/drbdvg0/web

Create the directory where we want to mount the Vservers

mkdir -p /VSERVERS/web

and mount the logical volume on the mount point

mount -t ext3 /dev/drbdvg0/web /VSERVERS/web/

Get informed

Of course we want to be informed by Heartbeat in case a node goes down, so we install postfix to send the mail.

http://manpages.songshu.org/manpages/lenny/en/man1/postfix.1.html

apt-get install postfix mailx

http://manpages.songshu.org/manpages/lenny/en/man1/mailx.1.html

and go for the defaults, “internet site” and “node1.example.com”.

We don't want postfix to listen to all interfaces,

nano /etc/postfix/main.cf

and change the line at the bottom to read like this

inet_interfaces = loopback-only

Heartbeat

http://manpages.songshu.org/manpages/lenny/en/man8/heartbeat.8.html

Add the other node to the hosts file on both nodes.
So for node1 do

nano /etc/hosts

and add node2

192.168.1.200   node2
nano /etc/ha.d/ha.cf
autojoin none
#crm             on      #enables heartbeat2 cluster manager - we want that!
use_logd        on
logfacility     syslog
keepalive       1
deadtime        10
warntime        10
udpport         694
auto_failback   on      #resources move back once node is back online
mcast bond0 239.0.0.43 694 1 0
bcast eth2
node node1       #hostnames of the nodes
node node2
nano /etc/ha.d/authkeys
auth 3
3 md5 failover  ## "failover" is just a shared secret, enter what you want! "auth 3" selects key 3, which uses md5 hashing (not encryption)
chmod 600 /etc/ha.d/authkeys
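
A fixed word as shared secret is easy to guess, so a random key is a reasonable alternative. The sketch below writes an authkeys file in the same "auth 3 / 3 md5 ..." layout with a key taken from /dev/urandom. write_authkeys is a hypothetical helper name, and the target path is an argument so it can be tried on a scratch file before touching /etc/ha.d/authkeys.

```shell
#!/bin/sh
# Write a Heartbeat authkeys file with a random md5 key instead of a fixed word.
# write_authkeys is a hypothetical helper; pass the file to create as argument.
write_authkeys() {
    target=$1
    # 512 random bytes hashed down to a 32-character hex string
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    # Create the file with a restrictive umask so it is never world-readable
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}
```

The same file has to end up on both nodes, so generate it once and copy it rather than running the helper on each machine.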

Set up some keys on both boxes (defaults, no passphrase)

ssh-keygen

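Then copy over the public keys: run the first command on node2 and the second on node1. Note that this overwrites any existing authorized_keys file on the target.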

scp /root/.ssh/id_rsa.pub 192.168.1.100:/root/.ssh/authorized_keys
scp /root/.ssh/id_rsa.pub 192.168.1.200:/root/.ssh/authorized_keys
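
If the target already has an authorized_keys file you want to keep, appending is safer than overwriting. A minimal sketch, assuming the public key has first been copied to the target under some temporary name; add_authorized_key is a made-up helper, not part of OpenSSH. Where ssh-copy-id is available it does much the same job.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already listed.
# add_authorized_key is a hypothetical helper; both paths are arguments.
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x: whole-line match, -F: treat the key as a fixed string, not a regex
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

Running it twice with the same key leaves the file unchanged the second time.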

<note>
We will be using Heartbeat R1-style configuration here, simply because I don't understand the R2 XML-based syntax.
</note>
If both hosts files know where they can find each other we only need to do this on node1 and can update node2 like so

/usr/lib/heartbeat/ha_propagate
nano /etc/ha.d/haresources
node1 drbddisk::r0 LVM::drbdvg0 Filesystem::/dev/drbdvg0/web::/VSERVERS/web::ext3 Vserver-web SendArp::123.123.123.125/bond0

The above makes node1 the default home of the Vserver named web and specifies the DRBD resource, volume group and mount point it needs. Heartbeat uses the Vserver-web script to start and stop the Vserver, and SendArp notifies the network that this IP can now be found somewhere else than before. (I have added the SendArp call an extra time below for better results.)
The Vserver-web script is basically a demolished version of the original R2 style agent by Martin Fick from here http://www.theficks.name/bin/lib/ocf/VServer.

What I did is comment out the sensible top part and replace “$OCF_RESKEY_vserver” with the specific Vserver name. I also added an extra
/etc/ha.d/resource.d/SendArp 123.123.123.125/bond0 start
to the start part because I got varying results when this was done by Heartbeat.
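
An R1-style haresources line reads as a preferred node followed by resource entries of the form script::arg1::arg2. Heartbeat starts them from left to right and stops them in reverse, calling each script with its arguments followed by the action. The sketch below spells a line out in that form. explain_haresources is a hypothetical helper for illustration only and is not part of Heartbeat.

```shell
#!/bin/sh
# Split one haresources line into its preferred node and resource entries and
# print the start call each entry stands for, in start order.
explain_haresources() {
    set -f                      # no filename globbing while word-splitting
    set -- $1                   # split the line on whitespace
    set +f
    echo "preferred node: $1"
    shift
    for entry in "$@"; do
        script=${entry%%::*}
        case $entry in
            *::*) args=$(printf '%s' "${entry#*::}" | sed 's/::/ /g') ;;
            *)    args="" ;;
        esac
        echo "run: $script ${args:+$args }start"
    done
}
```

Feeding it the line from /etc/ha.d/haresources in quotes shows at a glance which script gets which arguments, which is handy for spotting a wrong resource or volume group name.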

nano /etc/ha.d/resource.d/Vserver-web
#!/bin/sh
#
# License: GNU General Public License (GPL)
# Author:  Martin Fick <mogulguy@yahoo.com>
# Date:    04/19/07
# Version: 1.1
#
#	This script manages a VServer instance
#
#	It can start or stop a VServer
#
#	usage: $0 {start|stop|status|monitor|meta-data}
#
#
#       OCF parameters are as below
#       OCF_RESKEY_vserver
#
#######################################################################
# Initialization:
#
#. /usr/lib/heartbeat/ocf-shellfuncs
#
#USAGE="usage: $0 {start|stop|status|monitor|meta-data}";
#
#######################################################################
#
#
#meta_data() {
#        cat <<END
#<?xml version="1.0"?>
#<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
#<resource-agent name="VServer">
# <version>1.0</version>
# <longdesc lang="en">
#This script manages a VServer instance.
#It can start or stop a VServer.
# </longdesc>
# <shortdesc lang="en">OCF Resource Agent compliant VServer script.</shortdesc>
#
# <parameters>
#
#  <parameter name="vserver" unique="1" required="1">
#   <longdesc lang="en">
#The vserver name is the name as found under /etc/vservers
#   </longdesc>
#   <shortdesc lang="en">VServer Name</shortdesc>
#    <content type="string" default="" />
#   </parameter>
#
# </parameters>
#
# <actions>
#  <action name="start"   timeout="2m" />
#  <action name="stop"    timeout="1m" />
#  <action name="monitor" depth="0"  timeout="1m" interval="5s" start-delay="2m" />
#  <action name="status" depth="0"  timeout="1m" interval="5s" start-delay="2m" />
#  <action name="meta-data"  timeout="1m" />
# </actions>
#</resource-agent>
#END
#}

vserver_reload() {
    vserver_stop || return
    vserver_start
}

vserver_stop() {
  #
  #	Is the VServer already stopped?
  #
    vserver_status
    [ $? -ne 0 ] && return 0

    /usr/sbin/vserver "web" "stop"

    vserver_status
    [ $? -ne 0 ] && return 0

    return 1
}

vserver_start() {
    vserver_status
    [ $? -eq 0 ] && return 0

    /usr/sbin/vserver "web" "start"
    vserver_status
    /etc/ha.d/resource.d/SendArp 123.123.123.125/bond0 start
}

vserver_status() {
    /usr/sbin/vserver "web" "status"
    rc=$?
    if [ $rc -eq 0 ]; then
	echo "running"
        return 0
    elif [ $rc -eq 3 ]; then
	echo "stopped"
    else
	echo "unknown"
    fi
    return 7
}

vserver_monitor() {
  vserver_status
}

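vserver_usage() {
  # USAGE is commented out above, so spell the usage text out here
  echo "usage: $0 {start|stop|status|monitor|reload|info}" >&2
}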

vserver_info() {
cat - <<!INFO
	Abstract=VServer Instance takeover
	Argument=VServer Name
	Description:
	A Vserver is a simulated server which is fairly hardware independent
        so it can be easily setup to run on several machines.
	Please rerun with the meta-data command for a list of
	valid arguments and their defaults.
!INFO
}

#
#	Start or Stop the given VServer...
#

if [ $# -ne 1 ] ; then
  vserver_usage
  exit 2
fi

  case "$1" in
    start|stop|status|monitor|reload|info|usage)    vserver_$1 ;;
    meta-data)   meta_data ;;
    validate-all|notify|promote|demote)  exit 3 ;;

    *)  vserver_usage ; exit 2 ;;
  esac
chmod a+x /etc/ha.d/resource.d/Vserver-web

Add a modification to the drbddisk resource, as pointed out by Christian Balzer on the Vserver mailing list http://list.linux-vserver.org/archive?mss:835:200803:cgehldioambmojimggpf

nano /etc/ha.d/resource.d/drbddisk
stop)
        # Kill off any vserver mounts that might hog this
        VNSPACE=/usr/sbin/vnamespace

        for CTX in `/usr/sbin/vserver-stat | tail -n +2 | awk '{print $1}'`
        do
            MPOINT="`$VNSPACE -e $CTX cat /proc/mounts | grep $RES | awk '{print $2}'`"
            echo Unmounting mount point $MPOINT from within context $CTX
            ### MOUNT POINT IS COMPULSORY. DEVICE NAME DOES NOT WORK!!!
            $VNSPACE -e $CTX /bin/umount $MPOINT || continue;
        done
        # exec, so the exit code of drbdadm propagates
        exec $DRBDADM secondary $RES
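
In the loop above, MPOINT is empty whenever a context has no matching mount, and umount is then called without an argument. One way to tighten this, offered only as a sketch, is a small filter that prints one mount point per matching line, so the caller can loop over real results only. mount_points_for is a hypothetical helper, and unlike the grep above it matches only the device and mount point columns.

```shell
#!/bin/sh
# Print the mount point of every /proc/mounts-style line whose device or
# mount point contains the given string. The mounts table is read from stdin
# so it can be fed from "vnamespace -e CTX cat /proc/mounts" or a sample file.
mount_points_for() {
    awk -v needle="$1" 'index($1, needle) || index($2, needle) { print $2 }'
}
```

Inside the for loop this could be used as `$VNSPACE -e $CTX cat /proc/mounts | mount_points_for "$RES" | while read -r MPOINT; do $VNSPACE -e $CTX /bin/umount "$MPOINT"; done`, which simply does nothing when there is no match.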
Create a Vserver

Note that we already mounted the LVM volume on /VSERVERS/web in an earlier step. We are going to place both the Vserver's /var/lib/vservers/web and /etc/vservers/web directories on that mount point and symlink to them; this way the complete Vserver and its config are available on the other node once the volume is mounted there.

mkdir -p /VSERVERS/web/etc
mkdir -p /VSERVERS/web/barrier/var

When making the Vserver it will be in the default location /var/lib/vservers/web and its config in /etc/vservers/web

<pre>
newvserver --hostname web --domain example.com --ip 123.123.123.125/24 --dist etch --mirror http://123.123.123.81:3142/debian.apt-get.eu/debian --interface bond0
</pre>

<pre>
enter the root password
</pre>

<pre>
Create a normal user account now?
<No>
</pre>

<pre>
Choose software to install:
<Ok>
</pre>

On node1 we move the Vserver directories to the LVM volume on the DRBD disks and make symlinks from the normal locations.

On node1

mv /etc/vservers/web/* /VSERVERS/web/etc/
rmdir /etc/vservers/web/
ln -s /VSERVERS/web/etc /etc/vservers/web
mv /var/lib/vservers/web/* /VSERVERS/web/barrier/var
rmdir /var/lib/vservers/web/
ln -s /VSERVERS/web/barrier/var /var/lib/vservers/web
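
The move-and-symlink steps above can also be wrapped in a small function. This is only a sketch under my own assumptions: relocate_dir is a made-up name, and it moves the directory itself rather than its contents, so dotfiles that "mv dir/*" would skip come along too.

```shell
#!/bin/sh
# Move a directory onto the replicated volume and leave a symlink behind.
# relocate_dir is a hypothetical helper: relocate_dir <source> <destination>
relocate_dir() {
    src=$1
    dst=$2
    if [ -L "$src" ]; then
        echo "$src is already a symlink, nothing to do" >&2
        return 0
    fi
    # Drop an empty placeholder directory (such as one made with mkdir -p)
    rmdir "$dst" 2>/dev/null || true
    if [ -e "$dst" ]; then
        echo "$dst already exists and is not empty, refusing to overwrite" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dst")"
    mv "$src" "$dst"
    ln -s "$dst" "$src"
}
```

For example, `relocate_dir /etc/vservers/web /VSERVERS/web/etc` would replace the first three commands, and a second run is harmless because the source is then already a symlink.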

We need to set the same symlinks on node2, but then we need the Vserver directories to be available there first.
The mounting should be handled by Heartbeat by now, so we make our resources move to the other machine.

On node1

/etc/init.d/heartbeat stop

On node2

ln -s /VSERVERS/web/etc /etc/vservers/web
ln -s /VSERVERS/web/barrier/var /var/lib/vservers/web

On node1

/etc/init.d/heartbeat start
vserver web start

and enjoy!
