Sunday, December 14, 2014

Ceph Pi - Installing Ceph on Raspberry Pi

Note: I will talk about Raspbian here.  It is a Debian port for the Raspberry Pi.  I am fairly sure most of the instructions here will work fine on Debian or Ubuntu as well.  The guide was built for Wheezy and Ceph 0.80.

Here is why installing Ceph on Raspbian is tricky, and why there isn't much about it shared on the internet.
  • Raspbian-wheezy has an incomplete (and very old - 0.43 vs modern - 0.80 ) set of packages, so you have to switch to testing to get everything you need
  • Ceph uses a program called ceph-deploy.  It tries to automatically install some of the packages you need.  As it runs it does an action it calls 'adjust repositories' which adds Ceph's repository to your list of sources.
  • Ceph's repository does not have all the packages compiled for the ARM chip of the Raspberry
  • You have to edit ceph-deploy's source and take out that step, so that the packages from wheezy-testing are used
  • Finally (optionally) if you are planning to run ceph-client on one of the Raspberry Pis, you have to compile a kernel module for RBD.
  • We will also review the actual set up of the Ceph cluster and all the commands to get things going.
It's a daunting list, but I will walk you through it quickly.  Even though it will take a ridiculous length of time (unless you set up distcc - watch for a follow-up article on that), it should be pretty smooth.

I will list the two most common errors encountered to hopefully let people find this article.

 [WARNIN] too many arguments: [--cluster,ceph]  

This one is caused by the ancient packages for Ceph in Raspbian-wheezy

 [WARNIN] E: Unable to locate package ceph-mds  
 [WARNIN] E: Unable to locate package ceph-fs-common  

These two are caused by ceph-deploy adjusting the repositories to point at the Ceph project repository, which does not contain packages for the ARM architecture that the Raspberry Pi needs.

Onwards and upwards.

First let's go ahead and edit /etc/apt/sources.list

 sudo vi /etc/apt/sources.list  

Go ahead and replace the word 'wheezy' with 'testing' from this:
 deb wheezy main contrib non-free rpi  

To this:
 deb testing main contrib non-free rpi  
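If you would rather script the change than do it in vi, something along these lines should work.  This is only a sketch: it runs against a scratch file, and the mirror URL in it is a made-up placeholder, not the real Raspbian mirror.  Point sed at the real /etc/apt/sources.list (with sudo, and after making a backup) once you are happy with the result.

```shell
# Work on a scratch copy so nothing system-wide is touched.
tmp=$(mktemp)

# Placeholder mirror URL (an assumption); your real line will differ.
printf 'deb http://mirror.example/raspbian/ wheezy main contrib non-free rpi\n' > "$tmp"

# Swap the suite name only where it stands alone between spaces.
result=$(sed 's/ wheezy / testing /' "$tmp")
echo "$result"

rm -f "$tmp"
```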

Then you have to update the whole system. Joy.

 sudo apt-get update  

 sudo apt-get upgrade  
 sudo apt-get dist-upgrade  

This will take a while, and you will be prompted to say Yes a bunch of times as important config files get changed.  Since we have not really moved from the defaults, it is fine to say Yes to pretty much everything.  You 'root' lovers, watch out for the tightened security settings in ssh, because they may just leave you locked out...  Anyway.  Watch a show.  Brooklyn Nine-Nine is pretty good.

Done?  Good.

A couple more packages you will need are NTP and SSH, if you somehow missed them

 sudo apt-get install openssh-server ntp  

Installing Ceph

Now we will actually start installing Ceph.  We are going to be loosely following the official guide for the most part, but we are not out of the woods yet, so stick with the steps below.

  1. Create a ceph user on each node.  For simplicity's sake I will call him 'ceph'.

     sudo useradd -d /home/ceph -m ceph  
     sudo passwd ceph  

    While you are at it, make it a 'sudoer'
     echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph  
     sudo chmod 0440 /etc/sudoers.d/ceph  

  2. Set up SSH keys.  ceph-deploy uses SSH to log into systems and does not prompt for passwords, so we need to set up password-less SSH.

    First we generate the keys.  Make sure you are logged in as the 'ceph' user.  Do not use root or sudo.  Leave the passphrase empty.  Technically you are required to do this only on the admin node, and it is probably a good idea to do it only on the admin node in a production environment (for security purposes).  I did it on all my nodes since it makes extended setup a lot easier.
     ssh-keygen  

    Then we copy the public key to every node.
     ssh-copy-id ceph@pi1  
     ssh-copy-id ceph@pi2  
     ssh-copy-id ceph@pi3  

  3. (Optional) Modify ~/.ssh/config file so that ceph-deploy does not have to use the '--username ceph' option each time you run a command.   Put the following text inside the file.

     Host pi1  
       Hostname pi1  
       User ceph  
     Host pi2  
       Hostname pi2  
       User ceph  
     Host pi3  
       Hostname pi3  
       User ceph  
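    With only three nodes typing this by hand is fine, but if your cluster grows, a small helper can print the stanzas for you.  A rough sketch, where the function name make_ssh_config is my own invention and the host names are the ones used in this guide:

```shell
# Print one Host stanza per node name given on the command line.
make_ssh_config() {
  for host in "$@"; do
    printf 'Host %s\n  Hostname %s\n  User ceph\n' "$host" "$host"
  done
}

config=$(make_ssh_config pi1 pi2 pi3)
echo "$config"

# To use it for real, append the output to your config file:
#   make_ssh_config pi1 pi2 pi3 >> ~/.ssh/config
```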

  4. Add the ceph repository to apt sources.  First we add the release key, then we add the repository, and then we update the package lists and install ceph-deploy.  Note: replace 'firefly' in the second command with the name of the current stable release of ceph.

     wget -q -O- ';a=blob_plain;f=keys/release.asc' | sudo apt-key add -  
     echo deb $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list  

  5. Important Step: We deviate from the script here a bit.  The Ceph repo does not really support wheezy-testing (jessie), so we have to edit the ceph repo file and adjust it.

     sudo vi /etc/apt/sources.list.d/ceph.list  

    change the word 'testing' to 'wheezy' so that it looks like this

     deb wheezy main  

  6. Now it is time to finally install ceph-deploy!  I bet you were losing hope!

     sudo apt-get update   
     sudo apt-get install ceph-deploy  

  7. Now we can actually create the cluster.  We have only one Mon node (pi1) at this time, but technically we can list multiple ones in the command.  Make sure you are in the home directory of the 'ceph' user we created earlier.  Notice there is also no 'sudo'.

     ceph-deploy new pi1  

  8. By default ceph wants at least 3 replicas of the data in the cluster, which means at least 3 OSDs.  Since our initial build has only 2, we will have to change that.
     vi ./ceph.conf  
    Add the line
     osd_pool_default_size = 2  
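    If you end up rebuilding the cluster a few times (I did), it is handy to add the setting from a script without duplicating it.  Here is a minimal sketch against a scratch file; the helper name set_if_missing is hypothetical, the sample contents are made up, and it assumes [global] is the last (or only) section in the file, since it simply appends.

```shell
# Scratch stand-in for the ceph.conf written by 'ceph-deploy new'.
conf=$(mktemp)
printf '[global]\nmon_initial_members = pi1\n' > "$conf"

# Append "key = value" only if the key is not already set in the file.
set_if_missing() {
  grep -q "^$2[[:space:]]*=" "$1" || printf '%s = %s\n' "$2" "$3" >> "$1"
}

set_if_missing "$conf" osd_pool_default_size 2
set_if_missing "$conf" osd_pool_default_size 2   # second call is a no-op

count=$(grep -c '^osd_pool_default_size' "$conf")
last_line=$(tail -n 1 "$conf")
echo "$last_line"

rm -f "$conf"
```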

  9. This is the trickiest step of the install.  According to the guide we should be able to just run

     ceph-deploy install pi1 pi2 pi3  

    That, however, causes us to get an error

     [pi1][WARNIN] W: Failed to fetch 404 Not Found [IP: 2607:f298:4:147::b05:fe2a 80]  
     [pi1][WARNIN] E: Some index files failed to download. They have been ignored, or old ones used instead.  
     [pi1][ERROR ] RuntimeError: command returned non-zero exit status: 100  
     [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: apt-get -q update  

    The reason for this is that ceph-deploy tries to adjust repos on the fly.  It tries to make the ceph repository authoritative for all things Ceph, even when it does not have the packages.  In this case it is because of the architecture.  Notice - binary-armhf.  I am working on getting a patch submitted that will get around this by giving the user the option not to adjust the repos, but for the time being, we actually have to change the code.  No worries - it's one line!

     sudo vi /usr/share/pyshared/ceph_deploy/hosts/debian/  

    Near the very top of the file you will see some code that looks like this

     def install(distro, version_kind, version, adjust_repos):  
       codename = distro.codename  
       machine = distro.machine_type  

    You will need to add 1 line so it looks like this:
     def install(distro, version_kind, version, adjust_repos):  
       adjust_repos = False
       codename = distro.codename  
       machine = distro.machine_type  
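    If you prefer not to hand-edit Python, the same one-line change can be made with awk.  The sketch below only patches a scratch copy of the three lines shown above, and it assumes the function body is indented with four spaces; check the indentation in your actual file before running anything like this against it, because Python will not forgive a mismatch.

```shell
# Scratch copy of the top of the install() function, as shown above.
src=$(mktemp)
cat > "$src" <<'EOF'
def install(distro, version_kind, version, adjust_repos):
    codename = distro.codename
    machine = distro.machine_type
EOF

# Print every line, and right after the 'def install(' line add the override.
patched=$(awk '{ print } /^def install\(/ { print "    adjust_repos = False" }' "$src")
echo "$patched"

rm -f "$src"
```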

    Also go ahead and edit the ceph source list
     sudo vi /etc/apt/sources.list.d/ceph.list  

    and comment out the line in it so it looks something like this
     #deb sid main  

    And now we are ready!!!!!
     ceph-deploy install pi1 pi2 pi3  

  10. Phew.  Ok.  The hard part is over.  Give yourself a high five.
  11. Now we install the initial Mon(itor) nodes and make sure we get all the keys (remember - ceph-deploy uses ssh password-less mode to issue commands to all nodes).  Execute this on pi1 in the /home/ceph directory.

     ceph-deploy mon create-initial  

  12. Now it's time to add the two OSDs.  As per the plan above - we will add one on pi2 and one on pi3.  If you are following my exact build, you have a SATA drive attached via USB to your Pis.  In this step we are going to format them and mount them.  If you are going to just create a couple of small OSDs on your SD cards, you can skip this section except for the very next command and proceed.

    First we make the directory where we will mount them.  Execute this on all three nodes
     sudo mkdir /mnt/sda1/  

    This is a good time to talk again a little bit about Minimum System Requirements, and how the Pi really does not meet them.  It comes down to both RAM and CPU, but mostly RAM...  My understanding is that for each Gig of managed storage, a Ceph OSD node needs about 1 Meg of RAM.  The math is easy: 500 Gigs of storage = 500 Megs of RAM....  With my 1TB drives I am bound to swap.  I thought a bit about creating right-sized partitions, but the 'good news' is that the RAM is used on an 'as-you-go' basis.  The space is not pre-allocated.  (Here I am pretty sure someone will come out and tell me about the awesome pre-allocation option, or raw HD options...  and I will be really happy, because that would be awesome.  So don't be shy.)  Long story short - since I intend to test, I want to be able to stretch the system beyond its breaking point in every way I can and identify the bottlenecks.  This is not intended to be a production build (yet?).
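    To put numbers on that rule of thumb, here is a tiny back-of-the-envelope calculator.  The 1 MB per GB ratio is just my understanding from above, and the 512 MB board figure is an assumption you should change to match your Pi.

```shell
# Rule of thumb from the text: roughly 1 MB of RAM per 1 GB of OSD storage.
MB_PER_GB=1
osd_ram_mb() { echo $(( $1 * MB_PER_GB )); }

# Assumed total RAM on the board in MB; adjust for your model.
PI_RAM_MB=512

for disk_gb in 500 1000; do
  need=$(osd_ram_mb "$disk_gb")
  if [ "$need" -gt "$PI_RAM_MB" ]; then
    verdict="expect swapping"
  else
    verdict="fits"
  fi
  echo "${disk_gb} GB disk -> ~${need} MB RAM (${verdict})"
done
```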

    Be that as it may...

    Now we partition and FS format the disks.  Again - we do this on every machine.  The instructions I will provide will format the whole disk as a single partition.  If you have a different setup, or you don't want to blow away the whole HD, please use caution.

     sudo fdisk /dev/sda  
     <press 'p' and press 'Enter' to see if there are any partitions on this disk>  
     <press 'd' and press 'Enter' followed by the number of the partition you want to delete and 'Enter'>  
     <repeat this for all partitions>  
     <press 'n' and press 'Enter' to create a new partition>  
     <press 'p' to choose Primary and press 'Enter'>  
     <press '1' to choose Partition 1 and press 'Enter'>  
     <press 'Enter' 2 more times to confirm the default starting and ending sectors>  
     <press 'p' and 'Enter' to ensure that what you thought happened actually happened>  
     <press 'w' and Enter to write the new partition table to the disk and exit>  

    I personally favor XFS, so I will set up my drives with it.  In the Ceph documentation, people talk about Btrfs as being a good, though possibly immature, alternative.  And then there is Ext4...  Well, some day we can do a bake off.  But first we must finish this...

    If you don't have xfsprogs installed, you have to run
     sudo apt-get install xfsprogs  

    Then we format the HD with
     sudo mkfs.xfs -f /dev/sda1  

    Time to mount the file system.  Since we want the mount to survive a reboot, we have to edit /etc/fstab
     sudo vi /etc/fstab  

    Add a line for the new drive
     /dev/sda1    /mnt/sda1    xfs   defaults,noatime,nobarrier 0    1  

    Then we mount everything listed in /etc/fstab
     sudo mount -a  

    To verify everything is looking good, run 'df -h' and check that the new drive shows up mounted on /mnt/sda1, as in the last line of my output
     ceph@pi1 ~ $ df -h  
     Filesystem   Size Used Avail Use% Mounted on  
     /dev/root    14G 4.5G 8.2G 36% /  
     devtmpfs    231M   0 231M  0% /dev  
     tmpfs      235M   0 235M  0% /dev/shm  
     tmpfs      235M  27M 209M 12% /run  
     tmpfs      5.0M 4.0K 5.0M  1% /run/lock  
     tmpfs      235M   0 235M  0% /sys/fs/cgroup  
     /dev/mmcblk0p5  60M  29M  31M 48% /boot  
     tmpfs      47M   0  47M  0% /run/user/1001  
     /dev/sda1    932G  33M 932G  1% /mnt/sda1  
     ceph@pi1 ~ $   

  13. Back to creating the OSDs.  We point ceph-deploy at the directory we just created on pi2 and pi3 so it can prepare the OSDs there.  Execute this from /home/ceph on pi1
     ceph-deploy osd prepare pi2:/mnt/sda1  
     ceph-deploy osd prepare pi3:/mnt/sda1  

  14. Activate the OSDs
     ceph-deploy osd activate pi2:/mnt/sda1  
     ceph-deploy osd activate pi3:/mnt/sda1  
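    Once you have more than a couple of OSD nodes, typing these one by one gets old.  Here is a sketch of a helper (osd_commands is a name I made up) that only prints the prepare and activate commands for a list of hosts, so you can eyeball the plan before running anything.  It assumes every host uses the same /mnt/sda1 mount point as in this guide.

```shell
# Print the prepare commands for every host, then the activate commands.
osd_commands() {
  for host in "$@"; do
    echo "ceph-deploy osd prepare $host:/mnt/sda1"
  done
  for host in "$@"; do
    echo "ceph-deploy osd activate $host:/mnt/sda1"
  done
}

plan=$(osd_commands pi2 pi3)
echo "$plan"

# When the plan looks right, run it from /home/ceph on the admin node:
#   osd_commands pi2 pi3 | sh
```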

  16. Oh SNAP!  The cluster is live!  Ceph starts balancing the data onto new OSDs as they join.  Since these are the first 2 OSDs and we don't have any data yet, nothing much will happen.  We will observe this later!!!  For now - the command
     sudo ceph -w  

    should give us cluster status. I am not sure why the 'sudo' is needed. Probably messed up something... Here is what I get
     ceph@pi1 ~ $ sudo ceph -w  
       cluster d8e62abb-488c-4cfe-abb4-65d2151cf42f  
        health HEALTH_WARN 1 pgs stuck unclean  
        monmap e1: 1 mons at {pi1=}, election epoch 2, quorum 0 pi1  
        osdmap e41: 4 osds: 3 up, 3 in  
        pgmap v422: 192 pgs, 3 pools, 0 bytes data, 0 objects  
           15463 MB used, 2778 GB / 2793 GB avail  
               1 active  
              191 active+clean  
     2014-12-15 03:19:54.879892 mon.0 [INF] pgmap v422: 192 pgs: 1 active, 191 active+clean; 0 bytes data, 15463 MB used, 2778 GB / 2793 GB avail  
     2014-12-15 03:22:07.859035 mon.0 [INF] pgmap v423: 192 pgs: 1 active, 191 active+clean; 0 bytes data, 15462 MB used, 2778 GB / 2793 GB avail  

  17. To make administration easier and avoid having to specify the monitor address every time, run
     ceph-deploy admin pi1 pi2 pi3  

  18. Make sure you have the right permissions to the keyring
     sudo chmod +r /etc/ceph/ceph.client.admin.keyring  

  19. Finally let's do one last check to make sure everything is good with the world
     ceph health  
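    On a Pi the cluster can take a while to settle, so rather than re-running the check by hand you can poll it.  This is a rough sketch, not anything that ships with Ceph: the function name and the HEALTH_CMD override are my own, and the override exists only so the loop can be tried out on a machine without a cluster.

```shell
# Poll a health command until it reports HEALTH_OK or we run out of tries.
# HEALTH_CMD defaults to the real 'ceph health'; override it to test offline.
wait_for_health_ok() {
  tries=$1
  delay=$2
  status=""
  while [ "$tries" -gt 0 ]; do
    status=$(${HEALTH_CMD:-ceph health} 2>/dev/null)
    case $status in
      HEALTH_OK*) echo "$status"; return 0 ;;
    esac
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "gave up, last status: ${status:-unknown}"
  return 1
}

# Example: check up to 30 times, 10 seconds apart.
#   wait_for_health_ok 30 10
```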
  20. CONGRATS on getting here...  if you had trouble - see me in the comments.  

    The next section is going to be about actually getting the new cluster mounted as a block device on Pi1.  Then we will create an OSD on Pi1 and add it to the cluster.  Then... well, I don't want to give you too many spoilers.  Starting with the next installment, I hope, there will be a ton of testing data as well.  Bandwidth, CPU usage, memory...  All in painful high-frequency polled detail (thanks to the efforts of Brandon Hale)!  

    Thanks for sticking through this!