Wednesday, December 31, 2014

Ceph Pi - performance of scp, ciphers, and the Raspberry Pi

This article is not really much about Ceph, but it does explore technologies very tangential to it.  So I decided to add it to the series.

I am sure this has been discussed many times before.  What is the best cipher for SCP to use so you get maximum throughput?  I decided to do a quick survey of the current state of the art and share it.

Some versions of SSH, including the one distributed in Debian Jessie, have a cool new option which lists all the ciphers that are supported.

 ssh -Q cipher  

In case yours does not, you can do

 man ssh_config  

So I put together a little bash script to run on my Debian server (nothing special - a single CPU i5 with an SSD) to copy the same 500MB file (a Debian ISO) to a target computer via every supported cipher.  First I tried this against a Raspberry Pi, and then my Mac Book Pro connected to the same gig switch.  Here is the script for your amusement and pleasure.
 for i in `ssh -Q cipher`  
 do  
     echo "scp -o Ciphers=$i ./debian-live-7.7.0-amd64-standard.iso root@pi1:/mnt/sda1"  
     time scp -o Ciphers=$i ./debian-live-7.7.0-amd64-standard.iso root@pi1:/mnt/sda1 2>&1  
     echo ""  
     echo ""  
 done  

If your particular system does not support the -Q option you can copy and paste the list of ciphers from 'man ssh_config' and modify the 'for' line of the script to read something like

 for i in aes128-ctr aes192-ctr aes256-ctr arcfour256 arcfour128 aes128-cbc 3des-cbc blowfish-cbc cast128-cbc aes192-cbc aes256-cbc arcfour  

First observation - not all ciphers that the ssh client supports are enabled by default on all ssh servers so you will get a bunch of errors.  I am not going to enable the additional ciphers, since this is more of a practical guide.
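To compare ciphers, the wall-clock time that 'time' prints has to be turned into a throughput number.  Here is a minimal sketch of that arithmetic, assuming you read the file size in bytes and the 'real' seconds off by hand.  The helper name throughput_mbps is made up for illustration and is not part of the script above; it reports megabits per second (10^6 bits).

```shell
# Hypothetical helper: convert a transfer size and duration to Mbit/s.
# $1 = bytes transferred, $2 = elapsed wall-clock seconds (the "real" time)
throughput_mbps() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", bytes * 8 / secs / 1000000 }'
}

# Example: a 500 MiB file (524288000 bytes) that took 60 seconds to copy
throughput_mbps 524288000 60
```

Dividing by 8 again gives megabytes per second, if that is the unit you prefer to chart.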

Raspberry SCP results

Here are the throughput results copying the 500 MB file from the PC to Pi and vice versa.  
SCP Throughput with various algorithms using the SD card as storage on the Pi
I would think that the difference here comes down to one of two things: pipelining, or the fact that the computation is harder on the encryption side than on the decryption side.  I am honestly not too sure and would love some input on this nearly 30% difference.

Using the USB attached 7200RPM HDD on the Pi as a storage device

SCP Throughput with various algorithms using USB 7200RPM HDD as storage

I really expected the HDD to be faster than the SD card, but it was not.  It was consistently a little bit slower.  Nothing very significant, but certainly a fact.  Looking for an explanation, I looked at the CPU utilization on the system.  It should be noted that the Pi is pretty much pegged at 100% during this operation.  This is split between System and User.  System takes care of the Disk and Network IO work and User is what the encryption algorithms use up.  Writing to the USB mounted HDD takes some extra CPU processing, and the media IO is not a bottleneck at the speeds we are achieving.  So CPU is still our limiting factor.
CPU System usage SD vs HDD
Notice that there is a cool correlation - the higher the throughput of the algorithm, the higher the System CPU utilization, since it has to manage more IOPS both to the network and to disk.  What is a bit surprising is how much CPU storage and network seem to require.  This would certainly be a place for future Pis to look for improvement.  The numbers seem to imply that even with no encryption whatsoever, the Raspberry Pi would be limited to about 60 - 65 Mbs.

So what about User CPU per algorithm?

I did get some really interesting results here, but not from the Pi but from the PC.  The Pi looked pretty much as expected with less CPU resources being used on algorithms that achieved higher throughput.  
CPU Used by Cipher on the Raspberry Pi

This is not exactly the case on the PC.  Here our best performing algorithm is using way more CPU than its brethren.

The data makes almost no sense.  The worst performing algorithms in terms of throughput seem to use the least CPU.  So what gives?  How is it possible for chacha to be using the most CPU yet have the highest throughput?

Acceleration, my dear Watson, acceleration.  The Intel CPU on my PC supports the AES-NI instruction set, which accelerates AES based ciphers, especially the GCM family.  According to Wikipedia, the acceleration provides an 'increase in throughput from approximately 28.0 cycles per byte to 3.5 cycles per byte'.

This acceleration is not available on the Raspberry Pi CPU, which is actually the bottleneck in this transfer.  Since the Intel is not hitting anywhere near a hundred percent, the fact that chacha is using 7.77% vs aes128-gcm at 1.55% makes no difference to our session performance.
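If you want to check whether a Linux box has AES-NI before picking a cipher, the CPU flags in /proc/cpuinfo will tell you.  This is a rough sketch, assuming an x86 Linux host where the flags line lists 'aes'; the function name has_aesni is my own invention.

```shell
# Return 0 if the cpuinfo file lists the "aes" CPU flag, 1 otherwise.
# $1 = path to a cpuinfo-style file (defaults to /proc/cpuinfo on Linux)
has_aesni() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Look only at the first x86 "flags" line and match "aes" as a whole word
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw 'aes'
}

if has_aesni; then
    echo "AES-NI flag found: AES ciphers should be cheap on this CPU"
else
    echo "no AES-NI flag found: AES runs in software here"
fi
```

On the Pi this reports no flag, which matches what the charts show: every cipher there is paid for in plain CPU cycles.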


The clear winner here when considering single session performance between a Pi and a non-heavily utilized modern x86 is chacha.  If, however, the PC is doing a lot of other work, supporting many sessions and generally has its CPU peaking frequently, something like aes128-ctr may be a better choice, since it gives us a good balance.

PC and SCP - results (beta)

So now that we have looked at the strengths and limitations of the Raspberry Pi, I figured we should take a step further and check out what modern PCs can do.  The setup in this case is

Macbook Pro <-- 5 GHz WiFi --> Router <-- Gig Switch --> Linux PC

You may question the wisdom of using WiFi.  I would.

Well, it turns out that I only have a 100Mbs USB2 adapter to use as a wired connection on the Mac.  And as you will shortly see, the WiFi actually gives us pretty high throughput.  Not quite a gig, but half way there.  Good enough for significant results.  I will repeat the test again when I go back to work after the holidays and publish a follow up.  Maybe even try to push a 10 Gig pipe.

So I did a quick wireless survey, found a clean 5 GHz channel, and put my Mac Book about a foot away from the router.  Here is what we saw.

PC -> Mac

The thing I find most surprising about this chart is that two of the algorithms I thought would perform best simply did not.  There was something in the way that Debian tried to initialize the connection that OSX found distasteful with the AES-XXX-GCM algorithms.  OSX also did not support the chacha algorithm.

The results are very interesting, but let's look a little deeper to find out more.

Let's examine the CPU utilization by the cipher on the client machine.  This will tell us how scalable the algorithm is in supporting multiple connections.  This chart shows relative CPU percentage utilization during the time of the copy scaled by the duration.  The empty space between the top of the column and 100% can roughly be interpreted as idle time.

Note: this should be interpreted as the utilization of a single core
In this test setup I could only measure the CPU consumed on the client and not the server.  When I repeat this test I will make sure my setup can measure both sides.
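The 'scaled by the duration' part is just a division of the numbers 'time' reports.  Here is a small sketch of how I read it, assuming the real, user and sys seconds for one copy; cpu_share is a hypothetical helper and the idle figure is only the rough single-core interpretation described above.

```shell
# Hypothetical helper: express user and system CPU time as a share of
# wall-clock time, using the three figures printed by `time`.
# $1 = real seconds, $2 = user seconds, $3 = sys seconds
cpu_share() {
    awk -v real="$1" -v user="$2" -v sys="$3" 'BEGIN {
        u = user / real * 100
        s = sys / real * 100
        printf "user=%.1f%% sys=%.1f%% idle=%.1f%%\n", u, s, 100 - u - s
    }'
}

# Example: a copy that took 40s of wall time, 10s user and 6s sys
cpu_share 40 10 6
```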

I tried to think of better ways to analyze and present the data, but in the end I felt I can't tell the story without a raw representation.  It tells the story better than any massaging could.  This shows the amount of clock time (real), user time (cipher) and system time each algorithm used.

Mac -> PC

For completeness sake - I tried to run the test from Mac -> PC.  There were only 3 supported ciphers - essentially the AES XXX CBC family.  They had exactly the same results, but slightly better performance.  Yawn.

Here are the graphs


Relative CPU

Time consumed by each function in seconds


When dealing with fully featured modern machines, I would probably choose AES-256-CBC.  It seems to have the best throughput/cpu utilization ratio and provides pretty decent security.

Sunday, December 28, 2014

Ceph Pi - Kernel Cross-Compile and also SSHFS

 Intro and Data

Well, I was saying I won't be writing this for a while, but I guess I lied.  This is going to be a tutorial specifically about using Cross-Compiling to build the kernel on Raspberry Pi and not a general purpose Cross-Compiling tutorial.

Again - we are basing this writing on this (official) guide.

Log onto your fast machine.  In my case I use a Debian box which is a tiny Intel NUC i5.

So if you have any doubts as to why compiling on the Pi will take forever, and if you have taken any offense when I called the Pi feeble, here is a graph showing the CPU utilization on the Pi for the 12 hours it took to build a kernel.

Yup.  Pegged.  For 12 hours.  So while the NUC is about $300 barebones or about 10x more expensive, it is 50x faster for this application.  I guess I am digressing but I do urge you to think well about your application.  The Raspberry Pi is amazing for some things (price / power), but heavy compute it will not do.  Here is a graph of the NUC doing the same work.

 Build the build environment

First things first - let's make sure we have all the packages needed to actually compile and build packages.  Let's run
 apt-get update && apt-get upgrade  
 apt-get install build-essential libncurses5-dev 

Now go ahead and pull down the toolchain.
 git clone  

I am logged in as root on this box, which does break best practices.  Keep this in mind and adjust accordingly in the following steps.  I want to keep things organized so I create a directory where to put all files that have to do with this project.
 mkdir raspberry-kernel  
 mv tools/ raspberry-kernel/  

The next step is to adjust our path to make it easier to work with.  Go ahead and edit your .bashrc
 vi .bashrc  

Add the following line to the bottom of the file

Next thing we need to pull the kernel sources
 cd raspberry-kernel/  
 git clone --depth=1
 cd linux 

We need to update the kernel config with the settings for the Raspberry Pi.  The space in the middle of the command is intentional.
 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig  

I had troubles with the next step.

 make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  

I kept getting an error message that looked like this:

 root@charlie:~/raspberry-kernel/linux# make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig  
 # configuration written to .config  
 root@charlie:~/raspberry-kernel/linux# make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  
 make: arm-linux-gnueabihf-gcc: Command not found  
 scripts/kconfig/conf --silentoldconfig Kconfig  
 make: arm-linux-gnueabihf-gcc: Command not found  
  CHK   include/config/kernel.release  
  CHK   include/generated/uapi/linux/version.h  
  CHK   include/generated/utsrelease.h  
  HOSTCC scripts/pnmtologo  
  CC   scripts/mod/empty.o  
 /bin/sh: 1: arm-linux-gnueabihf-gcc: not found  
 scripts/ recipe for target 'scripts/mod/empty.o' failed  
 make[2]: *** [scripts/mod/empty.o] Error 127  
 scripts/ recipe for target 'scripts/mod' failed  
 make[1]: *** [scripts/mod] Error 2  
 make[1]: *** Waiting for unfinished jobs....  
 make[1]: 'include/generated/mach-types.h' is up to date.  
 Makefile:518: recipe for target 'scripts' failed  
 make: *** [scripts] Error 2  
 make: *** Waiting for unfinished jobs....  
  CC   kernel/bounds.s  
 /bin/sh: 1: arm-linux-gnueabihf-gcc: not found  
 /root/raspberry-kernel/linux/./Kbuild:35: recipe for target 'kernel/bounds.s' failed  
 make[1]: *** [kernel/bounds.s] Error 127  
 Makefile:840: recipe for target 'prepare0' failed  
 make: *** [prepare0] Error 2  

The file did exist, but every time I tried to run it even by hand I got the same error of File Not Found.  I ultimately found the answer here.  I think the reason it did not work on my machine but seemed to work for the writer of the original tutorial is that I am running Debian 64bit and they are running Ubuntu (I would guess 32 bit).  In any case.  All you have to do to get past the problem is to run

 sudo apt-get update  
 sudo apt-get upgrade  
 sudo apt-get install lsb-core  

This got me further, but I still conked-out.  This time the error was

 root@charlie:~/raspberry-kernel/linux# make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  
 arm-linux-gnueabihf-gcc: error while loading shared libraries: cannot open shared object file: No such file or directory  
  CHK   include/config/kernel.release  
  CHK   include/generated/uapi/linux/version.h  
  CC   scripts/mod/empty.o  
 arm-linux-gnueabihf-gcc: error while loading shared libraries: cannot open shared object file: No such file or directory  
 scripts/ recipe for target 'scripts/mod/empty.o' failed  
 make[2]: *** [scripts/mod/empty.o] Error 127  
 make[2]: *** Waiting for unfinished jobs....  
  CC   scripts/mod/devicetable-offsets.s  
 arm-linux-gnueabihf-gcc: error while loading shared libraries: cannot open shared object file: No such file or directory  
 scripts/ recipe for target 'scripts/mod/devicetable-offsets.s' failed  
 make[2]: *** [scripts/mod/devicetable-offsets.s] Error 127  
 scripts/ recipe for target 'scripts/mod' failed  
 make[1]: *** [scripts/mod] Error 2  
 Makefile:518: recipe for target 'scripts' failed  
 make: *** [scripts] Error 2  

The solution this time ( I found it on an Android forum that I closed before I got the link ) seemed to be

 sudo apt-get install lib32stdc++6 lib32z1 lib32z1-dev  
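Both errors above boil down to the same thing: the compiler binary is there but the host cannot run it.  A quick sanity check before kicking off a build could look like the sketch below.  It assumes the toolchain's bin directory is already on your PATH; toolchain_ok is a made-up name.

```shell
# Return 0 only if the cross-compiler can actually be executed.
# Checking that the file exists is not enough: a 32-bit binary on a 64-bit
# host without the 32-bit libraries exists but still fails to run.
# $1 = toolchain prefix, e.g. arm-linux-gnueabihf-
toolchain_ok() {
    "${1}gcc" --version >/dev/null 2>&1
}

if toolchain_ok arm-linux-gnueabihf-; then
    echo "cross-compiler runs, safe to start the build"
else
    echo "cross-compiler missing or not runnable" >&2
fi
```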

Build that funky kernel

Now we can actually build the kernel.  So let's run the command again.  The '-j 6' option tells make to run up to 6 jobs in parallel, which spreads the build across more CPUs.  I have a 4 core that is hyper-threaded.  6 is a good number since it leaves some resources for the OS and such.

 make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  

Damn!  That was fast!
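I picked the 6 by hand, but the same reasoning can be scripted.  This is only a sketch: build_jobs is a hypothetical helper, the default of holding back 2 CPUs is my own rule of thumb, and nproc (from GNU coreutils) is assumed to be installed on the build host.

```shell
# Hypothetical helper: pick a -j value that leaves some CPUs free.
# $1 = number of logical CPUs, $2 = CPUs to hold back (default 2)
build_jobs() {
    local cpus="$1" reserve="${2:-2}"
    local jobs=$(( cpus - reserve ))
    # Never go below a single job, even on a tiny machine
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

# Print the make command this host would use
echo "make -j $(build_jobs "$(nproc)") ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-"
```

On my hyper-threaded 4 core (8 logical CPUs) this works out to the same '-j 6'.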

One thing that the guide missed and we skipped is that you likely want to do some configuration besides taking the defaults.  In our case we wanted the RBD block device driver built as a module.  To do so we must run

 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig menuconfig

So that was easy.  For completeness' sake, go to Device Drivers->Block Devices->Rados and make sure it is marked with an <M>.  Then again run

 make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  
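If you are not sure the menuconfig change stuck, you can look for the RBD symbol (CONFIG_BLK_DEV_RBD) in the generated .config before spending time on a build.  A small sketch, run from the top of the kernel source tree; rbd_config_state is a hypothetical helper.

```shell
# Report how RBD is configured in a kernel .config file:
# prints "m" (module), "y" (built in) or "off" (not set or file missing).
# $1 = path to the .config file (defaults to ./.config)
rbd_config_state() {
    local config="${1:-.config}" line
    line=$(grep -m1 '^CONFIG_BLK_DEV_RBD=' "$config" 2>/dev/null) || true
    if [ -n "$line" ]; then
        echo "${line#*=}"   # keep only the value after the '='
    else
        echo "off"
    fi
}

rbd_config_state .config
```

The anchored pattern matters here, since the similarly named DRBD option (CONFIG_BLK_DEV_DRBD) lives right next to it.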

Install and SSHFS...  in reverse order

This next part is the thing I really don't like.  The way the guide would have you go is to remove the SD card from your Pi, mount it on your linux box, and then do the install on it.  And then move it back.  I am not impressed.  If that is your thing, go follow the guide.  I can't help but think there are way better ways to do this.  Here is a brief enumeration of the things I came up with; they are probably stupid and someone who has done more Cross-Compile work will laugh.

Edit - I thought of using TAR originally, but when I did the compile on the Pi (disclaimer: using the Pi config, not the cross compile) it seems the 'make modules' section requires a ton of compile time as well ( I don't know why...  maybe the default config enables every damn module? )  It took 6 hrs this time.  The guide suggests that there is path info to be used even while only doing make modules before the make modules_install step.
  1. NFS mount
  2. SSHFS mount (my current choice)
  3. TAR the directory and SCP it (my original choice)
  4. Remove the SD card and plug it into a PC...
We have to write files to system directories.  So we need root permissions.  We can't get away with sudo on this one, because there is no way to execute 'sudo' over sftp or sshfs.  This is something we will enable only temporarily, so we should be fine.

Make sure you are on the Pi.  First we need to give root a password.  Root does not have a password by default, to disallow anyone using it.

 sudo passwd root  

Now we have to make sshd allow root logins, which in current versions it does not do by default.

  sudo vi /etc/ssh/sshd_config  

Find the line that reads

PermitRootLogin without-password  

Comment it out and add another line, which sets the permission to yes.  This allows us to actually log in with a password.  This is what the file will look like:

 #PermitRootLogin without-password 
 PermitRootLogin yes  

Now restart sshd

 sudo /etc/init.d/ssh restart  
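If you would rather not open an editor, the same switch can be scripted.  This is a sketch with a different technique than the edit above: instead of commenting out the old line it rewrites the first PermitRootLogin line (commented or not) in place and drops any later ones.  The function name set_root_login is made up, and the demo works on a scratch file rather than the real config.

```shell
# Set PermitRootLogin to the given value in an sshd_config-style file.
# $1 = path to the file, $2 = value (e.g. yes, without-password)
set_root_login() {
    local file="$1" value="$2" tmp
    tmp=$(mktemp) || return 1
    awk -v v="$value" '
        /^#?[ \t]*PermitRootLogin[ \t]/ {
            # Rewrite the first matching line, drop any duplicates
            if (!done) { print "PermitRootLogin " v; done = 1 }
            next
        }
        { print }
        END { if (!done) print "PermitRootLogin " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch file
demo=$(mktemp)
printf 'Port 22\nPermitRootLogin without-password\n' > "$demo"
set_root_login "$demo" yes
cat "$demo"
rm -f "$demo"
```

On the Pi you would point it at the real sshd config as root and still restart sshd afterwards.  Note that if the file has no PermitRootLogin line at all, the sketch appends one at the end, which may not do what you want if your config ends with Match blocks.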

Ohkay.  Now log on to the Cross-compile host as root.  You may have to do similar gymnastics as to what we just did on the Pi.  Run the following commands.  When prompted for a pass phrase, just press enter.  Of course substitute pi1 with the name or IP of your Raspberry.

 ssh-keygen  
 ssh-copy-id root@pi1  

Test everything by running

 ssh root@pi1  

You should be successfully logged into your Raspberry.  Use this opportunity to edit your sshd config

  vi /etc/ssh/sshd_config  

and switch the comments on the two PermitRootLogin lines, so they look like this

 PermitRootLogin without-password 
 #PermitRootLogin yes  

So anyone with root on your cross compile box can log into your Pi with the key.  But no one can SSH in as root with a password.  Remember to restart sshd again so the change takes effect.  We should also remove the password from root.

  passwd -d root  

We are pretty much back to best security practices on our Pi.  Now it's time to mount up!  Let's make a directory we can use as a mount point and mount up.

 cd /root/raspberry-kernel
 mkdir remote-host
 sshfs -o idmap=user  root@pi1:/ /root/raspberry-kernel/remote-host -o Cipher=blowfish

(Edit: since this article was originally published, I also wrote this analysis of the best crypto cipher to use with the Raspberry Pi)
The idmap=user makes it so the user account we are logging in as maps to the remote host's equivalent by name instead of by numeric id.  The  Cipher=blowfish  ensures that we use a pretty fast cipher.  The Pi is often constrained by CPU when it comes to encrypted communications (someone should look into using that GPU to accelerate them).  SCP is usually pretty slow with the default ciphers, so it's a good idea to choose your encryption wisely.  Here is a good article that includes the image I post here.  You can't have a non-encrypted tunnel apparently.  I tried a couple of other ciphers which should have worked but did not.  I will look more into this later... it's an interesting follow up topic.

So while researching this I stumbled on an amazing article describing how to avoid encryption in sshfs entirely.  I have searched for ways to use the ease of sshfs without having to deal with the encryption overhead.  And here it was in its glory.  I have not tested this, but I will soon and we will get some sweet test results.

... and back to the guide

So now that we have a mount on the Pi from our cross compile host with root permissions, it's kind of like having the SD card plugged into the cross-compile host.  So we can take the next steps from the guide with a slight adjustment for the directory names.

 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=/root/raspberry-kernel/remote-host modules  
 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=/root/raspberry-kernel/remote-host modules_install  

Now back up the old kernel image and copy the new one

 cp /root/raspberry-kernel/remote-host/boot/kernel.img /root/raspberry-kernel/remote-host/boot/kernel.img.orig  
 cp arch/arm/boot/Image /root/raspberry-kernel/remote-host/boot/kernel.img  


 umount /root/raspberry-kernel/remote-host   

OK...  Well...  reboot? 

... and we are back.

We can check if everything is copacetic by running

 uname -a   

and making sure that the right kernel version shows up.

In our case we can also verify that the new driver we wanted is properly installed.

 modprobe rbd  

If there is no error, then we are doing a-OK!

That's all folks.

Saturday, December 27, 2014

Ceph Pi - Mount Up - Mounting Your Ceph File System for the first time. On Raspberry Pi. 'Cause we can.

So we have an active and healthy Ceph cluster.  Great.  But we can't do anything useful with it yet.  As we discussed earlier, the point of this exercise is to create a block device we can use as the basis of a home storage solution.  So now we need to expose our cluster as a block device...

In Ceph land, this requires the installation and setup of ceph-client on a machine connected to the network.  In a production environment, this would be the system mounting the storage.  It would certainly not be one of the machines in the cluster.  Well, we are not exactly in a production environment.... yet.

In our case we will first use pi1 as the system that is going to host ceph-client.  Since ceph is already installed, we will skip the first couple of steps of the guide.

The following steps all must be run on the box that will be ceph-client.  In our case pi1

We will need lsb_release, so let's install it
 sudo apt-get install lsb-release  

Then we go to create a block device image
 rbd create ceph-block --size 4096  

We map the image to the block device

 sudo rbd map ceph-block --pool rbd --name client.admin  

...and we fail

 modinfo: ERROR: Module rbd not found.  
 modprobe: FATAL: Module rbd not found.  
 rbd: modprobe rbd failed! (256)  

Looks like daddy needs some more modules...  This is going to take some doing.

Building a custom kernel

(Later Edit: A day after this post I posted 'Ceph Pi - Kernel Cross-Compile and also SSHFS'.  It explains exactly how to do cross compiling for the kernel and saves a lot of time.  While the directions below are valid, I would encourage you to follow the guide above to save a ton of time)

(NOTE: This is a good time to examine how to do Cross-Compilation to help the fairly feeble Raspberry Pi compile the kernel in a reasonable amount of time with the aid of a beefier machine on the network.  On its lonesome it takes 18+ hours.  I am not going to be exploring Cross-Compilation in this post since I already did it the hard way.  A follow up post may come though.  And then there is distcc)

We are going to be loosely following this guide.  But since that guide does not talk about how to add new modules, we will deviate a bit.  Before we start, we need to install a couple of packages that will be needed down the road

 sudo apt-get install libncurses5-dev  

First thing we have to do is pull down the sources.  If you are familiar with Debian you may think that something like sudo apt-get install linux-headers would work.  Not on Raspbian.  Here it is a placeholder package or something because (I gather) of the proprietary bunch of code that Raspberrys require.  In any case, here is what we have to do.

In your home directory run

 git clone --depth=1  

After you are done pulling sources there will be a directory called 'linux' in your home directory.  Next we have to make sure we have all dependencies

 sudo apt-get install bc  

Time to configure your kernel.  Luckily Raspberrys come with this really cool script that sets up all the defaults for us, so we don't have to spend hours hunting and pecking for drivers.  All the basics are taken care of and we just have to build the new things.

 cd linux  
 make bcmrpi_defconfig  

Here we diverge from the guide

 make menuconfig  

In the menu that appears, scroll and select

 Device Drivers -> Block Devices -> Rados Block Device (RBD)   

Make sure you are not selecting DRBD instead.  Once on top of RBD, press space until M appears in the 'check box' in front of it.  Then feel free to exit the config and make sure you save your changes.

It's probably a good idea to run this next command in a screen session.  It takes a long time.  Not 'let's make a tea' long time.  An 'overnight' long time.  Unless you have a cross-compile or distcc set up.  So what I heartily advise is to install screen and run make in a screen session.  That way if your connection drops for some reason you don't have to start from scratch.  If you are running Debian wheezy / testing (aka jessie) by following my previous guide (and having already installed ncurses), you should be able to do:

 sudo apt-get install screen  



...and about 12hrs later we are on the other side. One more 6 hrs biggie!

 make modules
...and breathe!  The other commands are somewhat shorter

 sudo make modules_install  
 sudo cp /boot/kernel.img /boot/kernel.img.bkup   
 sudo cp arch/arm/boot/Image /boot/kernel.img  

Reboot, yo!

Try to ssh back in!
Are you alive?  I hope so!  Time to step it up!

The rest of this guide is too easy (famous last words).

Now that we have a kernel built...  Let's mount up!

We can now...  a day or so later...  re-run the command that failed earlier.  Let's get that block device mapped
 sudo rbd map ceph-block --pool rbd --name client.admin    

And now let's put a file system on it
 sudo mkfs.xfs /dev/rbd/rbd/ceph-block  

And now let's mount it!
 sudo mkdir /mnt/ceph-block-device  
 sudo mount /dev/rbd/rbd/ceph-block /mnt/ceph-block-device  
 cd /mnt/ceph-block-device  

...and we are done!  We have a distributed storage array running on Raspberry Pis!  Could it be true???  Let's check!

 ceph@pi1 /mnt/ceph-block-device $ df -h  
 Filesystem   Size Used Avail Use% Mounted on  
 /dev/root    14G 3.1G 9.6G 25% /  
 devtmpfs    231M   0 231M  0% /dev  
 tmpfs      235M   0 235M  0% /dev/shm  
 tmpfs      235M 4.5M 231M  2% /run  
 tmpfs      5.0M 4.0K 5.0M  1% /run/lock  
 tmpfs      235M   0 235M  0% /sys/fs/cgroup  
 /dev/mmcblk0p5  60M  16M  44M 27% /boot  
 tmpfs      47M   0  47M  0% /run/user/1001  
 /dev/sda1    932G  33M 932G  1% /mnt/sda1  
 /dev/rbd1    4.0G  33M 4.0G  1% /mnt/ceph-block-device  
 ceph@pi1 /mnt/ceph-block-device $   

So how does this setup perform?  We will find out shortly!

Tuesday, December 23, 2014

Ceph Pi - Intermission and another problem solved.

Intermission...  my second son was born last week, and as these things go, my life was flipped upside down for a few days.  It's been a very happy time with everyone doing extremely well.  The holiday atmosphere (it is late December 2014) has made it so we've been able to relax into this huge change.  I've been spending time at home, but my mind has not been focused on cool projects.  Instead I've been brushing the cobwebs off my diaper-changing skills and building legos with my soon-to-be-six-year-old.

I left things in a bit of disarray last week.  I was messing around with pi1 and had left it with a half destroyed init system (failed upstart install).  Everything was rebuilt today.  I did not do much to pi2 and pi3 besides blasting away their OSDs.

As I was activating the OSDs after the rebuild I ran into some trouble that I wanted to write about.  To recap: I blew away the Monitor (pi1) and re-installed everything, but did not do much to pi2 and pi3 except destroying the OSDs.

When I typed

 ceph-deploy osd activate pi2:/mnt/sda1  

I would get the following output

 [pi2][WARNIN] DEBUG:ceph-disk:Cluster uuid is 15f7e5bd-b4aa-4b8e-ab8e-95d86eaf2f15  
 [pi2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid  
 [pi2][WARNIN] DEBUG:ceph-disk:Cluster name is ceph  
 [pi2][WARNIN] DEBUG:ceph-disk:OSD uuid is f3d2ad37-7529-4310-a41d-77780b15e0c4  
 [pi2][WARNIN] DEBUG:ceph-disk:Allocating OSD id...  
 [pi2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise f3d2ad37-7529-4310-a41d-77780b15e0c4  
 [pi2][WARNIN] 2014-12-23 03:23:35.798919 b48ff460 0 librados: client.bootstrap-osd authentication error (1) Operation not permitted  
 [pi2][WARNIN] Error connecting to cluster: PermissionError  
 [pi2][WARNIN] ceph-disk: Error: ceph osd create failed: Command '/usr/bin/ceph' returned non-zero exit status 1:  
 [pi2][ERROR ] RuntimeError: command returned non-zero exit status: 1  
 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v activate --mark-init sysvinit --mount /mnt/sda1  

Research online brought me here.

On pi1 I ran

 sudo scp /etc/ceph/ceph.client.admin.keyring ceph@pi2:  
 sudo scp /var/lib/ceph/bootstrap-osd/ceph.keyring ceph@pi2:  

Then on pi2 I ran

 sudo mv ceph.client.admin.keyring /etc/ceph/  
 sudo mv ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring   

Then there was great success and much rejoicing when I once again ran

 ceph-deploy osd activate pi2:/mnt/sda1  

Remember to do this for pi3 as well.
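With more than a couple of OSD hosts the copy gets repetitive, so here is a sketch that prints the commands for each host rather than running them (a dry run you can eyeball and then paste on pi1).  The function name keyring_copy_cmds is hypothetical; the paths and the 'ceph' user are the ones used above.

```shell
# Print (not run) the keyring copy commands for each OSD host.
# Arguments: one or more host names
keyring_copy_cmds() {
    local host
    for host in "$@"; do
        echo "sudo scp /etc/ceph/ceph.client.admin.keyring ceph@${host}:"
        echo "sudo scp /var/lib/ceph/bootstrap-osd/ceph.keyring ceph@${host}:"
    done
}

keyring_copy_cmds pi2 pi3
```

The two 'sudo mv' commands still have to be run on each host afterwards to put the keyrings in place.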

This is not a mandatory step.  But apparently something that can happen, and when it does, it...  took me 30 min to figure out.

Sunday, December 14, 2014

Ceph Pi - Installing Ceph on Raspberry Pi

Note: I will talk about Raspbian here.  It is a Debian port for the Raspberry.  I am sure most of the instructions here will work fine on Debian as well or Ubuntu.  The guide was built for Wheezy and Ceph 0.80.

Here is why installing Ceph on Raspbian is tricky and there isn't much shared on the internet.
  • Raspbian-wheezy has an incomplete (and very old - 0.43 vs modern - 0.80 ) set of packages, so you have to switch to testing to get everything you need
  • Ceph uses a program called ceph-deploy.  It tries to automatically install some of the packages you need.  As it runs it does an action it calls 'adjust repositories' which adds Ceph's repository to your list of sources.
  • Ceph's repository does not have all the packages compiled for the ARM chip of the Raspberry
  • You have to edit their source, and take out that step, so that wheezy-testing's packages are used
  • Finally (optionally) if you are planning to run ceph-client on one of the Raspberry Pis, you have to compile a kernel module for RBD.
  • We will also review the actual set up of the Ceph cluster and all the commands to get things going.
It's a daunting list, but I will walk you through it quickly and even though it will take a ridiculous length of time (unless you set up distcc - watch for a follow up article on that) it should be pretty smooth.

I will list the two most common errors encountered to hopefully let people find this article.

 [WARNIN] too many arguments: [--cluster,ceph]  

This one is caused by the ancient packages for Ceph in Raspbian-wheezy

 [WARNIN] E: Unable to locate package ceph-mds  
 [WARNIN] E: Unable to locate package ceph-fs-common  

These two are caused by ceph-deploy trying to adjust repositories to the Ceph project repository, which does not contain the packages for the ARM architecture that the Raspberry needs.

Onwards and upwards.

First let's go ahead and edit /etc/apt/sources.list

 sudo vi /etc/apt/sources.list  

Go ahead and replace the word 'wheezy' with 'testing' from this:
 deb wheezy main contrib non-free rpi  

To this:
 deb testing main contrib non-free rpi  
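If you have several Pis to convert, the same edit can be done with sed instead of vi.  A minimal sketch: switch_to_testing is a made-up helper, and the URL in the demo is a placeholder I invented, not the real Raspbian mirror.

```shell
# Switch the "wheezy" suite name in an apt sources file to "testing",
# keeping a .bak copy of the original next to it.
# $1 = path to the sources file
switch_to_testing() {
    # Only touch "wheezy" when it stands alone between whitespace,
    # so a URL that happens to contain the word is left untouched
    sed -i.bak 's/\([[:space:]]\)wheezy\([[:space:]]\)/\1testing\2/' "$1"
}

# Demo on a scratch file (placeholder URL, not the real mirror)
demo=$(mktemp)
echo 'deb http://example.invalid/raspbian/ wheezy main contrib non-free rpi' > "$demo"
switch_to_testing "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

On the Pi itself you would run the sed line with sudo against /etc/apt/sources.list.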

Then you have to update the whole system. Joy.

 sudo apt-get update  

 sudo apt-get upgrade  
 sudo apt-get dist-upgrade  

This will take a while but you will be prompted to say Yes a bunch of times as important config files get changed.  Since we have not really moved from the defaults, we are good to say Yes pretty much to everything.  Watch out for the increased security settings in ssh you 'root' lovers because it may just leave you locked out...  Anyway.  Watch a show.  Brooklyn Nine-Nine is pretty good.

Done?  Good.

A couple more packages you will need are NTP and SSH, if you somehow missed them

 sudo apt-get install openssh-server ntp  

Installing Ceph

Now we will actually start installing Ceph.  We are going to be loosely following the guide for the most part, but we are not out of the woods yet, so follow the steps here

  1. Create a ceph user on each node.  For simplicity's sake I will call him 'ceph'.

     sudo useradd -d /home/ceph -m ceph  
     sudo passwd ceph  

    While you are at it, make it a 'sudoer'
     echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph  
     sudo chmod 0440 /etc/sudoers.d/ceph  

  2. Setup SSH keys.  ceph-deploy uses SSH to log into systems and does not prompt for passwords.  So we need to setup password-less SSH

    First we generate the keys.  Make sure you are logged in as the 'ceph' user.  Do not use root or sudo.  Leave the passphrase empty.  Technically you are required to do this only on the admin node, and it is probably a good idea to do it only on the admin node for a production environment (for security purposes).  I did it on all my nodes since it makes extended setup a lot easier
     ssh-keygen  
     ssh-copy-id ceph@pi1  
     ssh-copy-id ceph@pi2  
     ssh-copy-id ceph@pi3  

  3. (Optional) Modify the ~/.ssh/config file so that ceph-deploy does not have to use the '--username ceph' option each time you run a command.   Put the following text inside the file.

     Host pi1  
       Hostname pi1  
       User ceph  
     Host pi2  
       Hostname pi2  
       User ceph  
     Host pi3  
       Hostname pi3  
       User ceph  
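
With more than a handful of nodes, typing these stanzas gets tedious. The sketch below prints them for any list of hosts. The helper name gen_ssh_config is hypothetical, and it assumes, as in this build, that every node uses the same user and that the host alias equals the hostname.

```shell
# gen_ssh_config USER HOST...
# Print one "Host" stanza per HOST, each logging in as USER.
# (gen_ssh_config is a hypothetical helper name.)
gen_ssh_config() {
    user=$1
    shift
    for host in "$@"; do
        printf 'Host %s\n  Hostname %s\n  User %s\n' "$host" "$host" "$user"
    done
}

# Usage: append the stanzas for the three Pis to the config file
#   gen_ssh_config ceph pi1 pi2 pi3 >> ~/.ssh/config
```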

  4. Add the ceph repository to apt sources.  First we add the release key, then we add the Ceph packages to our repository, and then we update the repository and install ceph-deploy.  Note: replace 'firefly' in the second command with the name of the current stable release of ceph.

     wget -q -O- ';a=blob_plain;f=keys/release.asc' | sudo apt-key add -  
     echo deb $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list  

  5. Important Step: We deviate from the script here a bit.  Ceph repo does not really have support for wheezy-testing (jessie).  We have to edit the ceph repo file and adjust it.

     sudo vi /etc/apt/sources.list.d/ceph.list  

    Change the word 'testing' to 'wheezy' so that it looks like this

     deb wheezy main  

  6. Now it is time to finally install ceph-deploy!  I bet you were losing hope!

     sudo apt-get update   
     sudo apt-get install ceph-deploy  

  7. We can actually create the cluster.  We have only one Mon node (pi1) at this time, but technically we can list multiple ones in the command.  Make sure you are in the home directory of the user 'ceph' we created earlier.  Notice there is also no 'sudo'

     ceph-deploy new pi1  

  8. By default ceph requires that there are at least 3 replicas of the data in the cluster.  That means at least 3 OSDs.  Since our initial build has only 2, we will have to change it.
     vi ./ceph.conf  
    Add this line under the [global] section
     osd_pool_default_size = 2  
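
If you rebuild the cluster often, this edit can be made repeatable. The following is a rough sketch, not part of ceph-deploy: set_ceph_option is a made-up helper that rewrites the option if it is already present and appends it otherwise. Appending is only correct while [global] is the sole section in the file, which I am assuming is the case for a freshly generated ceph.conf.

```shell
# set_ceph_option FILE KEY VALUE
# Set "KEY = VALUE" in FILE: rewrite the line if KEY is already there,
# otherwise append it. Appending is only right while [global] is the only
# section in the file (assumed here for a freshly generated ceph.conf).
# (set_ceph_option is a hypothetical helper name.)
set_ceph_option() {
    if grep -q "^[[:space:]]*$2[[:space:]]*=" "$1"; then
        sed -i.bak -e "s/^[[:space:]]*$2[[:space:]]*=.*/$2 = $3/" "$1"
    else
        printf '%s = %s\n' "$2" "$3" >> "$1"
    fi
}

# Usage, from the directory where 'ceph-deploy new' was run:
#   set_ceph_option ./ceph.conf osd_pool_default_size 2
```

Running it twice leaves a single line in the file, so it is safe to keep in a setup script.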

  9. This is the trickiest step of the install.  According to the guide we should be able to just run

     ceph-deploy install pi1 pi2 pi3  

    That, however, gives us an error

     [pi1][WARNIN] W: Failed to fetch 404 Not Found [IP: 2607:f298:4:147::b05:fe2a 80]  
     [pi1][WARNIN] E: Some index files failed to download. They have been ignored, or old ones used instead.  
     [pi1][ERROR ] RuntimeError: command returned non-zero exit status: 100  
     [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: apt-get -q update  

    The reason for this is that ceph-deploy tries to adjust repos on the fly.  It tries to make the Ceph repository authoritative for all things Ceph, even when it does not have the packages.  In this case it is because of the architecture.  Notice - binary-armhf.  I am working on getting a patch submitted that will get around this by giving the user the option not to adjust the repos, but for the time being, we actually have to change the code.  No worries - it's one line!

     sudo vi /usr/share/pyshared/ceph_deploy/hosts/debian/  

    Near the very top of the file you will see some code that looks like this

     def install(distro, version_kind, version, adjust_repos):  
       codename = distro.codename  
       machine = distro.machine_type  

    You will need to add 1 line so it looks like this:
     def install(distro, version_kind, version, adjust_repos):  
       adjust_repos = False
       codename = distro.codename  
       machine = distro.machine_type  

    Also go ahead and edit the ceph source list
     sudo vi /etc/apt/sources.list.d/ceph.list  

    and comment out the line in it so it looks something like this
     #deb sid main  

    And now we are ready!!!!!
     ceph-deploy install pi1 pi2 pi3  
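
The ceph.list edit from this step has to be made on every node where the file exists, so it may be worth scripting. A minimal sketch, assuming the file contains ordinary one-line 'deb' entries; comment_out_repo is a made-up name, and a .bak copy of the file is kept.

```shell
# comment_out_repo FILE
# Prefix every active "deb" line in FILE with "#". Lines that are already
# commented out are left alone. A backup is kept as FILE.bak.
# (comment_out_repo is a hypothetical helper name.)
comment_out_repo() {
    sed -i.bak -e 's/^[[:space:]]*deb[[:space:]]/#&/' "$1"
}

# Try it on a copy first, then repeat on the real file as root:
#   cp /etc/apt/sources.list.d/ceph.list /tmp/ceph.list
#   comment_out_repo /tmp/ceph.list
```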

  10. Phew.  Ok.  The hard part is over.  Give yourself a high five.
  11. Now we install the initial Mon(itor) nodes and make sure we get all the keys (remember - ceph-deploy uses ssh password-less mode to issue commands to all nodes).  Execute this on pi1 in the /home/ceph directory.

     ceph-deploy mon create-initial  

  12. Now it's time to add the two OSDs.  As per the plan above - we will add one on pi2 and one on pi3.  If you are following my exact build, you have a SATA drive attached via USB to your Pis.  In this step we are going to format them and mount them.  If you are going to just create a couple of small OSDs on your SD cards, you can skip this section except for the very next command and proceed.

    First we make a directory to which we will mount them.  We should execute this on all three nodes
     sudo mkdir /mnt/sda1/  

    This is a good time to talk again a little bit about Minimum System Requirements.  And how the Pi really does not meet them.  It comes down to both RAM and CPU, but mostly RAM...  My understanding is that for each Gig of managed storage, a Ceph OSD node needs about a Meg of RAM.  The math is easy.  500Gigs of storage = 500Megs of RAM....  With my 1TB drives I am bound to swap.  I thought a bit about creating right-sized partitions, but the 'good news' is that the RAM is used on an 'as-you-go' basis.  The space is not pre-allocated.  (Here I am pretty sure someone will come out and tell me about the awesome pre-allocation option, or raw HD options...  and I will be really happy, because that would be awesome.  So don't be shy.)  Long story short - since I intend to test, I want to be able to stretch the system beyond its breaking point in every way I can and identify the bottlenecks.  This is not intended to be a production build (yet?).
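
To put numbers on that rule of thumb, here is a small sketch of the arithmetic. The 1 MB per GB ratio is my understanding as stated above, not a measured figure, and the 512 MB used in the examples is just a sample value for the RAM on the board; plug in your own. The function names are invented for this example.

```shell
# Rule of thumb from the text: about 1 MB of RAM per 1 GB of managed storage.
MB_PER_GB=1

# osd_ram_needed_mb STORAGE_GB -> prints the estimated RAM in MB
osd_ram_needed_mb() {
    echo $(( $1 * MB_PER_GB ))
}

# osd_ram_verdict STORAGE_GB RAM_MB
# Compare the estimate with the RAM available on the node.
# RAM_MB is whatever your board has; 512 below is only an example value.
osd_ram_verdict() {
    need=$(osd_ram_needed_mb "$1")
    if [ "$need" -gt "$2" ]; then
        echo "need ~${need} MB, have $2 MB: expect swapping"
    else
        echo "need ~${need} MB, have $2 MB: fits on paper"
    fi
}

osd_ram_verdict 500 512   # need ~500 MB, have 512 MB: fits on paper
osd_ram_verdict 932 512   # need ~932 MB, have 512 MB: expect swapping
```

The 932 in the second call is the usable size of a 1TB drive once it is formatted. The estimate ignores what the OS itself needs, so "fits on paper" is the optimistic reading.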

    Be that as it may...

    Now we partition and FS format the disks.  Again - we do this on every machine.  The instructions I will provide will format the whole disk as a single partition.  If you have a different setup, or you don't want to blow away the whole HD, please use caution.

     sudo fdisk /dev/sda  
     <press 'p' and press 'Enter' to see if there are any partitions on this disk>  
     <press 'd' and press 'Enter' followed by the number of the partition you want to delete and 'Enter'>  
     <repeat this for all partitions>  
     <press 'n' and press 'Enter' to create a new partition>  
     <press 'p' to choose Primary and press 'Enter'>  
     <press '1' to choose Partition 1 and press 'Enter'>  
     <press 'Enter' 2 more times to confirm the default starting and ending sectors>  
     <press 'p' and 'Enter' to ensure that what you thought happened actually happened>  
     <press 'w' and Enter to write the new partition table to the disk and exit>  

    I personally favor XFS, so I will set up my drives with it.  Reading Ceph documentation, people talk about BtrFS as being a good, though possibly immature alternative.  And then there is Ext4...  Well, some day we can do a bake off.  But first we must finish this...

    If you don't have xfsprogs installed, you have to run
     sudo apt-get install xfsprogs  

    Then we format the HD with
     sudo mkfs.xfs -f /dev/sda1  

    Time to mount the file system.   Since we want the mount to survive a reboot we have to edit /etc/fstab
     sudo vi /etc/fstab  

    Add a line for the new drive
     /dev/sda1    /mnt/sda1    xfs   defaults,noatime,nobarrier 0    1  

    Then we have to re-read the /etc/fstab
     sudo mount -a  

    To verify everything is looking good, run 'df -h' and make sure your output looks something like this
     ceph@pi1 ~ $ df -h  
     Filesystem   Size Used Avail Use% Mounted on  
     /dev/root    14G 4.5G 8.2G 36% /  
     devtmpfs    231M   0 231M  0% /dev  
     tmpfs      235M   0 235M  0% /dev/shm  
     tmpfs      235M  27M 209M 12% /run  
     tmpfs      5.0M 4.0K 5.0M  1% /run/lock  
     tmpfs      235M   0 235M  0% /sys/fs/cgroup  
     /dev/mmcblk0p5  60M  29M  31M 48% /boot  
     tmpfs      47M   0  47M  0% /run/user/1001  
     /dev/sda1    932G  33M 932G  1% /mnt/sda1  
     ceph@pi1 ~ $   
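
Eyeballing 'df -h' on three nodes is fine, but a scripted check is handy before handing the directory to Ceph. The sketch below is one way to do it, assuming a Linux system where /proc/mounts lists the active mounts. The function name is_mounted and its optional second argument are my own additions for illustration.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeed if MOUNTPOINT is listed in the mount table.
# MOUNTS_FILE defaults to /proc/mounts (Linux). The second argument exists
# only so the check can be tried against a sample file.
# (is_mounted is a hypothetical helper name.)
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage:
#   is_mounted /mnt/sda1 && echo "mounted" || echo "NOT mounted"
```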

  13. Back to creating the OSDs.  We use ceph-deploy to prepare the OSDs in the directory we just created on pi2 and pi3.  Execute this from /home/ceph on pi1
     ceph-deploy osd prepare pi2:/mnt/sda1  
     ceph-deploy osd prepare pi3:/mnt/sda1  

  14. Activate the OSDs
     ceph-deploy osd activate pi2:/mnt/sda1  
     ceph-deploy osd activate pi3:/mnt/sda1  
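
The prepare and activate commands follow the same pattern for every node, so they can be wrapped in a loop. This is only a sketch: deploy_osds and the DRY_RUN variable are names I made up, and the only ceph-deploy calls it issues are the 'osd prepare' and 'osd activate' commands shown above. With DRY_RUN=1 it prints the commands instead of running them.

```shell
# deploy_osds DIR NODE...
# Run "ceph-deploy osd prepare" and then "ceph-deploy osd activate" for
# NODE:DIR on each node. Set DRY_RUN=1 to print the commands instead.
# (deploy_osds and DRY_RUN are hypothetical names for this example.)
deploy_osds() {
    dir=$1
    shift
    for action in prepare activate; do
        for node in "$@"; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "ceph-deploy osd $action $node:$dir"
            else
                ceph-deploy osd "$action" "$node:$dir" || return 1
            fi
        done
    done
}

# Usage, from /home/ceph on pi1:
#   DRY_RUN=1 deploy_osds /mnt/sda1 pi2 pi3   # preview the commands
#   deploy_osds /mnt/sda1 pi2 pi3             # actually run them
```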

  16. Oh SNAP!  The cluster is live!  Ceph starts balancing the data to new OSDs.  Since these are the first 2 OSDs, and we don't have much data, nothing will be happening.  We will observe this later!!!  For now - the command
     sudo ceph -w  

    should give us cluster status. I am not sure why the 'sudo' is needed. Probably messed up something... Here is what I get
     ceph@pi1 ~ $ sudo ceph -w  
       cluster d8e62abb-488c-4cfe-abb4-65d2151cf42f  
        health HEALTH_WARN 1 pgs stuck unclean  
        monmap e1: 1 mons at {pi1=}, election epoch 2, quorum 0 pi1  
        osdmap e41: 4 osds: 3 up, 3 in  
        pgmap v422: 192 pgs, 3 pools, 0 bytes data, 0 objects  
           15463 MB used, 2778 GB / 2793 GB avail  
               1 active  
              191 active+clean  
     2014-12-15 03:19:54.879892 mon.0 [INF] pgmap v422: 192 pgs: 1 active, 191 active+clean; 0 bytes data, 15463 MB used, 2778 GB / 2793 GB avail  
     2014-12-15 03:22:07.859035 mon.0 [INF] pgmap v423: 192 pgs: 1 active, 191 active+clean; 0 bytes data, 15462 MB used, 2778 GB / 2793 GB avail  

  17. To make administration easier and avoid having to specify the monitor address every time, run
     ceph-deploy admin pi1 pi2 pi3  

  18. Make sure you have the right permissions to the keyring
     sudo chmod +r /etc/ceph/ceph.client.admin.keyring  

  19. Finally let's do one last check to make sure everything is good with the world
     ceph health  
  20. CONGRATS on getting here...  if you had trouble - see me in the comments.  
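
If the cluster is still settling when you run the health check, you can poll instead of retyping the command. A minimal sketch, assuming 'ceph health' prints a status line that starts with HEALTH_OK once everything is clean. The function names and the default of 30 tries, 10 seconds apart, are arbitrary choices of mine.

```shell
# health_is_ok STATUS_LINE
# Succeed only when the status reported by "ceph health" is HEALTH_OK.
health_is_ok() {
    case "$1" in
        HEALTH_OK*) return 0 ;;
        *)          return 1 ;;
    esac
}

# wait_for_health_ok [TRIES] [DELAY_SECONDS]
# Poll "ceph health" until it reports HEALTH_OK or TRIES runs out.
# The defaults (30 tries, 10 s apart) are arbitrary example values.
wait_for_health_ok() {
    tries=${1:-30}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        status=$(ceph health 2>/dev/null)
        if health_is_ok "$status"; then
            echo "$status"
            return 0
        fi
        echo "still waiting: ${status:-no answer from the monitor}"
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_health_ok || echo "cluster did not become healthy in time"
```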

    The next section is going to be about actually getting the new cluster mounted as a block device on Pi1.  Then we will create an OSD on Pi1 and add it to the cluster.  Then... well, I don't want to give you too many spoilers.  Starting with the next installment, I hope, there will be a ton of testing data as well.  Bandwidth, CPU usage, memory...  All in painful high-frequency polled detail (thanks to the efforts of Brandon Hale)!  

    Thanks for sticking through this!