Tuesday, January 27, 2015

Every "Wish" Has A Price

Note: I saw this on Diana Deleva's blog, and I translated it from Bulgarian.  She in turn had translated it from Russian.  I do not know who the original author was yet.  I will research it further, but if someone finds out in the meantime, let me know!

Every Wish Has a Price

In the far reaches of the Universe, there was a small shop.

It didn't have a sign hanging outside - the sign was blown away by a  hurricane, and the new proprietor decided not to hang a new one because all the locals knew what the store sold - wishes.

There was a huge selection.

Here you could buy practically anything: huge yachts, apartments, marriage, the post of vice president of a corporation, money, kids, a dream job, a beautiful body, victory in a contest, powerful cars, power, success, and much much more.

The only things not on sale were life and death.  Those were the purview of Headquarters, which was located in another galaxy.

Everyone who walked into the store (and by the way, there were many wishful people who never once entered the store, preferring to stay home and just wish) wanted to know the price of their wish.

Prices were different.

For example, a dream job cost giving up on stability and predictability, being ready to plan and structure your life all by yourself, believing in yourself, and allowing yourself to work where you would like and not where you have to.

Power cost a little more.  You had to give up some of your convictions, find a rational explanation for everything, be able to say no to others, know your own price (and that price needed to be high enough), and allow yourself to say 'I', putting yourself forward regardless of the approval or disapproval of those around you.

Some prices looked strange - marriage, for example, could be had almost for free.  A happy life, however, was really expensive: to be personally responsible for your own happiness, to be able to get joy out of life, to know your true desires, to give up on conforming to or imitating those around you, to value what you have, to allow yourself to be happy, to be conscious of your own value and importance, to give up the perks of being a 'victim', and to take the risk of parting with some friends and acquaintances.

Not everyone who came into the store was ready to buy their wish immediately.  Some, seeing the price, turned around and walked away.  Others stood there for a long time, thinking and calculating how much they had right now and where they could get the rest.

Some started complaining about the too-high prices, and begged for a discount or inquired about when there would be a sale.

And there were those who gave all their savings and got their most intimate wish wrapped in beautiful crinkly paper.

Other shoppers looked upon the happy buyers with envy, and gossiped that the proprietor was their friend and that they had gotten their wish just like that, with no effort at all.

People often suggested that the owner lower his prices so as to increase the number of buyers.  But he always refused, because then the quality of the wishes would suffer.

When asked whether he was worried about going broke, he would shake his head and answer that in every age there are a few brave souls who are ready to take a risk and change their lives; who will give up their ordinary, predictable existence; who are capable of believing in themselves; and who have the strength and the means to pay to get their wish.

And on the door of the store there was written, at least for the past hundred years:
'If your wish is not coming true, it means it hasn't been paid for yet'.

Tuesday, January 6, 2015

Ceph Pi - Adding an OSD, and more performance

So one of the most important features of a distributed storage array is that, being distributed, you should be able to expand it quickly in terms of size, and get better throughput and performance as you add more nodes.

We are going to add another Raspberry Pi + USB attached HDD ( Pi1) as an OSD.

This will make our cluster look like

  1. Monitor, Admin, Ceph-client - ( x64, gig port, hostname: charlie )
  2. OSD ( Raspberry Pi, 100Mbps eth, USB attached 1TB HDD, pi2 )
  3. OSD ( Raspberry Pi, 100Mbps eth, USB attached 1TB HDD, pi3 )
  4. New OSD ( Raspberry Pi, 100Mbps eth, USB attached 1TB HDD, pi1 )
Attaching another OSD is pretty easy! (note - I am using pi1, which was already initialized in previous guides, so some steps are missing here)


From the Monitor/Admin node (now running on my x64 box) we run
 ceph-deploy install pi1  
 ceph-deploy osd prepare pi1:/mnt/sda1  
 ceph-deploy osd activate pi1:/mnt/sda1  

Here is the performance of the cluster while the OSD was being added.  It's a pretty complex graph, but it does cover all the KPIs we are tracking.  

Saving data to the ceph-cluster

We are getting measurably better performance from the 3 OSD ceph-cluster.  The throughput is at about 44Mbps.

This compares very favorably to the 2 OSD test, which ran at about 30Mbps.  Here is the data as a reminder.
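Spelled out, the gain from the third OSD is substantial.  Here is a quick awk sketch of the arithmetic; the 30 and 44 Mbps numbers are read off the graphs, and I am assuming the same 3.5 GB test file used elsewhere in this series:

```shell
# Numbers from the graphs: 30 Mbps with 2 OSDs, 44 Mbps with 3 OSDs.
awk -v old=30 -v new=44 -v gb=3.5 'BEGIN {
    printf "write improvement: %.0f%%\n", (new - old) / old * 100
    printf "copy time for the %.1f GB test file at %d Mbps: %.0f s\n", gb, new, gb * 8 * 1024 / new
}'
```

That works out to roughly a 47% improvement, and about 11 minutes to push the test file.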

The Raspberry Pis had their CPUs pegged pretty flat, so I am not sure we can squeeze a whole lot more out of them.

Here is the ethernet port utilization on all the OSD Raspberrys.

Loading data from the ceph cluster

Performance when loading data from the ceph cluster was surprisingly lower than when saving it!  It was stuck at 40Mbps.  As a matter of fact, we are running at exactly the same speed as we did when we had only 2 OSDs involved (reference the previous article).

Here are the graphs of the CPU on the Raspberry OSDs.

The utilization of the CPUs is pretty low and we have quite a bit of head room.

Finally here is the throughput from the individual network utilization from every OSD.

And just because it looks cool and matches perfectly the throughput of the ceph-client host, here is the stack line graph of the same data.


Adding an extra OSD gave us a measurable boost on the writing end of the equation.  It did nothing for reading performance.  We did observe lower loads both on the OSD CPUs and network utilization, yet the numbers on the client were unmoved.  

I am at a loss as to how to explain this.  There is a core on the client that is doing an inordinate amount of Wait... but why?  Maybe it is network latency of some sort?  Though I am on a gig switch that has all devices plugged in directly... I do not know.

The next step (and probably the last in this series) will be measuring concurrent access and seeing where that takes us.  

Ideas and requests are welcome.

Monday, January 5, 2015

Ceph Pi - ... and now for some production numbers!

Note: This is a follow up to this article.  This article will make a lot more sense if you read it after reviewing the previous one.

Copying to the Ceph Cluster

So here we are.  Ceph-client and Mon are installed on a micro x64 system.  2 OSDs are installed on 2 Raspberry Pis.  And here are the numbers.

We are running consistently in the 30 Megabit range copying to the Raspberry Pis.

The limiting factor is the CPU on the Raspberrys.  Here they are.

They are below 100%, though.  Which is slightly puzzling.

For completeness, here are the network utilization numbers from the OSD RPis

...as well as the HDD utilization

Copying From the Ceph Cluster

Copying from the cluster is moving at a decent 40Megabits/second. 

The CPUs of the Ceph Nodes are not pegged.  I am not sure why we are not getting better performance.  There really does not seem to be a bottleneck anywhere in sight.

The disks are doing well too


While we are getting decent numbers, we are not pegging anything.  I am not sure why we are not getting better numbers.  Any clues would be REALLY appreciated.

The only clue I have is that there is pretty massive Wait time going on in the ceph-client node.  We are getting pretty much a whole core pegged.

Next Step

We are going to add one more Raspberry Pi OSD and observe the impact.  Hopefully we are going to get an increase in throughput.  

Again - comments and suggestions are very welcome!

Sunday, January 4, 2015

Ceph Pi - oDroid Fail

I had a frustrating weekend.  I was really excited about making my oDroid U3 the controller for the Ceph cluster.  I was pumped.  The CPU runs like magic ( compile time of the kernel on board was comparable to an x64 ), and I keep being excited about its 2GB of RAM (but not the 100Mbps ethernet).

And yet...  I failed.  A lot.

It really seems to have to do with the kernel.  The oDroid comes with a 3.0 kernel.  I had to compile the RBD module.  And it was hell.  I failed to get the on-board compile to work, and when it failed there was no way to get any feedback - I would have it plugged into a screen and get no feedback whatsoever.  I finally got it working by doing a cross compile.  Technically there is a 3.8 branch as well, but I did not manage to get it working (well, I really did not get to it in the end, so just keep reading).

So then I tried to set it up as a mon / ceph-client.  And I failed.  And failed.  And failed.  In the end, I found this article, which describes the features included in various versions of the kernel.  It turns out my 3.0 kernel was just too old.  And even if I got 3.8 working, it is likely going to be trouble as well, since it is missing a ton of features.  I tried every workaround I could, but most posts really talked about 3.9 as being the earliest viable version.  And the oDroid does not seem to support it... at least with Ubuntu.

So I gave up, and set up my Intel NUC mini PC, which is running x64 Debian Jessie.

I have not given up on the oDroid yet, but it will be a couple of weeks before I forget my current level of frustration and give it another whirl.

This may be a blessing in disguise because I now have a gig port.  

Friday, January 2, 2015

Ceph Pi - Initial Performance Measurements

Tests to be run

We are going to run four tests.

  • Local copy HDD to SD (control of maximum throughput)
  • Local copy SD to Ceph mount
  • SCP
  • SAMBA (network share)
We are going to look at 
  • Network Throughput
  • Throughput by protocol
  • CPU usage overall
  • CPU per key processes


This is a review of what the current setup looks like.  We are going to be using a 3.5 GB gzip file of a SD card backup for our testing purposes.

It would be also interesting to try it with a directory of small files and see what that will do.

Local Copy HDD to SD

The first test we are doing is to see what the maximum throughput of the Raspberry Pi is when copying from one on-board device to another.  We will also try to pinpoint the limiting factor - disk throughput, IOPS, or CPU.
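One simple way to run such a copy and get a throughput number in the same breath is dd, which prints its own transfer-rate summary when it finishes.  This is a sketch using a small dummy file; on the Pi you would point if= at the real 3.5 GB image and of= at the target mount:

```shell
# Create a small dummy file to copy; substitute the real image on the Pi.
dd if=/dev/zero of=/tmp/testfile bs=1M count=64 2>/dev/null
sync
# The copy itself; dd's final status line reports bytes, elapsed time, and MB/s.
dd if=/tmp/testfile of=/tmp/testfile.copy bs=4M 2>&1 | tail -n 1
rm -f /tmp/testfile /tmp/testfile.copy
```

The graphs below come from my monitoring setup rather than dd, but the two should tell the same story.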

The first hump in the graph is copy from the SD card (mmcblk0p6) to USB attached HDD (sda1).  The second is the reverse - HDD to SD card.

As we can see, CPU is certainly the limiting factor, though SD card throughput is a close second.  

Here is another graph showing the actual throughput.

The good news is that the SD card is plenty fast to read at 100Mbs.  

Local Copy SD to Ceph

We have the file loaded on the SD of pi1, which is also our ceph-client.  Let's go ahead and copy it over from the SD to the Ceph-cluster-mounted partition.

We are getting decent throughput at about 30 megabits / sec - certainly good enough to stream video.  Let's go ahead and check out how the OSDs did.  Unfortunately I seem to have a bit of a problem with pi3, so I only have data from pi2.  In this setup I expect the two OSDs to be fairly close, though.

The OSD has some more breathing room.  It is interesting that there is a lot of traffic in the reverse direction coming out of the OSD!  That traffic is headed to the other OSD, though.  Here is a composite graph... a bit complex for my taste...

Local copy from Ceph to SD

Now let's take a look in the opposite direction - downloading the file from the ceph cluster to local storage.  Here is the utilization on the Monitor / Client node.

Here is the utilization on the OSD

Notice that there is almost no chatter.  CPU is low as well.  So this makes it pretty clear that the client node is CPU limiting us.  

Network SCP to Ceph

Given that we are already being CPU limited, adding the extra overhead of encryption is going to make things poor indeed.  I am not even going to bother testing this, unless someone in the comments specifically requests it.

Network SAMBA ( network share to ceph ).

Uploading data to SAMBA we see

And then downloading it we get
Interestingly, uploading is a little faster than downloading.

At 25 Mbps we are doing well, especially considering the 35 Mbps optimum base case we got from the local copy, and the fact that we are limited to 100Mbps ethernet.  It is workable for most home applications (streaming movies), but we will have to optimize a bit before we can consider this anything like production ready ( uploading or downloading in bulk will take ages ), even for a home storage solution.
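To put 'take ages' into numbers - filling the 1 TB drive over a sustained 25 Mbps link works out like this (a rough awk estimate, ignoring protocol overhead):

```shell
awk 'BEGIN {
    mbits = 1024 * 1024 * 8      # 1 TB expressed in megabits
    seconds = mbits / 25         # at a sustained 25 Mbps
    printf "%.1f days to fill the disk\n", seconds / 86400
}'
```

Nearly four days of continuous transfer - fine for trickling in movies, painful for an initial bulk load.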


While this is a somewhat workable setup with enough bandwidth for most everyday uses, it is pretty obvious that we can get a huge improvement by moving the client to a more powerful box.  OSDs seem to be keeping up well.  I am looking forward to expanding the test cases.

Right now we are doing pretty much at the same level as if we used a low-end SDHC card in our PC as storage.

Wednesday, December 31, 2014

Ceph Pi - performance of scp, ciphers, and the Raspberry Pi

This article is not really much about Ceph, but it does explore technologies very tangential to it.  So I decided to add it to the series.

I am sure this has been discussed many times before: what is the best cipher for SCP to use to get maximum throughput?  So I decided to do a quick survey of the current state of the art and share it.

SSH, in some versions, including the version distributed in Debian Jessie, has a cool new option which lists all the supported ciphers.

 ssh -Q cipher  

In case yours does not, you can do

 man ssh_config  

So I put together a little bash script to run on my Debian server ( nothing special - a single-CPU i5 with an SSD ) to copy the same 500MB file ( a Debian ISO ) to a target computer via every supported cipher.  First I tried this against a Raspberry Pi, and then against my MacBook Pro connected to the same gig switch.  Here is the script for your amusement and pleasure.
 for i in $(ssh -Q cipher); do  
     echo "scp -o Ciphers=$i ./debian-live-7.7.0-amd64-standard.iso root@pi1:/mnt/sda1"  
     time scp -o Ciphers=$i ./debian-live-7.7.0-amd64-standard.iso root@pi1:/mnt/sda1  
     echo ""  
     echo ""  
 done  

If your particular system does not support the -Q option, you can copy and paste the list of ciphers from 'man ssh_config' and modify the first line of the script to read something like

 for i in aes128-ctr aes192-ctr aes256-ctr arcfour256 arcfour128 aes128-gcm@openssh.com aes256-gcm@openssh.com aes128-cbc 3des-cbc blowfish-cbc cast128-cbc aes192-cbc aes256-cbc arcfour; do  

First observation - not all ciphers that the ssh client supports are enabled by default on all ssh servers, so you will get a bunch of errors.  I am not going to enable the additional ciphers, since this is more of a practical guide.

Raspberry SCP results

Here are the throughput results copying the 500 MB file from the PC to Pi and vice versa.  
SCP Throughput with various algorithms using the SD card as storage on the Pi
I would think the difference here comes down to one of two things: pipelining, or the fact that cipher computation is harder on the encryption side than the decryption side.  I am honestly not too sure and would love some input on this near-30% difference.

Using the USB attached 7200RPM HDD on the Pi as a storage device

SCP Throughput with various algorithms using USB 7200RPM HDD as storage

I really expected the HDD to be faster than the SD card, but it was not - it was consistently a little bit slower.  Nothing very significant, but certainly a fact.  Looking for an explanation, I looked at the CPU utilization on the system.  It should be noted that the Pi is pretty much pegged at 100% during this operation, split between System and User.  System takes care of the disk and network IO work, and User is what the encryption algorithms use up.  Writing to the USB-mounted HDD takes some extra CPU processing, and the media IO is not a bottleneck at the speeds we are achieving.  So CPU is still our limiting factor.
CPU System usage SD vs HDD
Notice that there is a cool correlation - the higher the throughput of the algorithm, the higher the System CPU utilization, since it has to manage more IOPS both to the network and disk.  What is a bit surprising is how much CPU storage and network seem to require.  This would certainly be a place for future Pis to look for improvement.  The numbers seem to imply that even with no encryption whatsoever, the Raspberry Pi would be limited to about 60 - 65 Mbps.

So what about User CPU per algorithm?

I did get some really interesting results here, but not from the Pi but from the PC.  The Pi looked pretty much as expected with less CPU resources being used on algorithms that achieved higher throughput.  
CPU Used by Cipher on the Raspberry Pi

This is not exactly the case on the PC.  Here our best performing algorithm - chacha20-poly1305@openssh.com is using way more CPU than its brethren.

The data makes almost no sense.  The worst performing algorithms in terms of throughput seem to use the least CPU.  So what gives?  How is it possible for chacha to be using the most CPU yet have the highest throughput?

Acceleration, my dear Watson, acceleration.  The Intel CPU on my PC supports the AES-NI instruction set, which dramatically speeds up AES-based ciphers, especially the GCM family.  According to Wikipedia, the acceleration provides an 'increase in throughput from approximately 28.0 cycles per byte to 3.5 cycles per byte'.

This acceleration is not available on the Raspberry Pi CPU, which is actually the bottleneck in this transfer.  Since the Intel is not hitting anywhere near a hundred percent, the fact that chacha is using 7.77% vs aes128-gcm's 1.55% makes no difference to our session performance.
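If you want to check whether your own box has this acceleration, on x86 Linux the CPU advertises it as the 'aes' flag in /proc/cpuinfo.  A quick sketch (on a Pi, or any CPU without AES-NI, this takes the fallback branch):

```shell
# AES-NI shows up as the 'aes' flag on x86 Linux; grep exits non-zero when absent.
if grep -q '^flags.*\baes\b' /proc/cpuinfo; then
    echo "AES-NI available"
else
    echo "no AES-NI - expect the slow software AES path"
fi
```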


The clear winner here, when considering single-session performance between a Pi and a non-heavily-utilized modern x86, is chacha20-poly1305@openssh.com.  If, however, the PC is doing a lot of other work, supporting many sessions, and generally has its CPU peaking frequently, something like aes128-ctr may be a better choice, giving us a good balance.

PC and SCP - results (beta)

So now that we have looked at the strengths and limitations of the Raspberry Pi, I figured we should take it a step further and check out what modern PCs can do.  The setup in this case is

Macbook Pro <-- 5 Ghz WiFi --> Router <-- Gig Switch --> Linux PC

You may question the wisdom of using WiFi.  I would.

Well, it turns out that I only have a 100Mbs USB2 adapter to use as a wired connection on the Mac.  And as you will shortly see, the WiFi actually gives us pretty high throughput.  Not quite a gig, but half way there.  Good enough for significant results.  I will repeat the test again when I go back to work after the holidays and publish a follow up.  Maybe even try to push a 10 Gig pipe.

So I did a quick wireless survey, found a clean 5Ghz channel, and put my Mac Book about a foot away from the router.  Here is what we saw.

PC -> Mac

The thing I find most surprising about this chart is that two of the algorithms I thought would perform best simply did not.  There was something in the way Debian tried to initialize the connection that OSX found distasteful with the AES-XXX-GCM algorithms.  OSX also did not support the chacha algorithm.

The results are very interesting, but let's look a little deeper to find out more.

Let's examine the CPU utilization by cipher on the client machine.  This tells us how scalable the algorithm is in supporting multiple connections.  This chart shows relative CPU percentage utilization during the time of the copy, scaled by the duration.  The empty space between the top of the column and 100% can roughly be interpreted as idle time.

Note: this should be interpreted as the utilization of a single core
In this test setup I could only measure the CPU consumed on the client and not the server.  When I repeat this test I will make sure my setup can measure both sides.
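As a concrete example of how that relative-CPU number falls out of `time` output - user plus system time over wall-clock time (the three values below are made-up illustrations, not measurements from the test):

```shell
# relative CPU = (user + sys) / real, as a percentage of one core.
# real/user/sys here are illustrative stand-ins for one cipher's `time` output.
awk -v real=41.2 -v user=9.8 -v sys=3.1 'BEGIN {
    printf "relative CPU: %.1f%% of one core\n", (user + sys) / real * 100
}'
```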

I tried to think of better ways to analyze and present the data, but in the end I felt I couldn't tell the story without a raw representation.  It tells the story better than any massaging could.  This shows the amount of clock time (real), user time (cipher) and system time each algorithm used.

Mac -> PC

For completeness' sake - I tried to run the test from Mac -> PC.  There were only 3 supported ciphers - essentially the AES-XXX-CBC family.  They showed exactly the same pattern, with slightly better performance.  Yawn.

Here are the graphs


Relative CPU

Time consumed by each function in seconds


When dealing with fully featured modern machines, I would probably choose AES-256-CBC.  It seems to have the best throughput/cpu utilization ratio and provides pretty decent security.

Sunday, December 28, 2014

Ceph Pi - Kernel Cross-Compile and also SSHFS

 Intro and Data

Well, I said I would not be writing this for a while, but I guess I lied.  This is going to be a tutorial specifically about cross-compiling the kernel for the Raspberry Pi, and not a general-purpose cross-compiling tutorial.

Again - we are basing this writing on this (official) guide.

Log onto your fast machine.  In my case I use a Debian box - a tiny Intel NUC i5.

So if you have any doubts as to why compiling on the Pi will take forever, or if you took any offense when I called the Pi feeble, here is a graph showing the CPU utilization on the Pi for the 12 hours it took to build a kernel.

Yup.  Pegged.  For 12 hours.  So while the NUC is about $300 barebones, or about 10x more expensive, it is 50x faster for this application.  I guess I am digressing, but I do urge you to think well about your application.  The Raspberry Pi is amazing for some things (price / power), but heavy compute it will not do.  Here is a graph of the NUC doing the same work.
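For perspective, here is what that 50x claim means for the same 12-hour build (a one-line awk check):

```shell
# 12 hours on the Pi, divided by the 50x speedup, in minutes
awk 'BEGIN { printf "%.0f minutes on the NUC vs 720 on the Pi\n", 12 * 60 / 50 }'
```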

 Build the build environment

First things first - let's make sure we have all the packages needed to actually compile and build.  Let's run
 apt-get update && apt-get upgrade  
 apt-get install build-essential libncurses5-dev 

Now go ahead and pull down the toolchain.
 git clone https://github.com/raspberrypi/tools  

I am logged in as root on this box, which does break best practices.  Keep this in mind and adjust accordingly in the following steps.  I want to keep things organized, so I create a directory to hold all the files that have to do with this project.
 mkdir raspberry-kernel  
 mv tools raspberry-kernel/  

The next step is to adjust our PATH to make the toolchain easier to invoke.  Go ahead and edit your .bashrc
 vi .bashrc  

Add the following line to the bottom of the file
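The export line itself seems to have been lost from the post.  Based on the layout of the raspberrypi/tools checkout from that era, it would be something like the following - the toolchain directory name is my assumption, so check what `ls ~/raspberry-kernel/tools/arm-bcm2708` actually shows in your checkout:

```shell
# Hypothetical path - verify the toolchain directory name in your own checkout
export PATH=$PATH:$HOME/raspberry-kernel/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin
```

After saving, run `source ~/.bashrc` (or open a new shell) so the change takes effect.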

Next thing we need to pull the kernel sources
 cd raspberry-kernel/  
 git clone --depth=1 https://github.com/raspberrypi/linux
 cd linux 

We need to update the kernel config with the settings for the Raspberry Pi.  The space in the middle of the command is intentional - the trailing dash is part of the CROSS_COMPILE toolchain prefix.
 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig  

I had troubles with the next step.

 make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  

I kept getting an error message that looked like this:

 root@charlie:~/raspberry-kernel/linux# make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig  
 # configuration written to .config  
 root@charlie:~/raspberry-kernel/linux# make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  
 make: arm-linux-gnueabihf-gcc: Command not found  
 scripts/kconfig/conf --silentoldconfig Kconfig  
 make: arm-linux-gnueabihf-gcc: Command not found  
  CHK   include/config/kernel.release  
  CHK   include/generated/uapi/linux/version.h  
  CHK   include/generated/utsrelease.h  
  HOSTCC scripts/pnmtologo  
  CC   scripts/mod/empty.o  
 /bin/sh: 1: arm-linux-gnueabihf-gcc: not found  
 scripts/Makefile.build:308: recipe for target 'scripts/mod/empty.o' failed  
 make[2]: *** [scripts/mod/empty.o] Error 127  
 scripts/Makefile.build:455: recipe for target 'scripts/mod' failed  
 make[1]: *** [scripts/mod] Error 2  
 make[1]: *** Waiting for unfinished jobs....  
 make[1]: 'include/generated/mach-types.h' is up to date.  
 Makefile:518: recipe for target 'scripts' failed  
 make: *** [scripts] Error 2  
 make: *** Waiting for unfinished jobs....  
  CC   kernel/bounds.s  
 /bin/sh: 1: arm-linux-gnueabihf-gcc: not found  
 /root/raspberry-kernel/linux/./Kbuild:35: recipe for target 'kernel/bounds.s' failed  
 make[1]: *** [kernel/bounds.s] Error 127  
 Makefile:840: recipe for target 'prepare0' failed  
 make: *** [prepare0] Error 2  

The file did exist, but every time I tried to run it, even by hand, I got the same File Not Found error.  I ultimately found the answer here.  I think the reason it did not work on my machine but worked for the writer of the original tutorial is that I am running 64-bit Debian and they are running Ubuntu (I would guess 32-bit).  In any case, all you have to do to get past the problem is to run

 sudo apt-get update  
 sudo apt-get upgrade  
 sudo apt-get install lsb-core  

This got me further, but I still conked out.  This time the error was

 root@charlie:~/raspberry-kernel/linux# make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  
 arm-linux-gnueabihf-gcc: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory  
  CHK   include/config/kernel.release  
  CHK   include/generated/uapi/linux/version.h  
  CC   scripts/mod/empty.o  
 arm-linux-gnueabihf-gcc: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory  
 scripts/Makefile.build:308: recipe for target 'scripts/mod/empty.o' failed  
 make[2]: *** [scripts/mod/empty.o] Error 127  
 make[2]: *** Waiting for unfinished jobs....  
  CC   scripts/mod/devicetable-offsets.s  
 arm-linux-gnueabihf-gcc: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory  
 scripts/Makefile.build:204: recipe for target 'scripts/mod/devicetable-offsets.s' failed  
 make[2]: *** [scripts/mod/devicetable-offsets.s] Error 127  
 scripts/Makefile.build:455: recipe for target 'scripts/mod' failed  
 make[1]: *** [scripts/mod] Error 2  
 Makefile:518: recipe for target 'scripts' failed  
 make: *** [scripts] Error 2  

The solution this time ( I found it on an Android forum that I closed before I saved the link ) seemed to be

 sudo apt-get install lib32stdc++6 lib32z1 lib32z1-dev  

Build that funky kernel

Now we can actually build the kernel.  So let's run the command again.  The '-j 6' option tells make how many jobs to run in parallel.  I have a 4-core CPU that is hyper-threaded.  6 is a good number, since it leaves some resources for the OS and such.

 make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  

Damn!  That was fast!
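If you don't want to hard-code the 6, you can derive it from nproc using the same rule of thumb - hardware threads minus a couple, to leave headroom for the OS (a sketch; tune to taste):

```shell
# nproc reports available hardware threads; subtract 2 for OS headroom,
# and floor the result at 1 for small machines.
JOBS=$(( $(nproc) - 2 ))
[ "$JOBS" -lt 1 ] && JOBS=1
echo "make -j $JOBS ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-"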

One thing that the guide missed and we skipped is that you likely want to do some configuration besides taking the defaults.  In our case we wanted the RBD block device driver module built.  To do so we must run

 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig menuconfig

So that was easy.  For completeness' sake, go to Device Drivers->Block Devices->Rados and make sure it is marked with an <M>.  Then again run

 make -j 6 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-  

Install and SSHFS...  in reverse order

This next part is the thing I really don't like.  The way the guide would have you go is to remove the SD card from your Pi, mount it on your linux box, do the install on it, and then move it back.  I am not impressed.  If that is your thing, go follow the guide.  I can't help but think there are way better ways to do this.  Here is a brief enumeration of the options I came up with - they are probably naive, and someone who has done more cross-compile work will laugh.

Edit - TAR was my original choice, but when I did the compile on the Pi (disclaimer: using the Pi config, not the cross-compile one) it turned out the 'make modules' step requires a ton of compile time as well ( I don't know why... maybe the default config enables every damn module? ).  It took 6 hrs this time.  The guide suggests that there is path info to be used even while only doing make modules, before the make modules_install step.
  1. NFS mount
  2. SSHFS mount (my current choice)
  3. TAR the directory and SCP it (my original choice)
  4. Remove the SD card and plug it into a PC...
We have to write files to system directories, so we need root permissions.  We can't get away with sudo on this one, because there is no way to run 'sudo' over sftp or sshfs.  This is something we will enable only temporarily, so we should be fine.

Make sure you are on the Pi.  First we need to give root a password.  Root does not have a password by default, to prevent anyone from using the account.

 sudo passwd root  

Now we have to make sshd allow root logins, which current versions do not allow by default.

  sudo vi /etc/ssh/sshd_config  

Find the line that reads

PermitRootLogin without-password  

Comment it out and add another line which sets the permission to yes.  This allows us to actually log in as root with a password.  This is what the file will look like:

 #PermitRootLogin without-password 
 PermitRootLogin yes  

Now restart sshd

 sudo /etc/init.d/ssh restart  

Okay.  Now log on to the cross-compile host as root.  You may have to do gymnastics similar to what we just did on the Pi.  If you do not have an SSH key yet, generate one with ssh-keygen, pressing enter when prompted for a passphrase.  Then copy the key over.  Of course substitute pi1 with the name or IP of your Raspberry.

 ssh-copy-id root@pi1  

Test everything by running

 ssh root@pi1  

You should be successfully logged into your Raspberry.  Use this opportunity to edit your sshd config

  vi /etc/ssh/sshd_config  

and switch the comments on the two PermitRootLogin lines, so they look like this

 PermitRootLogin without-password 
 #PermitRootLogin yes  

Now anyone with root on your cross-compile box can log into your Pi, but no one can SSH in as root using a password.  We should also remove the password from root.

  passwd -d root  

We are pretty much back to best security practices on our Pi.  Now it's time to mount up!  Let's make a directory we can use as a mount point and mount up.

 cd /root/raspberry-kernel
 mkdir remote-host
 sshfs -o idmap=user  root@pi1:/ /root/raspberry-kernel/remote-host -o Cipher=blowfish

(Edit: since this article was originally published, I also wrote this analysis of the best crypto cipher to use with the Raspberry Pi)
The idmap=user option makes it so the user account we are logging in as maps to the remote host's equivalent by name instead of by numeric id.  The Cipher=blowfish option ensures that we use a pretty fast cipher.  The Pi is often CPU-constrained when it comes to encrypted communications (someone should look into using that GPU to accelerate them).  SCP is usually pretty slow with the default ciphers, so it's a good idea to choose your encryption wisely.  Here is a good article that includes the image I post here.  You can't have a non-encrypted tunnel, apparently.  I tried a couple of other ciphers which should have worked but did not.  I will look more into this later... it's an interesting follow-up topic.

So while researching this I stumbled on an amazing article describing how to avoid encryption in sshfs entirely.  I have long searched for ways to get the ease of sshfs without the encryption overhead.  And here it was in all its glory.  I have not tested this, but I will soon, and we will get some sweet test results.

... and back to the guide

So now that we have a mount from our cross-compile host onto the Pi with root permissions, it is kind of like having the SD card plugged into the cross-compile host.  So we can take the next steps from the guide, with a slight adjustment for the directory names.

 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=/root/raspberry-kernel/remote-host modules  
 make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=/root/raspberry-kernel/remote-host modules_install  

Now backup the old kernel image and copy the new one

 cp /root/raspberry-kernel/remote-host/boot/kernel.img /root/raspberry-kernel/remote-host/boot/kernel.img.orig  
 cp arch/arm/boot/Image /root/raspberry-kernel/remote-host/boot/kernel.img  


 umount /root/raspberry-kernel/remote-host   

OK...  Well...  reboot? 

... and we are back.

We can check if everything is copacetic by running

 uname -a   

and making sure that the right kernel version shows up.

In our case we can also verify that the new driver we wanted is properly installed.

 modprobe rbd  

If there is no error, then we are doing a-OK!

That's all folks.