Introduction

Recently I was looking at backup options in Linux.

Of course, with all the cloud craze nowadays, it is quite easy to let someone else worry about the setup and sync your files with an external service for free or very cheaply. Not to mention the convenience of accessing those files from anywhere and sharing them with people.

There are cases in which I prefer a local setup, though.

  • Security is probably handled better in online services, but it still feels different to have your files 'out there'.
  • Large files take a long time to sync, so it is not always convenient.
  • Some files are simply not needed outside the 'home context'.

Tools like grsync are great, but they did not quite satisfy my needs. One of the main shortcomings is that each 'session' is a single pair of source and destination paths.

Perhaps in a perfect setup all files that you want to back up are in a single location. In reality this is not the case. There are things you want to back up all over the place: mail and dotfiles in your home directory, media in another directory (perhaps on a separate partition or even a different disk), documents in a third, and so on.

OK, maybe I'm just sloppy with my files.

I started playing with a Python script that would call rsync and back up files to a USB hard drive. It is not much more complex than a shell script, but allows for some more flexibility.

At around the same time the quad-core, 1 GB RAM Raspberry Pi 2 Model B came out, and I thought it would make a wonderful home server. Its most important function would be backup, but there were other possibilities.

The idea started to form. The Pi would be a little more than a storage room:

  • Files are backed up on a hard drive through rsync.
  • Backed-up media can be played from another device via dlna. The Pi can also be connected to an audio amp and act as an audio renderer.
  • Central git repository for all code. This might be backed up separately. At first just a collection of bare repos.
  • A web server can run small local services like wikis, but also owncloud.
  • The web server can act as a staging area to test web projects.

In this tutorial I describe the setup that I made, with some possible tweaks. You might want to extend it further, for instance to sync specific directories to the cloud.

Preparation

The server in this setup was a Raspberry Pi running Raspbian. Go ahead and set up the Pi if you haven't.

However, this guide should work with any computer running *nix with minimal adjustments (e.g. paths).

If you are using a Raspberry Pi and intend to back up to a USB hard drive, keep in mind that you need a powered USB hub, as the Pi's USB ports do not provide enough power for most drives.

Implementation

I looked around and found scripts such as this one, but they were still not exactly what I wanted. I decided to reuse my Python script and have it rsync remotely to the Pi.

The Pi needs to be configured to run the various daemons, provide backup directories, and so on.

Using rsync in daemon mode on the Pi seemed like the way to go, but it turned out slightly trickier than I thought (see the rsync daemon and ssh sections below for more on this).

So here we go:

This is the server side (i.e. the main part) of the Raspberry Pi home server setup.

Here you configure your Pi to be easily accessible on the network, create users and directories, set up permissions, and configure the daemons. Once all this is done you can start the daemons and test them.

First things first.

Static IP

If you use DHCP, it is useful to assign a static IP to your server, so you don't have to chase its address and update hosts files constantly. This is not strictly necessary, though.

To do this, edit /etc/network/interfaces and configure your interface, in my case wlan0:

allow-hotplug wlan0
iface wlan0 inet static
address 192.168.1.5
netmask 255.255.255.0
gateway 192.168.1.1
wpa-ssid "XXX"
wpa-psk "XXX"

I expected the interface to obtain the keys from wpa_supplicant, but even though wpa_supplicant was configured I could not get that to work; entering the ssid and psk directly in the interfaces file solved it.

Then reload networking:

sudo /etc/init.d/networking reload

We have given the server the IP 192.168.1.5. Test that the network connection works: run ifconfig and verify that your interface has been assigned the IP by looking at the inet addr field.
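
A quick way to check, using the interface and gateway from the configuration above:

ifconfig wlan0          # the inet addr field should show 192.168.1.5
ping -c 3 192.168.1.1   # the gateway should respond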

Users

Go ahead and create the users that are going to perform backups on the server:

sudo adduser birdman

You might want to add these users to a common group of some sort, here called home_group. It is up to you.

sudo addgroup home_group
sudo usermod -a -G home_group birdman

I also have a dedicated user for running the daemons; let's call it, imaginatively, server_user.

sudo adduser server_user
sudo usermod -a -G home_group server_user

Directories

Backup

You need to come up with a directory layout for your backup.

In my setup there are personal backups per user, while media files and code repos live in a shared space.

Each user can have their own directory on the hard drive.

Let us assume that the drive is mounted on /mnt/backup_drive.

  • /mnt/backup_drive
    • birdman
      • documents
      • projects
    • music
    • code
    • pictures

Once you are satisfied with the setup, you could add this mount to your fstab. The entry would be something like

UUID=XXXX-XXXX  /mnt/backup_drive  auto    rw,noauto,user,uid=server_user

The mount point should be owned by server_user (also check man fstab).
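
To fill in the UUID and to test the entry, something along these lines should work (your device names and filesystem will differ):

sudo blkid                      # list partitions and their UUIDs
sudo mount /mnt/backup_drive    # test the new fstab entry
df -h /mnt/backup_drive         # verify the drive is mounted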

Create birdman's directory and make him the owner:

mkdir /mnt/backup_drive/birdman
chown birdman /mnt/backup_drive/birdman

Other directories are shared and here owned by server_user:

mkdir /mnt/backup_drive/music /mnt/backup_drive/code

I changed their group to home_group and made them writable by the group, which allows your users to write to the shared directories:

chgrp -R home_group /mnt/backup_drive/music /mnt/backup_drive/code
chmod 770 /mnt/backup_drive/music /mnt/backup_drive/code

Daemons

I decided not to run the daemons as root, but as server_user. Since this user does not have write access to /var/run and /var/log, we are going to use an alternative structure for the necessary files in its home directory:

  • /home/server_user/home_server/
    • config/
      • rsyncd.conf
      • minidlna.conf
    • logs/
    • pids/
    • locks/

config contains configuration files for the daemons.

We are going to start the daemons with their PIDs written to files in the pids directory. This allows us to stop the daemons without having to search for their PIDs.

We are also going to configure them to write their logs in the logs directory, and their locks, well, you know where.

Get the script

Check out this repo into server_user's home directory:

git clone https://github.com/monomon/home_server.git ~/home_server
cd ~/home_server

Create the necessary directories:

mkdir logs pids locks

Follow the instructions below for configuring each daemon.

Then you can invoke the included init.d-style script to start and stop services:

./home_services.sh rsync start

The script takes care of calling the daemons with the config and pid file locations set to this directory.
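
I will not reproduce home_services.sh here; conceptually it does something along these lines (a simplified sketch of the idea, not the actual contents of the script):

#!/bin/sh
BASE=/home/server_user/home_server

case "$1 $2" in
    "rsync start")
        # pid, log and lock file locations are set in rsyncd.conf itself
        rsync --daemon --config="$BASE/config/rsyncd.conf"
        ;;
    "rsync stop")
        # stop the daemon using the pid it wrote on startup
        kill "$(cat "$BASE/pids/rsync.pid")"
        ;;
esac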

Setting up the rsync daemon

One of the advantages of using the rsync daemon is that it is configured through a file that gives you logical locations to sync to. This means that you could change the real layout of your backup and only have to update this config file instead of all the client configs that would otherwise contain absolute paths.

To find out more about the rsync daemon configuration, run man rsyncd.conf. The file contains global parameters and so-called 'modules'; a module can override any global option. Each module is a directory tree that is given a logical name, so the client does not need to know about the real path under /mnt/backup_drive.

Let us have a look at ~/home_server/config/rsyncd.conf:

pid file = /home/server_user/home_server/pids/rsync.pid
log file = /home/server_user/home_server/logs/rsync.log
lock file = /home/server_user/home_server/locks/rsync.lock
use chroot = no
read only = no
hosts allow = 192.168.1.0/24
port = 54777

[bird]
    uid = birdman
    gid = birdman
    path = /mnt/backup_drive/birdman
    comment = "Birdman's space"

[music]
    uid = nobody
    gid = nobody
    path = /mnt/backup_drive/music
    comment = "Rock on"

As described above, resources such as pids and logs are configured to be created in the corresponding directories and not under root-owned ones.

We want to be able to write to the directories, and will not use chroot.

Only hosts from the local network are allowed to connect, and the daemon listens on a custom port.

Then come the modules. The name in square brackets is the module name rsync clients use in place of the real path. For example (note the double colon):

rsync -avz ~/Documents hostname::bird/documents
rsync -avz ~/Music hostname::music

These write to the documents subfolder of /mnt/backup_drive/birdman and directly into /mnt/backup_drive/music, respectively.

Setting uid and gid for a module only takes effect when the daemon is run as root; they are included here for illustration purposes.
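
To check that the daemon is reachable from a client, you can list its modules by leaving the path after the double colon empty:

rsync --port=54777 192.168.1.5::

This should print the module names (bird, music) along with their comments.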

git

Not much to say here; I normally create some bare repos in code/ and push to them.

cd /mnt/backup_drive/code
git init --bare imaginary_project

Then a client could do

git push 192.168.1.5:/mnt/backup_drive/code/imaginary_project master

(Here we use absolute paths)

Setting up apache

There are plenty of resources on how to do this, so I will leave it out of this tutorial.

Setting up dlna

minidlna worked right away for me. Go ahead and install it:

sudo apt-get install minidlna

It is configured with a file in an ini-like format:

media_dir=A,/mnt/backup_drive/music

Here A stands for audio. Run man minidlna.conf for more information, and look at the system-wide configuration in /etc/minidlna.conf and the provided home_server/config/minidlna.conf.example. It is best to copy the example file and use it as a base.

Create a minidlna.conf file in home_server/config and set up your media directories.
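
As an illustration, a minimal minidlna.conf for the layout in this setup could look like the following (db_dir is an extra directory you would have to create; adjust the name and paths to your setup):

media_dir=A,/mnt/backup_drive/music
media_dir=P,/mnt/backup_drive/pictures
friendly_name=raspi_server
db_dir=/home/server_user/home_server/minidlna
log_dir=/home/server_user/home_server/logs
inotify=yes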

Then run

~/home_server/home_services.sh dlna start

And this should do it.

Other possibilities

I am running a moinmoin wiki for home-wide info.

owncloud is certainly a good fit for this setup. I have had good experience with it previously, so it is on my todo list.

Of course you can open up your entire setup to access it from outside your home network, but this would be the topic of another tutorial.

You may have various devices that you want to sync. The instructions below work on *nix computers, but the dlna services, for instance, can be accessed from any capable client, such as an Android device. The backup script also worked in the past with cygwin on Windows, but has not been tested there lately.

Backup

Clients can use the backup script and JSON configuration file from this repo:

git clone https://github.com/monomon/rsync_backup.git

Let us look at the example configuration:

{
    "command" : "rsync",
    "options" : [
        "--rsh=/usr/bin/ssh -i /home/birdman/.ssh/mybox-homeserver-rsync",
        "--log-file=/home/birdman/rsync.log",
        "--port=54777",
        "--recursive",
        "--verbose",
        "--progress",
        "--stats",
        "--perms",
        "--times",
        "--compress",
        "--cvs-exclude",
        "--exclude=\".*.swp\"",
        "--links"
    ], 
    "remote_host" : "shisharka_server",
    "direction" : ">",
    "mail" : false,
    "mail_profiles_dir" : "/home/birdman/.thunderbird",
    "mail_dest" : "bird/userdata/",
    "mode" : "daemon",
    "locations" :[
        {
            "src" : "/mnt/data/Projects",
            "dest" : "bird/projects"
        }, {
            "src" : "~/.bashrc",
            "dest" : "bird/dotfiles/"
        }, {
            "src" : "~/.tmux.conf",
            "dest" : "bird/dotfiles/"
        }, {
            "src" : "/mnt/data/music,
            "dest" : "music"
        }
    ]
}
  • command: at the moment only rsync is supported.
  • options: additional command-line options to pass to rsync.
    • Note the --rsh option, which specifies ssh as the remote shell and passes in the key to be used (see the ssh section for more details).
    • --port: the rsync daemon listens on a custom port on the server.
  • remote_host: IP or hostname to sync to.
  • direction: if this equals '<', source and destination are swapped. This lets you reuse the same configuration file to sync in the opposite direction, but it can become confusing if you are not careful.
  • mail: if true, the script searches mail_profiles_dir and tries to extract the active thunderbird profile. If the profile directory exists, it is added to the other backup locations, with mail_dest as its destination. Support for other mail programs could be added.
  • mode: if daemon, a double colon (::) is inserted between the hostname and the target path, so the path is interpreted as an rsync daemon module. Otherwise a single colon is inserted, and rsync treats the target as a regular path on the remote host (see the sketch after this list).
  • locations: a list of source and destination path pairs. Trailing slashes change the meaning of rsync paths, so verify they are really what you intend.
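
In daemon mode, the command the script assembles for the first location above would look roughly like this (most of the options from the config are omitted here for brevity):

rsync --rsh="/usr/bin/ssh -i /home/birdman/.ssh/mybox-homeserver-rsync" \
    --port=54777 --recursive --compress \
    /mnt/data/Projects shisharka_server::bird/projects

With mode set to anything other than daemon, the target would instead be shisharka_server:bird/projects.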

Copy the example config:

cp conf_backup_example.json conf_backup.json

Set up all options as you wish, and then just run:

python remote_backup.py conf_backup.json

And there it goes! If anything goes wrong, check the log file (specified in the config) for clues.

If you like, you could add this command as a cron job to run at scheduled times. I skipped this, because I prefer to control when this load hits both computers.
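
If you do decide to schedule it, a crontab entry along these lines would run the backup every Sunday at 03:00 (the paths are assumptions; point them at wherever you cloned the repo):

0 3 * * 0 python /home/birdman/rsync_backup/remote_backup.py /home/birdman/rsync_backup/conf_backup.json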

ssh

Preferably all the backup and git traffic would go through ssh using keys. git can do this out of the box, but as I found out, the rsync daemon only speaks the rsync protocol and does not support ssh directly. It is possible, however, to have the ssh connection start its own daemon, as described below.

First create a key for everything else (git, interactive ssh):

ssh-keygen -t rsa -b 2048 -f ~/.ssh/mybox-homeserver

Copy the public key to the server:

ssh-copy-id -i ~/.ssh/mybox-homeserver server_user@192.168.1.5

You will be prompted for server_user's password.

Create a separate key for rsync:

ssh-keygen -t rsa -b 2048 -f ~/.ssh/mybox-homeserver-rsync

Copy this one too:

ssh-copy-id -i ~/.ssh/mybox-homeserver-rsync server_user@192.168.1.5

If you are using the rsync daemon on the server, you need to ssh to it and edit the entry for the public key that was just copied:

ssh -i ~/.ssh/mybox-homeserver server_user@192.168.1.5
vim /home/server_user/.ssh/authorized_keys

Find the corresponding key and prepend it with:

command="/home/server_user/home_server/home_services.sh remote_rsync start" ssh-rsa AAAAB....

It is important that the command comes before the key itself. This will cause each connection with this key to launch the rsync daemon.

To make it even easier, edit ~/.ssh/config and add an entry for the server:

Host raspi_server
    Hostname 192.168.1.5
    IdentityFile ~/.ssh/mybox-homeserver
    User server_user

This way you only have to run ssh raspi_server and you are in. This host alias can also be used as a git remote, which will automatically use the specified identity file and username.

git

If you set up repos in /mnt/backup_drive/code on the server, you can clone them and push to them:

git clone 192.168.1.5:/mnt/backup_drive/code/imaginary_repo

git push 192.168.1.5:/mnt/backup_drive/code/imaginary_repo

If you set up your ssh keys and ~/.ssh/config as described above, you can use the Host alias as a remote:

git remote add raspi ssh://raspi_server/mnt/backup_drive/code/imaginary_repo

Then push to it like this:

git push raspi master

This will use the configured identity file and user for the connection.

dlna

Clients should be able to see your server if the minidlna daemon has been started on it and you have configured your media directories.