oDisk Development

peter@ohler.com

Author: Peter Ohler
Published: July 30, 2012

The disappearance of Apple's iDisk left a hole in my backup strategy that needed to be filled. I no longer had a place to back up my files and pull them onto multiple computers. iCloud, the iDisk replacement, supports only files from Apple compliant products and not arbitrary files. It was the perfect opportunity, or rather an excuse, to try out Opee and see how well the alternative threading approach worked. As long as I was rolling my own solution for remote storage it might as well be encrypted and compressed. The result is oDisk.

Evaluation

An iDisk replacement seemed like a good application as it involved several steps that were time consuming and would benefit from using multiple threads. Copying files involves waiting for IO. Encryption takes some time and keeps the CPU busy while encrypting or decrypting. Finally, accessing the disk for directory and file information, along with reads and writes, involves more IO waiting.

Each step of the development process is covered here based on the thought processes I went through when building oDisk using Opee. It really wasn't that much different from a more traditional development approach other than that the components were a little different. Requirements remain the same, while design, development, testing, deployment, and tuning are all similar.

I found that, like any agile development cycle, work in one area often ripples back to an earlier step. Testing drives development. Development challenges can force changes in design. Design limitations can force alterations in requirements. In the end, once the application is in use, new requirements or changes in requirements are almost always expected.

[Figure: Development Cycle]

Requirements

At a high level oDisk should replace key iDisk functionality. The important functionality desired was the ability to back up files to a remote location and to sync multiple computers to that remote storage. As a bonus, the remotely stored files should be encrypted and compressed. This saves a bit of money on storage and provides a little comfort knowing private files are not exposed even if the remote storage security is breached.

The first iteration only needs to be triggered by executing a script. Later, once the system is solid, a cron job can be set up to execute periodically and synchronize with the remote storage. The remote storage, at least initially, must be running an ssh and sftp daemon. This also allows a local sftp and ssh to be used on locally mounted file systems.

Synchronization must support file permissions, ownership, symlinks, and directories. If done in stages, the first stage should allow for uploading, downloading, and synchronization of any new files. A second iteration can add support for removal of files, either by automatic detection of deletion or by explicit deletion with an oDisk command. It is acceptable to use the file size and modification date for detecting changes. A single directory describes the set of files and directories to synchronize.
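
To make that rule concrete, a change check based only on size and modification time might look something like the sketch below. The FileInfo record and the method names are just for illustration and are not taken from the oDisk source.

    # Minimal sketch of change detection using only size and modification time.
    # FileInfo here is a hypothetical record, not the actual oDisk data Object.
    FileInfo = Struct.new(:path, :size, :mtime)

    def info_for(path)
      stat = File.stat(path)
      FileInfo.new(path, stat.size, stat.mtime)
    end

    def changed?(old_info, new_info)
      return true if old_info.nil?                  # new file, needs to be synced
      old_info.size != new_info.size ||
        old_info.mtime.to_i != new_info.mtime.to_i  # second resolution is enough
    end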

[Figure: Requirements]

Design

With a minimum of requirements it made sense to jump right in and figure out what steps were required to synchronize a local folder with remote storage. Using Opee as the threading framework encourages having two kinds of Objects: immutable data Objects and Actors that perform the processing. With Opee there is no need for Mutexes or synchronization locks, but you have to play by the rules to take advantage of not needing locks.

The first step was figuring out what had to be done. To do that, information about the local and remote directories had to be obtained. By addressing one directory at a time it was possible to break the sync problem into smaller independent steps. That should help distribute the work load and allow for parallel processing of directories. To capture that information some kind of directory information structure was needed.

A Digest data Object was defined along with data Objects for file, directory, and symlink information. Several different Job Objects were also defined to support passing data from one Actor to the next. Keeping in mind that these would be strictly for storing data, no methods were defined that modified the content of the data Objects. This is important to avoid setting up locks when more than one Actor is operating on the data Objects. In a sense the data Objects are treated as immutable.
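
A rough sketch of what such data Objects might look like is shown below. The ODisk namespace and the attribute names are assumptions for illustration rather than the actual oDisk classes.

    # Illustrative read-only data Objects; the ODisk namespace and the
    # attribute names are assumptions, not the actual oDisk source.
    module ODisk
      class FileInfo
        attr_reader :name, :size, :mtime, :owner, :mode
        def initialize(name, size, mtime, owner, mode)
          @name, @size, @mtime, @owner, @mode = name, size, mtime, owner, mode
          freeze   # no setters and frozen, so Actors can share it without locks
        end
      end

      class Digest
        attr_reader :dir, :entries   # entries: FileInfo, DirInfo, and SymlinkInfo records
        def initialize(dir, entries)
          @dir = dir
          @entries = entries.freeze
          freeze
        end
      end
    end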

With a Digest Object defined there had to be some place to persist the digest so that remote Digests could be downloaded and local Digests could be compared to the current state of a directory. A hidden .odisk directory is used to store the serialized Digest Object, and the Oj gem is used to serialize the Digest into a JSON String. Actors were then defined for creating a Digest and for fetching a Digest from remote storage.
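
The persistence itself can be quite small. The sketch below shows the general idea using Oj; the digest.json file name and the helper method names are assumptions, not the actual oDisk code.

    require 'oj'

    # Sketch of Digest persistence inside the hidden .odisk directory.
    # The digest.json file name is an assumption for illustration.
    def save_digest(digest, dir)
      odisk = File.join(dir, '.odisk')
      Dir.mkdir(odisk) unless File.directory?(odisk)
      File.open(File.join(odisk, 'digest.json'), 'w') do |f|
        f.write(Oj.dump(digest, :mode => :object))   # object mode round-trips Ruby Objects
      end
    end

    def load_digest(dir)
      path = File.join(dir, '.odisk', 'digest.json')
      return nil unless File.exist?(path)
      Oj.load(File.read(path), :mode => :object)
    end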

The next step is planning further actions. This requires at least three different Digests: the previous local Digest, the current local Digest, and the remote Digest. This presents a problem in that planning is blocked by two separate Actors, and those individual Actors should not be blocked waiting for the Planner Actor. Opee provides an Opee::Collector class for collecting input from multiple sources. Reading Digests and fetching Digests from remote storage did not seem like heavy or time consuming operations, so only one Digester and one Fetcher were planned. Of course this could be changed later if the assumption proved to be incorrect.
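
The collecting step can be pictured roughly as follows. This is a plain Ruby sketch of the pattern only; Opee::Collector provides the real behavior and its API differs from what is shown here.

    # Pattern sketch only, in plain Ruby; Opee::Collector provides the real
    # collecting behavior, and the method names here are illustrative.
    class DigestCollector
      KEYS = [:previous_local, :current_local, :remote]

      def initialize(planner)
        @planner = planner
        @pending = Hash.new { |h, dir| h[dir] = {} }
      end

      # Called by the Digester and Fetcher as each Digest becomes available.
      def accept(dir, kind, digest)
        @pending[dir][kind] = digest
        return unless KEYS.all? { |k| @pending[dir].key?(k) }
        @planner.plan(dir, @pending.delete(dir))   # all three present, hand off to the Planner
      end
    end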

The Planner compares the Digests and decides which files need to be encrypted, uploaded, downloaded, and decrypted. It then issues requests for those actions to the Copier and Crypter Actors. All these operations are expected to take some time, either due to CPU usage in the case of encryption or IO usage in the case of uploading and downloading. Multiple instances of each Actor would most likely speed up the processing by taking advantage of multiple CPUs or multiple open connections. That requires some sort of work distribution component. Opee provides that functionality with the Opee::AskQueue. As an Actor finishes an operation it asks the queue to give it another task if there is one waiting. This keeps all Actors busy until there is no more work for them to perform.
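
The pull model behind that queue can be sketched with a plain Ruby Queue and a few Threads. This illustrates the idea only and is not the Opee::AskQueue implementation; the tasks are stand-ins for copy and encrypt requests.

    require 'thread'

    # Pull-model sketch with a plain Ruby Queue; Opee::AskQueue plays this
    # role in oDisk. The tasks are stand-ins for upload, download, and
    # encrypt requests.
    work = Queue.new
    20.times { |i| work.push(lambda { puts "task #{i} done" }) }

    workers = Array.new(4) do
      Thread.new do
        loop do
          begin
            task = work.pop(true)   # non-blocking pop; raises ThreadError when empty
          rescue ThreadError
            break                   # no work left, this worker is finished
          end
          task.call
        end
      end
    end
    workers.each(&:join)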

I found that there is a balance between how many Actors are defined and how much a single Actor does. The balance is subjective. If an Actor reads a directory it could gather the file information itself or pass the list of files on to another Actor. Since the overhead of looking up the file information is relatively low and is consistent with gathering information, both those steps were included in one Digester Actor.

Consideration was given to how each Actor could be tested. That is not to say that the design revolves around testing, but it seems to work out that a good design allows for testing individual components simply because those components are well encapsulated. Using Opee as the design basis really encourages this encapsulation and testing.

These are the primary Actors in the oDisk system.

    Copier - Uploads and downloads files.
    Crypter - Encrypts and decrypts files using GnuPG.
    Digester - Creates Digests.
    Fetcher - Fetches remote Digests.
    Planner - Plans synchronization actions.
    StatFixer - Sets the owner, permissions, and symlinks for files and directories.
    SyncStarter - Initiates synchronization of a directory.

[Figure: Designing with Actors]

Development and Testing

The encapsulated Objects with minimal interfaces make testing straightforward. Basically a test stub invokes methods on an Actor and verifies the output is as expected. Since Actors are not expected to modify any shared data unless they own the modify rights to that chunk of data, there are no multiple Object interactions to test until integration time.

There is a wrinkle in the testing though. Since Actors operate asynchronously, tests are not as simple as invoking a method and checking the output. I found that setting up preconditions, invoking a method, and then waiting for the Actor to be idle before validating results worked fairly well. The Opee::Env.wait_finish() class method made this simple enough. One of the problems I encountered was an Actor stalling and reporting busy so that it never finished. Opee was tweaked a bit to reduce these problems, but testing in separate threads and collecting output when completed is more sensitive to stalls and never ending loops than single threaded programming.
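
A unit test following that pattern looks roughly like the sketch below. The require names, the Digester interface, and the result accessor are assumptions for illustration; Opee::Env.wait_finish() is the real wait call mentioned above.

    require 'test/unit'
    require 'opee'
    require 'odisk'   # require name assumed for illustration

    class DigesterTest < ::Test::Unit::TestCase
      def test_digest_directory
        # The Digester interface shown here is an assumption, not the actual oDisk API.
        digester = ::ODisk::Digester.new()
        digester.ask(:digest, '/tmp/odisk_test')   # asynchronous request to the Actor
        ::Opee::Env.wait_finish()                  # block until all Actors are idle
        digest = digester.last_digest              # hypothetical accessor for the result
        assert_equal('/tmp/odisk_test', digest.dir)
      end
    end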

Tests were developed as Actors were implemented. Sometimes tests were written first and sometimes classes came first. I got a little lazy after some of the base classes were ready and jumped into integration testing. Two local directories were set up along with a remote directory. A locally running sshd and sftpd were used to run the tests and verify the results without requiring a separate remote machine. That saved a lot of time.

[Figure: Unit Testing]

[Figure: Application Testing]

Use and Tuning

iDisk was shut off, so the development phase quickly jumped to an early release with enough functionality to perform a backup and sync. Using a system always uncovers missing requirements and features. oDisk was no different. Even with a partially completed system some things became apparent immediately. There were several releases with just changes in status and error messages. More informative output made tracking down problems more efficient.

With actual use it became clear that the most time was spent on IO. While the network locally in Japan is extremely fast, the bandwidth to the remote server was not as quick. This prompted the addition of options for setting the number of Copier and Crypter instances so oDisk could be tuned for the local machine and the bandwidth limitation. This was a trivial change due to the design and the nature of Opee.
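
Exposing those counts as command line options is straightforward. The sketch below uses Ruby's OptionParser with hypothetical flag names rather than the actual oDisk options.

    require 'optparse'

    # Hypothetical tuning flags; the actual oDisk option names may differ.
    options = { :copiers => 1, :crypters => 1 }
    OptionParser.new do |op|
      op.on('--copiers COUNT', Integer, 'number of Copier Actors to start')  { |n| options[:copiers] = n }
      op.on('--crypters COUNT', Integer, 'number of Crypter Actors to start') { |n| options[:crypters] = n }
    end.parse!(ARGV)

    # More Copiers help when the bandwidth allows parallel transfers; more
    # Crypters help when spare CPU cores are available for GnuPG.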

In addition to adding the missing features in future releases, some new requirements or nice to have features were identified. One would be the inclusion of filter patterns on file and directory names. A filter feature only needs to be added to the Digester. It will not be difficult to add.
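
Such filtering could be as simple as matching names with File.fnmatch as the Digester walks a directory. The patterns and the helper method below are hypothetical examples only.

    # Hypothetical exclusion patterns; the eventual oDisk filter option may differ.
    SKIP_PATTERNS = ['*.o', '*~', '.DS_Store']

    def skip?(name)
      SKIP_PATTERNS.any? { |pat| File.fnmatch(pat, name) }
    end

    # A Digester walking a directory would simply drop the matching entries.
    entries = Dir.entries('/tmp').reject { |name| name.start_with?('.') || skip?(name) }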

A more interesting feature would be a new Actor that provides real time progress while synchronization is going on. While this is not trivial, the progress reporting capability requires only a few additions to the current Actors. Most of the functionality would be in a new Progress Actor.

[Figure: Tuning]

Summary

oDisk development served both intended purposes. It provides an iDisk alternative and it provided some experience with the alternative threading model offered by Opee.

As an iDisk replacement oDisk is a base for future expansion and optimization. It is nice to have my data backed up and even better to have it compressed and encrypted.

The alternative Opee threading model was an interesting change in multi-threaded development. I am biased, but it feels cleaner and safer. It was nice to have the design validated by finding it easy to incorporate new features. A good part of that is due to the threading model itself. The built-in tracing for Opee Actors was tuned a little during oDisk development and proved to be very useful.

oDisk development will continue. Stay tuned for new features and improvements.