Hello Storage!

A single machine can host a multitude of actors as long as they don't all have to be in memory at the same time, which means the idle ones have to be stored somewhere. Relational databases are the usual answer, but that's not the kind of storage that feels most natural for actors.

The interesting thing about actors is their autonomy, and it makes sense for the mode of persistence to honor this autonomy. If you have a great number of actors but they all store their data in the same database, you defeat the purpose of keeping them separate, because you have centralized them after all. If instead they can persist themselves individually, new opportunities for mobility and scalability appear.

To get the ball rolling, the storage actor must be activated and introduced. Unlike startTimer(), startStorage() needs to be told the root directory where it should save things. It also requires a closure, called the wakeup closure, which is executed on each actor the moment it has been recovered from file and reconstituted. Any kind of live initialization can be done there, such as setting up a timer callback.


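// dir is the root directory under which actors are saved
dx.startStorage(dir) {
    // wakeup closure: runs for each actor once it has been recovered; here 'it' refers to that actor
    println "wake up $it!"
}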

The persistence strategy is therefore to have actors persist themselves, but to keep that process well managed, the platform must be able to decide when the persistence happens. An obvious moment is an orderly shutdown of the dependency exchange, when every actor can be persisted.

It makes sense to persist an actor from time to time anyway just in case. For that we have the Storage actor, and its accompanying outside interface:


public interface StorageOutside {
    void save();
}

Like the timer interface, this one uses ID Decoration so that the storage actor knows who is doing the calling. A call to save() is a request for a save, to be executed as soon as possible.
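One way to picture ID Decoration is a thin wrapper, created per actor, that adds the caller's ID to every call before passing it on. The sketch below is an assumption about the idea, not the actual Groovy Actors implementation: StorageInside, RecordingStorage and decorate() are hypothetical names made up for the illustration.

```groovy
// Outside interface, as shown above: the caller does not pass its own ID.
interface StorageOutside {
    void save()
}

// Hypothetical inside interface: every request carries the caller's ID.
interface StorageInside {
    void save(String callerId)
}

// Hypothetical stand-in for the storage actor; it only records who asked.
class RecordingStorage implements StorageInside {
    List<String> requests = []
    void save(String callerId) { requests << callerId }
}

// Build an outside view for one actor: the ID is fixed when the wrapper is
// created, so the storage side always knows who is calling.
StorageOutside decorate(StorageInside inside, String callerId) {
    [save: { -> inside.save(callerId) }] as StorageOutside
}

def storage = new RecordingStorage()
def forAlice = decorate(storage, 'alice')
def forBob = decorate(storage, 'bob')

forAlice.save()
forBob.save()
forAlice.save()

assert storage.requests == ['alice', 'bob', 'alice']
```

Each actor only ever sees its own StorageOutside, so it cannot request a save on behalf of someone else.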

The more interesting part is how exactly the actor does the persisting, because that involves a choice. The actor is expected to be able to serialize its state and to recover its state from a serialized version. This is sometimes called marshalling/unmarshalling, but in Groovy Actors it's just called load() and save(), and these methods get called with an InputStream and an OutputStream respectively.


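import groovy.xml.MarkupBuilder

void load(is) {
    // parse the XML written by save() and restore the state from its attributes
    def storageactor = new XmlSlurper().parse(is)
    this.name = storageactor.@name.text()
}
void save(os) {
    // write the state as a single <storageactor name="..."/> element
    def writer = new OutputStreamWriter(os, 'UTF-8')
    def xml = new MarkupBuilder(writer)
    xml.storageactor(name: name)
    writer.flush()
}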

Input and output streams are nothing more than byte streams, so to give the bytes some meaning the actor has to use some serialization strategy. Java has its own serialization system, built into the platform, but one disadvantage of its format is that it's not human readable. Groovy, on the other hand, has some really interesting magic which makes it easy and tidy enough to do this by hand instead: Builders and GPath. Why use an unreadable binary format when you can use XML or JSON or something?
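For comparison, here is a minimal sketch of the same load()/save() pair using JSON instead of XML, with the groovy.json classes that ship with Groovy. JsonStorageActor and its visits field are hypothetical, made up to show a round trip through plain byte streams.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical actor, for illustration only; not part of Groovy Actors.
class JsonStorageActor {
    String name
    int visits

    // Serialize the state as a small, human-readable JSON object.
    void save(OutputStream os) {
        def json = JsonOutput.toJson([name: name, visits: visits])
        os.write(json.getBytes('UTF-8'))
        os.flush()
    }

    // Recover the state from the JSON written by save().
    void load(InputStream is) {
        def state = new JsonSlurper().parse(is, 'UTF-8')
        name = state.name
        visits = state.visits as int
    }
}

// Round trip through an in-memory byte stream.
def original = new JsonStorageActor(name: 'alice', visits: 3)
def bytes = new ByteArrayOutputStream()
original.save(bytes)

def restored = new JsonStorageActor()
restored.load(new ByteArrayInputStream(bytes.toByteArray()))

assert restored.name == 'alice'
assert restored.visits == 3
```

The actor decides what goes into the stream, so switching formats only touches these two methods.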

The persistence mechanism lets actors persist themselves and takes care of the underlying interaction with the file system. It stores files in a directory structure organized by the time at which they are saved, and keeps a single index file that refers to the latest version of each file. This process may need some refinement, but it seems to work quite well so far.
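To make the idea of time-based directories plus an index file concrete, here is a rough sketch. The layout is an assumption: the yyyy/MM/dd directories, the .dat file names, the index.properties file and the TimedStore class are all invented for the example and are not taken from the actual implementation.

```groovy
import java.nio.file.Files
import java.text.SimpleDateFormat

// Hypothetical store: <root>/<yyyy>/<MM>/<dd>/<actorId>-<millis>.dat
// plus <root>/index.properties mapping each actor ID to its latest file.
class TimedStore {
    File root

    File indexFile() { new File(root, 'index.properties') }

    Properties readIndex() {
        def index = new Properties()
        def f = indexFile()
        if (f.exists()) f.withInputStream { index.load(it) }
        index
    }

    // Write a new version; the writer closure receives the OutputStream.
    File save(String actorId, Date when, Closure writer) {
        def dir = new File(root, new SimpleDateFormat('yyyy/MM/dd').format(when))
        dir.mkdirs()
        def file = new File(dir, "${actorId}-${when.time}.dat")
        file.withOutputStream { writer(it) }

        // Point the index at the version that was just written.
        def index = readIndex()
        index.setProperty(actorId, root.toPath().relativize(file.toPath()).toString())
        indexFile().withOutputStream { index.store(it, 'latest version per actor') }
        file
    }

    // Read the latest version; the reader closure receives the InputStream.
    void load(String actorId, Closure reader) {
        def path = readIndex().getProperty(actorId)
        if (path == null) throw new FileNotFoundException("no saved state for $actorId")
        new File(root, path).withInputStream { reader(it) }
    }
}

def store = new TimedStore(root: Files.createTempDirectory('actors').toFile())
store.save('alice', new Date(1000L)) { out -> out.write('v1'.getBytes('UTF-8')) }
store.save('alice', new Date(2000L)) { out -> out.write('v2'.getBytes('UTF-8')) }

def latest = null
store.load('alice') { input -> latest = new String(input.bytes, 'UTF-8') }
assert latest == 'v2'
```

Older versions stay on disk in this sketch, and only the index decides which one counts as current.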

Keep in mind that if an actor can serialize and deserialize itself then persistence is not the only option. The load and save methods can be re-used just as easily in the context of transporting an actor from one host machine to another! Mobility is basically something you get for free, and it can be used as the basis of a very robust load-balancing system.
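The mobility argument can be sketched in a few lines, under some loud assumptions: MobileActor is a hypothetical actor with a JSON-based load()/save() pair, and a pair of piped streams stands in for the network connection between two hosts. Nothing here is the actual Groovy Actors transport.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical actor with the load()/save() pair described in the text.
class MobileActor {
    String name

    void save(OutputStream os) {
        os.write(JsonOutput.toJson([name: name]).getBytes('UTF-8'))
        os.flush()
    }

    void load(InputStream is) {
        name = new JsonSlurper().parse(is, 'UTF-8').name
    }
}

// The pipe plays the role of a connection between two machines.
def outbound = new PipedOutputStream()
def inbound = new PipedInputStream(outbound)

// "Receiving host": reconstitutes an actor from whatever arrives.
def arrived = new MobileActor()
def receiver = Thread.start {
    inbound.withStream { arrived.load(it) }
}

// "Sending host": the very same save() used for persistence, different stream.
def original = new MobileActor(name: 'traveller')
outbound.withStream { original.save(it) }   // withStream closes the pipe when done

receiver.join()
assert arrived.name == 'traveller'
```

The actor never learns whether its bytes end up in a file or on another machine, and that is where the free mobility comes from.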