arangodb.com/2014/11/18/intro-pregel-module

Ever since Google introduced Pregel as a system for large-scale graph processing we thought of a way how to enable this feature in ArangoDB. So we set up an ArangoDB cluster, created some huge graphs and started evaluating the concept. We came up with a new ArangoDB module (called pregelRunner) that enables the user to write own algorithms, pass them to ArangoDB and have them executed in a Pregel-like fashion. This means that the user’s algorithm is executed step wise on each server in the cluster in parallel for all its local vertices. In each step the vertices can send messages to each other to distribute information. These messages can be received by the other vertex in the next step. The algorithm terminates when there are no more active vertices left and no message has been sent. We started to implement an experimental version of Pregel in ArangoDB. You need to check-out the pregel branch of ArangoDB in order to play with the following examples. Please be advised that the implementation is still in an early phase and very like to change. In this post we will provide a brief introduction to ArangoDB’s Pregel module by guiding you through the implementation of an example algorithm. An algorithm to detect “connected components” Our example algorithm is supposed to solve the connected components problem, which means it identifies all connected sub-graphs within a graph. So first of all we need a graph to work with … The graph We will generate a simple graph (called CountryGraph) containing 10 countries and a relation named hasBorderWith which forms 4 connected components: Germany, Austria, Switzerland Morocco, Algeria, Tunisia Brazil, Argentina, Uruguay Australia To import this graph save this file, open an ArangoShell and execute the following line: [crayon-546b95437d26f453705568/] The worker algorithm First of all the worker algorithm is executed on each vertex in the graph, a cursor on the messages for this vertex and a global object containing step information and user-defined global data. So our algorithm signature has to look like this: [crayon-546b95437d282271689795/] So how can an algorithm be designed that is supposed to detect all components? We will implement this in a very straight-forward fashion: Each component’s identifier is the alphabetically sorted last _key attribute of its vertices. So in step 0 each vertex stores its own key in the result and reaches out to its connected edges. Note that you can notaaa


Comments (0)

Sign in to post comments.