Experiment #2

Now that I've gotten my feet wet with Neo4J, I wanted to test its performance. I know Neo4J has a bulk load utility, but I need to know the limits of doing things myself. Plus, it's a learning exercise when you do things yourself. :-)

Here is the code:

/*Neo4J Performance Exercise
 * Chris Freyer
 * Chris@TheFreyers.net
 * July 18,2010
 */
package com.freyer.neo4j.test;

import java.util.Calendar;
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.index.IndexService;
import org.neo4j.index.lucene.LuceneIndexService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class Neo4JPerformanceExercise {

    //static constants
    private static final String DB_PATH = "neo4j-store";
    private static final String USERNAME_KEY = "username";
    private static final String ADDRESS_KEY = "address";
    private static final String CITY_KEY = "city";
    private static final String STATE_KEY = "state";
    private static final String ZIP_KEY = "zip";

    //class variables
    private static GraphDatabaseService graphDb;
    private static IndexService indexService;
    private static Node usersReferenceNode;
    private static long numUsers = 50000L;
    private static long numQueries = 100000L;
    private static Transaction tx;

    private static void addUsers() {
        // ADD USERS & INDEX THEM
        try {
            tx = graphDb.beginTx();
            for (long id = 0; id < numUsers; id++) {
                Node userNode = createAndIndexUser(idToUserName(id));
                usersReferenceNode.createRelationshipTo(userNode, RelTypes.USER);
                //use periodic commits to conserve RAM
                if (id % 5000 == 0) {
                    output("Created " + id + " users so far...");
                    tx.success();
                    tx.finish();
                    tx = graphDb.beginTx();
                }
            }
            tx.success();
        } catch (Exception e) {
            output("Exception creating users:\n" + e.getMessage());
            tx.failure();
            System.exit(2);
        } finally {
            tx.finish();
        }
        output("" + numUsers + " Users created");
    }

    private static void deleteUsers() {
        // DELETE USERS & REMOVE FROM INDEX
        output("Removing users...");
        try {
            tx = graphDb.beginTx();
            int i = 0;
            for (Relationship relationship : usersReferenceNode.getRelationships(RelTypes.USER, Direction.OUTGOING)) {
                Node user = relationship.getEndNode();
                indexService.removeIndex(user, USERNAME_KEY, user.getProperty(USERNAME_KEY));
                user.delete();
                relationship.delete();
                //use periodic commits to conserve RAM
                if (++i % 5000 == 0) {
                    output("Deleted " + i + "...");
                    tx.success();
                    tx.finish();
                    tx = graphDb.beginTx();
                }
            }
            tx.success();
        } catch (Exception e) {
            output("Exception deleting users:\n" + e.getMessage());
            tx.failure();
            System.exit(4);
        } finally {
            tx.finish();
        }
        output("done.");
    }

    private static void runQueries() {
        // Query many times...
        output("Querying users...");
        try {
            tx = graphDb.beginTx();
            for (int i = 0; i < numQueries; i++) {
                // Math.random() is in [0, 1), so this covers every id from 0 to numUsers - 1
                long idToFind = (long) (Math.random() * numUsers);
                Node foundUser = indexService.getSingleNode(USERNAME_KEY, idToUserName(idToFind));
                if (i % 10000 == 0) {
                    output("Queried " + i + " times...");
                    // periodic commits are disabled here because the queries only read
                    // tx.success();
                    // tx.finish();
                    // tx = graphDb.beginTx();
                }
            }
            tx.success();
        } catch (Exception e) {
            output("Exception querying users:\n" + e.getMessage());
            tx.failure();
            System.exit(3);
        } finally {
            tx.finish();
        }
        output("done.");
    }

    private static void setup() {
        // Startup with a reference node...

        graphDb = new EmbeddedGraphDatabase(DB_PATH);
        indexService = new LuceneIndexService(graphDb);
        registerShutdownHook();

        tx = graphDb.beginTx();
        try {
            usersReferenceNode = graphDb.createNode();
            graphDb.getReferenceNode().createRelationshipTo(usersReferenceNode, RelTypes.USERS_REFERENCE);
            //only mark the transaction successful if nothing was thrown
            tx.success();
        } catch (Exception e) {
            output("Exception setting up the program:\n" + e.getMessage());
            tx.failure();
            System.exit(1);
        } finally {
            tx.finish();
        }
    }

    private static void teardown() {
        //Delete the main user reference node
        tx = graphDb.beginTx();
        try {
            usersReferenceNode.getSingleRelationship(RelTypes.USERS_REFERENCE, Direction.INCOMING).delete();
            usersReferenceNode.delete();
            tx.success();
        } catch (Exception e) {
            output("Exception deleting reference node:\n" + e.getMessage());
            tx.failure();
            System.exit(5);
        } finally {
            tx.finish();
        }
        output("Shutting down database ...");
        shutdown();
    }

    private static void output(String value) {
        System.out.println(value);
    }

    private static enum RelTypes implements RelationshipType {

        USERS_REFERENCE, USER,
    }

    public static void main(final String[] args) {
        long starttime = Calendar.getInstance().getTimeInMillis();
        setup();
        addUsers();
        runQueries();
        deleteUsers();
        teardown();
        long endtime = Calendar.getInstance().getTimeInMillis();
        output("Runtime:  " + (endtime - starttime) + "ms.");
    }

    private static void shutdown() {
        indexService.shutdown();
        graphDb.shutdown();
    }

    private static String idToUserName(final long id) {
        return "user_" + id;
    }

    private static Node createAndIndexUser(final String username) {
        Node node = graphDb.createNode();
        node.setProperty(USERNAME_KEY, username);
        node.setProperty(ADDRESS_KEY, "1234 South Central Avenue");
        node.setProperty(CITY_KEY, "Atlantaville");
        node.setProperty(STATE_KEY, "Texiana");
        node.setProperty(ZIP_KEY, "12345");

        indexService.index(node, USERNAME_KEY, username);
        return node;
    }

    private static void registerShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                shutdown();
            }
        });
    }
}

If you look at the main() method, you'll see I calculate my own runtime. This is because the IDE (Eclipse, NetBeans) normally includes the JVM startup in its runtime total. My number still includes starting and shutting down the database, since both happen inside main().
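If you want to know where the time goes rather than just the total, the same do-it-yourself timing can be applied per phase. Here is a minimal sketch, assuming a hypothetical helper class called PhaseTimer (it is not part of the program above). It uses System.nanoTime(), which is meant for measuring elapsed time, instead of Calendar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original program.
final class PhaseTimer {
    // LinkedHashMap keeps the phases in the order they were run.
    private final Map<String, Long> millisByPhase = new LinkedHashMap<>();

    // Runs the given work and records its elapsed wall-clock time in milliseconds.
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            millisByPhase.put(phase, (System.nanoTime() - start) / 1_000_000L);
        }
    }

    Map<String, Long> results() {
        return millisByPhase;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Stand-in workload; in the real program this would be addUsers(), runQueries(), etc.
        timer.time("busy loop", () -> {
            long total = 0;
            for (int i = 0; i < 1_000_000; i++) {
                total += i;
            }
            if (total < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        });
        for (Map.Entry<String, Long> entry : timer.results().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() + "ms");
        }
    }
}
```

In main() above, each call such as addUsers() could be wrapped in timer.time("addUsers", ...) to get one number per phase.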

Here is the output of running the program:

run:
Created 0 users so far...
Created 5000 users so far...
Created 10000 users so far...
Created 15000 users so far...
Created 20000 users so far...
Created 25000 users so far...
Created 30000 users so far...
Created 35000 users so far...
Created 40000 users so far...
Created 45000 users so far...
50000 Users created
Querying users...
Queried 0 times...
Queried 10000 times...
Queried 20000 times...
Queried 30000 times...
Queried 40000 times...
Queried 50000 times...
Queried 60000 times...
Queried 70000 times...
Queried 80000 times...
Queried 90000 times...
done.
Removing users...
Deleted 5000...
Deleted 10000...
Deleted 15000...
Deleted 20000...
Deleted 25000...
Deleted 30000...
Deleted 35000...
Deleted 40000...
Deleted 45000...
Deleted 50000...
done.
Shutting down database ...
Runtime:  218234ms.

So in 3 minutes 38 seconds, I'm able to:

- start the database
- add 50,000 nodes, index them, and tie them to the reference node
- perform 100,000 random queries against an indexed property
- remove the same 50,000 nodes and relationships
- shut down the database
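As a quick check on the arithmetic, here is a small sketch that converts the reported 218234ms into minutes and seconds and works out a blended average per operation. The class and method names are made up for illustration, and the average is only a rough figure: it lumps adds, queries and deletes together and also includes database startup and shutdown.

```java
// Illustration only: turns the measured runtime into friendlier numbers.
final class RuntimeSummary {

    // Converts milliseconds to whole minutes and seconds (fractions of a second are dropped).
    static String toMinutesSeconds(long millis) {
        long totalSeconds = millis / 1000;
        return (totalSeconds / 60) + " min " + (totalSeconds % 60) + " s";
    }

    // Blended average across all operations, in milliseconds.
    static double averageMillisPerOperation(long millis, long operations) {
        return (double) millis / operations;
    }

    public static void main(String[] args) {
        long runtimeMillis = 218234L;                    // from the program output
        long operations = 50000L + 100000L + 50000L;     // adds + queries + deletes
        System.out.println(toMinutesSeconds(runtimeMillis));  // prints "3 min 38 s"
        System.out.println(averageMillisPerOperation(runtimeMillis, operations) + " ms per operation");
    }
}
```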

Overall, I'd say this is pretty good performance, especially considering it's running on my laptop (overworked, underpowered).

Points of Interest

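The concept of a reference node can be important. If each kind of node (i.e. each type of domain object) in a graph is attached to its own reference node, the data is effectively segmented by kind. That can pay off in performance, because a traversal that starts at a reference node only has to touch nodes of that kind rather than the whole graph.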
Relationships have a type and a direction. A type can be a structural relationship as mentioned above, or a domain-related one.
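The two points above can be sketched without the database at all. The classes below are an in-memory stand-in I made up for illustration, NOT the Neo4J API, and the "orders" reference node is a hypothetical second kind of domain object. The idea is only to show how following one relationship type from one reference node reaches just that kind of node.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for illustration only; these classes are NOT the Neo4J API.
final class ReferenceNodeSketch {

    enum RelType { USERS_REFERENCE, ORDERS_REFERENCE, USER, ORDER }

    static final class Rel {
        final RelType type;
        final Node end;

        Rel(RelType type, Node end) {
            this.type = type;
            this.end = end;
        }
    }

    static final class Node {
        final String name;
        final List<Rel> outgoing = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        // Creates a typed, directed relationship from this node to 'end'.
        void relateTo(Node end, RelType type) {
            outgoing.add(new Rel(type, end));
        }

        // Follows only the outgoing relationships of the given type.
        List<Node> follow(RelType type) {
            List<Node> ends = new ArrayList<>();
            for (Rel rel : outgoing) {
                if (rel.type == type) {
                    ends.add(rel.end);
                }
            }
            return ends;
        }
    }

    // Builds root -> users -> user_0..user_2 and root -> orders -> order_0..order_1.
    static Node buildGraph() {
        Node root = new Node("root");
        Node users = new Node("users");
        Node orders = new Node("orders");
        root.relateTo(users, RelType.USERS_REFERENCE);
        root.relateTo(orders, RelType.ORDERS_REFERENCE);
        for (int i = 0; i < 3; i++) {
            users.relateTo(new Node("user_" + i), RelType.USER);
        }
        for (int i = 0; i < 2; i++) {
            orders.relateTo(new Node("order_" + i), RelType.ORDER);
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = buildGraph();
        Node usersRef = root.follow(RelType.USERS_REFERENCE).get(0);
        // Only user nodes are visited; the order nodes are never touched.
        for (Node user : usersRef.follow(RelType.USER)) {
            System.out.println(user.name);
        }
    }
}
```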
Neo4J does not require a database schema to be prepared in advance. Any object persisted in code is accepted. This has several implications:
- Prototyping applications is much faster than with relational databases.
- Responsibility for database consistency is moved from the DBAs to the programmers.
- It's a very real possibility that old and new versions of objects will exist in the database simultaneously. Your code must be able to handle it.
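One way to cope with mixed versions is to read properties defensively. This is a rough sketch under stated assumptions: a plain Map stands in for a node's properties, and "country" is a hypothetical property that only newer user objects have. It is not taken from the program above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a plain Map stands in for a node's properties.
final class TolerantUserReader {
    static final String DEFAULT_COUNTRY = "unknown";

    // "country" is a hypothetical property added in a later version of the user object.
    static String describe(Map<String, Object> props) {
        Object username = props.get("username");
        if (username == null) {
            throw new IllegalArgumentException("username is required in every version");
        }
        // Older nodes were written before "country" existed, so fall back to a default.
        Object country = props.containsKey("country") ? props.get("country") : DEFAULT_COUNTRY;
        return username + " (" + country + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> oldUser = new HashMap<>();
        oldUser.put("username", "user_1");

        Map<String, Object> newUser = new HashMap<>();
        newUser.put("username", "user_2");
        newUser.put("country", "US");

        System.out.println(describe(oldUser)); // prints "user_1 (unknown)"
        System.out.println(describe(newUser)); // prints "user_2 (US)"
    }
}
```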
The IndexService is not a native part of the database. Developers must be certain to add and remove objects from the index during CRUD operations.
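One way to make that hard to forget is to put the store update and the index update in the same method, the way createAndIndexUser() does for inserts. Below is a minimal in-memory sketch of that pairing. The class is a made-up stand-in, not the Neo4J or IndexService API, and it assumes usernames are unique, as they are in the program above.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for illustration only; NOT the Neo4J or IndexService API.
final class IndexedUserStore {
    private final Map<Long, String> nodes = new HashMap<>();          // node id -> username
    private final Map<String, Long> usernameIndex = new HashMap<>();  // username -> node id
    private long nextId = 0;

    // Creates the "node" and indexes it in one step, so the two cannot drift apart.
    long create(String username) {
        long id = nextId++;
        nodes.put(id, username);
        usernameIndex.put(username, id);
        return id;
    }

    // Deletes the "node" and removes its index entry in one step.
    void delete(long id) {
        String username = nodes.remove(id);
        if (username != null) {
            usernameIndex.remove(username);
        }
    }

    // Returns the node id for the username, or null if it is not indexed.
    Long find(String username) {
        return usernameIndex.get(username);
    }

    int nodeCount() {
        return nodes.size();
    }

    int indexSize() {
        return usernameIndex.size();
    }

    public static void main(String[] args) {
        IndexedUserStore store = new IndexedUserStore();
        long id = store.create("user_0");
        System.out.println("found: " + store.find("user_0"));
        store.delete(id);
        System.out.println("after delete: " + store.find("user_0"));
    }
}
```

If the delete only removed the node, a later lookup by username would still return an id for something that no longer exists, which is the kind of inconsistency the database will not catch for you.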