How to: Get up to speed and scale with Aerospike Graph on Google Cloud Marketplace

#gremlin #gcp #graph #tinkerpop

In June of this year, Aerospike put graph developers front and center with the announcement of a new, developer-ready, real-time scalable graph database. Aerospike Graph is a massively scalable, high-performance graph database that simplifies building and deploying applications for highly connected enterprise-scale datasets.

Aerospike Graph combines the unlimited scale, high throughput, and low latency of the Aerospike Database with the Aerospike Graph Service, an independently scalable graph compute engine built on Apache TinkerPop. Let’s take a closer look.

Figure 1: Aerospike Graph on Google Cloud enables developers to use the popular Gremlin query language while keeping graph compute and storage separately scalable to optimize costs.

Scalable storage

Scalable storage is provided by the Aerospike Database, which functions as the storage layer. Aerospike is a shared-nothing, multi-threaded, multi-model database designed to operate efficiently on a cluster of server nodes. Taking advantage of modern hardware and network architectures, Aerospike delivers high performance across petabytes of data using its patented Hybrid Memory Architecture™ (HMA), which stores data on flash devices (such as SSDs) and indexes in DRAM.

Scalable compute

Aerospike Graph Service (AGS) is the compute layer of Aerospike Graph. AGS is a deep integration of Apache TinkerPop and the Aerospike Database, offering a graph-native developer interface through the Gremlin query language.
AGS is purpose-built to maximize query performance through dozens of custom Gremlin steps, compiler optimization strategies, and a highly optimized graph encoding for the Aerospike Database. AGS nodes are stateless compute instances and can be elastically scaled to meet your application's throughput needs.

Unique, efficient architecture

Aerospike Graph’s unique architecture features independent scaling of compute and storage resources. This enables optimal resource utilization, consistently low-latency queries, and extreme throughput for complex multi-hop graph queries at scale. This translates to an industry-leading total cost of ownership, ensuring cost-effective operations without compromising performance.

This focus on performance, extreme scale, and high-efficiency operation provides a new proposition for companies looking to move demanding operational workloads to a graph database. Making that available on an elastically scalable cloud environment like Google Cloud provides a future-proof path for IT and development organizations to follow. Let’s see what it takes to get started.

Getting Started with Aerospike Graph on Google Cloud

To get started with Aerospike Graph on Google Cloud, follow these steps:

1. Locate Aerospike Database (BYOL) in the Cloud Marketplace. (Note: the other Aerospike Database listed is for Private Offers.)
   - In the Cloud Console navigation menu, click Marketplace.
   - Type Aerospike in the Search Marketplace bar and click on the highlighted option from the search results.
2. Click on Launch.
3. Follow the on-screen guide to configure and deploy your Aerospike cluster.
4. Once the VMs are provisioned, note the Internal IP of each Aerospike Database cluster node. You can do this from the Google Cloud UI by navigating to the VM instances section of Compute Engine in the navigation menu. (You will need this information later in Step 9.)
5. Create a Compute Optimized Compute Engine instance to host the Aerospike Graph Service.
6. Locate Aerospike Graph Service in the Cloud Marketplace through the Search bar.
7. Click on Show Pull Command. This will provide the commands to pull the Docker image from the Google Container Registry.
8. SSH into the GCE instance for Aerospike Graph Service and run the command in Step 7. This will pull the Docker image to your instance.
9. Click on the Get Started with Aerospike Graph Service button. This will provide you with instructions on how to connect AGS to the Aerospike cluster.
10. Follow the technical documentation linked in Step 9 for details on data loading, running Gremlin queries, and building your graph applications.
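
As a rough illustration of how the internal IPs noted earlier feed into the AGS container, here is a minimal Python sketch that assembles a docker run command line. The option names aerospike.client.host and aerospike.client.namespace, the namespace "test", and service port 3000 are assumptions, not values confirmed by this article, and the helper ags_docker_command is hypothetical. The image name is borrowed from the docker run example later in this post; the Marketplace pull command may give a different one. Treat the Get Started instructions in the Marketplace as the authority.

```python
import shlex


def ags_docker_command(seed_ips, namespace="test", port=3000,
                       image="aerospike/aerospike-graph-service"):
    """Build a `docker run` argument list pointing AGS at Aerospike nodes.

    The option names, namespace, and port are assumptions; confirm them
    against the instructions shown for your deployment.
    """
    # Join the internal IPs of the database nodes into one host list.
    hosts = ", ".join(f"{ip}:{port}" for ip in seed_ips)
    return [
        "docker", "run",
        "-p", "8182:8182",  # expose the Gremlin Server port
        "-e", f"aerospike.client.host={hosts}",
        "-e", f"aerospike.client.namespace={namespace}",
        image,
    ]


# Example with two made-up internal IPs; prints a shell-quoted command line.
print(shlex.join(ags_docker_command(["10.128.0.2", "10.128.0.3"])))
```

Building the command as a list keeps quoting explicit, so the same list can be passed to subprocess.run on the AGS instance or printed for copy and paste.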

Working with graph data

There are several options for connecting to and interacting with Aerospike Graph. Some possibilities are:

- The Gremlin Console, an interactive command-line terminal for sending queries and receiving responses.
- A client application. This page provides code samples for Python and Java client code.
- A Jupyter Notebook.

You can use a client application or the Gremlin Console to query an existing data set, or add new edges and vertices to a data set. You can view examples in the product documentation that demonstrate adding new vertices one at a time to a database. To bulk load new data into a Graph database, use the Graph bulk loader.
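
For a sense of what adding data from a client application might look like, here is a minimal sketch using the gremlinpython driver (assuming version 3.5 or later, which provides the snake_case step names). The endpoint ws://localhost:8182/gremlin assumes AGS is reachable on the default Gremlin Server port and path. The helpers connect and add_route are hypothetical names, and the airport, route, code, and dist names follow the air-routes schema rather than anything Aerospike-specific.

```python
def connect(url="ws://localhost:8182/gremlin"):
    """Open a remote traversal source against a running AGS instance."""
    # Imported here so add_route() can be tried without the driver installed.
    from gremlin_python.driver.driver_remote_connection import (
        DriverRemoteConnection,
    )
    from gremlin_python.process.anonymous_traversal import traversal

    conn = DriverRemoteConnection(url, "g")
    return traversal().with_remote(conn), conn


def add_route(g, src_code, dst_code, dist):
    """Add two airport vertices and one route edge in a single traversal."""
    return (
        g.add_v("airport").property("code", src_code).as_("src")
        .add_v("airport").property("code", dst_code).as_("dst")
        .add_e("route").from_("src").to("dst")
        .property("dist", dist)
        .iterate()
    )


# Usage against a live server (not executed here):
#   g, conn = connect()
#   add_route(g, "AUS", "DFW", 190)
#   conn.close()
```

Adding elements one traversal at a time like this suits small updates; for large data sets the bulk loader mentioned above is the better fit.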

Kelvin Lawrence has compiled and made public a data set containing information about airlines, airports around the world, and routes between them, designed for use with a graph database. The data set is large enough to be interesting and useful, but small enough to be practical for testing and experimentation purposes. To run the following examples, download one of the .graphml data files.

To use the .graphml file, you must bind its directory to a Docker volume. When you start the AGS Docker image, use the -v option to bind the local directory containing the .graphml file to a directory in the Docker container. For example, if you download the data file air-routes-small.graphml to the directory /home/user/data, start the AGS Docker image as follows:

docker run -p 8182:8182 -v /home/user/data/:/opt/air-routes/ aerospike/aerospike-graph-service

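You can use the Gremlin Console to load the air-routes data set into Aerospike Graph with the following command, which raises the evaluation timeout to 24 hours (expressed in milliseconds) so that a long-running load is not cut off: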

g.with("evaluationTimeout", 24L * 60L * 60L * 1000L).io("/opt/air-routes/air-routes-small.graphml").with(IO.reader, IO.graphml).read()
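
Once the load completes, a couple of quick queries can confirm that the data arrived. The following is a minimal gremlinpython sketch (assuming version 3.5 or later). The helper names airport_count and destinations are hypothetical, and the airport label, route edge label, and code property come from the air-routes data set. No expected results are shown because they depend on which .graphml file you loaded.

```python
def airport_count(g):
    """Return the number of vertices labelled 'airport'."""
    return g.V().has_label("airport").count().next()


def destinations(g, code, limit=5):
    """Return up to `limit` airport codes reachable by one route from `code`."""
    return (
        g.V().has("airport", "code", code)
        .out("route")
        .values("code")
        .limit(limit)
        .to_list()
    )


# Usage, assuming `g` is a gremlinpython traversal source connected to AGS,
# for example one built on DriverRemoteConnection("ws://localhost:8182/gremlin", "g"):
#   print(airport_count(g))
#   print(destinations(g, "AUS"))
```

Both helpers take the traversal source as a parameter, so the same functions work from a script or from a Jupyter Notebook cell.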

View the Aerospike Graph product documentation for more information.

DEV Community