DataStax Graph Fluent API ========================= The fluent API adds graph features to the core driver: * A TinkerPop GraphTraversalSource builder to execute traversals on a DSE cluster * The ability to execution traversal queries explicitly using execute_graph * GraphSON serializers for all DSE Graph types. * DSE Search predicates The Graph fluent API depends on Apache TinkerPop and is not installed by default. Make sure you have the Graph requirements are properly :ref:`installed `. You might be interested in reading the :doc:`DataStax Graph Getting Started documentation ` to understand the basics of creating a graph and its schema. Graph Traversal Queries ~~~~~~~~~~~~~~~~~~~~~~~ The driver provides :meth:`.Session.execute_graph`, which allows users to execute traversal query strings. Here is a simple example:: session.execute_graph("g.addV('genre').property('genreId', 1).property('name', 'Action').next();") Since graph queries can be very complex, working with strings is not very convenient and is hard to maintain. This fluent API allows you to build Gremlin traversals and write your graph queries directly in Python. These native traversal queries can be executed explicitly, with a `Session` object, or implicitly:: g = DseGraph.traversal_source(session=dse_session) g.addV('genre').property('genreId', 1).property('name', 'Action').next() # implicit execution caused by iterating over results for v in g.V().has('genre', 'name', 'Drama').in_('belongsTo').valueMap(): print(v) These :ref:`Python types ` are also supported transparently:: g.addV('person').property('name', 'Mike').property('birthday', datetime(1984, 3, 11)). \ property('house_yard', Polygon(((30, 10), (40, 40), (20, 40), (10, 20), (30, 10))) More readings about Gremlin: * `DataStax Drivers Fluent API `_ * `gremlin-python documentation `_ Configuring a Traversal Execution Profile ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The fluent api takes advantage of *configuration profiles* to allow different execution configurations for the various query handlers. Graph traversal execution requires a custom execution profile to enable Gremlin-bytecode as query language. Here is how to accomplish this configuration: .. code-block:: python from cassandra.cluster import Cluster, EXEC_PROFILE_GRAPH_DEFAULT from cassandra.datastax.graph.fluent import DseGraph ep = DseGraph.create_execution_profile('graph_name') cluster = Cluster(execution_profiles={EXEC_PROFILE_GRAPH_DEFAULT: ep}) session = cluster.connect() g = DseGraph.traversal_source(session) # Build the GraphTraversalSource print g.V().toList() # Traverse the Graph Note that the execution profile created with :meth:`DseGraph.create_execution_profile <.datastax.graph.fluent.DseGraph.create_execution_profile>` cannot be used for any groovy string queries. If you want to change execution property defaults, please see the :doc:`Execution Profile documentation ` for a more generalized discussion of the API. Graph traversal queries use the same execution profile defined for DSE graph. If you need to change the default properties, please refer to the :doc:`DSE Graph query documentation page ` Explicit Graph Traversal Execution with a DSE Session ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Traversal queries can be executed explicitly using `session.execute_graph` or `session.execute_graph_async`. These functions return results as DSE graph types. If you are familiar with DSE queries or need async execution, you might prefer that way. Below is an example of explicit execution. For this example, assume the schema has been generated as above: .. code-block:: python from cassandra.datastax.graph.fluent import DseGraph from pprint import pprint # create a tinkerpop graphson2 ExecutionProfile ep = DseGraph.create_execution_profile('graph_name') cluster = Cluster(execution_profiles={EXEC_PROFILE_GRAPH_DEFAULT: ep}) session = cluster.connect() g = DseGraph.traversal_source(session=session) addV_query = DseGraph.query_from_traversal( g.addV('genre').property('genreId', 1).property('name', 'Action') ) v_query = DseGraph.query_from_traversal(g.V()) for result in session.execute_graph(addV_query): pprint(result.value) for result in session.execute_graph(v_query): pprint(result.value) Implicit Graph Traversal Execution with TinkerPop ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Using the :class:`DseGraph <.datastax.graph.fluent.DseGraph>` class, you can build a GraphTraversalSource that will execute queries on a DSE session without explicitly passing anything to that session. We call this *implicit execution* because the `Session` is not explicitly involved. Everything is managed internally by TinkerPop while traversing the graph and the results are TinkerPop types as well. Synchronous Example ------------------- .. code-block:: python # Build the GraphTraversalSource g = DseGraph.traversal_source(session) # implicitly execute the query by traversing the TraversalSource g.addV('genre').property('genreId', 1).property('name', 'Action').next() # blocks until the query is completed and return the results results = g.V().toList() pprint(results) Asynchronous Exemple -------------------- You can execute a graph traversal query asynchronously by using `.promise()`. It returns a python `Future `_. .. code-block:: python # Build the GraphTraversalSource g = DseGraph.traversal_source(session) # implicitly execute the query by traversing the TraversalSource g.addV('genre').property('genreId', 1).property('name', 'Action').next() # not async # get a future and wait future = g.V().promise() results = list(future.result()) pprint(results) # or set a callback def cb(f): results = list(f.result()) pprint(results) future = g.V().promise() future.add_done_callback(cb) # do other stuff... Specify the Execution Profile explicitly ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If you don't want to change the default graph execution profile (`EXEC_PROFILE_GRAPH_DEFAULT`), you can register a new one as usual and use it explicitly. Here is an example: .. code-block:: python from cassandra.cluster import Cluster from cassandra.datastax.graph.fluent import DseGraph cluster = Cluster() ep = DseGraph.create_execution_profile('graph_name') cluster.add_execution_profile('graph_traversal', ep) session = cluster.connect() g = DseGraph.traversal_source() query = DseGraph.query_from_traversal(g.V()) session.execute_graph(query, execution_profile='graph_traversal') You can also create multiple GraphTraversalSources and use them with the same execution profile (for different graphs): .. code-block:: python g_movies = DseGraph.traversal_source(session, graph_name='movies', ep) g_series = DseGraph.traversal_source(session, graph_name='series', ep) print g_movies.V().toList() # Traverse the movies Graph print g_series.V().toList() # Traverse the series Graph Batch Queries ~~~~~~~~~~~~~ DSE Graph supports batch queries using a :class:`TraversalBatch <.datastax.graph.fluent.query.TraversalBatch>` object instantiated with :meth:`DseGraph.batch <.datastax.graph.fluent.DseGraph.batch>`. A :class:`TraversalBatch <.datastax.graph.fluent.query.TraversalBatch>` allows you to execute multiple graph traversals in a single atomic transaction. A traversal batch is executed with :meth:`.Session.execute_graph` or using :meth:`TraversalBatch.execute <.datastax.graph.fluent.query.TraversalBatch.execute>` if bounded to a DSE session. Either way you choose to execute the traversal batch, you need to configure the execution profile accordingly. Here is a example:: from cassandra.cluster import Cluster from cassandra.datastax.graph.fluent import DseGraph ep = DseGraph.create_execution_profile('graph_name') cluster = Cluster(execution_profiles={'graphson2': ep}) session = cluster.connect() g = DseGraph.traversal_source() To execute the batch using :meth:`.Session.execute_graph`, you need to convert the batch to a GraphStatement:: batch = DseGraph.batch() batch.add( g.addV('genre').property('genreId', 1).property('name', 'Action')) batch.add( g.addV('genre').property('genreId', 2).property('name', 'Drama')) # Don't use `.next()` with a batch graph_statement = batch.as_graph_statement() graph_statement.is_idempotent = True # configure any Statement parameters if needed... session.execute_graph(graph_statement, execution_profile='graphson2') To execute the batch using :meth:`TraversalBatch.execute <.datastax.graph.fluent.query.TraversalBatch.execute>`, you need to bound the batch to a DSE session:: batch = DseGraph.batch(session, 'graphson2') # bound the session and execution profile batch.add( g.addV('genre').property('genreId', 1).property('name', 'Action')) batch.add( g.addV('genre').property('genreId', 2).property('name', 'Drama')) # Don't use `.next()` with a batch batch.execute() DSL (Domain Specific Languages) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DSL are very useful to write better domain-specific APIs and avoiding code duplication. Let's say we have a graph of `People` and we produce a lot of statistics based on age. All graph traversal queries of our application would look like:: g.V().hasLabel("people").has("age", P.gt(21))... which is not really verbose and quite annoying to repeat in a code base. Let's create a DSL:: from gremlin_python.process.graph_traversal import GraphTraversal, GraphTraversalSource class MyAppTraversal(GraphTraversal): def younger_than(self, age): return self.has("age", P.lt(age)) def older_than(self, age): return self.has("age", P.gt(age)) class MyAppTraversalSource(GraphTraversalSource): def __init__(self, *args, **kwargs): super(MyAppTraversalSource, self).__init__(*args, **kwargs) self.graph_traversal = MyAppTraversal def people(self): return self.get_graph_traversal().V().hasLabel("people") Now, we can use our DSL that is a lot cleaner:: from cassandra.datastax.graph.fluent import DseGraph # ... g = DseGraph.traversal_source(session=session, traversal_class=MyAppTraversalsource) g.people().younger_than(21)... g.people().older_than(30)... To see a more complete example of DSL, see the `Python killrvideo DSL app `_ Search ~~~~~~ DSE Graph can use search indexes that take advantage of DSE Search functionality for efficient traversal queries. Here are the list of additional search predicates: Text tokenization: * :meth:`token <.datastax.graph.fluent.predicates.Search.token>` * :meth:`token_prefix <.datastax.graph.fluent.predicates.Search.token_prefix>` * :meth:`token_regex <.datastax.graph.fluent.predicates.Search.token_regex>` * :meth:`token_fuzzy <.datastax.graph.fluent.predicates.Search.token_fuzzy>` Text match: * :meth:`prefix <.datastax.graph.fluent.predicates.Search.prefix>` * :meth:`regex <.datastax.graph.fluent.predicates.Search.regex>` * :meth:`fuzzy <.datastax.graph.fluent.predicates.Search.fuzzy>` * :meth:`phrase <.datastax.graph.fluent.predicates.Search.phrase>` Geo: * :meth:`inside <.datastax.graph.fluent.predicates.Geo.inside>` Create search indexes --------------------- For text tokenization: .. code-block:: python s.execute_graph("schema.vertexLabel('my_vertex_label').index('search').search().by('text_field').asText().add()") For text match: .. code-block:: python s.execute_graph("schema.vertexLabel('my_vertex_label').index('search').search().by('text_field').asString().add()") For geospatial: You can create a geospatial index on Point and LineString fields. .. code-block:: python s.execute_graph("schema.vertexLabel('my_vertex_label').index('search').search().by('point_field').add()") Using search indexes -------------------- Token: .. code-block:: python from cassandra.datastax.graph.fluent.predicates import Search # ... g = DseGraph.traversal_source() query = DseGraph.query_from_traversal( g.V().has('my_vertex_label','text_field', Search.token_regex('Hello.+World')).values('text_field')) session.execute_graph(query) Text: .. code-block:: python from cassandra.datastax.graph.fluent.predicates import Search # ... g = DseGraph.traversal_source() query = DseGraph.query_from_traversal( g.V().has('my_vertex_label','text_field', Search.prefix('Hello')).values('text_field')) session.execute_graph(query) Geospatial: .. code-block:: python from cassandra.datastax.graph.fluent.predicates import Geo from cassandra.util import Distance # ... g = DseGraph.traversal_source() query = DseGraph.query_from_traversal( g.V().has('my_vertex_label','point_field', Geo.inside(Distance(46, 71, 100)).values('point_field')) session.execute_graph(query) For more details, please refer to the official `DSE Search Indexes Documentation `_