Wednesday, September 23, 2015

Cassandra Native Business Intelligence

Native Analytics on DataStax Enterprise Eliminates the Need to Move Your Data

Originally posted on

DataStax Enterprise, built on Apache Cassandra, is synonymous for scalability and high availability while delivering the flexibility required for modern Web, mobile and IOT applications. Knowi provides a native Business Intelligence platform that enables data discovery, CQL query generation, analysis, multi-datasource joins, along with easily shareable, embeddable dashboards from data stored in DataStax Enterprise.

Analytics and Business Intelligence from data within DataStax Enterprise today generally means writing MapReduce and ETL processes to extract the relevant data out of DataStax Cassandra and moving them into a relational database, to provision a traditional BI tool to derive insights and reporting from it. This has significant drawbacks, since data integration, data modeling and database administration specialties require an excessive amount of engineering effort and coordination of different tools and teams.

Knowi offers a dramatically simpler path. By enabling queries on data stored in DataStax Enterprise and Cassandra directly with the ability to store and incrementally track the results to instant visualizations, enterprises can go from data to insights and react to data changes quickly without engineering efforts around MapReduce, ETL processes and relational database schema definitions and storage.

Let's take a 10-minute hands-on look at generating visual insights from a live DataStax Enterprise/Cassandra database (as well as some simple predictive analytics on it):

Macintosh HD:Users:gopalaj61:Dropbox:c9 Screenshots:Screen Shot 2015-05-23 at 11.18.32 AM.png

Overview of the Solution:
  • Eliminates complex ETL into relational databases for BI purposes.
  • Native integration with CQL (without shoehorning the data in relational ODBC form).
  • Data Discovery: Connect and discover keyspaces, tables, fields & key detection (primary partition, clustering and secondary index keys).
  • Query Generation: Automatically generate and explore CQL queries and aggregations using a visual interface.
  • Joins: Blend and join data across multiple DataStax Enterprise databases and other NoSQL and SQL datastores.
  • Optional seamless warehousing of query results with incremental tracking.
  • Advanced Analytics: aggregations, calculations, date bucketing, predictions ….
  • Instant Visualizations: Generate shareable, embeddable dashboards.
  • Connect from the Cloud or on-premise with cloud or on-premise deployment modes.

1. Getting Started: Go to and click on ‘Get Started’ button to sign up (it’s free to get started).

2. Connect: Click on Data sources from the left-hand settings menu, then select Datastax as shown in the animated GIF below. Use the default settings point to a live demo DataStax Enterprise database hosted by us.

Note: If you prefer to connect to your own DataStax Enterprise cluster and the database is inaccessible from outside your network, use our agent to facilitate secure connectivity to it.  

3. Queries & Insights: Now, let’s generate some insights from a demo dataset. This database contains a demo dataset of email campaign details; in the following example, we’ll determine the Total Sent, Delivered and Opened on a weekly basis from it, while showcasing the following along the way:
  • Table, Keyspace, Field and Key discovery
  • Auto-generate queries
  • Aggregations
  • Visualizations

Click on Configure Queries.
i. Expand the ‘Query Generator’ section to determine tables within this DataStax Enterprise Cassandra database, along with identifying partition keys, cluster keys and secondary index columns.
ii.  Select cloud9_demo as the database.
iii. CQL queries are automatically generated when you use the Query Generator. Click on Preview to immediately see a preview of all data in this table.
iv. Select Sent, Delivered and Opened from the metrics dropdown. Click on each to display aggregation options. Select Sum on each.
v. On the Dimensions option, select Date, then click on it to select Weekly aggregation.
Now we have the sum of Sent, Delivered and Opened on a weekly basis.

The auto-generated queries so far look like this:
select "date", "sent", "delivered", "opened"
from "cloud9_demo"
limit 10000

Cloud9QL: This is a post processor with a SQL like familiarity that enables easy aggregations and a range of more advanced analytics capabilities. It’s not a replacement for CQL, but rather complements it with a range of powerful Business Intelligence and reporting capabilities that CQL is not designed for. Learn more about Cloud9QL at

The generated Cloud9QL looks like the following, which processes the results returned from CQL to an aggregation of Sent, Delivered and Opened totals on a weekly basis.

select SUM(sent) as Sum of sent,
 SUM(delivered) as Sum of delivered,
 SUM(opened) as Sum of opened,
 WEEK(date) as Week of  date
group by WEEK(date)
vi.  Click on Preview to instantly visualize the results.

vii. Save the Results. The results can be either be seamlessly saved into our schemaless data warehouse for fast access (with auto-update and incremental query/upsert capabilities that are beyond the scope of this post), or executed against the datastore in real-time.  

4. Dashboard: Click on Dashboards. Drag & drop the newly created visualization into the dashboard. The dashboard can be easily shared, embedded and filtered on.
5. Predictions: A range of analytics capabilities are part of the Knowi platform, including seamless predictive analytics that backtests the data across a number of algorithms to automatically determine the best fit.

In the following example, we’ll put together predictions for the Messages sent on a monthly basis for the next 6 months.   
  1. Clone the widget.
  2. Add a Cloud9QL filter (see the GIF) with the following:
select predict(Sum of Sent,Week of  date, 10/01/2015, 1m, 6)
  1. Save. Our new widget will be updated with the predictions for 6 months starting from October.

In a few simple steps, we’ve derived insights and predictions from a DataStax Enterprise cluster with easily customizable, shareable and embeddable dashboards.                                                          

Instantly connect and derive insights from our demo DataStax Enterprise cluster:

Join DataStax Enterprise data with other data sources:

Post process CQL results to aggregate, date bucketing and other: