Tuesday, August 25, 2015

New BI Paradigm

We are in the midst of a fundamental shift in how data is stored, with rapid adoption of NoSQL and other storage technologies for storing unstructured and semi-structured data.

With an estimated 80% of enterprise data in unstructured/semi-structured form, enterprises are increasingly adopting “Polyglot Persistence” architecture, which is a fancy term for using different database technologies to handle unique persistence needs. In practice, this means traditional SQL based relational databases for structured data, in conjunction with newer NoSQL databases and file based sources for unstructured/semi-structured data.

So, lots of great options on how to store the data. But, when it comes to unlocking the value of this data across heterogeneous sources, there are some significant challenges given that traditional Business Intelligence solutions are built for structured data.

The standard approach to BI looks like this:

1. Extract, Transform & Load (ETL): Relevant data across heterogenous sources is typically stored into a data warehouse, but first step in that process is the dreaded ETL process. This usually entails:
a)  A team to write custom scripts to ETL data across various SQL based data sources into the predefined schema/warehouse

b)  Develop another set of ETL processes to pull relevant data from NoSQL databases, and yet another for files and shoehorn that data into a relational structure

2.  Data Warehouse: Generally a SQL based database, this requires schema definitions mapped upfront that determine what gets stored where and a dedicated database team to manage it. For any new metric or changes to existing metrics means changes to the ETL layer as well as the warehousing tier. 

3.  BI Tool: Once the data is in a structured form in the warehouse, a BI tool comes into play. With traditional BI tools, this means:
a)  Desktop licenses for use by report creators
b)  Server licenses with IT involvement for publishing and sharing the reports

Each stage of this process involves specific product suites along with specialized engineering and IT resources. For example, for the warehousing layer, typical choices include a custom SQL database or Amazon Redshift, Pentaho/Alteryx for ETL and data processing, Tableau/Qlik for data visualization, etc.

The problems with this approach are that the path from data to insights is far too cumbersome, brittle and expensive and takes a significant amount of time, resources and skills before any tangible results are in sight. Given that a database schema has to be designed upfront, every new metric or modifications to existing metrics requires BI and engineering touch points, thus increasing the time and effort to obtain insights from the data, impacting the organization’s ability to react to data changes quickly.

Cloud9 Charts takes a different approach. Specifically built for modern unstructured and structured data, we dramatically simplify every step to go from data to insights seamlessly. 
Let’s contrast the above with an example below using a screenshot gif:


The example sets up connectivity to MongoDB and MySQL, auto-generates queries against MongoDB and MySQL and performs an inner join on them (this example determines the sent message volume by customer and combines it with the address for each customer from a MySQL database) and lastly, creates a shareable, embeddable dashboard from the data.

 Key things to note: 

-  We’ve obtained insights from data within MongoDB, then combined it with data from MySQL in just a few clicks. No cumbersome data cleanup, complex ETL processing or data warehousing setup/management.

The query results are seamlessly stored into a flexible, scalable data lake/warehouse, without the need to pre-define schema or write complex ETL jobs. The warehouse provides fast access to the results, store multi-structured data, provides schema-on-read and SQL like capabilities and incrementally track the results. Reporting and dashboards can be driven off this warehouse, shielding your production systems against reporting workloads.

- While the warehousing scheme provides significant benefits, there may be cases where you need to execute queries & visualization directly against the source database, which is also supported in a single click.  

-  Dashboard creation, sharing, etc. are enabled right inside the portal, eliminating the need for multiple product suites with one for data discovery and report creation and yet another for sharing.

-  A range of sophisticated ways to query, aggregate and pipeline the data is built in (for example: run a query against 1000’s of databases and merge the results, query capabilities on logs/JSON/XML, plug-in custom machine learning algorithms, etc.)

By quickly connecting the dots from the raw source to insights in an integrated platform, we not only enable teams to make driven decisions across all their data sources, but also react to data changes quickly.

Monday, August 24, 2015

ElasticSearch Business Intelligence & Reporting

Announcing ElasticSearch Integration

Due to popular customer demand, Cloud9 Charts is happy to announce native Business Intelligence support for ElasticSearch, a leading open source full text search engine with a JSON based schema-free persistence store.  

This integration combines ElasticSearch query and search capabilities to easily produce shareable visual insights quickly. In addition, for the very first time, our join capabilities provide unified insights into ElasticSearch as well other NoSQL and SQL based datasources, eliminating cumbersome engineering and heavy ETL processes into relational databases.  

To get started, visit our ElasticSearch page, which is preconfigured with a demo ElasticSearch cluster settings to get started.

In the following example, we’ll walk through the following:
  • Connect to a demo ElasticSearch cluster
  • Execute queries against it
  • Join the results with some data in MySQL
  • Visualize, track and share the results

 Connect:
The following animated gif shows how to connect to our ElasticSearch cluster. 

(Note: If the cluster is behind your firewall, please use our agent)





Let's also set up connectivity to our demo MySQL datasource using the default settings -  that we’ll use for Joins later.

Query:
Once the datasource is setup, click on the Configure Queries link:
  1. Specify ElasticSearch JSON query. In our example, let's search for matches for all documents that contains Transactional.
  2.   {
         "size":1000,
         "query": {
             "query_string": {
                 "query": "Transactional*"
            }
         }
      }
    
  3. Click on Preview to return all documents for that query. This returns the first 1000 matches. 

  4. Let's add some Cloud9QL to post process the data further. Note that the Cloud9QL is optional, and is not a replacement to ElasticSearch query - it provides a powerful way to complement ElasticSearch query to transform the results. In the following example, let's transform the results further to get the sum of the sent field, grouped by customer. 
  5. select sum(sent) as Sent, customer 
    group by customer
    Click on Preview to immediately see the updated results.

  6. Joins: Now that we have some basic data from ElasticSearch, say that we need to join this data with customer details that exists in a MySQL database. 
    • Click on Join --> Select the new MySQL datasource --> Under Query Generator, select the customer table
    • Add customer=customer in the Join Fields box. This implies that customer fields from the ElasticSearch query will be used to lookup (inner join) against customer field in the MySQL query.
            More on Joins: https://www.cloud9charts.com/docs/multi-datasource-joins.html
  
            Click on Preview.


The results now include customer details from MySQL combined together with the data from ElasticSearch.

5. Click on 'Save' to save it, then add it to a dashboard to easily share and embed it.  





Thursday, August 13, 2015

New Visualization Options



Several new visualization types have been added into our visualization portfolio. These new additions make 19 different formats that allow our customers to bring in a brand new perspective on the data.
The new visualizations include:
Geo- marker pins




Geo Marker Heatmap renders circles at specified locations on the geo chart, with the color and size that you specify. When hovering above a specific marker it will magnify and show data that you want displayed. Drilldowns can setup to drill into the raw data of each of these widgets as well.


Examples: Usage distribution across cities and areas, with a drilldown into each city


Geo pins: Simple Google map pins based on address.

 


Along with the Geo – Markers/heat maps you are able to change the map type displayed into 4 new styles: satellite, terrain, roadmap, or hybrid. You are also able to change the primary region you would like to focus on, enter in specific coordinates for an exact address, and magnification level.


Data Grid Heat Map 


The data grid heat map allows users to view the data in multi-dimensions. You are able to track your numbers while comparing them across multiple sources and a specific baseline value set.

Common Use Cases: Cohort Analysis, Quickly identifying areas of strengths and weaknesses across dimensions. 

Bubble




The Bubble chart is a variation of a scatter plot, but the bubble chart has an added dimension; the size of the bubbles. The X and Y axes are both value axes, and the Z axes is a size value. Bubble charts are useful to use when three data series that each contain a set of values.


Example: Deficit by GDP, by country, with the size of the bubble indicated by GDP.




Gauge/goal:



Enables tracking/progress of goals. Color options can be customized for start/mid/finish.


Example: Sales goals for the quarter.




Scatter Plot:




A Scatter plot display values for two variables for a set of data, since we allow the points to be color-coded you can increase the number of displayed variables to three.


For example a company could plot the amount of transactional emails sent vs the amount of marketing emails sent over a given time period (as shown above).


        

Spline:




A spline graph is similar to a line graph; however it possesses a much higher degree of smoothness where the data points connect. The chart calculates a suitable smooth curve between any pair of points which leads to advantages in simplicity and clarity.


Example: Product purchases by type by time.





Along with the new additions, Cloud9 Charts is already offering 12 existing visualizations.




Stacked Column:                                                                    Area:








Bar:                                                                                            Column:






















Funnel:                                                                             Data Grid:   











Data Grid Summary:                                                       Geo - Regions:














Geo – Heat Map:                                                                  Line:
















Pie:                                                                                              Stacked Area: