Wednesday, April 19, 2017

We've Got Some Exciting News: New Machine Learning Capabilities and a New Name!


We have a couple of exciting updates.

First, our latest release combines all of the power of our Business Intelligence platform with Artificial Intelligence. With this release, you can blend hindsight and foresight to drive actions from your data. It also enables you to offer value-added machine learning capabilities to your customers in embedded use cases.

Second, we are rebranding Cloud9 Charts to Knowi. With the latest product updates, it was the right time to pull the trigger on the name change that we’ve been mulling over for some time.


In a short period, we’ve built a leadership position in NoSQL/Polyglot analytics, with customers ranging from Fortune 500 companies to startups. Now we are aiming even higher. With Knowi at the top of your modern data stack, you can blend SQL and NoSQL data and integrate machine learning to deliver smarter data applications.

If you’d like us to go over the Machine Learning capabilities, please contact us. They are included as part of your service during the beta period for the next four months.

Thank you from all of us for being part of our journey thus far...we look forward to many more years to come.

Onwards & Upwards!

Jay Gopalakrishnan
Founder and CEO, Knowi

Sunday, April 16, 2017

4 Reasons Why Your BI Tool Is Preventing You from Becoming Data-Driven

Becoming data-driven company-wide, where data is integrated into the decision-making process at all levels of the organization because employees, partners, and customers can access the right data at the right time, is often the ultimate goal when implementing an analytics solution.

There are dozens of BI tools, from Tableau to Qlik to more modern tools like Looker, that promise to help achieve this goal. However, the path is often overwhelmingly difficult because of hidden barriers to implementation and adoption that only reveal themselves as you try to implement. What many don’t realize is that the most common issues are not on the front-end, where end users enjoy high-quality experiences with most traditional BI tools, but rather on the back-end. Here data engineers and IT developers are knitting together complex data platforms to move and manipulate unstructured data back into structured tables and get data prepped just so these BI tools will work.

In recent years, we’ve seen a fast and furious evolution of data, particularly in the emergence of Big Data and IoT technologies. Not so long ago, most data used for analytics lived inside your four walls in well-understood systems and was always structured. Data warehouses provided a single source for analytics, and business analysts happily built retrospective visualizations and reports using day-old data.

Then came Big Data with a whole new world of real-time insights into customer sentiment or behavior tracking. Now, looking at yesterday’s data was so, well, yesterday. The new face of business intelligence was real-time dashboards built using data from mostly external sources and housed in multiple data stores. As the data evolution progresses further into IoT, data only gets bigger and faster and is almost always unstructured or multi-structured.

However, virtually every BI tool is stuck in the age of structured data. Modern data analytics blends structured, unstructured and multi-structured data together to glean insights. If your organization cannot leverage NoSQL data, then how do you integrate that data into your decision-making process? These traditional SQL-friendly BI tools tell you they support NoSQL, but there is a catch, and that catch permeates everything else, putting your data-driven dreams at risk. Here are four reasons why:

They are not NoSQL-friendly

No data lives in a silo, and one of the significant barriers to fully leveraging your data with traditional BI tools like Tableau, Qlik or Looker is that they only understand structured data. To use NoSQL data with any SQL-based BI tool, you are:
  1. Writing and maintaining custom extract queries using proprietary query languages provided by the NoSQL database that someone must learn
  2. Installing proprietary ODBC drivers for each NoSQL database
  3. Using batch ETL processes to move unstructured NoSQL data into relational tables
  4. Doing analytics on unstructured data using different data discovery tools and then doing ETL
  5. Dumping everything into Hadoop or a data lake to clean and prep it, and eventually moving it to a traditional data warehouse
  6. Using some mash-up of the above
In all cases, schemas must be defined, extract queries must be written and unstructured data must be shoehorned back into relational tables. Congrats, you’ve just taken your beautiful modern NoSQL data and made it old again!

All kidding aside, this is the main barrier to becoming data-driven using your traditional BI tool.  If you cannot natively integrate your analytics platform to your modern data stack, then you simply cannot fully leverage your enterprise data.
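
To make the "shoehorning" step concrete, here is a minimal Java sketch of what flattening a nested document into a single relational-style row can look like. The class name, the underscore-joined column names and the sample data are assumptions made for illustration; this is not taken from any particular BI tool or ETL product, and it ignores arrays, which usually need separate child tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: flattens nested key/value data into one flat row.
final class DocumentFlattener {

    // Returns a flat map whose keys play the role of relational column names.
    static Map<String, Object> flatten(Map<String, Object> document) {
        Map<String, Object> row = new LinkedHashMap<>();
        flattenInto("", document, row);
        return row;
    }

    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> row) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            // Assumption: nested keys are joined with "_" to form a column name.
            String column = prefix.isEmpty() ? entry.getKey() : prefix + "_" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(column, child, row); // recurse into the sub-document
            } else {
                row.put(column, value); // leaf value becomes one column
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> geo = new LinkedHashMap<>();
        geo.put("city", "Plymouth");
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "matt");
        user.put("geo", geo);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 7);
        doc.put("user", user);
        System.out.println(flatten(doc));
    }
}
```

Even in this toy version, the column list has to be decided by the flattening code rather than by the data, which is the schema-first constraint described above.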

Ex.  MongoDB BI Connector Moves Data From MongoDB to MySQL

They are slow to adapt and lack intelligent actions

The bottom line is that the way people interact with analytics in data-driven enterprises is not “one-size-fits-all.” You have a diverse set of users with differing experience expectations and use case complexity levels, all of which must be managed to achieve company-wide adoption.

For example, not surprisingly, many people hate looking at data. They don’t wake up every morning excited about looking at dashboards and drilling down to try to figure out why an application, a market, a region or a product is not performing like it did yesterday. They don’t think trying to find the needle in the haystack is a good use of their time. Instead, they want their analytics platform to tell them where the problem is, why it has happened, whether it is likely to happen again and, in some cases, to automatically trigger what to do next. At the same time, not everyone is looking for a needle; some are just looking for the haystack. For this group, a shareable or embeddable dashboard works great.

Then you have the citizen business analyst who just wants to ask a question using Slack or even Siri and have the right dashboard or report appear. 

Traditional BI tools were built with the purpose of handling the “haystack” scenario but are falling behind in helping people find the “needle.”  The “needle” challenge requires more advanced analytics capabilities.  In many cases, this means predictive analytics with integrated machine learning.  Additionally, natural language processing (NLP) interfaces are emerging as alternatives to embedded dashboards to help empower non-technical users with simple analytics needs like “show me the sales for today at store 123”.

Many traditional BI tools lack the ability to adapt to these emerging modern analytics requirements. Coupled with their lack of native integration with modern analytics data stacks, this means other solutions are acquired to manage these different use cases and user experience requirements. As a result, instead of one analytics solution, most organizations have multiple solutions which operate in data silos to solve very specific use cases using a subset of the enterprise’s data. The proliferation of analytics solutions is hardly a recipe for becoming a data-driven enterprise.

Not as data democratic as you think

By having to move and manipulate your unstructured data back into relational tables, an artificial wall is built between business users and all available enterprise data. Expensive IT developers must be involved in virtually every project to integrate NoSQL data. This adds months to projects, makes changes very expensive and increases the overall cost of data. Arguably, the real cost is the loss of understanding of the value of newer NoSQL data sources. The business ends up so far away from the original data that it is hard for them to know what questions are possible to ask and therefore to realize the value of newer data. As a result, the questions instead become centered around the cost of acquiring and storing NoSQL data, and that is where many Big Data initiatives start to fail.

Not Big Data scalable

As mentioned earlier, data is only getting bigger and faster as IoT analytics hits the mainstream. When ETL or ODBC drivers are used to move and manipulate NoSQL data into relational tables, data limits are also added to prevent these processes from failing at volume. Typically, record retrieval limits or aggregated data sets are employed to combat these performance issues. A separate data discovery process is used to determine what data to retrieve or what aggregations to create. Data discovery, in this case, requires a different tool or custom coding and is almost always done in IT with input from the business. From a process, technology and people perspective, that is just not scalable or sustainable when it comes to leveraging Big Data to become data-driven. There are simply too many restrictions on data and too many moving parts for the performance to meet business needs.

These traditional BI tools claim to reduce data silos, reduce time to insights and enable data discovery across the enterprise.  However, our customers repeatedly tell us they fail at achieving this when it comes to integrating NoSQL data into their business analytics. 

To be data-driven requires a modern analytics platform that enables data discovery across your modern data stack, with support for descriptive, predictive and prescriptive analytics to derive insights and drive actions in a way that supports the specific needs of a diverse end-user community and their use cases.
________

Cloud9 Charts is a modern analytics platform that has already enabled dozens of organizations, large and small, to become data-driven. We natively integrate NoSQL and SQL data sources and enable multi-data source joins for seamless blending of structured, unstructured and multi-structured data without the need to move it. You can share or embed over 30 different visualizations or use our advanced analytics capabilities to automate alerts and actions.


In the coming days, we have an exciting release announcement that will bring you one big step closer to achieving your goal of becoming a data-driven enterprise using a single analytics platform.  Stay tuned…

Tuesday, April 11, 2017

Twitter Analysis Using MongoDB Atlas and Cloud9 Charts

GUEST POST: Matthew Plummer, Plymouth University U.K.

This project is the product of a final-year honors project by student Matthew Plummer from Plymouth University U.K. At the core of this project is sentiment analysis of large data sets; more specifically, sentiment analysis of social network platforms, focusing mainly on Twitter. There are multiple parts to this project, including the Twitter API, Java, a NoSQL database storage model using MongoDB and data visualization using Cloud9 Charts.


Why carry out sentiment analysis on social networks?
It is widely known that online social networking has grown rapidly in past years and will continue to do so in the years to come. The volume of data produced via social media by its users is growing at a phenomenal rate, with one report (PaCroffre, 2012) predicting that by the year 2020 data volumes on social media platforms will increase by a staggering factor of 44.


These vast volumes of data provide hugely beneficial insight into the public's opinion on certain topic areas and/or events. By recognizing this source of information and taking this opportunity, businesses could benefit greatly in many respects from analyzing this data. This is where sentiment analysis comes into play: by analyzing the sentiment of tweets on specific topic areas or products, selected using keywords, phrases or hashtags, valuable information can be extracted and used to make business decisions.


System architecture
As stated previously, this project is made up of 4 key components. To reiterate, these are the Twitter API, Java, a NoSQL database storage model using MongoDB and data visualization using Cloud9 Charts. The architecture can be seen in the architecture diagram which was created for this project. This diagram was iterated upon many times as the project developed and came into its own.




ARCHITECTURE DIAGRAM

Java and Twitter4J
At its core this project is built using Java. Twitter4J (http://twitter4j.org), a Java-based library for use with the Twitter API, was used to construct and make API calls to Twitter in order to retrieve tweets. With Twitter4J, access to the Twitter API can be easily integrated into any Java application or IDE. Twitter4J provides a set of predefined classes which are used to request specific data from the Twitter servers. Because Twitter4J is an open source, free-to-use library, it is easily downloaded and installed. The number of developers working on and using this package has been on the rise since its inception, and it has grown as a platform over the years, adding new functionality which can be used for multiple tasks. Two examples, both used for this project, are live tweet streams and tweet searches: a stream of live tweets can be retrieved, or a search for past tweets can be carried out. In both cases, the user inputs search words which are used to filter tweets to only those needed. In the case of a search, the user also selects a past date (within the 6-9 days which the Twitter API stores) for the search to cover.


Java was used not only for Twitter4J but also to carry out the sentiment analysis and to access, populate and update the database, which was created and is stored on MongoDB Atlas (https://www.mongodb.com/cloud/atlas).


Data storage using MongoDB Atlas
This project relied heavily on the concepts of both cloud storage and cloud computation. The chosen storage method is MongoDB Atlas; the official definition from MongoDB is:
“MongoDB Atlas is a cloud service for running, monitoring, and maintaining MongoDB deployments, including the provisioning of dedicated servers for the MongoDB instances.”
(MongoDB, 2017)
In order to store the vast volumes of data which were retrieved from Twitter, a cloud-based storage system, such as MongoDB Atlas, was needed. A number of cloud storage options were considered including, but not limited to, AWS (Amazon Web Services) Redshift and Oracle. Upon considering the options, it was decided that the NoSQL approach of MongoDB was best for this project. As a result, a cluster was started in MongoDB Atlas, which is where all the data was stored.


MongoDB is a NoSQL data storage system. If you aren't sure what that is or how the storage architecture of a typical NoSQL system is constructed, this is briefly explained below in terms of the more traditional relational database model. If you already have an understanding of NoSQL data structures, you can skip over the below.


NOSQL STRUCTURE DIAGRAM


Database: this is the same as a database in relational data models. The database acts as the highest level of storage, where all other data and information is stored. In the case of NoSQL, it is modeled by means other than a tabular format, which is what relational models employ.
Collection: in terms of a relational data model this acts as a ‘table’; it is where all data is inserted and stored. Collections are used for sorting similar data into groups for ease of accessing, processing and analyzing.
Document/ object: an object, or document as it is most commonly called, acts as a ‘record’. Within a document, pairs of corresponding keys and values are stored. Documents are used to organize data beyond the basic level of key-value pairs. Each document is assigned a unique object ID which can be used to uniquely identify and retrieve the document.
Key: the key is, in a sense, a ‘field name’; it gives information regarding the stored value. This allows the user, or analytics tool, to relate the value to its corresponding key in order to understand what the retrieved values represent.
Value: this is the data that is ingested into the data storage system. It can be raw data or processed information. The value, along with the matching key, is the core of the data and the structure. In relational data models, the value corresponds to the data held in a single field of a record.
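
As a rough illustration of how these levels nest, the sketch below models them with plain Java collections rather than the MongoDB driver. The collection name "tweets", the key names (other than MongoDB's "_id") and all sample values are assumptions made for the example; the fields themselves mirror the ones this post says are stored for each tweet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for database -> collection -> document -> key/value.
final class NoSqlStructureSketch {

    // A document: a set of key/value pairs plus a unique ID.
    static Map<String, Object> sampleTweetDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "hypothetical-object-id-1"); // placeholder, not a real ObjectId
        doc.put("tweetDate", "16-04-2017 09:30:00");
        doc.put("tweetText", "Good idea, awful timing");
        doc.put("overallSentiment", 0);
        doc.put("polarity", "balanced");
        doc.put("sentimentWords", List.of("good", "awful")); // a value can itself be a list
        return doc;
    }

    // A database: named collections, each holding a group of similar documents.
    static Map<String, List<Map<String, Object>>> sampleDatabase() {
        List<Map<String, Object>> tweets = new ArrayList<>();
        tweets.add(sampleTweetDocument());
        Map<String, List<Map<String, Object>>> database = new LinkedHashMap<>();
        database.put("tweets", tweets); // collection name is an assumption
        return database;
    }

    public static void main(String[] args) {
        // Walk one document and print each key with its value.
        for (Map.Entry<String, Object> field : sampleDatabase().get("tweets").get(0).entrySet()) {
            System.out.println(field.getKey() + " -> " + field.getValue());
        }
    }
}
```

Unlike a relational row, nothing forces a second document in the same collection to carry the same keys.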


In researching both relational and NoSQL database management systems, it was decided that NoSQL would be best suited for this project. NoSQL databases are well suited to managing large, unrelated data sets, and they can scale while maintaining performance when the data is accessed and manipulated. Since, for this project, there are no relations between data sets, NoSQL was the approach taken forward for this piece of work.


Sentiment analysis and Cloud9 Charts
Each tweet received was analyzed for sentiment words. A collection of over 6,000 sentiment words was constructed for the analysis. Java code was used to access the MongoDB collection and retrieve the sentiment words, which were then iterated through and compared against the tweet in order to find matches. The overall sentiment of the tweet was then calculated as follows:


  • By default, the overall sentiment of all tweets is initially 0.
  • If a positive sentiment is found the overall sentiment is increased by 1.
  • If a negative sentiment is found the overall sentiment is decreased by 1.
  • After all sentiment words have been checked against the tweet, a polarity is assigned to the tweet. If the overall sentiment is:
    • Greater than 0 it is positive
    • Less than 0 it is negative
    • Equal to 0, with no sentiment words present, it is neutral
    • Equal to 0, with sentiment words present, it is balanced.
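
The scoring rules above can be sketched in Java roughly as follows. This is not the project's actual code: the six-word lexicon stands in for the 6,000-word collection held in MongoDB, the class and method names are hypothetical, and the tokenizer (lowercase, split on non-letters, look each token up) is an assumption, since the post does not say exactly how words were matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative scorer following the +1 / -1 rules described in the post.
final class TweetSentimentScorer {

    // Outcome for one tweet: overall sentiment, polarity label, matched words.
    static final class Result {
        final int score;
        final String polarity;
        final List<String> matchedWords;

        Result(int score, String polarity, List<String> matchedWords) {
            this.score = score;
            this.polarity = polarity;
            this.matchedWords = matchedWords;
        }
    }

    // Hypothetical mini-lexicon: +1 marks a positive word, -1 a negative word.
    static final Map<String, Integer> SAMPLE_LEXICON = Map.of(
            "great", 1, "love", 1, "good", 1,
            "bad", -1, "hate", -1, "awful", -1);

    static Result score(String tweetText, Map<String, Integer> lexicon) {
        int score = 0; // every tweet starts at 0
        List<String> matched = new ArrayList<>();
        // Assumption: tokens are runs of letters; hashtags and punctuation are stripped.
        for (String token : tweetText.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            Integer weight = lexicon.get(token);
            if (weight != null) {
                score += weight; // +1 for positive, -1 for negative
                matched.add(token);
            }
        }
        String polarity;
        if (score > 0) {
            polarity = "positive";
        } else if (score < 0) {
            polarity = "negative";
        } else {
            // A zero score is "neutral" only when no sentiment words were found.
            polarity = matched.isEmpty() ? "neutral" : "balanced";
        }
        return new Result(score, polarity, matched);
    }

    public static void main(String[] args) {
        Result r = score("Good idea, awful timing", SAMPLE_LEXICON);
        System.out.println(r.score + " " + r.polarity + " " + r.matchedWords);
    }
}
```

Keeping the matched words alongside the score is what lets a zero score be split into neutral and balanced.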


The tweet date, tweet text, overall sentiment, polarity and sentiment words found are all inserted into a collection.


At this point, Cloud9 Charts takes over. MongoDB Atlas is supported natively by Cloud9 Charts, so integrating it is easy. Within Cloud9 Charts, additional fields are created if needed; an example for this project is a stripped-down date field used as a parameter for graphical visualizations, i.e. from dd-mm-yyyy hh:mm:ss to dd-mm-yyyy. A number of graphs and charts are created by the user, and these widgets are added to a dashboard for presentation, sharing, etc.
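
In the project the stripped-down date field was created inside Cloud9 Charts. For readers who would rather do the same trimming before the data is stored, here is a minimal Java sketch of the equivalent transformation using java.time. The class and method names are hypothetical, and the patterns assume a 24-hour clock and the dd-mm-yyyy hh:mm:ss layout mentioned above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper: reduces a full timestamp to its date part.
final class TweetDateTrimmer {

    // Assumed layouts: day-month-year, then a 24-hour time.
    private static final DateTimeFormatter FULL = DateTimeFormatter.ofPattern("dd-MM-uuuu HH:mm:ss");
    private static final DateTimeFormatter DAY_ONLY = DateTimeFormatter.ofPattern("dd-MM-uuuu");

    // Parses the full timestamp and re-formats it without the time of day.
    static String toDayOnly(String timestamp) {
        return LocalDateTime.parse(timestamp, FULL).format(DAY_ONLY);
    }

    public static void main(String[] args) {
        System.out.println(toDayOnly("16-04-2017 09:30:00")); // prints 16-04-2017
    }
}
```

Parsing and re-formatting, rather than cutting the string at the space, means a malformed timestamp fails loudly instead of silently producing a bad grouping key.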


This system has been run over several days using #Trump as an example search word in order to retrieve data from Twitter.

The complete dashboard created for this hashtag can be seen here: https://www.cloud9charts.com/d/3ySFpmhhl2wAfYQyc6xZ4VFXBbShk9SiivPv6xWv6y3Iie


Matthew Plummer
---------------------


Thursday, February 16, 2017

7 Requirements to Deliver Business-Driven Analytics in the Modern Data Age

The journey to modern data has been fast and furious. Within just a few decades, we have gone from retrospective analytics on operational data located within your four walls to prescriptive analytics using real-time multi-sourced data and everything in between.

Evolution of Business Analytics
While data, data storage, and data processing technologies have advanced, I would argue that most analytics platforms are still playing catch-up, specifically when it comes to analytics using unstructured data sources.

Many organizations have figured out that moving, flattening and aggregating unstructured data so it can be loaded into relational tables is not a good long-term solution.  It adds unnecessary complexity to already complex architectures.  It forces the business to decide ahead of time what data they want to see.  It impacts the fidelity of the data.

To pile on, business teams are greedy. They want the same self-service they get with SQL-friendly desktop tools, but they want to avoid heavy processes like ETL because it adds weeks to their analytics projects.

So how do you architect a modern analytics platform that supports polyglot persistence strategies and business-driven analytics?  It's a hard problem to solve, but done right can transform the business.

Having been involved in hundreds of such projects, we've come up with a list of 7 requirements to deliver business-driven analytics on modern data architectures.


#1 - Stop Moving Your Data Around
Modern analytics platform architecture
In the age of polyglot persistence, native integration with NoSQL data sources eliminates the pre-defined data constraints and performance issues that are a by-product of moving, flattening and aggregating unstructured data so it can be loaded into relational table structures.

#2 - Data Discovery Across Sources
Ideally, the complexity of the underlying data architecture is hidden from data engineers, and they just configure new data sources and start exploring. That includes joining multi-sourced and differently structured data together to create blended data sets and visualizing the results immediately.

#3 - Shift to an Agile Analytics Approach
First, let data engineers take over the creation and management of datasets. Second, shift development to an agile iterative approach.

#4 - Enable Self-Service for All Data
In modern architectures, all data is equal, and advanced analytics platforms should enable the same level of business self-service on NoSQL, SQL, RDBMS or file-based data sources.

#5 - Embed Analytics in Data Applications
Embedding analytics in data applications not only makes it easier for the business to integrate Big Data into their decision-making processes, but it also makes their decisions data driven.

#6 - Drive Action Not Just Insights
Many business leaders believe advanced analytics is how they will deliver business transformation and growth, so even if you are not there yet, plan for it.

#7 - Ensure Enterprise Readiness
The business is relying on this data to make decisions.  Some processes need to be layered into the platform to ensure it is reliable, scalable, secure and fault-tolerant.  

We generally see high levels of success and higher user adoption rates when these requirements are delivered.  Our customers also report over a 10x reduction in analytics project implementation time and rework.

For more detail on these requirements and how we narrowed it down to these seven, download our free whitepaper, Business-Driven Analytics in the Modern Data Age.