Tuesday, February 5, 2019

How to Tell a Compelling Story with Your Data Using Knowi

Great Storytelling Starts with Using the Right Data Visualization

Knowi Data Visualizations

Choosing the right visualization to tell your story is important, as the wrong visualization can cause confusion and distract your audience.  We walk you through dozens of visualization types and how to best use them to get your point across. 

We all have our go-to charts for visualizing data, but if you need to present data to others, you have to think from the viewer's perspective. Picking the wrong data visualization can easily confuse the viewer and lead to mistaken interpretations of the data. Before you create a chart, first understand the reasons for it. Start by asking a couple of questions: What are you trying to convey? What conclusions do you want the viewer to draw from the data? The answers will help you narrow down the type of data visualization to use.


There are two main categories for comparing values: 1) you want to compare one or many items to show low and high values; or 2) you want to show trends over a time period. For example, if you have specific time intervals you want to highlight, such as hours, months, or quarters, then Line, Column, or Area charts are good options. If linear progression is not available in the dataset, Column and Bar charts are commonly used for simple comparisons among items. If you need to let viewers search the data, Data Grids (tables) or Pivot Tables are typically used.

Knowi comparison charts


Relationship charts are suited to showing how one variable relates to one or more other variables. You could use relationship charts to show correlation or connections between variables in a data set. For example, Scatter Plot, Bubble, Heatmap, and Dual Axis Column charts are good for showing how something positively or negatively affects another variable over time. Chord Diagrams are a really interesting way to show connections within the data.

Knowi relationship charts


Data composition charts show how individual parts make up the whole of something and how that composition changes over time. These charts are used to show how something is divided up. The only rule to remember is that the sum of the parts must total 100%. If you have relatively few data points, Pie, Doughnut, and Stacked Column charts are good options. If you have a large number of data points, try using a Treemap to keep the visualization compact but still readable. Sankey diagrams are a good option to show flows as well as proportions, such as user behavior flow through app pages. Waterfall charts are used to understand the influence of several positive and negative factors on an initial value. They are typically used for financial reporting, such as visualizing financial statements.

Knowi composition charts


Visualizing the distribution of data across a given interval, geographic area, etc. is a great way to quickly spot outliers, clusters, and relationships within a data set. Bubble, Heatmap, and Box Plot visualizations display frequency, how data spreads out over an interval, or how data is grouped. Maps also show frequency and spread but add geographic context, so they are often used for population-based visualizations. Word Count diagrams are used to show text-based distributions. For example, a Word Count can quickly visualize your top revenue-generating customers.

Knowi data distribution charts


If you are building executive or business performance visualizations, you typically want to include Key Performance Indicator (KPI) measurements. These visualizations are great for displaying a single value or measure within a quantitative context, such as a comparison to the previous period or to a target value. They allow viewers to see, at a glance, totals such as sales, revenue to target, number of active users, etc.

This is probably the easiest data visualization type to build, with the only consideration being the period you want to track. Just remember that less is more with KPI visualizations. KPIs can be very impactful on a dashboard, but the more of the same type of visualization you add, the more you risk diluting the key metrics. Reserve these visualizations for the important metrics you want to immediately draw the viewer's attention toward.

Knowi KPI charts


In IoT use cases, the end users are typically non-technical, so selecting the right visualization is critical to making complex IoT data easily consumable. Simplifying the IoT data story can be accomplished using a variety of visualizations, including Timelines, Heatmaps, Geo-clusters, and Image overlays. For example, a Timeline diagram can instantly show if all the lights in a building were off overnight. A geo-cluster diagram or image overlay can show where sensors are deployed and their health, with drill-down capabilities to see specific device details. Bubble charts and heatmaps are a quick way to highlight anomalies for further inspection.

Knowi IoT Analytics Charts

Hope you found this useful.  If you'd like to get a PDF version, click here.

Saturday, February 2, 2019

How Three 2019 Analytics Trends Change the Status Quo

Originally posted on medium.com

Demands of the Digital Enterprise Force Rapid Innovation Towards Augmented Analytics

Before we dive in, let me overlay this discussion with the main business trend behind some of these technology trends: Digital Transformation.
“Digital transformation marks a radical rethinking of how an organization uses technology, people and processes to radically change business performance,” says George Westerman, a principal research scientist with the MIT Sloan Initiative on the Digital Economy.

According to a recent CIO Gartner survey, Digital initiatives and revenue growth are top priorities for enterprises in 2019. Digital initiatives combine disruptive technologies, like Big Data and AI, with existing infrastructure to innovate business processes and deliver new products and services that improve customer experience and therefore drive growth.

In 2019, digital initiative investments cover a wide range of technology, with analytics and BI solutions leading the list. According to the survey, investment in analytics and BI tools is expected to increase by 45% to support Digital Transformation efforts. Technology leaders see analytics as a core component of successfully delivering transformative solutions to the business. With that, let's break down three of the top analytics trends for 2019 and how they support Digital Transformation.

2019 Analytics Trends to Watch

According to Forrester, insight-driven organizations grow 30% faster than their less informed peers.

1. Analytics Reaches the Other 78%

In the past, self-service analytics promised to empower business teams, but reality shows that traditional BI, even with self-service, achieves adoption rates of less than 25%. There are a variety of reasons for this anemic performance, but in simple terms, these tools require business users to understand too much about how the data is constructed and to dedicate precious time to learning a new application.

As a result, most business users rely on others to answer even basic questions which could mean they wait days, if not weeks, for the result. If only 25% of your organization has access to analytics, it is virtually impossible to become a data-driven organization.
To combat this problem, in 2019, more vendors will integrate Natural Language capabilities into their analytics platforms. Natural Language Processing (NLP) promises to turn self-service analytics on its head by focusing on delivering solutions for non-technical business users, i.e., the other 78%.

What Is Natural Language Processing?
Natural Language Processing (NLP) is a form of deep learning AI that gives a computer the ability to understand human language. In the past, we had to learn computer languages like C++, JavaScript, etc. for a computer program to do what we wanted it to do. The shoe is finally on the other foot! With NLP, computers learn our language, whether English, French, or Chinese, and interpret key elements of a sentence to understand the intent. Sentiment analysis is a good example of how NLP is used in analytics today to understand the intent behind a statement. More recently, NLP got a voice. Call her Siri or Alexa, but behind the scenes is highly complex NLP that learns every time you talk to her.

Natural Language BI allows business users to ask questions, in human language, and get answers, usually in the form of a dynamically generated chart. They can ask follow-on questions to explore deeper and run what-if scenarios without needing to understand how the data is constructed. More importantly, the interactions with data analytics can be done within applications your business users are already familiar with using, like Slack, using NLP APIs. Done right, Natural Language BI has the promise to close the data gap for business users and create the data agility needed to uncover insights that can transform business.

2. Self-Service BI Rebrands to Augmented Analytics

Adding NLP to BI platforms is just the beginning of the changes we'll see in 2019 geared towards non-technical business users and in support of Digital Transformation efforts.

Let’s say you run a SaaS company. Like all SaaS companies, an important metric you pay close attention to is Churn Rate. With natural language queries, you could ask a question like “what was my churn rate for the last three quarters.” It is obvious that you’re trying to understand if actions you’ve taken have caused any changes (good or bad) to your churn rate. However, the question you want to ask is “what should I change to reduce my churn rate?”

The answer to this question is at the junction of AI and NLP. NLP enables the question to be asked, in plain English, and AI answers the question. This is also where AI and BI converge to create something new, Augmented Analytics.
In 2019, analytics platforms will move in earnest to leverage AI/machine learning under the hood to prepare data for analysis, analyze it, and interpret results into predictive insights that can be used to take action.

3. Analytics Platforms get Smarter about Data Discovery

Data is only getting more complex and dispersed. Enterprises are awash with data, but IT continues to struggle to collect it from diverse sources and make it available in time for the business to get value out of it. The current status quo is to involve multiple teams to move, aggregate, transform, and prep the raw data before using it for machine learning or analytics. This means you are applying machine learning to already biased data that was created based on predetermined questions.
Modern BI platforms with augmented data discovery capabilities will show two times the adoption growth rate of all other modern BI platforms by the year 2020, Gartner predicts.
2019 marks an inflection point for analytics: as machine learning/AI becomes more mainstream, the traditional extract, transform, and load (ETL) of data into analytics platforms makes even less sense. Data and business, pushed by new digital initiatives, are simply moving too fast to wait weeks for data pipelines to be built. We've entered a time where data agility is critical to success, meaning the old way of custom coding data pipelines or building rigid schemas for centralized data warehouses/data lakes needs a rethink.
In 2019, more analytics solution providers will expand their data management capabilities. New approaches like smart data discovery, which applies machine learning/AI to raw data to discover relevant data relationships and correlations automatically, will become more prevalent. The intent behind using AI here is to augment the human component in a way that reduces the unintended bias created by manipulating data when prepping it for analytics. By using smart data discovery on raw data, it becomes possible to surface the corresponding insights without any preconception of the answer, or perhaps even of the question.


Implementing technology is only part of the battle; there are other moving parts involved. Below are some potential inhibitors that need to be addressed alongside technology implementation if business leaders hope to move their Digital Transformation efforts forward in 2019.

AI Trust

AI is a black box, and just because it has the “AI” stamp of approval doesn't mean the result can be trusted. Business executives will ask how you came to a specific conclusion. Saying “the AI told me” probably isn't going to be enough to convince anyone to take action. Being able to explain how you came to a particular recommendation is going to be required before anyone will trust the AI behind it.

Data Ethics

Data ethics is a top-of-mind topic for many, with data privacy and data use laws like GDPR going into effect in 2018. Additionally, the use of personal data to feed the “fake news” engine may result in similar legislation being applied more globally. As a result, the topic of data ethics is no longer restricted to data collection but also applies to how data and the resulting insights are used. With more data put into the hands of more people, additional training on the do's and don'ts of data use becomes essential to avoid unwanted appearances in news cycles.

Data Literacy

We’ve talked a lot about how technology is helping reach the other 78%, but there is still a human factor to the equation that can make or break success. Developing a data-driven culture within your organization means not only providing the tools but developing the processes and providing necessary data literacy training. Data literacy training ensures everyone knows how to perform experiments and create compelling stories using data to put forward a persuasive argument to take action.


This year will be interesting on many fronts, but mostly as to how analytics solution providers, new and old, start to leverage AI for good, the level of pushback within leading organizations, and the corresponding fallout for product roadmaps. I think the technology is moving faster than business right now, and the anxiety AI creates for analytics leaders will force a slowdown in its adoption. Some of the predictions about the level of impact of AI in the market sound overly aggressive, at least to me. However, the movement is unmistakable.

If you are in a position to influence your organization's analytics strategy, it makes sense to prepare for the world of augmented analytics. Begin to assess the impacts implementing augmented analytics may have on data architectures and near-term technology decisions. At the same time, the long pole for Digital Transformation is more likely to be process and people than technology. History has proven that humans are good at stifling innovation, so the sooner you begin to evangelize augmented analytics, to inspire imaginations rather than instill fear, the more likely your Digital Transformation efforts will succeed enterprise-wide and actually transform how your business operates.

Tuesday, December 4, 2018

Knowi Product Update - Q3 2018

We are super excited to announce that we've added Natural Language BI to the Knowi platform.  With this addition, business users gain the ability to type, in English, a query and watch as we automagically interpret the question and present the results.  It is specifically designed to provide self-service analytics capabilities to non-technical end users.  They can ask questions and get answers without an in-depth understanding of the data or how it is constructed.

There is a lot more to tell you about so keep reading!  If you're more of a visual person, you can watch the replay of our product update webinar here.

Use Your Words - Natural Language Query

Natural Language BI is a powerful way to enable self-service analytics to non-technical users by enabling them to ask questions in plain English and get to insights quickly without needing any in-depth knowledge of the data or how it was constructed.

Natural Language BI is available for all widgets within a dashboard. At the moment, it can be accessed using the Natural Language/Self Serve Analytics icon on a widget or by going into Analyze mode. 

Start typing your question in the search bar.  We use machine learning to automatically suggest the next part of the question.  When you're happy with the question, hit enter, and the results will be displayed.  Easy as pie.


Users can select the chart type they want to use and save the results to create their own custom dashboards.

Trigger Alerts Gets a New Modern Look and Feel

If you're not familiar with Trigger Alerts or Trigger Notifications, they are a powerful way to trigger an action based on criteria you specify on any dataset. Trigger Alerts allow you to monitor datasets for conditions and send alerts, invoke an API, or send a notification to a Slack channel. This capability gets your data working for you. Instead of business users having to go to a dashboard to gain insights, the insights are delivered to them so they can take the appropriate action.

To create a trigger alert, go to Alerts from the Settings menu.


Enterprise Security Enhancements

Knowi now supports connecting to an LDAP server, which allows users to log in to Knowi using their LDAP credentials. If you require this functionality, please contact support.

Support for sharing administration rights on all Dashboards, Widgets, Reports, and Alerts has now been implemented. This allows a user to grant admin access to assets they created for the purpose of ongoing administration, including edits, setting Global Filters, and changing widget-level Cloud9QL.


For secure embedding use cases, you can now specify a time to expire for a given share URL so the link cannot be reused beyond the expiry date.

Other Cool Stuff
The display of time series chart legends and filters can be localized for a user by selecting the Locale option under User Settings. Useful for embedded dashboard and widget modes, this will then render any time series chart selected in the locale specified. Current locales supported are English (en), French (fr) and German (de).

Run-Time Date Query Parameters
Runtime parameters can be applied to Direct queries to pass dynamic parameters into a query.  We've added the ability to pass date parameters based on input from the UI or user configuration. 

Knowi Warehouse Retention
With ElasticStore usage, the overwrite strategy provides you powerful control over how data is managed.  We've added the ability within the overwrite strategy to specify how far back you want to retain your data.  

For example, Date-1m will remove all existing rows in ElasticStore which have Date field's value less than 1 month before the MIN value of the incoming data's Date field.  You can use data retention in conjunction with other overwrite strategies.
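Conceptually, the retention rule described above can be sketched in Python (an illustration of the logic only, not ElasticStore's implementation; the one-month window is approximated here as 30 days):

```python
from datetime import datetime, timedelta

# Conceptual sketch of the "Date-1m" retention rule: drop existing rows
# whose Date is more than one month (approximated as 30 days) before the
# minimum Date of the incoming batch.
def apply_retention(existing_rows, incoming_rows, months=1):
    cutoff = min(r["Date"] for r in incoming_rows) - timedelta(days=30 * months)
    return [r for r in existing_rows if r["Date"] >= cutoff]

existing = [{"Date": datetime(2018, 9, 1)}, {"Date": datetime(2018, 11, 20)}]
incoming = [{"Date": datetime(2018, 12, 1)}]
kept = apply_retention(existing, incoming)
print(len(kept))  # 1: only the 2018-11-20 row survives the cutoff
```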

Cloud9QL Enhancements
Knowi's Cloud9QL library has been enhanced to include the following date delta and regex functions:
  • MINUTES_DELTA: number of minutes between two dates
  • HOURS_DELTA: number of hours between two dates
  • DAYS_DELTA: number of days between two dates
  • MONTHS_DELTA: number of months between two dates
  • REGEX_REPLACE: regex function that can act on a field
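To illustrate what the date delta functions compute, here is a sketch in Python rather than Cloud9QL, with hypothetical helper names mirroring the list above:

```python
from datetime import datetime

# Hypothetical Python equivalents of the Cloud9QL date delta functions,
# shown only to illustrate the semantics of the elapsed-interval math.
def minutes_delta(d1, d2):
    return abs(d2 - d1).total_seconds() / 60

def hours_delta(d1, d2):
    return abs(d2 - d1).total_seconds() / 3600

def days_delta(d1, d2):
    return abs(d2 - d1).days

start = datetime(2018, 11, 1, 9, 0)
end = datetime(2018, 11, 3, 9, 0)
print(days_delta(start, end))   # 2
print(hours_delta(start, end))  # 48.0
```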

Tuesday, July 24, 2018

Knowi Product Update - Q2 2018

Join us on Wednesday, August 8, 2018, at 10:30AM Pacific for a live demo of the new features and enhancements we've added to the Knowi platform. If you can't make the time, register anyway and we will send you the recording of the session.  In the meantime, please keep reading to see a summary of the product updates for this past quarter.

Anomaly Detection

Time-series anomaly detection is used to identify unusual patterns that do not conform to expected behavior, otherwise known as outliers. There are a number of business applications for this type of machine learning: for example, IoT use cases, such as an unusual traffic pattern that might indicate an issue with a traffic light, and business applications, like detecting strange network behavior that could indicate a hack attempt.

We provide a number of anomaly forecasting algorithms within the workspace so you can determine the best one for your specific use case. To use, simply select Anomaly Detection as your machine learning option and create a new workspace. Follow the configuration steps and test out different algorithms for accuracy. The precision of the model increases over time as more data is made available.

The anomaly detection visualization itself consists of a configurable blue band range of expected values (acceptable threshold limit) along with the actual metric data points. Any values outside of the blue band range are considered anomalies and will appear in red.
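As a rough illustration of the banding concept (a toy sketch, not Knowi's actual detection algorithm), an expected range can be built from a rolling mean and standard deviation, with points outside the band flagged:

```python
# Simplified anomaly-band sketch: expected range = rolling mean +/- k * std
# over the previous `window` points. Points outside the band are flagged,
# much like the red points outside the blue band in the visualization.
def anomaly_band(values, window=5, k=2.0):
    anomalies = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        mean = sum(recent) / window
        std = (sum((v - mean) ** 2 for v in recent) / window) ** 0.5
        if not (mean - k * std <= values[i] <= mean + k * std):
            anomalies.append((i, values[i]))
    return anomalies

data = [10, 11, 10, 12, 11, 10, 11, 45, 10, 12]
print(anomaly_band(data))  # the spike at index 7 falls outside the band
```

Knowi lets you compare several algorithms in the workspace; this toy version only shows why a point outside the expected band gets rendered as an anomaly.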

Sparkly New Data Visualizations

Data Grid

You may think data grids are boring but then you haven't tried our new data grid visualization. You can do a ton with this new grid type, including formatting, conditional highlighting, sorting, grouping, and search along with the ability to download formatted grid information in Excel format.

The other pretty cool thing you can do is add charts to cells. The embedded chart options are sparkline, area, bar, spline, spline area, and pie. Not so boring anymore, right?

Knowi Data Grid with Sparklines

Image Overlay Heatmap

You can now upload an image and overlay x, y coordinate values. Example use cases are tracking how a stadium is filling up as people are scanned at the entry gates or showing the location of IoT sensors deployed on each floor of a building.

Knowi image overlay data visualization

It's Your Dashboard - Customize It!

We enhanced the level of customization of the look and feel of your dashboards by adding several new configurable options.

Getting Clicky With It

Tightly integrating Knowi visualizations within your applications is an important step to give users a seamless experience. To further that integration, we've added an OnClick Event Handler which is available for various visualizations. You can use it to customize what action to take when a data point is clicked on the visualization.

Other Cool Stuff
Connected Charts
Added a drill down feature to allow the current dashboard to be filtered with settings based on the clicked points.

Hidden Filters
Admin or dashboard owners can now hide certain filters from end users.

REST-API Enhancements
Added epoch secs support for REST API datasource URL parameters, which allows you to pass in dates in epoch time format dynamically.
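For example, converting a date to epoch seconds before placing it in a datasource URL might look like this in Python (the startTime parameter name and URL are hypothetical):

```python
from datetime import datetime, timezone

# Convert a date to epoch seconds for use as a URL parameter. The
# "startTime" parameter name and example URL are illustrative only;
# check your own datasource configuration.
start = datetime(2018, 7, 1, tzinfo=timezone.utc)
epoch_secs = int(start.timestamp())
url = f"https://api.example.com/metrics?startTime={epoch_secs}"
print(epoch_secs)  # 1530403200
```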

Added a way to handle the return of multiple JSON objects within the same file

Cloud9QL Enhancements
We've added an ARRAY function that groups a field's values into an array using a specified grouping. We've also added UPPER and LOWER functions to change the case of string fields.
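The behavior of these functions can be sketched in Python (hypothetical equivalents for illustration only, not Cloud9QL syntax):

```python
from collections import defaultdict

# Hypothetical sketch of the ARRAY function's behavior: collect a field's
# values into a list per group. UPPER/LOWER correspond to simple case changes.
def array_by(rows, group_field, value_field):
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_field]].append(row[value_field])
    return dict(grouped)

rows = [
    {"team": "A", "player": "smith"},
    {"team": "A", "player": "jones"},
    {"team": "B", "player": "lee"},
]
print(array_by(rows, "team", "player"))  # {'A': ['smith', 'jones'], 'B': ['lee']}
print("smith".upper())  # SMITH
```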

Baseball Analytics - A Knowi Case Study

Knowi Baseball Stats Analytics

Baseball Stats and Analytics

How do Runners on Base Impact a Batter's Ability to get a Hit?

Oftentimes, what we perceive as logical contradicts reality. In psychology, this tendency to believe we know more than we actually do is known as overconfidence. Too frequently, people rely solely on their intuition and logic to make decisions rather than basing them on data and analysis. During an observational study, I discovered a contradiction between my own logical reasoning and what the data actually suggested with regard to baseball.

When considering the correlation between runners on base and batting average, you would think that when runners are on base, batters would have a lower batting average than when the bases are empty.

Quick Summary of American Baseball

For those who are unfamiliar with baseball, there are three bases a runner can occupy: first, second, and third. The goal of the offense is for the batter to hit the ball and get to first base before the defense can get them out. Once on first base, the batter becomes a runner and attempts to reach each base before running home to score. The goal of the defense is to get three outs by striking out the batter, catching the ball in the air, tagging a runner between bases, or beating a runner to the base with the ball. Batting average is a stat that measures a batter's percentage of getting a hit, calculated by dividing a player's number of Hits by At-bats.

A Hit is defined as when a batter hits the ball into fair territory and then reaches first base without the defense making an error or getting a runner out. If the batter hits the ball and reaches base safely but the defense gets a runner out, it is called a Fielder's Choice and is not counted as a hit. At-bats differ from Plate Appearances: a Plate Appearance is any time a batter is up to bat, whereas an At-bat is only counted if the player gets a hit, an out, or a fielder's choice.
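In code, the batting-average calculation looks like this (the hit and at-bat counts are illustrative numbers, not from the study's dataset):

```python
# Batting average = Hits / At-bats, conventionally rounded to three decimals.
def batting_average(hits, at_bats):
    return round(hits / at_bats, 3)

# An illustrative player with 172 hits in 602 at-bats:
print(batting_average(172, 602))  # 0.286
```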

The Question

Knowi Baseball Analytics

Logically, it would make sense that if a runner is on base, the batter would have a lower chance of getting a hit because the defense could get the runner or the batter out. This would maximize the defense's odds of making an out and preventing the batter from obtaining a hit. As a result, this would lead to lower batting averages when there is a runner on base, and higher batting averages when there is nobody on base.

The Data

I chose a sample size of 270 MLB players, the nine players from each team with the most plate appearances in 2017. Using Baseball Reference, I was able to collect stats on each player and created a database of 216 stats per player in Google Sheets, totaling over 62,000 data points. The sample of 270 players are only 28% of the players who had an At-bat in 2017, but they account for 74% of the season’s At-bats.

All my charts and calculations were derived exclusively from the data I collected; team statistics use only the data from the 9 players sampled from that team. I then accessed the data in Google Sheets via an API using the Knowi REST-API integration in order to build queries and visualize my data. One of the advantages of using the Knowi API method to connect to my data was that it allowed me to edit my spreadsheet to fix errors and add new statistics, and the changes were immediately reflected in my visualizations. Not having to re-upload my data every time I changed it was a huge time saver.

What the Data Shows

After collecting all of my data, I created a dashboard (click to play around with it) and visualizations using Knowi in order to make observations and analyze the data. The first visualization I created compared team batting averages when there are runners on base and when there are no runners on base. 

The results were surprising: only two teams, Philadelphia and Kansas City, had a higher team average when the bases were empty than when there were runners on base, and Philadelphia was only 1 batting average point off. This means that only 6% of the teams fit my hypothesis of having a higher batting average when the bases are empty.

Next, I decided to look at a graph of the players’ averages. Only 37% of the sample of players had a higher batting average when the bases were empty compared to when there were runners on base. 

Again, the data did not fit my hypothesis. I also observed that most players had a large deviation between their two averages. In other words, most players either had a really high average when runners were on base and a really low average when the bases were empty, or the exact opposite.

I then created two new charts: one that only contained players who had higher averages when runners were on base which I called Group A, and one that only included players who had higher averages when the bases were empty, called Group B. By comparing the two groups to one another, I observed that players in Group A had greater deviations in their averages than players in Group B.

To model this, I created another chart where I calculated the difference between each player’s highest average and their lowest average. The top eleven players with the highest differential were all in Group A. The average deviation of Group A players was 41 batting average points and the average for Group B players was 30.

My Conclusion

It turns out that the results did not support my hypothesis whatsoever; if anything, they showed the opposite. Only 6% of the teams and 37% of players fit my hypothesis. Not only did more players have a higher average with runners on base, but those players also had a larger deviation between their averages than players who did not.

Overall, the League average when runners were on base was .275, and when the bases were empty it was .261. A difference of 14 points seems insignificant, but to put it into context, given the sample size of more than 123,000 At-bats, it results in a difference of roughly 1,800 hits. I also tested the statistical significance of the data. Although the name can be misleading, Batting Average is actually a proportion rather than an average, so in order to conduct hypothesis testing, I needed to use the difference-between-two-proportions formula.

Applying this formula, I came to the conclusion that, with over 99% confidence, batting averages when runners are on base are greater than averages when the bases are empty. Additionally, with 97% confidence, averages when runners are on base are higher by 10 points than when runners are not on base.
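That two-proportion z-test can be sketched in Python; the split of the roughly 123,000 at-bats between the two situations below is an assumption made for illustration:

```python
from math import sqrt

# Two-proportion z-test for the difference between two batting averages
# (a batting average is a proportion: hits / at-bats).
def two_proportion_z(p1, n1, p2, n2):
    # Pooled proportion under the null hypothesis that p1 == p2.
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# League averages from the study: .275 with runners on, .261 bases empty.
# The at-bat split (55,000 vs. 68,000) is an assumed, illustrative split
# of the ~123,000 total at-bats.
z = two_proportion_z(0.275, 55000, 0.261, 68000)
print(round(z, 2))  # well above 2.58, the two-sided critical value for 99% confidence
```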

The data clearly rejects my original hypothesis that batting averages would be higher when the bases are empty and goes as far as to provide solid evidence towards the exact opposite.

A Real-World Application

Without applying it to real life, data is useless. Using players’ situational batting averages when setting a team’s batting order can help increase the number of hits the team gets in a game, in turn leading to more runs. The leadoff hitter, the first batter in the lineup, is most likely to come up to bat with the bases empty, roughly 65% of the time. The second batter is the second most likely to come up to bat with the bases empty. By batting players with the highest average when the bases are empty in the 1 or 2 spot, those players will have a greater chance of getting a hit. However, only 47% of leadoff hitters and 24% of batters in the 2 spot have higher batting averages when the bases are empty. 

Additionally, the 4 and 3 spots in the lineup have the highest chance of coming to bat with a runner on base, so a team should put its best hitters with runners on base in those spots. A really good example of a team that bats its players in the best spots is the World Champion Houston Astros. Jose Altuve and Carlos Correa bat 3rd and 4th and, when runners are on base, have batting averages of .350 and .340 respectively. To put those averages into perspective, remember that the league overall batting average is .267. The Astros also bat either Alex Bregman or Josh Reddick 2nd, the players with the two highest averages when the bases are empty. Now compare that to a really bad example, the last-place Detroit Tigers. The Tigers' 4th hitter is Victor Martinez; being the 4th hitter, Martinez is the most likely player on the team to come up to bat with runners on base. However, Martinez is the worst on the team with runners on base, and the second-best player on the team when the bases are empty. The Tigers' best hitter with runners on base is Nicholas Castellanos, with an average of .341. Despite his high average with runners on base, Castellanos has an average of only .223 when the bases are empty. Castellanos' average drops by a whole 118 batting points when the bases are empty, yet he bats 2nd and is the second most likely on the team to come to bat with the bases empty. Being able to utilize data to make data-driven decisions is the most important reason to use data analytics.

Through this observational study and visualizing my data using Knowi, I was able to reshape the way I think about baseball and use real data to disprove my own logical reasoning. All it takes to bridge the gap between what you know and what you think you know is investigation, observation, and analysis. It can be done with baseball and it can be done with anything. How can data analysis transform your perception and more importantly your decision making?

Sports Reference LLC. Baseball-Reference.com - Major League Statistics and Information. https://www.baseball-reference.com/ (accessed 5/21/18).