Tuesday, June 19, 2018

The Six Steps of Creating a Machine Learning Model in Knowi



Creating a Classification Model for Breast Cancer Screening

The UCI Machine Learning Repository contains many complete datasets that can be used to train and test machine learning models. One such example is the Breast Cancer Wisconsin (Diagnostic) Data Set, which relates whether a breast tumor is benign or malignant to ten characteristics of the tumor. From this dataset, we can develop a model that estimates the likelihood of a tumor being benign or malignant.

The process of using machine learning to analyze data is made easy with Knowi Adaptive Intelligence. Given a training dataset, Knowi can apply either classification or regression algorithms to build valuable insights from the data.
Here is a step-by-step guide about how to turn that data into a powerful machine learning model using Knowi:

1. Create the Workspace and Upload Data

To start the machine learning process, go to www.knowi.com. If you are not already a Knowi user, sign up for a free trial to complete this tutorial. Once in, open the machine learning section, found on the left-hand side of the screen. From there, start a new workspace; you will be given a choice between building a classification or a regression model. For the breast cancer example, choose classification, since the variable we are predicting always falls into one of two categories (benign or malignant). Next, upload the Breast Cancer Wisconsin (Diagnostic) Data Set.

2. Choose Response Variable and View Full Dataset

After uploading, and possibly manipulating, the file, choose the Attribute to Predict from the drop-down list. For the breast cancer data, the attribute being predicted is the class of the tumor. After choosing the prediction variable, run the initial analysis with the Analyze Data button. This displays the data on the screen and gives you an opportunity to scroll through it looking for patterns.

3. Prepare the Data

After analyzing, data preparation begins. Data preparation is an optional, wizard-driven, step-by-step process: the program confirms the training set datatypes, identifies outliers and allows you to remove them, reports missing data with the option to remove or impute values, lets you rescale the data, groups values into discrete bins and, finally, offers to create dummy variables. All decisions can be changed by moving backward and forward through the steps at any time.

For the breast cancer data, a small amount of rescaling and grouping was necessary to increase accuracy.
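
For readers curious what these steps look like outside Knowi, here is a minimal pandas/scikit-learn sketch of the same preparation. The file and column names (wdbc.csv, diagnosis, radius_mean) are assumptions for illustration; Knowi's wizard does all of this without code.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the training data (file and column names are assumed).
df = pd.read_csv("wdbc.csv")

# Impute missing values with the column medians.
df = df.fillna(df.median(numeric_only=True))

# Drop rows with extreme outliers (more than 3 standard deviations out).
numeric = df.select_dtypes("number")
df = df[(numeric - numeric.mean()).abs().le(3 * numeric.std()).all(axis=1)]

# Rescale the features to [0, 1], then bin one feature into discrete groups.
features = df.columns.drop("diagnosis")
df[features] = MinMaxScaler().fit_transform(df[features])
df["radius_bin"] = pd.cut(df["radius_mean"], bins=4, labels=False)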


4. Feature Selection

Whether you came in with prepared data or just finished the preparation process, the next step is to select which variables will be used in the model. To make this decision, it is essential to look back at the data for patterns and correlations.

5. Create and Compare the Models

At this point, you are left with choosing between the available algorithms (i.e., Decision Tree, Logistic Regression, K-Nearest Neighbor, or Naive Bayes). Knowi makes it easy to select all of them and compare them on useful attributes such as accuracy or absolute deviation. Pressing the eye icon next to a model in the results section shows a preview of the input data along with the model's predictions. Next to the eye, there is a plus sign that, when pressed, displays the details of that specific model. It is beneficial to produce many models, tweaking settings each time, to find the best one for the situation. All past models are saved in the history and can be viewed, compared, and even published.
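
As a point of comparison, here is a rough scikit-learn sketch of the same bake-off: train the four algorithm types named above and score each by accuracy. It reuses the df and features from the earlier preparation sketch, so the same caveats about assumed column names apply.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Split the prepared data into training and test sets.
X, y = df[features], df["diagnosis"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and report its accuracy on the held-out test set.
for name, model in models.items():
    score = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: accuracy {score:.3f}")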


6. Publish

The last step is publication, using the publish button next to the plus sign. Upon publishing, you will be prompted to name the model. It is possible to publish as many models as needed from the same data. All published models can be viewed and compared directly in the ‘Published Models’ tab within Machine Learning.

How to Apply a Model to a Query

You have now officially created a machine learning model that can seamlessly be applied to any query. To integrate it into a dataset, simply press ‘Apply Model’ while performing a query; this adds a field where any of your machine learning models can be selected and used. Pressing the preview button on the screen shows the data along with the predictions made by the model.

Actions from Insight Made Easy

With those six steps, you have a machine learning model that can be integrated into any workflow to create new visualizations and insights that drive downstream actions. The applications of the model are endless and can be tailored to individual needs. Once a model is in place, many actions can be performed to gain meaning and spark reactions. This is done through trigger notifications. A trigger notification fires when a certain condition is met. In the scope of the breast cancer model, an alert can be set to email a doctor the patient's information whenever the model finds a tumor to be malignant. This enables more than just insights; it generates action.
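
In spirit, a trigger is just a condition checked against fresh predictions. Here is a toy Python sketch of that idea (the predictions list is purely illustrative; a real Knowi trigger is configured in the UI and would send the email rather than print):

# Illustrative (patient_id, predicted label) pairs.
predictions = [("P-001", "benign"), ("P-002", "malignant")]

for patient_id, label in predictions:
    if label == "malignant":
        # Here the configured notification would fire, e.g. an email
        # to the doctor with the patient's information attached.
        print(f"ALERT: patient {patient_id} flagged as malignant")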

Summary

The process of creating a model within Knowi is so easy that anyone can do it, and it starts by simply uploading a dataset. Data can be loaded from files, SQL and NoSQL sources, and REST APIs. After the data is loaded, Knowi offers built-in algorithms, or the option to create your own, along with a designated page to review multiple factors and evaluate the best algorithm for your situation. Using this method, the breast cancer training data was loaded from the UCI Machine Learning Repository into a Knowi workspace, then analyzed with the built-in data preparation tools. The resulting model was ready to be integrated into any workflow and autonomously perform actions based on the results, such as sending an alert to a doctor depending on the outcome of the test. Give Knowi a try and see how easy visualizing and learning from your data can be.


References

Dheeru, D., & Karra Taniskidou, E. (2017). UCI Machine Learning Repository. Retrieved from University of California, Irvine, School of Information and Computer Sciences: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Knowi. (2017). Adaptive Intelligence for Modern Data. Retrieved from Knowi Website: www.knowi.com

Wednesday, June 6, 2018

MongoDB Analytics Tips and Traps

Free Advice When Starting a MongoDB Analytics Project

MongoDB excels at analytics. Every week, we have customers building MongoDB analytics, queries and dashboards to help their business teams uncover new insights.  We have seen outstanding performance from good practices and have seen issues with common bad practices.

We have customers performing many different types of analytics, from performance monitoring and usage metrics to financial and business performance. These are general tips and traps that our customers mention when starting to do analytics on MongoDB. It's not an exhaustive list, but it's a start.

Arrays and Query Performance

There are some inherent difficulties involved in working with embedded documents and arrays which can impact your query performance. It's best to think simple when it comes to your documents so you can optimally leverage indexing for query performance. That's not to say don't use nested arrays, but avoid, if possible, arrays that are deeply nested, or at least be aware that your query performance may be impacted.
Knowi has a handy tool called Data Explorer that lets you explore your nested arrays so you can see what data is available to query. This can help you quickly determine whether you need to set up additional indexes to access that data more efficiently.
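
As an illustration, here is a minimal PyMongo sketch of indexing a field inside a nested array. The database, collection, and field names (shop, orders, items.sku) are assumptions for the example.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Indexing a field inside the "items" array creates a multikey index,
# so queries on that field avoid a full collection scan.
orders.create_index("items.sku")

# This query can now be served by the index.
for doc in orders.find({"items.sku": "ABC-123"}).limit(5):
    print(doc["_id"])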

MongoDB Join Collections or, well, Anything.

Some people think joins are the root of all evil. We don't, but you can't take a willy-nilly approach to joins in Mongo either. Think about joins early to ensure you don't kill your query performance, especially if joining across multiple sources. For example, if you have a logical key that would join Mongo collections, you probably don't want it buried deep in a nested array, and you'll want to index it.
Knowi allows you to join across collections as well as with other NoSQL, SQL, or REST-API sources. Just tell us the join key(s) and the type of join you want to perform, and we take care of the rest. We optimize join performance to allow joins across large collections/datasets.
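
For collections living in the same MongoDB database, a native join is a $lookup aggregation. Here is a small PyMongo sketch under the same assumed names as above (customers and customer_id are illustrative); note the join key is a top-level, indexed field rather than one buried in a nested array.

# Join each order to its customer document.
pipeline = [
    {"$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customer_id",  # key on the orders side
        "foreignField": "_id",        # key on the customers side
        "as": "customer",
    }},
    {"$unwind": "$customer"},         # flatten the joined array
]
for doc in orders.aggregate(pipeline):
    print(doc["_id"], doc["customer"]["name"])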

Use Mongo Query Generators

MongoQL is powerful but comes with a pretty steep learning curve. An intuitive query builder not only simplifies the query-building process but enables more people to build queries. Drag-and-drop query builders allow data engineers to leverage MongoDB's query functionality without necessarily knowing MongoQL.
Knowi auto-generates MongoQL through an intuitive drag and drop interface.  It’s designed for data engineers who understand the data but may not be fluent in MongoQL. Easily group, filter and sort data without writing any code.

Trap: ODBC Drivers, SQL Wrappers, and the Mongo BI Connector

Don't be fooled into thinking you can simply add a SQL layer on top of your MongoDB and your existing analytics tools will work. You are about to perform unnatural acts on your MongoDB data, because SQL-ifying your MongoDB data is as bad as it sounds. You will be defining schemas, moving data, and building complex ETL or ELT processes. Potentially worst of all, you will have to determine very early on what data is “important”: important enough to move, transform, and duplicate.
There is a significant cost to adding a SQL layer on top of your Mongo data, starting with your initial project cost and timeline. Post-implementation, changes are extremely expensive because you have a lot of moving parts between your end users and the data they are trying to analyze. For example, adding a new field requires schema changes and changes to ETL processes, which can be weeks of work.

The real cost here is to the business's ability to experiment with analytics on their Mongo data. Delivering the first visualization usually just spawns more questions, so enabling experimentation is important.
Native integration matters when it comes to enabling business self-service and experimentation. Knowi natively integrates with MongoDB, so there is no need to move data out of MongoDB and no transforming data back into relational structures. By eliminating the ETL step, you effectively accelerate delivery of your MongoDB analytics project by 10x. Changes are as easy as modifying the query to pull the new field, which is then immediately available for use in visualizations.

Managing Visualization Performance and Long Running Queries

Long-running queries are a common issue when it comes to analytics, and MongoDB is not immune. No matter how much you optimize your queries and Mongo, if you are dealing with large datasets or complex joins, your queries will take time to execute. Your end users will not be patient enough to wait more than a few seconds, which will make them less likely to adopt your analytics platform.

To provide an excellent end-user visualization experience, consider adding a layer to persist query results. This persistence layer could be another MongoDB instance used to cache query results for the purpose of powering reporting, for example.
Knowi comes with an optional persistence layer, essentially a schema-less data warehouse, called Elastic Store. Queries can then be scheduled to run every x min/hours/days, and visualizations automatically go against data in Elastic Store rather than directly against your Mongo database.
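
A bare-bones version of such a persistence layer, continuing the PyMongo sketches above (the reporting database and cache collection names are assumptions; Elastic Store handles all of this for you): run the heavy query on a schedule and write a snapshot that dashboards read instead.

import time

def refresh_cache():
    # Run the slow query once and cache a snapshot of the results.
    results = list(orders.aggregate(pipeline))
    cache = client["reporting"]["orders_cache"]
    cache.delete_many({})  # replace the previous snapshot
    if results:
        cache.insert_many(results)

# Refresh on a fixed schedule, e.g. every 15 minutes.
while True:
    refresh_cache()
    time.sleep(15 * 60)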

Conclusion

MongoDB is an excellent data source for analytics, but most traditional analytics tools like Tableau, Qlik, Looker, etc. do not work directly against MongoDB. This is the biggest trap when implementing MongoDB analytics projects. You will have to introduce a number of intermediary steps to extract data from MongoDB, transform it, and load it into a pre-defined relational schema for any of these tools to work. While this sounds straightforward, the devil is in the details: you are adding significant time and cost to your MongoDB analytics project. On top of that, you are immediately limiting what data business teams can use for analytics, and therefore limiting their ability to experiment and gain new insights.
Instead, look at new analytics tools that are purpose-built for performing analytics on NoSQL databases, like MongoDB.  Native integration matters when it comes to accelerating delivery of your analytics project and building a data architecture that is sustainable and scalable over time.  Knowi is one such tool but there are others.

Monday, May 7, 2018

Knowi Product Update Q1 2018

PRODUCT UPDATE - LIVE DEMO

To see these exciting new capabilities in action, please join Lorraine Williams, Head of Success at Knowi, for a web demo on Wednesday, May 16th at 10:30 AM PT.

Below are the highlights of our product updates for this quarter.  Enjoy!

Knowi Product Update Q1 2018

Ready. Set. Data Prep.

We now offer a full data preparation suite that helps guide you through the steps necessary to ensure that your data is machine learning ready.

After selecting the training dataset and your prediction variable, the system provides statistical distribution information on the metrics, along with an easy-to-use wizard that guides you through a series of steps to prepare your data for analysis.


These steps include verifying/amending the data types; identifying and dealing with outliers and missing values; rescaling attributes using either normalization or standardization techniques; creating discrete groupings of your data; and, finally, creating dummy variables. You can then select model features manually or have the system select them for you.


Join Our Community

We have revamped the Knowi documentation site to include a Community portal where you can post and answer how-to questions, common getting-started questions, and even the occasional undocumented "feature". We're excited to give you a channel to share your ideas, provide expertise, and give us feedback on features and wishlist items.

Please join the Community by going to our docs page and creating a post or leaving a comment.

The 'What's New' section contains our product release notes and the 'Knowledge Base' section contains the product documentation.

Lovin' Query Management

Query Revert

The system now has the ability to revert to a prior version of a query from the Query History page. When reverted, the existing query will be overwritten with the selected version. 

Knowi - Show Query History

Query Cancel

The system now supports the ability to cancel a running query. The behavior of this action depends on the datasource type and may not actually kill the query on the database itself. However, the thread executing the query will be interrupted, allowing it to disconnect and return. A new run of the same query can then be executed.

Knowi - Query Cancel


Pivot! Pivot! Pivot!

We've added a new Pivot table visualization. This supports easy pivoting of data without the need for the Cloud9QL transpose function of the regular Data Grid. Simply drag the available fields into the following areas to slice and dice the data accordingly: 
  • Columns 
  • Rows 
  • Values 
  • Filters 
Knowi - Pivot Table Visualization

Other Cool Stuff

Widget Sharing

Prior to this release, widgets could not be shared on their own unless they were part of a shared dashboard. This has changed: widgets can now be shared with both users and groups individually, in either View or Edit mode.

Login As (For Admin Users)

Admin users can now log in as another user on their team. Once logged in, they are able to perform any action that the accessed account has access to. When logged in as another user, the system displays the accessed account's username at the top of the screen.

API Pagination

We modified the API pagination function to also support a next-page URL parameter. This complements the existing method of using a page token, such as a page number or a specific flag.
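
For context, the next-page-URL pattern looks like this in a minimal Python sketch (Zendesk-style pagination is used purely as an illustration; the URL and credentials are placeholders):

import requests

url = "https://yourdomain.zendesk.com/api/v2/tickets.json"
all_tickets = []
while url:
    # Each page of results includes the URL of the next page,
    # or null when there are no more pages.
    page = requests.get(url, auth=("agent@yourdomain.com", "password")).json()
    all_tickets.extend(page["tickets"])
    url = page.get("next_page")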

Data Explorer

We have added the ability to expand/collapse the sample fields from within the Data Explorer at all nested levels.

Tuesday, April 10, 2018

4 Simple Steps to Create Reports and Drive Actions from REST APIs



Real-time Customer Support Team Performance Reporting Using Zendesk APIs


APIs are everywhere these days. While most SaaS products now offer API access, interacting with those APIs to gather insights and drive actions typically requires custom implementations to build and maintain.

This post walks through how to query and drive actions from APIs with Knowi, using a real-world use case with Zendesk APIs to alert our customer success team.

Knowi is a new kind of analytics platform designed for modern data architectures, including:

  • Calling API endpoints
  • Prepping and transforming the results
  • Performing multi-source joins with other API calls and/or NoSQL/SQL databases
  • Visualizing and triggering actions on the data
  • Integrating machine learning for forecasting, anomaly detection, and predictive analytics

Use Case

Our customer success team needed alerts on tickets under certain conditions.  For example, new tickets that have not been responded to for over 3 hours.

At a high level, this boils down to:
  • Retrieve Ticket data from Zendesk
  • Join with another REST call in Zendesk to retrieve Submitter data
  • Transform/manipulate the data (using Cloud9QL, a SQL-like syntax, to filter the results)
  • Set up a trigger notification into Slack with a list of tickets that require attention

Connecting to Zendesk API

The first step is setting up a REST datasource in Knowi by specifying a base URL (https://<yourdomain>.zendesk.com/api/v2), along with the authentication method. Zendesk supports basic user/password auth, so we'll use that. Knowi will pass the authentication along on subsequent API requests.
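
Outside Knowi, the same connection boils down to a basic-auth HTTP call. Here is a minimal Python sketch (the subdomain and credentials are placeholders):

import requests

BASE_URL = "https://yourdomain.zendesk.com/api/v2"

session = requests.Session()
session.auth = ("agent@yourdomain.com", "password")  # Zendesk basic auth

resp = session.get(f"{BASE_URL}/tickets.json")
resp.raise_for_status()
tickets = resp.json()["tickets"]  # the nested JSON array of tickets
print(len(tickets), "tickets retrieved")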


Knowi REST API Query

Calling Endpoints


Retrieving Ticket Data

The API endpoint is
GET /tickets.json

The API returns tickets in a nested JSON array, with associated details for each ticket.

{
 "tickets": [
   {
     "id":      35436,
     "subject": "Help I need somebody!",
     ...
   },
   {
     "id":      20057623,
     "subject": "Not just anybody!",
     ...
   },
   ...
 ]
}
To query this in Knowi:
  • Specify the end-point to query. In this case, it’s /tickets
  • Pass along URL params sort_by=created_at&sort_order=desc&include=comment_count to get the most recent tickets first and include the comment count in the ticket.
  • Use Cloud9QL to manipulate the results. For example, to get the total tickets by date from the results, the syntax would be:

select expand(tickets);
select count(*) as count, date(created_at) as date
group by date(created_at)

This expands the nested array to calculate total tickets by date. This can be immediately visualized and added into a dashboard (for sharing and embedding).

Knowi REST API data visualization


To get the new tickets that have not been responded to:


1. Retrieve Ticket data from Zendesk

Use the following Cloud9QL to pull the relevant fields from the nested tickets array.
select expand(tickets);
select id as ticket_id,
assignee_id,
created_at,
description,
status,
submitter_id,
subject,
type,
recipient,
requester_id,
comment_count
order by ticket_id desc;
Knowi REST API data transformation


2. Join with another REST call in Zendesk to retrieve Submitter info

Knowi supports joins with other API endpoints (or disparate databases). Since the tickets only contain a submitter id, we get the submitter's name/email from a second endpoint (/users) and join it with the first set of results.
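
Done by hand, that join is a simple id lookup. Here is a rough Python sketch continuing the earlier snippet (the field names follow the Zendesk API; session, BASE_URL, and tickets come from that snippet):

# Fetch users and index them by id.
users = session.get(f"{BASE_URL}/users.json").json()["users"]
users_by_id = {u["id"]: u for u in users}

# Attach submitter name/email to each ticket: a left join on submitter_id.
for t in tickets:
    submitter = users_by_id.get(t["submitter_id"], {})
    t["submitter_name"] = submitter.get("name")
    t["submitter_email"] = submitter.get("email")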


3. Post-Join Data Transformation

The last step of the data processing is to filter the results down to what we are looking for. Specifically, we want tickets in open status that were not created by internal users. In addition, we only want to receive alerts during business hours, Pacific Time.
Knowi REST API joins

4. Action: Set Up a Trigger Notification into Slack


With this data, we are now ready to drive an action. Knowi provides Triggers to drive calls into other APIs, emails, or Slack. In this case, we'll send a Slack notification to our Customer Success team with the list of tickets that require attention attached.
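
One common way such a notification is delivered is via a Slack incoming webhook. A hedged sketch of the idea (the webhook URL is a placeholder, and tickets here stands in for the filtered results from the steps above; Knowi's built-in Slack trigger needs no code):

import requests

WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

# Format one line per ticket and post the list to the Slack channel.
lines = [f"#{t['id']}: {t['subject']}" for t in tickets]
requests.post(WEBHOOK_URL, json={"text": "Tickets needing attention:\n" + "\n".join(lines)})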

Triggering actions from REST APIs with Knowi



Knowi Slack Alert Configuration

Summary

We pulled support tickets from Zendesk through the Zendesk Ticket API. We visualized tickets by date. We added Submitter information to create a blended dataset that we could use for alerting. We created an alert to send a list of open tickets that exceeded our response threshold of 3 hours to our customer support Slack channel.

Try it on your own APIs: see our REST API Playground to point it at your own endpoints.
