Wednesday, June 6, 2018

MongoDB Analytics Tips and Traps

Free Advice When Starting a MongoDB Analytics Project

MongoDB excels at analytics. Every week, we have customers building MongoDB analytics, queries and dashboards to help their business teams uncover new insights. We have seen outstanding performance come from good practices, and recurring issues come from a handful of common bad ones.

We have customers performing many different types of analytics, from performance monitoring and usage metrics to financial and business performance. These are general tips and traps that our customers mention when starting to do analytics on MongoDB. It’s not an exhaustive list, but it’s a start.

Arrays and Query Performance

There are some inherent difficulties involved when working with embedded documents and arrays, which can impact your query performance. It’s best to think simple when it comes to your documents so you can make the most of indexing for query performance. That’s not to say you shouldn’t use nested arrays, but avoid deeply nested arrays if possible, or at least be aware that your query performance may suffer.
Knowi has a handy tool called Data Explorer to allow you to explore your nested arrays so you can see what data is available to query.  This can help you quickly determine if you might need to set up additional indexes to more efficiently access that data.
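
As a rough illustration of the kind of exploration described above, here is a minimal Python sketch that walks one sample document and reports each dotted field path along with how many arrays it is nested inside. The `order` document and the `field_paths` helper are made up for this example and are not part of Knowi or MongoDB. Dotted paths such as `items.sku` are the form MongoDB uses when querying or indexing embedded fields, so paths with a high array depth are the ones worth a second look.

```python
from typing import Any, Dict, Optional


def field_paths(value: Any, prefix: str = "", array_depth: int = 0,
                out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map each dotted leaf path to the number of arrays it sits inside."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            field_paths(child, path, array_depth, out)
    elif isinstance(value, list):
        # Array elements share the parent's path; only the depth grows.
        for item in value:
            field_paths(item, prefix, array_depth + 1, out)
    else:
        out[prefix] = max(out.get(prefix, 0), array_depth)
    return out


# Hypothetical order document, invented for illustration.
order = {
    "customer_id": 42,
    "items": [
        {"sku": "A1", "discounts": [{"code": "SPRING", "pct": 10}]},
        {"sku": "B2", "discounts": []},
    ],
}

paths = field_paths(order)
# Fields buried two or more arrays deep are candidates for flattening.
deep = sorted(p for p, depth in paths.items() if depth >= 2)
print(deep)
```

Running this over a handful of representative documents gives a quick picture of which fields are easy to index and which are buried in nested arrays.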

MongoDB Join Collections or, well, Anything.

Some people think joins are the root of all evil. We don’t, but you can’t take a willy-nilly approach to joins in Mongo either. Think about joins early to ensure you don’t kill your query performance, especially if joining across multiple sources. For example, if you have a logical key that would join Mongo collections, you probably don’t want it buried deep in a nested array, and you’ll want to index it.
Knowi allows you to join across collections as well as with other NoSQL, SQL or REST-API sources. Just tell us the join key(s) and the type of join you want to perform and we take care of the rest. We optimize join performance to allow joins across large collections/datasets.
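
For joins inside MongoDB itself, the aggregation framework's `$lookup` stage is the usual starting point. The sketch below only builds the pipeline as plain Python data; the collection and field names (`customers`, `customer_id`, `customer`) and the `join_pipeline` helper are assumptions made for illustration, and this is not how Knowi performs its joins. It shows why a top-level, indexed join key matters: `localField` and `foreignField` are matched directly.

```python
from typing import Any, Dict, List


def join_pipeline(from_collection: str, local_field: str, foreign_field: str,
                  as_field: str, keep_unmatched: bool = False) -> List[Dict[str, Any]]:
    """Build a $lookup + $unwind pipeline joining on a single key."""
    unwind: Dict[str, Any] = {"path": f"${as_field}"}
    if keep_unmatched:
        # Behaves like a left outer join: keep documents with no match.
        unwind["preserveNullAndEmptyArrays"] = True
    return [
        {"$lookup": {
            "from": from_collection,
            "localField": local_field,
            "foreignField": foreign_field,
            "as": as_field,
        }},
        # $lookup yields an array of matches; $unwind flattens it.
        {"$unwind": unwind},
    ]


# Hypothetical names: orders.customer_id joined to customers._id.
pipeline = join_pipeline("customers", "customer_id", "_id", "customer")
left_pipeline = join_pipeline("customers", "customer_id", "_id", "customer",
                              keep_unmatched=True)
```

With a driver such as PyMongo, a pipeline like this would be passed to `aggregate` on the `orders` collection. An index on the joined collection's key (here `_id`, which is always indexed) keeps the lookup from scanning the whole collection for every input document.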

Use Mongo Query Generators

MongoQL is powerful but comes with a pretty steep learning curve. An intuitive query builder will not only simplify the query building process but also enable more people to build queries. Drag-and-drop query builders allow data engineers to leverage MongoDB's query functionality without necessarily knowing MongoQL.
Knowi auto-generates MongoQL through an intuitive drag and drop interface.  It’s designed for data engineers who understand the data but may not be fluent in MongoQL. Easily group, filter and sort data without writing any code.
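
To give a feel for what a query generator produces behind the scenes, here is a small, hedged Python sketch that turns a filter, a group-by field and a field to total into a MongoDB aggregation pipeline. The `build_pipeline` function and the field names (`status`, `region`, `amount`) are hypothetical, and this is only a toy version of the idea, not Knowi's generator.

```python
from typing import Any, Dict, List


def build_pipeline(filters: Dict[str, Any], group_by: str, sum_field: str,
                   descending: bool = True) -> List[Dict[str, Any]]:
    """Generate a filter -> group -> sort aggregation pipeline."""
    return [
        # Filter first so later stages see fewer documents.
        {"$match": filters},
        # One output document per distinct group_by value.
        {"$group": {"_id": f"${group_by}", "total": {"$sum": f"${sum_field}"}}},
        {"$sort": {"total": -1 if descending else 1}},
    ]


# Hypothetical choices a user might make in a drag-and-drop interface.
pipeline = build_pipeline({"status": "shipped"}, "region", "amount")
print(pipeline)
```

The person picking fields never has to remember the `$`-prefixed syntax, which is the main appeal of a generator.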

Trap:  ODBC drivers; SQL Wrappers, Mongo BI Connector

Don’t be fooled into thinking you can simply add a SQL layer on top of your MongoDB and your existing analytics tools will work. You are about to perform unnatural acts on your MongoDB data, because SQL-ifying your MongoDB data is as bad as it sounds. You will be defining schemas, moving data and building complex ETL or ELT processes. Potentially worst of all, you will have to determine very early on what data is “important”: important enough to move, transform and duplicate.
There is a significant cost to adding a SQL layer on top of your Mongo data, starting with your initial project costs and timeline. After implementation, changes are extremely expensive because you have a lot of moving parts between your end users and the data they are trying to analyze. For example, adding a new field requires schema changes and changes to ETL processes, which can mean weeks of work.

The real cost here is the loss of the business's ability to experiment with analytics on their Mongo data. Delivering the first visualization usually just spawns more questions, so enabling experimentation is important.
Native integration matters when it comes to enabling business self-service and experimentation. Knowi natively integrates with MongoDB, so there is no need to move data out of MongoDB and no transforming data back into relational structures. By eliminating the ETL step, you effectively accelerate delivery of your MongoDB analytics project by 10x. Changes are as easy as modifying the query to pull the new field, and it is immediately available for use in visualizations.

Managing Visualization Performance and Long Running Queries

Long-running queries are a common issue when it comes to analytics, and MongoDB is not immune. No matter how much you optimize your queries and Mongo, if you are dealing with large datasets or complex joins, your queries will take time to execute. Your end users will not be patient enough to wait if it's more than a few seconds, which will make them less likely to adopt your analytics platform.

To provide an excellent end-user visualization experience, consider adding a layer to persist query results. This persistence layer could be, for example, another MongoDB instance that caches query results to power reporting.
Knowi comes with an optional persistence layer called Elastic Store, which is essentially a schema-less data warehouse. Queries can then be scheduled to run every x minutes, hours or days, and visualizations automatically go against data in Elastic Store and not directly against your Mongo database.
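
If you want to try the persistence idea with MongoDB alone, one simple option is to end a slow aggregation with an `$out` stage, which writes the results into a collection that dashboards can then read cheaply. The sketch below just assembles such a pipeline as Python data; the `with_persistence` helper and the names `region`, `amount` and `region_totals` are assumptions for illustration, and this is unrelated to how Elastic Store works internally.

```python
from typing import Any, Dict, List


def with_persistence(pipeline: List[Dict[str, Any]],
                     target_collection: str) -> List[Dict[str, Any]]:
    """Return a copy of the pipeline that writes its results to a collection."""
    # $out must be the final stage; it replaces the target collection's contents.
    return list(pipeline) + [{"$out": target_collection}]


# Hypothetical slow report: total order amount per region.
report = [
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
]

cached_report = with_persistence(report, "region_totals")
print(cached_report)
```

A scheduler (cron, for instance) could run the cached pipeline every few minutes or hours, while visualizations query the small `region_totals` collection instead of re-running the aggregation on every page load. The trade-off is that end users see data that is only as fresh as the last scheduled run.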


MongoDB is an excellent data source for analytics, but most traditional analytics tools like Tableau, Qlik, Looker, etc. do not work directly against MongoDB. This is the biggest trap when implementing MongoDB analytics projects. You will have to introduce a number of intermediary steps to extract data from MongoDB, transform it and load it into a pre-defined relational schema for any of these tools to work. While this sounds straightforward, the devil is in the details: you are adding significant time and cost to your MongoDB analytics projects. On top of that, you are immediately limiting what data business teams can use for analytics and therefore limiting their ability to experiment to gain new insights.
Instead, look at new analytics tools that are purpose-built for performing analytics on NoSQL databases, like MongoDB.  Native integration matters when it comes to accelerating delivery of your analytics project and building a data architecture that is sustainable and scalable over time.  Knowi is one such tool but there are others.


