Integrating Einstein Predictions within Tableau using APIs
At Tableau Conference-ish in October 2020, plans were announced to integrate Tableau and Tableau CRM (formerly Salesforce’s Einstein Analytics solution). This will allow very complex machine learning model predictions to be integrated seamlessly into the Tableau analytics workflow. Until this integration is released, there are other ways to get predictions generated by Einstein models into Tableau efficiently.
Before we look at these approaches, we should consider the benefits of using Einstein predictive models. Einstein’s machine learning approach uses past data to predict what will happen in the future with minimal programming. This lets users work more efficiently and identify patterns in their datasets faster, and it provides predictive modeling and recommendation capabilities that would otherwise require advanced coding skills.
More specifically, Einstein:
- Helps users find insights by analyzing millions of rows of data within minutes to discover patterns that would be very hard for a person to find manually.
- Projects AI-driven predictions and recommendations based on the patterns observed in the data.
- Empowers everyone from CEOs to customer service agents and sales managers with readily accessible AI-powered insights, predictions, and recommendations.
- Allows administrators and engineers to construct prediction models and custom AI, and shows the top predictive factors for every prediction.
- Cautions clients via a pop-up alert if there is a probability of bias in a dataset. Potential bias varies on a case-by-case basis and comes from skewed datasets. Customers can create protected fields in datasets.
- Contains model metrics features that show how your models will perform before, during, and after you roll them out. This allows clients to estimate the quality of a model and understand the accuracy of its predictions. Clients can give direct and indirect feedback to models so that they are constantly refreshed and become more accurate.
What Is a Prediction?
A prediction is a forward-looking estimate based on historical data. Today, businesses are using artificial intelligence (AI) and machine learning (ML) to generate predictions and embed them in business workflows. Examples of the types of predictions businesses use include:
- Scores: For example, how likely is an invoice to be paid in the next three months?
- Estimates: For example, what will be the closed amount of an open opportunity?
- Classifications: For example, what should be the priority of a new incoming service case?
Businesses combine predictions with business rules to deliver smarter, more efficient workflows and better outcomes.
Fictitious use case
For the rest of this article we are going to use a fictitious use case to illustrate how we can combine Tableau analytics with Tableau CRM predictive models. Our fictitious use case is based on dummy data for New York City (NYC) public schools: NYC is trying to prioritize which of its 1,234 schools to open safely during the COVID-19 pandemic. We already have data for NYC schools that were recently opened, and we want to find a way to minimize the number of sick days for schools that are currently closed. In short, we are trying to predict which currently closed schools can be opened whilst minimizing the risk to students and staff. Our data has a lot of information that initially looks like it should be considered when deciding which schools to open, but we are struggling to decide how much weight each factor should be given, and we are concerned that our analysts’ implicit bias could skew the outcome as well.
Data that we think is highly relevant includes:
- Schools have motion sensor doors
- Staggered start and end to the school day
- COVID-19 rate in local population
- Average tenure of teaching staff
- Average tenure of administrative staff
- % of staff who have taken safety training
- Count of masks on hand
- Student sick days
Using Einstein, we are going to create a predictive model (referred to as a “Story” in Einstein terminology) using data from the schools that recently opened. We will then use the Story to predict the number of sick days for the schools that are currently closed. Ideally, we want a way to harness the power of Einstein predictions with the ease of asking the next question in Tableau. Asking questions and generating what-if analyses as new data arrives is standard in Tableau; leveraging the predictive capability of Einstein as part of this workflow allows very complex model predictions to be included easily in the analytical flow.
We are assuming the model has already been created, as this is a simple workflow within Einstein, and we will look at how to integrate this model with Tableau.
We want to use our Einstein Story to gain insights into new data that we receive on a regular basis about schools that are currently closed. We could look at predictions for individual schools, or look at broad-brush “best” options for the whole dataset.
If we look at each school individually to assess its predicted risk, Einstein also tells us the top three leading contributors to that risk. In aggregate, we could look across the whole school system to see which driving factors we should consider. This might be used to drive changes in policy, as opposed to a school-by-school assessment. Think of this as macro decision making, as opposed to micro decision making.
Once we know what our leading contributors are, we may want to perform what-if analysis. This allows us to see what our model predicts as we make changes to the leading factors. This sounds useful, but if generating these what-if insights is painful and time consuming, requiring you to spend more time manipulating data than analyzing results, it could quickly become very frustrating, especially if the data changes frequently. Now imagine you could get all the power of the Einstein predictions whilst staying in Tableau, generate new what-if analyses, and then compare the outcome of your new predictions against previous predictions. This is possible through the Einstein Prediction API.
The workflow can be broken down into two similar parts. First, we have new data for schools that are currently closed, and we want to generate predictions for it. Second, once we have those predictions, we want to perform what-if analysis by changing some data values based on what the predictions told us, so that we can compare each what-if prediction against our new “base” prediction.
In practical terms, we have data that we are viewing in Tableau that we want to pass to an existing Einstein Story. We receive the generated predictions back from the model and then save these values, whilst associating them with the original data in Tableau. Still in Tableau, we want to see the predictions and, after analyzing them, generate some what-if analyses by changing some of the values in our base data, starting the process over again. Finally, we want to be able to compare the outcome of different predictions to help us decide what changes could be effective.
The two workflows are similar in nature in that:
- In Tableau, we select the set of data we want to send to Einstein, so that predictions can be generated.
- We want to save the generated predictions in a database so we can pull them back into Tableau.
- Once the predictions have been saved, we want to associate them with the original data in Tableau and display the original data and the predictions.
Fortunately, both Tableau and Tableau CRM have APIs that allow us to programmatically create the workflow described above.
Tableau has several APIs, but in this use case we want to use the Extensions API (https://help.tableau.com/current/pro/desktop/en-us/dashboard_extensions.htm). This API allows us to interact and communicate with Tableau dashboards.
The extension is embedded directly in a workbook and allows us to run JavaScript within our dashboard, optionally build a user interface (think of embedding an HTML page inside a dashboard), and listen for changes made on a worksheet. An Extension is served from a web server. We can push values from our Extension up to the web server, have the web server execute some logic, and pass information back to the Extension in Tableau.
In our use case we have new data that we want Einstein to generate predictions on. As a result, we have an extension that listens for selection of marks on a Tableau chart and POSTs the data associated with these marks to the web server.
The Web Server is the component that integrates the Tableau Extension with the Einstein Story and saves the generated predictions to a database. I used Apache Tomcat with a Java servlet, not because that was necessarily the best choice, but because it was technology I was familiar with.
The Servlet code listens for GET or POST requests from the Tableau Extension. It sends POST requests to the specified Einstein Story along with the data passed up from the Tableau Extension. It then runs INSERT statements against the specified Database to save the Einstein generated predictions.
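To make the middle step more concrete, here is a minimal Java sketch of how a request to a Story could be assembled. It is not the servlet code from this project: the class and method names are hypothetical, and the JSON field names (`predictionDefinition`, `type`, `columnNames`, `rows`) are an assumption about the payload shape, so check them against the Einstein Prediction API documentation before relying on them. The endpoint URL and access token are placeholders that would come from configuration. The sketch needs Java 11 or later for `java.net.http`.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: builds (but does not send) a prediction request for a batch of rows. */
final class PredictionRequestSketch {

    /** Quotes a value as a JSON string, escaping backslashes and double quotes only. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders a list of strings as a JSON array of strings. */
    static String jsonArray(List<String> values) {
        return values.stream()
                .map(PredictionRequestSketch::quote)
                .collect(Collectors.joining(",", "[", "]"));
    }

    /** Assumed payload shape: a definition id, the column names, and one array per row. */
    static String buildBody(String definitionId, List<String> columns, List<List<String>> rows) {
        String rowsJson = rows.stream()
                .map(PredictionRequestSketch::jsonArray)
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"predictionDefinition\":" + quote(definitionId)
                + ",\"type\":\"RawData\""
                + ",\"columnNames\":" + jsonArray(columns)
                + ",\"rows\":" + rowsJson + "}";
    }

    /** Wraps the body in an authenticated JSON POST; endpoint and token are placeholders. */
    static HttpRequest buildRequest(String endpoint, String accessToken, String body) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending the request would be one further call on a `java.net.http.HttpClient`; the sketch stops short of that so it can be read and run without credentials. A real servlet would more likely use a JSON library than hand-built strings.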
Once you create an Einstein Story you can expose it to the Einstein Predictions API. This allows you to send POST requests programmatically to the Story and generate new predictions for the data that is passed in. At the Web Server, each prediction we want to use has its own properties file containing the information needed to communicate with the Einstein Story via the API. The Tableau Extension asks the Web Server for a list of the defined Stories, allowing users in Tableau to select the Story they want to use.
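As an illustration of the per-Story properties file, the following Java sketch parses one such file into a small configuration object. The property keys (`story.name`, `story.endpoint`, `story.definitionId`) and the class name are hypothetical; the actual keys used in this project are not shown in this article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch only: holds the settings needed to call one Einstein Story. */
final class StoryConfigSketch {
    final String displayName;   // name shown to the Tableau user when picking a Story
    final String endpoint;      // URL the web server POSTs prediction requests to
    final String definitionId;  // identifier of the Story's prediction definition

    private StoryConfigSketch(String displayName, String endpoint, String definitionId) {
        this.displayName = displayName;
        this.endpoint = endpoint;
        this.definitionId = definitionId;
    }

    /** Parses properties-file text; the key names are hypothetical. */
    static StoryConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StoryConfigSketch(
                require(props, "story.name"),
                require(props, "story.endpoint"),
                require(props, "story.definitionId"));
    }

    /** Fails fast when a required key is missing or blank. */
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }
}
```

With one file per Story, the list the Extension asks for could simply be the `displayName` of every file found in a configuration directory.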
Example of an Einstein Model
We need somewhere to store the generated predictions so we can show them in Tableau and compare predictions against each other if we are doing what-if analysis. The Web Server inserts the returned predictions into a Database. Once the values have been inserted, the Web Server tells the Tableau Extension that the predictions have been saved. The Extension tells the dashboard to refresh its datasource so that the dashboard is now looking at the latest data.
The choice of database for storing the generated prediction values is yours. The only prerequisites are that it is one of the many databases that Tableau can connect to, and that both the Web Server and the Tableau Server hosting the dashboard can connect to it.
In our example using dummy NYC school data, the NYC data is in the grey tables and the purple tables contain the generated Einstein predictions data. The Einstein tables are designed to store the Einstein Story predictions and to be easily joined to external data, so that data other than the NYC schools data could be plugged in.
To generate a new “base” prediction for closed school data, we just need to display this data within Tableau and enable the Tableau Extension that will allow us to communicate with Einstein.
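Saving the returned predictions could look something like the JDBC sketch below. The table name, column names, and class names are hypothetical (the real schema is not reproduced here), and the sketch assumes a `java.sql.Connection` has already been opened to whichever database you chose.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Sketch only: writes one named batch of predictions with a parameterized INSERT. */
final class PredictionStoreSketch {

    /** Hypothetical table and column names. */
    static final String INSERT_SQL =
            "INSERT INTO einstein_prediction (prediction_name, school_id, predicted_sick_days) "
            + "VALUES (?, ?, ?)";

    /** One prediction returned for one school. */
    static final class Row {
        final String schoolId;
        final double predictedSickDays;

        Row(String schoolId, double predictedSickDays) {
            this.schoolId = schoolId;
            this.predictedSickDays = predictedSickDays;
        }
    }

    /** Inserts all rows as one JDBC batch and returns how many statements were executed. */
    static int saveAll(Connection conn, String predictionName, List<Row> rows) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            for (Row row : rows) {
                stmt.setString(1, predictionName);   // the unique name given in Tableau
                stmt.setString(2, row.schoolId);     // key used to join back to the school data
                stmt.setDouble(3, row.predictedSickDays);
                stmt.addBatch();
            }
            return stmt.executeBatch().length;
        }
    }
}
```

Storing the prediction name alongside a key from the source data is what lets several predictions for the same schools sit side by side and be joined back to the original rows.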
- Select the data that you want to generate a prediction for. In this case we are selecting all the available schools. The Extension is listening for selections and knows which values have been selected.
- Using the Tableau Extension, we confirm:
  - Which Einstein Story we want to generate predictions for (these are defined at the Web Server that also serves up the Tableau Extension).
  - The columns of data we want to pass to the Einstein Story for each school. In this case we are passing all columns of data, as the model was created using all of these columns. (Note: if you pass up columns that the model is not expecting, they are ignored.)
- We can now give the prediction a unique name and press the Generate button.
Once we have generated our “base” prediction we can analyze the predictions in multiple ways. In this example, as the predicted number of sick days for each school comes back with the top 3 leading causes that drove the prediction, we could look at the top leading causes across all 1,234 closed schools. We can see what overall factors appear to be increasing or decreasing the predicted number of sick days.
In the Leading Causes Across All Schools viz we report the aggregate of the Leading Causes across all the currently closed schools.
For instance, looking at the top row we have “Motion Sensor Doors”, with a value of No. Moving left to right, we see the predicted number of sick days for all the schools that fall within this category. Next we see that 744 schools do not have “Motion Sensor Doors”. The last column shows the total number of additional sick days that this is predicted to cause. This view allows us to immediately see which factors are driving an increase or decrease in predicted sick days across all our closed schools, and the number of schools affected.
Scrolling down this report we can see the reverse: we are now looking at leading causes that reduce the number of predicted sick days.
Another intuitive way to view this data is with a Waterfall chart. We can easily see what factors are driving the predicted values across all the schools.
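The aggregation behind a leading-causes view like the one described above can be sketched in a few lines of Java. In the dashboard this is done by Tableau itself, so the code is purely illustrative; the class names are hypothetical, and it assumes each school contributes at most one row per factor and value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: rolls per-school leading causes up to one line per factor and value. */
final class LeadingCauseSummarySketch {

    /** One leading cause returned for one school. */
    static final class Cause {
        final String factor;      // e.g. "Motion Sensor Doors"
        final String value;       // e.g. "No"
        final double impactDays;  // positive adds predicted sick days, negative removes them

        Cause(String factor, String value, double impactDays) {
            this.factor = factor;
            this.value = value;
            this.impactDays = impactDays;
        }
    }

    /** Running totals for one factor/value pair. */
    static final class Summary {
        int schoolCount;
        double totalImpactDays;
    }

    /** Groups causes by "factor = value", counting schools and summing the impact. */
    static Map<String, Summary> summarize(List<Cause> causes) {
        Map<String, Summary> byCause = new LinkedHashMap<>();
        for (Cause cause : causes) {
            String key = cause.factor + " = " + cause.value;
            Summary summary = byCause.computeIfAbsent(key, k -> new Summary());
            summary.schoolCount++;
            summary.totalImpactDays += cause.impactDays;
        }
        return byCause;
    }
}
```

Sorting the resulting summaries by `totalImpactDays` would put the factors that add the most predicted sick days at the top and the ones that reduce them at the bottom.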
Another way of looking at the “base” prediction is to see the predicted number of sick days for the closed schools at a more granular level.
This viz shows the predicted sick days for all 1,234 schools in three different ways:
- Top left: a heat map that shows the range of predicted sick days and the concentration.
- Middle left: the predicted sick days broken out into cohorts.
- Top right: a map that shows the schools, with predicted sick days indicated by color.
This visualization also allows us to generate new what-if predictions. The top of the report has five factors that our leading causes analysis showed were major factors. We can change the parameters’ values and generate new predictions. This allows us to see what the impact of the change might be. We can also compare what-if predictions against our “base” prediction to see the overall effect at a macro or micro (school) level.
For example, schools that use a staggered start to the school day are predicted to have significantly fewer sick days. What would our sick days prediction be if we made all schools have a staggered start?
Generate a new what-if prediction
Make all schools have a staggered start
- Set all schools to staggered start
- Select the schools to generate the prediction for; in this case we selected all 1,234 schools
- Create a unique name for the prediction
- Press the generate button
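Conceptually, a what-if run like the one in these steps is just the base data with one column overridden before it is sent for prediction. The Java sketch below shows that idea with hypothetical names; in the dashboard the override comes from the controls at the top of the report rather than from code like this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: derives what-if rows from base rows by overriding one column. */
final class WhatIfSketch {

    /**
     * Returns copies of the base rows with the given column set to the new value.
     * The base rows are left untouched so the base prediction can still be reproduced.
     */
    static List<Map<String, String>> applyOverride(
            List<Map<String, String>> baseRows, String column, String newValue) {
        List<Map<String, String>> whatIfRows = new ArrayList<>();
        for (Map<String, String> row : baseRows) {
            Map<String, String> copy = new LinkedHashMap<>(row);
            copy.put(column, newValue);
            whatIfRows.add(copy);
        }
        return whatIfRows;
    }
}
```

For the staggered-start scenario, the call would be along the lines of `applyOverride(rows, "Staggered Start", "Yes")`, where the column name is whatever the Story was trained on.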
We can now compare our base prediction with our new what-if prediction of all schools having a staggered start.
In detail: 1 & 2 show the predictions we are comparing. 3 & 4 show the total number of predicted sick days. We see that making all schools have a staggered start reduces the predicted sick days across all the schools by roughly 50,0000 days. 5 shows schools in the base prediction that did not have a staggered start; this category is absent from the new prediction because we made all schools have a staggered start.
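The arithmetic behind such a comparison is simple: total each prediction, and subtract school by school. Tableau does this for us in the dashboard, so the Java sketch below is only an illustration, with hypothetical names, of the macro (total) and micro (per-school) views.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: compares a what-if prediction against the base prediction. */
final class PredictionComparisonSketch {

    /** Macro view: total predicted sick days across all schools in one prediction. */
    static double total(Map<String, Double> predictedDaysBySchool) {
        double sum = 0.0;
        for (double days : predictedDaysBySchool.values()) {
            sum += days;
        }
        return sum;
    }

    /**
     * Micro view: what-if minus base for every school present in both predictions.
     * A negative value means the what-if scenario predicts fewer sick days.
     */
    static Map<String, Double> perSchoolChange(
            Map<String, Double> base, Map<String, Double> whatIf) {
        Map<String, Double> change = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : base.entrySet()) {
            Double whatIfDays = whatIf.get(entry.getKey());
            if (whatIfDays != null) {
                change.put(entry.getKey(), whatIfDays - entry.getValue());
            }
        }
        return change;
    }
}
```

The difference between the two totals gives the system-wide effect of the change, while the per-school map shows where that effect is concentrated.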
In summary, we have seen how we can integrate the power of Einstein predictions with the ease of asking the next question of Tableau. Our fictitious use case just starts to explore how you could potentially make informed decisions using Tableau and Tableau CRM. In subsequent posts I will dive into the code that makes this possible. We will look at the Tableau Extension code and then the code for the Java Servlet.