Try out Predictive Analytics in Tableau using Einstein Discovery
Posted By Dan Bradley
Are you curious about incorporating predictive analytics into your Tableau vizzes but not sure where to start? You may want to try the Einstein Discovery and Tableau integration.
Einstein Discovery is the machine learning/predictive analytics tool under Salesforce’s CRM Analytics platform. Now that Tableau is part of the Salesforce family, there are new ways to integrate Einstein Discovery’s predictive functionality into Tableau.
This post is a step-by-step guide to building and deploying a simple predictive model. After building the model, I connect to it via Tableau Prep Builder to score individual records of a dataset.
To follow, I assume you already have an active Tableau license with access to Tableau Prep Builder. If not, you can sign up for a Tableau trial here.
The big disclaimer: All of the data used here is fictitious and intentionally contrived for the purposes of this developer, proof-of-concept demonstration guide. The intent of this post is to introduce the mechanical basics of building, deploying, and connecting to a predictive model in a developer environment using Einstein Discovery and Tableau. The critical steps of evaluating and tuning a model for accuracy for use in a production scenario are not covered here. Trailhead includes several modules introducing these concepts and working with your friendly, local data scientist team is recommended before deploying predictive modeling into any production situation.
Step 1: Sign up for a Salesforce Trailhead Account
You’ll want to sign up with Salesforce’s Trailhead program, if you haven’t already. Trailhead is a free learning environment covering the universe of Salesforce products (Tableau included). The process is fast and will help you keep track of what you have learned or may be interested in learning. There are a number of lessons, or “trails”, that focus on aspects of Einstein Discovery that are far beyond what is covered in this post.
If you aren’t yet an existing Salesforce user, use the sign-up options in the second row in the image below.
Sign up for a Trailhead account with Salesforce, if you don’t already have one.
Step 2: Start the Einstein Discovery Basics Trail
Once you have an account, the Einstein Discovery Basics trail provides an in-depth guide introducing the ins and outs of Einstein Discovery. To follow along with this post you aren’t required to complete the trail first, but you’ll certainly want to complete it at some point. At the very least, I’d recommend the first two modules: “Get to Know Einstein Discovery” and “Build Your CRM Analytics Dataset”.
Some of the free modules contained in the Einstein Discovery Basics Trail on Trailhead.
Step 3: Sign up for an Einstein Discovery Developer Edition Org
An Org in Salesforce parlance is an account. If you started the Einstein Discovery Basics Trail above, you’ll be prompted to do this step in the “Build Your CRM Analytics Dataset” module. Alternatively, you can use this link to jump straight to the sign-up page.
In addition to a Trailhead account, you’ll need to sign up for a Developer account, which is the trial environment used to train and deploy the predictive model.
Signing up for a Developer Edition Org is essentially signing up for a trial environment. Even if you already have a Developer Edition Org, you’ll still want to sign up for this particular Analytics developer org version; it contains the features and configurations required to follow along here. Note: as this is a developer edition account, there are limitations on how it can be used and on the amount of data supported. See the Terms of Use.
After signing up, you’ll receive an email that prompts you to activate your org and set a password. Make sure you keep both your username and password handy, as they will be needed later.
Step 4: Upload Part 1 of the Sample Dataset
After activating the org, you should be redirected directly into your trial environment. If not, use this link to sign in using your username and password. Next, navigate to the CRM Analytics app, or “Analytics Studio”, as shown below. This is where we’ll upload the dummy dataset.
Access the “Analytics Studio” app by clicking on the upper-left grid icon once you are logged in to your Salesforce Developer account.
Once in the “Analytics Studio” app, we need to upload the data to train our predictive model. Click the Create button and select Dataset.
We’ll be using a CSV dataset in this example, so select the first option.
After you’ve downloaded part 1 of the dataset from the link below, upload it using the interface.
As this post is intended for a higher education audience, I’m using an enrollment management dataset, part 1 of which can be downloaded here. This data will be used to build and train our predictive model, or “Story” in the jargon of Einstein Discovery. Part 2 of the dataset contains the records we will score with the model to estimate each admit’s likelihood of enrolling (link below).
Give the dataset a name, but leave the other options as default.
This interface allows you to make modifications to the data type for fields in the uploaded dataset. In this example, leave everything as default and click “Upload File”.
It may take a few minutes for your dataset to be uploaded into your environment.
Step 5: Create the Einstein Discovery Story or Predictive Model
With the dataset uploaded, the fun part begins. We need to define what we are trying to predict and which fields should be used to predict it. For example, here we’re trying to predict an applicant’s likelihood to matriculate based on all of the other information we have about them. Einstein Discovery’s interface walks you through a series of screens to help you with this process. It also provides feedback to help you assess how appropriate the developed model is and how much confidence you should place in its predictions.
Depending on whether you’ve closed out the data upload waiting screen, you’ll either be directed to a screen that looks like the one just below, or you may need to navigate to the newly loaded data set from the main interface, which is shown in the second screenshot. Either way, you’ll be looking for the “Create Story” button:
Depending on how patient you are with the uploading screen, you’ll either be redirected to this interface or the one in the image below. The goal is to access the “Create Story” button, which is seen here in the upper right corner.
If you aren’t taken to the prior image, you may need to navigate to the list of all uploaded datasets. On the left-hand menu, select “All Items” under the browse category. Click the “Datasets” tab in the middle of the screen to filter the list to datasets only. Finally, find the name of the dataset you just uploaded and click the dropdown “more” arrow on the far right. You’ll be shown a “Create Story” option. Click on this option.
You’ve now entered the “Create Story” interface, which is how we configure the predictive model we want to build and train.
The first option defines the goal of the predictive model: what we want to analyze and ultimately predict. In this case, we want to predict the likelihood of an applicant in the dataset enrolling. Specify “Enrolled” as the field we want to build the prediction on.
Next, we specify that we want our prediction to maximize enrollment, which is coded in the dataset as the value “True”. Give the predictive model, or story, a name that describes the goal of the prediction.
The next few screens provide options on whether we want to generate only diagnostic insights related to our story, or predictions as well. We want the latter in this example.
Depending on how well you know the dataset, you may want to take manual control over which fields to include in the model, or just allow Einstein Discovery to make the decision for you based on its assessment. In this case, we’ll configure the fields manually.
The last step before we actually create and train the model is to specify which fields we’d like to include as predictors. We’ll keep everything except for “ID Student” in this case. “ID Student” is just a unique identifier for each student record, so it has no bearing on how likely an admit is to enroll.
This may take a few minutes, but Einstein Discovery provides a status as it progresses through the model generation.
Once complete, you’ll be taken to the diagnostic “Insights” screen, where you can explore some of the relationships between our predictors and the “Enrolled is True” outcome.
Step 6: Deploy your Model so that Tableau Prep Builder can Connect to it
Now that the story/predictive model has been trained, we need to make it accessible to outside applications, which in this case is Tableau Prep Builder.
In this example, we’re focused on the model component. Click on the “Model” tab to be taken to an overview of the model metrics. Notice that on the left side menu, you have options to copy R code to run in your own copy of R (see A) or delve deeper into evaluation metrics (B). In this case, we’re also presented with a series of alerts and detected issues in the dataset (C). Einstein Discovery provides a checklist of recommended changes to improve the model or help determine whether the data is appropriate for what you are trying to predict. We won’t delve into them in this guide, but feel free to explore the recommendations on your own. When ready, click “Deploy Model”. The next series of screens walks through the steps to publish the predictive model so that it will be accessible by other applications, like Tableau Prep Builder.
We’ll deploy as a new model and name it “Likelihood to Enroll”. Notice that Einstein Discovery will continue to alert you if there are issues detected in the data and model to help prevent poor predictive models from being inadvertently published. In this example, we’ll proceed, but explore on your own if you’d like.
The next few screens tell Salesforce how to deploy the model. In this example, we’ll deploy without connecting to a Salesforce Object, as we’ll be using the prediction with Tableau.
We won’t segment the data here.
We have an opportunity to indicate which variables are within our direct influence or control. For example, we may be able to influence the number of interactions applicants have with admitted students by encouraging admissions officers to reach out proactively to certain students. We also may be able to influence institutional grant amounts by working with the Financial Aid office in certain cases. Indicating that these variables are actionable allows Einstein Discovery to surface prescriptive actions that may improve the likelihood of enrolling for certain students.
Last, we review all of our configurations and click Deploy.
You’ll be taken to this screen, which confirms the model has been published and will keep track of how it is used over time. You can also specify refresh cadences or notifications if model quality degrades over time. At this point, we’re done with our work in Einstein Discovery for this example.
Step 7: Open Tableau Prep Builder and Load the Data to be Scored by the Model
We now move out of the Salesforce interface and into Tableau Prep Builder. You’ll want to download and load into Prep Builder part 2 of our dataset, which is structured identically (same fields) as part 1. In the new dataset, we still have an “Enrolled” field, but we’ll be using this as a way to compare against our model’s prediction.
We’ve switched product gears and now create the tie in to our published model to score data in Tableau Prep Builder. After downloading part 2 of the dataset, load it into a new Prep Builder Flow in Prep Builder Desktop.
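Because part 2 is meant to have the same fields as part 1, a quick header comparison before loading can catch a mismatched download. The snippet below is only a sketch: the in-memory CSV text stands in for the two downloaded files, and "Institutional Grant" is a hypothetical column name used for illustration ("ID Student" and "Enrolled" come from this post).

```python
import csv
import io


def header_diff(train_csv, score_csv):
    """Return (missing_from_score, extra_in_score) column names."""
    train_cols = next(csv.reader(train_csv))
    score_cols = next(csv.reader(score_csv))
    missing = [c for c in train_cols if c not in score_cols]
    extra = [c for c in score_cols if c not in train_cols]
    return missing, extra


# In-memory stand-ins for the part 1 and part 2 files (hypothetical contents).
part1 = io.StringIO("ID Student,Enrolled,Institutional Grant\n1,True,5000\n")
part2 = io.StringIO("ID Student,Enrolled,Institutional Grant\n2,False,0\n")

missing, extra = header_diff(part1, part2)
print("Missing from part 2:", missing)
print("Extra in part 2:", extra)
```

With real files, you would pass open file handles instead of the `io.StringIO` objects; two empty lists mean the field names line up.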
Step 8: Create a prediction step and sign in to your Developer Edition Org
You’ll need the username and password you created for this step. After signing in, you’ll need to tell Tableau Prep which of the models you want it to use, which will likely be only one unless you’ve previously created and deployed a story.
Add a prediction step to your flow.
Sign into your Salesforce developer account where the prediction model is published.
You’ll be prompted to allow Tableau access to the model. After granting it, you’ll see a confirmation screen that you can close.
Back in Tableau Prep Builder, select the Prediction Definition you named.
The prediction definition screen should look something like this. You may see multiple predictions if you have published other models.
Step 9: Map fields between the Tableau Prep Builder flow and Einstein Discovery model
Many of the fields may be matched automatically, but if any are missing you’ll need to specify them manually. Pay attention to any errors that Prep may throw. I’ve found that most often the issue is a data type mismatch, e.g., a field that is a String in your story but a Boolean in your Prep flow. If there are any data type mismatches, you’ll need to change the data type in a prior step of the flow.
Last, we need to confirm the mapping of fields between our predictive model and the dataset we are working with in Tableau Prep Builder. Review the list to make sure everything matches and then click Apply.
If Prep Builder is happy, you’ll see a new field added to your dataset called “Prediction”. This represents the predicted percent likelihood that a student record in your Prep Builder dataset will enroll. For example, in the first row of the data table, student 2014028 has an 11.9% predicted likelihood of enrolling based on our model. Similarly, the second row for student 2025750 has a 94.1% predicted likelihood of enrolling based on our model.
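To make the data type mismatch idea concrete, here is a rough Python analogy rather than anything Prep Builder runs itself. Python types stand in for Prep's String and Boolean types, and the field names "First Generation" and "Institutional Grant" are hypothetical, chosen only for illustration.

```python
def find_type_mismatches(expected_types, record):
    """Return {field: (expected, actual)} for fields whose type differs."""
    mismatches = {}
    for field, expected in expected_types.items():
        actual = type(record[field])
        if actual is not expected:
            mismatches[field] = (expected.__name__, actual.__name__)
    return mismatches


# Types the model expects (hypothetical) versus one record from the flow.
expected_types = {"First Generation": str, "Institutional Grant": float}
record = {"First Generation": True, "Institutional Grant": 5000.0}

mismatches = find_type_mismatches(expected_types, record)
print("Before fix:", mismatches)

# Convert each mismatched field to the expected type, which is the analogue
# of changing the data type in an earlier step of the flow.
for field in mismatches:
    record[field] = expected_types[field](record[field])

print("After fix:", find_type_mismatches(expected_types, record))
```

The point is simply that the conversion happens upstream of the prediction step, so the mapping sees the type the model was trained on.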
Step 10: Evaluate the Predicted Results
If Tableau Prep Builder is happy and processes the results, you’ll see a new field is added to your dataset representing the percent likelihood of an applicant matriculating. You can use this probability to do all sorts of things – aggregate, bucket applicants, etc. When you output the flow, this field will be included.
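As one illustration of bucketing, the sketch below groups prediction values into three bands. The 40 and 75 cutoffs and the band names are assumptions made up for this example, not values from Einstein Discovery; the two student IDs and scores are the ones quoted earlier in this post.

```python
from collections import Counter


def bucket(prediction_pct):
    """Assign a prediction (0-100) to a band; cutoffs are illustrative only."""
    if prediction_pct >= 75:
        return "High"
    if prediction_pct >= 40:
        return "Medium"
    return "Low"


# Student ID -> predicted likelihood of enrolling, in percent.
predictions = {"2014028": 11.9, "2025750": 94.1}

buckets = {student_id: bucket(p) for student_id, p in predictions.items()}
counts = Counter(buckets.values())

print(buckets)
print(counts)
```

In Prep Builder itself, the same logic would live in a calculated field on the “Prediction” field rather than in Python.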
From here, we can use the new prediction field in other steps of our flow. For example, we may want to interpret whether a student will or will not enroll based on a prediction likelihood threshold and add this as a true/false boolean field to the dataset.
After adding a new cleaning step, create a calculated field.
In this example, our calculated field predicts that the student will enroll if the prediction value is above 37.7. Why 37.7? You may have noticed that this was the threshold value that Einstein Discovery recommended we use back in step 6. Of course, this can be adjusted and could even incorporate a parameter to allow for greater flexibility.
The new calculated field transforms our numeric prediction value into a binary true or false value.
This dataset also already includes the actual value of whether a student enrolled or not. We can compare this against our prediction to gain a sense of how well our model performed on this test dataset.
After moving the “Enrolled” field (actual enrollment value for a student) next to our new “Predicted Enrolled” value, we can use the interactive nature of Prep Builder to highlight the distribution of values. For example, when selecting “Predicted Enrolled” True, we see that we identified 585, or 95%, of the actual Enrolled students correctly with our model. Conversely, of those students that we predicted would not enroll, there were 34 students, or 5%, who did actually enroll, meaning that our prediction was incorrect.
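The threshold and comparison logic can be sketched outside of Prep Builder as well. In the Python below, the 37.7 threshold and the 585 and 34 counts come from this post. The four sample rows are made up, and reading the quoted percentages as shares of the students who actually enrolled is my assumption.

```python
THRESHOLD = 37.7  # threshold value cited in this post


def predicted_enrolled(prediction_pct, threshold=THRESHOLD):
    """True when the prediction is above the threshold."""
    return prediction_pct > threshold


def confusion_counts(rows, threshold=THRESHOLD):
    """Tally predicted vs. actual outcomes for (prediction_pct, enrolled) rows."""
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for prediction_pct, actually_enrolled in rows:
        predicted = predicted_enrolled(prediction_pct, threshold)
        if predicted and actually_enrolled:
            counts["true_pos"] += 1
        elif predicted:
            counts["false_pos"] += 1
        elif actually_enrolled:
            counts["false_neg"] += 1
        else:
            counts["true_neg"] += 1
    return counts


# Made-up rows: (prediction in percent, actually enrolled?)
sample = [(94.1, True), (11.9, False), (52.0, False), (30.0, True)]
counts = confusion_counts(sample)
print(counts)

# Counts quoted in this post: enrolled students the model caught vs. missed.
caught, missed = 585, 34
recall = caught / (caught + missed)
print(f"Share of enrolled students identified: {recall:.1%}")
```

Under that reading, 585 out of 619 enrolled students rounds to the 95% figure above, and the 34 missed students to the 5%.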
Step 11: Include additional predictive fields from the Einstein Discovery model
A single predictive value is not the only insight you can use to enhance your dataset. On the prediction step, try experimenting with the other options under settings: top predictors and top improvements. Depending on how you’ve configured your model and which fields are set as actionable, these additional fields can help provide prescriptive insights: suggestions on what you could do to increase the likelihood of an applicant matriculating.
There is more information from the predictive model that can be brought into Tableau Prep flows. One option is to include the “Top Predictor” value for each record. In this example, we’re only bringing in the top predictor for each scored record. Checking this box will include both the predictor description and the numeric value that it contributes to the total prediction score.
Once included in the flow, you can sort the values in the field to see the most common top predictors.
As with most Tableau Prep flows, the last step is to output your dataset, whether locally to a .csv or Tableau hyper extract, or to a shareable location such as Tableau Cloud or a database table.
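If you output the flow and want to tally the most common top predictors outside of Prep Builder, a few lines of Python would do it. The predictor descriptions below are invented placeholders; the actual text Einstein Discovery returns will differ.

```python
from collections import Counter

# Hypothetical "Top Predictor" descriptions, one per scored record.
top_predictors = [
    "Institutional Grant is high",
    "Interactions with admitted students is low",
    "Institutional Grant is high",
    "Campus visit is True",
    "Institutional Grant is high",
    "Interactions with admitted students is low",
]

# Count how often each description appears as the top predictor.
predictor_counts = Counter(top_predictors)

for description, count in predictor_counts.most_common(2):
    print(f"{count} records: {description}")
```

This mirrors sorting the field in Prep Builder, just as a count per description.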
Concluding Thoughts and Next Steps
If you’ve followed along, I hope you found this guide a helpful introduction to the basic mechanics of working with Einstein Discovery predictive models and Tableau Prep Builder.
Dan Bradley is a Principal Solution Engineer for Tableau’s Higher Education Field Education Team. Based in Chicago, he works with higher education institutions in the Central and mid-Atlantic regions of the U.S. In addition to technology, Dan has a background in education administration, including an M.S. in Higher Education Administration and Policy.
Dan's mission is to help the people of higher education become data-reflective practitioners who can see, understand, and act on their data.
*Opinions are my own and not the views of my employer*
The views, thoughts, and opinions expressed on Tableaustudyhall.com belong solely to the authors, and not necessarily to Tableau, Salesforce, or an author’s employer.