Implement Small n or Privacy Suppression
When working with student counts, it is common to need a way to suppress or “mask” a value if it falls below a certain threshold. For example, I may be an Institutional Research analyst tasked with creating a viz of student totals by gender for my college’s departments. To protect the privacy of students where the count or “n” may be small, I need to display an “*” rather than the actual figure. There are a few approaches to this, but here’s one example to provide the basic intuition of building this suppression into a dashboard that would be published for non-editor, viewer usage.
Privacy Suppression using a Calculated Field
In my viz below, I have a number of places where the sum of records, i.e., student count, grouped by department and gender results in a value less than 10. In such cases I want to display an “*” instead of the value.
Rather than going into the dataset and modifying records there, I can use a Level of Detail (LOD) expression within an “IF” statement to display custom text or a symbol to indicate that the value has been masked from display.
Here’s what that formula looks like:
A couple of callouts in the calculation:
- I’ve used the “Include” versus “Fixed” keyword to provide more flexibility in its use. “Include” takes into account any dimensions used in the view, rather than just those specified in the calculation saving time and reducing the chance of a mistake.
- I’ve hard coded in a “10” as my suppression cutoff threshold. This figure varies depending on institutional policies. For more flexibility the “10” could be replaced with a parameter field adjustable by the workbook author.
- When the first portion of the IF statement evaluates TRUE, the result is that the calculation returns “NULL” and not a string “*”; this is necessary in order to allow us to treat the field as a measure we can still aggregate, rather than as a string. This will make more sense in a later step.
Next, I’ll replace my “SUM(Number of Records) fields with my new LOD calculated field. As a result, all of the values less than 10 from figure 1 are removed and replaced by the “11 nulls” indicator in the bottom right-hand corner.
The last step is to change how Tableau displays special values, i.e., “NULL” values, in a view. You adjust this in the format menu of the calculated field. We’ll change the default special value to “*” but it could just as easily be labeled “suppressed” or any other term. I also change the “Marks” dropdown from “Show at Indicator” to “Show at Default Value”.
Here’s the final result after making the format menu changes:
A Couple of Caveats to Keep in Mind
As is the case with any calculation, users who have the editing permissions to inspect or change a formula or workbook can override the suppression logic. Thus, this use-case will mask only small n’s for users who are viewing a published dashboard with this field in use.
Also, while the use of “Include” allows for aggregation and disaggregation to continue to calculate, there may be scenarios where a viewer could manually calculate a difference between a rolled-up and drilled-down views. It may be possible to obfuscate the true value in such instances by building more complex logic into the calculated field, e.g., use of the “round” function.
If you have other examples of how you’ve implemented privacy suppression, please share in comments or post a link to a viz.