
Implement Small n or Privacy Suppression

Posted By Dan Bradley

When working with student counts, you often need to suppress, or “mask,” a value that falls below a certain threshold. For example, suppose I am an Institutional Research analyst tasked with creating a viz of student totals by gender for my college’s departments. To protect the privacy of students where the count, or “n,” is small, I need to display an “*” rather than the actual figure. There are a few approaches to this, but here’s one example that gives the basic intuition for building suppression into a dashboard published for viewers who do not have editing rights.

Privacy Suppression using a Calculated Field

In my viz below, I have a number of places where the sum of records, i.e., student count, grouped by department and gender results in a value less than 10. In such cases I want to display an “*” instead of the value.

Figure 1. Enrolled student totals by department and gender.

Rather than going into the dataset and modifying records there, I can use a Level of Detail (LOD) expression within an “IF” statement to display custom text or a symbol to indicate that the value has been masked from display.

Here’s what that formula looks like:

Figure 2. Example Level of Detail calculated field implementing privacy suppression on displayed values less than 10

A couple of callouts in the calculation:

  • I’ve used the “Include” keyword rather than “Fixed” to make the field more flexible. “Include” takes into account any dimensions used in the view, rather than just those specified in the calculation, which saves time and reduces the chance of a mistake.
  • I’ve hard-coded “10” as my suppression threshold. This figure varies depending on institutional policies. For more flexibility, the “10” could be replaced with a parameter that the workbook author can adjust.
  • When the first portion of the IF statement evaluates TRUE, the result is that the calculation returns “NULL” and not a string “*”; this is necessary in order to allow us to treat the field as a measure we can still aggregate, rather than as a string. This will make more sense in a later step.
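
To make the logic of the calculation concrete outside Tableau, here is a minimal Python sketch of the same idea. It is not the Tableau formula from Figure 2: the function name `suppressed_counts`, the sample records, and the `threshold` argument (standing in for the parameter mentioned above) are all assumptions made for illustration. Counts are computed per department-and-gender group, and any group below the threshold returns `None`, the analogue of NULL, so the result stays numeric rather than becoming a string.

```python
from collections import Counter


def suppressed_counts(records, threshold=10):
    """Count records per (department, gender) group; small groups become None.

    `records` is an iterable of (department, gender) tuples, one per student.
    None plays the role of Tableau's NULL, so values stay numeric-or-missing.
    """
    counts = Counter(records)
    return {
        group: (n if n >= threshold else None)
        for group, n in counts.items()
    }


# Hypothetical sample data: 12 students in one group, 3 in another.
records = [("Biology", "F")] * 12 + [("Biology", "M")] * 3
result = suppressed_counts(records)
print(result)
```

With this sample data, the group of 12 keeps its count and the group of 3 comes back as `None`. Passing a different `threshold` changes the cutoff without touching the logic, which is the same benefit a parameter gives the workbook author.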

Next, I’ll replace my “SUM(Number of Records)” field with my new LOD calculated field. As a result, all of the values less than 10 from Figure 1 are removed from the view and represented by the “11 nulls” indicator in the bottom right-hand corner.

Figure 3. Sum(Number of Records) field on Columns shelf was replaced by SUM(LOD Privacy Suppression) calculated field. Only values of 10 and above are displayed in the view; 11 null values are indicated in the bottom right-hand corner, representing cases where the total was less than 10.

The last step is to change how Tableau displays special values, i.e., “NULL” values, in a view. You adjust this in the format menu of the calculated field. We’ll change the default special value to “*”, but it could just as easily be labeled “suppressed” or any other term. We’ll also change the “Marks” dropdown from “Show at Indicator” to “Show at Default Value”.

Figure 4. Format menu for the LOD calculated field. The Special Values section at the bottom has been updated with a “*” and Marks dropdown selection to “Show at Default Value”.
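
The point of this step is that the stored value stays missing and only its display changes. As a rough Python analogy (the function name `format_cell` and the sample values are hypothetical, and this is not how Tableau implements formatting), the display layer substitutes a label for missing values at render time:

```python
def format_cell(value, special="*"):
    """Render a count for display, substituting `special` for missing values."""
    return special if value is None else str(value)


# Hypothetical row of counts; None marks a suppressed (NULL) value.
cells = [12, None, 47]
print([format_cell(v) for v in cells])
print([format_cell(v, special="suppressed") for v in cells])
```

Because the substitution happens only at display time, the underlying values can still be summed or averaged as numbers, which is why the calculation returns NULL rather than the string “*”.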

Here’s the final result after making the format menu changes:

Figure 5. Finished result of viz with “*” displayed for values less than 10.

A Couple of Caveats to Keep in Mind

As with any calculation, users who have editing permissions and can inspect or change a formula or workbook can override the suppression logic. This approach therefore masks small n’s only for users who are viewing a published dashboard that uses this field.

Also, while “Include” lets the calculation keep working as the view is aggregated or disaggregated, there may be scenarios where a viewer could manually calculate the difference between rolled-up and drilled-down views and recover a suppressed value. It may be possible to obfuscate the true value in such instances by building more complex logic into the calculated field, e.g., using the ROUND function.
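
The arithmetic behind this caveat is simple enough to sketch in Python. The numbers, the function names `recover_suppressed` and `round_to_base`, and the choice of rounding to the nearest 5 are all assumptions for illustration; whether rounding is an adequate safeguard depends on institutional policy.

```python
def recover_suppressed(total, visible_counts):
    """What a viewer could infer: the total minus the visible group counts."""
    return total - sum(visible_counts)


def round_to_base(n, base=5):
    """Round a non-negative integer count to the nearest multiple of `base`."""
    return base * ((n + base // 2) // base)


# Hypothetical department: 50 students in total, 47 shown for one group,
# and the remaining group of 3 suppressed.
total, shown = 50, [47]
leaked = recover_suppressed(total, shown)

# Rounding every displayed figure blurs the difference a viewer can compute.
blurred = recover_suppressed(
    round_to_base(total), [round_to_base(c) for c in shown]
)
print(leaked, blurred)
```

In this example the unrounded figures let a viewer recover the suppressed count of 3 exactly, while the rounded figures yield 5, which no longer matches the true value. Rounding reduces precision for every displayed number, so it is a trade-off rather than a free fix.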

If you have other examples of how you’ve implemented privacy suppression, please share them in the comments or post a link to a viz.

Dan Bradley is a Principal Solution Engineer for Tableau’s Higher Education Field Education Team. Based in Chicago, he works with higher education institutions in the Central and mid-Atlantic regions of the U.S. In addition to technology, Dan has a background in education administration, including an M.S. in Higher Education Administration and Policy. Dan's mission is to help the people of higher education become data-reflective practitioners who can see, understand, and act on their data. *Opinions are my own and not the views of my employer*
