Create a Tableau Extract Programmatically (Using the Hyper API)

Posted On
Posted By Jeremy Weatherall
(1 of 2, see Update Extract in a Published Data Source for 2 of 2).

This is the first of two write ups about creating and publishing a Tableau Hyper data extract. This one will look at why you might want to create an extract programmatically, and walk through a Java example of loading a csv file in to a Hyper extract. The second writeup will look at how we can then programmatically publish the extract to Tableau server (On premise or Tableau Cloud), using the Servers REST API, and how we can just update the extract part of a data source, and not overwrite all the data source metadata (i.e. Calculated fields, sets, hierarchies etc.)

What is a Tableau Data Extract?

Tableau data extracts are a “snapshot” of data that are compressed, stored on the file system, and loaded into the memory when visualizations are being created. Since v 10.5 Tableau has used Hyper, which is an in-memory data engine technology, designed for fast data ingest and analytical query processing on large or complex data sets.

(Hyper is a high-performance in-memory data engine technology that helps customers analyze large or complex data sets faster, by efficiently evaluating analytical queries directly in the transactional database. A core Tableau platform technology, Hyper uses proprietary dynamic code generation and cutting-edge parallelism techniques to achieve fast performance for extract creation and query execution.)

Overview and terminology of .hyper files

Commonly, these files are called extracts, as they are created when you switch your Tableau connection from live to extract and they contain the extracted data. The first thing to note is that a .hyper file coming from Tableau only stores the data; either denormalized as a single table or normalized as a multi table extract. The file does not store the join tree you defined in Tableau or information about the connection you used to get the data. The join tree is stored in your workbook (.twb file), connection information is stored in a datasource (.tds file). Tableau also offers you to zip all these files together into a .twbx file, so if you wonder where the extract of your workbook is in case you have a .twbx file, open that file with a zip utility and you will find the .hyper file in it.

Why programmatically create an extract

There are numerous ways to create new Hyper extracts and to refresh existing ones, manually using Tableau Desktop, manually running a Tableau Prep Flow, scheduling an extract to update on Tableau Server or running a Prep flow using Prep Conductor on Tableau Server. The Hyper API gives you another way of doing this!

The Hyper API contains a set of functions you can use to automate your interactions with Tableau extract (.hyper) files. You can use the API to create new extract files, or to open existing files, and then insert, delete, update, or read data from those files. Using the Hyper API developers and administrators can:

  • Create extract files for data sources not currently supported by Tableau.
  • Automate custom extract, transform and load (ETL) processes (for example, implement rolling window updates or custom incremental updates).
  • Place multiple tables and schemas in a Hyper file

Hardware, software and supported languages

The Hyper libraries are a resource that you download and install, unlike Tableau’s REST API, where you make calls to a Tableau Server.


Supported languages
The Hyper API only supports 64-bit platforms. The Hyper API libraries are available for the following programming languages:

  • Python (3.6 or newer)
  • C++ (C++11 or newer)
  • Java (Java 8 or newer)
  • C#/.NET (.NET Standard 2.0)
  • While it is expected that the Hyper API will work on newer versions of these languages, it may not be fully tested.

Supported platforms

  • Microsoft Windows Server 2016, 2012, 2012 R2, 2008 R2, 2019
  • Amazon Linux 2, Red Hat Enterprise Linux (RHEL) 7.3+, CentOS 7.3+, Oracle Linux 7.3+, Ubuntu 16.04 LTS and 18.04 LTS
  • Microsoft Windows 7 or newer (64-bit)
  • macOS 10.13 or newer

Hardware requirements
The Hyper API has the following minimum hardware requirements.

  • Intel Nehalem or AMD Bulldozer processor or newer
  • 2 GB memory
  • 1.5 GB minimum free disk space

Walk through for downloading the Hyper Libraries and setting up for Java Eclipse

  1. Download the Hyper API Zip for your operating system, we are using Windows for this example.

2. Unzip the Hyper API package file to a convenient location.

Your file name may differ, as new versions are released

I would recommend reducing the length of the prompted folder to something like below, as you may get exceptions about the path name being to long during the unzip process

Extracted folder structure

Eclipse Java Project Setup

Rather than than use the Hyper Eclipse Gradle example referred to in the Hyper documentation, I will walk through creating a new Java Project, add references to the Hyper Library, and from the Hyper “Examples”, add the class, and it’s associated customers.csv data file, so that you can now run this in debug mode. (Note: I am using Eclipse 2020-12 (4.18.0))

  1. Create a new Java Project
Create a new Java Project

Add references to the Hyper libraries you extracted earlier

Select the four jar files in the lib folder

Now add a reference using ‘External Class Folder’

Select the hyper folder under the lib folder you selected the four jar files from.

Your Classpath should now contain 5 references

Now we have set up the project infrastructure, there are two things left todo.

  1. Add class from the Hyper examples
  2. Add the associated customers.csv data file from the Hyper examples

We are going to create a new empty Java class with the same name as the Hyper example (CreateHyperFileFromCSV)

We now have an empty class. Open the Hyper example file, copy it’s content and paste over all the text in the CreateHyperFileFromCSV class.

Empty class

You can find the file in your extracted Hyper files, relative path to the Hyper example file: ..\tableauhyperapi-java-windows-x86_64-release-hyperapi_release_33.0.0.14946.r6b1f09d9\examples\create-hyper-file-from-csv\src\main\java\examples

After pasting, delete the top line

If everything is working correct, you should not see any errors!

Last part of the setup, copy over the demo data file referenced in this code example.

Create a new folder called data

Now locate the Hyper example file called customers.csv, located in the following relative location ..\tableauhyperapi-java-windows-x86_64-release-hyperapi_release_33.0.0.14946.r6b1f09d9\examples\data

Paste this in to the new data Project folder

Your completed Project should look like

You can now run the class, or debug, and see the following console output

EXAMPLE -  Load data from CSV into table in new Hyper file
Issuing the SQL COPY command to load the csv file into the table. Since the first line
of our csv file contains the column names, we use the `header` option to skip it.
The number of rows in table "Customer" is 793

The connection to the Hyper file has been closed
The Hyper process has been shut down

Quick look at the Hyper code

Before we look at the example code, I want to recommend the Tableau Engineering Blog An Introduction to Hyper API, which walks through different examples of creating Hyper files using SQL, and gives a very solid background on how to best approach creating extracts.

Define the table structure that will hold you CSV data

Column names and datatypes are not inferred from the data, you need to explicitly define the table structure

  1. TableDefinition is used to hold the structure of the table to create
  2. TableName, you can pass in just a table name, or schema name and table name. In this example I have added some code that it not in example class. Added schema “q”, and some code in the next section (step 5) to create the schema in the database.
  3. Define the column names and the column datatypes. For performance reasons for large datasets, it is recommended not to load everything as text, then convert at runtime using Tableau Calculated fields, but to correctly set the datatypes for each column, and specify if nullable. See SqlTypes for supported datatypes

  1. The name of the hyper file you want to create, you can give a full path, or as in this case, a relative path. (In step 4, you get to specify if you want to Create, replace, update this file)
  2. Defines a Hyper process. This class starts Hyper and manages its lifetime.
  3. Connection parameters, define the Date and Time settings
  • lc_time, choose which locale to use, A Locale controls which cultural preferences the application should apply. For example, the literal Januar 1. 2002 can be converted to a date with the German locale de but not with the English locale en_US.
  • date_style, (not shown in example, you would add connectionParameters.put(“date_time”, “MDY”);) Controls how date strings are interpreted. Y, M and D stand for Year, Month, and Day respectively. For example, the string “01/02/2000” could be interpreted as “2nd of January 2000” or “1st of February 2000”. The first possibility is chosen with the MDY date style while the second is chosen with the DMY date style. This setting also affects date parsing from CSV files. Default value: MDY, Accepted values: MDY, DMY, YMD, YDM

4. Defines a Connection, used for all interactions with Hyper.

  • Hyper process to use
  • Database to use (hyper file)
  • createMode – Whether the database should be created and what to do in case of an already existing database.

Create the database.
Create and replace the database.
Create the database if it doesn’t exist.
Do not create the database.

  • Connection parameters defined in Step (3)

5. Code not in the original Hyper example. This checks to see if a schema name was included in the TableDefinition, if so, it creates the schema in the database.

6. Creates the Table defined in the TableDefinition

7. The name of the CSV you want to load, you can give a full path, or as in this case, a relative path.

Execute the load of data from the CSV file to the hyper file, into the defined table structure
  1. COPY — copy data between a file and a table, COPY FROM copies data from a source location to a table, appending the data to whatever is in the table already.
COPY table_name [ ( column_name [, ...] ) ]
    FROM { source_location }
    [ WITH ( option [, ...] ) ]

where source_location can be one of:

        [, access_key_id => 'text', secret_access_key => 'text' [, session_token => 'text'] ]
        [, region => 'text']
    ARRAY[ source_location [, ...] ]

and option can be one of:

    FORMAT => format_name
    format_specific_option => value

2. WITH options
WITH(format csv, NULL 'NULL', delimiter ',', header)

format csv : defines file format as CSV

NULL ‘NULL’ : Null values appear as ‘NULL’ (enclosed in single quotes) in the CSV file

delimiter ‘,’ : columns are separated with the ‘,’ character

header : flag saying to ignore first row in CSV file, as it is a header

Other Java code examples that are part of the JAVA Hyper download include looking at Inserting, Updating and Deleting data. One interesting comment is that it is much faster to load a CSV file directly in to Hyper, as opposed to iterating over a CSV file and then Inserting the values in to Hyper.

Jeremy Weatherall is a Solutions Engineer for Tableau’s Field Education Team. Based near Boston, he works with higher education institutions and K12 districts in the North East region of the U.S. Jeremy has over 20 years in technology and a background in Accountancy (ACA ICAEW). *Opinions are my own and not the views of my employer*

Related Post

One thought on “Create a Tableau Extract Programmatically (Using the Hyper API)
  1. Update Extract in a Published Data Source – Tableaustudyhall

    […] of 2, see Create an Extract Programmatically for 1 of […]

leave a Comment