Create a Tableau Extract Programmatically (Using the Hyper API)
(1 of 2, see Update Extract in a Published Data Source for 2 of 2).
This is the first of two write-ups about creating and publishing a Tableau Hyper data extract. This one looks at why you might want to create an extract programmatically, and walks through a Java example of loading a CSV file into a Hyper extract. The second write-up will look at how we can then programmatically publish the extract to Tableau Server (on-premise) or Tableau Cloud using the REST API, and how we can update just the extract part of a data source without overwriting all the data source metadata (i.e. calculated fields, sets, hierarchies, etc.).
What is a Tableau Data Extract?
A Tableau data extract is a “snapshot” of data that is compressed, stored on the file system, and loaded into memory when visualizations are being created. Since version 10.5, Tableau has used Hyper, an in-memory data engine technology designed for fast data ingest and analytical query processing on large or complex data sets.
(Hyper is a high-performance in-memory data engine technology that helps customers analyze large or complex data sets faster, by efficiently evaluating analytical queries directly in the transactional database. A core Tableau platform technology, Hyper uses proprietary dynamic code generation and cutting-edge parallelism techniques to achieve fast performance for extract creation and query execution.)
Overview and terminology of .hyper files
Commonly, these files are called extracts, as they are created when you switch your Tableau connection from live to extract and they contain the extracted data. The first thing to note is that a .hyper file (see https://engineering.tableau.com/an-introduction-to-hyper-api-788a2be8fbae) coming from Tableau only stores the data, either denormalized as a single table or normalized as a multi-table extract. The file does not store the join tree you defined in Tableau or information about the connection you used to get the data. The join tree is stored in your workbook (.twb file), and connection information is stored in a data source (.tds file). Tableau also lets you zip all these files together into a .twbx file, so if you wonder where the extract of your workbook is when you have a .twbx file, open that file with a zip utility and you will find the .hyper file inside.
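If you want to confirm this for yourself, here is a quick sketch (the workbook file name is just a placeholder) that opens a .twbx with Java's built-in zip support and lists any packaged .hyper files:

import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ListHyperFilesInTwbx {
    public static void main(String[] args) throws IOException {
        // A .twbx is just a zip archive; look inside it for packaged .hyper extracts.
        try (ZipFile twbx = new ZipFile("MyWorkbook.twbx")) { // placeholder file name
            twbx.stream()
                .map(ZipEntry::getName)
                .filter(name -> name.endsWith(".hyper"))
                .forEach(System.out::println);
        }
    }
}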
Why programmatically create an extract
There are numerous ways to create new Hyper extracts and to refresh existing ones: manually using Tableau Desktop, manually running a Tableau Prep flow, scheduling an extract refresh on Tableau Server, or running a Prep flow with Prep Conductor on Tableau Server. The Hyper API gives you another way of doing this!
The Hyper API contains a set of functions you can use to automate your interactions with Tableau extract (.hyper) files. You can use the API to create new extract files, or to open existing files, and then insert, delete, update, or read data from those files. Using the Hyper API, developers and administrators can:
- Create extract files for data sources not currently supported by Tableau.
- Automate custom extract, transform and load (ETL) processes (for example, implement rolling window updates or custom incremental updates).
- Place multiple tables and schemas in a Hyper file.
Hardware, software and supported languages
The Hyper libraries are a resource that you download and install, unlike Tableau’s REST API, where you make calls to a Tableau Server.
See https://help.tableau.com/current/api/hyper_api/en-us/index.html
Supported languages
The Hyper API only supports 64-bit platforms. The Hyper API libraries are available for the following programming languages:
- Python (3.6 or newer)
- C++ (C++11 or newer)
- Java (Java 8 or newer)
- C#/.NET (.NET Standard 2.0)
(While it is expected that the Hyper API will work on newer versions of these languages, it may not be fully tested.)
Supported platforms
- Microsoft Windows Server 2019, 2016, 2012 R2, 2012, 2008 R2
- Amazon Linux 2, Red Hat Enterprise Linux (RHEL) 7.3+, CentOS 7.3+, Oracle Linux 7.3+, Ubuntu 16.04 LTS and 18.04 LTS
- Microsoft Windows 7 or newer (64-bit)
- macOS 10.13 or newer
Hardware requirements
The Hyper API has the following minimum hardware requirements.
- Intel Nehalem or AMD Bulldozer processor or newer
- 2 GB memory
- 1.5 GB minimum free disk space
Walk-through for downloading the Hyper libraries and setting up for Java Eclipse
1. Download the Hyper API ZIP file for your operating system; we are using Windows for this example.

2. Unzip the Hyper API package file to a convenient location.

I would recommend shortening the suggested extraction folder path to something like the one shown below, as you may get exceptions about the path name being too long during the unzip process.

Extracted folder structure

Eclipse Java Project Setup
Rather than use the Hyper Eclipse Gradle example referred to in the Hyper documentation, I will walk through creating a new Java project, adding references to the Hyper library, and, from the Hyper “Examples”, adding the CreateHyperFileFromCSV.java class and its associated customers.csv data file, so that you can run this in debug mode. (Note: I am using Eclipse 2020-12 (4.18.0))
- Create a new Java Project


Add references to the Hyper libraries you extracted earlier

Select the four jar files in the lib folder

Now add a reference using ‘External Class Folder’

Select the hyper folder under the lib folder you selected the four jar files from.

Your Classpath should now contain 5 references

Now that we have set up the project infrastructure, there are two things left to do:
- Add CreateHyperFileFromCSV.java class from the Hyper examples
- Add the associated customers.csv data file from the Hyper examples
We are going to create a new empty Java class with the same name as the Hyper example (CreateHyperFileFromCSV)


We now have an empty class. Open the Hyper example file, copy its content, and paste over all the text in the CreateHyperFileFromCSV class.

You can find the CreateHyperFileFromCSV.java file in your extracted Hyper files, relative path to the Hyper example file: ..\tableauhyperapi-java-windows-x86_64-release-hyperapi_release_33.0.0.14946.r6b1f09d9\examples\create-hyper-file-from-csv\src\main\java\examples
After pasting, delete the top line (the package declaration from the example, since our class does not live in that package).

If everything is working correctly, you should not see any errors!
For the last part of the setup, copy over the demo data file referenced in this code example.
Create a new folder called data


Now locate the Hyper example file called customers.csv, located in the following relative location ..\tableauhyperapi-java-windows-x86_64-release-hyperapi_release_33.0.0.14946.r6b1f09d9\examples\data

Paste this into the new data folder in the project.

Your completed project should look like this:

You can now run or debug the class, and you should see the following console output:
EXAMPLE - Load data from CSV into table in new Hyper file
Issuing the SQL COPY command to load the csv file into the table. Since the first line
of our csv file contains the column names, we use the `header` option to skip it.
The number of rows in table "Customer" is 793
The connection to the Hyper file has been closed
The Hyper process has been shut down
Quick look at the Hyper code
Before we look at the example code, I want to recommend the Tableau Engineering Blog An Introduction to Hyper API, which walks through different examples of creating Hyper files using SQL, and gives a very solid background on how to best approach creating extracts.
Define the table structure that will hold your CSV data
Column names and datatypes are not inferred from the data; you need to explicitly define the table structure.

- TableDefinition is used to hold the structure of the table to create.
- TableName: you can pass in just a table name, or a schema name and a table name. In this example I have added some code that is not in the example class: a schema named “q”, plus code in the next section (step 5) to create that schema in the database.
- Define the column names and the column datatypes. For performance reasons on large datasets, it is recommended not to load everything as text and convert at runtime using Tableau calculated fields, but to set the datatype for each column correctly and specify whether it is nullable. See SqlType for the supported datatypes. (A sketch of such a table definition follows below.)
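As a rough sketch (the column names and types mirror the customers.csv shipped with the examples; the “q” schema and the wrapper class are my additions), the table definition could look like this:

import com.tableau.hyperapi.TableDefinition;
import com.tableau.hyperapi.TableName;

import static com.tableau.hyperapi.Nullability.NOT_NULLABLE;
import static com.tableau.hyperapi.SqlType.bigInt;
import static com.tableau.hyperapi.SqlType.text;

public class CustomerTableSketch {
    // Table "Customer" in schema "q". Every column gets an explicit SqlType
    // and nullability; nothing is inferred from the CSV.
    static final TableDefinition CUSTOMER_TABLE = new TableDefinition(
            new TableName("q", "Customer"))
            .addColumn("Customer ID", text(), NOT_NULLABLE)
            .addColumn("Customer Name", text(), NOT_NULLABLE)
            .addColumn("Loyalty Reward Points", bigInt(), NOT_NULLABLE)
            .addColumn("Segment", text(), NOT_NULLABLE);
}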

1. The name of the Hyper file you want to create. You can give a full path or, as in this case, a relative path. (In step 4 you specify whether you want to create, replace or reuse this file.)
2. Defines a Hyper process. This class starts Hyper and manages its lifetime.
3. Connection parameters, which define the date and time settings:
◯ lc_time: chooses which locale to use. A locale controls which cultural preferences the application should apply. For example, the literal "Januar 1. 2002" can be converted to a date with the German locale "de" but not with the English locale "en_US".
◯ date_style (not shown in the example; you would add connectionParameters.put("date_style", "MDY");): controls how date strings are interpreted. Y, M and D stand for Year, Month and Day respectively. For example, the string "01/02/2000" could be interpreted as "2nd of January 2000" or "1st of February 2000". The first reading is chosen with the MDY date style and the second with the DMY date style. This setting also affects date parsing from CSV files. Default value: MDY. Accepted values: MDY, DMY, YMD, YDM.
4. Defines a Connection, used for all interactions with Hyper.
- The Hyper process to use
- The database to use (the Hyper file)
- createMode: whether the database should be created, and what to do in case of an already existing database:
◯ CREATE: create the database.
◯ CREATE_AND_REPLACE: create and replace the database.
◯ CREATE_IF_NOT_EXISTS: create the database if it doesn't exist.
◯ NONE: do not create the database.
- The connection parameters defined in step 3
5. Code not in the original Hyper example. This checks whether a schema name was included in the TableDefinition; if so, it creates the schema in the database.
6. Creates the table defined in the TableDefinition.
7. The name of the CSV file you want to load. You can give a full path or, as in this case, a relative path. (A sketch pulling steps 1 through 7 together follows below.)
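Putting steps 1 through 7 together, a minimal sketch might look like this (it assumes the CUSTOMER_TABLE definition from the earlier sketch is declared in, or visible from, this class; the createSchemaIfNotExists call stands in for the schema-creation check described in step 5, and the file paths are illustrative):

import com.tableau.hyperapi.Connection;
import com.tableau.hyperapi.CreateMode;
import com.tableau.hyperapi.HyperProcess;
import com.tableau.hyperapi.SchemaName;
import com.tableau.hyperapi.Telemetry;

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class CreateHyperFileSketch {
    public static void main(String[] args) {
        // Step 1: the Hyper file to create (relative path).
        Path customerDatabasePath = Paths.get("customer.hyper");

        // Step 3: connection parameters for date/time handling.
        Map<String, String> connectionParameters = new HashMap<>();
        connectionParameters.put("lc_time", "en_US");   // locale for date/time literals
        connectionParameters.put("date_style", "MDY");  // ordering for ambiguous dates like 01/02/2000

        // Step 2: start Hyper; try-with-resources shuts it down when we are done.
        try (HyperProcess process = new HyperProcess(Telemetry.SEND_USAGE_DATA_TO_TABLEAU)) {
            // Step 4: open a connection to the Hyper file, replacing it if it already exists.
            try (Connection connection = new Connection(
                    process.getEndpoint(),
                    customerDatabasePath.toString(),
                    CreateMode.CREATE_AND_REPLACE,
                    connectionParameters)) {

                // Step 5 (not in the shipped example): create the "q" schema named
                // in the TableDefinition before creating the table inside it.
                connection.getCatalog().createSchemaIfNotExists(new SchemaName("q"));

                // Step 6: create the table from the TableDefinition shown earlier.
                connection.getCatalog().createTable(CUSTOMER_TABLE);

                // Step 7: relative path to the CSV in the project's data folder.
                Path pathToCSV = Paths.get("data", "customers.csv");

                // The COPY command (next section) is issued here.
            }
        }
    }
}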
Execute the load of data from the CSV file into the defined table structure in the Hyper file

1. COPY: copies data between a file and a table. COPY FROM copies data from a source location to a table, appending the data to whatever is in the table already. The syntax is:
COPY table_name [ ( column_name [, ...] ) ] FROM { source_location } [ WITH ( option [, ...] ) ]
where source_location can be one of:
◯ 'file_system_path'
◯ 'amazon_s3_uri'
◯ s3_location('amazon_s3_uri' [, access_key_id => 'text', secret_access_key => 'text' [, session_token => 'text'] ] [, region => 'text'] )
◯ ARRAY[ source_location [, ...] ]
and option can be one of:
◯ FORMAT => format_name
◯ format_specific_option => value
2. WITH options: WITH (format csv, NULL 'NULL', delimiter ',', header)
◯ format csv : defines the file format as CSV
◯ NULL 'NULL' : null values appear as 'NULL' (enclosed in single quotes) in the CSV file
◯ delimiter ',' : columns are separated with the ',' character
◯ header : flag saying to ignore the first row of the CSV file, as it is a header
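In the Java class this all comes together in a single executeCommand call. Roughly, following the shipped CreateHyperFileFromCSV example (escapeStringLiteral is statically imported from com.tableau.hyperapi.Sql there, and connection, CUSTOMER_TABLE and pathToCSV are the variables from the sketches above):

// Build the COPY statement; escapeStringLiteral safely quotes the CSV file path.
String copyCommand =
        "COPY " + CUSTOMER_TABLE.getTableName()
        + " FROM " + escapeStringLiteral(pathToCSV.toString())
        + " WITH (format csv, NULL 'NULL', delimiter ',', header)";

// executeCommand returns the affected row count, i.e. the number of CSV rows loaded.
long rowCount = connection.executeCommand(copyCommand).getAsLong();
System.out.println("The number of rows in table " + CUSTOMER_TABLE.getTableName() + " is " + rowCount);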
Other Java code examples that are part of the Java Hyper API download cover inserting, updating and deleting data. One interesting point is that it is much faster to load a CSV file directly into Hyper with COPY than to iterate over the CSV file and insert the values into Hyper row by row.
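For comparison, the row-by-row route goes through the Inserter class (com.tableau.hyperapi.Inserter). A minimal sketch, modelled on the shipped insert-data examples and reusing connection and CUSTOMER_TABLE from above (the values are made up):

// Row-by-row insertion: fine for small amounts of data, but noticeably
// slower than COPY when loading a whole CSV file.
try (Inserter inserter = new Inserter(connection, CUSTOMER_TABLE)) {
    inserter.add("C-0001");        // Customer ID (made-up value)
    inserter.add("Jane Doe");      // Customer Name
    inserter.add(100L);            // Loyalty Reward Points
    inserter.add("Consumer");      // Segment
    inserter.endRow();
    inserter.execute();            // nothing is written until execute() runs
}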