
Update Extract in a Published Data Source

Posted By Jeremy Weatherall
(2 of 2, see Create an Extract Programmatically for 1 of 2).

This is the second blog about using Tableau's APIs to programmatically create or update a local hyper file (extract) and then upload it into an existing published data source. This second article covers how to update a published data source with a local hyper file.

REST API Flow for updating an extract in a published data source

If you publish an extract to Tableau Server, you either create a new data source or overwrite an existing one. If you overwrite, you lose any metadata that the existing data source contained. Metadata covers anything you did after you were able to pull back data: calculated fields, default properties for columns, groups, sets, hierarchies, and so on. So overwriting could be considered a brave move! Once you have defined a data source that uses an embedded hyper file, it probably makes sense to update just the hyper file content and leave the metadata untouched, and that is what the following Tableau Server REST API flow demonstrates.

Note: With this code you update a table in a hyper file contained in a published data source. Because you specify the schema.&lt;table name&gt; of both the table in the local hyper file and the table in the published hyper file, both hyper files can contain multiple tables; only the tables you reference are used.

The REST API flow below can be called from your language of choice; the download from GitHub is written in Java, which you can amend and use as makes sense to you.

The sequence of REST API calls to Tableau Server (on premises or Tableau Cloud) for updating the extract in a published data source is listed below, but in summary:

  • Log on to the server
    • REST API calls are stateless, so the login returns a token that is then passed in the header of subsequent calls, which tells the server that this is a valid call
  • Get the unique ID of the data source you want to update.
    • All calls to server content use the content's ID, not the names we see on the server
  • Upload the new extract file to the server. It is saved as a temporary object, and once we have finished updating the data source it is deleted.
    • As we may be uploading a large file, we send it in chunks. To keep the chunks associated, we request an upload-session-id from the server; it is then passed with each new chunk, so the server knows these chunks are part of the same upload session
    • Loop through and send up chunks of the extract; the maximum size per chunk is 64 MB
    • In testing I successfully uploaded an 8 GB hyper file, sending 10 MB chunks
  • Once the upload is complete, we tell Tableau Server how we want to amend the extract in the published data source with the data from the temporary upload, specifying one of the following actions: Insert, Update, Upsert, Replace
  • Because a hyper file can potentially contain multiple schemas and tables (a table must belong to a schema), we are required to specify the schema.table name for both the uploaded extract and the extract that exists in the published data source, so the server can locate both the source and destination tables. If you created your hyper file programmatically, you should in theory know the schema.table name.
  • Here are some ways to get the schema.table name
    • You could use the Hyper API to programmatically get the schema.table for your local hyper file
    • Use the Server REST API to get the schema.table for the published hyper file in the data source
    • If you manually created an extract, this is how the schema.table names are automatically defined (Note: schema and table names are CASE sensitive)
      • Tableau Prep: Extract.Extract
      • Hyper API
        • Only specify table name, then public.<table name> (if no schema is specified, schema defaults to public)
        • Specify schema and table name, then schema.<table name>
      • Tableau Desktop:
        • For a single table: Extract.Extract
        • For multiple tables (logical tables): Extract.&lt;TableName&gt;_&lt;GUID&gt;
          • The tables below were called Orders and Returns respectively

The above image shows how to easily find out the schema.table name in a hyper file: look at it using Tableau Desktop, or view it on the server.

API Calls


Log in to Tableau Server

POST /api/api-version/auth/signin

Request
  • URL
    • Site Name
  • Request Body
    • User Credentials
Response
  • token-id
  • site-id
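As a concrete sketch, the sign-in call could look like this using the JDK's built-in java.net.http client (the downloadable project uses Jersey instead; the class and method names here are illustrative, not the project's actual code):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TableauSignin {

    // Builds the XML request body for POST /api/<version>/auth/signin.
    // Shown with a Personal Access Token; a username/password login uses
    // <credentials name="..." password="..."> instead.
    public static String buildPayload(String tokenName, String tokenSecret,
                                      String siteContentUrl) {
        return "<tsRequest>"
             + "<credentials personalAccessTokenName=\"" + tokenName + "\""
             + " personalAccessTokenSecret=\"" + tokenSecret + "\">"
             + "<site contentUrl=\"" + siteContentUrl + "\"/>"
             + "</credentials>"
             + "</tsRequest>";
    }

    // Sends the sign-in request; the response body carries the token-id
    // and site-id needed by every subsequent call.
    public static HttpResponse<String> signIn(String server, String apiVersion,
                                              String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(server + apiVersion + "auth/signin"))
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```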

Get data source ID

GET /api/api-version/sites/site-id/datasources?filter=name:eq:{0}

  • Header
    • token-id
  • URL
    • site-id
    • Data source name (replace {0} in URL with this)

Response
  • datasource-id
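The filter URL could be assembled like this (a sketch; `DatasourceLookup` is an illustrative name, not a class from the project). Since a published data source name can contain spaces, it has to be URL-encoded before being substituted for {0}:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DatasourceLookup {

    // Builds GET /api/<version>/sites/<site-id>/datasources?filter=name:eq:<name>.
    // URLEncoder produces form encoding ("+" for spaces), so we swap in
    // "%20", which is what a URL query expects.
    public static String buildQueryUrl(String server, String apiVersion,
                                       String siteId, String datasourceName) {
        String encoded = URLEncoder.encode(datasourceName, StandardCharsets.UTF_8)
                                   .replace("+", "%20");
        return server + apiVersion + "sites/" + siteId
             + "/datasources?filter=name:eq:" + encoded;
    }
}
```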

Request File Upload GUID

POST /api/api-version/sites/site-id/fileUploads

  • Header
    • token-id
  • URL
    • site-id

Response
  • upload-session-id

Loop through sending chunks of extract file using File-upload-append-to

PUT /api/api-version/sites/site-id/fileUploads/upload-session-id

  • Header
    • token-id
  • URL
    • site-id
    • upload-session-id
  • Request Body
    • The content of the file to be uploaded is included in a MIME multipart message
Response showing current size of uploaded file:
<tsResponse>
  <fileUpload uploadSessionId=upload-session-id
   fileSize=size-of-file-in-megabytes-after-append />
</tsResponse>
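A minimal sketch of the chunking loop (illustrative names; the actual project wraps each chunk in a Jersey multipart body before the PUT):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedUpload {

    // Number of PUT calls needed for a file of the given size (ceiling division).
    public static long chunkCount(long fileSizeBytes, int chunkSizeBytes) {
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    // Reads the hyper file chunk by chunk; each chunk would be wrapped in a
    // MIME multipart message and PUT to
    // /sites/<site-id>/fileUploads/<upload-session-id>.
    // ChunkSender stands in for that HTTP call.
    public static void upload(Path hyperFile, int chunkSizeBytes,
                              ChunkSender sendChunk) throws IOException {
        byte[] buffer = new byte[chunkSizeBytes];
        try (InputStream in = Files.newInputStream(hyperFile)) {
            int read;
            while ((read = in.read(buffer)) > 0) {
                sendChunk.send(buffer, read); // only the first `read` bytes are valid
            }
        }
    }

    public interface ChunkSender {
        void send(byte[] chunk, int validBytes) throws IOException;
    }
}
```

With the 10 MB chunks mentioned above, the 8 GB test file works out to 800 PUT calls.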

Tell server to update extract part of data source

PATCH /api/api-version/sites/site-id/datasources/datasource-id/data?uploadSessionId=upload-session-id

  • Header
    • token-id
    • RequestID (a GUID you provide)
  • URL
    • site-id
    • datasource-id
    • upload-session-id
  • Request body
    • Action (Insert, Update, Upsert, Replace)
    • Source Schema
    • Source Table Name
    • Server Extract Schema
    • Server Extract Table Name

job-id: “scheduled-job-luid”

The ID of the async job that will carry out the data operations described in the action batch. The returned "job-id" can be used to query the status of the update job or to cancel a scheduled job.
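Assuming the JSON key names from the REST reference for updating data in a Hyper data source, the PATCH request body could be built like this (an illustrative sketch, not the project's actual code; the call also needs the RequestID header and the uploadSessionId query parameter shown above):

```java
public class UpdateActionBody {

    // Builds the JSON action batch for
    // PATCH /sites/<site-id>/datasources/<datasource-id>/data.
    // `action` is one of insert, update, upsert, replace; the schema and
    // table names identify the source table in the uploaded hyper file and
    // the target table in the published data source.
    public static String build(String action,
                               String sourceSchema, String sourceTable,
                               String targetSchema, String targetTable) {
        return "{\"actions\": [{"
             + "\"action\": \"" + action + "\","
             + "\"source-schema\": \"" + sourceSchema + "\","
             + "\"source-table\": \"" + sourceTable + "\","
             + "\"target-schema\": \"" + targetSchema + "\","
             + "\"target-table\": \"" + targetTable + "\""
             + "}]}";
    }
}
```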

The nice part here is that the code for this workflow has been written in Java, and it can be downloaded from the following GitHub location: https://github.com/Jeremy-Weatherall/extractUpdate

The config.properties file contains: the Tableau Server URL; log-on credentials (either username and password, or a Personal Access Token); the location of the hyper file to upload; the name of the data source to update; the action to take (update, insert, etc.); schema and table names; and the chunk size to use when uploading large files in multiple parts.

# Set this to the name or IP address of the Tableau Server installation.
server.host=https://us-east-1.online.tableau.com

#Rest API version, if you are connecting to Tableau Cloud, you can set to the most recent value
#https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_concepts_versions.htm
server.api.version=/api/3.15/

# Not assigning a value to this configuration references the default site, for Tableau Cloud use your Site Name as it appears in the URL
site.default.contentUrl=siteName


# Set the username and password, if not using Personal Access Tokens (PAT) (see below) 
user.admin.name=*****
user.admin.password=******

#with MFA, have to use Personal Access Tokens
#If user.usePAT=true then ignore user.admin credentials, and use user.token PAT values
user.usePAT=true
user.tokenName=*****
user.tokenSecret=*****

#path to hyper file
datasource.sample.path=C:\\temp\\local.hyper

#Name of data source to update, this is the name of your published data source
datasource.tableauserver.urlname=Data Source Name can contain spaces

#names ARE case sensitive!!!
#hyper table values, set for local hyper file, and hyper file in the published data source
#set Schema and table name
datasource.LOCAL.hyper.table.name=table name can have spaces
datasource.LOCAL.hyper.schema.name=public
datasource.server.hyper.table.name=table name can have spaces
datasource.server.hyper.schema.name=public

#what are we going to do with the hyper extract, see insert, update, upsert and replace
#https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_ref_data_sources.htm#update_data_in_hyper_data_source
datasource.tableauserver.action=replace

#max chunk size is 64 MB; set the chunk size you want to upload in bytes (10,000,000 is 10 MB)
server.chunk.upload.byte.size=5000000
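These key=value pairs can be read with the JDK's java.util.Properties, which is the standard mechanism for a config.properties file (a sketch; `ConfigLoad` is an illustrative name, not a class from the project):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ConfigLoad {

    // Parses properties-format text such as the config.properties content
    // above. In the real project you would pass a FileReader over the file
    // instead of a StringReader.
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) { // StringReader cannot actually fail here
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```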

Java Project Download

The extractUpdate project is an Eclipse Java Maven project. Maven projects let you reference external jars from the https://mvnrepository.com/ repository, which are downloaded automatically and included in your build path.

This project references the following external jars via Maven:

<!-- https://mvnrepository.com/artifact/com.sun.jersey/jersey-core -->
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-core</artifactId>
    <version>1.19.4</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.sun.jersey/jersey-server -->
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-server</artifactId>
    <version>1.19.4</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.sun.jersey/jersey-client -->
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.19.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.sun.jersey.contribs/jersey-multipart -->
<dependency>
    <groupId>com.sun.jersey.contribs</groupId>
    <artifactId>jersey-multipart</artifactId>
    <version>1.19.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/javax.ws.rs/javax.ws.rs-api -->
<dependency>
    <groupId>javax.ws.rs</groupId>
    <artifactId>javax.ws.rs-api</artifactId>
    <version>2.1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.googlecode.json-simple/json-simple -->
<dependency>
    <groupId>com.googlecode.json-simple</groupId>
    <artifactId>json-simple</artifactId>
    <version>1.1</version>
</dependency>

Steps to import the downloaded files into Eclipse

There are probably multiple ways to successfully import a Maven project into Eclipse, so feel free to ignore my approach.

Create an empty Java Project called extractUpdate

Select the new project, right click and select Import

Locate the location of the downloaded project files

How your imported project should look

Quick look at the code

ExtractUpdate->>Main()

Loads the properties file, then calls

ExtractUpdate->>publishExtractToServer(…)

Sets the chunk upload size, then calls

TableauRest->>updateDatasource_ExtractChunked(….)

This is where the logic of the REST API calls, detailed above, is executed:

  • Logon to Tableau Server
  • Get data source ID
  • Load chunks of extract up to Tableau Server
  • Tell Tableau server to update the extract in the data source
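The four steps above map onto the REST endpoints covered earlier; as a compact reminder (an illustrative sketch, not the project's code):

```java
public class FlowSketch {

    // The REST calls issued, in order, by the update flow described above.
    public static String[] steps() {
        return new String[] {
            "POST auth/signin",                                      // returns token-id and site-id
            "GET sites/{site-id}/datasources?filter=name:eq:{name}", // returns datasource-id
            "POST sites/{site-id}/fileUploads",                      // returns upload-session-id
            "PUT sites/{site-id}/fileUploads/{upload-session-id}",   // repeated once per chunk
            "PATCH sites/{site-id}/datasources/{datasource-id}/data?uploadSessionId={upload-session-id}"
        };
    }
}
```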

This code is provided as is. I will do my best to respond to any questions.

Jeremy Weatherall is a Solutions Engineer for Tableau’s Field Education Team. Based near Boston, he works with higher education institutions and K12 districts in the North East region of the U.S. Jeremy has over 20 years in technology and a background in Accountancy (ACA ICAEW). *Opinions are my own and not the views of my employer*
