Fivetran SDK uses gRPC to talk to partner code. The partner side of the interface is always the server side. Fivetran implements the client side and initiates the requests.
- gRPC: 1.61.1
- protobuf: 3.25.2
At the moment, partner code should be developed in a language that can generate a statically linked binary executable.
The executable needs to do the following:
- Accept a
--port
argument that takes an integer as a port number to listen to. - Listen on both IPV4 (i.e. 0.0.0.0) and IPV6 (i.e ::0), but if only one is possible, it should listen on IPV4.
- Partners should not add the proto files to their repos. Proto files should be pulled in from this repo at build time and added to
.gitignore
so they are excluded. - Always use proto files from latest release and update you're code if necessary. Older releases proto files can be considered deprecated and will be expired at later date.
- Write logs out to STDOUT in the following JSON format. Accepted levels are INFO, WARNING, and SEVERE.
Message-origin
can besdk_connector
orsdk_destination
.
{
"level":"INFO",
"message": "Your log message goes here"
"message-origin": "sdk_connector"
}
- Try to make log messages as precise as possible, which can make it easier to debug issues.
- Provide context in log messages. Contextual information can make it much easier to debug issues.
- Write a separate error message for each exception.
- Log after an action. When you log after an action, additional context can be provided.
- Include details about "what went wrong" in your error message
- Manage the volume of the log. Ask yourself if a consumer of the log message will find it useful to solve a problem and whether that need justifies the cost of storing and managing that log. Sources of excessive logging include:
- Tracing entries and exits in methods - Don't do this unless it is absolutely necessary.
- Logging inside tight loops - Be careful about what you are logging inside loops, especially if the loop runs for many iterations.
- Including benign errors - When a successful execution flow includes handling errors.
- Repeating errors - For instance, if you log an exception trace before each retry, you might end up logging the exception trace unnecessarily or too many times.
- Consider logging of timing data - Logging the time taken for time-sensitive operations like network calls can make it easier to debug performance issues in production. Consider if logging of timing data can be useful in your connector.
- Partner code should handle any source and destination-related errors.
- Partner code should retry any transient errors internally without deferring them to Fivetran.
- Partner code should use gRPC built-in error mechanism to relay errors instead of throwing exceptions and abruptly closing the connection.
- Partner code should capture and relay a clear message when the account permissions are not sufficient.
- Partners can throw alerts on the Fivetran dashboard to notify customers about potential issues with their connector.
- These issues may include bad source data or connection problems with the source itself. Where applicable, the alerts should also provide guidance to customers on how to resolve the problem.
- Currently, we allow only throwing errors.
- Partner code should use gRPC built-in error mechanism to relay errors and pass the message in the following JSON format to the error description:
{
"message": "Your task message goes here"
"alert_type": "TASK"
}
- Usage example:
responseObserver.onError(Status.UNAVAILABLE.withDescription(jsonMessage).asException());
- We convert the error description to a SEVERE log if it is not in the correct JSON format and the error causes the sync to fail.
- Partner code should retry transient problems internally
- Fivetran will not be able to handle any problems that the partner code runs into
- If an error is raised to Fivetran's side, the sync will be terminated and retried from the last good known spot according to saved cursors from the last successful batch.
The following are hard requirements to be able to deploy partner code to Fivetran production:
- Do not decrypt batch files to disk. Fivetran does not allow unencrypted files at rest. If you need to upload batch files in plaintext, do the decryption in "streaming" mode.
- Do not log sensitive data. Ensure only necessary information is kept in logs, and never log any sensitive data. Such data may include credentials (passwords, tokens, keys, etc.), customer data, payment information, or PII.
- Encrypt HTTP requests. Entities like URLs, URL parameters, and query parameters are always encrypted for logging, and customer approval is needed to decrypt and examine them.
- Keep the form clear and concise, only requesting essential information for successful connector setup.
- Use clear and descriptive labels for each form field. Avoid technical jargon if possible.
- Organize the fields in a logical order that reflects the setup process.
The ConfigurationForm
RPC call retrieves all the setup form fields and tests information. You can provide various parameters for the fields to enhance the user experience, such as descriptions, optional fields, and more.
The ConfigurationForm
RPC call retrieves the tests that need to be executed during connection setup. The Test
call then invokes the test with the customer's credentials as parameters. As a result, it should return a success or failure indication for the test execution.
- Text Field: A standard text input field for user text entry. You can provide a
title
displayed above the field. You can indicate whether the field isrequired
, and you may also include an optionaldescription
displayed below the field to help explain what the user should complete. - Dropdown: A drop-down menu that allows users to choose one option from the list you provided.
- Toggle Field: A toggle switch for binary options (e.g., on/off or yes/no).
- Don't push anything other than source data to the destination. State data is saved to production database and returned in
UpdateRequest
. - Don't forget to handle new schemas/tables/columns per the information and user choices in
UpdateRequest#selection
. - Make sure you checkpoint at least once an hour. In general, the more frequently you do it, the better.
The Schema
RPC call retrieves the user's schemas, tables, and columns. It also includes an optional selection_not_supported
field that indicates whether the user can select or deselect tables and columns within the Fivetran dashboard.
The Update
RPC call should retrieve data from the source. We send a request using the UpdateRequest
message, which includes the user's connection state, credentials, and schema information. The response, streaming through the UpdateResponse
message, can contain data records and other supported operations.
- Do not push anything other than source data to the destination.
- In addition to source columns, Fivetran sends the following additional system columns if and when required:
_fivetran_synced
: This is aUTC_DATETIME
column that represents the start of sync. Every table has this system column._fivetran_deleted
: This column is used to indicate whether a given row is deleted at the source or not. If the source soft-deletes a row or a table, this system column is added to the table._fivetran_id
: Fivetran supports primary-keyless source tables by adding this column as a stand-in pseudo primary key column so that all destination tables have a primary key.
Batch files are compressed using ZSTD.
- Each batch file is encrypted separately using AES-256 in CBC mode and with
PKCS5Padding
. - You can find the encryption key for each batch file in the
WriteBatchRequest#keys
field. - First 16 bytes of each batch file hold the IV vector.
- Each batch file is limited in size to 100MB.
- Number of records in each batch file can vary depending on row size.
- We only support CSV file format.
- Fivetran creates batch files using
com.fasterxml.jackson.dataformat.csv.CsvSchema
, which by default doesn't consider backslash '' an escape character. If you are reading the batch file then make sure that you do not consider backslash '' an escape character. - BINARY data is written to batch files using base64 encoding. You need to decode it to get back the original byte array.
The CreateTable
RPC call should fail if it is asked to create a table that already exists. However, it should not fail if the target schema is missing. The destination should create the missing schema.
The DescribeTable
RPC call should report all columns in the destination table, including Fivetran system columns such as _fivetran_synced
and _fivetran_deleted
. It should also provide other additional information as applicable such as data type, primary_key
, and DecimalParams
.
- The
Truncate
RPC call might be requested for a table that does not exist in the destination. In that case, it should NOT fail, simply ignore the request and returnsuccess = true
. utc_delete_before
has millisecond precision.
- The
AlterTable
RPC call should be used for changing primary key columns, adding columns, and changing data types. - However, this operation should not drop any columns even if the
AlterTable
request has a table with a different set of columns. Dropping columns could lead to unexpected customer data loss and is against Fivetran's general approach to data movement.
The WriteBatchRequest
RPC call provides details about the batch files containing the records to be pushed to the destination. We provide the WriteBatchRequest
parameter that contains all the information required for you to read the batch files. Here are some of the fields included in the request message:
-
replace_files
is for theupsert
operation where the rows should be inserted if they don't exist or updated if they do. Each row will always provide values for all columns. Set the_fivetran_synced
column in the destination with the values coming in from the batch files. -
update_files
is for theupdate
operation where modified columns have actual values whereas unmodified columns have the special valueunmodified_string
inFileParams
. Soft-deleted rows will arrive in here as well. Update the_fivetran_synced
column in the destination with the values coming in from the batch files. -
delete_files
is for thehard delete
operation. Use primary key columns (or_fivetran_id
system column for primary-keyless tables) to performDELETE FROM
. -
keys
is a map that provides a list of secret keys, one for each batch file, that can be used to decrypt them. -
file_params
provides information about the file type and any configurations applied to it, such as encryption or compression.
Also, Fivetran deduplicates operations such that each primary key shows up only once in any of the operations.
Do not assume order of columns in the batch files. Always read the CSV file header to determine the column order.
CsvFileParams
:null_string
value is used to representNULL
value in all batch files.unmodified_string
value is used to indicate columns inupdate_files
where the values did not change.
Examples of each DataType as they would appear in CSV batch files are as follows:
- UNSPECIFIED: This data type never appears in batch files
- BOOLEAN: "true", "false"
- SHORT: -32768 .. 32767
- INT: -2147483648 .. 2147483647
- LONG: -9223372036854776000 .. 9223372036854775999
- DECIMAL: Floating point values with max precision of 38 and max scale of 37
- FLOAT: Single-precision 32-bit IEEE 754 values, e.g. 3.4028237E+38
- DOUBLE: Double-precision 64-bit IEEE 754 values, e.g. -2.2250738585072014E-308
- NAIVE_TIME: Time without a timezone in the ISO-8601 calendar system, e.g. 10:15:30
- NAIVE_DATE: Date without a timezone in the ISO-8601 calendar system, e.g. 2007-12-03
- NAIVE_DATETIME: A date-time without timezone in the ISO-8601 calendar system, e.g. 2007-12-03T10:15:30
- UTC_DATETIME: An instantaneous point on the timeline, always in UTC timezone, e.g. 2007-12-03T10:15:30.123Z
- BINARY: Binary data is represented as protobuf
bytes
(such as ByteString in Java), e.g.[B@7d4991ad
(showing ByteString as bytes) - XML: "
<tag>
This is xml</tag>
" - STRING: "This is text"
- JSON: "{"a": 123}"
The following is a list of test scenarios we recommend you consider:
- Test mapping of all data types between Fivetran types and source/destination types (e.g. Mysql)
- Big data loads
- Big incremental updates
- Narrow event tables
- Wide fact tables
In addition to the suggestions above, consider the following as well:
- Make sure to test with at least one of each of the following source connector types:
- Database (Postgres, MongoDB, etc.)
- Application (Github, Hubspot, Google Sheets, etc.)
- File (S3, Google Drive, etc.)
- Fivetran Platform connector
- Exercise
AlterTable
in various ways:- Adding one or more columns
- Change of primary key columns (adding and removing columns from primary key constraint)
- Changing data type of non-primary key columns
- Changing data type of primary key columns
- Test tables with and without primary-key
Sort of. We will email you the logs for a failed sync through support but the number of log messages is limited and this is a slow process for debugging in general. What you need to do is add your own logging for your own platform of choice so you are not reliant on us for logs. Plus, that way, you can implement alerts, monitoring, etc.
Yes, definitely. This is most likely an initial sync where there is nothing but upsert operations, all followed by a truncate, which is meant to (soft) delete any rows that may have existed prior to the initial sync starting. This is done to make sure all rows that may have existed prior to the initial sync are marked as deleted (since we cannot be sure the initial sync will necessarily overwrite them all). The "before timestamp" is key to the truncate operation so you don't just mark the entire table deleted. It should pick out the rows that existed prior to the sync starting, in other words, where _fivetran_synced
< "truncate before timestamp".