Table configuration

Model configuration

| Property | Description | Default |
| --- | --- | --- |
| materialized | A table materialization: one of table, incremental, or table_hive_ha. | |
| s3_data_naming | An optional naming policy for the data on S3. See Table data location. | schema_table_unique |
| external_location | If set, the full S3 path in which the table will be saved. (Does not work with Iceberg tables.) | none |
| partitioned_by | An array of columns by which the table will be partitioned. ⚠️ Limited to the creation of 100 partitions. | none |
| bucketed_by | An array of columns to bucket the data by. Ignored when using Iceberg. Example: [org_id] | none |
| bucket_count | The number of buckets for bucketing your data. Ignored when using Iceberg. Example: 1 | none |
| table_type | The type of table in Athena. Supports hive or iceberg. | hive |
| format | The data format for the table. Supports ORC, PARQUET, AVRO, JSON, and TEXTFILE. | PARQUET |
| write_compression | The compression type to use for any storage format that allows compression to be specified. For the available options, see CREATE TABLE AS. Example: SNAPPY | none |
| field_delimiter | Custom field delimiter, used when format is set to TEXTFILE. See CREATE TABLE AS. Example: ',' | none |
| table_properties | Additional table properties to add to the table. Valid for Iceberg only. Example: {'optimize_rewrite_delete_file_threshold': '2'} | none |
| native_drop | Relation drop operations are performed with SQL rather than direct Glue API calls, and no S3 calls are made to manage data in S3; data in S3 is only cleaned up for Iceberg tables (see the AWS docs). Useful in contexts where S3 access is restricted. Note that Iceberg DROP TABLE operations may time out if they take longer than 60 seconds. | false |
| seed_by_insert | By default, seed data is uploaded to S3; this flag creates seeds with a SQL insert statement instead. Large seed files cannot use seed_by_insert, as the SQL insert statement would exceed the Athena limit of 262144 bytes. Useful in contexts where S3 access is restricted. | false |
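
For example, a model config block might combine several of these properties. The following is a minimal sketch; the model, source, and column names are invented for illustration:

```sql
-- models/orders.sql: hypothetical model combining common table options
{{
  config(
    materialized='table',
    format='PARQUET',
    write_compression='SNAPPY',
    partitioned_by=['created_date'],
    s3_data_naming='schema_table_unique'
  )
}}

select
    order_id,
    org_id,
    created_date
from {{ source('raw', 'orders') }}
```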

Table data location

The S3 location in which table data is saved is determined by:

  1. If external_location is defined, that value is used.
  2. If s3_data_dir is defined, the path is determined by this value and s3_data_naming.
  3. If no s3_data_dir is defined, the data is stored under s3_staging_dir/tables/.

The options for s3_data_naming are:

  • table: {s3_data_dir}/{table}/
  • table_unique: {s3_data_dir}/{table}/{uuid4()}/
  • schema_table: {s3_data_dir}/{schema}/{table}/
  • schema_table_unique: {s3_data_dir}/{schema}/{table}/{uuid4()}/

It's possible to set s3_data_naming globally in profiles.yml, set it for a group of models in dbt_project.yml, or override it for a specific model in the config block.
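
As a sketch of the first two levels (the profile name, bucket paths, and project layout below are hypothetical):

```yaml
# profiles.yml: sets the global default for every model using this profile
my_athena_profile:
  target: dev
  outputs:
    dev:
      type: athena
      region_name: eu-west-1
      schema: analytics
      database: awsdatacatalog
      s3_staging_dir: s3://my-bucket/athena-results/   # made-up bucket
      s3_data_dir: s3://my-bucket/tables/              # made-up bucket
      s3_data_naming: schema_table

# dbt_project.yml: overrides the profile default for one folder of models
models:
  my_project:
    marts:
      +s3_data_naming: schema_table_unique
```

A per-model override uses the same key inside the model's config block, as in the example above.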

Workgroup with default output location

When using an Athena workgroup with a default output location configured, s3_data_naming and any configured buckets are ignored and the location configured in the workgroup is used.

Incremental table models

dbt-athena supports incremental models with the following strategies:

  • insert_overwrite (default): The insert overwrite strategy deletes the overlapping partitions from the destination table, and then inserts the new records from the source. This strategy depends on the partitioned_by keyword! If no partitions are defined, dbt will fall back to the append strategy.
  • append: Insert new records without updating, deleting, or overwriting any existing data. Duplicates are possible, which is usually acceptable for append-only data such as logs or historical records.
  • merge: Conditionally updates, deletes, or inserts rows into an Iceberg table. Used in combination with unique_key. ⚠️ Only available when using Iceberg.
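
For instance, an insert_overwrite model might look like the following sketch (the table and column names are made up; is_incremental() is dbt's standard incremental guard):

```sql
-- models/events.sql: hypothetical insert_overwrite incremental model
{{
  config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partitioned_by=['event_date']
  )
}}

select
    event_id,
    event_type,
    event_date
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- on incremental runs, only rebuild partitions newer than the current max
  where event_date > (select max(event_date) from {{ this }})
{% endif %}
```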

On schema change

on_schema_change controls how schema changes are handled in incremental models. The following options are supported:

  • ignore (default)
  • fail
  • append_new_columns
  • sync_all_columns

For details, refer to the dbt docs.
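
As a brief sketch, on_schema_change is set alongside the other incremental options in the model's config block:

```sql
-- hypothetical config: new source columns are appended to the target table
{{
  config(
    materialized='incremental',
    incremental_strategy='append',
    on_schema_change='append_new_columns'
  )
}}
```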
