Table configuration
Model configuration
| Property | Description | Default |
|---|---|---|
| `materialized` | A table materialization, such as `table`, `incremental`, or `table_hive_ha`. | |
| `s3_data_naming` | An optional naming policy for the data on S3. See Table data location. | `schema_table_unique` |
| `external_location` | If set, the full S3 path in which the table will be saved. (Does not work with Iceberg tables.) | none |
| `partitioned_by` | An array of columns by which the table will be partitioned. ⚠️ Limited to the creation of 100 partitions. | none |
| `bucketed_by` | An array of columns by which to bucket the data. Ignored when using Iceberg. Example: `[org_id]` | none |
| `bucket_count` | The number of buckets for bucketing your data. Ignored when using Iceberg. Example: `1` | none |
| `table_type` | The type of table in Athena. Supports `hive` or `iceberg`. | `hive` |
| `format` | The data format for the table. Supports `ORC`, `PARQUET`, `AVRO`, `JSON`, and `TEXTFILE`. | `PARQUET` |
| `write_compression` | The compression type to use for any storage format that allows compression to be specified. To see which options are available, see CREATE TABLE AS. Example: `SNAPPY` | none |
| `field_delimiter` | Custom field delimiter, used when the format is set to `TEXTFILE`. See CREATE TABLE AS. Example: `','` | none |
| `table_properties` | Additional table properties to add to the table. Valid for Iceberg only. Example: `{'optimize_rewrite_delete_file_threshold': '2'}` | none |
| `native_drop` | If set, relation drop operations are performed with SQL rather than direct Glue API calls, and no S3 calls are made to manage data in S3. Data in S3 is only cleaned up for Iceberg tables; see the AWS docs. Useful in contexts where S3 access is restricted. Note that Iceberg DROP TABLE operations may time out if they take longer than 60 seconds. | `false` |
| `seed_by_insert` | By default, seed data is uploaded to S3. This flag creates seeds using a SQL INSERT statement instead. Large seed files cannot use `seed_by_insert`, as the SQL INSERT statement would exceed the Athena limit of 262,144 bytes. Useful in contexts where S3 access is restricted. | `false` |
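For illustration, a minimal config block combining several of these properties might look like the following sketch. The model, source, and column names are hypothetical; adjust them to your project:

```sql
-- models/orders.sql — a minimal sketch; table, column, and bucket names are hypothetical
{{
    config(
        materialized='table',
        format='PARQUET',
        write_compression='SNAPPY',
        partitioned_by=['created_date'],
        bucketed_by=['org_id'],
        bucket_count=4
    )
}}

select org_id, order_id, amount, created_date
from {{ source('raw', 'orders') }}
```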
Table data location
The S3 location in which table data is saved is determined by:
- If `external_location` is defined, that value is used.
- If `s3_data_dir` is defined, the path is determined by that value and `s3_data_naming`.
- If no `s3_data_dir` is defined, data is stored under `s3_staging_dir/tables/`.
The options for `s3_data_naming` are:

- `table`: `{s3_data_dir}/{table}/`
- `table_unique`: `{s3_data_dir}/{table}/{uuid4()}/`
- `schema_table`: `{s3_data_dir}/{schema}/{table}/`
- `schema_table_unique`: `{s3_data_dir}/{schema}/{table}/{uuid4()}/`
It's possible to set `s3_data_naming` globally in `profiles.yml`, set it for a group of models in `dbt_project.yml`, or overwrite the value for a specific model in the config block.
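For example, a per-model override in the config block might look like this sketch (the model body and the `ref` target are hypothetical):

```sql
-- Overriding s3_data_naming for a single model; the query itself is hypothetical
{{
    config(
        materialized='table',
        s3_data_naming='schema_table'
    )
}}

select * from {{ ref('stg_orders') }}
```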
Workgroup with default output location
When using an Athena workgroup with a default output location configured, `s3_data_naming` and any configured buckets are ignored, and the location configured in the workgroup is used.
Incremental table models
dbt-athena supports incremental models. These strategies are supported:

- `insert_overwrite` (default): The insert overwrite strategy deletes the overlapping partitions from the destination table, then inserts the new records from the source. This strategy depends on the `partitioned_by` keyword! If no partitions are defined, dbt will fall back to the `append` strategy.
- `append`: Inserts new records without updating, deleting, or overwriting any existing data. There might be duplicate data (e.g., great for log or historical data).
- `merge`: Conditionally updates, deletes, or inserts rows into an Iceberg table. Used in combination with `unique_key`. ⚠️ Only available when using Iceberg.
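As a sketch of the default strategy, an incremental model partitioned by date might look like the following (the source, table, and column names are hypothetical):

```sql
-- models/events.sql — insert_overwrite sketch; names are hypothetical
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partitioned_by=['event_date']
    )
}}

select event_id, payload, event_date
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  -- only process rows newer than what is already in the target table
  where event_date > (select max(event_date) from {{ this }})
{% endif %}
```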
On schema change
`on_schema_change` is an option to reflect schema changes in incremental models.
The following options are supported:

- `ignore` (default)
- `fail`
- `append_new_columns`
- `sync_all_columns`

For details, please refer to the dbt docs.
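For example, combined with an incremental strategy, the option is set in the config block (a sketch; the model and `ref` target are hypothetical):

```sql
-- Evolving the schema of an incremental model; names are hypothetical
{{
    config(
        materialized='incremental',
        incremental_strategy='append',
        on_schema_change='append_new_columns'
    )
}}

select * from {{ ref('stg_events') }}
```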