Skip to content

Data Model

Arus uses a single PostgreSQL instance with multiple schemas to separate concerns.


Schema Overview

sql
arus_config      -- Configuration (auth, sources, pipelines, settings)
arus_state       -- Watermark/state tracking
arus_run_logs    -- Run history and logs
staging          -- Raw landing zone (JSONB tables)
analytics        -- Normalized typed tables

Schema: arus_config

users

User accounts for Arus Console authentication.

ColumnTypeConstraintsDescription
idUUIDPK, default gen_random_uuid()User ID
emailVARCHAR(255)UNIQUE, NOT NULL, INDEXLogin email
password_hashVARCHAR(255)NOT NULLbcrypt hash
nameVARCHAR(255)NOT NULLDisplay name
roleVARCHAR(20)NOT NULL, default vieweradmin, editor, viewer
is_activeBOOLEANdefault trueAccount enabled/disabled
last_loginTIMESTAMPTZnullableLast login timestamp
created_atTIMESTAMPTZdefault NOW()Creation timestamp
updated_atTIMESTAMPTZdefault NOW()Last update timestamp

sources

Registered source database connections.

ColumnTypeConstraintsDescription
idUUIDPKSource ID
nameVARCHAR(255)NOT NULLDisplay name
typeVARCHAR(50)NOT NULLmysql, postgresql, mongodb, mariadb
hostVARCHAR(255)NOT NULLDatabase host
portINTEGERNOT NULLDatabase port
databaseVARCHAR(255)NOT NULLDatabase name
usernameVARCHAR(255)NOT NULLDatabase username
password_encTEXTNOT NULLFernet-encrypted password
sslBOOLEANdefault falseSSL enabled
uriTEXTnullableMongoDB connection string
auth_sourceVARCHAR(100)nullableMongoDB auth database
sync_methodVARCHAR(20)default autoauto, incremental, full_refresh
table_includeVARCHAR[]default []Include glob patterns
table_excludeVARCHAR[]default []Exclude glob patterns
schema_includeVARCHAR[]default []PostgreSQL schemas to scan
statusVARCHAR(20)default registeredregistered, connected, error
last_testedTIMESTAMPTZnullableLast connection test
created_atTIMESTAMPTZdefault NOW()
updated_atTIMESTAMPTZdefault NOW()

destinations

Registered destination (warehouse) connections.

ColumnTypeConstraintsDescription
idUUIDPKDestination ID
nameVARCHAR(255)NOT NULLDisplay name
typeVARCHAR(50)NOT NULLpostgresql, mysql, clickhouse
hostVARCHAR(255)nullableDestination host
portINTEGERnullableDestination port
databaseVARCHAR(255)nullableDatabase name
usernameVARCHAR(255)nullableUsername
password_encTEXTnullableEncrypted password
sslBOOLEANdefault false
raw_schemaVARCHAR(255)default stagingRaw landing schema
target_schemaVARCHAR(255)default analyticsNormalized target schema
is_defaultBOOLEANdefault falseDefault destination
statusVARCHAR(20)default registered
created_atTIMESTAMPTZdefault NOW()
updated_atTIMESTAMPTZdefault NOW()

pipelines

Pipeline configuration.

ColumnTypeConstraintsDescription
idUUIDPKPipeline ID
nameVARCHAR(255)NOT NULLPipeline name
source_idUUIDFK → sources.id ON DELETE CASCADESource reference
destination_idUUIDFK → destinations.id ON DELETE RESTRICTDestination reference
statusVARCHAR(20)default activeactive, paused, inactive
scheduleVARCHAR(100)nullableCron expression
max_retriesINTEGERdefault 3Per-pipeline retry override
timeout_secondsINTEGERdefault 300Pipeline timeout
depends_onUUIDFK → pipelines.id ON DELETE SET NULLPipeline dependency
target_schemaVARCHAR(255)default publicDefault target schema
load_modeVARCHAR(20)default directdirect or raw
created_atTIMESTAMPTZdefault NOW()
updated_atTIMESTAMPTZdefault NOW()

pipeline_tables

Per-table configuration within a pipeline.

ColumnTypeConstraintsDescription
idUUIDPK
pipeline_idUUIDFK → pipelines.id ON DELETE CASCADE
source_tableVARCHAR(255)NOT NULLTable name in source
source_schemaVARCHAR(255)default publicSource schema
target_schemaVARCHAR(255)nullablePer-table override
sync_modeVARCHAR(20)default incrementalincremental or full_refresh
load_modeVARCHAR(20)default directdirect or raw
watermark_columnVARCHAR(255)nullableOverride auto-detected column
transform_configJSONnullableArray of transform step objects
enabledBOOLEANdefault trueEnable/disable table

transform_scripts

Python transform scripts for pipelines.

ColumnTypeConstraintsDescription
idUUIDPK
pipeline_idUUIDFK → pipelines.id ON DELETE CASCADE
nameVARCHAR(255)NOT NULLScript name
descriptionTEXTnullable
contentTEXTNOT NULLPython source code
created_atTIMESTAMPTZdefault NOW()
updated_atTIMESTAMPTZdefault NOW()

notification_targets

Notification channels (Telegram, Discord, Slack).

ColumnTypeConstraintsDescription
idUUIDPK
nameVARCHAR(255)NOT NULL
typeVARCHAR(50)NOT NULLtelegram, discord, slack
configJSONBNOT NULLType-specific config
activeBOOLEANdefault true
created_atTIMESTAMPTZdefault NOW()

pipeline_notifications

Links between pipelines and notification targets.

ColumnTypeConstraintsDescription
idUUIDPK
pipeline_idUUIDNOT NULL
target_idUUIDNOT NULL
eventsJSONBdefault []Event types array
created_atTIMESTAMPTZdefault NOW()

runtime_settings

Key-value runtime settings (managed via UI).

ColumnTypeConstraintsDescription
idUUIDPK
keyVARCHAR(100)UNIQUE, NOT NULLSetting key
valueTEXTnullableSetting value
updated_atTIMESTAMPTZdefault NOW()

data_quality_log

Data quality check results.

ColumnTypeConstraintsDescription
idUUIDPK
pipeline_idUUIDNOT NULL, INDEX
run_idUUIDNOT NULL, INDEX
table_nameVARCHAR(255)NOT NULL
check_typeVARCHAR(50)NOT NULLrow_count, null_check
statusVARCHAR(20)NOT NULLpassed, failed, warning
rows_extractedINTEGERnullable
rows_loadedINTEGERnullable
discrepancy_pctFLOATnullableRow count discrepancy
null_columnsTEXTnullableComma-separated null columns
required_columnsTEXTnullable
messageTEXTnullable
passedBOOLEANdefault true
checked_atTIMESTAMPTZdefault NOW()

Schema: arus_state

watermarks

Incremental sync state tracking.

ColumnTypeConstraintsDescription
idUUIDPK
pipeline_idUUIDNOT NULL
source_tableVARCHAR(255)NOT NULL
watermark_colVARCHAR(255)nullableColumn used for watermark
watermark_valueTEXTnullableCurrent watermark value
row_countINTEGERdefault 0Last synced row count
last_run_idUUIDnullableLast run reference
last_synced_atTIMESTAMPTZnullable
created_atTIMESTAMPTZdefault NOW()
updated_atTIMESTAMPTZdefault NOW()

Unique constraint: (pipeline_id, source_table)


Schema: arus_run_logs

runs

Individual pipeline run records.

ColumnTypeConstraintsDescription
idUUIDPK
pipeline_idUUIDNOT NULL
statusVARCHAR(20)default runningrunning, success, failed, cancelled, skipped
started_atTIMESTAMPTZdefault NOW()
finished_atTIMESTAMPTZnullable
duration_msINTEGERnullable
rows_syncedINTEGERdefault 0
trigger_typeVARCHAR(20)default scheduledscheduled, manual, backfill, full_refresh
error_messageTEXTnullable
created_atTIMESTAMPTZdefault NOW()

run_table_stats

Per-table statistics within a run.

ColumnTypeConstraintsDescription
idUUIDPK
run_idUUIDFK → runs.id ON DELETE CASCADE
table_nameVARCHAR(255)NOT NULL
rows_extractedINTEGERdefault 0
rows_loaded_rawINTEGERdefault 0Rows into staging
rows_loaded_analyticsINTEGERdefault 0Rows into analytics
rows_failedINTEGERdefault 0
watermark_beforeTEXTnullable
watermark_afterTEXTnullable
duration_msINTEGERdefault 0
error_messageTEXTnullable

run_logs

Timestamped log entries for each run.

ColumnTypeConstraintsDescription
idBIGINTPK, autoincrementLog entry ID
run_idUUIDFK → runs.id ON DELETE CASCADE
timestampTIMESTAMPTZdefault NOW()
levelVARCHAR(10)default INFOINFO, WARNING, ERROR, DEBUG
messageTEXTNOT NULLLog message

Schema: staging

<source>_<table>_raw

Auto-created raw landing tables (one per synced source table).

ColumnTypeConstraintsDescription
_arus_idBIGSERIALPK
_arus_run_idUUIDNOT NULLRun reference
_arus_extractedTIMESTAMPTZdefault NOW()Extraction timestamp
_dataJSONBNOT NULLFull source row as JSON

_dead_letters

Failed rows that couldn't be loaded after retries.

ColumnTypeConstraintsDescription
idUUIDPK
source_nameVARCHAR(255)NOT NULL
table_nameVARCHAR(255)NOT NULL
run_idUUIDNOT NULL, INDEX
row_dataJSONBNOT NULLFailed row data
error_textTEXTnullableError message
failed_atTIMESTAMPTZdefault NOW()

Schema: analytics

<table>

Auto-created normalized tables mirroring source table structure.

Each table has:

  • All source columns mapped to PostgreSQL types
  • _arus_run_id UUID NOT NULL — reference to the run that synced this row
  • _arus_synced_at TIMESTAMPTZ DEFAULT NOW() — when the row was loaded

Entity Relationship

users ─────────── has many ──→ runs (via pipeline triggers)

sources ──── has many ──→ pipelines ──── has many ──→ pipeline_tables
     │                          │
     │                          ├── has many ──→ transform_scripts
     │                          │
     │                          └── has many ──→ runs ──── has many ──→ run_logs
     │                                                    └── has many ──→ run_table_stats

     │                          ┌── has many ──→ watermarks (arus_state)
     │                          │
     └── has many ──→ pipeline_notifications ──── has one ──→ notification_targets