FAQ

Frequently asked questions about Arus.

General

What is Arus?

Arus is a lightweight, self-hosted CDC & ETL framework for teams running on VPS-class infrastructure (no Kubernetes needed). It ingests data from MySQL, MariaDB, PostgreSQL, and MongoDB, applies transformations, and lands them into a PostgreSQL, MySQL, or ClickHouse data warehouse.

How is Arus different from Airbyte?

	Airbyte	Arus
Infrastructure	Requires Kubernetes or Docker Compose with async worker	Single `docker compose up -d`
Resources	Minimum 4GB+ with separate worker	2-core / 4GB RAM total
CDC method	Debezium/Kafka-based (log-based)	Watermark-based (batch SELECT)
Complexity	10+ microservices	3 containers (DB, API, Console)
Setup time	30-60 minutes	2-3 minutes

How is Arus different from Debezium?

Debezium requires Kafka as the transport layer — you need Zookeeper + Kafka + Kafka Connect + Debezium connectors. That's 4+ infrastructure components before you sync a single row. Arus does everything in-process with just PostgreSQL + FastAPI. No Kafka, no Zookeeper, no Connect cluster.

Do I need Kubernetes?

No. Arus is designed specifically for VPS-class infrastructure. It runs on a single Docker host with 2 CPU cores and 4GB RAM.

Is Arus production-ready?

Phases 1 and 2 are complete — core ETL pipeline, connectors, auth, DAG UI, retry/backoff, dead letter queue, quality checks, schema drift detection, and notifications. Phase 3 (CLI tools, backfill UI, multi-env) is in progress.

What license does Arus use?

Arus is currently a private project. Check the repository for licensing details.

Technical

What CDC methods does Arus support?

Arus uses watermark-based batch CDC — it polls source tables for new/updated rows using a timestamp column (e.g., updated_at). This approach:

✅ Works with read-replicas (no write access needed)
✅ Minimal database overhead (standard SELECT queries)
✅ No binlog/Write-Ahead Log (WAL) configuration needed
✅ Works across all supported databases

Does Arus support real-time (streaming) CDC?

No. Arus is a batch CDC tool with configurable polling intervals (default: every 5 minutes). If you need sub-second real-time sync, consider Debezium + Kafka.

Can Arus handle large tables?

Yes. Arus uses batched extraction (default: 10,000 rows per batch) with watermark tracking. Large tables are synced incrementally — only new/changed rows are transferred after the initial backfill.

What happens if a pipeline run fails?

Arus automatically retries failed operations with exponential backoff (default: 3 attempts, 2s → 4s → 8s → 16s). After all retries are exhausted:

The run is marked as failed
Failed rows are saved to the Dead Letter Queue (DLQ)
A notification is sent if configured (Telegram, Discord, or Slack)
The next scheduled run picks up from the last successful watermark

Does Arus handle schema changes in source tables?

Yes — schema drift detection is built in. During each pipeline run, Arus compares source columns against warehouse columns:

New columns are detected and logged
If auto_alter_schema is enabled, new columns are added to the warehouse table automatically
Notifications can be sent for schema drift events

Can I run multiple pipelines concurrently?

Yes. The default configuration supports 5 concurrent pipelines on 2-core / 4GB RAM. Each pipeline runs in its own thread with a 5-minute timeout.

What databases can be destinations?

Currently supported destinations:

PostgreSQL — full support (JSONB raw storage, UPSERT, typed analytics)
MySQL — full support (JSON raw storage, batch insert)
ClickHouse — with MergeTree engine and auto-TTL

Setup & Configuration

What are the minimum system requirements?

Minimum: 2 CPU cores, 4GB RAM, 20GB disk
Recommended: 4 CPU cores, 8GB RAM, 50GB SSD
Software: Docker 24+ & Docker Compose v2+

Do I need to expose Arus to the internet?

Not necessarily. Arus works perfectly on a local network or behind a VPN. For cloud deployments, we recommend using a reverse proxy with HTTPS (nginx, Caddy, or Cloudflare Tunnel).

Can I use a source database that's behind a firewall?

Yes. As long as the Arus server can reach the source database host and port, it works. For additional security:

Use SSH tunnels or WireGuard for connectivity
Create a read-only database user for source connectors
Restrict source access to the Arus server's IP address

How do I secure Arus in production?

See the Security Guide for a comprehensive checklist. Key points:

Change default credentials immediately
Set strong ARUS_JWT_SECRET and ARUS_ENCRYPTION_KEY
Use HTTPS via reverse proxy
Bind Docker ports to localhost only
Use read-only accounts for source databases
Configure automated backups

How do I update Arus?

bash

# Pull new images
docker compose pull

# Restart with new images
docker compose up -d

# Verify health
curl http://localhost:8081/api/health

Database migrations run automatically on startup.

Monitoring & Troubleshooting

How do I check if Arus is running?

bash

docker compose ps
curl http://localhost:8081/api/health

Expected response:

json

{"status":"ok","data":{"version":"0.1.0","database":"connected","scheduler":"running"}}

How do I view pipeline logs?

Via Console: Pipeline Detail → Run History → Logs
Via API: GET /api/runs/{id}/logs
Via Docker: docker compose logs arus-api

What should I do if a pipeline keeps failing?

Check Run Logs in the Console for error details
Verify source database credentials and connectivity
Check the Dead Letter Queue for failed rows
Ensure the source table has a proper updated_at index
Review source database load and connection limits

How do I reset a stuck pipeline?

bash

# Cancel the stuck run via API
curl -X POST http://localhost:8081/api/runs/{run_id}/cancel \
  -H "Authorization: Bearer ***"

If the scheduler itself is stuck:

bash

docker compose restart arus-api

Migration

Can I migrate from Airbyte to Arus?

Yes. Since both tools use a common pattern (source connector → pipeline → destination), migration steps are:

Set up Arus with Docker Compose
Configure the same source databases in Arus
Configure the same destination
For each Airbyte connection, create a matching pipeline in Arus
Run a Full Refresh to backfill historical data
Disable Airbyte connections once Arus pipelines are running

Can I migrate from custom ETL scripts to Arus?

Yes. Custom Python/Shell scripts can be replaced by:

Configure the source as an Arus Source connector
Set transforms in the pipeline for any data processing logic
Configure the existing database as an Arus Destination

Will my existing warehouse tables be affected?

Arus creates tables in dedicated schemas (analytics.*, staging.*), so existing tables are not affected. You can run Arus alongside your current setup during migration.

Limits & Scalability

What's the maximum throughput?

Throughput depends on hardware and source/destination performance. Typical benchmarks:

2-core / 4GB RAM: ~50K-100K rows/minute (single pipeline)
4-core / 8GB RAM: ~200K-500K rows/minute (5 concurrent pipelines)
8-core / 16GB RAM: ~1M+ rows/minute

Is there a limit on the number of tables?

No hard limit. Each table is tracked by a watermark row in the database. Resource usage scales with the number of active tables and sync frequency.

Can I use Arus with a data warehouse like Snowflake or BigQuery?

Not yet. Currently supported destinations are PostgreSQL, MySQL, and ClickHouse. Support for cloud warehouses is being evaluated for Phase 3.

Does Arus support multi-tenancy?

Not directly. Each Arus instance manages its own set of connectors and pipelines. For multi-tenant setups, you can run separate instances or use schema-level isolation in the warehouse database.

FAQ ​

General ​

What is Arus? ​

How is Arus different from Airbyte? ​

How is Arus different from Debezium? ​

Do I need Kubernetes? ​

Is Arus production-ready? ​

What license does Arus use? ​

Technical ​

What CDC methods does Arus support? ​

Does Arus support real-time (streaming) CDC? ​

Can Arus handle large tables? ​

What happens if a pipeline run fails? ​

Does Arus handle schema changes in source tables? ​

Can I run multiple pipelines concurrently? ​

What databases can be destinations? ​

Setup & Configuration ​

What are the minimum system requirements? ​

Do I need to expose Arus to the internet? ​

Can I use a source database that's behind a firewall? ​

How do I secure Arus in production? ​

How do I update Arus? ​

Monitoring & Troubleshooting ​

How do I check if Arus is running? ​

How do I view pipeline logs? ​

What should I do if a pipeline keeps failing? ​

How do I reset a stuck pipeline? ​

Migration ​

Can I migrate from Airbyte to Arus? ​

Can I migrate from custom ETL scripts to Arus? ​

Will my existing warehouse tables be affected? ​

Limits & Scalability ​

What's the maximum throughput? ​

Is there a limit on the number of tables? ​

Can I use Arus with a data warehouse like Snowflake or BigQuery? ​

Does Arus support multi-tenancy? ​

FAQ

General

What is Arus?

How is Arus different from Airbyte?

How is Arus different from Debezium?

Do I need Kubernetes?

Is Arus production-ready?

What license does Arus use?

Technical

What CDC methods does Arus support?

Does Arus support real-time (streaming) CDC?

Can Arus handle large tables?

What happens if a pipeline run fails?

Does Arus handle schema changes in source tables?

Can I run multiple pipelines concurrently?

What databases can be destinations?

Setup & Configuration

What are the minimum system requirements?

Do I need to expose Arus to the internet?

Can I use a source database that's behind a firewall?

How do I secure Arus in production?

How do I update Arus?

Monitoring & Troubleshooting

How do I check if Arus is running?

How do I view pipeline logs?

What should I do if a pipeline keeps failing?

How do I reset a stuck pipeline?

Migration

Can I migrate from Airbyte to Arus?

Can I migrate from custom ETL scripts to Arus?

Will my existing warehouse tables be affected?

Limits & Scalability

What's the maximum throughput?

Is there a limit on the number of tables?

Can I use Arus with a data warehouse like Snowflake or BigQuery?

Does Arus support multi-tenancy?