Building Connector Libraries That Scale in the Real World

January 27, 2026
Cloud Modernization
Kavin Sharma
A practical story of how “just add integrations” turns into a real connector platform problem and how standardizing Airflow DAG behavior, state, and failure handling made Meltano and DLT scale reliably in production.

When I started working on connectors in a recent project, the job sounded simple:

“Add integrations so users can sync data.”

The first few connectors were exactly that. We used Meltano (Singer taps) to get moving fast, and it worked great in the beginning. You pick a tap, configure credentials, run the pipeline, and you’ve got data.

Then we added more connectors. And then more users started running them on schedules.

What started as simple data connectors quickly became a challenge in building scalable connector libraries that could survive real-world production workloads.

If your integrations involve older platforms, this ties closely to integrating legacy systems with modern digital services; the same scaling pressure shows up there too.

That’s when the real work began.


Phase 1: Meltano gave us speed - and then the reality of scaling data connectors

If you’re in the middle of modernizing ingestion end-to-end, our migration story in Rethinking ETL: Why We Migrated Our Legacy Pipelines to Apache Spark adds the broader pipeline context.

Meltano and Singer taps are excellent for bootstrapping ETL connectors, but scaling them exposed orchestration and state-management gaps.

Meltano was a strong starting point for building a connector library quickly. It helped us integrate multiple sources without writing everything from scratch, and it gave us a common pattern for extraction.

But scaling exposed a few repeating issues - not “Meltano is bad” issues, just the kind of problems that show up when your system moves from “a few manual runs” to “many automated syncs”.

The kinds of problems we started seeing

  • A tap runs successfully but returns 0 rows because the API needs an extra permission or hidden context.
  • A connector works locally but fails in staging due to environment mismatches.
  • Incremental loads break because an upstream API changes a type or field name.
  • Downstream storage fails on values like timezone-aware timestamps that look valid upstream but aren’t accepted at the destination.

At this point, it wasn’t a connector problem.
It was a connector system problem.

The first scaling fix wasn’t a new tool - it was standardizing DAG behavior

Here’s what I learned fast: connectors don’t scale when orchestration is inconsistent. In practice, connector orchestration with Airflow DAGs mattered more than the connector implementation itself.

A sync job can’t just “run a tap.” It must behave predictably so the platform can:

  • show sync status in UI
  • report failures clearly
  • retry safely
  • trigger alerts
  • avoid partial states that confuse everyone later
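
One way to picture “behaves predictably” (purely illustrative, field names hypothetical): every sync, however it’s built, emits a status record with the same shape, and that record is what the UI, retries, and alerting key off.

```python
# Purely illustrative sketch of a single "sync status" contract that every
# connector run emits, regardless of which tool produced it. Field names
# are hypothetical, not a real platform schema.
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional


@dataclass
class SyncStatus:
    connector_id: str
    run_id: str
    state: Literal["started", "succeeded", "failed"]
    started_at: datetime
    finished_at: Optional[datetime] = None
    rows_loaded: int = 0
    error: Optional[str] = None   # human-readable failure summary for alerting
    partial: bool = False         # True if the run stopped mid-extraction
```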

So, we standardized the Airflow execution lifecycle into a simple pattern:

start → run extractor → finalize → final status

“This is the same reason ETL control logic matters: retries, idempotency, and consistent failure handling are what keep production pipelines stable.”

For Meltano, that meant:

  • start_dag marks the sync started
  • run_meltano runs the tap
  • finalize_dag updates states/metadata
  • final_status reports success/failure and powers alerting
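
To make that concrete, here’s a minimal sketch of the lifecycle as an Airflow 2.x DAG. The task names follow the pattern above, but the helper callables (mark_sync_started, finalize_sync, report_final_status) and the meltano command are illustrative placeholders rather than our exact implementation.

```python
# Minimal sketch of the standardized lifecycle as an Airflow DAG.
# The helper callables are hypothetical stand-ins for whatever your
# platform uses to track sync metadata.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.trigger_rule import TriggerRule


def mark_sync_started(**context):
    # Persist "sync started" so the UI can show an in-progress state.
    print(f"sync started: {context['dag'].dag_id} @ {context['ts']}")


def finalize_sync(**context):
    # Update connector state/metadata after the extractor finishes.
    print("finalizing sync metadata")


def report_final_status(**context):
    # Runs regardless of upstream success/failure and powers alerting.
    failed = any(ti.state == "failed" for ti in context["dag_run"].get_task_instances())
    print("sync status:", "failed" if failed else "success")


with DAG(
    dag_id="meltano_example_sync",
    start_date=datetime(2026, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    start_dag = PythonOperator(task_id="start_dag", python_callable=mark_sync_started)

    run_meltano = BashOperator(
        task_id="run_meltano",
        bash_command="meltano run tap-example target-duckdb",  # placeholder tap/target
    )

    finalize_dag = PythonOperator(task_id="finalize_dag", python_callable=finalize_sync)

    final_status = PythonOperator(
        task_id="final_status",
        python_callable=report_final_status,
        trigger_rule=TriggerRule.ALL_DONE,  # report even when upstream tasks fail
    )

    start_dag >> run_meltano >> finalize_dag >> final_status
```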

This sounds like a small thing, but it solved a huge scaling issue:

Even when connectors fail in unique ways, the system handles failure in a consistent way.

That’s what “scale” actually is.

Phase 2: When we hit the limits of “just use a tap”

As we went deeper, there were cases where “use an existing Meltano tap” became less ideal:

  • Some taps were hard to extend safely
  • Some workflows needed more control over state and composition
  • Some connectors required custom handling without forking a repo forever

That’s where DLT came into the picture.

Not as a replacement for Meltano - but as a second execution path for connectors where we needed more flexibility.

The hard part of DLT wasn’t writing DLT code

Writing a DLT pipeline is not that hard.

The hard part is this:

If DLT runs differently than Meltano, you now have two systems, two failure models, and two operational playbooks.

That’s how platforms become messy.

So the real goal became:

Make DLT run like Meltano inside the platform

That meant DLT jobs had to:

  • run on existing infra (Airflow / containers)
  • follow the same DAG lifecycle (start → run → finalize → status)
  • report status back the same way so UI and alerting continue to work
  • handle state in a predictable way
  • load data into the same destinations (DuckDB / DuckLake)

We literally reshaped the DLT DAG to match the Meltano DAG structure.

So instead of “a custom DLT task”, it became:
start_dag → run_dlt → finalize_dag → final_status

And this mattered more than any implementation detail, because it kept the platform coherent.
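
For illustration, here’s a rough sketch of what the run_dlt step can look like inside that lifecycle: a small DLT pipeline with an incremental resource, loading into DuckDB just like the Meltano path. The resource, API URL, and pipeline name are made-up placeholders.

```python
# Sketch of a DLT run that slots into the same start → run → finalize → status
# lifecycle. The resource, pipeline name, and API are illustrative only.
import dlt
import requests


@dlt.resource(name="example_records", write_disposition="merge", primary_key="id")
def fetch_records(
    updated_since=dlt.sources.incremental("updated_at", initial_value="1970-01-01"),
):
    # Hypothetical paginated API; dlt tracks the incremental cursor in pipeline state.
    resp = requests.get(
        "https://api.example.com/records",
        params={"updated_since": updated_since.last_value},
        timeout=30,
    )
    resp.raise_for_status()
    yield resp.json()["results"]


def run_dlt(**context):
    # This is the body of the run_dlt task between start_dag and finalize_dag.
    pipeline = dlt.pipeline(
        pipeline_name="example_connector",
        destination="duckdb",       # same destination family as the Meltano path
        dataset_name="example_raw",
    )
    load_info = pipeline.run(fetch_records())
    # Surface the result so finalize_dag / final_status can report it consistently.
    print(load_info)
```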

The details that made it scale

Once we had Meltano and DLT both running under the same orchestration contract, the next scaling problems were mostly predictable - and solvable by standardization.

1) Credentials and secrets had to become boring

Connectors fail constantly when secrets aren’t managed cleanly across environments. This is one of the most common modernization pitfalls we see (and it compounds technical debt fast), which we cover in Top 10 Costly Mistakes to Avoid During Cloud Modernization.

So, part of making this scale was ensuring credentials could be:

  • provided consistently
  • injected via environment variables (especially for containerized DLT jobs)
  • backfilled for existing integrations when the credential model evolved

This is unglamorous work, but without it, “scaling connectors” is a lie.
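
As a minimal sketch of the “injected via environment variables” piece: the containerized job should fail loudly at startup when a credential is missing instead of happily syncing 0 rows. The variable names and prefix convention here are hypothetical.

```python
# Sketch: credentials injected as environment variables into the container
# that runs the job. Variable names and prefix are illustrative, not a
# convention you have to follow.
import os


class MissingCredentialError(RuntimeError):
    pass


def load_connector_credentials(prefix: str) -> dict:
    """Read required credentials for one connector from the environment.

    Fails loudly at startup instead of letting the sync run and return 0 rows.
    """
    required = ("API_KEY", "API_SECRET")
    creds = {}
    for key in required:
        value = os.environ.get(f"{prefix}_{key}")
        if not value:
            raise MissingCredentialError(f"{prefix}_{key} is not set for this environment")
        creds[key.lower()] = value
    return creds


# e.g. the containerized DLT job starts with:
# credentials = load_connector_credentials("EXAMPLE_CONNECTOR")
```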

2) State needed to be explicit, not “whatever the tool does”

Treating connector state management as a platform concern was critical for reliable incremental data loads.

Meltano and DLT both manage state differently.

To scale, we treated state as a platform concern:

  • state is persisted
  • state is traceable
  • state updates don’t depend on tribal knowledge
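
One lightweight way to make that traceable (purely illustrative, and separate from whatever Meltano or DLT persist internally): record a snapshot of every run in a platform-owned table. The table name and columns are made up.

```python
# Sketch: treating state as a platform concern by recording a traceable
# snapshot of each run. Table and columns are illustrative.
import json
from datetime import datetime, timezone

import duckdb


def record_sync_run(connector_id: str, status: str, state: dict,
                    db_path: str = "platform.duckdb") -> None:
    """Persist one row per sync so state changes are visible and auditable."""
    con = duckdb.connect(db_path)
    con.execute(
        """
        CREATE TABLE IF NOT EXISTS sync_runs (
            connector_id VARCHAR,
            finished_at  TIMESTAMP,
            status       VARCHAR,
            state_json   VARCHAR
        )
        """
    )
    con.execute(
        "INSERT INTO sync_runs VALUES (?, ?, ?, ?)",
        [connector_id, datetime.now(timezone.utc), status, json.dumps(state)],
    )
    con.close()
```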

3) Schema drift had to be expected

Schema drift handling became non-negotiable for long-running data connectors tied to external APIs.

External APIs change. The system must assume:

  • new fields appear
  • types change
  • some fields disappear or go empty

Scaling meant adding guardrails so schema issues show up early and don’t silently corrupt the pipeline.
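
A rough sketch of what such a guardrail can look like: compare incoming records against the columns the pipeline depends on before loading, warn on new fields, and fail fast on missing ones. The expected column set is illustrative.

```python
# Sketch of a lightweight schema guardrail: surface drift early instead of
# letting it silently corrupt the destination. Expected columns are illustrative.
import logging

logger = logging.getLogger("connector.schema")

EXPECTED_COLUMNS = {"id", "updated_at", "amount", "currency"}


def check_schema_drift(records: list, expected: set = EXPECTED_COLUMNS) -> None:
    """Log new fields from upstream; raise on missing ones we depend on."""
    if not records:
        return
    seen = set().union(*(record.keys() for record in records))
    new_fields = seen - expected
    missing_fields = expected - seen
    if new_fields:
        logger.warning("schema drift: new fields from upstream API: %s", sorted(new_fields))
    if missing_fields:
        raise ValueError(f"schema drift: expected fields missing upstream: {sorted(missing_fields)}")
```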

4) Debugging patterns needed to apply across connectors

Once Meltano and DLT ran under the same DAG lifecycle, debugging became far more systematic:

  • check start step ran
  • check extractor logs
  • check finalize callbacks
  • check final_status output
  • check destination tables/state

That’s the difference between “hero debugging” and “platform debugging”.


What we ended up with

We didn’t end up choosing Meltano or DLT.

We ended up with a connector platform where:

  • Meltano is the default fast path (huge ecosystem, proven taps)
  • DLT is the flexible path for cases where we need custom composition or better control
  • both run on the same orchestration model
  • both follow the same lifecycle and reporting
  • adding a new integration doesn’t require reinventing how we operate

The scaling win wasn’t throughput.

It was confidence:

Adding connectors stopped feeling risky.

Final takeaway

Meltano helped us move fast.
Airflow DAG standards helped us stay sane.
DLT helped us extend the platform without turning it into a forked-connector graveyard.

Scaling connector libraries isn’t about writing more connectors.

It’s about making sure every connector - no matter how it’s built - behaves predictably when it runs at 2 a.m. on a schedule.

That’s what makes it real.
