You might be wondering: I’m learning software architecture, why do I need a whole track on Big Data?
Fair question. Here’s the answer.
Data Is the System
In most modern systems, data is the most critical and hardest-to-change part of the architecture. You can rewrite services. You can swap out frameworks. You cannot easily migrate terabytes of data or redesign a pipeline that thousands of processes depend on.
Architects who don’t understand data — how it moves, where it lives, how it’s processed — make decisions that seem reasonable at small scale and become catastrophic bottlenecks at large scale.
The Scale Problem
When data volume grows beyond what a single database can comfortably handle, you enter Big Data territory. This isn’t just about storage — it’s about:
- Processing speed: Can you compute results fast enough to be useful?
- Pipeline reliability: What happens when a stage of your pipeline fails?
- Latency requirements: Does the business need results in real-time, or is batch processing acceptable?
- Cost: Processing at scale is expensive. Architectural decisions here have direct financial consequences.
These are architectural decisions, not just data engineering decisions.
Two Worlds: Batch vs. Real-Time
One of the most fundamental Big Data architectural choices is how you process data:
Batch processing — collect data over a period, process it all at once. Simpler, cheaper, but results are delayed. Useful for reports, aggregations, model training.
Stream processing — process data as it arrives, event by event. More complex, but enables real-time decisions. Useful for fraud detection, live dashboards, personalization.
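The contrast can be sketched in a few lines of Python. This is a toy illustration, not a real pipeline: the event values and function names are made up, and real systems would use a batch engine (e.g. Spark) or a stream processor (e.g. Flink) instead of plain loops.

```python
from typing import Iterable, Iterator

# Hypothetical event data: purchase amounts as they arrive.
events = [12.0, 7.5, 3.25, 40.0]

def batch_total(collected: Iterable[float]) -> float:
    """Batch: collect everything for a period, then process it all at once."""
    return sum(collected)  # one computation over the full dataset

def stream_totals(incoming: Iterable[float]) -> Iterator[float]:
    """Stream: update the result incrementally as each event arrives."""
    running = 0.0
    for amount in incoming:
        running += amount
        yield running  # a fresh result is available after every event

print(batch_total(events))          # one answer, after the period ends
print(list(stream_totals(events)))  # an answer after every event
```

The batch version is simpler and only has to be right once per run; the stream version must maintain correct intermediate state forever, which is exactly where the extra complexity of stream processing comes from.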
The Lambda and Kappa architectures (covered in this track) are the two dominant patterns for combining these approaches or choosing between them. Every architect working on a data-heavy system needs to understand them.
What You’ll Take Away from This Track
After completing this track, you’ll be able to:
- Understand the landscape of Big Data tools and why they exist
- Explain the difference between batch and stream processing, and when to use each
- Articulate the trade-offs of Lambda vs. Kappa architectures
- Understand what data orchestration is and why it matters (Airflow)
- Navigate the cloud data services ecosystem without getting lost
These aren’t specialist skills. They’re table stakes for any architect working in a modern, data-driven organization.
💡 Before you start: Make sure you’ve gone through the Databases track first. Big Data builds on those foundations — if you’re shaky on what a database is and how it works, some of this will feel abstract.