You might be wondering: I’m learning software architecture, why do I need a whole track on Big Data?
Fair question. Here’s the answer.
Data Is the System
In most modern systems, data is the most critical and hardest-to-change part of the architecture. You can rewrite services. You can swap out frameworks. You cannot easily migrate terabytes of data or redesign a pipeline that thousands of processes depend on.
Architects who don’t understand data — how it moves, where it lives, how it’s processed — make decisions that seem reasonable at small scale and become catastrophic bottlenecks at large scale.
The Scale Problem
When data volume grows beyond what a single database can comfortably handle, you enter Big Data territory. This isn’t just about storage — it’s about:
- Processing speed: Can you compute results fast enough to be useful?
- Pipeline reliability: What happens when a stage of your pipeline fails?
- Latency requirements: Does the business need results in real-time, or is batch processing acceptable?
- Cost: Processing at scale is expensive. Architectural decisions here have direct financial consequences.
These are architectural decisions, not just data engineering decisions.
Two Worlds: Batch vs. Real-Time
One of the most fundamental Big Data architectural choices is how you process data:
Batch processing — collect data over a period, process it all at once. Simpler, cheaper, but results are delayed. Useful for reports, aggregations, model training.
Stream processing — process data as it arrives, event by event. More complex, but enables real-time decisions. Useful for fraud detection, live dashboards, personalization.
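The contrast can be sketched in a few lines of Python. This is a toy illustration, not a real pipeline: the event values and function names are made up, and real systems would use a batch engine (e.g. Spark) or a stream processor (e.g. Flink) instead of plain loops.

```python
from typing import Iterable, Iterator

# Hypothetical event data: purchase amounts as they arrive.
events = [12.0, 7.5, 3.25, 40.0]

def batch_total(collected: Iterable[float]) -> float:
    """Batch: collect everything for a period, then process it all at once."""
    return sum(collected)  # one computation over the full dataset

def stream_totals(incoming: Iterable[float]) -> Iterator[float]:
    """Stream: update the result incrementally as each event arrives."""
    running = 0.0
    for amount in incoming:
        running += amount
        yield running  # a fresh result is available after every event

print(batch_total(events))          # one answer, after the period ends
print(list(stream_totals(events)))  # an answer after every event
```

The batch version is simpler and only has to be right once per run; the stream version must maintain correct intermediate state forever, which is exactly where the extra complexity of stream processing comes from.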
The Lambda and Kappa architectures (covered in this track) are the two dominant patterns for combining these approaches or choosing between them. Every architect working on a data-heavy system needs to understand them.
What You’ll Take Away from This Track
After completing this track, you’ll be able to:
- Understand the landscape of Big Data tools and why they exist
- Explain the difference between batch and stream processing, and when to use each
- Articulate the trade-offs of Lambda vs. Kappa architectures
- Understand what data orchestration is and why it matters (Airflow)
- Navigate the cloud data services ecosystem without getting lost
These aren’t specialist skills. They’re table stakes for any architect working in a modern, data-driven organization.
💡 Before you start: Make sure you’ve gone through the Databases track first. Big Data builds on those foundations — if you’re shaky on what a database is and how it works, some of this will feel abstract.