Hadoop Modes of Operation¶

Hadoop can work in three different modes, each suited for different environments and use cases. Let's look at them briefly:

1. Standalone Mode¶

This is the simplest mode of Hadoop. In Standalone Mode, Hadoop runs on a single machine and doesn't use HDFS (Hadoop Distributed File System). It's mostly used for testing and learning because there's no real distribution of data or processing.

No HDFS: Files are stored locally.
No distributed processing: Everything happens on one machine.
Use Case: Good for testing code or doing small experiments.

2. Pseudo-distributed Mode¶

In Pseudo-distributed Mode, Hadoop runs on a single machine, but it behaves as if it’s a distributed system. Each Hadoop service like NameNode, DataNode, and ResourceManager runs as a separate process on the same machine.

HDFS is used: Data is distributed across simulated nodes (which are really just separate processes on the same machine).
Single machine but acts distributed.
Use Case: For developers who want to test on a single machine but still mimic how Hadoop behaves in a real environment.

3. Fully-Distributed Mode¶

In Fully-Distributed Mode, Hadoop runs across multiple machines. Each service (like NameNode, DataNode, etc.) runs on different machines, and data is truly distributed.

Real distribution: Data is spread across many machines.
Use Case: This is used in production environments where Hadoop processes large datasets across a cluster of machines.

Key Takeaway¶

Standalone: All on one machine, no HDFS.
Pseudo-distributed: Simulates distribution on one machine.
Fully-distributed: Runs across multiple machines, real-world usage.

Every Hadoop architecture soul is fsimage¶