Hadoop
Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of commodity hardware. It is designed to scale from single servers to thousands of machines, handling petabytes of data. The core components are the Hadoop Distributed File System (HDFS) for storage, the MapReduce processing model, and Yet Another Resource Negotiator (YARN) for cluster resource management. While originally built around MapReduce, Hadoop now supports a variety of data processing engines that run on the same ecosystem.
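To make the MapReduce model concrete, the following is a minimal word-count job in Java, closely following the canonical example from the Hadoop documentation. It is a sketch, not a production job: the input and output paths are supplied as command-line arguments and are assumed to point at HDFS directories, and the output directory must not already exist.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for every token in the input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation per mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The combiner runs the reduce function locally on each mapper's output before the shuffle, cutting the volume of data moved across the network; this map-shuffle-reduce shape is what allows the framework to parallelize the job across thousands of nodes.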
History: Hadoop was created by Doug Cutting and Mike Cafarella, growing out of the Apache Nutch web-crawler project and inspired by Google's papers on the Google File System (2003) and MapReduce (2004). Yahoo became an early major contributor, employing Cutting and running some of the first large production clusters.
Architecture: HDFS stores data on a cluster of DataNodes, managed by a centralized NameNode that holds the file-system metadata. Data is split into large blocks (128 MB by default in recent versions), and each block is replicated across multiple DataNodes (three copies by default) so the cluster tolerates machine failures. YARN schedules processing tasks onto the same nodes, moving computation to the data rather than data to the computation.
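To make the read and write paths concrete, here is a minimal sketch using the HDFS Java client API (org.apache.hadoop.fs.FileSystem). The path /tmp/hello.txt is hypothetical, and the sketch assumes a running cluster whose NameNode address (fs.defaultFS) is configured in a core-site.xml on the classpath.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
      public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS (the NameNode address) from core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/tmp/hello.txt"); // hypothetical path for illustration

        // Write: the client asks the NameNode where to place blocks, then
        // streams the bytes directly to DataNodes, which replicate them
        // in a pipeline.
        try (FSDataOutputStream out = fs.create(path, true)) {
          out.write("hello from hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read: the NameNode again supplies block locations; the data
        // itself comes straight from the DataNodes.
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
          System.out.println(in.readLine());
        }
      }
    }

Because the NameNode serves only metadata while bulk data flows directly between clients and DataNodes, a single NameNode can coordinate a very large cluster; it is, however, a single point of failure unless a standby NameNode is configured for high availability.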
Usage and scope: Hadoop is well-suited for high-throughput batch processing of very large data sets on inexpensive commodity hardware. It is not designed for low-latency or interactive workloads, which are typically handled by other engines running on the same cluster and data.