InputFormat
InputFormat is an abstraction used in data processing frameworks to describe how input data is split into manageable chunks and how those chunks are read. In the Hadoop MapReduce ecosystem, InputFormat is a core abstraction (an interface in the older org.apache.hadoop.mapred API and an abstract class in the newer org.apache.hadoop.mapreduce API) with two complementary responsibilities: partitioning the input data into InputSplit objects, and creating a RecordReader that produces key/value pairs from each split. Because these concerns are delegated to the InputFormat, the framework can schedule work efficiently and support diverse data sources, from text files to binary formats and specialized data stores.
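The two responsibilities can be illustrated with a small, self-contained model. The sketch below is written in Python and is not the Hadoop Java API: the names Split, LineInputFormat, get_splits, and read_records are hypothetical stand-ins for InputSplit, an InputFormat implementation, and the record-reading step. It assumes newline-delimited input held in memory and uses a simplified boundary rule (a line belongs to the split that contains its first byte), which is not necessarily the exact logic Hadoop's own line reader uses.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an InputSplit: a byte range of the input."""
    start: int
    length: int


class LineInputFormat:
    """Toy model of an InputFormat over an in-memory bytes buffer."""

    def __init__(self, data: bytes, split_size: int):
        self.data = data
        self.split_size = split_size

    def get_splits(self) -> List[Split]:
        # Responsibility 1: partition the input into independent byte ranges.
        # Splits are cut purely by size, so they may start or end mid-line.
        return [
            Split(start, min(self.split_size, len(self.data) - start))
            for start in range(0, len(self.data), self.split_size)
        ]

    def read_records(self, split: Split) -> Iterator[Tuple[int, str]]:
        # Responsibility 2: turn one split into (key, value) records.
        # Key is the byte offset of the line, value is the line's text.
        pos = split.start
        end = split.start + split.length
        if pos > 0 and self.data[pos - 1:pos] != b"\n":
            # The split starts mid-line; that line belongs to an earlier
            # split, so skip ahead to the start of the next line.
            newline = self.data.find(b"\n", pos)
            pos = len(self.data) if newline == -1 else newline + 1
        while pos < end:
            # A line that starts inside the split is read to completion,
            # even if it runs past the split's end.
            newline = self.data.find(b"\n", pos)
            stop = len(self.data) if newline == -1 else newline
            yield pos, self.data[pos:stop].decode("utf-8")
            pos = stop + 1


fmt = LineInputFormat(b"alpha\nbeta\ngamma\n", split_size=7)
records = [rec for split in fmt.get_splits() for rec in fmt.read_records(split)]
print(records)  # [(0, 'alpha'), (6, 'beta'), (11, 'gamma')]
```

Although the 7-byte splits cut through the middle of "beta" and "gamma", each line is emitted exactly once, which is why the splits can be handed to separate workers and processed independently.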
The splits represent portions of input data that can be processed independently. The framework queries the
Common implementations include TextInputFormat (reads lines of text), KeyValueTextInputFormat (parses lines into a key and value