Lexing
Lexing, short for lexical analysis, is the process of converting a stream of characters into a stream of tokens, the basic units used by a compiler or interpreter to understand a programming language or data format. It is usually the first phase in language processing and operates before parsing. A lexer reads source text, matches character sequences against a set of token patterns, and emits tokens that carry a type and, when relevant, a value such as an identifier name or numeric literal.
Token kinds typically include keywords, identifiers, literals (numbers, strings), operators, punctuation, and sometimes comments or whitespace.
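As a concrete illustration of this process, the following is a minimal regex-based lexer sketch in Python; the token names, patterns, and keyword set are illustrative examples, not the rules of any particular language.

```python
import re

# Illustrative token patterns, tried in order; names and regexes are
# examples only, not any particular language's specification.
TOKEN_SPEC = [
    ("NUMBER",   r"\d+(?:\.\d+)?"),    # integer or decimal literal
    ("IDENT",    r"[A-Za-z_]\w*"),     # identifier or keyword
    ("OP",       r"[+\-*/=<>!]+"),     # operator characters
    ("PUNCT",    r"[(),;{}]"),         # punctuation
    ("SKIP",     r"[ \t\n]+"),         # whitespace: matched but not emitted
    ("MISMATCH", r"."),                # any other character is an error
]
KEYWORDS = {"if", "else", "while", "return"}

MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source):
    """Yield (kind, value) tokens from a source string."""
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        value = match.group()
        if kind == "SKIP":
            continue                   # whitespace is recognized but dropped
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r}")
        if kind == "IDENT" and value in KEYWORDS:
            kind = "KEYWORD"           # promote reserved words to keywords
        yield (kind, value)
```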
Two key principles guide lexing: the longest-match rule (also called maximal munch), which selects the token that matches the most characters at the current position, and rule priority, which breaks ties when two patterns match text of the same length, typically by favoring the rule declared first so that keywords take precedence over the general identifier pattern.
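Continuing the sketch above, the calls below illustrate both principles. One caveat on the sketch itself: Python's regex alternation tries the alternatives in order rather than choosing the longest match across all of them, so it only approximates true maximal munch when the patterns are ordered carefully; dedicated lexer generators such as flex implement maximal munch directly.

```python
# Longest match: ">=" is emitted as a single OP token, not ">" then "=",
# and "iffy" is one identifier, not the keyword "if" followed by "fy".
print(list(lex("x >= 42")))
# [('IDENT', 'x'), ('OP', '>='), ('NUMBER', '42')]
print(list(lex("if iffy")))
# [('KEYWORD', 'if'), ('IDENT', 'iffy')]

# Rule priority: pattern order in TOKEN_SPEC breaks ties, so a single letter
# is classified as IDENT rather than falling through to MISMATCH.
```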
The lexer outputs a sequence of tokens to the parser, enabling syntactic analysis without direct access to the raw character stream.
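As a sketch of that hand-off, a parser typically consumes the token sequence through a small interface such as peek/expect; the class and method names below are illustrative, not taken from any particular library.

```python
class TokenStream:
    """Minimal token-stream wrapper a parser might consume (illustrative)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        # Consume the next token, checking its kind; the parser works purely
        # in terms of tokens and never re-reads the raw character stream.
        token = self.peek()
        if token is None or token[0] != kind:
            raise SyntaxError(f"expected {kind}, got {token}")
        self.pos += 1
        return token

# Example: ts = TokenStream(lex("x >= 42")); ts.expect("IDENT")
```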