lexer

A lexer, or lexical analyzer, is a software component that performs lexical analysis by reading a stream of characters from a program's source code and grouping them into tokens. Each token represents a basic syntactic unit, such as a keyword, an identifier, a numeric or string literal, or an operator, along with optional metadata like its type, value, and position.
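As an illustrative sketch (not taken from any particular implementation), a token carrying its type, value, and position could be modeled as a small record; the field and category names here are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Token:
    type: str    # syntactic category, e.g. "KEYWORD", "IDENT", "NUMBER"
    value: str   # the matched source text (the lexeme)
    line: int    # position metadata, useful for error reporting
    column: int

# e.g. the literal "42" found at line 1, column 5 of the source
tok = Token("NUMBER", "42", line=1, column=5)
```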

The lexer scans the input left to right, applying a set of patterns (typically expressed as regular expressions or deterministic finite automata) to recognize the longest possible match at each position (the maximal munch rule). When a pattern matches, the corresponding token is emitted and the scanner advances past the matched input. Whitespace and comments are usually ignored or turned into non-emitting tokens, though some languages preserve them for tooling or formatting.

Common token categories include keywords (if, while), identifiers (variable names), literals (numbers, strings), operators (+, -, *), punctuation (commas, semicolons), and sometimes more complex tokens like multi-character operators (<=, !=). The lexer may also perform simple normalization or conversion, such as converting numeric literals to internal representations or mapping escape sequences in strings.

The token stream produced by the lexer is consumed by the parser or interpreter, which uses the tokens to build a syntactic structure. Lexical analysis is fundamental to compilation and interpretation, and its design affects error reporting, performance, and language features.

Lexers can be implemented by hand or generated from specification tools such as Lex, Flex, or ANTLR, often trading readability for speed or flexibility.
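The scanning loop described above can be sketched as a minimal hand-written lexer in Python. This is a toy, not a production design: the token names, keyword set, and patterns are assumptions chosen only to illustrate maximal munch, non-emitting whitespace, keyword reclassification, and numeric normalization.

```python
import re

# Ordered token patterns (illustrative, not a real language spec).
# In the OP alternation, "<=" is listed before "<" so the two-character
# operator wins over its one-character prefix.
TOKEN_SPECS = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"<=|!=|[+\-*<]"),
    ("PUNCT",  r"[,;]"),
    ("SKIP",   r"\s+"),   # whitespace: matched but never emitted
]
KEYWORDS = {"if", "while"}  # assumed keyword set for this sketch

def lex(source):
    tokens = []
    pos = 0
    while pos < len(source):
        # Maximal munch: try every pattern at the current position
        # and keep the longest match across all categories.
        best = None
        for kind, pattern in TOKEN_SPECS:
            m = re.compile(pattern).match(source, pos)
            if m and (best is None or len(m.group()) > len(best[1].group())):
                best = (kind, m)
        if best is None:
            raise SyntaxError(f"unexpected character at {pos}: {source[pos]!r}")
        kind, m = best
        text = m.group()
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"              # reclassify reserved words
        if kind != "SKIP":                # drop whitespace
            # Simple normalization: convert numeric literals to ints.
            value = int(text) if kind == "NUMBER" else text
            tokens.append((kind, value))
        pos = m.end()                     # advance past the matched input
    return tokens
```

Running `lex("while x <= 42;")` yields the stream `[("KEYWORD", "while"), ("IDENT", "x"), ("OP", "<="), ("NUMBER", 42), ("PUNCT", ";")]`, which is the kind of list a parser would then consume to build a syntactic structure. A generated lexer (Flex, ANTLR) would compile the same patterns into a single automaton instead of trying each regex in turn.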