Parsing

Parsing is the process of analyzing a sequence of symbols, often text, to determine its grammatical structure with respect to a formal grammar. In computer science, parsing converts input such as source code or data into a structured representation, such as a parse tree or an abstract syntax tree (AST), that a program can process further. A typical parser operates after lexical analysis, which tokenizes the input into a stream of tokens. The parser uses a grammar to decide whether the token sequence is valid and, when valid, to build a hierarchical tree that reveals the relationships among tokens.
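As an illustration of the pipeline described above, the following minimal sketch tokenizes an arithmetic expression and then parses the token stream into a nested-tuple AST. The grammar, the token shapes, and the names `tokenize` and `Parser` are assumptions made for this example; they are not taken from any particular tool.

```python
from typing import List, Tuple, Union

# Hypothetical shapes chosen for this sketch.
Token = Tuple[str, str]                       # (kind, text)
AST = Union[int, Tuple[str, "AST", "AST"]]    # number, or (operator, left, right)

DIGITS = "0123456789"


def tokenize(text: str) -> List[Token]:
    """Lexical analysis: turn a string into a list of tokens."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch in DIGITS:
            end = pos
            while end < len(text) and text[end] in DIGITS:
                end += 1
            tokens.append(("NUM", text[pos:end]))
            pos = end
        elif ch in "+-*/()":
            tokens.append(("OP", ch))
            pos += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {pos}")
    return tokens


class Parser:
    """Parser for the assumed grammar:

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUM | '(' expr ')'
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def advance(self) -> Token:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self) -> AST:
        tree = self.expr()
        if self.peek()[0] != "EOF":  # leftover input means the sequence is invalid
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return tree

    def expr(self) -> AST:
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    def term(self) -> AST:
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self) -> AST:
        kind, text = self.advance()
        if kind == "NUM":
            return int(text)
        if (kind, text) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


print(Parser(tokenize("1 + 2 * (3 - 4)")).parse())
# ('+', 1, ('*', 2, ('-', 3, 4)))
```

The nesting of the tuples reflects the hierarchy the grammar imposes: multiplication binds tighter than addition because `term` is parsed inside `expr`. The parentheses from the input do not appear in the result, which is what makes this an AST and not a full parse tree.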

There are two broad classes of parsing strategies: top-down parsers, such as recursive-descent parsers, and bottom-up parsers, such as LR or shift-reduce parsers. Many parsers are hand-written for simple languages, while larger ones are generated from formal grammars by tools such as YACC, Bison, or ANTLR. Parsers are widely used for programming languages, compilers and interpreters, but also for data formats like JSON, XML, and YAML, where a well-defined grammar provides reliable parsing.

In natural language processing, parsing refers to determining the syntactic structure of sentences, producing constituency or dependency representations. These parsers help downstream tasks such as information extraction and machine translation, though natural language parsing faces greater ambiguity and variability than programming-language parsing.

Key issues in parsing include ambiguity, error handling, and performance. Ambiguous grammars may permit multiple parses for the same input, requiring disambiguation strategies. Error recovery mechanisms attempt to continue parsing after a fault and provide helpful messages. The result of parsing is usually a parse tree showing the syntactic structure, or an AST that abstracts away certain details for further semantic analysis.
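
To make the point about ambiguity concrete, the sketch below enumerates every parse tree that a deliberately ambiguous grammar, E -> E '-' E | NUM, allows for one input. The grammar and the helper names `all_parses`, `evaluate`, and `is_left_associative` are assumptions for illustration. Brute-force enumeration is used only because it is easy to read; it is not how production parsers handle ambiguity.

```python
from typing import List, Tuple, Union

Tree = Union[int, Tuple[str, "Tree", "Tree"]]  # number, or ('-', left, right)


def all_parses(tokens: List[Union[int, str]]) -> List[Tree]:
    """Return every parse tree for the ambiguous grammar E -> E '-' E | NUM."""
    if len(tokens) == 1 and isinstance(tokens[0], int):
        return [tokens[0]]
    trees: List[Tree] = []
    # Try each '-' as the root operator and combine all sub-parses on each side.
    for i, tok in enumerate(tokens):
        if tok == "-":
            for left in all_parses(tokens[:i]):
                for right in all_parses(tokens[i + 1:]):
                    trees.append(("-", left, right))
    return trees


def evaluate(tree: Tree) -> int:
    """Compute the value a tree denotes."""
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)


def is_left_associative(tree: Tree) -> bool:
    """One possible disambiguation rule: right operands must be plain numbers."""
    if isinstance(tree, int):
        return True
    _, left, right = tree
    return isinstance(right, int) and is_left_associative(left)


trees = all_parses([8, "-", 3, "-", 2])
for tree in trees:
    print(tree, "=", evaluate(tree))
# ('-', 8, ('-', 3, 2)) = 7
# ('-', ('-', 8, 3), 2) = 3

chosen = [t for t in trees if is_left_associative(t)]
print(chosen)
# [('-', ('-', 8, 3), 2)]
```

The two trees give different values for the same token sequence, which is why a disambiguation strategy is needed. Declaring subtraction left-associative, as done here with a filter, selects the conventional reading.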