TypeFrequenz

TypeFrequenz is a metric used to quantify the rate at which a particular type or category appears within a data sequence. It is used across disciplines such as linguistics, data mining, and information theory to summarize the distribution of discrete types in time or space.

Formal definition: Consider a data stream of events labeled with discrete types. The TypeFrequenz of type t over a window W containing N events is f_W(t) = n_W(t) / N, where n_W(t) is the number of events of type t in W. In continuous time, the asymptotic rate f(t) = lim_{T→∞} n_T(t) / T, where n_T(t) is the number of events of type t observed up to time T, gives the long-run arrival rate of type t (here t denotes the type, not a time). The windowed form is a proportion of events, while the continuous-time form is a rate in events per unit time. In both cases TypeFrequenz normalizes counts by the window size N or the elapsed time T rather than using absolute counts, which makes samples of different sizes comparable.

Computation and methods: TypeFrequenz can be estimated with fixed windows, sliding windows, or online estimators such as exponential smoothing. It is used to compare type distributions, detect shifts, model language usage, and analyze network traffic, sensor streams, or user behavior.

Examples and caveats: In a text corpus, the TypeFrequenz of a common function word can be high, for example 0.07 (7% of all tokens) in a large English corpus. However, TypeFrequenz depends on the chosen time scale, labeling granularity, and sampling method; different definitions of a "type" yield different results.

See also: Frequency, probability distribution, Zipf's law, event rate.
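The windowed definition f_W(t) = n_W(t) / N can be computed directly for a fixed window, or maintained incrementally for a sliding window. The following is a minimal Python sketch, not a reference implementation; the function names type_frequencies and sliding_type_frequency are hypothetical, and the sliding version assumes a window of a fixed number of events rather than a fixed span of time.

```python
from collections import Counter, deque
from typing import Dict, Hashable, Iterable, Iterator, Sequence


def type_frequencies(window: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fixed window: return f_W(t) = n_W(t) / N for every type t in the window."""
    n = len(window)
    if n == 0:
        raise ValueError("window must contain at least one event")
    return {t: count / n for t, count in Counter(window).items()}


def sliding_type_frequency(
    stream: Iterable[Hashable], t: Hashable, size: int
) -> Iterator[float]:
    """Sliding window: yield f_W(t) for each run of `size` consecutive events.

    The count n_W(t) is updated as events enter and leave the window,
    so each step costs O(1) instead of recounting the whole window.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    window: deque = deque()
    n_t = 0  # running count n_W(t) for the current window
    for event in stream:
        window.append(event)
        if event == t:
            n_t += 1
        if len(window) > size:
            # The oldest event leaves the window; drop it from the count if it matches.
            if window.popleft() == t:
                n_t -= 1
        if len(window) == size:
            yield n_t / size


events = ["a", "b", "a", "c", "a", "a"]
print(type_frequencies(events)["a"])  # 4 of 6 events have type "a"
print(list(sliding_type_frequency(events, "a", 3)))  # one value per full window of 3 events
```

Because every window has the same size N, the sliding estimates are directly comparable with each other, which is what makes them usable for spotting shifts in a stream.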
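The continuous-time rate f(t) = lim_{T→∞} n_T(t) / T is a limit and cannot be evaluated on finite data. A common practical stand-in is the finite-horizon ratio n_T(t) / T for a large observation period T. The sketch below assumes events arrive as (timestamp, type) pairs with timestamps measured from 0 in the same unit as T; the function name empirical_rate and the sample log are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def empirical_rate(
    events: Iterable[Tuple[float, Hashable]], t: Hashable, horizon: float
) -> float:
    """Finite-horizon estimate n_T(t) / T of the long-run rate f(t).

    `events` are (timestamp, type) pairs with timestamps measured from 0 in
    the same unit as `horizon` (T). Events outside [0, T] are ignored.
    Here `t` is the type label, not a time.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    n_t = sum(1 for ts, label in events if label == t and 0.0 <= ts <= horizon)
    return n_t / horizon


# Hypothetical event log: (timestamp in seconds, event type).
log = [(0.5, "login"), (1.2, "error"), (2.0, "login"), (3.7, "login"), (9.0, "login")]
print(empirical_rate(log, "login", horizon=4.0))  # 3 logins in [0, 4] -> 0.75 per second
```

Unlike the windowed proportion, the result carries units (events per unit time), so estimates are only comparable when they use the same time unit.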
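Exponential smoothing gives an online TypeFrequenz estimate that needs constant memory per tracked type. A minimal sketch, assuming the smoothed quantity is the indicator that the current event has type t and using a hypothetical smoothing factor alpha: each update is f ← f + alpha · (1[event = t] − f), so older events are down-weighted geometrically and the effective window is on the order of 1/alpha events.

```python
from typing import Hashable, Iterable, Iterator


def ewma_type_frequency(
    stream: Iterable[Hashable], t: Hashable, alpha: float, initial: float = 0.0
) -> Iterator[float]:
    """Online TypeFrequenz estimate by exponential smoothing.

    After each event the estimate moves a fraction `alpha` toward the
    indicator 1[event == t], so an event k steps in the past carries
    weight alpha * (1 - alpha) ** k. Memory use is O(1).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    estimate = initial
    for event in stream:
        indicator = 1.0 if event == t else 0.0
        estimate += alpha * (indicator - estimate)
        yield estimate


print(list(ewma_type_frequency(["a", "a", "b"], "a", alpha=0.5)))  # [0.5, 0.75, 0.375]
```

A larger alpha reacts faster to shifts in the type distribution but gives noisier estimates; a smaller alpha behaves more like a long sliding window.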
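The dependence on how a "type" is defined can be seen on a toy example: whether capitalization is folded changes the TypeFrequenz of "the". The helper token_frequencies and the sentence are illustrative assumptions, not data from a real corpus.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def token_frequencies(
    tokens: List[str], normalize: Optional[Callable[[str], str]] = None
) -> Dict[str, float]:
    """Relative frequency of each type after mapping tokens to types with `normalize`."""
    labels = [normalize(tok) if normalize else tok for tok in tokens]
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one token")
    return {label: count / n for label, count in Counter(labels).items()}


tokens = "The cat saw the dog and the bird".split()
case_sensitive = token_frequencies(tokens)  # "The" and "the" are distinct types
case_folded = token_frequencies(tokens, normalize=str.lower)  # merged into one type
print(case_sensitive["the"], case_folded["the"])  # 0.25 0.375
```

The same tokens yield 2/8 or 3/8 for "the" depending only on the labeling rule, so reported values should state how types were defined.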