Scrapy

Scrapy is an open-source, collaborative framework for extracting data from websites. Written in Python, it provides tools to define how to crawl sites, parse pages, and export structured data. It is widely used for web scraping, data mining, and automated content harvesting.

The framework is built around the Scrapy Engine and components such as Spiders, Selectors, Item Pipelines, and Middlewares. Spiders define the site crawling logic; Selectors (XPath/CSS) extract data; Item Pipelines process and store data; Middlewares and Extensions customize requests, responses, and behavior. The engine integrates with Twisted for asynchronous networking.

Scrapy offers built-in features such as robust CSS/XPath extraction, structured data export (JSON, CSV, XML), respect for robots.txt, auto-throttling to adjust request rates, request retries, and caching. It provides a rich set of components for handling requests, following links, and managing scheduling and concurrency.

Typical use involves writing a Spider class, running it with the Scrapy command-line tool, and configuring pipelines to persist data to databases or files. Pipelines handle cleaning, validation, and storage, and the Scrapy Shell supports interactive debugging.

Scrapy is maintained as an open-source project under the BSD license. It is developed and supported by Zyte (formerly Scrapinghub) and a community of contributors, and has a large ecosystem of plugins and extensions.
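
As an illustration of the Item Pipeline role described above (cleaning, validation, and storage), the following is a minimal sketch rather than a definitive implementation. The class name SQLitePipeline, the quotes table, the text and author fields, and the items.db path are all hypothetical, and items are assumed to be plain dicts. The open_spider, close_spider, and process_item hooks with a spider argument follow the commonly documented pipeline interface, which may differ between Scrapy versions. The fallback DropItem class exists only so the sketch can be run without Scrapy installed.

```python
import sqlite3

try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Assumption: stand-in so the sketch runs without Scrapy installed.
    class DropItem(Exception):
        """Placeholder for scrapy.exceptions.DropItem."""


class SQLitePipeline:
    """Hypothetical pipeline: clean, validate, then store each item."""

    def __init__(self, db_path="items.db"):
        # db_path is an illustrative default, not a Scrapy setting.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called when the spider starts: open the database and create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS quotes (text TEXT NOT NULL, author TEXT)"
        )

    def close_spider(self, spider):
        # Called when the spider finishes: flush pending writes and close.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Cleaning: strip surrounding whitespace from both fields.
        text = (item.get("text") or "").strip()
        author = (item.get("author") or "").strip() or None

        # Validation: discard items that have no usable text.
        if not text:
            raise DropItem("missing text")

        # Storage: persist the cleaned values.
        self.conn.execute(
            "INSERT INTO quotes (text, author) VALUES (?, ?)", (text, author)
        )
        item["text"] = text
        item["author"] = author
        return item
```

In a Scrapy project, a class like this would typically be enabled by listing its import path in the ITEM_PIPELINES setting with an integer priority; the exact path depends on the project layout. SQLite is used here only because it ships with Python; any database or file format could take its place.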