BeautifulSoup
BeautifulSoup is a Python library for parsing HTML and XML documents. It builds a parse tree from page source that can be navigated and searched. It is designed for quick data extraction, including from malformed markup, and exposes a simple interface for navigating, searching, and modifying the tree.
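As a minimal sketch of how malformed markup is handled, the snippet below parses a fragment whose <b> tag is never closed. The sample string and variable names are assumptions made for illustration, and the result shown applies to Python's built-in 'html.parser'; other parsers may repair the same markup differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the <b> tag is never closed.
broken_html = "<div><p>Hello <b>world</p></div>"

# Build the parse tree with the parser bundled with Python.
soup = BeautifulSoup(broken_html, "html.parser")

# The tree can still be navigated by tag name.
paragraph_text = soup.p.get_text()   # text of the first <p>, including nested tags
bold_text = soup.b.get_text()        # text inside the unclosed <b>
bold_parent = soup.b.parent.name     # name of the tag that encloses <b>

print(paragraph_text, bold_text, bold_parent)
```

Here the unclosed <b> is closed implicitly when its enclosing <p> ends, so the tree stays navigable without any cleanup of the input.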
Developed by Leonard Richardson, BeautifulSoup is commonly used via the bs4 package. The library is open-source
Core features include a navigable parse tree, methods such as find, find_all, find_parent, and select for CSS-style
Usage example:
    from bs4 import BeautifulSoup
    # html is assumed to hold the page source as a string
    soup = BeautifulSoup(html, 'html.parser')
    for link in soup.find_all('a', href=True):
        print(link['href'])
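The search methods named among the core features (find, find_all, find_parent, and select) can be sketched together on a small document. The markup, the class names, and the variable names below are hypothetical and chosen only for illustration; select relies on the CSS selector support that ships with current bs4 releases.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = """
<div id="nav">
  <ul class="menu">
    <li><a href="/home">Home</a></li>
    <li><a href="/docs" class="external">Docs</a></li>
    <li><a>No link</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag (or None if nothing matches).
first_link = soup.find("a")

# find_all returns every match; href=True keeps only tags that have an href.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# find_parent walks upward to the nearest enclosing tag with the given name.
enclosing_list = first_link.find_parent("ul")

# select takes a CSS selector and returns a list of matching tags.
external_links = soup.select("ul.menu a.external")

print(first_link.get_text(), hrefs, enclosing_list["class"], external_links)
```

find and find_all filter by tag name and attributes, while select is often more compact when the match depends on nesting or on class combinations.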
BeautifulSoup emphasizes simplicity and readability. For very large HTML documents, choosing a faster parser such as