bs4 (Beautiful Soup) uses a pluggable XML or HTML parser to parse a (possibly invalid) document into a tree representation. It provides methods and Pythonic idioms that make it easy to navigate, search, and modify the parse tree.
bs4 (Beautiful Soup) works with Python 2.7 and up. It works better if lxml and/or html5lib is installed.
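As a minimal sketch of the navigate, search, and modify workflow described above, the snippet below parses a small fragment whose `<p>` tag is never closed. The markup and variable names are made up for illustration, and the built-in `html.parser` backend is assumed so that neither lxml nor html5lib is required.

```python
from bs4 import BeautifulSoup

# Illustrative, deliberately invalid markup: the <p> tag is never closed.
html = "<p class='intro'>Hello, <b>world</b>. See <a href='/a'>A</a> and <a href='/b'>B</a>"

# "html.parser" ships with Python; "lxml" or "html5lib" could be named here
# instead if those packages are installed.
soup = BeautifulSoup(html, "html.parser")

# Navigate: attribute-style access returns the first matching tag.
first_paragraph = soup.p

# Search: find_all() returns every matching tag in document order.
links = [a["href"] for a in soup.find_all("a")]

# Modify: assigning to .string replaces the tag's text content.
soup.b.string = "there"

print(links)
print(first_paragraph.get_text())
```

Swapping the parser name is the only change needed to try another backend, although different parsers may repair invalid markup in different ways.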
urllib is a package that collects several modules for working with URLs:
- urllib.request for opening and reading URLs
- urllib.error containing the exceptions raised by urllib.request
- urllib.parse for parsing URLs
- urllib.robotparser for parsing robots.txt files
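The modules listed above can be sketched together as follows. The `example.com` URLs, the robots.txt rules, and the `fetch` helper are hypothetical, chosen only for illustration; `fetch` is defined but not called, so the snippet runs without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import parse_qs, urljoin, urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into components and resolve a relative link.
url = "https://example.com/docs/page.html?q=soup&lang=en#intro"
parts = urlparse(url)
query = parse_qs(parts.query)
absolute = urljoin(url, "../images/logo.png")

# urllib.robotparser: evaluate rules supplied as lines, without fetching them.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
may_fetch_docs = robots.can_fetch("*", "https://example.com/docs/")
may_fetch_private = robots.can_fetch("*", "https://example.com/private/x")


def fetch(target, timeout=10):
    """Hypothetical helper: return the response body as bytes, or None on failure."""
    try:
        # urllib.request: open the URL and read the response body.
        with urlopen(target, timeout=timeout) as response:
            return response.read()
    except HTTPError as exc:
        # urllib.error: the server answered with an error status code.
        print(f"HTTP error {exc.code} for {target}")
    except URLError as exc:
        # urllib.error: the connection itself failed (DNS, refused, ...).
        print(f"Could not reach {target}: {exc.reason}")
    return None


print(parts.netloc, parts.path, query, absolute)
print(may_fetch_docs, may_fetch_private)
```

`HTTPError` is caught before `URLError` because it is a subclass of `URLError`; reversing the order would make the more specific handler unreachable.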


