Parsel

Parsel is a BSD-licensed Python library to extract data from HTML, JSON, and XML documents.

It supports:

* CSS and XPath expressions for HTML and XML documents
* JMESPath expressions for JSON documents
* Regular expressions

Find the Parsel online documentation at https://parsel.readthedocs.org.

Example:

>>> from parsel import Selector
>>> text = """
        <html>
            <body>
                <h1>Hello, Parsel!</h1>
                <ul>
                    <li><a href="http://example.com">Link 1</a></li>
                    <li><a href="http://scrapy.org">Link 2</a></li>
                </ul>
                <script type="application/json">{"a": ["b", "c"]}</script>
            </body>
        </html>"""
>>> selector = Selector(text=text)
>>> selector.css('h1::text').get()
'Hello, Parsel!'
>>> selector.xpath('//h1/text()').re(r'\w+')
['Hello', 'Parsel']
>>> for li in selector.css('ul > li'):
...     print(li.xpath('.//@href').get())
http://example.com
http://scrapy.org
>>> selector.css('script::text').jmespath("a").get()
'b'
>>> selector.css('script::text').jmespath("a").getall()
['b', 'c']

