- Metadata-Version: 2.1
- Name: beautifulsoup4
- Version: 4.11.1
- Summary: Screen-scraping library
- Home-page: https://www.crummy.com/software/BeautifulSoup/bs4/
- Author: Leonard Richardson
- Author-email: leonardr@segfault.org
- License: MIT
- Download-URL: https://www.crummy.com/software/BeautifulSoup/bs4/download/
- Platform: UNKNOWN
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python
- Classifier: Programming Language :: Python :: 3
- Classifier: Topic :: Text Processing :: Markup :: HTML
- Classifier: Topic :: Text Processing :: Markup :: XML
- Classifier: Topic :: Text Processing :: Markup :: SGML
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
- Requires-Python: >=3.6.0
- Description-Content-Type: text/markdown
- Provides-Extra: lxml
- Provides-Extra: html5lib
- Requires-Dist: soupsieve (>1.2)
- Provides-Extra: html5lib
- Requires-Dist: html5lib; extra == 'html5lib'
- Provides-Extra: lxml
- Requires-Dist: lxml; extra == 'lxml'
- Beautiful Soup is a library that makes it easy to scrape information
- from web pages. It sits atop an HTML or XML parser, providing Pythonic
- idioms for iterating, searching, and modifying the parse tree.
- # Quick start
- ```
- >>> from bs4 import BeautifulSoup
- >>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
- >>> print(soup.prettify())
- <html>
-  <body>
-   <p>
-    Some
-    <b>
-     bad
-     <i>
-      HTML
-     </i>
-    </b>
-   </p>
-  </body>
- </html>
- >>> soup.find(text="bad")
- 'bad'
- >>> soup.i
- <i>HTML</i>
- #
- >>> soup = BeautifulSoup("<tag1>Some<tag2/>bad<tag3>XML", "xml")
- #
- >>> print(soup.prettify())
- <?xml version="1.0" encoding="utf-8"?>
- <tag1>
-  Some
-  <tag2/>
-  bad
-  <tag3>
-   XML
-  </tag3>
- </tag1>
- ```
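The three idioms mentioned in the introduction (iterating, searching, and modifying the parse tree) can be sketched roughly as follows. This is a minimal illustration rather than canonical usage: the sample markup and the variable names are made up for the example, and `html.parser` (the parser bundled with Python) is chosen only so that no optional dependency is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this sketch.
html = "<ul><li class='a'>one</li><li>two</li><li class='a'>three</li></ul>"

# Assumption: use the standard-library parser; lxml or html5lib would also work.
soup = BeautifulSoup(html, "html.parser")

# Searching: find_all() returns every tag matching the name and CSS class.
matching = soup.find_all("li", class_="a")
texts = [li.get_text() for li in matching]

# Iterating: .children yields the direct children of a tag in document order.
all_texts = [child.get_text() for child in soup.ul.children]

# Modifying: assigning to .string replaces the text inside the first <li>.
soup.li.string = "ONE"
first = str(soup.li)

print(texts)
print(all_texts)
print(first)
```

Passing the parser name explicitly also avoids the warning Beautiful Soup emits when it has to guess which parser to use.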
- To go beyond the basics, [comprehensive documentation is available](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
- # Links
- * [Homepage](https://www.crummy.com/software/BeautifulSoup/bs4/)
- * [Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- * [Discussion group](https://groups.google.com/group/beautifulsoup/)
- * [Development](https://code.launchpad.net/beautifulsoup/)
- * [Bug tracker](https://bugs.launchpad.net/beautifulsoup/)
- * [Complete changelog](https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/CHANGELOG)
- # Note on Python 2 sunsetting
- Beautiful Soup's support for Python 2 was discontinued on December 31,
- 2020: one year after the sunset date for Python 2 itself. From this
- point onward, new Beautiful Soup development will exclusively target
- Python 3. The final release of Beautiful Soup 4 to support Python 2
- was 4.9.3.
- # Supporting the project
- If you use Beautiful Soup as part of your professional work, please consider a
- [Tidelift subscription](https://tidelift.com/subscription/pkg/pypi-beautifulsoup4?utm_source=pypi-beautifulsoup4&utm_medium=referral&utm_campaign=readme).
- This will support many of the free software projects your organization
- depends on, not just Beautiful Soup.
- If you use Beautiful Soup for personal projects, the best way to say
- thank you is to read
- [Tool Safety](https://www.crummy.com/software/BeautifulSoup/zine/), a zine I
- wrote about what Beautiful Soup has taught me about software
- development.
- # Building the documentation
- The bs4/doc/ directory contains full documentation in Sphinx
- format. Run `make html` in that directory to create HTML
- documentation.
- # Running the unit tests
- Beautiful Soup supports unit test discovery using Pytest:
- ```
- $ pytest
- ```