"뷰티플 수프"의 두 판 사이의 차이

Latest revision as of 20:18, 18 December 2021 (Sat)

1 Overview

Beautiful Soup
뷰티플 수프, 뷰터펄 숲 [bjúːtəfəl suːp]

A Python package for parsing HTML and XML documents.
It copes well with so-called "tag soup", such as malformed or unclosed tags.

pip install BeautifulSoup4
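
As a rough illustration of the "tag soup" point above, the sketch below feeds Beautiful Soup a fragment whose `<li>` tags are never closed. The markup is invented for this example, and the exact shape of the resulting tree depends on which parser is chosen, so only the tag count and the text are inspected.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed fragment: neither <li> is ever closed.
broken = "<ul><li>one<li>two</ul>"

soup = BeautifulSoup(broken, "html.parser")

# Both list items are still found even though their end tags are missing.
items = soup.find_all("li")
print(len(items))

# The text content survives as well.
text = soup.get_text()
print(text)
```

How the items end up nested varies between `html.parser`, `lxml`, and `html5lib`, so code that depends on the tree layout should pin the parser explicitly.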

2 Example 1

from bs4 import BeautifulSoup
print(BeautifulSoup("<html><head></head><body>Sacr&eacute; bleu!</body></html>", "html.parser"))

→ The HTML entity has been converted to a Unicode character.
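
The conversion can be checked a little more directly. This is a minimal sketch that reuses the string from the example above (shortened to the `<body>` element) and looks at both the parsed text and the re-serialized markup; the variable names are chosen only for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<body>Sacr&eacute; bleu!</body>", "html.parser")

# The parsed text holds the real character, not the entity reference.
text = soup.body.string
print(text)

# Serializing the tree back to a string keeps the Unicode character.
markup = str(soup)
print(markup)
```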

3 Example 2

Fetches an HTML page from the web and parses it.
An example that uses Beautiful Soup together with requests.

import requests
from bs4 import BeautifulSoup

r = requests.get('https://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(r.text, 'html.parser')
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))
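
The same link-extraction loop can be tried without network access. The sketch below runs it against a small, made-up HTML fragment (the URLs and the `name="top"` anchor are assumptions for illustration) to show what the `'/'` default in `anchor.get('href', '/')` does when an anchor has no `href`.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the second anchor has no href attribute.
html = """
<p>
  <a href="/wiki/A">A</a>
  <a name="top">no link here</a>
  <a href="https://example.org/">external</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag.get behaves like dict.get: the second argument is returned
# when the attribute is absent.
hrefs = [anchor.get("href", "/") for anchor in soup.find_all("a")]
print(hrefs)
```

Relative values such as `/wiki/A` are returned exactly as written in the page; resolving them against the page URL is left to the caller.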

4 See also

5 References

@@ Line 1: / Line 1: @@
 ==Overview==
 ;Beautiful Soup
-;뷰티풀 수프
+;뷰티플 수프, 뷰터펄 숲 [bjúːtəfəl suːp]
-* A Python library for pulling data out of HTML and XML files
+* A Python package for parsing HTML and XML documents
+* It copes well with so-called '[[태그 수프|tag soup]]', such as malformed or unclosed tags.
-https://www.crummy.com/software/BeautifulSoup/bs4/doc/_images/6.1.jpg
+[[파일:bs4-doc-image-6.1.jpg]]
-<source lang='python'>
+<syntaxhighlight lang='bash'>
+pip install BeautifulSoup4
+</syntaxhighlight>
+==Example 1==
+<syntaxhighlight lang='python' run>
+from bs4 import BeautifulSoup
+print(BeautifulSoup("<html><head></head><body>Sacr&eacute; bleu!</body></html>", "html.parser"))
+</syntaxhighlight>
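+:→ The [[HTML 엔티티|HTML entity]] has been converted to a Unicode character.
+==Example 2==
+* Fetches an HTML page from the web and parses it.
+* An example that uses Beautiful Soup together with [[파이썬 requests|requests]]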
+<syntaxhighlight lang='python' run>
+import requests
 from bs4 import BeautifulSoup
-from urllib.request import urlopen
-with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
+r = requests.get('https://en.wikipedia.org/wiki/Main_Page')
-    soup = BeautifulSoup(response, 'html.parser')
+soup = BeautifulSoup(r.text, 'html.parser')
-    for anchor in soup.find_all('a'):
+for anchor in soup.find_all('a'):
-        print(anchor.get('href', '/'))
+    print(anchor.get('href', '/'))
-</source>
+</syntaxhighlight>
 ==See also==
+{{z컬럼3|
+* [[lxml]]
 * [[Selenium]]
-* [[lxml]]
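+* [[태그 수프|Tag soup]]
+* [[뷰티플 수프 문서|Beautiful Soup documentation]]
+* [[파이썬 requests|Python requests]]
+* [[파이썬 네이버 뉴스 스크래핑 시작하기|Getting started with scraping Naver News in Python]]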
+}}
 ==References==
 * {{영어위키백과|Beautiful Soup (HTML parser)}}
-* https://www.crummy.com/software/BeautifulSoup/
+* https://www.crummy.com/software/BeautifulSoup/bs4/doc/
-[[분류: Python 라이브러리]]
+[[분류: BeautifulSoup]]
 [[분류: HTML 파서]]
 [[분류: XML 파서]]
 [[분류: MIT 라이선스]]