-- BeautifulSoup documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
-- Installing beautifulsoup4 (using pip)
C:\Users\Administrator>pip install Beautifulsoup4
Collecting Beautifulsoup4
Using cached https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl
Collecting soupsieve>=1.2 (from Beautifulsoup4)
Using cached https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
Installing collected packages: soupsieve, Beautifulsoup4
Successfully installed Beautifulsoup4-4.8.0 soupsieve-1.9.3
-- Installing beautifulsoup4 (download and install manually)
1. https://www.crummy.com/software/BeautifulSoup/bs4/download/
2. Extract the archive, then run the following in cmd from the extracted folder:
   python setup.py install
3. Reopen the Python shell and try the import:
   from bs4 import BeautifulSoup
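Besides trying the import, the installed version can be checked from Python itself. The following is a minimal sketch using the standard-library importlib.metadata module (Python 3.8 or later); the helper name installed_version is an assumption made for illustration, not something from the BeautifulSoup documentation.

```python
import importlib.metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


# The two distributions that pip reported installing above.
print("beautifulsoup4:", installed_version("beautifulsoup4"))
print("soupsieve:", installed_version("soupsieve"))
```

If either line prints None, the package is not visible to the interpreter that ran the check, which usually means pip installed it into a different Python environment.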
-- beautifulSoup.py (print the src of every img tag on a page)
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

_url = input("Enter url : ")
try:
    _response = urllib.request.urlopen(_url)
    _html = _response.read()
    _soup = BeautifulSoup(_html, "html.parser")
    # Retrieve all of the img tags
    _tags = _soup("img")  # equivalent to _soup.find_all("img")
    for _tag in _tags:
        # Print the src attribute (None if the tag has no src)
        print(_tag.get("src", None))
except Exception as e:
    print("Exception :", e)
#quit()
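The two BeautifulSoup behaviours the script relies on can be tried offline, without fetching a page. This is a small sketch: the markup in SAMPLE_HTML is made up for illustration, and the second img tag deliberately has no src attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second <img> has no src on purpose.
SAMPLE_HTML = """
<html><body>
  <img src="/static/a.png" alt="first">
  <img alt="no source">
  <a href="/next"><img src="https://example.com/b.jpg"></a>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Calling the soup object directly is shorthand for find_all().
assert soup("img") == soup.find_all("img")

# Tag.get() returns None (or the supplied default) when the attribute is missing,
# whereas tag["src"] would raise KeyError for the second tag.
srcs = [tag.get("src") for tag in soup("img")]
print(srcs)  # ['/static/a.png', None, 'https://example.com/b.jpg']
```

Using get() instead of indexing is why the script can loop over every img tag without failing on tags that lack a src.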
>>> exec(open("beautifulSoup.py", encoding="utf-8").read())
Enter url : http://naver.com
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd14463367.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd172615885.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd173138949.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd154219877.png
https://s.pstatic.net/static/newsstand/up/2018/0807/nsd1484475.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd162528724.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd172855569.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd145214517.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd14449981.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd145718601.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd145951763.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd151840663.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd1449112.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd144732945.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd172712628.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd17150763.png
https://s.pstatic.net/static/newsstand/up/2018/0201/nsd19842442.png
https://s.pstatic.net/static/newsstand/up/2017/0424/nsd144110729.png
https://s.pstatic.net/static/www/mobile/edit/2018/0226/mobile_10281388616.png
https://s.pstatic.net/static/www/mobile/edit/2019/0829/cropImg_166x108_9010204439122945.jpeg
https://s.pstatic.net/static/www/mobile/edit/2019/0829/cropImg_166x108_9030267937101597.jpeg
https://s.pstatic.net/static/www/mobile/edit/2019/0829/cropImg_166x108_9030134539426539.jpeg
https://s.pstatic.net/static/www/mobile/edit/2019/0829/cropImg_166x108_9024703022726039.jpeg
https://s.pstatic.net/imgnews/image/380/2019/08/29/sptPostArticleImage-23858.jpg
https://s.pstatic.net/static/www/mobile/edit/2019/0829/cropImg_166x108_9024084681267805.jpeg
https://s.pstatic.net/static/www/mobile/edit/2019/0829/cropImg_166x108_9016497794767293.png
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg
https://s.pstatic.net/static/www/mobile/edit/2019/0725/mobile_150854635863.png
https://s.pstatic.net/static/www/mobile/edit/2019/0806/mobile_10221528497.png
>>>
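The output above repeats the same dummy_1X1.jpg URL several times, and on other pages src values may be relative paths. One possible way to post-process the list is sketched below with the standard-library urljoin. The function name unique_absolute_urls and the relative path in the sample are assumptions for illustration only.

```python
from urllib.parse import urljoin


def unique_absolute_urls(base_url, srcs):
    """Resolve each src against base_url, skipping empty values and repeats.

    The first-seen order of the URLs is preserved.
    """
    seen = set()
    result = []
    for src in srcs:
        if not src:  # skip None and empty strings
            continue
        absolute = urljoin(base_url, src)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample = [
    "https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg",
    "https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg",
    None,
    "/img/logo.png",  # hypothetical relative path, not taken from the output above
]
print(unique_absolute_urls("http://naver.com", sample))
# ['https://s.pstatic.net/static/www/m/guide/dummy_1X1.jpg', 'http://naver.com/img/logo.png']
```

Already absolute URLs pass through urljoin unchanged, so the same function works whether a page uses absolute or relative image paths.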
urllib BeautifulSoup4 scraping | 2019.08.30 01:00:08 | 2019.09.15 00:57:16 | 592 | Aiden