Installing and Using BeautifulSoup4

 Levy_X 2017-10-05
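    I. Installing BeautifulSoup4
    Method 1: open a command prompt (cmd) and run: pip install beautifulsoup4
    (with the older setuptools installer: easy_install beautifulsoup4). The package named plain "BeautifulSoup" is the legacy 3.x release, not BeautifulSoup4.
    Method 2: download the source from http://www./software/BeautifulSoup/bs4/download/, open a command prompt, change into the downloaded directory, and run: python setup.py install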
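
After either method, a quick way to confirm that the install worked is to import the package and read its version. This is a minimal sketch; installed_bs4_version is a hypothetical helper name used only for illustration, not part of BeautifulSoup.

```python
import importlib


def installed_bs4_version():
    """Return the installed bs4 version string, or None if bs4 is missing.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    try:
        bs4 = importlib.import_module("bs4")
    except ImportError:
        return None
    return getattr(bs4, "__version__", "unknown")


version = installed_bs4_version()
if version is None:
    print("bs4 is not installed; try: pip install beautifulsoup4")
else:
    print("bs4 version:", version)
```

If the import fails, the package was most likely installed for a different Python interpreter than the one running the script.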

II. Using BeautifulSoup4
  1. Import
     from bs4 import BeautifulSoup
     Note: if the installed BeautifulSoup is version 3.x, the import is instead: from BeautifulSoup import BeautifulSoup
  2. Example
     HTML document:
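     html_doc = """
     <html><head><title>The Dormouse's story</title></head>
     <body>
     <p>The Dormouse's story</p>
     <p>Once upon a time there were three little sisters; and their names were
     <a href="http:///elsie" id="link1">Elsie</a>,
     <a href="http:///lacie" id="link2">Lacie</a> and
     <a href="http:///tillie" id="link3">Tillie</a>;
     and they lived at the bottom of a well.</p>
     <p>...</p>
     </body></html>
     """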

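  Code:
  from bs4 import BeautifulSoup
  soup = BeautifulSoup(html_doc, 'html.parser')  # name the parser explicitly; bs4 warns when it is omitted

  The parsed soup object can now be queried in several ways.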

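   soup.X (X is any tag name; returns the first matching tag in full, including its attributes and contents)

  For example:
    soup.title
    # <title>The Dormouse's story</title>

    soup.p
    # <p>The Dormouse's story</p>

    soup.a  (note: returns only the first match)
    # <a href="http:///elsie" id="link1">Elsie</a>

    soup.find_all('a')  (find_all returns every match)
    # [<a href="http:///elsie" id="link1">Elsie</a>,
    #  <a href="http:///lacie" id="link2">Lacie</a>,
    #  <a href="http:///tillie" id="link3">Tillie</a>]

    find can also search by attribute:
    soup.find(id="link3")
    # <a href="http:///tillie" id="link3">Tillie</a>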
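
Attribute lookups like the one above extend to other attributes as well. The following is a minimal sketch on a small stand-in document (the markup and the class names "sister" and "friend" are made up for illustration); it also shows that find returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Stand-in markup for illustration only; not the html_doc from this post.
html = """
<p>
  <a href="/elsie" class="sister" id="link1">Elsie</a>
  <a href="/lacie" class="sister" id="link2">Lacie</a>
  <a href="/tillie" class="friend" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# "class" is a Python keyword, so bs4 spells the keyword argument class_.
sisters = soup.find_all("a", class_="sister")
sister_ids = [a["id"] for a in sisters]

# An attrs dict works for any attribute name.
third = soup.find("a", attrs={"id": "link3"})

# find() returns None when nothing matches, so check before using the result.
missing = soup.find("a", id="link9")

print(sister_ids)
print(third.get_text())
print(missing)
```

Checking for None before touching the result avoids an AttributeError on pages where the expected tag is absent.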

    To read an attribute from each matching tag, combine find_all with get:
    for link in soup.find_all('a'):
      print(link.get('href'))
    # http:///elsie
    # http:///lacie
    # http:///tillie
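
Real pages often mix relative links, absolute links, and anchors that have no href at all. As a hedged sketch of one way to handle that, the example below filters with href=True and resolves each link against a base URL; base_url and the markup are assumptions made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "http://example.com/stories/"  # hypothetical page address
html = """
<a href="elsie">Elsie</a>
<a href="/lacie">Lacie</a>
<a name="no-href">anchor without a link</a>
<a href="http://example.org/tillie">Tillie</a>
"""
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute,
# so a["href"] cannot raise KeyError inside the comprehension.
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

for link in links:
    print(link)
```

Without the href=True filter, link.get('href') would yield None for the anchor that has no href.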

    To extract all of the text in the HTML document, use get_text():
    print(soup.get_text())
    # The Dormouse's story
    # The Dormouse's story
    # Once upon a time there were three little sisters; and their names were
    # Elsie,
    # Lacie and
    # Tillie;
    # and they lived at the bottom of a well.
    # ...
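
The raw get_text() output keeps whatever whitespace the markup contained. A small sketch of two ways to tidy it, using the separator and strip arguments of get_text and the stripped_strings generator; the markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup with untidy whitespace.
html = "<p>  Once upon a time  <b>three</b>\n  little sisters </p><p> ... </p>"
soup = BeautifulSoup(html, "html.parser")

# strip=True trims each text fragment and drops empty ones;
# separator is placed between the fragments that remain.
compact = soup.get_text(separator=" ", strip=True)

# stripped_strings yields the same trimmed fragments one at a time.
pieces = list(soup.stripped_strings)

print(compact)
print(pieces)
```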

    To parse an HTML file from disk instead, use:
    soup = BeautifulSoup(open("index.html"), 'html.parser')
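
Passing open(...) directly leaves the file handle to be closed whenever Python gets around to it. A minimal sketch of the same idea with a with block and an explicit encoding; the temporary file and its contents exist only to keep the example self-contained.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Write a throwaway index.html so the example does not depend on local files.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "index.html")
with open(path, "w", encoding="utf-8") as f:
    f.write("<html><head><title>The Dormouse's story</title></head></html>")

# The with block closes the file handle as soon as parsing is done.
with open(path, encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

title_text = soup.title.string
print(title_text)
```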
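
    Objects in BeautifulSoup
    tag (corresponds to a tag in the HTML)
    tag.attrs (returns all of the tag's attributes as a dictionary)
    A tag's attributes can be added, removed, and modified directly, just like dictionary entries: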

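    # tag is assumed to be a <blockquote> element, for example:
    # tag = BeautifulSoup('<blockquote>Extremely bold</blockquote>', 'html.parser').blockquote
    tag['class'] = 'verybold'
    tag['id'] = 1
    tag
    # <blockquote class="verybold" id="1">Extremely bold</blockquote>

    del tag['class']
    del tag['id']
    tag
    # <blockquote>Extremely bold</blockquote>

    tag['class']
    # KeyError: 'class'

    print(tag.get('class'))
    # None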
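
One detail the dictionary analogy hides is that bs4 treats class as a multi-valued attribute and returns it as a list when it comes from parsed markup. A short sketch, using made-up markup, that also shows get with a default and has_attr as ways to avoid the KeyError above:

```python
from bs4 import BeautifulSoup

# Illustrative markup; the attribute values are made up.
soup = BeautifulSoup(
    '<b class="boldest extra" id="x1">Extremely bold</b>', "html.parser"
)
tag = soup.b

# class is multi-valued in HTML, so bs4 returns a list; id is a plain string.
classes = tag["class"]
tag_id = tag["id"]

# get() with a default avoids KeyError for a missing attribute.
lang = tag.get("lang", "unknown")

# has_attr() tests for presence without reading the value.
had_id = tag.has_attr("id")
del tag["id"]
has_id_now = tag.has_attr("id")

attrs_after = dict(tag.attrs)
print(classes, tag_id, lang, had_id, has_id_now, attrs_after)
```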


    X.contents (X is a tag; returns a list of the tag's direct children)

    For example:
    head_tag = soup.head
    head_tag
    # <head><title>The Dormouse's story</title></head>

    head_tag.contents
    # [<title>The Dormouse's story</title>]

    title_tag = head_tag.contents[0]
    title_tag
    # <title>The Dormouse's story</title>

    title_tag.contents
    # ["The Dormouse's story"]
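
The contents list can mix child tags with bare text nodes. A minimal sketch, on made-up markup, that tells the two apart with the Tag and NavigableString classes and shows how the .string shortcut behaves:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

soup = BeautifulSoup("<p>Hello <b>bold</b> world</p>", "html.parser")
p = soup.p

# Classify each direct child of <p> as a tag or a text node.
kinds = []
for child in p.contents:
    if isinstance(child, Tag):
        kinds.append(("tag", child.name))
    elif isinstance(child, NavigableString):
        kinds.append(("text", str(child)))

# .string works when a tag has exactly one child ...
b_string = p.b.string
# ... and is None when there are several, as with <p> here.
p_string = p.string

print(kinds)
print(b_string, p_string)
```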


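    Fixing garbled characters when parsing a web page:
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    page = urlopen('http://www.')
    soup = BeautifulSoup(page, 'html.parser', from_encoding="gb18030")

    print(soup.original_encoding)
    print(soup.prettify())
    (On Python 2 with BeautifulSoup 3.x the equivalents are urllib2, fromEncoding, and originalEncoding.)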
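
The encoding override can be tried without a network connection by parsing bytes that are known to be GB18030. This is a sketch under that assumption: the sample markup is made up, and from_encoding is the bs4 spelling of the keyword argument.

```python
from bs4 import BeautifulSoup

# Sample page encoded as GB18030 bytes, standing in for a downloaded response.
page_text = "<html><head><title>中文标题</title></head><body><p>你好</p></body></html>"
raw_bytes = page_text.encode("gb18030")

# from_encoding tells bs4 which codec to try first when decoding bytes.
soup = BeautifulSoup(raw_bytes, "html.parser", from_encoding="gb18030")

title = soup.title.string
paragraph = soup.p.get_text()
detected = soup.original_encoding  # the codec bs4 actually used

print(title, paragraph, detected)
```

The argument only applies to bytes input; if the document has already been decoded to str, there is nothing left for bs4 to detect.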
 

    
