How to find the comment tag <!--...--> with BeautifulSoup?
Problem description
I tried soup.find('!--') but it doesn't seem to work. Thanks in advance.
Thanks for the tip on how to find all comments. I have a follow-up question: how do I search for one specific comment?
For example, I have the following comment tag:
<!-- <span class="titlefont"><i>Wednesday 110518</i>(05:00PM)<br/></span>-->
I really just want this part: <i>Wednesday 110518</i>. The "110518" is the date in YYMMDD format, which I plan to use as my search target. However, I don't know how to find something within a specific comment tag.
Recommended answer
Pyparsing allows you to search for HTML comments using the builtin htmlComment expression, and to attach parse-time callbacks that validate and extract the data fields within each comment:
from pyparsing import makeHTMLTags, oneOf, withAttribute, Word, nums, Group, htmlComment
import calendar
# have pyparsing define tag start/end expressions for the
# tags we want to look for inside the comments
span,spanEnd = makeHTMLTags("span")
i,iEnd = makeHTMLTags("i")
# only want spans with class=titlefont
span.addParseAction(withAttribute(**{'class':'titlefont'}))
# define what specifically we are looking for in this comment
weekdayname = oneOf(list(calendar.day_name))
integer = Word(nums)
dateExpr = Group(weekdayname("day") + integer("daynum"))
commentBody = '<!--' + span + i + dateExpr("date") + iEnd
# define a parse action to attach to the standard htmlComment expression,
# to extract only what we want (or raise a ParseException in case
# this is not one of the comments we're looking for)
def grabCommentContents(tokens):
    return commentBody.parseString(tokens[0])
htmlComment.addParseAction(grabCommentContents)
# let's try it
htmlsource = """
want to match this one
<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->
don't want the next one, wrong span class
<!-- <span class="bodyfont"> <i>Wednesday 110519</i>(05:00PM)<br /></span> -->
not even a span tag!
<!-- some other text with a date in italics <i>Wednesday 110520</i>(05:00PM)<br /></span> -->
another matching comment, on a different day
<!-- <span class="titlefont"> <i>Thursday 110521</i>(05:00PM)<br /></span> -->
"""
for comment in htmlComment.searchString(htmlsource):
    parsedDate = comment.date
    # date info can be accessed like elements in a list
    print(parsedDate[0], parsedDate[1])
    # because we named the expressions within the dateExpr Group
    # we can also get at them by name (this is much more robust, and
    # easier to maintain/update later)
    print(parsedDate.day)
    print(parsedDate.daynum)
    print()
Prints:
Wednesday 110518
Wednesday
110518
Thursday 110521
Thursday
110521
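Since the question asks about BeautifulSoup specifically, here is a minimal sketch of the same extraction using BeautifulSoup alone. It is not part of the pyparsing answer above. It assumes the bs4 package (Beautiful Soup 4) with the standard "html.parser" backend, and the names find_dates_in_comments and html_source are made up for this illustration. The idea is that a comment is a special kind of string rather than a tag, which is why soup.find('!--') finds nothing; once you have the comment text, you can parse it a second time to reach the tags inside it.

```python
from bs4 import BeautifulSoup, Comment

# Sample input modeled on the test data in the answer above.
html_source = """
<p>want to match this one</p>
<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->
<p>don't want the next one, wrong span class</p>
<!-- <span class="bodyfont"> <i>Wednesday 110519</i>(05:00PM)<br /></span> -->
<p>not even a span tag!</p>
<!-- some other text with a date in italics <i>Wednesday 110520</i>(05:00PM)<br /> -->
<p>another matching comment, on a different day</p>
<!-- <span class="titlefont"> <i>Thursday 110521</i>(05:00PM)<br /></span> -->
"""


def find_dates_in_comments(html):
    """Return (day, daynum) pairs found in comments holding a titlefont span.

    Hypothetical helper written for this example only.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Comments are strings, not tags, so filter strings by their type.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # The comment body is plain text to the outer parse; parse it again
        # to get at the tags written inside it.
        inner = BeautifulSoup(comment, "html.parser")
        for span in inner.find_all("span", class_="titlefont"):
            italic = span.find("i")
            if italic is None:
                continue
            parts = italic.get_text().split()
            # Expect exactly "<weekday> <YYMMDD>"; skip anything else.
            if len(parts) == 2 and parts[1].isdigit():
                results.append((parts[0], parts[1]))
    return results


for day, daynum in find_dates_in_comments(html_source):
    print(day, daynum)

# Narrow the result to one specific date, as the question asks.
wanted = [pair for pair in find_dates_in_comments(html_source) if pair[1] == "110518"]
```

The second parse is the key design choice here: it costs a little extra work per comment, but it lets you reuse the normal find/find_all calls on the commented-out markup instead of writing regular expressions against it.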