Improving the regex that extracts the real URL #2

Open
@shuguang-dong

Hello, and first of all thank you for sharing your notes; they helped me a lot.
I noticed that the code you use at the end to extract the real URL is:
url_text = re.findall("\'(\S+?)\';", second_url, re.S)
best_url = ''.join(url_text)
The extracted result looks like this:
&from=innerhttp://mp.weixin.qq.com/s?src=11&timestamp=1578541951&ver=2085&signature=1c3e2o2NzgWJffH0bchXLv21TsvnPpio-R65LSusRchiIxZ3kMOnANDzGYIoTJRhPzNluorh-Dmgd*B6pbHxHSOjqjSKdwjHI4cH4Tiio-SBtTrDpU9BK7cGAiS1qo1b&new=1
This contains some noise (the leading &from=inner fragment), so I would suggest the following instead:
url_text = re.findall(r"\+= '(.*?)';", second_url, re.S)
best_url = ''.join(url_text)
The result is:
http://mp.weixin.qq.com/s?src=11&timestamp=1578541951&ver=2085&signature=1c3e2o2NzgWJffH0bchXLv21TsvnPpio-R65LSusRchiIxZ3kMOnANDzGYIoTJRhPzNluorh-Dmgd*B6pbHxHSOjqjSKdwjHI4cH4Tiio-SBtTrDpU9BK7cGAiS1qo1b&new=1
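
For anyone who wants to reproduce the difference, here is a minimal sketch comparing the two patterns. The contents of second_url are not shown in this issue, so SAMPLE_PAGE below is a made-up stand-in: I am assuming the page builds the URL piece by piece with `url += '...';` statements and also contains an unrelated quoted literal ending in `';`, which would explain the `&from=inner` prefix. The function names and the example.invalid host are hypothetical too.

```python
import re

# Hypothetical stand-in for the script text held in second_url.
# Assumption: the real page assembles the URL with "url += '...';" lines
# and also has another quoted literal that ends with "';".
SAMPLE_PAGE = """
var img = new Image();
img.src = 'https://example.invalid/approve?uuid=' + uuid + '&from=inner';
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.com/s?src=11';
url += '&timestamp=1578541951';
url += '&new=1';
window.location.replace(url)
"""


def extract_original(page: str) -> str:
    # Original pattern: any run of non-whitespace between a quote and "';".
    # It also picks up quoted literals that are not part of the URL.
    parts = re.findall(r"'(\S+?)';", page, re.S)
    return "".join(parts)


def extract_anchored(page: str) -> str:
    # Suggested pattern: only literals that directly follow "+= '",
    # i.e. the pieces appended to the url variable.
    parts = re.findall(r"\+= '(.*?)';", page, re.S)
    return "".join(parts)


if __name__ == "__main__":
    print(extract_original(SAMPLE_PAGE))
    print(extract_anchored(SAMPLE_PAGE))
```

On this sample the original pattern returns the URL with `&from=inner` glued to the front, while the anchored pattern returns only the concatenated `url +=` pieces. If the real page appends to some other variable with `+=` as well, the anchored pattern would still need tightening, for example by matching `url \+= '` explicitly.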
