
Improvement to the regex for extracting the real URL #2

Open
shuguang-dong opened this issue Jan 9, 2020 · 0 comments

Comments

@shuguang-dong

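Hello, and thank you for sharing your notes; they helped me a lot.
I noticed that the code you use at the end to extract the real URL is as follows:
url_text = re.findall("\'(\S+?)\';", second_url, re.S)
best_url = ''.join(url_text)
It produces the following result: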
&from=innerhttp://mp.weixin.qq.com/s?src=11&timestamp=1578541951&ver=2085&signature=1c3e2o2NzgWJffH0bchXLv21TsvnPpio-R65LSusRchiIxZ3kMOnANDzGYIoTJRhPzNluorh-Dmgd*B6pbHxHSOjqjSKdwjHI4cH4Tiio-SBtTrDpU9BK7cGAiS1qo1b&new=1
This output contains some noise (the leading "&from=inner" fragment). I suggest the following instead:
url_text = re.findall(r"\+= '(.*?)';", second_url, re.S)
best_url = ''.join(url_text)
The result is:
http://mp.weixin.qq.com/s?src=11&timestamp=1578541951&ver=2085&signature=1c3e2o2NzgWJffH0bchXLv21TsvnPpio-R65LSusRchiIxZ3kMOnANDzGYIoTJRhPzNluorh-Dmgd*B6pbHxHSOjqjSKdwjHI4cH4Tiio-SBtTrDpU9BK7cGAiS1qo1b&new=1
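For illustration, here is a minimal sketch comparing the two patterns side by side. The page text is a hypothetical stand-in for what second_url holds: I am assuming the page builds the URL through repeated url += '...'; statements and also contains an unrelated quoted string ending in '&from=inner';. The names SAMPLE_PAGE, extract_original and extract_improved, the example.com address, and the shortened URL are made up for this example.

```python
import re

# Hypothetical stand-in for the redirect page text held in `second_url`.
# The layout is an assumption for illustration, not copied from the real page.
SAMPLE_PAGE = """
(new Image()).src = 'https://example.com/approve?uuid=' + uuid + '&from=inner';
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.com/s?src=11&timestamp=1578541951';
url += '&new=1';
"""


def extract_original(page: str) -> str:
    # Original pattern: any quoted run of non-whitespace characters that is
    # immediately followed by "';", wherever it appears in the page.
    return "".join(re.findall(r"'(\S+?)';", page, re.S))


def extract_improved(page: str) -> str:
    # Suggested pattern: only string literals appended with "+= '...';",
    # so unrelated quoted strings are skipped.
    return "".join(re.findall(r"\+= '(.*?)';", page, re.S))


print(extract_original(SAMPLE_PAGE))
print(extract_improved(SAMPLE_PAGE))
```

Anchoring the match on "+= '" ties the pattern to the statements that assemble the URL, which is why the stray fragment no longer leaks in. If the page ever builds the URL differently, the pattern would need to be adjusted.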
