Move ImproveHTML code #24

Open
gustavorps opened this issue Sep 22, 2017 · 0 comments

A context has been implemented in items_refs for use by the ImproveHTML processor class.

How to use it in items_refs:

"contexts": {
    "improve_html": [
        "ze.spiders.g1.G1Spider.improve_html"
    ]
}
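One way the dotted path in "contexts" could be turned into a callable is sketched below. This is only an assumption about how a processor might resolve it: `load_callable` is a hypothetical helper, not part of ze-the-scraper, and the example resolves a standard-library path so that it runs without the project installed.

```python
import importlib


def load_callable(dotted_path):
    """Resolve 'package.module.Class.attr' to the object it names.

    Hypothetical helper: tries the longest importable module prefix,
    then walks the remaining names with getattr.
    """
    parts = dotted_path.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # this prefix is not a module; try a shorter one
        for name in parts[split:]:
            obj = getattr(obj, name)
        return obj
    raise ImportError("cannot resolve %r" % dotted_path)


# Stand-in for "ze.spiders.g1.G1Spider.improve_html".
decode = load_callable("json.decoder.JSONDecoder.decode")
```

With the project on the import path, the same call with "ze.spiders.g1.G1Spider.improve_html" would presumably return the static method shown in this issue.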

How to implement the function:

@staticmethod
def improve_html(html, spider_name=None):
    # html is used as a BeautifulSoup document (select, new_tag, replace_with).
    # Each step runs in its own try/except so one failure does not stop the rest.
    exceptions = []
    exceptions_append = exceptions.append
    try:
        # Photo blocks: replace with <figure><img><figcaption>.
        selector = '[data-block-type="backstage-photo"]'
        for el in html.select(selector):
            fg = html.new_tag('figure')
            img_src = el.select_one('img.content-media__image').get('data-src')
            fg.append(html.new_tag('img', src=img_src))
            fc = html.new_tag('figcaption')
            fc.string = el.select_one('.content-media__description__caption').get_text()
            fg.append(fc)
            el.replace_with(fg)
    except Exception as e:
        exceptions_append(e)
    try:
        # Video blocks: replace with a thumbnail figure linking to the video.
        selector = '[data-block-type="backstage-video"]'
        for el in html.select(selector):
            video_id = el.select('.content-video__placeholder')[0]['data-video-id']
            fg = html.new_tag('figure')
            fg.append(html.new_tag('img', src='https://s02.video.glbimg.com/x720/%s.jpg' % video_id))
            fc = html.new_tag('figcaption')
            fc.string = el.select('[itemprop="description"]')[0].get_text()  # previously itemprop='caption'
            fg.append(fc)
            a = html.new_tag('a', href='https://globoplay.globo.com/v/%s/' % video_id)
            a.append(fg)
            el.replace_with(a)
    except Exception as e:
        exceptions_append(e)
    try:
        # Replace every link with its text.
        # Note: this also flattens the <a> wrappers created for videos above.
        for el in html.select('a'):
            el.replace_with(el.get_text())
    except Exception as e:
        exceptions_append(e)
    return html, exceptions
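The function above wraps each transformation in its own try/except, so a failure in one step does not stop the others, and the caller gets the errors back alongside the HTML. A minimal sketch of that pattern follows, using a plain dict in place of the parsed HTML. The names `run_steps`, `add_title`, `broken` and `add_body` are hypothetical and do not exist in the project.

```python
def run_steps(document, steps):
    """Apply each step to document, collecting errors instead of raising."""
    exceptions = []
    for step in steps:
        try:
            step(document)
        except Exception as exc:
            exceptions.append(exc)  # remember the failure, keep going
    return document, exceptions


def add_title(doc):
    doc["title"] = "ok"


def broken(doc):
    # Simulates a step whose selector matched nothing.
    raise KeyError("missing selector")


def add_body(doc):
    doc["body"] = "text"


doc, errors = run_steps({}, [add_title, broken, add_body])
```

Here `doc` ends up with both "title" and "body" set, and `errors` holds the single KeyError raised by the middle step.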

ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Sep 25, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Sep 25, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Sep 25, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 2, 2017
agenciabrasil, g1, goval, govce, mundoeducacao, oglobo, r7, sejabixo, senado, terra
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 3, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 6, 2017
There were two spiders for agenciabrasil.ebc.com.br:
ebc and agenciabrasil. Removed agenciabrasil and kept ebc.
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 6, 2017
There were two spiders for agenciabrasil.ebc.com.br:
ebc and agenciabrasil. Removed agenciabrasil and kept ebc.
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 9, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 9, 2017
There were two spiders for agenciabrasil.ebc.com.br:
ebc and agenciabrasil. Removed agenciabrasil and kept ebc.
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 9, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 18, 2017
gustavorps added a commit that referenced this issue Oct 19, 2017
improved spiders and processors #24
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 23, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Oct 30, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Nov 1, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Nov 6, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Nov 8, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Nov 8, 2017
ligiaiv added a commit to ligiaiv/ze-the-scraper that referenced this issue Dec 18, 2017