Ironhack-Data-Madrid-Abril-2021/lab_web_scraping

Lab | Web Scraping

Introduction

As you learned in the lesson, web "scraping" (also called "web harvesting", "web data extraction", or even "web data mining") can be defined as "the construction of an agent to download, parse, and organize data from the web in an automated manner". In other words, instead of a human end user clicking through a web browser and copy-pasting interesting parts into, say, a spreadsheet, web scraping hands this task to a computer program, which can do it much faster and more accurately than a human can.
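The three steps in that definition (download, parse, organize) can be illustrated with a small sketch. It is only an example built on assumptions, not the method expected in main.ipynb. It uses only the Python standard library (urllib.request and html.parser), while the notebook may expect third-party libraries such as requests and BeautifulSoup. The names download, Link, LinkParser and scrape_links are hypothetical helpers made up for this illustration, and the example runs on an inline HTML string so that it works without network access.

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def download(url: str, timeout: float = 10.0) -> str:
    """Download step (hypothetical helper): fetch a page as text, assuming UTF-8."""
    request = Request(url, headers={"User-Agent": "lab-web-scraping-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


@dataclass
class Link:
    """One organized record: the visible text of a link and its target."""
    text: str
    href: str


class LinkParser(HTMLParser):
    """Parse step: collect the text and href of every <a> tag in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[Link] = []
        self._href: str | None = None  # href of the <a> currently open, if any
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text
            text = " ".join("".join(self._text).split())
            self.links.append(Link(text=text, href=self._href))
            self._href = None


def scrape_links(html: str) -> list[Link]:
    """Organize step: turn raw HTML into a list of structured Link records."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Offline example; against a live site you would call download(url) first,
    # e.g. html = download("https://example.com")  (requires network access)
    sample = '<p>See <a href="https://example.com/a">Page  A</a> and <a href="/b">B</a>.</p>'
    for link in scrape_links(sample):
        print(link.text, "->", link.href)
```

Returning dataclass records rather than raw strings keeps the "organize" step explicit. A list of Link objects can be written to CSV or loaded into a DataFrame with little extra work.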

Data scientists often find web scraping a powerful tool to have in their arsenal. Many data science projects begin by obtaining an appropriate data set, so why not use the information the web provides?

In this lab, you will work through a series of exercises that test your web scraping skills. You will work on your own, but remember that the teaching staff is here to help whenever you run into problems.

Getting Started

Open the main.ipynb file in the your-code directory. It contains a series of questions to solve. Each exercise is independent of the others, so if you get stuck on one, you can skip to the next. Read each instruction carefully and write your answer beneath it.

Deliverables

  • main.ipynb with your responses to each of the exercises.

Submission

Upon completion, add your deliverables to git, commit your changes, and push your branch to the remote.

Resources

Web Scraping Tutorial Dataquest

Web Scraping Tutorial Kdnuggets

HTML Scraping

The Anatomy of a Search Engine

Additional Challenges for the Nerds

If you are well ahead of your classmates and want to take on some tougher web scraping challenges, you will find five bonus questions in main.ipynb.