
Build scraper to continuously update the unstructured data folder with the latest Lucknow data #38

Open
monk1337 opened this issue Mar 10, 2024 · 5 comments

Comments

@monk1337
Member

Right now the unstructured data folder contains limited data. We need scrapers to pull data from different Lucknow websites, so that if we want to add more data in the future, or update the Lucknow database, we can simply run those scraper agents.
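A minimal sketch of what one such scraper agent could look like, using only the Python standard library. The "keep every `<p>` tag" rule is a placeholder assumption; each Lucknow site will need its own extraction rules.

```python
# Sketch of a reusable extraction step for a scraper agent.
# Assumption: we keep the text of every <p> tag; real sites
# will need site-specific selectors.
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def extract_paragraphs(html: str) -> list[str]:
    """Return the non-empty paragraph texts found in an HTML page."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return [p.strip() for p in parser.paragraphs if p.strip()]
```

In use, the HTML would come from something like `urllib.request.urlopen(url).read().decode()` for each site on the target list, and the extracted text would be appended to the unstructured data folder.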

@thePratyakshSoni1
Contributor

I can do it, but I will need a list of websites from which to fetch the data. For example, if there's a blogging site, then whenever we run our scraper, new blogs will be added to the unstructured data.

@AayushSharma-1
Contributor

How about we build this scraper in parts, like someone takes the tourism part, someone takes the hospitals part, and later on, we can combine them to make a fully automated raw data scraper?

@thePratyakshSoni1
Contributor

thePratyakshSoni1 commented Mar 11, 2024

> How about we build this scraper in parts, like someone takes the tourism part, someone takes the hospitals part, and later on, we can combine them to make a fully automated raw data scraper?

That would be nice, but we will still need a list of sites (ones that regularly update data on a specific topic) to target for the latest data.

Or we can have another folder called scrapped inside the Unstructured_data folder, and our program can scrape any data related to Lucknow into it (in different files named by date or something else).
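A quick sketch of that scrapped-folder idea, with one date-stamped file per topic per run. The folder layout, topic names, and file format here are assumptions, just one way to organize it:

```python
# Sketch: store each scraper run's output in a date-stamped file
# per topic, e.g. Unstructured_data/scrapped/tourism_2024-03-11.txt.
# Folder and naming scheme are assumptions, not a fixed convention.
import datetime
import os


def dated_path(topic: str, base: str = "Unstructured_data/scrapped") -> str:
    """Build the file path for today's scrape of a given topic."""
    today = datetime.date.today().isoformat()  # YYYY-MM-DD
    return os.path.join(base, f"{topic}_{today}.txt")


def save_scraped(topic: str, text: str) -> str:
    """Append scraped text to today's file for the topic; return the path."""
    path = dated_path(topic)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
    return path
```

Naming files by ISO date keeps them sortable and makes it obvious which run produced which data, so repeated runs never overwrite earlier scrapes.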

@monk1337
Member Author

monk1337 commented Mar 11, 2024

@thePratyakshSoni1 @AayushSharma-1 That's a great idea: each person takes care of one topic and we build the scraper step by step.
@AayushSharma-1 you can go through the old PRs of this repo. Contributors adding unstructured data also mention the source websites/links in the PR description, and we can use those websites to scrape.

@AayushSharma-1
Contributor

Yes, Sure!
