This project saves Crunchbase data without requiring access to the Crunchbase API. All you need is a Crunchbase Free Trial.
It gathers data about companies, such as their website, their Twitter account, and their founders' Twitter accounts. It can easily be modified to gather other types of data.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
pip install -r requirements.txt
The project is composed of two scripts: clipboard_fetcher.py and crunchbase_scraper.py.
To get a list of companies, saved in list_of_company_names_raw, run python clipboard_fetcher.py. Then log in to Crunchbase, go to an advanced search, and press cmd+a followed by cmd+c. The program automatically detects the copied content and writes the company names to the list CSV.
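The clipboard detection described above can be pictured as a simple polling loop. The following is a minimal sketch, not the actual contents of clipboard_fetcher.py: the function name `watch_clipboard` and its parameters are hypothetical, and the clipboard reader is passed in as a plain callable so the example runs without any third-party package.

```python
import time
from typing import Callable, Optional


def watch_clipboard(
    read_clipboard: Callable[[], str],
    on_change: Callable[[str], None],
    poll_interval: float = 0.5,
    max_polls: Optional[int] = None,
) -> None:
    """Call on_change(text) every time the clipboard text changes.

    read_clipboard is any zero-argument function returning the current
    clipboard text (hypothetical hook; the real script may read it differently).
    """
    # Remember the starting content so text copied before launch is ignored.
    last_seen = read_clipboard()
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_interval)
        current = read_clipboard()
        # Only react to non-empty text that differs from the last value seen.
        if current and current != last_seen:
            last_seen = current
            on_change(current)
        polls += 1
```

In practice `read_clipboard` could be backed by a clipboard library, and `on_change` could extract the company names from the copied page text and append them to the list CSV. Both of those choices are assumptions for illustration.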
To scrape data using the company names, run python crunchbase_scraper.py. It writes the data to 3 files:
- found.csv - the companies that were found. Format: Company Name, Company Website, Company Twitter, CEO Twitter, CTO Twitter
- not_found.csv - the companies that were not found based on the company name. Format: Company Name
- error.csv - the companies that returned an error while scraping. Format: Company Name
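The way results are split across the three files can be sketched as follows. This is an illustrative outline only, assuming a hypothetical `scrape` callable that returns the four extra columns for a company, returns `None` when the company is not found, and raises an exception on a scraping error; the real crunchbase_scraper.py may be organized differently.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, Optional, Sequence


def write_results(
    company_names: Iterable[str],
    scrape: Callable[[str], Optional[Sequence[str]]],
    out_dir: Path,
) -> None:
    """Route each company into found.csv, not_found.csv, or error.csv."""
    with open(out_dir / "found.csv", "w", newline="") as found_f, \
         open(out_dir / "not_found.csv", "w", newline="") as not_found_f, \
         open(out_dir / "error.csv", "w", newline="") as error_f:
        found = csv.writer(found_f)
        not_found = csv.writer(not_found_f)
        error = csv.writer(error_f)
        for name in company_names:
            try:
                row = scrape(name)  # hypothetical scraper hook
            except Exception:
                # Any failure while scraping: record only the company name.
                error.writerow([name])
                continue
            if row is None:
                # No matching company: record only the company name.
                not_found.writerow([name])
            else:
                # Company Name, Website, Company Twitter, CEO Twitter, CTO Twitter
                found.writerow([name, *row])
```

Keeping the scraper behind a callable makes the file-routing logic easy to exercise with a stub before pointing it at the live site.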
- PyQt5 - For loading website javascript
- BeautifulSoup - Scraping library
- @stoicaandrei - Idea & Initial work
See also the list of contributors who participated in this project.