Home
The app is currently capable of extracting articles from:
- freeCodeCamp
- Substack
- GitHub
I may expand support to include additional sites.
git clone git@github.com:victoriacheng15/articles-extractor.git
cd articles-extractor
Follow Google's Python quickstart guide to enable the Google Sheets API.
Once the API is enabled, download the credentials from Google, rename the JSON file to credentials.json, and move it to the root directory. If you use a name other than credentials.json, update the file name in the get_creds function in utils/sheet.py.
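As a quick sanity check before running the app, the snippet below verifies that the credentials file sits where the instructions above expect it. This is only a sketch: `find_credentials` is a hypothetical helper written for illustration and is not the repository's `get_creds` function.

```python
from pathlib import Path


def find_credentials(root, filename="credentials.json"):
    """Return the path to the credentials file, or raise if it is missing.

    Hypothetical helper for illustration; not part of the repository.
    """
    path = Path(root) / filename
    if not path.is_file():
        raise FileNotFoundError(f"{filename} not found in {Path(root).resolve()}")
    return path


# Check the current directory (assumed here to be the project root)
try:
    print(f"Found credentials at {find_credentials('.')}")
except FileNotFoundError as err:
    print(err)
```

If you renamed the file, pass the new name as `filename`.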
Create a new worksheet and name it providers.
Example of the table:
name | element | url |
---|---|---|
freecodecamp | article | https://www.freecodecamp.org/news/ |
substack | pencraft pc-display-flex pc-flexDirection-column pc-gap-4 | the archive blog link |
substack | pencraft pc-display-flex pc-flexDirection-column pc-gap-4 | the archive blog link |
substack | pencraft pc-display-flex pc-flexDirection-column pc-gap-4 | the archive blog link |
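To illustrate how rows shaped like the table above might be consumed, here is a minimal sketch that converts raw worksheet rows (header row first, as a list of lists) into dictionaries. The function name `rows_to_providers` and the sample data are assumptions for illustration, not code taken from the repository.

```python
def rows_to_providers(rows):
    """Convert worksheet rows (first row is the header) into a list of dicts."""
    if not rows:
        return []
    header = [cell.strip().lower() for cell in rows[0]]
    providers = []
    for row in rows[1:]:
        # Skip rows that are completely empty
        if not any(cell.strip() for cell in row):
            continue
        # Pad short rows so every header column gets a value
        padded = list(row) + [""] * (len(header) - len(row))
        providers.append(dict(zip(header, (cell.strip() for cell in padded))))
    return providers


# Sample data mirroring the first row of the example table
sample_rows = [
    ["name", "element", "url"],
    ["freecodecamp", "article", "https://www.freecodecamp.org/news/"],
]
providers = rows_to_providers(sample_rows)
print(providers)
```

Each resulting dictionary has the keys name, element, and url, matching the column headers.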
Create a file named providers.py in the data folder. If you save the file under a different name, you need to update the path in main.py at line 5.
Single URL:
provider_name = {
    "class": "the element that contains article info",
    "url": "the URL",
}
For example:
freecodecamp = {"class": "post-card", "url": "https://www.freecodecamp.org/news/"}
Multiple URLs:
provider_name = {
    "class": "the element that contains article info",
    "urls": [
        "url1",
        "url2",
        # ...
    ],
}
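Since a provider can carry either a single URL or a list of URLs, code that reads these dictionaries has to handle both shapes. The sketch below shows one way to do that. `provider_urls` is a hypothetical helper, the key lookup is made case-insensitive as an assumption, and the example.com URLs are placeholders.

```python
def provider_urls(provider):
    """Return a provider's URLs as a list, for single- or multi-URL entries.

    Hypothetical helper: keys are compared case-insensitively (an assumption).
    """
    lowered = {key.lower(): value for key, value in provider.items()}
    if "urls" in lowered:
        return list(lowered["urls"])
    if "url" in lowered:
        return [lowered["url"]]
    raise KeyError("provider needs a 'url' or 'urls' key")


freecodecamp = {"class": "post-card", "url": "https://www.freecodecamp.org/news/"}
# Placeholder multi-URL provider for illustration only
example_blog = {
    "class": "post",
    "urls": ["https://example.com/a", "https://example.com/b"],
}

for provider in (freecodecamp, example_blog):
    for url in provider_urls(provider):
        print(provider["class"], url)
```

Normalizing to a list up front lets the rest of the code loop over URLs without checking which form was used.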
You can easily run the extractor using the provided Makefile. Execute the following command:
make start
You can put the app on the Raspberry Pi or your main machine.
- Create the bash script: touch articles_extractor.sh
- Run chmod +x articles_extractor.sh
- Go to the folder, and type pwd to get the folder path
- Copy the code below to articles_extractor.sh
#!/bin/bash
cd your_path/articles-extractor
docker compose up
# Save the container log; I keep mine in the Documents folder, so adjust the path as needed
docker logs extractor > log_articles.txt
docker compose down
Logging is optional. If you do not want the log file, use:
#!/bin/bash
cd your_path/articles-extractor
docker compose up
docker compose down
Use crontab guru to find the schedule you want.
For example, to run the app at 9 am every day, the cron expression is 0 9 * * *
- Type crontab -e
- If it prompts you to choose an editor, pick nano or the editor of your choice
- Scroll all the way down to the bottom
- Type 0 9 * * * your_path/articles_extractor.sh
- Save and exit: ctrl + o -> enter -> ctrl + x
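To make the schedule 0 9 * * * easier to read, the sketch below splits a standard five-field cron expression into named fields. `describe_cron` and the field names are chosen here for illustration. It only labels the fields and does not validate ranges or special syntax.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each of the five cron fields to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_cron("0 9 * * *")
for field, value in schedule.items():
    print(f"{field}: {value}")
```

Here minute is 0 and hour is 9, while the three asterisks mean every day of the month, every month, and every day of the week.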