Home
The app is currently capable of extracting articles from:
- freeCodeCamp
- Substack
- The GitHub Blog
I may expand support to include additional sites.
git clone git@github.com:victoriacheng15/articles-extractor.git
cd articles-extractor
- Google Sheets API:
  - Follow Google’s API quickstart guide to download credentials.json.
  - Place credentials.json in the project’s root directory.
- Configure .env:
  cp .env.example .env # Copy template
  Edit .env with:
  SHEET_ID="your_google_sheet_id_here" # Found in your Sheet’s URL
- Create a worksheet:
  - Name the sheet providers (exact spelling, lowercase).
- Build your provider list:
  Create a table with these columns:

  | name | element (CSS selector/class) | url |
  | --- | --- | --- |
  | freecodecamp | article | https://www.freecodecamp.org/news/ |
  | github | article | https://github.blog/category/engineering/ |
  | substack | pencraft pc-display-flex pc-flexDirection-column pc-gap-4 | https://[your-substack].substack.com/archive |

  Notes:
  - Replace [your-substack] with your actual Substack domain.
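To make the expected sheet layout concrete, here is a minimal Python sketch of how rows from the providers worksheet could be checked before use. It is not taken from the app's source: the `Provider` class, the `parse_providers` function, and the assumption that rows arrive as lists of strings with the header row first are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Column names as described in the table above.
REQUIRED_COLUMNS = ("name", "element", "url")


@dataclass(frozen=True)
class Provider:
    """One row of the providers worksheet (hypothetical helper type)."""
    name: str
    element: str
    url: str


def parse_providers(rows):
    """Turn worksheet rows (header row first) into Provider records."""
    if not rows:
        return []
    header = [cell.strip().lower() for cell in rows[0]]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    index = {col: header.index(col) for col in REQUIRED_COLUMNS}

    providers = []
    for row in rows[1:]:
        # Pad short rows so a blank trailing cell does not raise IndexError.
        padded = list(row) + [""] * (len(header) - len(row))
        name, element, url = (padded[index[c]].strip() for c in REQUIRED_COLUMNS)
        if not (name and element and url):
            continue  # skip blank or incomplete rows
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"{name}: url must start with http:// or https://")
        providers.append(Provider(name, element, url))
    return providers


if __name__ == "__main__":
    sample = [
        ["name", "element", "url"],
        ["freecodecamp", "article", "https://www.freecodecamp.org/news/"],
        ["github", "article", "https://github.blog/category/engineering/"],
    ]
    for provider in parse_providers(sample):
        print(provider)
```

A check like this catches a misspelled column header or a URL without a scheme before any extraction runs.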
Choose one of these methods to run the app:
Note: If you are not using GitHub Actions, disable the workflow by navigating to: Actions → Daily Extraction Schedule → click the three-dot menu (⋯) → Disable workflow.
# Create and activate virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install dependencies and run
pip install -r requirements.txt
python3 main.py
# Alternative using Makefile
make init
make run
- Run with Docker:
  make up # Builds and starts the container
- Schedule with cron:
  crontab -e
  Add a line for your schedule (for example, daily at 9 AM):
  0 9 * * * cd ~/path_to_project/articles-extractor && make up
  Use crontab.guru to customize timing.
  Replace ~/path_to_project with your project’s absolute path (use pwd in the terminal to confirm).
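If the five schedule fields are unfamiliar, the following Python sketch labels each one in a crontab line. It only illustrates the standard field order (minute, hour, day of month, month, day of week); `split_cron` is a hypothetical helper, not part of the app or of cron itself.

```python
# Standard order of the five schedule fields in a crontab entry.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(line):
    """Split a crontab line into its labeled schedule fields and the command."""
    # Split on whitespace at most five times so the command keeps its spaces.
    parts = line.strip().split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


if __name__ == "__main__":
    entry = "0 9 * * * cd ~/path_to_project/articles-extractor && make up"
    schedule, command = split_cron(entry)
    for field, value in schedule.items():
        print(f"{field}: {value}")
    print(f"command: {command}")
```

For the example entry, minute is 0 and hour is 9, with the remaining fields left as wildcards, which is what makes it run once a day at 9 AM.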
- Add secrets in repo settings (Settings > Secrets and variables > Actions):
  - CREDENTIALS: Paste the entire content of your credentials.json
  - SHEET_ID: Your Google Sheet ID
- Example workflow (already configured if using the repo's .github/workflows/):
  on:
    schedule:
      - cron: '0 9 * * *' # Daily at 9 AM UTC
    workflow_dispatch: # Manual trigger
The action will automatically:
- Install dependencies
- Run main.py
- Clean up resources
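Because the CREDENTIALS secret holds the full JSON text, something has to turn it back into a credentials.json file at run time. How the repo's workflow actually does this is not shown here. The sketch below is one possible approach, assuming the secret is exposed to the job as an environment variable named CREDENTIALS; `write_credentials` and `require_env` are hypothetical helpers.

```python
import json
import os
from pathlib import Path


def require_env(name, env=None):
    """Return a required environment variable or fail with a clear message."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value


def write_credentials(raw_json, target="credentials.json"):
    """Validate the secret's JSON text, then write it to `target`."""
    try:
        json.loads(raw_json)
    except json.JSONDecodeError as exc:
        # Fail before touching the disk so no half-valid file is left behind.
        raise ValueError("CREDENTIALS is not valid JSON") from exc
    path = Path(target)
    path.write_text(raw_json, encoding="utf-8")
    path.chmod(0o600)  # owner read/write only (limited effect on Windows)
    return path


if __name__ == "__main__":
    # Assumes the workflow maps both secrets to environment variables.
    sheet_id = require_env("SHEET_ID")
    credentials_path = write_credentials(require_env("CREDENTIALS"))
    print(f"Wrote {credentials_path} for sheet {sheet_id}")
```

Validating the JSON first makes a truncated or mis-pasted secret fail early with a readable error instead of surfacing later as an authentication problem.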
Security Note: Never commit credentials.json or .env files!