chore: init commit
SWHL committed Jan 15, 2025
1 parent 6b111fd commit 31fb132
Showing 12 changed files with 606 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,13 @@
# These are supported funding model platforms

github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: https://raw.githubusercontent.com/RapidAI/.github/6db6b6b9273f3151094a462a61fbc8e88564562c/assets/Sponsor.png
22 changes: 22 additions & 0 deletions .github/workflows/SyncToGitee.yml
@@ -0,0 +1,22 @@
name: SyncToGitee
on:
  push:
    branches:
      - main
jobs:
  repo-sync:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source codes
        uses: actions/checkout@v3

      - name: Mirror the Github organization repos to Gitee.
        uses: Yikun/[email protected]
        with:
          src: 'github/SWHL'
          dst: 'gitee/SWHL'
          dst_key: ${{ secrets.GITEE_PRIVATE_KEY }}
          dst_token: ${{ secrets.GITEE_TOKEN }}
          force_update: true
          static_list: "BaiduImageSpider"
          debug: true
53 changes: 53 additions & 0 deletions .github/workflows/publish_whl.yml
@@ -0,0 +1,53 @@
name: Push baidu_image_spider to pypi

on:
  push:
    tags:
      - v*

jobs:
  UnitTesting:
    runs-on: ubuntu-latest
    steps:
      - name: Pull latest code
        uses: actions/checkout@v3

      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          architecture: 'x64'

      - name: Display Python version
        run: python -c "import sys; print(sys.version)"

      - name: Unit tests
        run: |
          # Install the project dependencies and pytest before running the tests
          pip install -r requirements.txt pytest
          pytest tests/test_main.py
  GenerateWHL_PushPyPi:
    needs: UnitTesting
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          architecture: 'x64'

      - name: Run setup
        run: |
          pip install -r requirements.txt
          python -m pip install --upgrade pip
          pip install wheel get_pypi_latest_version
          python setup.py bdist_wheel ${{ github.ref_name }}
      - name: Publish distribution 📦 to PyPI
        uses: pypa/[email protected]
        with:
          password: ${{ secrets.BAIDUIMAGESPIDER }}
          packages_dir: dist/
164 changes: 164 additions & 0 deletions .gitignore
@@ -0,0 +1,164 @@
outputs/
*.json

# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
.pytest_cache

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
# *.manifest
# *.spec
*.res

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

#idea
.vs
.vscode
.idea
/images
/models

#models
*.onnx

*.ttf
*.ttc

long1.jpg

*.bin
*.mapping
*.xml

*.pdiparams
*.pdiparams.info
*.pdmodel

.DS_Store
*.pth
/rapid_table_torch/models/*.pth
/rapid_table_torch/models/*.json
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,19 @@
repos:
  - repo: https://gitee.com/SWHL/autoflake
    rev: v2.1.1
    hooks:
      - id: autoflake
        args:
          [
            "--recursive",
            "--in-place",
            "--remove-all-unused-imports",
            "--remove-unused-variables",
            "--ignore-init-module-imports",
          ]
        files: \.py$
  - repo: https://gitee.com/SWHL/black
    rev: 23.1.0
    hooks:
      - id: black
        files: \.py$
47 changes: 47 additions & 0 deletions README.md
@@ -0,0 +1,47 @@
## Baidu Image Spider

A super lightweight Baidu image crawler, modified from <https://github.com/kong36088/BaiduImageSpider>.

### Installation

```bash
pip install baidu_image_spider
```

### Python usage

```python
from baidu_image_spider.main import Crawler

crawler = Crawler(0.05, save_dir="outputs")  # crawl delay of 0.05

# Search for the keyword "美女": 2 pages in total, starting at page 1, 30 images per page, i.e. 2*30=60 images overall
crawler(word="美女", total_page=2, start_page=1, per_page=30)
```
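
The page arithmetic in the comment above can be sketched as follows. This is only an illustration: `plan_pages` is a hypothetical helper that is not part of `baidu_image_spider`, and the offset formula `(page - 1) * per_page` assumes 1-based page numbers, which the package may or may not use internally.

```python
def plan_pages(total_page: int, start_page: int = 1, per_page: int = 30) -> list[tuple[int, int]]:
    """Return (page_number, first_image_offset) for each page to request.

    Hypothetical helper: assumes pages are numbered from 1.
    """
    if total_page < 1 or start_page < 1 or per_page < 1:
        raise ValueError("total_page, start_page and per_page must all be >= 1")
    return [
        (page, (page - 1) * per_page)
        for page in range(start_page, start_page + total_page)
    ]


pages = plan_pages(total_page=2, start_page=1, per_page=30)
print(pages)            # [(1, 0), (2, 30)]
print(len(pages) * 30)  # 60 images in total
```

With `total_page=2` and `per_page=30` the plan covers two pages, which matches the 2*30=60 images mentioned above.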

### Command-line usage

```bash
baidu_image_spider -w 美女 -tp 1 -sp 1 -pp 2
```

View the argument help:

```bash
$ baidu_image_spider -h
usage: baidu_image_spider [-h] -w WORD -tp TOTAL_PAGE -sp START_PAGE [-pp [PER_PAGE]] [-sd SAVE_DIR] [-d DELAY]

options:
  -h, --help            show this help message and exit
  -w WORD, --word WORD  search keyword
  -tp TOTAL_PAGE, --total_page TOTAL_PAGE
                        total number of pages to crawl
  -sp START_PAGE, --start_page START_PAGE
                        starting page number
  -pp [PER_PAGE], --per_page [PER_PAGE]
                        number of images per page
  -sd SAVE_DIR, --save_dir SAVE_DIR
                        directory where images are saved
  -d DELAY, --delay DELAY
                        delay (interval) between requests
```
```
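
For readers who want to see how a usage line like the one above comes about, here is a minimal `argparse` sketch that accepts the same flags. It is not the package's actual source: `build_parser` is a hypothetical name, and the default values (`30`, `"outputs"`, `0.05`) are assumptions borrowed from the Python example, since the help text does not show the real defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in the help output."""
    parser = argparse.ArgumentParser(prog="baidu_image_spider")
    parser.add_argument("-w", "--word", type=str, required=True, help="search keyword")
    parser.add_argument("-tp", "--total_page", type=int, required=True,
                        help="total number of pages to crawl")
    parser.add_argument("-sp", "--start_page", type=int, required=True,
                        help="starting page number")
    # nargs="?" makes the value optional, rendered as "[-pp [PER_PAGE]]" in the usage line.
    parser.add_argument("-pp", "--per_page", type=int, nargs="?", default=30,
                        help="number of images per page (default is an assumption)")
    parser.add_argument("-sd", "--save_dir", type=str, default="outputs",
                        help="directory where images are saved (default is an assumption)")
    parser.add_argument("-d", "--delay", type=float, default=0.05,
                        help="delay between requests (default is an assumption)")
    return parser


# Parse the same flags as the command-line example, using an explicit argument list.
args = build_parser().parse_args(["-w", "cat", "-tp", "1", "-sp", "1", "-pp", "2"])
print(args.word, args.total_page, args.start_page, args.per_page)  # cat 1 1 2
```

Passing an explicit list to `parse_args` keeps the sketch runnable without touching `sys.argv`.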
3 changes: 3 additions & 0 deletions baidu_image_spider/__init__.py
@@ -0,0 +1,3 @@
# -*- encoding: utf-8 -*-
# @Author: SWHL
# @Contact: [email protected]