```
▒█▀▀▀█ █▀▄▀█ █░░█ ▀▀█▀▀ █▀▀ █▀▀ █▀▀█ █▀▀█ █▀▀█ █▀▀
░▀▀▀▄▄ █░▀░█ █░░█ ░░█░░ ▀▀█ █░░ █▄▄▀ █▄▄█ █░░█ █▀▀
▒█▄▄▄█ ▀░░░▀ ░▀▀▀ ░░▀░░ ▀▀▀ ▀▀▀ ▀░▀▀ ▀░░▀ █▀▀▀ ▀▀▀
```
A Python-based tool to scrape and download adult content from various websites straight to your preferred data store, along with `.nfo` files that preserve the title, tags, actors, studios, and other metadata for a richer watching experience out of the box in Plex, Jellyfin, or Stash.
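For context, here is a minimal sketch of what such an `.nfo` file might look like, following the Kodi/Jellyfin movie-NFO convention. The field names below are standard for that convention, but the values are invented and the exact fields Smutscrape emits may differ:

```xml
<!-- Illustrative Kodi/Jellyfin-style .nfo; values are made up, and
     Smutscrape's actual output may include different fields. -->
<movie>
  <title>Hot Scene - Motherless.com</title>
  <studio>Example Studio</studio>
  <genre>Amateur</genre>
  <tag>vintage</tag>
  <actor>
    <name>Jane Doe</name>
  </actor>
  <premiered>2021-06-09</premiered>
</movie>
```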
- Python 3.10+ 🐍
- yt-dlp for video downloads
- Either wget or curl for alternative downloads
- ffmpeg for M3U8 stream downloads and metadata validation
- Recommended: Conda or Mamba for environment management 🐼🐍
- Only for JS-heavy sites: Selenium + ChromeDriver, with webdriver-manager for foolproof ChromeDriver management.
All Python dependencies are in `requirements.txt`.
- **Clone the Repo** 📂

  ```bash
  git clone https://github.com/io-flux/smutscrape.git
  cd smutscrape
  ```
- **Install Dependencies** 🚀

  ```bash
  # With Conda (Recommended):
  conda create -n smutscrape python=3.10.13
  conda activate smutscrape
  pip install -r requirements.txt

  # With pip:
  pip3 install -r requirements.txt
  ```

  Install additional tools:

  ```bash
  # On Ubuntu/Debian
  sudo apt-get install yt-dlp wget curl ffmpeg chromium
  # On macOS with Homebrew
  brew install yt-dlp wget curl ffmpeg google-chrome
  ```

  For Selenium (not required for all sites):

  ```bash
  # webdriver-manager is the best solution for most people:
  pip install webdriver-manager
  # ... but a manual chromedriver installation may be necessary for some setups:
  brew install chromedriver
  ```
- **Configure `config.yaml`** ⚙️

  ```bash
  cp example-config.yaml config.yaml
  nano config.yaml
  ```

  Set up `download_destinations`, `ignored` terms, `selenium` paths, and optional `vpn` integration for secure, anonymous scraping.
- **Make Executable** ⚡️

  ```bash
  chmod +x scrape.py
  # Optional: add a symlink for easy use from anywhere
  sudo ln -s $(realpath ./scrape.py) /usr/local/bin/scrape
  ```
Run `python scrape.py` (or `scrape` if symlinked) to download adult content and save metadata in `.nfo` files. With no arguments, you'll get a detailed, aesthetic readout of all site modes supported on your system, dynamically generated from the `./sites/` configurations (see left image below). Alternatively, running `scrape {code}` (e.g., `scrape ml`) provides detailed info about that site: curated notes, tips, caveats, available metadata, special requirements, and usage examples (see right image below).

The general syntax is:

```bash
scrape {code} {mode} {query}
```
Refer to the table below of supported sites with their available modes and metadata, or see your current configuration with the latest updates by simply running `scrape` without arguments.
| code | site | modes | metadata |
|---|---|---|---|
| `9v` | 9Vids † | search · tag | tags |
| `fphd` | FamilyPornHD † | tag · model · search · studio | actors · description · studios · tags |
| `fptv` | FamilyPorn † | model · tag · search · studio | actors · description · studios · tags |
| `fs` | Family Sex † | tag · search · model | actors · description · image · studios · tags |
| `if` | IncestFlix | tag ‡ | actors · image · studios · tags |
| `ig` | IncestGuru | tag ‡ | actors · image · studios · tags |
| `lf` | LoneFun | search | description · tags |
| `ml` | Motherless † | search · tag · user · group · group_code | tags |
| `ph` | PornHub † | model · category · tag · studio · search · pornstar | actors · code · date · image · studios · tags |
| `sb` | SpankBang | model · search · tag | actors · description · tags |
| `tna` | TNAflix | search | actors · date · description · studios · tags |
| `xh` | xHamster | model · studio · search · tag | actors · studios · tags |
| `xn` | XNXX † | search · model · tag · studio | actors · date · description · image · studios · tags |
| `xv` | XVideos | search · studio · model · tag · playlist | actors · studios · tags |
† Selenium with chromedriver required.
‡ Combine terms with "&".
```bash
scrape [args] [optional arguments]
```
| argument | summary |
|---|---|
| `-p {p}.{video}` | start scraping on a given page and video (e.g., `-p 12.9` to start at video 9 on page 12). |
| `-o, --overwrite` | download all videos, ignoring `.state` and overwriting existing media when filenames collide. ⚠ |
| `-n, --re_nfo` | refresh metadata and write new `.nfo` files, irrespective of whether `--overwrite` is set. ⚠ |
| `-a, --applystate` | retroactively add a URL to `.state` without re-downloading if the local file matches (`-o` has priority). |
| `-t, --table {path}` | generate a markdown table of active site configurations with modes, metadata, and examples. |
| `-d, --debug` | enable detailed debug logging. |
| `-h, --help` | show the help submenu. |
⚠ **Caution**: Using `--overwrite` or `--re_nfo` risks overwriting different videos or `.nfo` files that share identical names, a growing concern as your collection expands and generic titles (e.g., "Hot Scene") collide. Mitigate this by adding `name_suffix: "{unique site identifier}"` in a site's YAML config (e.g., `name_suffix: " - Motherless.com"` for Motherless, where duplicate titles are rampant).
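For example, a minimal sketch of that mitigation, assuming the Motherless config lives at `./sites/ml.yaml` (the filename is an assumption; check your own `./sites/` directory for the real layout):

```yaml
# ./sites/ml.yaml (path is illustrative; only the name_suffix line matters here)
name_suffix: " - Motherless.com"   # appended to every filename scraped from this site
```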
- All videos on Massy Sweet's 'pornstar' page on PornHub that aren't saved locally, refreshing metadata for already saved videos we encounter again:

  ```bash
  scrape ph pornstar "Massy Sweet" -n
  ```

- All videos produced by MissaX on FamilyPornHD, overwriting existing copies:

  ```bash
  scrape fphd studio "MissaX" -o
  ```

- Chloe Temple's videos involving brother-sister (BS) relations not yet saved locally, starting at the 6th video on page 4 of results, and recording the URL to `.state` for faster future scraping whenever a matching local file is found:

  ```bash
  scrape if tag "Chloe Temple & BS" -a -p 4.6
  ```

- Down and dirty in the debug logs while scraping that "real" incest stuff on LoneFun:

  ```bash
  scrape lf tag "real incest" -d
  ```

- One particular vintage mother/daughter/son video on Motherless:

  ```bash
  scrape https://motherless.com/2ABC9F3
  ```

- All videos from Halle Von's pornstar page on XNXX:

  ```bash
  scrape https://www.xnxx.com/pornstar/halle-von
  ```
Define destinations in `config.yaml`. The first is primary; any others are fallbacks.
```yaml
download_destinations:
  - type: smb
    server: "192.168.69.69"
    share: "media"
    path: "xxx"
    username: "ioflux"
    password: "th3P3rv3rtsGu1d3"
    permissions:
      uid: 1000
      gid: 3003
      mode: "750"
    temporary_storage: "/Users/ioflux/.private/incomplete"
  - type: local
    path: "/Users/ioflux/.private/xxx"
```
Smutscrape was built with SMB in mind, and it's the recommended destination type wherever your setup allows.
Add any content you want Smutscrape to avoid altogether to the `ignored` list in your `config.yaml`:
```yaml
ignored:
  - "JOI"
  - "Age Play"
  - "Psycho Thrillers"
  - "Virtual Sex"
```
All metadata fields are checked against the `ignored` list, so you can include specific genres, sex acts, performers, studios, etc. whose content you never want downloaded.
For JavaScript-heavy sites (marked in the table with †), Selenium with ChromeDriver is required. By default, the script uses `webdriver-manager` for seamless setup, but some setups, macOS in particular, require a manual installation. This worked for me:
- Install the Chrome binary:

  ```bash
  wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chrome-mac-arm64.zip
  unzip chrome-mac-arm64.zip
  chmod +x "chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
  sudo mv "chrome-mac-arm64/Google Chrome for Testing.app" /Applications/
  ```
- Install ChromeDriver:

  ```bash
  wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chromedriver-mac-arm64.zip
  unzip chromedriver-mac-arm64.zip
  chmod +x chromedriver-mac-arm64/chromedriver
  sudo mv chromedriver-mac-arm64/chromedriver /usr/local/bin/chromedriver
  ```
- Update `config.yaml`:

  ```yaml
  selenium:
    mode: "local"
    chromedriver_path: "/usr/local/bin/chromedriver"
    chrome_binary: "/Applications/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
  ```
Smutscrape can automatically rotate VPN exit nodes using most existing VPN apps that provide a CLI. Enable and configure it in `config.yaml`:
```yaml
vpn:
  enabled: true
  vpn_bin: "/usr/bin/protonvpn"
  start_cmd: "{vpn_bin} connect -f"
  new_node_cmd: "{vpn_bin} connect -r"
  new_node_time: 1200  # Refresh IP every 20 minutes
```
Smutscrape welcomes contributions! Its current 2200-line monolithic design isn't collaboration-friendly, so refactoring into a modular, Pythonic app is a priority. Meanwhile, adding site configurations (YAML files with URL schemes and CSS selectors) is a simple, valuable contribution.
Inspired by Stash's CommunityScrapers, Smutscrape's YAML configs adapt their structure. We use CSS selectors instead of XPath (though conversion is straightforward), and metadata fields port easily. The challenge is video downloading: some sites use iframes or countermeasures, but the yt-dlp fallback often simplifies this. Adapting a CommunityScrapers site for Smutscrape is a great way to contribute. Pick a site, tweak the config, and submit a pull request! A rough sketch of the idea follows below.
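To make the shape of a port concrete, here is a purely hypothetical sketch of a site config. Every key name below is illustrative, not Smutscrape's actual schema; copy an existing file from `./sites/` as your real starting point. The main translation work is converting CommunityScrapers' XPath selectors to CSS, e.g. `//div[@class="video-title"]/a` becomes `div.video-title > a`:

```yaml
# Hypothetical site-config sketch. Key names are illustrative only;
# mirror a real file from ./sites/ when writing an actual config.
name: "Example Tube"
code: "ex"                             # short code used on the CLI, e.g. `scrape ex search "query"`
base_url: "https://example-tube.test"  # placeholder domain
modes:
  search: "/search/{query}/page/{page}"
selectors:
  title: "h1.video-title"              # CSS, not XPath: //h1[@class="video-title"] -> h1.video-title
  tags: "div.tags a"
  actors: "div.pornstars a"
  video_url: "video > source"          # if extraction fails, yt-dlp on the page URL is the usual fallback
```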
Scrape responsibly! You’re on your own. 🧠💭