
   ▒█▀▀▀█ █▀▄▀█ █░░█ ▀▀█▀▀ █▀▀ █▀▀ █▀▀█ █▀▀█ █▀▀█ █▀▀ 
   ░▀▀▀▄▄ █░▀░█ █░░█ ░░█░░ ▀▀█ █░░ █▄▄▀ █▄▄█ █░░█ █▀▀ 
   ▒█▄▄▄█ ▀░░░▀ ░▀▀▀ ░░▀░░ ▀▀▀ ▀▀▀ ▀░▀▀ ▀░░▀ █▀▀▀ ▀▀▀ 

Serving smut to salty pervs over CLI 🍆💦

A Python-based tool to scrape and download adult content from various websites straight to your preferred data store, alongside .nfo files that preserve title, tags, actors, studios, and other metadata for a richer immediate viewing experience in Plex, Jellyfin, or Stash.


Requirements 🧰

  • Python 3.10+ 🐍
  • yt-dlp for video downloads
  • Either wget or curl for alternative downloads
  • ffmpeg for M3U8 stream downloads and metadata validation
  • Recommended: Conda or Mamba for environment management 🐼🐍
  • For JS-heavy sites only: Selenium + ChromeDriver, plus webdriver-manager for foolproof ChromeDriver management.

All Python dependencies are in requirements.txt.
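
Before proceeding, a quick shell loop can confirm the external tools are on your PATH (a minimal check; trim the list to the tools your sites actually need):

for tool in yt-dlp wget curl ffmpeg; do
  command -v "$tool" >/dev/null 2>&1 && echo "ok: $tool" || echo "MISSING: $tool"
done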


Installation 🛠️

  1. Clone the Repo 📂

    git clone https://github.com/io-flux/smutscrape.git
    cd smutscrape
  2. Install Dependencies 🚀

    # With Conda (Recommended):
    conda create -n smutscrape python=3.10.13
    conda activate smutscrape
    pip install -r requirements.txt
    
    # With pip:
    pip3 install -r requirements.txt

    Install additional tools:

    # On Ubuntu/Debian
    sudo apt-get install yt-dlp wget curl ffmpeg chromium
    # On macOS with Homebrew
    brew install yt-dlp wget curl ffmpeg
    brew install --cask google-chrome

    For Selenium (not required for all sites):

    # webdriver-manager is the best solution for most people:
    pip install webdriver-manager
    # ... but a manual chromedriver installation may be necessary for some setups:
    brew install --cask chromedriver
  3. Configure config.yaml ⚙️

    cp example-config.yaml config.yaml
    nano config.yaml

    Set up download_destinations, ignored terms, selenium paths, and optional vpn integration for secure, anonymous scraping.

  4. Make Executable ⚡️

    chmod +x scrape.py
    # Optional: add a symlink for easy use from anywhere
    sudo ln -s $(realpath ./scrape.py) /usr/local/bin/scrape
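
If everything installed cleanly, the help submenu (and, with no arguments, the full readout of supported sites) should now be reachable from anywhere:

scrape -h
scrape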

Usage 🚀

Run python scrape.py (or scrape if symlinked) to download adult content and save metadata in .nfo files. With no arguments, you’ll get a detailed, aesthetic readout of all supported site modes on your system, dynamically generated from ./sites/ configurations. Alternatively, running scrape {code} (e.g., scrape ml) provides detailed info about that site: curated notes, tips, caveats, available metadata, special requirements, and usage examples.


To start scraping, build commands following this basic syntax:

      scrape {code} {mode} {query}

Supported sites and modes:

Refer to this table of supported sites, modes, and metadata, or get the most up-to-date configuration by simply running scrape without arguments.

code | site | modes | metadata
9v | 9Vids | search · tag | tags
fphd | FamilyPornHD | tag · model · search · studio | actors · description · studios · tags
fptv | FamilyPorn | model · tag · search · studio | actors · description · studios · tags
fs | Family Sex | tag · search · model | actors · description · image · studios · tags
if | IncestFlix | tag ‡ | actors · image · studios · tags
ig | IncestGuru | tag ‡ | actors · image · studios · tags
lf | LoneFun | search | description · tags
ml | Motherless | search · tag · user · group · group_code | tags
ph | PornHub | model · category · tag · studio · search · pornstar | actors · code · date · image · studios · tags
sb | SpankBang | model · search · tag | actors · description · tags
tna | TNAflix | search | actors · date · description · studios · tags
xh | xHamster | model · studio · search · tag | actors · studios · tags
xn | XNXX | search · model · tag · studio | actors · date · description · image · studios · tags
xv | XVideos | search · studio · model · tag · playlist | actors · studios · tags

† Selenium with chromedriver required.
‡ Combine terms with "&".


Command-Line Arguments

scrape {code} {mode} {query} [optional arguments]
argument | summary
-p {p}.{video} | start scraping on a given page and video (e.g., -p 12.9 to start at video 9 on page 12).
-o, --overwrite | download all videos, ignoring .state and overwriting existing media when filenames collide. ⚠
-n, --re_nfo | refresh metadata and write new .nfo files, irrespective of whether --overwrite is set. ⚠
-a, --applystate | retroactively add URL to .state without re-downloading if local file matches (-o has priority).
-t, --table {path} | generate markdown table of active site configurations with modes, metadata, and examples.
-d, --debug | enable detailed debug logging.
-h, --help | show the help submenu.

⚠ Caution: Using --overwrite or --re_nfo risks overwriting different videos or .nfo files with identical names—a growing concern as your collection expands and generic titles (e.g., "Hot Scene") collide. Mitigate this by adding name_suffix: "{unique site identifier}" in a site’s YAML config (e.g., name_suffix: " - Motherless.com" for Motherless, where duplicate titles are rampant).
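
These flags combine freely. For instance, a hypothetical run (query made up for illustration) that resumes a SpankBang search at the 2nd video on page 3, refreshing .nfo files along the way with verbose logging:

scrape sb search "office" -n -p 3.2 -d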


Usage Examples 🙋

  1. All videos on Massy Sweet’s 'pornstar' page on PornHub that aren't saved locally, refreshing metadata for already saved videos we encounter again:

    scrape ph pornstar "Massy Sweet" -n
  2. All videos produced by MissaX from FamilyPornHD, overwriting existing copies:

    scrape fphd studio "MissaX" -o
  3. Chloe Temple's videos involving brother-sister (BS) relations not yet saved locally, starting with the 6th video on page 4 of results, and recording URLs to .state for faster future runs whenever a local file already matches:

    scrape if tag "Chloe Temple & BS" -a -p 4.6
  4. Down and dirty in debug logs for scraping that "real" incest stuff on LoneFun:

    scrape lf tag "real incest" -d
  5. One particular vintage mother/daughter/son video on Motherless:

    scrape https://motherless.com/2ABC9F3
  6. All videos from Halle Von's pornstar page on XNXX:

    scrape https://www.xnxx.com/pornstar/halle-von

Advanced Configuration ⚙️

Download Destinations 📁

Define destinations in config.yaml. The first is primary; any others are fallbacks.

download_destinations:
  - type: smb
    server: "192.168.69.69"
    share: "media"
    path: "xxx"
    username: "ioflux"
    password: "th3P3rv3rtsGu1d3"
    permissions:
      uid: 1000
      gid: 3003
      mode: "750"
    temporary_storage: "/Users/ioflux/.private/incomplete"
  - type: local
    path: "/Users/ioflux/.private/xxx"

Smutscrape was built with SMB in mind, and it's the recommended destination type when your setup supports it.
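
Before committing SMB credentials to config.yaml, it can help to verify them by hand. A minimal check with smbclient (assuming it is installed), using the placeholder server, share, path, and username from the example above:

smbclient //192.168.69.69/media -U ioflux -c 'cd xxx; ls'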

Filtering Content 🚫

Add any content you want Smutscrape to avoid altogether to the ignored terms list in your config.yaml:

ignored:
  - "JOI"
  - "Age Play"
  - "Psycho Thrillers"
  - "Virtual Sex"

All metadata fields are checked against the ignored list, so you can list specific genres, sex acts, performers, studios, and so on whose content you want skipped entirely.

Selenium & Chromedriver 🕵️‍♂️

For Javascript-heavy sites (marked on the table with †), selenium with chromedriver is required. By default, the script uses webdriver-manager for seamless setup, but some setups, macOS in particular, require a manual installation. This worked for me:

  1. Install Chrome Binary:
wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chrome-mac-arm64.zip
unzip chrome-mac-arm64.zip
chmod +x "chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
sudo mv "chrome-mac-arm64/Google Chrome for Testing.app" /Applications/
  2. Install Chromedriver:
wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chromedriver-mac-arm64.zip
unzip chromedriver-mac-arm64.zip
chmod +x chromedriver-mac-arm64/chromedriver
sudo mv chromedriver-mac-arm64/chromedriver /usr/local/bin/chromedriver
  3. Update config.yaml:
selenium:
  mode: "local"
  chromedriver_path: "/usr/local/bin/chromedriver"
  chrome_binary: "/Applications/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
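
To sanity-check the setup, chromedriver and the Chrome binary should report the same major version, and the webdriver-manager route can be exercised directly (a quick check using the paths configured above):

# manual install: both should print matching 134.x versions
/usr/local/bin/chromedriver --version
"/Applications/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing" --version

# webdriver-manager route: prints the driver path it resolves
python -c "from webdriver_manager.chrome import ChromeDriverManager; print(ChromeDriverManager().install())"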

VPN Support 🔒

Smutscrape can automatically rotate VPN exit nodes using most VPN apps that offer a CLI. In config.yaml, enable and configure:

vpn:
  enabled: true
  vpn_bin: "/usr/bin/protonvpn"
  start_cmd: "{vpn_bin} connect -f"
  new_node_cmd: "{vpn_bin} connect -r"
  new_node_time: 1200  # Refresh IP every 20 minutes
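
The start_cmd and new_node_cmd strings are templates with {vpn_bin} substituted in, so it's worth running the expanded commands by hand once to catch path or flag problems (using the ProtonVPN example above; swap in your own VPN's CLI):

# run at startup:
/usr/bin/protonvpn connect -f
# run every new_node_time seconds (here, every 20 minutes) to rotate the exit node:
/usr/bin/protonvpn connect -r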

Contributing 🤝

Smutscrape welcomes contributions! Its current 2200-line monolithic design isn’t collaboration-friendly, so refactoring into a modular, Pythonic app is a priority. Meanwhile, adding site configurations—YAML files with URL schemes and CSS selectors—is a simple, valuable contribution.

Inspired by Stash CommunityScrapers, Smutscrape’s YAML configs adapt its structure. We use CSS selectors instead of XPath, though conversion is straightforward (e.g., the XPath //div[@class="title"]/a maps to the CSS selector div.title a), and metadata fields port easily. The challenge is video downloading, since some sites use iframes or countermeasures, but the yt-dlp fallback often simplifies this. Adapting a CommunityScrapers site for Smutscrape is a great way to contribute: pick a site, tweak the config, and submit a pull request!


Scrape responsibly! You’re on your own. 🧠💭