Blondel MONDESIR edited this page Mar 23, 2023 · 39 revisions

Create a Zim File From a YouTube Playlist

The ZIM file format, being an open file format that stores wiki content for offline use, is a good candidate for non-wiki content as well. Thanks to the efforts of the openzim developers, it is possible to store YouTube videos in a ZIM file using youtube2zim. This document, far from being an official guide, is a reference that details the lessons learned from multiple tests performed with youtube2zim and the workarounds used to perform some specific actions during the process of generating and serving a ZIM file.

  1. Before You Begin
  2. Requirements
    1. Hardware
    2. Software
      1. Prerequisites
      2. Dependencies
  3. Installation
  4. Usage
    1. Basic Usage
    2. Troubleshooting
    3. Serving the ZIM File
  5. Customization and Branding
    1. Rename the Videos
    2. Add a Custom Profile Image and Banner
  6. Extra
    1. Libzim Installation
    2. Kiwix-serve Installation
    3. Video Format/Resolution Choices
    4. Video Details via mediainfo
    5. Custom Profile Image or Banner Design
    6. Make a Subset

Before You Begin

This document explains the Internet-in-a-Box (IIAB) fork of Kiwix's youtube2zim. Its purpose is to test, develop and dramatically accelerate the customization of ZIM files tailored to actual schools' needs.

🔨 Requirements

  1. Hardware

  • 4GB RAM (or more depending on your needs, e.g. using concurrency or compressing/encoding videos on the fly)
  • 100GB disk space (or enough to hold the generated ZIM file)
  2. Software

  • Linux (Ubuntu 22.04 LTS or later). We originally recommended Ubuntu 20.04 (Python 3.8). As of December 2022, we recommend Ubuntu 22.04 or 22.10 (both use Python 3.10). Virtual machines are a good way to test the software on different versions of Ubuntu (we recommend Multipass or VirtualBox).

    1. Prerequisites

      • A YouTube API key is required to use the YouTube API. You can get one following the instructions found at https://developers.google.com/youtube/v3/getting-started.

        The API key is subject to a daily quota of 10,000 units. If you are interested, the YouTube Data API quota documentation explains how these units are used up. To prevent your quota from being used up at once while you adjust parameters or branding, calculate ahead of time how many videos you will need to download and how many units each video will cost you. You can use the YouTube API Explorer to get an idea of the number of units required to download a video.

      • A YouTube playlist ID is also required. Youtube2zim requires it as an argument to generate a ZIM file. Channel owners can categorize their videos by grouping them in public playlists, and each of these playlists has a link through which it can be accessed via the YouTube Data API (v3). However, some videos appear in several different playlists, and others are not in any public playlist. In these cases, you have to create a playlist that groups these videos and use its link. To do this, you need a YouTube channel, and the playlist must be public.
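To make the quota arithmetic above concrete, here is a minimal sketch. It assumes that a run pages through the playlist with list requests that return at most 50 items and cost 1 unit each (the published cost of requests such as `playlistItems.list` and `videos.list`). How many requests youtube2zim really issues per page has not been measured here, so `requests_per_page` is a hypothetical knob that you should adjust after observing a real run.

```python
import math

DAILY_QUOTA_UNITS = 10_000  # daily quota stated above
PAGE_SIZE = 50              # maximum items returned by one list request
UNITS_PER_LIST_CALL = 1     # published cost of one list request


def estimate_units(video_count: int, requests_per_page: int = 2) -> int:
    """Estimate the quota units used by one run over `video_count` videos.

    `requests_per_page` is an assumption, e.g. one request to list the
    playlist items and one request to fetch details about those videos.
    """
    pages = math.ceil(video_count / PAGE_SIZE)
    return pages * requests_per_page * UNITS_PER_LIST_CALL


def runs_per_day(video_count: int, requests_per_page: int = 2) -> int:
    """How many complete runs fit in the daily quota under these assumptions."""
    units = estimate_units(video_count, requests_per_page)
    return DAILY_QUOTA_UNITS // units if units else 0
```

With these assumptions, a 120-video playlist needs 3 pages, so `estimate_units(120)` gives 6 units. Treat the result as a rough lower bound, not as a measurement.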

    2. Dependencies

      Youtube2zim requires the following dependencies:

      • ffmpeg, to convert the videos to the required format.
      • curl and unzip, to install JavaScript dependencies.
      • python3, python3-pip and python3-venv to install Python dependencies.
      • git, to clone the repository.

      Install them using the following command:

      sudo apt install ffmpeg curl unzip python3 python3-pip python3-venv git
      • libzim-dev to build the ZIM file:

        sudo apt install libzim-dev

        Alternatively, you can install a recent libzim from source (instead of libzim-dev) as described upstream or just follow our libzim installation guide.

      • OPTIONAL kiwix-serve. This is not required to generate the ZIM file, but it is required to serve it. See kiwix-serve installation tips.

Once the dependencies are installed, you can proceed to the youtube2zim installation.
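Before moving on, you may want to confirm that the command-line tools listed above are reachable. The following is a minimal sketch that only checks whether each tool is on your PATH; it does not check versions, and it does not cover libzim-dev, which is a library and not a command. The `pip3` entry assumes that python3-pip provides a command of that name, as it does on Ubuntu.

```python
import shutil

# Command-line tools named in the dependency list above.
REQUIRED_TOOLS = ["ffmpeg", "curl", "unzip", "python3", "pip3", "git"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```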

🔧 Installation

  1. Start by cloning https://github.com/iiab/youtube to /opt/iiab/youtube

    mkdir -p /opt/iiab/
    cd /opt/iiab/
    git clone https://github.com/iiab/youtube
    

    Since some changes highlighted in this document are not yet merged into the main branch of https://github.com/iiab/youtube, please refer for now to the main branch of https://github.com/deldesir/youtube for this step.

  2. Install the dependencies in a virtual environment:

    cd youtube/
    python3 -m venv venv
    source venv/bin/activate
    pip3 install -r requirements.txt
    
  3. Install the JavaScript dependencies:

    ./get_js_dep.sh
    

Usage

For the following examples, we assume that you are in the /opt/iiab/youtube directory and that you have activated the virtual environment.

Basic Usage

To generate a ZIM file, you need to provide the following arguments:

  • --api-key: YouTube API key
  • --id: YouTube playlist ID
  • --type: Type of YouTube source. Can be playlist, channel or user
  • --name: Name of the ZIM file

For example, to generate a ZIM file with the name test from a playlist, you can use the following command:

python3 ./youtube2zim --api-key <39-CHARACTERS> --id <34-CHARACTERS> --type playlist --name test

If you want to generate one ZIM file per playlist, use youtube2zim-playlists instead, which is a wrapper around youtube2zim:

python3 ./youtube2zim/playlists --indiv-playlists --api-key <39-CHARACTERS> --type user --id <34-CHARACTERS> --playlists-name="<34-CHARACTERS>_en_playlist-{playlist_id}"

RESULT: A ZIM file is generated in the /output directory.
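If you start from a playlist link and not a bare ID, the first command above can be assembled programmatically. This is a minimal sketch: it assumes that the playlist ID is the value of the `list` query parameter in the link, and the helper names `playlist_id_from_url` and `build_command` are hypothetical, not part of youtube2zim.

```python
from urllib.parse import urlparse, parse_qs


def playlist_id_from_url(url: str) -> str:
    """Extract the value of the `list` query parameter from a YouTube URL."""
    values = parse_qs(urlparse(url).query).get("list")
    if not values:
        raise ValueError(f"no playlist ID found in {url!r}")
    return values[0]


def build_command(api_key: str, playlist_id: str, name: str) -> list[str]:
    """Build the argument list for the playlist command shown above."""
    return [
        "python3", "./youtube2zim",
        "--api-key", api_key,
        "--id", playlist_id,
        "--type", "playlist",
        "--name", name,
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)` from the /opt/iiab/youtube directory with the virtual environment activated.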

Troubleshooting

If you encounter any problem, you can use the --debug option to get more information about the process. For example:

python3 ./youtube2zim --api-key <39-CHARACTERS> --id <34-CHARACTERS> --type playlist --name test --debug

You can also make the process write a log file to the /output directory by modifying line 34 of constants.py like this:

logger = getLogger(NAME, level=logging.DEBUG, file="/output/run.log")

Serving the ZIM File

To serve the ZIM file, you can use kiwix-serve (see kiwix-serve installation if you don't have it installed yet):

kiwix-serve --port 8080 /output/test_<month-year>.zim
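Because the file name ends with a date suffix, it can be convenient to look the file up and not type it by hand. The sketch below simply returns the most recently modified .zim file in a directory; the helper name `newest_zim` is hypothetical, and the sketch makes no assumption about the exact format of the suffix.

```python
from pathlib import Path


def newest_zim(output_dir: str):
    """Return the most recently modified .zim file in `output_dir`, or None."""
    candidates = list(Path(output_dir).glob("*.zim"))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

The returned path can then be given to kiwix-serve in place of the file name in the command above.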

Customization and Branding

Youtube2zim may be functional, but we quickly run into some limitations if we want to extend its scope. During our tests, we found that we needed to rename the videos. We also wanted to be able to add a custom logo and banner to the ZIM file. This section describes how to do this.

Rename the Videos

When generating a ZIM file from a playlist, we want video titles to be crystal clear to field community learners. For example, if the playlist contains videos from different channels, source video titles often contain a flood of overwhelming and extraneous details like the name of the channel.

So you can rename any subset of the videos using the --custom-titles option. This option requires that you specify two (2) text files: <youtube-video-ids.txt> contains the YouTube ID of each video whose title needs changing, and <custom-video-titles.txt> contains the customized (new) video titles. These two files must have the same number of lines, and refer to the videos in the same order.

Concretely: if the first line of the first file contains a URL like https://www.youtube.com/watch?v=12345678910, the first line of the second file must contain the customized (new) title for that video. And don't forget to add a newline character at the end of each line.

Example:

python3 ./youtube2zim --api-key <39-CHARACTERS> --id <34-CHARACTERS> --type playlist --name test --custom-titles <youtube-video-ids.txt> <custom-video-titles.txt>

Any video not in <youtube-video-ids.txt> will not have its title customized. Example: if you have a playlist with 100 videos and --custom-titles specifies 10 IDs, the titles of the other 90 videos will not be changed. Also, an error will be raised if the number of lines in the two files is not the same, or if the two files are not provided.
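To catch a line-count mismatch before starting a long run, you can check the two files yourself. The following is a minimal sketch of the pairing rule described above (line N of the first file goes with line N of the second). It is not the fork's own implementation, and `load_custom_titles` is a hypothetical helper name.

```python
def load_custom_titles(ids_path: str, titles_path: str) -> dict:
    """Pair each line of the IDs file with the same line of the titles file."""
    with open(ids_path, encoding="utf-8") as ids_file:
        ids = ids_file.read().splitlines()
    with open(titles_path, encoding="utf-8") as titles_file:
        titles = titles_file.read().splitlines()
    if len(ids) != len(titles):
        raise ValueError(
            f"{ids_path} has {len(ids)} lines but {titles_path} has {len(titles)}"
        )
    return dict(zip(ids, titles))
```

Printing the returned dictionary is a quick way to confirm that every new title sits next to the video it belongs to.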

Add a Custom Profile Image and Banner

To add a custom profile image and banner, you need to have two (2) image URLs ready. We recommend using the following sizes:

  • Profile image: 100×100 pixels
  • Banner: 1060×175 pixels

Then use the --profile and --banner options, respectively, to add the images to the ZIM file. Each of these options takes one (1) argument: the URL of the image. The following command shows how to use these options:

python3 ./youtube2zim --api-key <39-CHARACTERS> --id <34-CHARACTERS> --type playlist --name test --profile <url-of-image> --banner <url-of-image>
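Before publishing the image URLs, you may want to confirm that your files have the recommended sizes. The sketch below reads the width and height from the header of a PNG file using only the standard library. It is an illustration limited to PNG images (JPEG and other formats are not handled), and the helper names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Recommended sizes from the list above, as (width, height) in pixels.
RECOMMENDED = {"profile": (100, 100), "banner": (1060, 175)}


def png_size(data: bytes) -> tuple:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", data[16:24])


def matches_recommended(data: bytes, kind: str) -> bool:
    """True if the PNG has exactly the recommended size for `kind`."""
    return png_size(data) == RECOMMENDED[kind]
```

For a local file, pass the result of `open(path, "rb").read()` as `data`, with `kind` set to "profile" or "banner".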

Extra

Libzim Installation

When you need to install libzim, here are 3 different ways:

  1. Run apt install libzim-dev which installs libzim 7.2.0 from 2022-01-20 (on Ubuntu 22.04) or 7.2.2 from 2022-05-18 (on Ubuntu 22.10).
  2. Or, install Kiwix's latest official release from https://download.openzim.org/release/libzim/
  3. Or, install Kiwix's latest "nightly" build (experimental, risky) from https://download.openzim.org/nightly/ if you're helping test Kiwix's very latest.

Example instructions to help with 2. or 3. above:

a) Run: wget https://download.openzim.org/release/libzim/libzim_linux-x86_64-8.1.0.tar.gz

b) Run: tar xvf libzim_linux-x86_64-8.1.0.tar.gz

c) Copy file libzim.so.8.1.0 and its 2 symbolic links libzim.so, libzim.so.8 from libzim_linux-x86_64-8.1.0/lib/x86_64-linux-gnu to directory: /usr/local/lib/x86_64-linux-gnu/

d) Run: ldconfig

  • OPTIONAL: Run ldconfig -p | grep libzim to verify it's installed.
  • OPTIONAL: Run ldd libzim.so to look at its dependencies.
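The same verification can be done from Python, which is the environment youtube2zim runs in. This is a minimal sketch using the standard library's `ctypes.util.find_library`; the wrapper name `find_libzim` is hypothetical, and a result of `None` only means the lookup did not find the library, not why.

```python
import ctypes.util


def find_libzim():
    """Return the library name resolved for libzim (e.g. a .so name), or None."""
    return ctypes.util.find_library("zim")


if __name__ == "__main__":
    found = find_libzim()
    print(found if found else "libzim was not found by the library lookup")
```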

Kiwix-serve Installation

Kiwix-serve is one of the 3 self-contained binaries provided in kiwix-tools. Here are 3 different ways to get it.

  1. Run apt install kiwix-tools which installs it along with kiwix-manage and kiwix-search.
  2. Or, install Kiwix's latest official release from https://download.kiwix.org/release/kiwix-tools
  3. Or, install Kiwix's latest "nightly" build (experimental, risky) from https://download.kiwix.org/nightly/ if you're helping test Kiwix's very latest.

Example instructions to help with 2. or 3. above:

a) Run: wget https://download.kiwix.org/release/kiwix-tools/kiwix-tools_linux-x86_64-3.4.0.tar.gz

b) Run: tar xvf kiwix-tools_linux-x86_64-3.4.0.tar.gz

Video Format/Resolution Choices

You may want your ZIM file to contain videos with a specific resolution and file format. You can check which formats are available for the videos in your playlist with this command (replace abcd1234 with the ID of the video):

yt-dlp -F https://www.youtube.com/watch?v=abcd1234

yt-dlp will return a list of all the formats available for the video. The list includes the format code, resolution, file format, and other information about each format.

Once you have the list of available formats, you can choose the one that matches the resolution you want and use the corresponding format code to download the video with the -f option. For example, if you want to download the video with a height of 480 pixels, you can use a format code like bestvideo[height<=480]+bestaudio/best[height<=480].
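If you test several resolutions, a small helper avoids typing the selector by hand. This is a minimal sketch that only builds the string shown above for a given maximum height; `format_selector` is a hypothetical helper name, not part of yt-dlp or youtube2zim.

```python
def format_selector(max_height: int) -> str:
    """Build a yt-dlp format selector capped at `max_height` pixels."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    return (
        f"bestvideo[height<={max_height}]+bestaudio"
        f"/best[height<={max_height}]"
    )
```

Pass the result to yt-dlp in quotes with the -f option, so that the shell does not interpret the brackets and the `<` character.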

To make youtube2zim default to a specific resolution when making a ZIM file, add the wanted height in lines 516 and 517 of your scraper.py file.

ZIM files generally use the VP8 codec in 2023, but in the future might favor VP9 or AV1: openzim/python-scraperlib#79

Video Details via mediainfo

To display technical info (e.g. video encoding details) for a particular video file, consider running mediainfo at the command-line:

  1. To install mediainfo, run: sudo apt install mediainfo
  2. Run it: mediainfo [video filename]

Replace "[video filename]" with the path/filename of any downloaded video. MediaInfo will show the video's resolution, frame rate, audio codec, bitrate, etc.

Custom Profile Image or Banner Design

  • How do I avoid destroying the high-resolution focus of my (original) input profile or banner?

    • Always use input images that are exactly the recommended sizes (100×100 px for profile and 1060×175 px for banner) to avoid blurry results.

      If your input image(s) are smaller than the recommended sizes, you're losing an opportunity to provide higher-resolution imagery. If they are larger, they will be thumbnailed (scaled down) which might be very ugly — crop your image(s) to the correct aspect ratio, or else...

  • If I don't use the correct aspect ratio (e.g. 1060×175), which part of my original/input banner image is actually used?

    • If your input banner is too tall, only its top will be used.
    • If your input banner is too wide, only its left side will be used.
  • How do I create a 1060×175 banner from a large photo? Using Inkscape, a quality vector graphics program, is one option:

    • Open Inkscape and create a blank document.
    • In the File menu, click on Document Properties (Shift+Control+D). In the side window that opens, enter the dimensions 1060 (width) and 175 (height) in px to resize the page.
    • Zoom in enough on the page to draw a rectangle, in a color of your choice, that is the same size as the page.
    • Now that the page is resized, import your image, again using the File menu (Ctrl+I), and drop it onto the page/rectangle.
    • Move, reduce or enlarge the image on the page/rectangle as you wish. Press Alt+Click to see if the part of the image you want to capture is inside the page. Thanks to the rectangle you drew on the page, you will be able to see a border around the part of the image that will serve as a banner.
    • If you are happy with the highlighted area, go to the File menu and click on "Export PNG Image" (Shift+Control+E).
    • To make sure you are exporting the right part, click on the page option in the sidebar that opens.
    • Choose the target location for the export, enter a name and voila, you have a 1060×175 px banner to your liking.
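To preview how much of an oversized banner would survive, the cropping rule described above (top kept when too tall, left side kept when too wide) can be sketched as follows. This is only an approximation under a loud assumption: the image is scaled until it covers the target size and is then cropped from the top-left corner. The real thumbnailing code may round or resample differently, and `kept_region` is a hypothetical helper name.

```python
def kept_region(width: int, height: int, target=(1060, 175)):
    """Return (kept_width, kept_height) of the source image, in source pixels.

    Assumption: the image is scaled to cover `target`, then cropped starting
    at the top-left corner, so the kept region always begins at (0, 0).
    """
    target_w, target_h = target
    scale = max(target_w / width, target_h / height)
    kept_w = min(width, round(target_w / scale))
    kept_h = min(height, round(target_h / scale))
    return (kept_w, kept_h)
```

For a 2120×700 source this returns (2120, 350), meaning only the top half would be kept, which is why cropping to the correct aspect ratio beforehand gives you control over the result.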


Make a Subset

Options:

  • --subset: The sort order of the videos in the subset. Valid options are recent (most recent videos), popularity (most popular videos based on view count), or viewed-year (most viewed videos based on views per year). Default is recent.
  • --max-videos: The maximum number of videos to include in the subset. Cannot be used with --subset-size.
  • --subset-size: The maximum size of the subset in megabytes. Cannot be used with --max-videos.
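The three sort orders can be illustrated with a short sketch. It is not the fork's implementation: the record fields (`published`, `views`) and the function name `select_subset` are hypothetical, only the --max-videos limit is modeled, and views per year is computed here as the view count divided by the video's age in years.

```python
from datetime import date


def select_subset(videos, order="recent", max_videos=None, today=None):
    """Sort hypothetical video records and keep at most `max_videos`.

    Each record is a dict with `published` (datetime.date) and `views` (int).
    """
    today = today or date.today()

    def views_per_year(video):
        # Clamp the age to at least one day to avoid dividing by zero.
        age_years = max((today - video["published"]).days / 365.25, 1 / 365.25)
        return video["views"] / age_years

    keys = {
        "recent": lambda video: video["published"],
        "popularity": lambda video: video["views"],
        "viewed-year": views_per_year,
    }
    if order not in keys:
        raise ValueError(f"unknown order: {order}")
    ranked = sorted(videos, key=keys[order], reverse=True)
    return ranked if max_videos is None else ranked[:max_videos]
```

An older video with many views can rank first by popularity yet lower by viewed-year, which is the practical difference between those two options.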