
Python scripts for downloading dialog metadata and messages from Telegram using Telethon


Hukyl/telegram-data-collection

 
 


Telegram data collector v0.02

A set of Python scripts for downloading data from Telegram: metadata for all dialogs (names, members, etc.) and the messages in those chats.

Structure

This repo consists of two parts:

  1. Main repo Python code

    The telegram_data_downloader directory contains the main Python code that drives the scripts.

    A few words about the underlying modules:

    1. dict_types contains basic type definitions for the data structures that are used throughout other modules.

    2. loader contains files that manage reading and writing metadata and message data. Currently, this data is exported to the local filesystem.

    3. processor contains files that manage processing and downloading the data.

    4. settings is responsible for initializing the settings and configuration of the downloader. It also defines several important constants, which are documented in the file itself.

  2. Scripts

    These scripts are the main entry points: they download dialog metadata and messages.

    There are two scripts:

    1. 0_download_dialogs_list.py

      This script downloads the metadata of all dialogs for the account. Run with -h to see the available options.

    2. 1_download_dialogs_data.py

      This script downloads all messages from the dialogs. Run with -h to see the available options.

    We strongly encourage you to read each script's help output and the settings file to understand the available options.

Requirements

  • Python 3.11 or newer
  • pip

Installation

  1. Clone the repository

    git clone <repo-url>
    cd telegram-data-collection
  2. Install Poetry, the package manager used by the project

    python3.11 -m pip install poetry
  3. Install dependencies

    poetry install --without=dev

    If you want to create the virtual environment in the project directory rather than in Poetry's default location (see the Poetry documentation on managing environments), run:

    POETRY_VIRTUALENVS_IN_PROJECT=true poetry install --without=dev
  4. Copy .env.sample to .env and fill in the required values

    cp .env.sample .env

    For basic usage, you only need to fill in the API_ID and API_HASH values. These can be obtained from my.telegram.org.

    NOTE: for detailed information on message download progress, set the LOG_LEVEL variable to DEBUG. The logs will then include per-chat download progress messages.
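The sketch below shows one way the .env values described above (API_ID, API_HASH, LOG_LEVEL) could be turned into usable settings. It is an assumption, not the project's settings module, which may rely on a library such as python-dotenv instead of the hand-written parser here; parse_env_file, load_credentials and configure_logging are hypothetical names. API_ID is converted to an int because Telethon's TelegramClient expects a numeric API ID and a string API hash.

```python
import logging


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. API_HASH="abc"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_credentials(env: dict[str, str]) -> tuple[int, str]:
    """Return (api_id, api_hash); api_id must be numeric."""
    return int(env["API_ID"]), env["API_HASH"]


def configure_logging(env: dict[str, str]) -> int:
    """Set the root logger level from LOG_LEVEL (default INFO) and return it."""
    level = logging.getLevelName(env.get("LOG_LEVEL", "INFO").upper())
    if not isinstance(level, int):
        # Unknown names come back as the string "Level <name>"; fall back to INFO
        level = logging.INFO
    logging.basicConfig(level=level)
    logging.getLogger().setLevel(level)
    return level
```

With LOG_LEVEL=DEBUG, debug-level messages such as per-chat progress lines would pass through the root logger instead of being filtered out.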

Usage

  1. Activate the virtual environment

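    poetry shell

    Note: Poetry 2.0 and later no longer include the shell command by default. Either install the poetry-plugin-shell plugin, or prefix each command with poetry run (e.g. poetry run python 0_download_dialogs_list.py).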
  2. Run the scripts

    python 0_download_dialogs_list.py --dialogs-limit -1
    python 1_download_dialogs_data.py --dialog-ids -1 --dialog-msg-limit -1

    Note: if you provide dialog IDs and one of them is a negative chat ID, wrap the value in quotes and add a space at the start: " <your values>". For example: --dialog-ids " -1234567890".
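Why the leading space helps can be illustrated with Python's argparse, which treats an argument whose first character is "-" as a possible option flag. Depending on the Python version and the parser's options, a value such as a comma-separated list of negative IDs may be mistaken for a flag, and argparse then reports that --dialog-ids is missing its argument. A value starting with a space does not begin with "-", so it is always taken as the option's value. This is a sketch under assumptions: it assumes the script uses argparse, and the comma-separated format plus the names parse_dialog_ids and build_parser are hypothetical.

```python
import argparse


def parse_dialog_ids(value: str) -> list[int]:
    """Split a comma-separated list of IDs; int() ignores surrounding spaces."""
    return [int(part) for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: the real script's options and formats may differ.
    parser = argparse.ArgumentParser(description="Download messages from dialogs")
    parser.add_argument("--dialog-ids", type=parse_dialog_ids, default=[-1])
    parser.add_argument("--dialog-msg-limit", type=int, default=-1)
    return parser


if __name__ == "__main__":
    # Equivalent to: --dialog-ids " -1234567890,-42" typed in a shell.
    # The leading space stops argparse from reading the value as a flag.
    args = build_parser().parse_args(["--dialog-ids", " -1234567890,-42"])
    print(args.dialog_ids)
```

The quotes matter as much as the space: they make the shell pass " -1234567890" as a single argument with the space intact.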

Contributing

If you want to contribute to the project, please read the CONTRIBUTING.md file.

If you were part of the project and aren't listed as a contributor, please let us know.
