A set of Python scripts for downloading data from Telegram: metadata for all dialogs (names, members, etc.) and the messages in those chats.
This repo consists of two parts:
- The telegram_data_downloader directory contains the main Python code that drives the scripts. A few words about its modules:
  - dict_types contains basic type definitions for the data structures used throughout the other modules.
  - loader contains files that manage reading and writing metadata and message data. Currently this data is exported to the local filesystem.
  - processor contains files that manage processing and downloading the data.
  - settings is responsible for initializing the settings and configuration for the downloader. It also contains some important constants, which are documented in the file itself.
- The scripts are the main entry points: they download the dialog metadata and the messages. There are two of them:
  - 0_download_dialogs_list.py downloads the metadata of all dialogs for the account. Run it with -h to see the available options.
  - 1_download_dialogs_data.py downloads all messages from the dialogs. Run it with -h to see the available options.
We strongly encourage you to read each script's help output and the settings file to understand the available options.
Requirements:
- Python 3.11 or newer
- pip
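To check an interpreter against the Python requirement before installing anything, a short script like the one below can be run with the interpreter you intend to use. It assumes the requirement corresponds to Poetry's caret constraint ^3.11 (at least 3.11, below 4.0); the satisfies_caret helper is illustrative only.

```python
import sys

# Assumption: the requirement is Poetry's caret constraint ^3.11, i.e. >=3.11,<4.0.
LOWER = (3, 11)
UPPER = (4, 0)


def satisfies_caret(version: tuple[int, int]) -> bool:
    """Return True if a (major, minor) version lies inside the ^3.11 range."""
    return LOWER <= version < UPPER


current = (sys.version_info.major, sys.version_info.minor)
status = "supported" if satisfies_caret(current) else "NOT supported"
print(f"Python {current[0]}.{current[1]} is {status} by this project")
```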
Installation:
- Clone the repository:
git clone <repo-url>
cd telegram-data-collection
- Install Poetry, the package manager used by the project:
python3.11 -m pip install poetry
- Install the dependencies:
poetry install --without=dev
If you want the virtual environment created in the project directory rather than in Poetry's default location (see the Poetry documentation on virtual environments), run:
POETRY_VIRTUALENVS_IN_PROJECT=true poetry install --without=dev
- Copy .env.sample to .env and fill in the required values:
cp .env.sample .env
For basic usage, you only need to fill in the API_ID and API_HASH values. These can be obtained from my.telegram.org.
NOTE: for detailed information on message download progress, set the LOG_LEVEL variable to "DEBUG". The logs then include per-chat download progress messages.
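- Activate the virtual environment:
poetry shell
Note: since Poetry 2.0, the shell command is no longer built in; install the poetry-plugin-shell plugin to keep using it, or activate the environment with eval $(poetry env activate) instead.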
- Run the scripts:
python 0_download_dialogs_list.py --dialogs-limit -1
python 1_download_dialogs_data.py --dialog-ids -1 --dialog-msg-limit -1
Note: if you pass dialog IDs and one of them is a negative chat ID, wrap the value in quotes and start it with a space, e.g. --dialog-ids " -1234567890".
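The configuration and run steps above can be chained in a small launcher. The sketch below is hypothetical: the file name run_all.py and the helpers read_env, missing_keys, dialog_id_arg, build_commands and main are not part of the project, and read_env is a deliberately minimal KEY=VALUE reader rather than full .env semantics. It assumes it is run from the repository root inside the activated virtual environment and that the scripts accept exactly the flags shown above. Because subprocess passes arguments without a shell, no quotes are needed, but the leading space for negative chat IDs is kept as the note advises.

```python
"""Hypothetical launcher (run_all.py): checks .env, then runs both scripts."""
import subprocess
import sys
from pathlib import Path

REQUIRED_KEYS = ("API_ID", "API_HASH")


def read_env(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE reader: skips blank lines and # comments, strips quotes."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required credentials that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def dialog_id_arg(dialog_id: int) -> str:
    """Prefix a negative chat ID with a space, as the README's note advises."""
    return f" {dialog_id}" if dialog_id < 0 else str(dialog_id)


def build_commands(dialog_ids: str = "-1", msg_limit: str = "-1") -> list[list[str]]:
    """The two README commands as argument lists (no shell, so no quoting needed)."""
    return [
        [sys.executable, "0_download_dialogs_list.py", "--dialogs-limit", "-1"],
        [
            sys.executable,
            "1_download_dialogs_data.py",
            "--dialog-ids",
            dialog_ids,
            "--dialog-msg-limit",
            msg_limit,
        ],
    ]


def main(dialog_ids: str = "-1") -> int:
    """Check credentials, then run both scripts in order; stop on the first failure."""
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found: copy .env.sample to .env first.", file=sys.stderr)
        return 1
    missing = missing_keys(read_env(env_path))
    if missing:
        print(f"Fill in {', '.join(missing)} in .env.", file=sys.stderr)
        return 1
    for command in build_commands(dialog_ids):
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return 0
```

To use it as a script, end the file with raise SystemExit(main()); to target a single chat, call main(dialog_id_arg(-1234567890)) instead.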
If you want to contribute to the project, please read the CONTRIBUTING.md file.
If you contributed to the project but are not listed as a contributor, please let us know.