Telegram data collector v0.02

A collection of Python scripts for downloading data from Telegram: metadata for all dialogs (names, members, etc.) and the messages from those chats.

Structure

This repo consists of two parts:

  1. Main repo Python code

    The telegram_data_downloader directory contains the main Python code that drives the scripts.

    A few words about the underlying modules:

    1. dict_types contains basic type definitions for the data structures used throughout the other modules.

    2. loader manages reading and writing dialog metadata and message data. Currently this data is exported to the local filesystem.

    3. processor manages the processing and downloading of the data.

    4. settings initializes the configuration for the downloader. It also defines some important constants, which are documented in the file itself.

  2. Scripts

    These scripts are the main entry points; they download dialog metadata and messages.

    There are two scripts:

    1. 0_download_dialogs_list.py

      This script downloads the metadata of all dialogs for the account. Run with -h to see the available options.

    2. 1_download_dialogs_data.py

      This script downloads all messages from the dialogs. Run with -h to see the available options.

    We strongly encourage you to read the scripts' help output and review the settings file to understand the available options. A conceptual sketch of what the two scripts do is shown below.
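
For orientation, here is a minimal, self-contained sketch of the same idea. It is not the project's code: it assumes the Telethon client library (a common choice for this kind of tool) and uses made-up session and output file names; the real logic lives in the two scripts and the telegram_data_downloader package.

    # Illustrative sketch only -- not the project's actual implementation.
    # Assumes the Telethon library; session and file names are placeholders.
    import asyncio
    import json
    import os

    from telethon import TelegramClient


    async def dump_dialog_messages(client, dialog, msg_limit):
        """Roughly what 1_download_dialogs_data.py does for a single dialog."""
        messages = []
        async for msg in client.iter_messages(dialog, limit=msg_limit):
            messages.append(
                {"id": msg.id, "date": str(msg.date), "sender_id": msg.sender_id, "text": msg.text}
            )
        with open(f"dialog_{dialog.id}_messages.json", "w", encoding="utf-8") as f:
            json.dump(messages, f, ensure_ascii=False, indent=2)


    async def main():
        # Same credentials the project reads from .env (see Installation below).
        client = TelegramClient("example_session", int(os.environ["API_ID"]), os.environ["API_HASH"])
        metadata = []
        async with client:  # prompts for an interactive Telegram login on first run
            # Roughly what 0_download_dialogs_list.py does: collect dialog metadata.
            async for dialog in client.iter_dialogs(limit=10):  # limits are illustrative
                metadata.append({"id": dialog.id, "name": dialog.name, "is_group": dialog.is_group})
                await dump_dialog_messages(client, dialog, msg_limit=100)
        with open("dialogs_metadata.json", "w", encoding="utf-8") as f:
            json.dump(metadata, f, ensure_ascii=False, indent=2)


    if __name__ == "__main__":
        asyncio.run(main())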

Requirements

  • Python ^3.11
  • pip

Installation

  1. Clone the repository

    git clone <repo-url>
    cd telegram-data-collection
  2. Install Poetry, the package manager used by the project

    python3.11 -m pip install poetry
  3. Install dependencies

    poetry install --without=dev

    If you want to create the virtual environment in the current directory rather than in Poetry's default location (see the Poetry documentation for details), you can run:

    POETRY_VIRTUALENVS_IN_PROJECT=true poetry install --without=dev
  4. Copy .env.sample to .env and fill in the required values

    cp .env.sample .env

    For basic usage, you only need to fill in the API_ID and API_HASH values. These can be obtained from my.telegram.org.

    NOTE: for detailed information on message downloading progress, set the LOG_LEVEL variable to DEBUG; the logs will then include per-chat download progress.
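
    For reference, a filled-in .env might look like the following (the values are placeholders, and .env.sample may define additional variables):

      # Credentials obtained from https://my.telegram.org (placeholder values)
      API_ID=1234567
      API_HASH=0123456789abcdef0123456789abcdef
      # Optional: DEBUG enables per-chat download progress in the logs
      LOG_LEVEL=DEBUG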

Usage

  1. Activate the virtual environment

    poetry shell
  2. Run the scripts

    python 0_download_dialogs_list.py --dialogs-limit -1
    python 1_download_dialogs_data.py --dialog-ids -1 --dialog-msg-limit -1

    Note: if you pass dialog IDs and need to enter a negative chat ID, wrap the value in quotes and add a leading space, e.g. --dialog-ids " -1234567890", so that the leading minus sign is not interpreted as a command-line option.

Makefile Commands

This project uses a Makefile to simplify common development tasks. Below is a description of the available commands and their purposes:

Setup

make setup
  • Installs Poetry if not already installed.
  • Configures a local virtual environment within the project directory.
  • Installs all required dependencies, including development dependencies, as defined in pyproject.toml.
  • Skips installing the project itself (--no-root).

Test

make test
  • Runs all test cases using pytest.
  • Ensures the PYTHONPATH is set to the current directory for correct module imports.

Coverage

make coverage
  • Runs tests with pytest while generating a code coverage report.
  • Includes a detailed report of any lines or branches missing coverage.

Ruff

make ruff
  • Runs Ruff to lint the codebase and check for style violations.

Pylint

make pylint
  • Runs Pylint on the telegram_data_downloader module.

Notes

  • To run any command, simply type make <command> in the terminal.
  • Ensure Python 3.11 and make are installed on your system before running these commands.

GitHub Actions

To keep the codebase in a good state, this project uses GitHub Actions to run tests, linting, and code coverage checks on every push to the repository. The status of these checks can be seen in the "Actions" tab of the repository. Keep in mind that a successful GitHub Actions run is required before a PR can be merged.

Contributing

If you want to contribute to the project, please read the CONTRIBUTING.md file.

If you were part of the project and aren't listed as a contributor, please let us know.
