refactor: separate telegram bot from server (#143)
gsarrco authored Oct 30, 2023
2 parents 0aa4150 + 64198c7 commit 24ad3ed
Showing 30 changed files with 238 additions and 165 deletions.
47 changes: 30 additions & 17 deletions README.md
@@ -1,23 +1,36 @@
MuoVErsi is a Telegram bot for advanced users of Venice, Italy's public transit. You can check it out
here: [@MuoVErsiBot](https://t.me/MuoVErsiBot).
MuoVErsi is a web service that parses and serves timetables of buses, trams, trains and waterbuses. As of now, it
supports Venice, Italy's public transit system (using public GTFS files) and Trenitalia trains within 100 km of
Venice (parsed from the Trenitalia API). Since it can build on any GTFS file, it can easily be extended to
other cities in the future.

It allows you to get departure times for the next buses, trams and vaporetti (waterbuses) from a given stop or
location, or starting from a specific line. You can then use filters to get the right results and see all the
stops/times of that specific route.
Separated from the core code and optional to set up, a Telegram bot uses the web service to provide a more user-friendly
interface. You can check it out here: [@MuoVErsiBot](https://t.me/MuoVErsiBot). Also, a mobile app is in the works.

## Infrastructure
## Features

The bot is written in Python 3 and uses
the [python-telegram-bot](https://github.com/python-telegram-bot/python-telegram-bot) library, both for interacting with
the Telegram Bot API and for the HTTP server.
The program downloads the data from the GTFS files of Actv, Venice's transit agency, and stores it in SQLite databases,
thanks to the [gtfs](https://www.npmjs.com/package/gtfs) CLI. New data is checked every time the
server service restarts, or every night at 4:00 AM with a cronjob.
MuoVErsi allows you to get departure times from a given stop or location, or starting from a specific line. You can then
use filters to get the right results and see all the stops/times of that specific route.

When new data arrives, stops are not simply stored in the database, but they are clustered by name and location. This
way it is easier to search for bus stations with more than one bus stop. For example, "Piazzale Roma" has 15
different bus stops from the GTFS file, but they are all clustered together.
When new data is parsed and saved to the database, stops are not simply stored as-is, but they are clustered
by name and location. This way it is easier to search for bus stations with more than one bus stop. For example,
"Piazzale Roma" has 15 different bus stops from the GTFS file, but they are all clustered together.
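Stop clustering can be pictured with a simplified sketch. It is not the project's `get_clusters_of_stops`: it assumes a hypothetical rule in which a stop's cluster key is its name minus a trailing platform code (so "Piazzale Roma B1" becomes "Piazzale Roma"), and stops sharing a key are grouped only when they lie within a hypothetical 500 m of the group's first stop. The `Stop` record and the coordinates in the usage lines are illustrative.

```python
import math
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    """Hypothetical minimal stop record (not the project's CStop)."""
    stop_id: str
    name: str
    lat: float
    lon: float


# Hypothetical rule: a trailing platform code such as "A", "B1" or "12" is dropped.
_PLATFORM_SUFFIX = re.compile(r"\s+(?:[A-Z]\d*|\d+)$")


def name_root(name: str) -> str:
    """Return the stop name without a trailing platform code."""
    return _PLATFORM_SUFFIX.sub("", name.strip())


def haversine_m(a: Stop, b: Stop) -> float:
    """Great-circle distance between two stops, in metres."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lon - a.lon)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def cluster_stops(stops: list[Stop], max_distance_m: float = 500.0) -> dict[str, list[Stop]]:
    """Group stops by name root, splitting same-named stops that are far apart."""
    by_root: dict[str, list[Stop]] = defaultdict(list)
    for stop in stops:
        by_root[name_root(stop.name)].append(stop)

    clusters: dict[str, list[Stop]] = {}
    for root, members in by_root.items():
        groups: list[list[Stop]] = []
        for stop in members:
            # Join the first group whose first stop is close enough, else start a new one.
            for group in groups:
                if haversine_m(group[0], stop) <= max_distance_m:
                    group.append(stop)
                    break
            else:
                groups.append([stop])
        for i, group in enumerate(groups):
            clusters[root if i == 0 else f"{root} ({i + 1})"] = group
    return clusters


stops = [
    Stop("1", "Piazzale Roma A", 45.4381, 12.3186),
    Stop("2", "Piazzale Roma B1", 45.4383, 12.3190),
    Stop("3", "Piazzale Roma 12", 45.4379, 12.3183),
    Stop("4", "Rialto", 45.4380, 12.3359),
]
clusters = cluster_stops(stops)
print({key: len(group) for key, group in clusters.items()})
# {'Piazzale Roma': 3, 'Rialto': 1}
```

The distance check matters because two unrelated stops can share a name root; keying only on the name would merge them.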

The code is not written specifically for Venice, so it can be easily adapted to other cities that use GTFS files.
## Installation

### Requirements

- Python 3
- PostgreSQL for the database
- [Typesense](https://typesense.org/) for the stop search engine
- [Telegram bot token](https://core.telegram.org/bots/features#botfather) if you also want to run the bot

### Steps

1. Download the repo and install the dependencies with `pip install -r requirements.txt`.
2. Fill out the config file `config.example.yaml` and rename it to `config.yaml`. If you don't want to run the Telegram
bot, set `TG_BOT_ENABLED` to `False` and skip all the variables starting with `TG_`. You won't need the `tgbot`
folder.
3. Run PostgreSQL migrations with `alembic upgrade head`.
4. Run the server by executing `run.py`. To save data from the GTFS files and, more importantly, to parse and
save Trenitalia trains, schedule `save_data.py` to run once a day. As of now, a daily restart of `run.py` is also
required to set the service calendar to the current day.
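Step 2 can be checked before the server starts. The sketch below is a hypothetical helper (`missing_config_keys` is not part of the project) that reports required keys that are absent or left empty. The TG_ list comes from the comments in `config.example.yaml`; treating the PostgreSQL keys and `TYPESENSE_API_KEY` as always required is an assumption based on the requirements list.

```python
from typing import Any, Mapping

# Assumption: always needed, since PostgreSQL and Typesense are listed as requirements.
BASE_REQUIRED_KEYS = ("PGUSER", "PGPASSWORD", "PGPORT", "PGHOST", "PGDATABASE", "TYPESENSE_API_KEY")
# Keys that config.example.yaml marks "required if TG_BOT_ENABLED is True".
TG_REQUIRED_KEYS = ("TG_TOKEN", "TG_WEBHOOK_URL", "TG_SECRET_TOKEN", "TG_ADMIN_ID")


def _is_blank(value: Any) -> bool:
    # PyYAML loads an empty entry such as `PGUSER:` as None.
    return value is None or (isinstance(value, str) and not value.strip())


def missing_config_keys(config: Mapping[str, Any]) -> list[str]:
    """Return the names of required keys that are absent or empty."""
    required = list(BASE_REQUIRED_KEYS)
    if config.get("TG_BOT_ENABLED"):
        required += TG_REQUIRED_KEYS
    return [key for key in required if _is_blank(config.get(key))]


example = {
    "PGUSER": "muoversi", "PGPASSWORD": "secret", "PGPORT": 5432,
    "PGHOST": "localhost", "PGDATABASE": "muoversi", "TYPESENSE_API_KEY": "xyz",
    "TG_BOT_ENABLED": True, "TG_TOKEN": None,
}
print(missing_config_keys(example))
# ['TG_TOKEN', 'TG_WEBHOOK_URL', 'TG_SECRET_TOKEN', 'TG_ADMIN_ID']
```

With `TG_BOT_ENABLED: False`, the TG_ keys are not checked at all, matching the instruction to skip them.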
4 changes: 2 additions & 2 deletions alembic/env.py
@@ -4,8 +4,8 @@
from sqlalchemy import engine_from_config
from sqlalchemy import pool

from MuoVErsi.base.models import Base
from MuoVErsi.handlers import engine_url
from server.base.models import Base
from server.sources import engine_url

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
@@ -7,7 +7,7 @@
"""
from typesense.exceptions import ObjectNotFound

from MuoVErsi.typesense import connect_to_typesense
from server.typesense import connect_to_typesense

# revision identifiers, used by Alembic.
revision = '2e8b9b6298f0'
Expand Down
2 changes: 1 addition & 1 deletion alembic/versions/6c9ef3a680e3_create_stops_table.py
@@ -9,7 +9,7 @@
from alembic import op
from sqlalchemy.orm import sessionmaker

from MuoVErsi.base import Station
from server.base import Station

# revision identifiers, used by Alembic.
revision = '6c9ef3a680e3'
9 changes: 5 additions & 4 deletions config.example.yaml
@@ -1,13 +1,14 @@
TOKEN:
WEBHOOK_URL:
SECRET_TOKEN:
TG_BOT_ENABLED: # True or False (if True, the bot will be used)
TG_TOKEN: # required if TG_BOT_ENABLED is True
TG_WEBHOOK_URL: # required if TG_BOT_ENABLED is True
TG_SECRET_TOKEN: # required if TG_BOT_ENABLED is True
DEV: # True or False
PGUSER:
PGPASSWORD:
PGPORT:
PGHOST:
PGDATABASE:
ADMIN_TG_ID: # Telegram user ID of the admin
TG_ADMIN_ID: # Telegram user ID of the admin, required if TG_BOT_ENABLED is True
SSL_KEYFILE: # Path to the SSL key file
SSL_CERTFILE: # Path to the SSL certificate file
TYPESENSE_API_KEY:
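The `DEV` flag decides how `run.py` binds the web server: loopback on port 8000 in development, all interfaces on port 443 with SSL otherwise. A small helper like the following (the name `uvicorn_settings` is hypothetical) could build the keyword arguments for `uvicorn.Config` and fail early when the SSL paths are missing. As far as I can tell, uvicorn only enables TLS when an SSL key or certificate file is given, so an empty `SSL_KEYFILE:` entry (loaded as `None`) would otherwise start a plain HTTP server on port 443.

```python
from typing import Any, Mapping


def uvicorn_settings(config: Mapping[str, Any]) -> dict[str, Any]:
    """Mirror the host/port/SSL choice made in run.py as keyword arguments."""
    if config.get("DEV", False):
        # Local development: plain HTTP on the loopback interface.
        return {"host": "127.0.0.1", "port": 8000}
    keyfile = config.get("SSL_KEYFILE")
    certfile = config.get("SSL_CERTFILE")
    if not keyfile or not certfile:
        # Refuse to serve on 443 without TLS instead of silently falling back.
        raise ValueError("SSL_KEYFILE and SSL_CERTFILE are required when DEV is false")
    return {"host": "0.0.0.0", "port": 443, "ssl_keyfile": keyfile, "ssl_certfile": certfile}


print(uvicorn_settings({"DEV": True}))
# {'host': '127.0.0.1', 'port': 8000}
```

It would be used as `uvicorn.Config(app=starlette_app, **uvicorn_settings(config))`; `host`, `port`, `ssl_keyfile` and `ssl_certfile` are the same `uvicorn.Config` arguments that `run.py` already passes.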
19 changes: 19 additions & 0 deletions config.py
@@ -0,0 +1,19 @@
import logging
import os

import yaml

logging.basicConfig(
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO
)
logger = logging.getLogger(__name__)

current_dir = os.path.abspath(os.path.dirname(__file__))

config_path = os.path.join(current_dir, 'config.yaml')
with open(config_path, 'r') as config_file:
    try:
        config = yaml.safe_load(config_file)
        logger.info(config)
    except yaml.YAMLError as err:
        logger.error(err)
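`config.py` logs the whole parsed file with `logger.info(config)`, which writes values such as `PGPASSWORD`, `TG_TOKEN` and `TYPESENSE_API_KEY` to the log. Also, when parsing fails the `except` branch only logs the error, so `config` is never bound and `from config import config` later fails with an `ImportError`; re-raising after logging would surface the real cause. The sketch below masks values whose key looks secret before logging. The marker list is a hypothetical heuristic that errs on the side of masking (it also hides `SSL_KEYFILE`, which is only a path).

```python
from typing import Any, Mapping

# Hypothetical heuristic: keys containing any of these fragments hold secrets.
SECRET_MARKERS = ("PASSWORD", "TOKEN", "KEY", "SECRET")


def redact(config: Mapping[str, Any]) -> dict[str, Any]:
    """Copy the config, replacing non-empty secret-looking values with '***'."""
    return {
        key: "***" if value and any(marker in key.upper() for marker in SECRET_MARKERS) else value
        for key, value in config.items()
    }


cfg = {"PGUSER": "muoversi", "PGPASSWORD": "hunter2", "TG_TOKEN": "123:abc", "DEV": True, "TG_SECRET_TOKEN": None}
print(redact(cfg))
# {'PGUSER': 'muoversi', 'PGPASSWORD': '***', 'TG_TOKEN': '***', 'DEV': True, 'TG_SECRET_TOKEN': None}
```

Empty values are left as `None` on purpose, so the log still shows which secrets were never set. In `config.py`, `logger.info(redact(config))` would replace `logger.info(config)`.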
54 changes: 52 additions & 2 deletions run.py
@@ -1,6 +1,56 @@
import asyncio
import logging

from MuoVErsi.handlers import main
import uvicorn
from starlette.applications import Starlette

from config import config
from server.routes import routes as server_routes

logging.basicConfig(
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO
)
logger = logging.getLogger(__name__)


async def run() -> None:
    routes = server_routes

    tgbot_application = None
    if config['TG_BOT_ENABLED']:
        from tgbot.handlers import set_up_application
        tgbot_application = await set_up_application()
        from tgbot.routes import get_routes as get_tgbot_routes
        routes += get_tgbot_routes(tgbot_application)

    starlette_app = Starlette(routes=routes)

    if config.get('DEV', False):
        webserver = uvicorn.Server(
            config=uvicorn.Config(
                app=starlette_app,
                port=8000,
                host="127.0.0.1",
            )
        )
    else:
        webserver = uvicorn.Server(
            config=uvicorn.Config(
                app=starlette_app,
                port=443,
                host="0.0.0.0",
                ssl_keyfile=config['SSL_KEYFILE'],
                ssl_certfile=config['SSL_CERTFILE']
            )
        )

    if tgbot_application:
        async with tgbot_application:
            await tgbot_application.start()
            await webserver.serve()
            await tgbot_application.stop()
    else:
        await webserver.serve()

if __name__ == "__main__":
    asyncio.run(main())
    asyncio.run(run())
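`run()` resembles the pattern in python-telegram-bot's custom webhook example: enter the `Application` context, `start()`, serve, then `stop()`. If `webserver.serve()` raises, though, `stop()` is skipped. The sketch below wraps serving in `try`/`finally` so `stop()` always runs before the context manager shuts the application down. `FakeApplication` and `serve_with_bot` are hypothetical stand-ins that only record the lifecycle calls, so the example runs without Telegram or uvicorn.

```python
import asyncio


class FakeApplication:
    """Hypothetical stand-in for the bot Application: records lifecycle calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []
        self.running = False

    async def __aenter__(self) -> "FakeApplication":
        self.calls.append("initialize")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.calls.append("shutdown")
        return False  # never swallow exceptions

    async def start(self) -> None:
        self.running = True
        self.calls.append("start")

    async def stop(self) -> None:
        self.running = False
        self.calls.append("stop")


async def serve_with_bot(app, serve) -> None:
    async with app:
        await app.start()
        try:
            await serve()
        finally:
            # Runs whether serve() returns normally or raises.
            await app.stop()


async def failing_serve() -> None:
    raise OSError("port already in use")


app = FakeApplication()
try:
    asyncio.run(serve_with_bot(app, failing_serve))
except OSError:
    pass
print(app.calls)  # ['initialize', 'start', 'stop', 'shutdown']
```

Separately, `routes = server_routes` followed by `routes += ...` extends the `routes` list defined in `server/routes.py` in place, because `+=` on a list mutates it; `routes = server_routes + get_tgbot_routes(tgbot_application)` would leave the imported list untouched.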
8 changes: 4 additions & 4 deletions save_data.py
@@ -2,10 +2,10 @@

from sqlalchemy.orm import sessionmaker

from MuoVErsi.handlers import engine
from MuoVErsi.trenitalia import Trenitalia
from MuoVErsi.GTFS import GTFS
from MuoVErsi.typesense import connect_to_typesense
from server.GTFS import GTFS
from server.sources import engine
from server.trenitalia import Trenitalia
from server.typesense import connect_to_typesense

logging.basicConfig(
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO
File renamed without changes.
File renamed without changes.
File renamed without changes.
7 changes: 3 additions & 4 deletions MuoVErsi/GTFS/source.py → server/GTFS/source.py
@@ -12,14 +12,13 @@

import requests
from bs4 import BeautifulSoup
from sqlalchemy import select, func
from tqdm import tqdm

from MuoVErsi.base import Source, Station, Stop, TripStopTime
from server.base import Source, Station, Stop, TripStopTime
from .clustering import get_clusters_of_stops, get_loc_from_stop_and_cluster
from .models import CStop

from sqlalchemy import select, func

logging.basicConfig(
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO
)
@@ -310,7 +309,7 @@ def get_sqlite_stop_times(self, day: date, start_time: time, end_time: time, lim

    def search_lines(self, name):
        today = date.today()
        from MuoVErsi.base import Trip
        from server.base import Trip
        trips = self.session.execute(
            select(func.max(Trip.number), Trip.dest_text)\
            .filter(Trip.orig_dep_date == today)\
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion MuoVErsi/base/source.py → server/base/source.py
@@ -366,7 +366,7 @@ def get_stop_times_between_stops(self, dep_station: Station, arr_station: Statio
raw_stop_time.destination, raw_stop_time.trip_id, raw_stop_time.route_name,
arr_time=a_arr_time, orig_dep_date=raw_stop_time.orig_dep_date)

from MuoVErsi.trenitalia import TrenitaliaRoute
from server.trenitalia import TrenitaliaRoute
route = TrenitaliaRoute(d_stop_time, a_stop_time)
directions.append(Direction([route]))

File renamed without changes.
File renamed without changes.
31 changes: 31 additions & 0 deletions server/routes.py
@@ -0,0 +1,31 @@
from sqlalchemy import text
from starlette.requests import Request
from starlette.responses import Response
from starlette.routing import Route

from server.sources import sources


async def home(request: Request) -> Response:
    text_response = '<html>'

    try:
        sources['treni'].session.execute(text('SELECT 1'))
    except Exception:
        return Response(status_code=500)
    else:
        text_response += '<p>Postgres connection OK</p>'

    text_response += '<ul>'
    for source in sources.values():
        if hasattr(source, 'gtfs_version'):
            text_response += f'<li>{source.name}: GTFS v.{source.gtfs_version}</li>'
        else:
            text_response += f'<li>{source.name}</li>'
    text_response += '</ul></html>'
    return Response(text_response)


routes = [
    Route("/", home)
]
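The `home` health check builds its HTML by string concatenation. The sketch below separates that rendering into a pure function so it can be tested without a database, and escapes names with `html.escape`. `render_status` and the `SourceInfo` stand-in (which uses `None` instead of a missing `gtfs_version` attribute) are hypothetical. In Starlette, the result could be returned with `HTMLResponse(body, status_code=status)`, which also sets the `text/html` content type that a plain `Response` without `media_type` omits.

```python
import html
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SourceInfo:
    """Hypothetical stand-in for a source: a name plus an optional GTFS version."""
    name: str
    gtfs_version: Optional[int] = None


def render_status(sources: Iterable[SourceInfo], db_ok: bool) -> tuple[int, str]:
    """Return (HTTP status, HTML body) for the health-check page."""
    if not db_ok:
        return 500, "<html><p>Postgres connection FAILED</p></html>"
    items = []
    for source in sources:
        label = html.escape(source.name)
        if source.gtfs_version is not None:
            label += f": GTFS v.{html.escape(str(source.gtfs_version))}"
        items.append(f"<li>{label}</li>")
    return 200, "<html><p>Postgres connection OK</p><ul>" + "".join(items) + "</ul></html>"


status, body = render_status([SourceInfo("automobilistico", 2), SourceInfo("treni")], db_ok=True)
print(status)  # 200
```

One caveat worth checking against the real session: after a failed statement, PostgreSQL rejects further commands in the same transaction until it is rolled back, so calling `session.rollback()` in the `except` branch may keep one failure from turning every later health check into a 500.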
23 changes: 23 additions & 0 deletions server/sources.py
@@ -0,0 +1,23 @@
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from config import config
from server.GTFS import GTFS
from server.trenitalia import Trenitalia
from server.typesense import connect_to_typesense

engine_url = f"postgresql://{config['PGUSER']}:{config['PGPASSWORD']}@{config['PGHOST']}:{config['PGPORT']}/" \
f"{config['PGDATABASE']}"
engine = create_engine(engine_url)

session = sessionmaker(bind=engine)()
typesense = connect_to_typesense()

sources = {
'aut': GTFS('automobilistico', '🚌', session, typesense, dev=config.get('DEV', False)),
'nav': GTFS('navigazione', '⛴️', session, typesense, dev=config.get('DEV', False)),
'treni': Trenitalia(session, typesense)
}

for source in sources.values():
    source.sync_stations_typesense(source.get_source_stations())
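`server/sources.py` builds the database URL with an f-string, so a password containing characters such as `@`, `:` or `/` would produce an invalid URL. SQLAlchemy's documentation suggests escaping such values with `urllib.parse.quote_plus`; `sqlalchemy.engine.URL.create(...)` is an alternative that avoids manual escaping. A minimal sketch, with the hypothetical function name `build_engine_url`:

```python
from typing import Any, Mapping
from urllib.parse import quote_plus


def build_engine_url(config: Mapping[str, Any]) -> str:
    """Build the PostgreSQL URL, percent-encoding the credentials."""
    user = quote_plus(str(config["PGUSER"]))
    password = quote_plus(str(config["PGPASSWORD"]))
    return (
        f"postgresql://{user}:{password}@{config['PGHOST']}:{config['PGPORT']}/"
        f"{config['PGDATABASE']}"
    )


url = build_engine_url({
    "PGUSER": "muoversi", "PGPASSWORD": "p@ss:word/1",
    "PGHOST": "localhost", "PGPORT": 5432, "PGDATABASE": "muoversi",
})
print(url)  # postgresql://muoversi:p%40ss%3Aword%2F1@localhost:5432/muoversi
```

Note also that `server/sources.py` opens the database session, connects to Typesense and calls `sync_stations_typesense` at import time, so anything that imports it, including `alembic/env.py` for `engine_url`, triggers those side effects.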
File renamed without changes.
4 changes: 2 additions & 2 deletions MuoVErsi/trenitalia/source.py → server/trenitalia/source.py
@@ -1,12 +1,12 @@
import json
import math
import os
from pytz import timezone

import requests
from pytz import timezone
from tqdm import tqdm

from MuoVErsi.base import *
from server.base import *

logging.basicConfig(
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO
File renamed without changes.
3 changes: 2 additions & 1 deletion tests/test_db.py
@@ -1,7 +1,8 @@
from datetime import date, datetime, time

import pytest

from MuoVErsi.GTFS import GTFS, get_clusters_of_stops, CCluster, CStop
from server.GTFS import GTFS, get_clusters_of_stops, CCluster, CStop


@pytest.fixture
2 changes: 1 addition & 1 deletion tests/test_gtfs_clustering.py
@@ -1,6 +1,6 @@
import pytest

from MuoVErsi.GTFS.clustering import get_root_from_stop_name, get_loc_from_stop_and_cluster
from server.GTFS.clustering import get_root_from_stop_name, get_loc_from_stop_and_cluster


@pytest.fixture
Empty file added tgbot/__init__.py