* Added facilities pull skeleton
* Added remaining synchronization logic and scheduler
* Added README
* Added readme and sigeca client
* Added API
* Removed codespace
* Synchronization of requirements added
Commit a0ecfad (1 parent: 10c7068). Showing 33 changed files with 2,349 additions and 11 deletions.
Dockerfile
@@ -0,0 +1,68 @@
# Use the official Python image from Docker Hub
FROM python:3.10-slim

# Set environment variables to avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Update and install dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    apt-transport-https \
    ca-certificates \
    curl \
    wget \
    gnupg \
    libc-dev \
    gcc \
    software-properties-common \
    libpq-dev \
    default-jdk \
    default-jre \
    file && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Set environment variables
ENV PYTHONPATH=/app

# Copy application code
COPY . /app

# Spark installation
# NOTE: these ARGs are currently unused; the download step below pins Spark 3.4.3 directly,
# so the version, Hadoop version, and checksum declared here are never applied.
ARG spark_version="3.0.1"
ARG hadoop_version="3.2"
ARG spark_checksum="E8B47C5B658E0FBC1E57EEA06262649D8418AE2B2765E44DA53AAF50094877D17297CC5F0B9B35DF2CEEF830F19AA31D7E56EAD950BBE7F8830D6874F88CFC3C"
ARG openjdk_version="11"

ENV APACHE_SPARK_VERSION="${spark_version}" \
    HADOOP_VERSION="${hadoop_version}"

WORKDIR /tmp
# Download Spark from the Apache CDN
# hadolint ignore=SC2046
RUN wget -q https://dlcdn.apache.org/spark/spark-3.4.3/spark-3.4.3-bin-hadoop3.tgz
RUN tar xzf "spark-3.4.3-bin-hadoop3.tgz" -C /usr/local --owner root --group root --no-same-owner && \
    rm "spark-3.4.3-bin-hadoop3.tgz"

WORKDIR /usr/local

# Configure Spark
ENV SPARK_HOME=/usr/local/spark
ENV SPARK_OPTS="--driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info" \
    PATH=$PATH:$SPARK_HOME/bin

RUN ln -s "spark-3.4.3-bin-hadoop3" spark

WORKDIR /app

# Expose any necessary ports (e.g., for the Spark UI)
EXPOSE 4040
# NOTE: `unset` in a RUN layer does not persist; SPARK_HOME set via ENV above remains at runtime
RUN unset SPARK_HOME
# Define default command
CMD ["python", "main.py", "--run-mode", "continuous"]
README.md
@@ -0,0 +1,151 @@
# Sigeca Data Export Microservice

This microservice synchronizes data between a local database and an external API using Apache Spark. It supports both continuous synchronization and one-time integration.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Setup](#setup)
- [Configuration](#configuration)
- [Building the Docker Image](#building-the-docker-image)
- [Running the Application](#running-the-application)
- [Troubleshooting](#troubleshooting)
- [Logs](#logs)
- [Acknowledgements](#acknowledgements)
## Prerequisites

### Docker

- Docker and Docker Compose installed on your system
- An external database (e.g., PostgreSQL) accessible from your Docker network

### Local run

- Python 3.10
- Java Runtime Environment (JRE) installed
- Apache Hadoop and Apache Spark
- An external database (e.g., PostgreSQL) accessible from your machine
## Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/OpenLMIS-Angola/sigeca-synchronization.git
   cd sigeca-synchronization/sigeca_data_import_microservice
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install the requirements:

   ```bash
   pip install -r requirements.txt
   ```
## Configuration

Create the `config.json` file with your specific settings. It can be created based on the provided `config_example.json`:

```json5
{
    "open_lmis_api": { // Used for sending the new entries to the LMIS database
        "api_url": "https://openlmisapi.example.org/api/", // URL of the API endpoint
        "username": "lmis_user", // Authorized user
        "password": "password", // Authorized user password
        "login_token": "dSFdoi1fb4l6bn16YxhgbxhlbdU=" // Basic token value taken from the client request to the server
    },
    "sigeca_api": {
        "api_url": "http://exampleapisigeca.org/api", // Endpoint used for fetching the source of truth for facilities
        "headers": { // Headers used for the synchronization
            "ContentType": "application/json"
        },
        "credentials": { // Credentials used for user authorization
            "username": "username",
            "password": "password"
        },
        "skip_verification": false // Skip SSL certificate validation, USE FOR TESTS ONLY
    },
    "database": { // DB connection used for validating existing facilities in the ORM
        "username": "db_user",
        "password": "db_passwd",
        "host": "localhost",
        "port": 5432,
        "database": "open_lmis"
    },
    "jdbc_reader": { // PySpark connection details for data validation
        "jdbc_url": "jdbc:postgresql://dbserver.example.org:5432/open_lmis", // Points to the db behind open_lmis_api
        "jdbc_user": "db_user", // DB user
        "jdbc_password": "db_passwd", // DB password
        "jdbc_driver": "org.postgresql.Driver", // Default driver
        "log_level": "WARN", // Log level for Spark operations
        "ssh_host": "sshOptionalHost", // SSH server used when tunneling is required to connect to the db
        "ssh_port": 22, // Port for the SSH connection
        "ssh_user": "ubuntu", // SSH user
        "ssh_private_key_path": "./private_key", // Relative path to the RSA private key
        "remote_bind_address": "dbserver.example.org", // Database address as seen from the SSH server
        "remote_bind_port": 5432, // Database port on the remote side
        "local_bind_port": 5559 // Port bound on localhost
    },
    "sync": {
        "interval_minutes": 5 // Job interval in minutes
    }
}
```
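The `jdbc_reader` block suggests the service opens an SSH tunnel before Spark reads from the database. Below is a minimal sketch of that flow, assuming the `sshtunnel` package, the PostgreSQL JDBC driver on the Spark classpath, a comment-free `config.json`, and `referencedata.facilities` as a stand-in table name; the actual reader implementation is not shown in this commit:

```python
import json
from sshtunnel import SSHTunnelForwarder
from pyspark.sql import SparkSession

with open("config.json") as f:
    cfg = json.load(f)["jdbc_reader"]

# Forward localhost:local_bind_port to the database host reachable from the SSH server
tunnel = SSHTunnelForwarder(
    (cfg["ssh_host"], cfg["ssh_port"]),
    ssh_username=cfg["ssh_user"],
    ssh_pkey=cfg["ssh_private_key_path"],
    remote_bind_address=(cfg["remote_bind_address"], cfg["remote_bind_port"]),
    local_bind_address=("127.0.0.1", cfg["local_bind_port"]),
)
tunnel.start()

spark = SparkSession.builder.appName("sigeca-sync").getOrCreate()
spark.sparkContext.setLogLevel(cfg["log_level"])

# Read through the tunnel; the table name here is an assumption for illustration
facilities = (
    spark.read.format("jdbc")
    .option("url", f"jdbc:postgresql://127.0.0.1:{cfg['local_bind_port']}/open_lmis")
    .option("dbtable", "referencedata.facilities")
    .option("user", cfg["jdbc_user"])
    .option("password", cfg["jdbc_password"])
    .option("driver", cfg["jdbc_driver"])
    .load()
)
facilities.show(5)
tunnel.stop()
```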
## Running the Application

### Continuous Synchronization

To run the application continuously using Docker Compose:

```bash
docker-compose run app python main.py --run-mode continuous
```

### One-time Integration

To perform a one-time integration, run the application with the `one-time` argument:

```bash
docker-compose run app python main.py --run-mode one-time
```

This runs a single task that synchronizes all available data with the external system.
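`main.py` itself is not included in this diff; as a hypothetical illustration only, the `--run-mode` flag could be wired with `argparse` along these lines:

```python
import argparse


def parse_args():
    # --run-mode selects between the scheduler loop and a single synchronization pass
    parser = argparse.ArgumentParser(description="Sigeca data synchronization")
    parser.add_argument(
        "--run-mode",
        choices=["continuous", "one-time"],
        default="continuous",
    )
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    if args.run_mode == "continuous":
        pass  # start FacilitySyncScheduler (see scheduler.py below)
    else:
        pass  # call the sync service once and exit
```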
## Troubleshooting

### Common Issues

1. **Network Issues**:
   - Ensure the Docker network is set up properly if the application runs from a different docker-compose project than OpenLMIS.
2. **Configuration Errors**:
   - Double-check your `config.json` file for accuracy, especially the database connection details (a quick sanity check is sketched below).
3. **Dependency Issues**:
   - If you encounter issues with Java or Hadoop dependencies, ensure they are correctly installed and the URLs are correct.
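For the configuration item above, a small snippet can verify that the top-level sections shown in the configuration example are present, assuming `config.json` sits in the working directory:

```python
import json

REQUIRED_SECTIONS = ["open_lmis_api", "sigeca_api", "database", "jdbc_reader", "sync"]

with open("config.json") as f:
    # The real file must be plain JSON, without the comments used in the json5 example
    cfg = json.load(f)

missing = [key for key in REQUIRED_SECTIONS if key not in cfg]
if missing:
    raise SystemExit(f"config.json is missing sections: {missing}")
print("All top-level sections present.")
```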
### Logs

Logs are stored in the `logs` volume. You can access them for debugging and monitoring purposes:

```bash
docker-compose logs app
```

## Acknowledgements

- [Apache Spark](https://spark.apache.org/)
- [Apache Hadoop](http://hadoop.apache.org/)
- [Docker](https://www.docker.com/)
- [OpenLMIS-Angola](https://github.com/OpenLMIS-Angola)

For any questions or issues, please open an issue on the [GitHub repository](https://github.com/OpenLMIS-Angola/sigeca-synchronization/issues).
Empty file.
Empty file.
sigeca_data_import_microservice/app/application/scheduler.py
29 additions & 0 deletions
@@ -0,0 +1,29 @@
from .synchronization.facilities import FacilitySynchronizationService
import logging
from apscheduler.schedulers.background import BlockingScheduler


class FacilitySyncScheduler:
    def __init__(
        self,
        sync_service: FacilitySynchronizationService,
        interval: int
    ):
        self.sync_service = sync_service
        self.sync_interval_minutes = interval
        self.scheduler = BlockingScheduler()

    def start(self):
        self.scheduler.add_job(
            self.run_sync, "interval", minutes=self.sync_interval_minutes
        )
        self.scheduler.start()

    def stop(self):
        self.scheduler.shutdown()

    def run_sync(self):
        try:
            self.sync_service.synchronize_facilities()
        except Exception as e:
            logging.exception(f"Synchronization job failed. Error: {e}")
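A hedged usage sketch for the scheduler above; the constructor arguments of `FacilitySynchronizationService` are not shown in this commit, so its instantiation is elided:

```python
# Hypothetical wiring; interval mirrors sync.interval_minutes from config.json
from app.application.scheduler import FacilitySyncScheduler
from app.application.synchronization.facilities import FacilitySynchronizationService

sync_service = FacilitySynchronizationService(...)  # dependencies elided; not shown in this diff
scheduler = FacilitySyncScheduler(sync_service, interval=5)  # minutes between runs

try:
    scheduler.start()  # BlockingScheduler.start() blocks the calling thread
except (KeyboardInterrupt, SystemExit):
    scheduler.stop()
```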
Empty file.