Services
Currently, epnoi is composed of the following modules:
- api: deploys a web interface that lets users perform operations on the system
- hoarder: downloads documents to be added to the system
- harvester: extracts text and meta-information from the documents
- learner: identifies relevant terms and relations, and creates ontologies from the text
- modeler: creates internal models to represent and categorize the documents
- comparator: measures the similarity between documents according to the model created
Each of them has a different purpose and works in parallel with the rest.
The system has been designed following the basic principles of the Microservices Architecture pattern. The main purpose here is not only to have multiple public services, but also to allow building a scalable system in a scalable way.
As for building in a scalable way, services are developed and deployed independently of one another. Each has its own life cycle and can run in an isolated environment.
As for the scalable system itself, the architecture applies the Scale Cube model. It enables x-axis scaling, i.e. running multiple copies of a service; y-axis scaling, i.e. decomposing the system into services; and z-axis scaling, i.e. running multiple instances of a deployment, each with a different corpus.
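The three axes can be pictured as a lookup in three steps. The following is a minimal sketch, not epnoi's actual code: the class `ScaleCubeRouter`, the `route` method, the corpus names and the host URLs are all hypothetical, and round-robin is just one possible way to spread requests over identical copies.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router illustrating the Scale Cube axes; not part of epnoi. */
final class ScaleCubeRouter {

    // y-axis: service name -> z-axis: corpus -> x-axis: identical copies (endpoints)
    private final Map<String, Map<String, List<String>>> endpoints;
    // One round-robin counter per (service, corpus) pair
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    ScaleCubeRouter(Map<String, Map<String, List<String>>> endpoints) {
        this.endpoints = endpoints;
    }

    /** Picks the endpoint that should serve a request for the given service and corpus. */
    String route(String service, String corpus) {
        Map<String, List<String>> byCorpus = endpoints.get(service);   // y-axis
        if (byCorpus == null) {
            throw new IllegalArgumentException("Unknown service: " + service);
        }
        List<String> copies = byCorpus.get(corpus);                     // z-axis
        if (copies == null || copies.isEmpty()) {
            throw new IllegalArgumentException(
                "No deployment of " + service + " for corpus " + corpus);
        }
        AtomicInteger counter =
            counters.computeIfAbsent(service + "/" + corpus, key -> new AtomicInteger());
        int index = Math.floorMod(counter.getAndIncrement(), copies.size());
        return copies.get(index);                                       // x-axis
    }

    public static void main(String[] args) {
        // Assumed layout: two copies of the comparator for one corpus, one for another.
        ScaleCubeRouter router = new ScaleCubeRouter(Map.of(
            "comparator", Map.of(
                "patents", List.of("http://host-a:8080", "http://host-b:8080"),
                "papers", List.of("http://host-c:8080"))));

        System.out.println(router.route("comparator", "patents")); // first copy
        System.out.println(router.route("comparator", "patents")); // second copy
        System.out.println(router.route("comparator", "papers"));  // other corpus
    }
}
```

Adding a copy extends the innermost list, adding a corpus adds a key to the middle map, and adding a service adds a key to the outer map, so each axis can grow without touching the other two.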
We have followed these approaches to implement our services:
- internal library: strictly speaking, this may not be considered a service. Here, the service is a Java library added to the same classpath as its clients, i.e. running in the same Java Virtual Machine (JVM) instance. This is the first step towards creating a service. It emerges when many different clients require the same functionality.
  - pros: speed. Because it runs in the same JVM instance, objects can be passed by reference, avoiding any serialization/deserialization.
  - cons: dependency. Memory consumption, CPU consumption, classpath, etc. are shared between the service and its clients. A service may have a particular behavior profile but, running as a library, it depends on the behavior profile of the whole group of components running in the same JVM instance. Dependency conflicts may also occur because of the shared classpath: common libraries must have the same version for all components.
- internal resource: this is the next step. Now the service runs in a different JVM instance with an isolated classpath. This way, it can tune its own resource consumption profile and can even be deployed on a different machine. However, it is still considered an internal service because of the format of the exchanged messages: it uses an internal format shared between service and clients (e.g. Java objects, ProtoBuf messages) instead of a commonly accepted structured data format (e.g. JSON, XML).
  - pros: independence (over internal library), speed (over external resource). It is a balanced solution between a constrained internal library and a less efficient external resource. It has an isolated JVM instance (and classpath), and serializing/deserializing the exchanged messages takes less time than for an external resource because it can be done directly to/from bytes, instead of first to/from a textual representation and then to/from a byte representation.
  - cons: dependency (over external resource), speed (over internal library). The exchanged messages can no longer be passed by reference; they have to be serialized/deserialized for each communication. This implies slower communication than with an internal library and a higher dependency than with an external resource, because service and clients must share the message definition in their classpaths.
- external resource: this type of service is the most independent, because it uses an open format to exchange messages. Thus, any client, either from the platform itself or from external sites, can communicate with it without having to share any class definition in its classpath.
  - pros: independence. This is the maximum degree of independence for a service. Nothing other than the meaning of the data is shared between service and clients.
  - cons: speed. As previously mentioned, communication requires a two-stage serialization/deserialization process (i.e. text and bytes). This introduces a higher delay than an internal resource.
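The difference between the three approaches comes down to what crosses the boundary between client and service. The sketch below illustrates this under stated assumptions: `DocumentMessage`, `ExchangeStyles` and their methods are hypothetical names, Java's built-in object serialization stands in for the shared internal format, and the JSON is assembled by hand (with only minimal escaping) to keep the example free of external libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical message type; not part of the epnoi code base. */
final class DocumentMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    final String uri;
    final String title;

    DocumentMessage(String uri, String title) {
        this.uri = uri;
        this.title = title;
    }
}

final class ExchangeStyles {

    // Internal library: same JVM, so the very same object reference is handed over.
    static DocumentMessage asLibrary(DocumentMessage doc) {
        return doc; // no copy, no serialization
    }

    // Internal resource: object -> bytes in one stage. Both sides need the
    // DocumentMessage class on their classpath to read the bytes back.
    static byte[] toInternalBytes(DocumentMessage doc) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(doc);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static DocumentMessage fromInternalBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (DocumentMessage) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // External resource, stage 1: object -> text in an open format (JSON).
    static String toJson(DocumentMessage doc) {
        return "{\"uri\":\"" + escape(doc.uri) + "\",\"title\":\"" + escape(doc.title) + "\"}";
    }

    // External resource, stage 2: text -> bytes. Any client can decode this
    // without sharing a class definition.
    static byte[] toExternalBytes(DocumentMessage doc) {
        return toJson(doc).getBytes(StandardCharsets.UTF_8);
    }

    // Minimal escaping (backslash and double quote only); control characters
    // are not handled, so this is not a complete JSON encoder.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        DocumentMessage doc = new DocumentMessage("http://example.org/doc/1", "Sample title");

        System.out.println("same reference: " + (asLibrary(doc) == doc));
        DocumentMessage copy = fromInternalBytes(toInternalBytes(doc));
        System.out.println("round trip yields a copy: " + (copy != doc && copy.uri.equals(doc.uri)));
        System.out.println("open format: " + toJson(doc));
    }
}
```

The sketch only shows the shape of each exchange; it does not measure the speed differences described above, which depend on the serializer and the message sizes involved.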
Our criteria for deciding how a service is deployed are based on its potential use and its life-flow. The next figure shows this as a decision tree:
This work is supported by the European Community's Seventh Framework Programme (FP7-ICT-2013-8.1) under grant agreement no. 611383. For further information, please see http://DrInventor.eu