Data driven architecture with WMPayload service #12171

vkuznet · 2024-11-18T19:31:21Z

Impact of the new feature
Reduce operational overhead of WM services.

Is your feature request related to a problem? Please describe.
The current WM architecture is based on a set of distributed data services that talk to different databases and hold overlapping data. The new system can eliminate many WMAgent components and replace them with a central high-availability service that holds and serves WM data from a single location. This can reduce operational cost and the maintenance burden of the various components, and improve WM services overall.

Describe the solution you'd like
I propose adopting a data (event) driven architecture with a central WMPayload service. The full proposal is available in this Google document. It consists of the following:

Introduce a WMPayload service with the following characteristics:
- High-availability service (run it on k8s)
- Horizontally scalable according to client demand
- High throughput for data IO operations
- Low-latency data IO, in the O(ms) range
- Support for the JSON and NDJSON data formats
If necessary, this architecture can be complemented with an event-driven approach: a Pub/Sub service for distributing messages among the various components.
The database backend can be either a document-oriented database, e.g. MongoDB, or even a relational database such as Oracle, which supports storing and querying data in JSON format.
- Within such a database backend we may introduce GraphQL for more flexibility, or rely on the database's existing query language to support WMPayload queries
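
To illustrate the JSON/NDJSON requirement above, here is a minimal sketch of NDJSON encoding and decoding in Python. It is not taken from the prototype; the function names `write_ndjson` and `read_ndjson` are hypothetical, and only the standard library is assumed. The reader is a generator, so a consumer holds one decoded record at a time.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_ndjson(records: Iterable[dict], stream: TextIO) -> int:
    """Write one compact JSON document per line; return how many were written."""
    count = 0
    for record in records:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        stream.write(json.dumps(record, separators=(",", ":")) + "\n")
        count += 1
    return count


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield records lazily, one per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical usage with an in-memory buffer standing in for an HTTP body.
buf = io.StringIO()
written = write_ndjson([{"a": 1}, {"b": 2}], buf)
buf.seek(0)
decoded = list(read_ndjson(buf))
```

The same generator could wrap a streamed HTTP response, so a client would not need to load a whole collection before processing the first record.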

The benefits of the new architecture can be summarized as follows:

No change of the underlying programming language within WM (Micro)Services, e.g. we can still use the same Python code
Eliminate multiple databases (CouchDB, MongoDB) and converge on a single database backend
- If MongoDB is chosen, we can use its flexible QL
Horizontally scalable (high availability and throughput)
Uniform storage and APIs for managing unstructured data (JSONs)
Data streaming via NDJSON
Eliminate the need to pass JSON records as payload across services; rely on UUIDs instead
- Memory reduction for ALL MS services
- If we switch to NDJSON, the memory footprint should not greatly exceed the size of a single processed JSON record
- To speed up the service, an asynchronous pattern can be applied, since records can be processed in parallel
Common QL for the backend database, e.g. use MongoDB with Mongo QL (JSON)
Data accessibility via APIs rather than direct CouchDB database views
- Makes it easy to redesign WMStats, i.e. separate data presentation from the database
Separation of the service from the database backend
- We can provide a RESTful service and choose any document-oriented database with it (CouchDB, MongoDB, ElasticSearch, etc.)
- For additional speed-up, a cache layer can be added between the service and the database, e.g. Redis
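
The UUID-reference and cache-layer points above can be sketched as follows. This is only an illustration under stated assumptions, not the prototype's code: `PayloadStore` is a hypothetical in-memory stand-in for the WMPayload service, one dictionary plays the role of the database backend, and another plays the role of a cache such as Redis.

```python
import uuid
from typing import Dict, Optional


class PayloadStore:
    """Hypothetical in-memory stand-in for the WMPayload service."""

    def __init__(self) -> None:
        self._backend: Dict[str, dict] = {}  # stands in for the database backend
        self._cache: Dict[str, dict] = {}    # stands in for a cache layer
        self.backend_reads = 0               # counts lookups that reached the backend

    def put(self, doc: dict) -> str:
        """Store a document and return the UUID that services pass around."""
        key = str(uuid.uuid4())
        self._backend[key] = doc
        return key

    def get(self, key: str) -> Optional[dict]:
        """Read-through lookup: serve from the cache, else fetch and cache."""
        if key in self._cache:
            return self._cache[key]
        self.backend_reads += 1
        doc = self._backend.get(key)
        if doc is not None:
            self._cache[key] = doc
        return doc


# Hypothetical usage: one service stores a record, another resolves it by UUID.
store = PayloadStore()
key = store.put({"RequestName": "example"})  # field name is illustrative only
first = store.get(key)
second = store.get(key)  # served from the cache, no second backend read
```

With this pattern a message between services carries only `key` (36 characters) rather than the full JSON record, and the receiving service fetches the record only if it actually needs it.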

Describe alternatives you've considered
Many iterations of existing architectures.

Additional context
There is a very simple but fully functional prototype of the WMPayload service which satisfies the desired functionality and requirements. The initial prototype shows the following performance using the JSON data format:

operation	document	time/operation	bytes/operation	memory allocations
write single doc	auto-gen	0.5ms	12KB	197
write single doc	ReqMgr2	0.8ms	60KB	666
read single doc	auto-gen	0.2ms	12KB	124
read single doc	ReqMgr2	0.5ms	38KB	201
read all docs	ReqMgr2	75ms	102MB	238

Tests were performed on macOS (Apple M2, 8 cores) and used either auto-generated JSON docs or documents taken from the ReqMgr2 service. In total there were 1500 documents in MongoDB, indexed by UUID.

vkuznet · 2024-12-11T18:35:53Z

Update from the CERN DBA (Kate) about JSON document support in the Oracle database:

JSON is reasonably supported in Oracle 19. No need for special setup, everything works out of the box. I cannot provide you anything more interesting than the doc you've mentioned:

There is no dedicated SQL data type for JSON data, so you can index it in the usual ways. In addition, you can index it in a general way, with a JSON search index, for ad hoc structural queries and full-text queries.

You can index JSON data as you would any data of the type that you use to store it. In particular, you can use a B-tree index or a bitmap index for SQL/JSON function json_value, and you can use a bitmap index for SQL/JSON conditions is json, is not json, and json_exists.

What we don't have for the moment is JSON Relational Duality in 23ai that allows data to remain in normalized relational tables but be accessed as JSON documents. We will upgrade most probably during the long shutdown but the timeline is unclear due to Oracle's release planning.

vkuznet added New Feature WMAgent R&D Microservices v2 Architecture labels Nov 18, 2024
