Data driven architecture with WMPayload service #12171

Open
vkuznet opened this issue Nov 18, 2024 · 1 comment

vkuznet commented Nov 18, 2024

Impact of the new feature
Reduce operational overhead of WM services.

Is your feature request related to a problem? Please describe.
The current WM architecture is based on a set of distributed data services talking to different databases and holding overlapping data. The new system can eliminate many WMAgent components and replace them with a central high-availability service that holds and serves WM data from a single location. This can reduce operational cost and the maintenance burden of various components, and improve WM services overall.

Describe the solution you'd like
I propose to adopt a data (event) driven architecture with a central WMPayload service. The full proposal is available in this Google document. It consists of the following:

  • introduce a WMPayload service with the following characteristics:
    • High-availability service (run it on k8s)
    • Horizontally scalable according to client demand
    • High throughput for data IO operations
    • Low-latency data IO in the O(ms) range
    • Support for JSON and NDJSON data-formats
  • if necessary, this architecture can be complemented with an event-driven approach, using a Pub/Sub service to distribute messages among the various components.
  • the database backend can be either a document-oriented database, e.g. MongoDB, or a relational database like Oracle that supports storing and querying data in JSON data-format.
    • within such a database backend we may introduce GraphQL for more flexibility, or rely on the existing query language (QL) to support WMPayload queries
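To illustrate the NDJSON data-format mentioned above, here is a minimal sketch using only the Python standard library. It is not taken from the prototype; the function names and the `uuid`/`status` fields are hypothetical and chosen only for illustration. Each record is written as one compact JSON document per line, and the reader yields records one at a time.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_ndjson(records: Iterable[dict], out: TextIO) -> None:
    """Write one compact JSON document per line (NDJSON)."""
    for rec in records:
        out.write(json.dumps(rec, separators=(",", ":")) + "\n")


def read_ndjson(src: TextIO) -> Iterator[dict]:
    """Yield records one at a time; only the current line is parsed."""
    for line in src:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical payloads; the field names are illustrative only.
docs = [{"uuid": "a1", "status": "new"}, {"uuid": "b2", "status": "running"}]

# StringIO stands in for a file or an HTTP response stream.
buf = io.StringIO()
write_ndjson(docs, buf)
buf.seek(0)
statuses = [rec["status"] for rec in read_ndjson(buf)]
```

When the source is a real file or network stream rather than an in-memory buffer, the reader only holds one record at a time, in contrast to `json.load`, which has to parse the whole array first.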

The benefits of the new architecture can be summarized as follows:

  • No change of underlying programming language within WM (Micro)Services, e.g. we can still use the same python code
  • Eliminate multiple databases (CouchDB, MongoDB) and converge on a single database backend
    • If MongoDB is chosen we can use its flexible QL
  • Horizontally scalable (high availability and throughput)
  • Uniform storage and APIs for managing unstructured data (JSONs)
  • Data streaming via NDJSON
  • Eliminate the need to pass JSON records as payload across services; rely on UUIDs instead
    • Memory reduction for ALL MS services
    • If we switch to NDJSON, the memory footprint should not much exceed the size of a single processed JSON record
    • To speed up the service, an asynchronous pattern should be applied, as records can be processed in parallel
  • Common QL for backend database, e.g. use MongoDB with Mongo QL (JSON)
  • Data accessibility via APIs rather than direct database access through CouchDB views
    • Allows WMStats to be re-designed easily, i.e. separates data presentation from the database
  • Separation of the service from the database backend
    • We can provide RESTful service and choose any document-oriented database with it (CouchDB, MongoDB, ElasticSearch, etc.)
    • To achieve additional speed-up, a cache layer can be added between the service and the database, e.g. Redis
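Several of the points above (passing UUIDs instead of full JSON records, asynchronous processing, and a cache layer between service and database) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code: `PayloadStore`, `CachedStore`, `process`, and the `RequestName` field are hypothetical names, the backend is an in-memory dictionary standing in for a real database, and the cache is a plain dictionary standing in for something like Redis.

```python
import asyncio
import uuid


class PayloadStore:
    """In-memory stand-in for the database backend (hypothetical)."""

    def __init__(self) -> None:
        self._docs = {}
        self.reads = 0  # counts backend lookups

    async def put(self, doc: dict) -> str:
        key = str(uuid.uuid4())
        self._docs[key] = doc
        return key

    async def get(self, key: str) -> dict:
        self.reads += 1
        await asyncio.sleep(0)  # placeholder for network/database latency
        return self._docs[key]


class CachedStore:
    """Read-through cache in front of the backend (the role Redis could play)."""

    def __init__(self, backend: PayloadStore) -> None:
        self._backend = backend
        self._cache = {}

    async def get(self, key: str) -> dict:
        if key not in self._cache:
            self._cache[key] = await self._backend.get(key)
        return self._cache[key]


async def process(store: CachedStore, key: str) -> str:
    """A service receives only the UUID and fetches the payload on demand."""
    doc = await store.get(key)
    return doc["RequestName"]  # illustrative field name


async def main():
    backend = PayloadStore()
    cache = CachedStore(backend)
    keys = [await backend.put({"RequestName": f"req-{i}"}) for i in range(3)]
    # Records are independent, so they can be processed concurrently.
    names = await asyncio.gather(*(process(cache, k) for k in keys))
    # A second pass is served from the cache without touching the backend.
    await asyncio.gather(*(process(cache, k) for k in keys))
    return list(names), backend.reads


names, backend_reads = asyncio.run(main())
```

Here services exchange short UUID strings only, and the backend is read once per document even though each document is processed twice.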

Describe alternatives you've considered
Many iterations of existing architectures.

Additional context
There is a very simple but fully functional prototype of the WMPayload service which satisfies the desired functionality and requirements. The initial prototype shows the following performance using the JSON data-format:

| operation | document | time/operation | bytes/operation | memory allocations |
|---|---|---|---|---|
| write single doc | auto-gen | 0.5ms | 12KB | 197 |
| write single doc | ReqMgr2 | 0.8ms | 60KB | 666 |
| read single doc | auto-gen | 0.2ms | 12KB | 124 |
| read single doc | ReqMgr2 | 0.5ms | 38KB | 201 |
| read all docs | ReqMgr2 | 75ms | 102MB | 238 |

Tests were performed on macOS (Apple M2, 8 cores) and used either auto-generated JSON docs or documents taken from the ReqMgr2 service. In total there were 1500 documents in MongoDB, indexed by UUID.


vkuznet commented Dec 11, 2024

Update from the CERN DBA (Kate) about JSON document support in the Oracle database:

JSON is reasonably supported in Oracle 19. No need for special setup, everything works out of the box. I cannot provide you anything more interesting than the doc you've mentioned:

There is no dedicated SQL data type for JSON data, so you can index it in the usual ways. In addition, you can index it in a general way, with a JSON search index, for ad hoc structural queries and full-text queries.

You can index JSON data as you would any data of the type that you use to store it. In particular, you can use a B-tree index or a bitmap index for SQL/JSON function json_value, and you can use a bitmap index for SQL/JSON conditions is json, is not json, and json_exists.

What we don't have for the moment is JSON Relational Duality in 23ai that allows data to remain in normalized relational tables but be accessed as JSON documents. We will upgrade most probably during the long shutdown but the timeline is unclear due to Oracle's release planning.
