Skip to content

WMArchive queries

Valentin Kuznetsov edited this page Mar 25, 2016 · 5 revisions

WMArchive queries

The queries in WMArchive should be placed via POST request in a form of Query Language (QL). The QL should specify condition keys and document fields to be retrieved. We end-up using JSON format as QL data-format similar to MongoDB QL. For example:

{"spec": {"lfn":"bla*123.root", "task": "bla", "run":[1,2,3]}, "fields":["logs"]}

here we define that our conditions spec is a dictionary where we define actual conditions such as lfn pattern, task name and run list. The output fields would be logs key in FWJR documents stored in WMArchive.

The WMArchive will support three attributes which user may provide:

  • spec defines condition dictionary
    • each spec must have a timerange key-value pair which points to specific timerange user want to look-up the data, e.g. "timerange":[20160101, 20160102], the values in timerange list should be in YYYYMMDD format
  • fields defines list of attributes to be retrieved from FWJR and returned back to user
  • aggregate define list of attributes we need to aggregate

Query Language

The QL should be flexible enough to accommodate user queries, see next section. For that we can use JSON document to define our queries, see example in previous section. Since we're going to use nested structure in our FWJR documents we need to refer them via dot notations, e.g. output.inputDataset, meta_data.agent_ver, etc. Usage of JSON for queries can be easily translated into MongoDB QL. Here is simplest QL rule definitions (mostly aligned with MongoDB QL):

  • input query should be in form of JSON
  • the query JSON should contain only key-value pairs, no nested structures
  • the supported data types for values would be int, float, string and list of those
  • user can send multiple set of conditions which translates into AND set
  • user can send alternative conditions via {"$or": [JSON1, JSON2]} structure.
  • user can specify $gt, $lt operators for greater than, less than use cases

The WMArchive code will translate user query either into Mongo DB one or the one suitable for querying documents on HDFS. Here are few examples of QL syntax

# use multiple keys
{"task":"abc", "run":1}

# use patterns
{"lfn":"/a/v/b*/123.root"}

# use or conditions
{"$or": [{"task":"abc", "lfn":"/a/v/b*/123.root"}, {"run":[1,2,3]}]}

# use array values, i.e. find docs with all given run numbers
{"run":[1,2,3]}

# usage of $gt, $lt operators
{"run":{"$gt":1})

Example of user queries

Based on current schema structure we can foresee the following queries:

  • Find documents based on provided LFN or its pattern, see here
  • Find documents for given agent meta-data
  • Find documents for chain attributes such as input.runs, output. acquisitionEra or output.applicationVersion, output.inputDataset or output.outputDataset
  • Find documents for given task name

Any combination of aforementioned conditions is also possible

QL keys

We may need to simplify user life and give them restricted set of keys which will be used by QL. For example lfn will refer to LFNArray, agent.version can be translated into meta_data.agent_ver, etc. I think the translation layer will come later when we'll settle down on schema and QL implementation.

Examples

Here we provide examples how input query JSON file may looks like and how WMArchive will provide the output:

input QL JSON (stored in query.json):

{"spec":{"task": "/AbcCde_Task_Data_test_6735909/RECO"}, "fields":["wmaid"]}

client call and output

wma_client.py --spec=query.json
{"result": [{"status": "ok",
             "input": {"fields": ["wmaid"],
             "spec": {"task": "/AbcCde_Task_Data_test_6735909/RECO"}},
             "storage": "mongodb",
             "results": [{"wmaid": "6b0bace20fc732563316198d1ed2b94e"}]}]}

intput QL JSON (stored in query.json):

{"spec":{"task": "/AbcCde_Task_Data_test_6735909/RECO"}, "fields":["meta_data", "logs"]}

client call and output

wma_client.py --spec=query.json
{"result": [{"status": "ok", 
             "input": {"fields": ["meta_data", "logs"], 
                       "spec": {"task": "/AbcCde_Task_Data_test_6735909/RECO"}}, 
             "storage": "mongodb",
             "results": [{"meta_data": {"agent_ver": "1.0.13.pre8", "fwjr_id": "100-0", "host": "vocms008.cern.ch", "ts": 1454617125}}]}]}