SelectFAQ
oceandlr edited this page Mar 25, 2019
·
21 revisions
Examples of _select_ with _order_by_ are in DataSet and numpy ndarray. If the _order_by_ keyword is omitted, the first attribute in the column list is used. If that attribute is not indexed, _select_ raises an exception:

    In [6]: src.show(limit=4)
    meminfo_E5-2698
    timestamp       component_id    Active          MemTotal
    --------------- --------------- --------------- ---------------
    (1518803953, 3055)              12           82672       131899340
    (1518803954, 2905)              12           82672       131899340
    (1518803955, 2761)              12           82672       131899340
    --------------- --------------- --------------- ---------------
    4 record(s)

    # When no order_by is used, the first indexed column, in this case, timestamp is used:
    In [9]: src.select(['timestamp','component_id','Active','MemTotal'], from_ = ['meminfo_E5-2698'])
    In [10]: src.show(limit=4)
    meminfo_E5-2698
    timestamp       component_id    Active          MemTotal
    --------------- --------------- --------------- ---------------
    (1518803953, 1846)              35          211076       131899340
    (1518803953, 1961)              17          218128       131899340
    (1518803953, 1979)             162          748460       131899340
    --------------- --------------- --------------- ---------------
    4 record(s)

    # If we create a database with no indexes (see query below):
    > sos_cmd -C /home/gentile/Source/numsos/csvimport/baddb -l
    schema :
        name      : meminfo_E5-2698_NOINDEX
        schema_sz : 4616
        obj_sz    : 384
        id        : 129
        -attribute : timestamp
            type    : TIMESTAMP
            idx     : 0
            indexed : 0
            offset  : 8
        -attribute : component_id
            type    : UINT64
            idx     : 1
            indexed : 0
            offset  : 16
        -attribute : job_id
            type    : UINT64
            idx     : 2
            indexed : 0
            offset  : 24
        -attribute : app_id
            type    : UINT64
            idx     : 3
            indexed : 0
            offset  : 32
        ...

    # Then the query with no order_by fails:
    In [9]: badsrc.select(['timestamp','component_id','Active'],from_ = ['meminfo_E5-2698_NOINDEX'])
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    ----> 1 badsrc.select(['timestamp','component_id','Active'],from_ = ['meminfo_E5-2698_NOINDEX'])

    XXX/DataSource.pyc in select(self, columns, where, order_by, from_, unique)
        618         """
        619         self.query_ = Sos.Query(self.cont)
    --> 620         self.query_.select(columns, where=where, from_ = from_, order_by = order_by, unique = unique)
        621
        622         col_no = 0

    Sos.pyx in python.Sos.Query.select()

    Sos.pyx in python.Sos.Query._add_colspec()

    ValueError: The schema meminfo_E5-2698_NOINDEX does not have the primary key attribute timestamp

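The rule behind this failure can be modelled in a few lines of plain Python. This is only a sketch of the behaviour described above, not the Sos implementation: `resolve_order_by` and `indexed_attrs` are hypothetical names, and applying the same check to an explicit order_by is an assumption.

```python
def resolve_order_by(columns, indexed_attrs, order_by=None):
    """Hypothetical model of how the ordering attribute is chosen."""
    # With no explicit order_by, fall back to the first requested column.
    key = order_by if order_by is not None else columns[0]
    # Assumption: the chosen attribute must be indexed, otherwise select fails.
    if key not in indexed_attrs:
        raise ValueError("attribute %s is not indexed" % key)
    return key


# Schema where timestamp is indexed: timestamp is picked implicitly.
print(resolve_order_by(['timestamp', 'component_id', 'Active'], {'timestamp'}))

# Schema with no indexes: the same call fails.
try:
    resolve_order_by(['timestamp', 'component_id', 'Active'], set())
except ValueError as err:
    print("select failed:", err)
```

In practice this means either listing an indexed attribute first or importing the data with at least one index defined.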
To format a column's values, you can define your own conversion function and pass it to ColSpec through _cvt_fn_:

    from sosdb import Sos
    from numsos.DataSource import SosDataSource
    import datetime as dt

    # Convert a (seconds, microseconds) timestamp into a readable date-time string
    def format_timestamp(ts):
        f = float(ts[0]) + (float(ts[1]) / 1.e6)
        d = dt.datetime.fromtimestamp(f)
        return str(d)

    # Open the container and attach it to a data source
    c = Sos.Container('/DATA15/orion/ldms_data')
    ds = SosDataSource()
    ds.config(cont=c)

    # Use a ColSpec to apply the conversion function to the timestamp column
    ds.select([Sos.ColSpec('timestamp', cvt_fn=format_timestamp, col_width=30),
               'component_id', 'Active'],
              from_ = ['meminfo'],
              order_by = 'comp_time')
    ds.show(limit=10)

This produces the following output:

    meminfo
    timestamp                      component_id    Active
    ------------------------------ --------------- ---------------
    2019-01-08 19:47:28.527092                   0         1164000
    2019-01-08 19:47:37.834536                   0         1164016
    2019-01-08 20:16:32.659371                   0         1164896
    2019-01-08 20:16:47.904506                   0         1164960
    2019-01-08 20:16:53.001013                   0         1164960
    2019-01-08 20:16:54.001076                   0         1164964
    2019-01-08 20:16:55.001391                   0         1164964
    2019-01-08 20:16:56.001458                   0         1164964
    2019-01-08 20:16:57.001763                   0         1164964
    ------------------------------ --------------- ---------------
    10 record(s)

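The conversion function shown earlier goes through a float and prints local time. A possible variant, sketched here under the assumption that the timestamp arrives as a (seconds, microseconds) pair as in that function, keeps the arithmetic in integers and makes the time zone explicit. The name `format_timestamp_utc` is illustrative only.

```python
import datetime as dt


def format_timestamp_utc(ts):
    """Format a (seconds, microseconds) pair as a UTC date-time string."""
    secs, usecs = int(ts[0]), int(ts[1])
    # Adding the microseconds as a timedelta avoids float rounding.
    d = dt.datetime.fromtimestamp(secs, tz=dt.timezone.utc)
    d += dt.timedelta(microseconds=usecs)
    return str(d)


# One of the raw timestamps from the meminfo_E5-2698 listing.
print(format_timestamp_utc((1518803953, 3055)))
```

A function like this could presumably be passed as cvt_fn in the same way as format_timestamp.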
Set the _unique_ keyword parameter to True to return only the first result for each key when the index contains duplicates.

    src.select(['job_id'], from_ = ['meminfo_E5-2698'], order_by = 'job_id', unique = True)
    dst = src.get_results()
    dst.show()
    job_id
    ----------------
                 0.0
           5078835.0
    ----------------

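The effect of _unique_ can be pictured with a small stand-alone sketch: walk the records in key order and keep only the first one for each key. This is a plain-Python model, not how SOS implements it, and `first_per_key` is a hypothetical helper.

```python
from itertools import groupby


def first_per_key(sorted_records, key):
    """Keep only the first record for each key value in key-sorted input."""
    return [next(group) for _, group in groupby(sorted_records, key=key)]


# Records already ordered by job_id, with duplicate keys.
rows = [(0, 'a'), (0, 'b'), (5078835, 'c'), (5078835, 'd')]
print(first_per_key(rows, key=lambda r: r[0]))
```

Because groupby only merges adjacent equal keys, the input must already be sorted on the key, which matches the role of order_by in the query above.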