SelectFAQ

oceandlr edited this page Mar 25, 2019 · 21 revisions

Select and OrderBy

Examples of _select_ with _order_by_ are in DataSet and numpy ndarray. If the order_by keyword is omitted, the first attribute in the column list is used as the ordering key. If that attribute is not indexed, _select_ raises an exception:

    In [6]: src.show(limit=4)
    meminfo_E5-2698                                                 
    timestamp       component_id    Active          MemTotal        
    --------------- --------------- --------------- --------------- 
    (1518803953, 3055)              12           82672       131899340 
    (1518803954, 2905)              12           82672       131899340 
    (1518803955, 2761)              12           82672       131899340 
    --------------- --------------- --------------- --------------- 
    4 record(s)
    
    # When no order_by is given, the first column in the list is used; here that is timestamp, which is indexed:
    In [9]: src.select(['timestamp','component_id','Active','MemTotal'], from_ = ['meminfo_E5-2698'])
    
    In [10]: src.show(limit=4)
    meminfo_E5-2698                                                 
    timestamp       component_id    Active          MemTotal        
    --------------- --------------- --------------- --------------- 
    (1518803953, 1846)              35          211076       131899340 
    (1518803953, 1961)              17          218128       131899340 
    (1518803953, 1979)             162          748460       131899340 
    --------------- --------------- --------------- --------------- 
    4 record(s)
    
    
    # If we create a database with no indexes (see the schema listing below):
    
    > sos_cmd -C /home/gentile/Source/numsos/csvimport/baddb -l
    schema :
    name      : meminfo_E5-2698_NOINDEX
    schema_sz : 4616
    obj_sz    : 384
    id        : 129
    -attribute : timestamp
        type          : TIMESTAMP
        idx           : 0
        indexed       : 0
        offset        : 8
    -attribute : component_id
        type          : UINT64
        idx           : 1
        indexed       : 0
        offset        : 16
    -attribute : job_id
        type          : UINT64
        idx           : 2
        indexed       : 0
        offset        : 24
    -attribute : app_id
        type          : UINT64
        idx           : 3
        indexed       : 0
        offset        : 32
    ...
    
    # Then the query with no order_by fails:
    
    In [9]: badsrc.select(['timestamp','component_id','Active'],from_ = ['meminfo_E5-2698_NOINDEX'])
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    
    ----> 1 badsrc.select(['timestamp','component_id','Active'],from_ = ['meminfo_E5-2698_NOINDEX'])
        
    XXX/DataSource.pyc in select(self, columns, where, order_by, from_, unique)
    618         """
    619         self.query_ = Sos.Query(self.cont)
    --> 620         self.query_.select(columns, where=where, from_ = from_, order_by = order_by, unique = unique)
    621 
    622         col_no = 0
    
    Sos.pyx in python.Sos.Query.select()
    
    Sos.pyx in python.Sos.Query._add_colspec()
    
    ValueError: The schema meminfo_E5-2698_NOINDEX does not have the primary key attribute timestamp
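
The rule described in this section can be modeled in a few lines of plain Python. This is only a sketch of the behavior shown in the transcripts above, not the Sos implementation; the function name `default_order_by` and the `indexed` set are hypothetical and exist only for illustration.

```python
def default_order_by(columns, indexed):
    """Model of the rule above: return the attribute a select would order by.

    columns -- attribute names in the order passed to select
    indexed -- set of attribute names that have an index in the schema
    (Both parameters are illustrative; they are not part of the Sos API.)
    """
    if not columns:
        raise ValueError("select needs at least one column")
    key = columns[0]  # with no order_by, the first listed attribute is the key
    if key not in indexed:
        raise ValueError("attribute %s is not indexed" % key)
    return key


# Schema where timestamp is indexed: timestamp is chosen as the key.
print(default_order_by(['timestamp', 'component_id', 'Active'], {'timestamp'}))

# Schema with no indexes: the same call fails, as in the traceback above.
try:
    default_order_by(['timestamp', 'component_id', 'Active'], set())
except ValueError as err:
    print("ValueError:", err)
```

In practice this suggests either listing an indexed attribute first or passing order_by explicitly with the name of an index that exists in the schema.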

Select With ColSpec

You may need to define your own conversion function and pass it to ColSpec through the cvt_fn keyword to format a column:

 from sosdb import Sos
 from numsos.DataSource import SosDataSource
 import datetime as dt

 # Convert a (seconds, microseconds) timestamp pair to a printable string
 def format_timestamp(ts):
    f = float(ts[0]) + (float(ts[1]) / 1.e6)
    d = dt.datetime.fromtimestamp(f)
    return str(d)

 # Open the container and attach it to a data source
 c = Sos.Container('/DATA15/orion/ldms_data')
 ds = SosDataSource()
 ds.config(cont=c)
 # Format the timestamp column with format_timestamp, in a 30-character-wide column
 ds.select([Sos.ColSpec('timestamp', cvt_fn=format_timestamp, col_width=30), 'component_id', 'Active'], from_ = ['meminfo'], order_by = 'comp_time')
 ds.show(limit=10)

This produces the following output:

 meminfo                                                        
 timestamp                      component_id    Active          
 ------------------------------ --------------- --------------- 
    2019-01-08 19:47:28.527092               0         1164000 
    2019-01-08 19:47:37.834536               0         1164016 
    2019-01-08 20:16:32.659371               0         1164896 
    2019-01-08 20:16:47.904506               0         1164960 
    2019-01-08 20:16:53.001013               0         1164960 
    2019-01-08 20:16:54.001076               0         1164964 
    2019-01-08 20:16:55.001391               0         1164964 
    2019-01-08 20:16:56.001458               0         1164964 
    2019-01-08 20:16:57.001763               0         1164964 
 ------------------------------ --------------- --------------- 
 10 record(s)
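
The conversion function above uses the local time zone and float arithmetic. As a possible variant, the sketch below keeps the seconds and microseconds as integers and formats the result in UTC, so the output does not depend on the machine's time zone. It assumes Python 3 (for `datetime.timezone`) and the same (seconds, microseconds) input shape; the name `format_timestamp_utc` is made up for this example.

```python
import datetime as dt


def format_timestamp_utc(ts):
    """Convert a (seconds, microseconds) pair to a UTC time string."""
    secs, usecs = int(ts[0]), int(ts[1])
    # Build the time from whole seconds, then add the microseconds exactly,
    # which avoids rounding the fractional part through a float.
    d = dt.datetime.fromtimestamp(secs, tz=dt.timezone.utc)
    d += dt.timedelta(microseconds=usecs)
    # %f always prints six digits, even when the microseconds are zero.
    return d.strftime('%Y-%m-%d %H:%M:%S.%f')


print(format_timestamp_utc((1518803953, 3055)))
```

A function like this could be passed as cvt_fn in the same way as format_timestamp.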

Select with Unique

Set the 'unique' keyword parameter to True to return only the first result for each key when the index contains duplicates.

 src.select(['job_id'], from_ = ['meminfo_E5-2698'], order_by = 'job_id', unique = True)
 dst = src.get_results()
 dst.show()
          job_id 
 ---------------- 
             0.0 
       5078835.0 
 ---------------- 
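
The effect of unique can be pictured with a small pure-Python model: given values already ordered by the key, keep only the first entry of each run of equal keys. This illustrates the semantics described above and is not how Sos implements it; `first_per_key` and the sample list are hypothetical.

```python
from itertools import groupby


def first_per_key(records, key):
    """Keep the first record of each run of equal keys.

    The input is assumed to be sorted by the key already, which is what
    iterating over an index would give.
    """
    return [next(group) for _, group in groupby(records, key=key)]


# Sample job_id values in index order, with duplicates (made-up data).
job_ids = [0, 0, 0, 5078835, 5078835]
print(first_per_key(job_ids, key=lambda j: j))
```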
