Files

PyAnnolib

By default, have PyAnnolib ignore unknown XML elements.

Nov 10, 2015

c425498 · Nov 10, 2015

Name	Name	Last commit message	Last commit date
parent directory ..
docs	docs	Import PyAnnlib	Jul 17, 2013
pyannolib	pyannolib	By default, have PyAnnolib ignore unknown XML elements.	Nov 10, 2015
tyrannocmd	tyrannocmd	Move sequencing.py from pyannolib/ to tyrannolib/	Dec 2, 2014
tyrannolib	tyrannolib	Move sequencing.py from pyannolib/ to tyrannolib/	Dec 2, 2014
utfiles	utfiles	By default, have PyAnnolib ignore unknown XML elements.	Nov 10, 2015
utlib	utlib	Support <commitTimes> in emake 8 annotation files.	Aug 19, 2015
.gitignore	.gitignore	Import PyAnnlib	Jul 17, 2013
DEVELOPMENT	DEVELOPMENT	Replace parseJobs() with iterJobs()	Nov 25, 2014
LICENSE	LICENSE	Import PyAnnlib	Jul 17, 2013
MANIFEST.in	MANIFEST.in	Fix assumption that there is only one output element per command.	Jul 20, 2013
README	README	By default, have PyAnnolib ignore unknown XML elements.	Nov 10, 2015
compare-metrics	compare-metrics	compare-metrics should iterate over the file lists correctly	Sep 3, 2014
extract-jobs	extract-jobs	Replace parseJobs() with iterJobs()	Nov 25, 2014
sample-jobs	sample-jobs	Add parseJobs() based on iterJobs().	Nov 25, 2014
sample-metrics	sample-metrics	Remove use of sax parser; use ElementTree for anno header too.	Nov 19, 2014
setup.py	setup.py	Import PyAnnlib	Jul 17, 2013
show-conflicts	show-conflicts	Small fixes to conflicts report	Nov 25, 2014
show-deps	show-deps	Use ElementTree iterparse() to parse the body of the annotation file.	Aug 29, 2014
show-jobpath	show-jobpath	Replace parseJobs() with iterJobs()	Nov 25, 2014
show-parse-effects	show-parse-effects	Fix typo	Nov 25, 2014
tyranno	tyranno	Uprev tryanno to 1.0.2	Dec 3, 2014
unittest	unittest	Support <commitTimes> in emake 8 annotation files.	Aug 19, 2015

README

# Copyright (c) 2013-2014 by Cisco Systems, Inc.

Introduction
============
PyAnnolib is a Python library for reading Electric Cloud's emake
(Electric Accelerator) annotation file, an XML log of what happened
during a build.

It was created by Gilbert Ramirez <[email protected]>, and
Cisco has given permission to release this library to the open
source community under a BSD-style license (see the LICENSE file).

The source code for PyAnnolib can be found at the Electric Cloud
github community for Electric Accelerator:

https://github.com/electriccommunity/electricaccelerator

Caveats
=======
This is a work in progress. It works for the original author's
use-cases, which is building with GNU Make on Linux. Certain
fields that are used on Windows-based builds have not yet
been implemented, but can definitely be added.


Usage
=====
Because the annotation file can be huge, and your Python program
may not want or be able to store all records in memory at once,
PyAnnolib is designed to pass each Job record to your
program, one at a time. Your program may throw away the record,
or it may store it in memory, as you see fit.

If you do wish to have all records in memory at once, there is a
convenience function to do that for you.

There are sample programs in the top level directory of this
distribution so you can see how to use PyAnnolib

Importing
---------
from pyannolib import annolib

AnnotatedBuild(filename, fh=None)
----------------------------------
This class models one entire build. You can pass it a filename,
or an open filehandle:

    build = AnnotatedBuild(my_filename[, ignore_unknown=True])
    or
    build = AnnotatedBuild(filename=None, fh=my_fh[, ignore_unknown=True])



If you pass in a filename (as opposed to an open file handle),
AnnotatedBuild will automatically look for additional filenames with
"_1", "_2", etc., suffixes. When the annotation log file is approaching
2GB in size, emake will automatically close the first annotation file,
and create a new file, named with the same name, but with a suffix of "_1".
These files could be concatenated together to create a new XML file, like
so:
    $ cat file.xml file.xml_1 > new_file.xml

But Electric Insight automatically knows to look for the "_1" file. So,
PyAnnolib does the same.

If PyAnnolib encounters any XML elements it wasn't programmed for,
by default it will ignore such errors. Before the ignore_unknown flag
was added, the behavior was opposite: if it encountered unknown elements,
it would error out. You can still have it error out in such situations
if you set ignore_unknown to False.

The initialize will read the header of the annotation file.

The AnnotatedBuild has data associated with it (metrics), and contains many
Job records, which can be retreived, one at a time, via iterJobs()

In previous versions of PyAnnolib, the Metrics could only be accessed
after reading all the jobs first, because the Metrics are
stored at the end of the annotation file. Now, when an AnnotatedBuild
object is first instantiated, PyAnnolib will seek to the end of the file
and parse the metrics immediately. So, if you want to just read
the metrics, there is no need to use iterJobs() to read all the jobs first.

Methods:

getBuildID():   return the build ID

getCM():        return the cluster manager

getStart():     return the start time in string form

getStartDateTime():
    Returns a datetime.datetime object for the start time of
    the build. The timezone information is not modeled in this
    datetime object, even though it is present in the time string
    in the annotation file.

getLocalAgents():
    Returns "true" or None... were local agents used in the build?

getProperties():
    Returns the hash of properties for the build.

getProperty(name):
    Returns the value of the property, or None if it doesn't exist.

getVars():
    Returns the hash of environment variables used in this build.

getVar(name):
    Returns the value of the environment variable as recorded
    for this build, or None if it doesn't exist

getMetrics():
    Returns the hash of metrics that were recorded for the build.

getMetric(name):
    Returns the value of the named metric, or None if it doesn't exists

getMessages():
    Returns the list of out-of-band messages that were emitted
    during the build, if any. These are available during job
    parsing, or afterwards. The Message records are interspersed
    with the Job records.

iterJobs()

    This is an iterator that returns one Job at a time.

    If the annotation file has
    many thousands of Jobs and you do not with to store them all
    in memory at once, this function lets you handle each Job
    object one at aa time.


parseJobs(cb, user_data=None)

    This was the original API for retrieving one job at a time.
    iterJobs() is nicer, but parseJobs() is still provided.

    For each Job record in the XML file, runs the callback function
    (cb), passing it that Job object and user_data. The user_data
    is any additional data that your code wishes for the callback
    function to see; it is a handy way of passing in other objects
    that the callback might need.
    
    The callback function will look like:
    
    def my_callback(job, user_data):
        # process the job object here.
    
    If you want the loop across all jobs to finish becase you
    are done with the annotation file, the callback function should
    return annolib.StopParseJobs. Otherwise, the return value
    of the callback function is ignored. Once the parsing has stopped,
    it cannot be resumed for the same AnnotatedBuild object.


getAllJobs()

    Returns a list of all Job objects in the annotation file.
    This loads all Job objects in memory at once, so if your
    annotation file is extremely huge, you might have to worry
    about memory size here.

getMakeProcess(make_id):
    Returns the MakeProcess object associated with the make ID
    string. The make ID string starts with "M", and is followed
    by a hexadecimal integer.

getMakeJob(job_id):
    Returns the Job record for the job ID string. The only
    job records that are stored are the jobs that initiated
    Make processes. Each MakeProcess object has a parent job ID
    string that can be obtained with getParentJobID(). It is only
    those job IDs that are stored in this hash. By using the
    information about Make processes from getMakeProcess(), and
    the jobs that begat those Make process, using getMakeJob(),
    you can reconstruct the job path (the make process
    hierarchy).

getJobPath(job):
    Return a list that correponds to the Job Path in Electric Insight.
    Its members are the alternating sequences of MakeProcess
    objects and Job objects that make up the sub-make hierarchy
    that leads to "job". The first item in the list will be the root
    MakeProcess (M00000000), and the last item will be the job
    itself that was passed to getJobPath().

getMakePath(job):
    Similar to getJobPath(), but the list contains only the
    MakeProcess objects, not the Job objects.

close()
    Closes the filehandle. If you had AnnotatedBuild open the
    filehandle for you, you may want it to close it as well.

Job
---
getID()
    Returns a string with the job id. The string starts with "J",
    followed by a hexadecimal number.

getStatus()
    Returns a string with one of the status types. The possible values are:
        normal, rerun, conflict, reverted, skipped

getThread()

getType()
    Returns a string representing the job type. The possible values are:
        
        (from Electric Cloud's "Electric Accelerator: Overview
        and Getting Started")

        parse - parses makefiles
        existence - checks for existence of top-level targets w/o rules
        remake - determines if a makefile remake is needed
        rule - runs the commands in the action part of a rule
        continuation - like rule, but for commands after a sub-make
        follow - follows every submake; handles output and exit status
            for the make process
        end - for each make, does work after the final target is built

        (no info on these):
        external, subbuild, alpha, omega, update, statcache


getTimings()
    Returns a list of Timing objects. A Timing object models
    the start and end times of running the job on a node in the
    cluster. For most jobs there is only one Timing object, but
    if there is some issue where emake needs to run the job again,
    it will. It actually runs it up to three times on different
    agents. See:
    https://electriccloud.zendesk.com/hc/en-us/articles/202830223-KBEA-Error-Code-EC1073

getMakeProcess()
    Returns the MakeProcess which has the info about the sub-make
    which started this Job.

getWaitingJobs()
    Returns a list of job IDs. Those are the jobs that are waiting
    on this job to finish.

getOutputs()
    Returns the Output objects that this Job produced, if this job
    was a Make job and the Make itself printed output (parsing
    warnings, etc.). Otherwise, to see the command outputs, use
    getCommands() to get each Command object, which have their
    own getOutputs() method.

getOperations()
    Returns a list of the Operation objects if this build recorded
    "file" information in the annotation file. Operation objects
    show the operations on files under any of the emake roots.

getCommands()
    Returns the Command objects that this Job ran if this Job was
    not a Make job. Each Command object will have its own output,
    retrievable by getOutput() on the Command object. This is different
    from the Job objects getOutput() method, which is only for Make jobs.

getCommitTimes()
getDependencies()
getConflict()
getName()
getNeededBy()
getFile()
getLine()
getPartOf() - Used in FOLLOW-type jobs
    Retrieve these values.

getVars() - returns a hash
getVar(varname) - returns a value
    Returns environment variables for the job if the annotation detail
    included 'environment'

getTextReport()
    Returns a string with all the information nicely formatted.
    It's all the information that is available in the annotation record
    for this job, but your eyes won't glaze over trying to read through
    the XML markup.

getRetval()
    Returns an intbeger value for the return code. 0 on success,
    or non-0 on failure.

MakeProcess
-----------
getLevel()
    Returns a string specifying the make-level. The root make
    is "0", the child sub-make is "1", the grand-child sub-make is "2", etc.

getCmd()
    Returns the command-line for this sub-make process

getCWD()
    Returns the CWD of the sub-make process.

getOWD()
getMode()
    Retrieve these values.

getID() - This is an artifical ID, as it is not stored in the XML file.
    The same is used in Electric Insight

getParentJobID() - This is computed by the sequence of jobs in the XML
    file. This is the job that started this "make process"


Operation
---------
getType()
getFile()
getFileType()
getFound()
    Retrieve these values.

Timing
------
getInvoked()
getCompleted()
getNode()
    Retrieve these values.

CommitTimes
-----------
getStart()
getWait()
getCommit()
getWrite()

Command
-------
getLine()
getArgv()
getOutputs()

Dependency
----------
getWriteJob()
getFile()
getType()

Conflict
--------
getType()
getWriteJob()
getFile(0
getRerunBy()

Output
------
getText()
getSrc()


Message
-------
getText()
getThread()
getTime()
getSeverity()
getCode()


PyAnnolibError - this is the exception that the library will raise
if something is amiss.

anno_open() - this routine opens an annotation file and returns
a filehandle. However, if it finds a "_1", etc., continuation file,
it creates a ConcatendatedFile object (see pyannolib/concatfile.py),
which acts like a filehandle, but automatically and seamlessly
makes multiple files look like one larger file.



Developing
==========
Read the DEVELOPMENT text file.


Sample Scripts
==============

compare-metrics - Show specified metrics between 2 anno files,
    or between the same-named anno files in 2 directories.

extract-jobs - print individual job records to text files.

sample-jobs - This sample script shows a few details about the Build,
    and then shows some data for each "rule" Job that has Command
    objects associated with it. That is, these are the jobs that would
    normally appear in a Make log.

sample-metrics - shows how to read the metrics and report them.

show-conflicts - shows the conflicting jobs in a build

show-deps - shows the file dependency graph

show-jobpath - shows the MakeProcess/Job hierarchy, like the "JobPath"
    tab in ElectricInsight

show-parse-effects - shows the Make parse jobs that had side effects

Reports
=======
'tyranno' is a command-line tool for packaging together various
reports. See "tyranno --help" for more information.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

PyAnnolib

PyAnnolib

README

Files

PyAnnolib

Directory actions

More options

Directory actions

More options

Latest commit

History

PyAnnolib

Folders and files

parent directory

README