Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vocabstrings #240

Merged
merged 12 commits into from
Apr 16, 2015
Merged

Vocabstrings #240

merged 12 commits into from
Apr 16, 2015

Conversation

bworrell
Copy link
Contributor

This PR aligns the behavior of VocabString fields with python-stix VocabString fields. The current python-cybox implementation allows for schema-invalid values to be passed in and does not support registration or setting of custom VocabString implementations.

New behavior:

  • VocabString fields can be set to any type of VocabString
  • If a VocabString field is set to a str, an attempt is made to convert it into the default VocabString type.
  • Input validation against schema-allowed values.
  • Dictionary values are assumed to be plain ol' VocabString instances unless there is an xsi:type key.
  • Users can register new VocabString implementations via vocabs.add_vocab() function.

Added:

  • vocabs.VocabField subclass of TypedField, which allows fields to be set to instances of any type of VocabString, as well as casting to the default type if passed a string.
  • vocabs.add_vocab() registration method.
  • A bunch of generated-from-schema VocabString impls in vocabs module.
  • New unit tests.

Modified:

  • Some of the examples in the documentation didn't test properly and had to be adjusted.
  • Some unit tests set invalid values (e.g., Resolves To instead of Resolved_To) and were fixed.

TODO before the merge or in separate issues after the merge (by @gtback):

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.11% when pulling 44c9373 on vocabstring into 6e3811a on master.

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.06% when pulling 90640ed on vocabstring into 6e3811a on master.

@ikiril01
Copy link
Member

This is great! It solves some existing issues around VocabString and bring some much-need validation around the default vocabulary values to the table.

I did some quick testing, and one issue I came across is setting custom vocabulary values (e.g., from MAEC) on VocabField fields without a defined VocabString implementation. This is currently the primary way such values are set in python-maec, so the following no longer works, as it raises a ValueError:

from maec.bundle import MalwareAction
ma = MalwareAction()
ma.name = "create file"
ma.name.xsi_type = "maecVocabs:FileActionNameVocab-1.1"

I could also see this being problematic if users just want to set a custom value on such a field without any associated vocabulary. Thus, my thought is that instead of raising a ValueError in such cases, we should instead just throw a warning, something to the effect of "Warning: non-standard vocabulary value set for xxx field".

@bworrell
Copy link
Contributor Author

@ikiril01 and I talked about this offline, but i'll record it here.

I sorta, kinda forgot to mention that python-maec probably needed to be changed as well :)

We'll want to create all the VocabString impls like the ones I made for python-stix|cybox for python-maec vocabs as well. We'll also want to use VocabField wherever you defined TypedFields for CV fields.

As for the example you provided, yeah that'll break. The new VocabString class attempts to promote a string to the default VocabString type if one is set.

So this line:

ma.name = "create file"

Is basically doing this internally:

ma.name = cybox.vocabs.ActionName('create file')

Which raises a ValueError because "create file" is not a value ActionName term.

To override this, you explicitly pass in a VocabString object:

my name = VocabString('create  file')
my_name.xsi_type = "maecVocabs:FileActionNameVocab-1.1"
ma.name = my_name

OR...

class FileActionName(cybox.common.VocabString):
    _XSI_TYPE =  "maecVocabs:FileActionNameVocab-1.1"

ma.name = FileActionName('create file')

Disclaimer: I haven't actually tested this code but it's the way it should work (and does work in python--stix fwiw).

@bworrell
Copy link
Contributor Author

@ikiril01, I've been kinda mulling over the error you posted. I'm wondering if it makes sense to implement the VocabField.__set__() as I did in the force-vocabstring branch.

Workflow:

  • If input value is None, set the field to None.
  • Else If input value is a VocabString or VocabString impl object, set the field to the input object.
  • Else set attempt to promote the input to an instance of the VocabField default type (this may be VocabString or a subclass like ActionName depending on the field).
    • If the type promotion raises a ValueError and the VocabField type is a subclass of VocabString, catch the error, issue a warning, and attempt to promote the input to a VocabString object.

Example (very similar to yours...now with 100% fewer Exceptions!):

>>> from cybox.core.action import Action
>>> a = Action()
>>> a.name = 'foobar'
cybox/common/vocabs.py:56: UserWarning: 'foobar' is not a valid term for ActionName. Expected one of ('Accept Socket Connection', 'Add Connection to Network Share', 'Add Network Share', 'Add System Call Hook', 'Add User', 'Add Windows Hook', 'Add Scheduled Task', 'Allocate Virtual Memory in Process', 'Bind Address to Socket', 'Change Service Configuration', 'Check for Remote Debugger', 'Close Port', 'Close Registry Key', 'Close Socket', 'Configure Service', 'Connect to IP', 'Connect to Named Pipe', 'Connect to Network Share', 'Connect to Socket', 'Connect to URL', 'Control Driver', 'Control Service', 'Copy File', 'Create Dialog Box', 'Create Directory', 'Create Event', 'Create File', 'Create File Alternate Data Stream', 'Create File Mapping', 'Create File Symbolic Link', 'Create Hidden File', 'Create Mailslot', 'Create Module', 'Create Mutex', 'Create Named Pipe', 'Create Process', 'Create Process as User', 'Create Registry Key', 'Create Registry Key Value', 'Create Remote Thread in Process', 'Create Service', 'Create Socket', 'Create Symbolic Link', 'Create Thread', 'Create Window', 'Delete Directory', 'Delete File', 'Delete Named Pipe', 'Delete Network Share', 'Delete Registry Key', 'Delete Registry Key Value', 'Delete Service', 'Delete User', 'Disconnect from Named Pipe', 'Disconnect from Network Share', 'Disconnect from Socket', 'Download File', 'Enumerate DLLs', 'Enumerate Network Shares', 'Enumerate Protocols', 'Enumerate Registry Key Subkeys', 'Enumerate Registry Key Values', 'Enumerate Threads in Process', 'Enumerate Processes', 'Enumerate Services', 'Enumerate System Handles', 'Enumerate Threads', 'Enumerate Users', 'Enumerate Windows', 'Find File', 'Find Window', 'Flush Process Instruction Cache', 'Free Library', 'Free Process Virtual Memory', 'Get Disk Free Space', 'Get Disk Type', 'Get Elapsed System Up Time', 'Get File Attributes', 'Get Function Address', 'Get System Global Flags', 'Get Host By Address', 'Get Host By Name', 'Get Host Name', 'Get Library File Name', 'Get Library Handle', 'Get NetBIOS Name', 'Get Process Current Directory', 'Get Process Environment Variable', 'Get Process Startup Information', 'Get Processes Snapshot', 'Get Registry Key Attributes', 'Get Service Status', 'Get System Global Flags', 'Get System Local Time', 'Get System Host Name', 'Get System NetBIOS Name', 'Get System Network Parameters', 'Get System Time', 'Get Thread Context', 'Get Thread Username', 'Get User Attributes', 'Get Username', 'Get Windows Directory', 'Get Windows System Directory', 'Get Windows Temporary Files Directory', 'Hide Window', 'Impersonate Process', 'Impersonate Thread', 'Inject Memory Page', 'Kill Process', 'Kill Thread', 'Kill Window', 'Listen on Port', 'Listen on Socket', 'Load and Call Driver', 'Load Driver', 'Load Library', 'Load Module', 'Lock File', 'Logon as User', 'Map File', 'Map Library', 'Map View of File', 'Modify File', 'Modify Named Pipe', 'Modify Process', 'Modify Service', 'Modify Registry Key', 'Modify Registry Key Value', 'Monitor Registry Key', 'Move File', 'Open File', 'Open File Mapping', 'Open Mutex', 'Open Port', 'Open Process', 'Open Registry Key', 'Open Service', 'Open Service Control Manager', 'Protect Virtual Memory', 'Query Disk Attributes', 'Query DNS', 'Query Process Virtual Memory', 'Queue APC in Thread', 'Read File', 'Read From Named Pipe', 'Read From Process Memory', 'Read Registry Key Value', 'Receive Data on Socket', 'Receive Email Message', 'Release Mutex', 'Rename File', 'Revert Thread to Self', 'Send Control Code to File', 'Send Control Code to Pipe', 'Send Control Code to Service', 'Send Data on Socket', 'Send Data to Address on Socket', 'Send DNS Query', 'Send Email Message', 'Send ICMP Request', 'Send Reverse DNS Query', 'Set File Attributes', 'Set NetBIOS Name', 'Set Process Current Directory', 'Set Process Environment Variable', 'Set System Global Flags', 'Set System Host Name', 'Set System Time', 'Set Thread Context', 'Show Window', 'Shutdown System', 'Sleep Process', 'Sleep System', 'Start Service', 'Unload Driver', 'Unlock File', 'Unmap File', 'Unload Module', 'Upload File', 'Write to File', 'Write to Process Virtual Memory'). Creating VocabString instance instead.
  warnings.warn(warning)
>>> a.name.xsi_type = "maecVocabs:FileActionNameVocab-1.1"
>>> print a.to_xml(include_namespaces=False)
<cybox:ActionType>
    <cybox:Name xsi:type="maecVocabs:FileActionNameVocab-1.1">foobar</cybox:Name>
</cybox:ActionType>

While this gets around the breaking code issue, it also kinda guides users towards creating CybOX that lacks schematic conformance. I dunno, what do you think? @gtback, any thoughts?

@bworrell
Copy link
Contributor Author

I created a force-vocabstring branch for python-stix as well that mirrors this behavior, though I'm not sure I like it.

@ikiril01, one thing I overlooked as I began to write code was that the error you mentioned will not occur if we use VocabField on MAEC VocabString fields and generate all those VocabString subclasses for the MAEC default CVs.

I played around with python-maec and python-cybox (in the vocabstring branch) to test this out and added this to maec.bundle.malware_action:

class FileActionName(vocabs.VocabString):
    _XSI_TYPE = "maecVocabs:FileActionNameVocab-1.1"
    _namespace = "http://maec.mitre.org/default_vocabularies-1"
    _VOCAB_VERSION = "1.1"
    _ALLOWED_VALUES = (
        'create file',
        'other stuff...'
    )


class MalwareAction(Action):
    _binding = bundle_binding
    _binding_class = bundle_binding.MalwareActionType
    _namespace = _namespace

    implementation = cybox.TypedField("Implementation", ActionImplementation)
    name = vocabs.VocabField("Name", FileActionName)  # Override the Action.name field

    def __init__(self):
        super(MalwareAction, self).__init__()
        self.id_ =  maec.utils.idgen.create_id(prefix="action")

Then I was able to run your example:

>>> from maec.bundle import MalwareAction
>>> ma = MalwareAction()
>>> ma.name = 'create file'
>>> ma.name.xsi_type = "maecVocabs:FileActionNameVocab-1.1"  # redundant now
>>> print type(ma.name)
<class 'maec.bundle.malware_action.FileActionName'>
>>> print ma.name
create file
>>> print ma.name.xsi_type
maecVocabs:FileActionNameVocab-1.1

Setting a non-default CV term using a string:

>>> ma.name = "non-default value"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bworrell/projects/python-cybox/cybox/common/vocabs.py", line 43, in __set__
    value = type_(value)
  File "/Users/bworrell/projects/python-cybox/cybox/common/vocabs.py", line 64, in __init__
    self.value = value
  File "/Users/bworrell/projects/python-cybox/cybox/common/vocabs.py", line 83, in value
    raise ValueError(error)
ValueError: Value must be one of ('create file', 'other stuff...'). Received 'non-default value'

Setting a non-default CV term using a VocabString object

>>> from cybox.common import VocabString
>>> ma.name = VocabString('non-default value')
>>> print type(ma.name)
<class 'cybox.common.vocabs.VocabString'>
>>> print ma.name
non-default value

So maybe this "warn and fall back to VocabString" approach isn't necessary? I definitely like the approach of raising a ValueError when invalid (string) terms are passed in, since users can pass in custom (or non-default) VocabString objects as well.

@ikiril01
Copy link
Member

@bworrell thanks for the additional mulling/testing! I'm in agreement with your most recent comment - I think the "warn and fall back" approach isn't necessary if we have the ability to pass in VocabString objects when specifying custom terms. Raising the ValueError as the default behavior is useful for ensuring that a correct vocabulary item is specified, and subsequently better ensures interoperability of our content. We'll just want to be sure to document this behavior somewhere (probably in RTD?).

@bworrell
Copy link
Contributor Author

@ikiril01, we have some docs up on stix.readthedocs.org dealing with controlled vocabularies that may cover everything we want: http://stix.readthedocs.org/en/stable/examples/index.html#controlled-vocabularies-vocabstring

@gtback
Copy link
Contributor

gtback commented Apr 15, 2015

I've been thinking about this some, but I'm still not sure what I think. 🐑 I'll try to look more tomorrow.


def __set__(self, instance, value):
"""Overrides cybox.TypedField.__set__()."""
type_ = self.__vocab_impl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this variable, since it's different than self.type_. Maybe just vocab?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I don't see any issue with that.

@gtback
Copy link
Contributor

gtback commented Apr 16, 2015

I was initially a bit hesitant about this (mostly because I hate the amount of code that it adds), but I'm coming around. I have a few questions about the layout of modules and classes, and for some reason lines like this:

from cybox.common import vocabs, HexBinary, String, VocabString

bother me (importing classes from a module along with a submodule on the same line).

It would be great to keep some of the documentation from this thread on RTD somewhere. A lot is focused on developers of python-cybox, not developers using it, but an explanation of when to use an "established" vocab class vs. the base VocabString class vs. defining and registering a custom vocab would be helpful.

@bworrell
Copy link
Contributor Author

@gtback, have you seen the python-stix RTD stuff for CVs? http://stix.readthedocs.org/en/stable/examples/index.html#controlled-vocabularies-vocabstring

I had planned on adding similar documentation to python-cybox, so maybe I should write up something similar and include it in the PR. What do you think?

@bworrell
Copy link
Contributor Author

Also, I didn't like the mixing of module and class imports either! I just did it because I liked:

class Foo(cybox.Entity):
   foo = cybox.TypedField(...)
   bar = vocabs.VocabField(...)  # I like the namespace alongside "cybox".

Better than

class Foo(cybox.Entity):
   foo = cybox.TypedField(...)
   bar = VocabField(...)  #  no namespace. looks weird to me :)

@gtback
Copy link
Contributor

gtback commented Apr 16, 2015

I made so many comments that now I'm losing track of them and your responses. Yikes!

  1. Yes, I've seen that documentation before, but it's been a while (and definitely didn't see you mention it in this PR. Oops!). It would be great to have something like this for CybOX, but doesn't necessarily need to be part of this PR. I would also have it as it's own top-level page rather than under "Examples", but that's just me.

  2. I'll handle alphabetizing the terms (vim FTW!). I'll push to this branch.

  3. Regarding imports and module organization, let's just leave them the way they are now (to match what's in python-stix). Ultimately I'll want to get rid of the prefixes, so just have the following, where Entity, TypedField, and VocabField will all come from mixbox, but it's not worth the change at this point.

    class Foo(Entity):
        foo = TypedField(...)
        bar = VocabField(...) 
  4. I want to look at the from_dict thing a bit more, and also tweaking the constructor of VocabField.

I'll add a checklist to the top of this issue so I don't forget things.

@gtback
Copy link
Contributor

gtback commented Apr 16, 2015

I also have this crazy idea of calculating _ALLOWED_VALUES off of every class attribute containing "TERM", but I'll work on that separately!

@bworrell
Copy link
Contributor Author

@gtback, haha. I wonder if it'd be easier to go the other way (dynamically generate TERM_FOO attributes from the _ALLOWED_VALUES). That's basically what my script does which produced all those subclasses :)

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.06% when pulling d37f355 on vocabstring into 6e3811a on master.

@gtback
Copy link
Contributor

gtback commented Apr 16, 2015

It's more that I don't want string literals to be duplicated. It would be harder to come up with a valid python identifier (not terrible, as I'm sure your script does it) if there are punctuation characters in the the value. It's also easier to auto-generate documentation that says "here are the term constants you should use".

@gtback
Copy link
Contributor

gtback commented Apr 16, 2015

I pushed a commit (b6b69d0) that tries to change the constructor of the VocabField class. If we like it, I'll merge it into this branch.

EDIT: I was getting lots of weird ImportErrors when doing that, so I had to drop the elegant vocabs.VocabField in a couple places.

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.06% when pulling cfe4e9d on vocabstring into 6e3811a on master.

@gtback
Copy link
Contributor

gtback commented Apr 16, 2015

I'm going to get rid of the vocabstring-constructor branch. The rest of the stuff is either done or has had separate issues filed. Merging!

gtback added a commit that referenced this pull request Apr 16, 2015
@gtback gtback merged commit 08159dd into master Apr 16, 2015
@gtback gtback deleted the vocabstring branch April 16, 2015 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants