Parsing YARA files
You can parse a YARA file either from your filesystem or from a memory buffer. Use the yaramod::parseFile function and provide it with either a string containing the file path or an input stream with the contents of the YARA file. The function returns a valid pointer to the parsed YARA file, or nullptr in case of failure.
Parsing from filesystem:
std::string filePath = "/home/.../file.yar";
auto yaraFile = yaramod::parseFile(filePath);
Parsing from memory buffer:
std::istringstream input("rule xxx { ... }");
auto yaraFile = yaramod::parseFile(input);
A failure during parsing produces an error on the standard error output (std::cerr), but you can override this behavior by passing a second parameter to yaramod::parseFile, either a file path or an output stream; all error messages are then printed there.
Output errors to file:
std::string errorLog = "/home/.../error.log";
auto yaraFile = yaramod::parseFile(input, errorLog);
Output errors to memory buffer:
std::ostringstream errorLog;
auto yaraFile = yaramod::parseFile(input, errorLog);
The YARA language supports inclusion of other files on the filesystem. The path provided in an include directive is always relative to the YARA file on the disk. Since yaramod can also parse files from memory, relative paths are only allowed when parsing from an actual file.
Whenever yaramod runs into an include, it takes the content of the included file and parses it as if it were in place of the include. Therefore, included content is indistinguishable from all other content in the file.
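The "as if it were in place of the include" behavior can be pictured with a small standalone sketch. It is not yaramod's implementation: expandIncludes is a hypothetical helper, the files live in an in-memory table instead of on disk, and only lines of the exact form include "name" are recognized.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of yaramod): returns the content of `name`
// with every `include "other"` line replaced by the expanded content of `other`.
// Assumes no include cycles; an unknown file name makes files.at() throw.
std::string expandIncludes(const std::string& name,
                           const std::map<std::string, std::string>& files)
{
    const std::string prefix = "include \"";
    std::istringstream input(files.at(name));
    std::string line;
    std::string output;
    while (std::getline(input, line))
    {
        const bool isInclude = line.size() > prefix.size()
            && line.compare(0, prefix.size(), prefix) == 0
            && line.back() == '"';
        if (isInclude)
        {
            // Cut the file name out from between the double quotes.
            const std::size_t length = line.size() - prefix.size() - 1;
            output += expandIncludes(line.substr(prefix.size(), length), files);
        }
        else
        {
            output += line + '\n';
        }
    }
    return output;
}
```

After expansion nothing marks where the included rules came from, which is why the included content cannot be told apart from the rest of the file.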
As with the original YARA, whenever you want to use functions from the available modules, you need to import them. This way, we retain compatibility with the original YARA compiler. To check which modules are imported, you can use the YaraFile::getImports() method, which returns a std::vector of pointers to yaramod::Module.
All imported modules:
for (const auto& module : yaraFile->getImports())
    std::cout << module->getName() << '\n';
The rules of a YARA file can be obtained with the method YaraFile::getRules(). Rules are always ordered as in the input file (including rules from included files). Each rule is represented by a yaramod::Rule object.
All rules in the file:
for (const auto& rule : yaraFile->getRules())
    std::cout << rule->getName() << '\n';
Meta information is represented using yaramod::Meta. Each meta contains its name and a value in the form of yaramod::Literal, which is either an integer, a string or a boolean. To obtain the printable representation of a literal, use the Literal::getPureText() method. There is also the method Literal::getText(), which returns the textual representation as it would appear in a YARA file. That means no change for an integer value; however, a boolean value is dumped using std::boolalpha as true or false, and a string value is enclosed in double quotes with all escape sequences in it escaped.
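As a rough illustration of the YARA-file form described above, the following standalone sketch renders each kind of value. The functions intText, boolText and stringText are hypothetical and not part of yaramod, and the set of escaped characters is an assumption made for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers (not yaramod functions) mimicking the described output.

// Integers are written unchanged.
std::string intText(std::int64_t value)
{
    return std::to_string(value);
}

// Booleans are written as the words true or false.
std::string boolText(bool value)
{
    return value ? "true" : "false";
}

// Strings are wrapped in double quotes; the escaped set here is an assumption.
std::string stringText(const std::string& value)
{
    std::string out = "\"";
    for (char c : value)
    {
        switch (c)
        {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}
```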
All meta information together with its type:
for (const auto& meta : rule->getMetas()) {
    if (meta->getValue()->isString())
        std::cout << "String meta: ";
    else if (meta->getValue()->isInt())
        std::cout << "Int meta: ";
    else if (meta->getValue()->isBool())
        std::cout << "Bool meta: ";
    std::cout << meta->getName() << " = " << meta->getValue()->getPureText() << '\n';
}
Obtaining specific meta:
if (auto meta = rule->getMetaWithName("my_meta_value")) {
    // ...
}
Strings are always kept in the order in which they occur in the file. The abstract class yaramod::String is the base of every string, and each string is an instance of one of the specialized classes yaramod::PlainString, yaramod::HexString or yaramod::Regexp.
Dump all strings and their type:
for (const auto& string : rule->getStrings()) {
    if (string->isPlain())
        std::cout << "Plain string: ";
    else if (string->isHex())
        std::cout << "Hex string: ";
    else if (string->isRegexp())
        std::cout << "Regexp: ";
    std::cout << string->getIdentifier() << " = " << string->getText() << '\n';
}
Strings, like literals, also have the method getPureText(), which returns the pure content of the string without any modifiers.
The modifiers for strings are ascii, wide, fullword and nocase. These can only be associated with plain strings, but are available for all types to allow for possible further extensions of the YARA language. Even though hex strings and regular expressions are parsed into their own smaller ASTs, there is currently no way to traverse them. The only provided interface is to get their textual representation.
Conditions consist of expressions, which form another smaller AST inside the YARA file. This AST can be traversed using the visitor design pattern.
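For readers unfamiliar with the pattern, here is a minimal standalone sketch of visitor-based traversal over a tiny expression tree. Every class in it (Visitor, Expression, IntLiteral, FunctionCall, FunctionNameCollector) is a hypothetical stand-in written for this illustration; none of them are yaramod types, and yaramod's actual visitor interface may differ.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in classes, not yaramod types.
struct IntLiteral;
struct FunctionCall;

// One visit() overload per concrete node type.
struct Visitor
{
    virtual ~Visitor() = default;
    virtual void visit(IntLiteral& expr) = 0;
    virtual void visit(FunctionCall& expr) = 0;
};

struct Expression
{
    virtual ~Expression() = default;
    // Each node dispatches to the visit() overload matching its own type.
    virtual void accept(Visitor& visitor) = 0;
};

struct IntLiteral : Expression
{
    explicit IntLiteral(int v) : value(v) {}
    void accept(Visitor& visitor) override { visitor.visit(*this); }
    int value;
};

struct FunctionCall : Expression
{
    FunctionCall(std::string n, std::vector<std::unique_ptr<Expression>> a)
        : name(std::move(n)), arguments(std::move(a)) {}
    void accept(Visitor& visitor) override { visitor.visit(*this); }
    std::string name;
    std::vector<std::unique_ptr<Expression>> arguments;
};

// Collects function names in pre-order, descending into arguments.
struct FunctionNameCollector : Visitor
{
    void visit(IntLiteral&) override {}
    void visit(FunctionCall& expr) override
    {
        names.push_back(expr.name);
        for (auto& argument : expr.arguments)
            argument->accept(*this);
    }
    std::vector<std::string> names;
};

// Builds the tree outer(inner(1)) and returns the collected names.
std::vector<std::string> collectExample()
{
    std::vector<std::unique_ptr<Expression>> innerArgs;
    innerArgs.push_back(std::make_unique<IntLiteral>(1));
    std::vector<std::unique_ptr<Expression>> outerArgs;
    outerArgs.push_back(std::make_unique<FunctionCall>("inner", std::move(innerArgs)));
    FunctionCall root("outer", std::move(outerArgs));

    FunctionNameCollector collector;
    root.accept(collector);
    return collector.names;
}
```

The point of the design is that new traversals are added as new visitor classes, without touching the node classes themselves.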
The list of all available expression types:
- StringExpression - reference to a string in the strings section ($a01, $a02, $str)
- StringWildcardExpression - reference to multiple strings using a wildcard ($a*, $*)
- StringAtExpression - refers to $str at <offset>
- StringInRangeExpression - refers to $str in (<offset1> .. <offset2>)
- StringCountExpression - reference to the number of matches of a certain string identifier (#a01, #str)
- StringOffsetExpression - reference to the offset of the first match (or Nth match) of a string identifier (@a01, @a01[N])
- StringLengthExpression - reference to the length of the first match (or Nth match) of a string identifier (!a01, !a01[N])
The unary operations listed below all provide the method getOperand(), which returns the operand of the expression.
- NotExpression - refers to the logical not operator (not (@str > 10))
- UnaryMinusExpression - refers to the unary - operator (-20)
- BitwiseNotExpression - refers to bitwise not (~uint8(0x0))
The binary operations listed below all provide the methods getLeftOperand() and getRightOperand(), which return the two operands of the expression.
- AndExpression - refers to logical and ($str1 and $str2)
- OrExpression - refers to logical or ($str1 or $str2)
- LtExpression - refers to the < operator ($str1 < $str2)
- GtExpression - refers to the > operator ($str1 > $str2)
- LeExpression - refers to the <= operator (@str1 <= $str2)
- GeExpression - refers to the >= operator (@str1 >= @str2)
- EqExpression - refers to the == operator (!str1 == !str2)
- NeqExpression - refers to the != operator (!str1 != !str2)
- ContainsExpression - refers to the contains operator (pe.sections[0] contains "text")
- MatchesExpression - refers to the matches operator (pe.sections[0] matches /(text|data)/)
- PlusExpression - refers to the + operator (@str1 + 0x100)
- MinusExpression - refers to the - operator (@str1 - 0x100)
- MultiplyExpression - refers to the * operator (@str1 * 0x100)
- DivideExpression - refers to the / operator (@str1 / 0x100)
- ModuloExpression - refers to the % operator (@str1 % 0x100)
- BitwiseXorExpression - refers to the ^ operator (uint8(0x10) ^ uint8(0x20))
- BitwiseAndExpression - refers to the & operator (pe.characteristics & pe.DLL)
- BitwiseOrExpression - refers to the | operator (pe.characteristics | pe.DLL)
- ShiftLeftExpression - refers to the << operator (uint8(0x10) << 2)
- ShiftRightExpression - refers to the >> operator (uint8(0x10) >> 2)
The for expressions listed below all provide the method getVariable(), which returns the variable used for iterating over the set of values (can also be any or all), getIteratedSet(), which returns the iterated set (can also be them), and getBody(), which returns the body of the for expression. For OfExpression, getBody() always returns nullptr.
- ForIntExpression - refers to a for which operates on a set of integers (for all i in (1 .. 5) : ( ... ))
- ForStringExpression - refers to a for which operates on a set of string identifiers (for all of ($str1, $str2) : ( ... ))
- OfExpression - refers to of (all of ($str1, $str2))
The identifier expressions listed below all provide the method getSymbol(), which returns the symbol of the associated identifier.
- IdExpression - refers to an identifier (rule1, pe)
- StructAccessExpression - refers to the . operator for accessing structure members (pe.number_of_sections)
- ArrayAccessExpression - refers to the [] operator for accessing items in arrays (pe.sections[0])
- FunctionCallExpression - refers to a function call (pe.exports("ExitProcess"))
- BoolLiteralExpression - refers to true or false
- StringLiteralExpression - refers to any sequence of characters enclosed in double quotes ("text")
- IntLiteralExpression - refers to any integer value, be it decimal, hexadecimal or with multipliers (KB, MB) (42, -42, 0x100, 100MB)
- DoubleLiteralExpression - refers to any floating-point value (72.0, -72.0)
- FilesizeExpression - refers to the keyword filesize
- EntrypointExpression - refers to the keyword entrypoint
- AllExpression - refers to the keyword all
- AnyExpression - refers to the keyword any
- ThemExpression - refers to the keyword them
- SetExpression - refers to a set of either integers or string identifiers ((1,2,3,4,5), ($str*,$1,$2))
- RangeExpression - refers to a range of integers ((0x100 .. 0x200))
- ParenthesesExpression - refers to an expression enclosed in parentheses (((5 + 6) * 30))
- IntFunctionExpression - refers to the special built-in functions (u)int(8|16|32) (uint16(<offset>))
- RegexpExpression - refers to a regular expression (/<regexp>/<mods>)
Here is a small example of how to dump all function calls in a condition:
class FunctionCallDumper : public yaramod::ObservingVisitor {
public:
    // The parameter type is qualified because this class sits outside the yaramod namespace
    void visit(yaramod::FunctionCallExpression* expr) override {
        std::cout << "Function call: " << expr->getFunction()->getText() << '\n';
        // Visit arguments because they can contain nested function calls
        for (auto& param : expr->getArguments())
            param->accept(this);
    }
};