These are notes used to develop the slides in ./operational-view.key and its pdf version. They may be handy for running the examples, and are a simpler read than the slides.
A diagram view of the use of CodeQL is in ./notes/codeql-build.drawio. To edit / view / print it, use the open-source version of drawio. It can be used in the browser or downloaded. For simpler viewing, a PDF is provided.
Think Compiler (C) with library:
# Prepare System
./admin -c
# Convert data if needed
cat users.txt
# Edit your code
edit add-user.c
# Compile & run your code
clang -Wall add-user.c -lsqlite3 -o add-user
while IFS= read -r user ; do printf '%s\n' "$user" | ./add-user 2>> users.log ; done < input.txt
# Examine results
./admin -s
Think Compiler (CodeQL) with library:
# Prepare System
export PATH=$HOME/local/vmsync/codeql250:"$PATH"
# Convert data if needed
SRCDIR=.
DB=add-user.db
cd $SRCDIR && \
codeql database create --language=cpp \
-s . -j 8 -v \
$DB \
--command='clang -Wall add-user.c -lsqlite3 -o add-user'
# Edit your code
edit SqlInjection.ql
# Compile & run your code
RESULTS=cpp-sqli.sarif
codeql database analyze \
-v --ram=14000 -j12 --rerun \
--search-path ~/local/vmsync/ql \
--format=sarif-latest \
--output=$RESULTS \
-- \
$DB \
$SRCDIR/SqlInjection.ql
# Examine results
# Plain text, look for
# "results" : [ {
# and
# "codeFlows" : [ {
edit $RESULTS
# Or
jq --raw-output --join-output -f sarif-summary.jq < "$RESULTS" | less
# Or use VS Code's SARIF viewer
# Or use the GHAS integration via Actions
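For a quick count without opening the file, the two plain-text markers mentioned above can be searched for with grep. This is a minimal sketch, assuming the SARIF file is pretty-printed with the same spacing as the markers shown; the sample file contents and the rule id in it are made up for illustration and stand in for a real $RESULTS file.

```shell
# Hypothetical stand-in for real analysis output; a real run would use "$RESULTS".
sarif=$(mktemp)
cat > "$sarif" <<'EOF'
{
  "runs" : [ {
    "results" : [ {
      "ruleId" : "cpp/sql-injection",
      "codeFlows" : [ {
        "threadFlows" : [ ]
      } ]
    } ]
  } ]
}
EOF

# Count the lines that open a results array and a codeFlows array.
results_lines=$(grep -c '"results" : \[' "$sarif")
flows_lines=$(grep -c '"codeFlows" : \[' "$sarif")
echo "results arrays: $results_lines, codeFlows arrays: $flows_lines"

rm -f "$sarif"
```

This only counts marker lines, not individual results; for anything more detailed, the jq summary or a SARIF viewer is the better tool.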
- IDEs: use VS Code for full functionality, or any LSP-capable editor for completion and jump-to-source
- choose a repository layout that best fits your custom queries’ development model
- check for library X support in the ql/ library. Better yet, check for particular function names.
All of the following ideas follow from one simple observation: the CodeQL CLI is a compiler, and you should treat it as such.
- GHAS setup and integration are almost completely independent of query customization; take advantage of this.
If you can build your code on your desktop/laptop/own server, you don’t have to wait for GHAS integration to produce codeql databases.
In fact, you should start on your desktop/laptop/server to find issues around the build: memory / thread requirements, ensuring the build system runs correctly when invoked from codeql, etc.
- use desktop-based code scanning earlier in the workflow
- CLI setup and analysis should be done as a prototype for your GitHub admins to work from
- customize scanning tools to actually get results:
  - bug bounty programs
  - known entry / exit points for services
- Just like your CI/CD pipeline encapsulates your compiler CLI tools, GitHub and GHAS encapsulate the CodeQL CLI tools.
So you can always think about what makes sense for the cli, try it there, and then update your GHAS workflow.
Q: Is the C standard library supported?
A: Much of it, typically at a conceptual level.
To find the supported APIs, search the =ql/= library source tree.
For example, for a top-down search, start with cpp.qll and notice the line
import semmle.code.cpp.commons.Printf
Follow this import to find the =cpp.commons= module and see what it models:
Alloc.qll Dependency.qll NullTermination.qll StringAnalysis.qll
Assertions.qll Environment.qll PolymorphicClass.qll StructLikeClass.qll
Buffer.qll Exclusions.qll Printf.qll Synchronization.qll
CommonType.qll File.qll Scanf.qll VoidContext.qll
DateTime.qll NULL.qll Strcat.qll unix/
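The top-down step can also be done mechanically: list the import lines and turn a module name into the file to open. This is only a sketch. The two-line file below is a made-up stand-in for cpp.qll, and the dotted-name-to-path mapping is assumed from the layout seen in the ql/ tree (e.g. semmle/code/cpp/commons/).

```shell
# Hypothetical stand-in for cpp.qll so the sketch runs without a ql/ checkout.
qll=$(mktemp)
cat > "$qll" <<'EOF'
import semmle.code.cpp.File
import semmle.code.cpp.commons.Printf
EOF

# Keep only the imports that live under cpp.commons.
commons=$(sed -n 's/^import \(semmle\.code\.cpp\.commons\..*\)$/\1/p' "$qll")
echo "$commons"

# Assumed mapping: dots in the module name become directory separators.
path=$(printf '%s\n' "$commons" | tr '.' '/').qll
echo "$path"

rm -f "$qll"
```

With a real checkout, point the sed command at cpp.qll itself and open the resulting path relative to the directory that contains it.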
Q: Is library X supported?
A: If it is, you’ll find it in the =ql/= library source tree. A whole-tree search, grep-style, is easiest.
For example, to check support for sqlite:
hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src
0:$ grep -l -R sqlite *
Security/CWE/CWE-313/CleartextSqliteDatabase.ql
Security/CWE/CWE-313/CleartextSqliteDatabase.c
semmle/code/cpp/security/Security.qll
So we have a query (.ql) and a library (.qll); look at both to get some ideas:
Security/CWE/CWE-313/CleartextSqliteDatabase.ql has some info in the header
/**
* @name Cleartext storage of sensitive information in an SQLite database
* @description Storing sensitive information in a non-encrypted
* database can expose it to an attacker.
*/
and a promising class:
class SqliteFunctionCall extends FunctionCall {
  SqliteFunctionCall() { this.getTarget().getName().matches("sqlite%") }
  Expr getASource() { result = this.getAnArgument() }
}
semmle/code/cpp/security/Security.qll has some very promising entries
/**
* Extend this class to customize the security queries for
* a particular code base. Provide no constructor in the
* subclass, and override any methods that need customizing.
*/
class SecurityOptions extends string {
  ;;
  predicate sqlArgument(string function, int arg) {
    ;;
    // SQLite3 C API
    function = "sqlite3_exec" and arg = 1
  }
  ;;
  /**
   * The argument of the given function is filled in from user input.
   */
  predicate userInputArgument(FunctionCall functionCall, int arg) {
    ;;
    fname = "scanf" and arg >= 1
    ;;
  }
  ;;
}
This is a library, so some sample uses would be nice. Another search, via
grep -nH -R SecurityOptions *
turns up a mention in the training documentation:
docs/codeql/ql-training/cpp/global-data-flow-cpp.rst:59:The library class ``SecurityOptions`` provides a (configurable) model of what counts as user-controlled data:
and an extension point:
cpp/ql/src/semmle/code/cpp/security/SecurityOptions.qll:16:class CustomSecurityOptions extends SecurityOptions
/**
* This class overrides `SecurityOptions` and can be used to add project
* specific customization.
*/
class CustomSecurityOptions extends SecurityOptions {...}
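The whole-tree search gets more useful when it is driven by the function names your code actually calls rather than by the library name. This is a sketch under stated assumptions: the temporary directory and its single file are a made-up stand-in for the ql/ checkout, and the list of function names is illustrative only.

```shell
# Hypothetical stand-in for the ql/ checkout, so the sketch runs anywhere.
QL_ROOT=$(mktemp -d)
mkdir -p "$QL_ROOT/semmle/code/cpp/security"
cat > "$QL_ROOT/semmle/code/cpp/security/Security.qll" <<'EOF'
function = "sqlite3_exec" and arg = 1
EOF

# Check each function name your code calls (illustrative list) against the tree.
supported=""
missing=""
for fn in sqlite3_exec sqlite3_prepare_v2; do
    if grep -q -R -F -- "$fn" "$QL_ROOT"; then
        supported="$supported $fn"
    else
        missing="$missing $fn"
    fi
done
echo "mentioned:$supported"
echo "not found:$missing"

rm -rf "$QL_ROOT"
```

A name that shows up is only a starting point; read the surrounding .ql / .qll code to see how it is modeled. Names that are not found are candidates for your own customization.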
Q: How should we go about modeling our libraries with CodeQL?
A: Follow the way you use a C library, say sqlite3. Your code includes only sqlite3.h; you use, but don’t care about, libsqlite3.a.
Thus for CodeQL: don’t try to model the library internals, only model the parts of the API you actually use.
For other languages, too, you only need to model the exposed API.
Q: Should we use the most recent version of codeql at all times?
A: Follow the way you use your compiler. Do you use the most recent version of your compiler at all times, or do you use a rolling release cycle?
To get your current version’s info:
hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src
0:$ codeql --version
CodeQL command-line toolchain release 2.5.0.
Copyright (C) 2019-2021 GitHub, Inc.
Unpacked in: /Users/hohn/local/vmsync/codeql250
Analysis results depend critically on separately distributed query and
extractor modules. To list modules that are visible to the toolchain,
use 'codeql resolve qlpacks' and 'codeql resolve languages'.
You should match the CodeQL CLI version to the CodeQL library version; the library releases have codeql-cli/<VERSION> tags to allow matching with the binaries.
When using git for the library, you should check out the appropriate version via, e.g.,
cd $HOME/local/vmsync/ql && git checkout codeql-cli/v2.5.9
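To keep the two in step, the tag name can be derived from the CLI's own version banner instead of being typed by hand. This is a rough sketch, assuming the first line of the banner has the form shown in the session above and that tags follow the codeql-cli/v<VERSION> pattern; the banner is hard-coded here so the snippet runs without the CLI installed.

```shell
# Sample first line of `codeql --version`, copied from the session above.
# With the CLI installed: version_line=$(codeql --version | head -n 1)
version_line='CodeQL command-line toolchain release 2.5.0.'

# Pull out the dotted version number (dropping the sentence-ending period).
version=$(printf '%s\n' "$version_line" |
    sed -n 's/.*release \([0-9][0-9.]*[0-9]\)\.$/\1/p')

# Build the matching library tag name.
tag="codeql-cli/v$version"
echo "$tag"

# Then, in the library checkout:
#   cd $HOME/local/vmsync/ql && git checkout "$tag"
```

If the banner format differs in your CLI version, the sed pattern leaves the version empty, so check the printed tag before running the checkout.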