Table of Contents
1 An Operational View of CodeQL
These are notes used to develop the slides in ./operational-view.key and its pdf version. They may be handy for running the examples, and are a simpler read than the slides.
2 The codeql / C compiler comparison
2.0.1 sql injection problem, compiler view
Think Compiler (C) with library:
# Prepare System ./admin -c # Convert data if needed cat users.txt # Edit your code edit add-user.c # Compile & run your code clang -Wall add-user.c -lsqlite3 -o add-user for user in `cat input.txt` ; do echo "$user" | ./add-user 2>> users.log ; done # Examine results ./admin -s
2.0.2 sql injection problem, codeql view
Think Compiler (CodeQL) with library:
# Prepare System export PATH=$HOME/local/vmsync/codeql250:"$PATH" # Convert data if needed SRCDIR=. DB=add-user.db cd $SRCDIR && \ codeql database create --language=cpp \ -s . -j 8 -v \ $DB \ --command='clang -Wall add-user.c -lsqlite3 -o add-user' # Edit your code edit SqlInjection.ql # Compile & run your code RESULTS=cpp-sqli.sarif codeql database analyze \ -v --ram=14000 -j12 --rerun \ --search-path ~/local/vmsync/ql \ --format=sarif-latest \ --output=$RESULTS \ -- \ $DB \ $SRCDIR/SqlInjection.ql # Examine results # Plain text, look for # "results" : [ { # and # "codeFlows" : [ { edit $RESULTS # Or jq --raw-output --join-output -f sarif-summary.jq < cpp-sqli.sarif | less # Or use vs code's sarif viewer # Or use the GHAS integration via actions
2.1 Connecting to the compiler core
- IDEs: use vs code for full functionality, any lsp-using editor for completion/jump to source
- choose a repository layout that best fits your custom queries' development model (REFERENCE TO GITHUB REPO)
- check for library X support in the ql/ library. Better yet, check for particular function names.
2.2 Best practice CodeQL
2.3 Key Ideas
All of following ideas follow from one simple observation: The CodeQL CLI is a compiler and you should treat it as such
ghas setup and integration are almost completely independent of query customization; take advantage of this.
If you can build your code on your desktop/laptop/own server, you don't have to wait for GHAS integration to produce codeql databases.
In fact, you should start on your desktop/laptop/server to find issues around the build: memory / thread requirements, ensuring the build system runs correctly when invoked from codeql, etc.
- use desktop-based code scanning earlier in workflow
- cli setup / analysis should be done as prototype for your github admins to work off
- customize scanning tools to actually get results:
- bug bounty programs
- known entry / exit points for services
Just like your CI/CD pipeline encapsulates your compiler cli tools,
github and GHAS encapsulate the codeql cli tools.
So you can always think about what makes sense for the cli, try it there, and then update your GHAS workflow.
2.4 Some Q&A via compiler analogy
2.4.1
Q: Is the C standard library supported?
A: Much of it, typically from a conceptual level.
To find the supported APIs, search the ql/
library source tree.
For example, for a top-down search start with cpp.qll
and notice the import
import semmle.code.cpp.commons.Printf
. Follow this to find the
cpp.commons
module and see what it models:
Alloc.qll Dependency.qll NullTermination.qll StringAnalysis.qll Assertions.qll Environment.qll PolymorphicClass.qll StructLikeClass.qll Buffer.qll Exclusions.qll Printf.qll Synchronization.qll CommonType.qll File.qll Scanf.qll VoidContext.qll DateTime.qll NULL.qll Strcat.qll unix/
2.4.2
Q: Is library X supported?
A: If it is, you'll find it in the ql/
library source tree. A whole-tree
search, grep
-style, is easiest.
For example, to check support for sqlite:
hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src 0:$ grep -l -R sqlite * Security/CWE/CWE-313/CleartextSqliteDatabase.ql Security/CWE/CWE-313/CleartextSqliteDatabase.c semmle/code/cpp/security/Security.qll
So we have a query (.ql
) and a library (.qll
); look at both to get
some ideas:
Security/CWE/CWE-313/CleartextSqliteDatabase.ql
has some info in the header
/** * @name Cleartext storage of sensitive information in an SQLite database * @description Storing sensitive information in a non-encrypted * database can expose it to an attacker. */
and a promising class:
class SqliteFunctionCall extends FunctionCall { SqliteFunctionCall() { this.getTarget().getName().matches("sqlite%") } Expr getASource() { result = this.getAnArgument() } }
semmle/code/cpp/security/Security.qll
has some very promising entries
/** * Extend this class to customize the security queries for * a particular code base. Provide no constructor in the * subclass, and override any methods that need customizing. */ class SecurityOptions extends string { ;; predicate sqlArgument(string function, int arg) { ;; // SQLite3 C API function = "sqlite3_exec" and arg = 1 } ;; /** * The argument of the given function is filled in from user input. */ predicate userInputArgument(FunctionCall functionCall, int arg) { ;; fname = "scanf" and arg >= 1 ;; } ;; }
This is a library, so some sample uses would be nice. Another search via
grep -nH -R SecurityOptions *
docs/codeql/ql-training/cpp/global-data-flow-cpp.rst:59:The library class ``SecurityOptions`` provides a (configurable) model of what counts as user-controlled data:
and an extension point:
cpp/ql/src/semmle/code/cpp/security/SecurityOptions.qll:16:class CustomSecurityOptions extends SecurityOptions
/** * This class overrides `SecurityOptions` and can be used to add project * specific customization. */ class CustomSecurityOptions extends SecurityOptions {...}
2.4.3
Q: How should we go about modeling our libraries with CodeQL?
A: Follow the way you use a C library, say sqlite3
. Your code includes only
sqlite3.h
; you use, but don't care about, libsqlite3.a
.
Thus for CodeQL: don't try to model the library internals, only model the parts of the API you actually use.
For other languages, you need also only model the exposed API.
2.4.4
Q: Should we use the most recent version of codeql at all times?
A: Follow the way you use your compiler. Do you use the most recent version of compiler at all times, or do you use a rolling release cycle?
To get your current version's info:
hohn@gh-hohn ~/local/vmsync/ql/cpp/ql/src 0:$ codeql --version CodeQL command-line toolchain release 2.5.0. Copyright (C) 2019-2021 GitHub, Inc. Unpacked in: /Users/hohn/local/vmsync/codeql250 Analysis results depend critically on separately distributed query and extractor modules. To list modules that are visible to the toolchain, use 'codeql resolve qlpacks' and 'codeql resolve languages'.
You should match the CodeQL cli version to the CodeQL library version;
the library releases have codeql-cli/<VERSION>
tags to allow matching with
the binaries.
When using git for the library, you should check out the appropriate version via, e.g.,
cd $HOME/local/vmsync/ql && git checkout codeql-cli/v2.5.9