Table of Contents
(from https://xkcd.com/327/)
1. SQL injection example
1.1. Setup and sample run
# Use a simple headline prompt PS1=' \033[32m---- SQL injection demo ----\[\033[33m\033[0m\] $?:$ ' # Build ./build.sh # Prepare db ./admin -r ./admin -c ./admin -s # Add regular user interactively ./add-user 2>> users.log First User # Regular user via "external" process echo "User Outside" | ./add-user 2>> users.log # Check ./admin -s # Add Johnny Droptable ./add-user 2>> users.log Johnny'); DROP TABLE users; -- # And the problem: ./admin -s # Check the log tail users.log
1.2. Identify the problem
./add-user
is reading from STDIN
, and writing to a database; looking at the code in
./add-user.c leads to
count = read(STDIN_FILENO, buf, BUFSIZE - 1);
for the read and
rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);
for the write.
This problem is thus a dataflow problem; in codeql terminology we have
- a source at the
read(STDIN_FILENO, buf, BUFSIZE - 1);
- a sink at the
sqlite3_exec(db, query, NULL, 0, &zErrMsg);
We write codeql to identify these two, and then connect them via
- a dataflow configuration – for this problem, the more general taintflow configuration.
1.3. Build codeql database
To get started, build the codeql database (adjust paths to your setup):
# Build the db with source commit id. # export PATH=$HOME/local/vmsync/codeql250:"$PATH" SRCDIR=$(pwd) DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD) echo $DB test -d "$DB" && rm -fR "$DB" mkdir -p "$DB" cd $SRCDIR && codeql database create --language=cpp -s . -j 8 -v $DB --command='./build.sh'
Then add this database directory to your VS Code DATABASES
tab.
1.4. Build codeql database in steps
For larger projects, using a single command to build everything is costly when any part of the build fails.
To build a database in steps, use the following sequence, adjusting paths to your setup:
# Build the db with source commit id. export PATH=$HOME/local/vmsync/codeql250:"$PATH" SRCDIR=$HOME/local/codeql-training-material.cpp-sqli/cpp/codeql-dataflow-sql-injection DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD) # Check paths echo $DB echo $SRCDIR # Prepare db directory test -d "$DB" && rm -fR "$DB" mkdir -p "$DB" # Run the build cd $SRCDIR codeql database init --language=cpp -s . -v $DB # Repeat trace-command as needed to cover all targets codeql database trace-command -v $DB -- make codeql database finalize -j4 $DB
Then add this database directory to your VS Code DATABASES
tab.
1.5. Develop the query bottom-up
Identify the source part of the
read(STDIN_FILENO, buf, BUFSIZE - 1);
expression, the
buf
argument. Start from afrom..where..select
, then convert to a predicate.Identify the sink part of the
sqlite3_exec(db, query, NULL, 0, &zErrMsg);
expression, the
query
argument. Again start fromfrom..where..select
, then convert to a predicate.Fill in the taintflow configuration boilerplate
class CppSqli extends TaintTracking::Configuration { CppSqli() { this = "CppSqli" } override predicate isSource(DataFlow::Node node) { none() } override predicate isSink(DataFlow::Node node) { none() } }
Note that an inout-argument in C/C++ (the
buf
pointer is passed toread
and points to updated data after the return) is accessed as a codeql source viasource.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr()
instead of the usual
source.asExpr()
The final query (without isAdditionalTaintStep
) is
/** * @name SQLI Vulnerability * @description Using untrusted strings in a sql query allows sql injection attacks. * @kind path-problem * @id cpp/SQLIVulnerable * @problem.severity warning */ import cpp import semmle.code.cpp.dataflow.TaintTracking import DataFlow::PathGraph class SqliFlowConfig extends TaintTracking::Configuration { SqliFlowConfig() { this = "SqliFlow" } override predicate isSource(DataFlow::Node source) { // count = read(STDIN_FILENO, buf, BUFSIZE); exists(FunctionCall read | read.getTarget().getName() = "read" and read.getArgument(1) = source.(DataFlow::PostUpdateNode).getPreUpdateNode().asExpr() ) } override predicate isSink(DataFlow::Node sink) { // rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg); exists(FunctionCall exec | exec.getTarget().getName() = "sqlite3_exec" and exec.getArgument(1) = sink.asExpr() ) } } from SqliFlowConfig conf, DataFlow::PathNode source, DataFlow::PathNode sink where conf.hasFlowPath(source, sink) select sink, source, sink, "Possible SQL injection"
1.6. Optional: sarif file review of the results
Query results are available in several output formats using the cli. The following produces the sarif format, a json-based result description.
# The setup information from before export PATH=$HOME/local/vmsync/codeql250:"$PATH" SRCDIR=$HOME/local/codeql-training-material.cpp-sqli/cpp/codeql-dataflow-sql-injection DB=$SRCDIR/cpp-sqli-$(cd $SRCDIR && git rev-parse --short HEAD) # Check paths echo $DB echo $SRCDIR # To see the help codeql database analyze -h # Run a query codeql database analyze \ -v \ --ram=14000 \ -j12 \ --rerun \ --search-path ~/local/vmsync/ql \ --format=sarif-latest \ --output cpp-sqli.sarif \ -- \ $DB \ $SRCDIR/SqlInjection.ql # Examine the file in an editor edit cpp-sqli.sarif
An example of using the sarif data is in the the jq script ./sarif-summary.jq. When run against the sarif input via
jq --raw-output --join-output -f sarif-summary.jq < cpp-sqli.sarif > cpp-sqli.txt
it produces output in a form close to that of compiler error messages:
query-id: message line Path ... Path ...