Table of Contents
1. SQL injection example
The primary purpose of this material is to serve as an introduction to CodeQL for Python. The section Reading Order describes the basic CodeQL queries to go through.
The Python code is intentionally trivial and mirrors the structure of production code. Further, the steps needed for CodeQL setup in production CI/CD pipelines have identical structure to the one shown here.
Thus, it is expedient to illustrate intermediate and advanced topics here as well. The section Additional Topics does this by illustrating descriptions from the CodeQL documentation using the Python code in this repository. Thus, the Additional Topics serve as additional examples to parts of the documentation. These additional topics should be read in the order of appearance.
For system administration and devops, the section CodeQL setup provides a rudimentary guide.
1.1. CodeQL setup
It's best to have a full development setup for CodeQL on your laptop/desktop. This requires you to
- download VS Code
- install the CodeQL extension. Instructions on how to do that found here: https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/
install a CodeQL binary (containing CodeQL CLI) for whichever platform you are on and unpack that
The binary for 2.13.5 is found here: https://github.com/github/codeql-cli-binaries/releases/tag/v2.13.5
See script below.
(recommended for browsing) Install the codeql standard library matching the binary version. This is not needed to write or run queries anymore, but the library has many examples and searching it is much easier after extracting this archive: https://github.com/github/codeql/releases/tag/codeql-cli%2Fv2.13.5
See script below.
clone this repository.
mkdir ~/local && cd ~/local && \ git clone https://github.com/hohn/codeql-dataflow-sql-injection-python.git
open the workspace directory in VS Code. This should just be
cd ~/local/codeql-dataflow-sql-injection-python code python-sqli.code-workspace
add the downloaded CodeQL CLI to the VS Code's search path. Find the CodeQL extension settings, then paste the full path to the CodeQL CLI into the
Code QL > Cli: Executable Path
field.
install the pack dependencies for the CLI. In a shell, use
cd ~/local/codeql-dataflow-sql-injection-python codeql pack install
install the pack dependencies VS Code. Do this via
command palette
and then select all listed by
CodeQL: Install Pack Dependencies
It will generate a
codeql-pack.lock.yml
file.use the following to build a CodeQL database.
#* Build the db with source commit id. codeql --version : CodeQL command-line toolchain release 2.13.5. cd ~/local/codeql-dataflow-sql-injection-python DB=./python-sqli-db-$(git rev-parse --short HEAD) echo $DB test -d "$DB" && rm -fR "$DB" mkdir -p "$DB" codeql database create --language=python -s . -j 8 -v $DB
- add the database to the editor. To do this there is a widget on the left side of editor that looks like QL and after selecting that, there is a databases panel. There are options to select from archive or folder. Select the "from folder" option and add the "database" folders you created above.
open the query
trivial.ql
and run it viaright click > run query on selected database
There are several ways to install the CodeQL binaries and libraries. Here is a shell script to do it one way
# grab -- retrieve and extract codeql cli and library # Usage: grab version platform prefix grab() { version=$1; shift platform=$1; shift prefix=$1; shift mkdir -p $prefix/codeql-$version && cd $prefix/codeql-$version || return # Get cli wget "https://github.com/github/codeql-cli-binaries/releases/download/$version/codeql-$platform.zip" # Get lib wget "https://github.com/github/codeql/archive/refs/tags/codeql-cli/$version.zip" # Fix attributes if [ `uname` = Darwin ] ; then xattr -c *.zip fi # Extract unzip -q codeql-$platform.zip unzip -q $version.zip # Rename library directory for VS Code mv codeql-codeql-cli-$version/ ql # remove archives? # rm codeql-$platform.zip # rm $version.zip } # Try: grab v2.13.5 osx64 $HOME/local/xefm grab v2.13.5 linux64 $HOME/local/xefm ls $HOME/local/xefm/codeql-v2.13.5/ : codeql/ codeql-osx64.zip ql/ v2.13.5.zip
1.2. Sample Application Setup and Run
Execute the following in a bourne-style shell, one block at a time to see results. This requires a working Python installation and a POSIX shell.
# Prepare db ./admin -r ./admin -c ./admin -s # Add regular user ./add-user.py 2>> log First User # Check ./admin -s # Add Johnny Droptable ./add-user.py 2>> log Johnny'); DROP TABLE users; -- # See the problem: ./admin -s # Check the log tail log
1.3. Reading Order
The queries introduce CodeQL concepts and should be read bottom-up in this order:
source.ql
: introduces Value, ControlFlowNode and DataFlow::Node.sink.ql
: introduces AstNode.TaintFlowTemplate.ql
: introduce the taint flow template.TaintFlow.ql
: taint flow with endpoints only, using a class. This is the old way, but it still works and is a good introduction to using classes – not writing them.TaintFlowPath.ql
: taint flow with full path. Again, the old way.TaintFlowWithModule.ql
: taint flow with endpoints only, using modules. The way forward.TaintFlowPathQueryWithModule.ql
: taint flow with full path, using modules.
1.4. Note on the Python code
The Python call
conn.execute(query)
to sqlite3 only allows one statement and produces an exception:
sqlite3.Warning: You can only execute one statement at a time.
This makes it safer than the raw
sqlite3_exec()
or Python's
conn.executescript
For this tutorial, we use the multi-statement executescript()
call.
2. Additional Topics
This repository and its source code are used to illustrate some additional topics from the CodeQL Python documentation.
2.1. Running analysis from the command line
Use the following to run analysis manually.
cd ~/local/codeql-dataflow-sql-injection-python DB=./python-sqli-db-$(git rev-parse --short HEAD) codeql database analyze --format=sarif-latest --rerun \ --output StdLibExtPath.sarif \ -j8 \ -- $DB StdLibExtPath.ql
2.2. Dataflow in Python
https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-python/
Using and extending the CodeQL standard library:
StdLibPlain.ql Illustrates using the CodeQL standard library's
RemoteFlowSource
StdLibExt.ql Illustrates extension of the CodeQL standard library via
class SqlAccess extends FileSystemAccess::Range ...
and
class TerminalInput extends RemoteFlowSource::Range ...
Various data flow / taint flow examples from the documentation, modified as needed:
- using-local-data-flow.ql
- using-local-sources.ql
- using-local-taint-tracking.ql
2.3. API graphs
https://codeql.github.com/docs/codeql-language-guides/using-api-graphs-in-python/
API graphs are a uniform interface for referring to functions, classes, and methods defined in external libraries.
ApiGraphs.ql
: various sample queries
2.4. Type Tracking
Documentation for JavaScript, also applicable here: https://codeql.github.com/docs/codeql-language-guides/using-type-tracking-for-api-modeling/#using-type-tracking-for-api-modeling
The files
- sqlite-info.py
- TypeTracking.ql
use type tracking. From the docs: You can track data through an API by creating a model using the CodeQL type-tracking library. The type-tracking library makes it possible to track values through properties and function calls.
The file
TypeTrackingWithData.ql
goes further. From the docs: The type-tracking library makes it possible to track values through properties and function calls. Here, we also track some associated data. See https://codeql.github.com/docs/codeql-language-guides/using-type-tracking-for-api-modeling/#tracking-associated-data
2.5. Flow State
The query TaintFlowPathQueryWithSanitizer.ql
illustrates using a flow-state
representing whether user input has been sanitized.
It introduces ADTs via the newtype
declaration of TInputSanitizationState
.