Preface

The spring breeze and I are strangers; why does it slip through my silken curtains?

CodeQL Library for Java

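The five most important categories of library classes:

Program elements: for example, classes and methods
AST nodes: for example, statements and expressions
Metadata: for example, annotations and comments
Metrics: for example, cyclomatic complexity and coupling
Call graph: classes for navigating the program's call graph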

Program elements

To represent classes and methods, QL provides the following classes:

Element
├── Package
├── CompilationUnit
├── Type
├── Method
├── Constructor
└── Variable

There is also the class Callable, the common superclass of Method and Constructor.

Type

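Type has many subclasses:

PrimitiveType: a primitive type, that is, one of boolean, byte, char, double, float, int, long, short; QL also classifies void and <nulltype> (the type of the null literal) as primitive types
RefType: a reference type (that is, a non-primitive type), with the following subclasses:
- Class: a Java class
- Interface: a Java interface
- EnumType: a Java enum type
- Array: a Java array type
- TopLevelType: a reference type declared at the top level of a compilation unit
- NestedType: a type declared inside another type
- TopLevelClass: a class declared at the top level of a compilation unit
- NestedClass: a class declared inside another type, such as:
  - LocalClass: a class declared inside a method or constructor
  - AnonymousClass: an anonymous class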

Find all variables of type int in the program:

import java
from Variable v, PrimitiveType pt
where pt=v.getType() and
	pt.hasName("int")
select v

Find all top-level types whose name differs from the name of their compilation unit:

import java
from TopLevelType tl
where tl.getName() != tl.getCompilationUnit().getName()
select tl

There are also several singleton classes that wrap key classes of the Java standard library:

TypeObject, TypeCloneable, TypeRuntime, TypeSerializable, TypeString, TypeSystem and TypeClass.

Find all nested classes that directly extend Object:

import java

from NestedClass nc
where nc.getASupertype() instanceof TypeObject
select nc

Generics

CodeQL provides several subclasses of Type for dealing with generic types.

GenericType
├── GenericInterface // generic interface
└── GenericClass // generic class

package java.util;

public interface Map<K, V> {
    int size();

    // ...
}

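TypeVariable represents the type parameters K and V of the generic interface Map.

An instantiation of a generic type with concrete type arguments, such as Map<String, File>, is a ParameterizedType.

Find all parameterized instances of java.util.Map; getSourceDeclaration maps each ParameterizedType back to its generic type: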

import java

from GenericInterface map, ParameterizedType pt
where map.hasQualifiedName("java.util", "Map") and
    pt.getSourceDeclaration() = map
select pt

Generic types often restrict which types a type parameter can be bound to:

class StringToNumMap<N extends Number> implements Map<String, N> {
    // ...
}

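This requires N to be Number itself or a subtype of Number; Number is an upper bound of N.

getATypeBound: returns a type bound of a type variable

TypeBound: represents a type bound; its predicate getType gives the bounding type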
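As a rough run-time analogue of getATypeBound, Java reflection can report the declared bound of a type parameter. The sketch below is only an illustration: NumBox is a hypothetical class modeled on StringToNumMap, and firstBound is a helper name invented here. It is not part of CodeQL.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

public class TypeBoundDemo {
    // Hypothetical generic class: the type parameter N is bounded by Number.
    static class NumBox<N extends Number> {
        N value;
    }

    // Returns the first declared bound of the class's first type parameter.
    static Type firstBound(Class<?> genericClass) {
        TypeVariable<?> tv = genericClass.getTypeParameters()[0];
        return tv.getBounds()[0];
    }

    public static void main(String[] args) {
        // N extends Number, so the bound is Number.
        System.out.println(firstBound(NumBox.class));
        // K in Map<K, V> has no explicit bound, so reflection reports Object.
        System.out.println(firstBound(java.util.Map.class));
    }
}
```

A type variable like N is what the TypeVariable query targets, and Number is the type one would expect TypeBound.getType to yield for it.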

For example, find all type variables whose bound is Number:

import java

from TypeVariable tv, TypeBound tb
where tb = tv.getATypeBound() and
    tb.getType().hasQualifiedName("java.lang", "Number")
select tv

For legacy code that knows nothing about generics, every generic type has a raw version with no type parameters at all:

RawType
├── RawClass
└── RawInterface

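Like ParameterizedType, RawType has the predicate getSourceDeclaration for obtaining the corresponding generic type.

For example, find variables whose type is the raw type Map: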

import java
from Variable v, RawType rt
where rt = v.getType() and 
		rt.getSourceDeclaration().hasQualifiedName("java.util", "Map")
select v

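Map m1 = new HashMap();
Map<String, String> m2 = new HashMap<String, String>();

Applied to these declarations, the query finds m1 but not m2, because the type of m2 is a ParameterizedType rather than a RawType.

A RawType carries no type arguments, for example plain Map.

Wildcard types (?)

Map<? extends Number, ? super Float> m;

WildcardTypeAccess represents the two wildcards ? extends Number and ? super Float.

Number is an upper bound and Float is a lower bound.

getUpperBound: gets the upper bound of a wildcard

getLowerBound: gets the lower bound of a wildcard

For generic methods, the classes GenericMethod, ParameterizedMethod and RawMethod are the analogues of the like-named classes for generic types.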

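Variables

Field represents a Java field
LocalVariableDecl represents a local variable
Parameter represents a parameter of a method or constructor

AST: abstract syntax tree

AST nodes fall into two main categories:

Stmt: statements
Expr: expressions

Both classes provide member predicates for navigating the tree:

Expr.getAChildExpr returns a sub-expression of the given expression
Stmt.getAChild returns a statement or expression nested directly inside the given statement
Expr.getParent and Stmt.getParent return the parent node of an AST node

Find all expressions whose parent is a return statement: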

import java
from Expr e
where e.getParent() instanceof ReturnStmt
select e

Find all statements whose parent is an if statement:

import java

from Stmt s
where s.getParent() instanceof IfStmt
select s

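This finds both the then branch and the else branch of every if statement.

Find all statements whose parent is a method, that is, all method bodies: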

import java
from Stmt s
where s.getParent() instanceof Method
select s

Method-Stmt-Expr

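CodeQL provides two abstract classes, ExprParent and StmtParent, which represent any node that may be the parent of an expression or of a statement, respectively.

Metadata

Metadata: data that describes data

Java programs carry several kinds of metadata; the best known are annotations.

CodeQL provides the class Annotatable as the superclass of all program elements that can be annotated, for example packages, reference types, fields, methods, constructors, and local variable declarations.

The predicate getAnAnnotation of Annotatable returns an annotation attached to the program element.

The type of an Annotation is an AnnotationType. For example, find all @Deprecated annotations on constructors: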

import java

from Constructor c, Annotation ann, AnnotationType anntp
where ann = c.getAnAnnotation() and
    anntp = ann.getType() and
    anntp.hasQualifiedName("java.lang", "Deprecated")
select ann

Find the Javadoc comments attached to private fields:

import java

from Field f, Javadoc jdoc
where f.isPrivate() and
    jdoc = f.getDoc().getJavadoc()
select jdoc

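A Javadoc comment is parsed into a tree of JavadocElement nodes, which can be traversed with the member predicates getAChild and getParent.

Find @author tags in the Javadoc comments of private fields: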

import java

from Field f, Javadoc jdoc, AuthorTag at
where f.isPrivate() and
    jdoc = f.getDoc().getJavadoc() and
    at.getParent+() = jdoc
select at

Call graph

The class Call represents method calls, new expressions, and explicit constructor calls using this or super.

A Callable represents a method or constructor.

For example, the following query finds all calls to methods called println:

import java

from Call c, Method m
where m = c.getCallee() and
    m.hasName("println")
select c

Conversely, Callable.getAReference returns a Call that refers to the callable.

For example, find the methods and constructors that are never called:

import java
from Callable c
where not exists(c.getAReference())
select c

Analyzing data flow in Java

local data flow / global data flow / taint tracking

Local data flow

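Local data flow is data flow within a single method or callable.

The local data flow library is in the module DataFlow.

The DataFlow module defines the class Node, which denotes any element that data can flow through.

The two most commonly used subclasses of Node are ExprNode and ParameterNode.

You can map between data flow nodes and expressions/parameters using the member predicates asExpr and asParameter: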

class Node {
  /** Gets the expression corresponding to this node, if any. */
  Expr asExpr() { ... }

  /** Gets the parameter corresponding to this node, if any. */
  Parameter asParameter() { ... }

  ...
}

/**
 * Gets the node corresponding to expression `e`.
 */
ExprNode exprNode(Expr e) { ... }

/**
 * Gets the node corresponding to the value of parameter `p` at function entry.
 */
ParameterNode parameterNode(Parameter p) { ... }

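The predicate localFlowStep holds if data flows directly from one node to another in a single step.

You can apply it transitively with the closure operator (localFlowStep*) or use the predicate localFlow. The listing below, marked private, is the configuration-aware variant from the library's internal global data flow implementation, shown for reference; the public DataFlow::localFlowStep takes only the two nodes.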

/**
 * Holds if data can flow in one local step from `node1` to `node2`.
 */
private predicate localFlowStep(NodeEx node1, NodeEx node2, Configuration config) {
  exists(Node n1, Node n2 |
    node1.asNode() = n1 and
    node2.asNode() = n2 and
    simpleLocalFlowStepExt(n1, n2) and
    not outBarrier(node1, config) and
    not inBarrier(node2, config) and
    not fullBarrier(node1, config) and
    not fullBarrier(node2, config)
  )
  or
  exists(Node n |
    config.allowImplicitRead(n, _) and
    node1.asNode() = n and
    node2.isImplicitReadNode(n, false)
  )
}

Template for source -> sink:

DataFlow::localFlow(DataFlow::parameterNode(source), DataFlow::exprNode(sink))

Local taint tracking

Module: TaintTracking

Predicate: localTaintStep

localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)

As with local data flow, apply it transitively either with the predicate localTaint or with the closure localTaintStep*.

Template:

TaintTracking::localTaint(DataFlow::parameterNode(source), DataFlow::exprNode(sink))

Examples

Find the file name passed to new FileReader(..), that is, argument 0:

import java

from Constructor fileReader, Call call
where
  fileReader.getDeclaringType().hasQualifiedName("java.io", "FileReader") and
  call.getCallee() = fileReader
select call.getArgument(0)

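This only finds the expressions in the argument position, not the values that may flow into them.

Adding local data flow finds every expression that flows into the argument: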

import java
import semmle.code.java.dataflow.DataFlow

from Constructor fileReader, Call call, Expr src
where
  fileReader.getDeclaringType().hasQualifiedName("java.io", "FileReader") and
  call.getCallee() = fileReader and
  DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0)))
select src

Make the source more specific, for example a parameter that flows into the argument:

import java
import semmle.code.java.dataflow.DataFlow

from Constructor fileReader, Call call, Parameter p
where
  fileReader.getDeclaringType().hasQualifiedName("java.io", "FileReader") and
  call.getCallee() = fileReader and
  DataFlow::localFlow(DataFlow::parameterNode(p), DataFlow::exprNode(call.getArgument(0)))
select p
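
To make the three FileReader queries concrete, here is a small hypothetical Java target (the class and method names are invented for this sketch). In firstLine the parameter reaches the constructor argument unchanged through a local variable, which is the kind of value-preserving flow DataFlow::localFlow is meant to follow. In firstLineIn the argument is built by string concatenation, which one would expect only TaintTracking::localTaint to connect to the parameters.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileReaderFlowDemo {
    // Value-preserving flow: fileName -> copy -> argument 0 of new FileReader(..).
    static String firstLine(String fileName) throws IOException {
        String copy = fileName;
        try (BufferedReader r = new BufferedReader(new FileReader(copy))) {
            return r.readLine();
        }
    }

    // Concatenation creates a new string, so dir and name influence the
    // argument without being the same value as the argument.
    static String firstLineIn(String dir, String name) throws IOException {
        String path = dir + java.io.File.separator + name;
        try (BufferedReader r = new BufferedReader(new FileReader(path))) {
            return r.readLine();
        }
    }

    // Writes a temporary file and reads it back through both methods.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("flow-demo", ".txt");
            try {
                Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
                return firstLine(tmp.toString()) + "|"
                    + firstLineIn(tmp.getParent().toString(), tmp.getFileName().toString());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```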

This query finds calls to formatting functions where the format string is not hard-coded:

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.StringFormat

from StringFormatMethod format, MethodAccess call, Expr formatString
where
  call.getMethod() = format and
  call.getArgument(format.getFormatStringIndex()) = formatString and
  not exists(DataFlow::Node source, DataFlow::Node sink |
    DataFlow::localFlow(source, sink) and
    source.asExpr() instanceof StringLiteral and
    sink.asExpr() = formatString
  )
select call, "Argument to String format method isn't hard-coded."

Global data flow

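Global data flow tracks data flow throughout the entire program and is therefore more powerful than local data flow. However, it is less precise than local data flow, and the analysis typically needs significantly more time and memory.

Using global data flow

Extend the class DataFlow::Configuration, for example (these notes use the class-based Configuration API; recent CodeQL releases deprecate it in favor of the module-based DataFlow::ConfigSig API):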

import semmle.code.java.dataflow.DataFlow

class MyDataFlowConfiguration extends DataFlow::Configuration {
  MyDataFlowConfiguration() { this = "MyDataFlowConfiguration" }

  override predicate isSource(DataFlow::Node source) {
    ...
  }

  override predicate isSink(DataFlow::Node sink) {
    ...
  }
}

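The configuration defines the following predicates:

isSource: defines where data may flow from
isSink: defines where data may flow to
isBarrier: optional, restricts the data flow
isAdditionalFlowStep: optional, adds extra flow steps

The data flow analysis is performed using the predicate hasFlow(DataFlow::Node source, DataFlow::Node sink):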

from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()

Using global taint tracking

Extend the class TaintTracking::Configuration:

import semmle.code.java.dataflow.TaintTracking

class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
  MyTaintTrackingConfiguration() { this = "MyTaintTrackingConfiguration" }

  override predicate isSource(DataFlow::Node source) {
    ...
  }

  override predicate isSink(DataFlow::Node sink) {
    ...
  }
}

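The configuration defines the following predicates:

isSource: defines where tainted data may flow from
isSink: defines where tainted data may flow to
isBarrier: optional, restricts the taint flow
isAdditionalFlowStep: optional, adds extra flow steps

As with global data flow, the analysis is performed using the predicate hasFlow(DataFlow::Node source, DataFlow::Node sink).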

Flow sources

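The data flow library contains some predefined flow sources. The class RemoteFlowSource, defined in semmle.code.java.dataflow.FlowSources, represents data sources that may be controlled by a remote user, which is very useful for finding security problems.

Example

Use remote user input as the source: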

import java
import semmle.code.java.dataflow.FlowSources

class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
  MyTaintTrackingConfiguration() {
    this = "..."
  }

  override predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  ...
}

Four exercises

Exercises 1

Write a query that finds all hard-coded strings used to create a java.net.URL, using local data flow.

import semmle.code.java.dataflow.DataFlow

from Constructor url, Call call, StringLiteral src
where 
	url.getDeclaringType().hasQualifiedName("java.net", "URL") and
  call.getCallee() = url and
  DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(call.getArgument(0)))
select src

Exercises 2

Write a query that finds all hard-coded strings used to create a java.net.URL, using global data flow.

import semmle.code.java.dataflow.DataFlow

class Configuration extends DataFlow::Configuration {
  Configuration() {
    this = "LiteralToURL Configuration"
  }

  override predicate isSource(DataFlow::Node source) {
    source.asExpr() instanceof StringLiteral
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(Call call |
      sink.asExpr() = call.getArgument(0) and
      call.getCallee().(Constructor).getDeclaringType().hasQualifiedName("java.net", "URL")
    )
  }
}

from DataFlow::Node src, DataFlow::Node sink, Configuration config
where config.hasFlow(src, sink)
select src, "This string constructs a URL $@.", sink, "here"

Exercises 3

Write a class that represents flow sources from java.lang.System.getenv(..).

import java
class GetenvSource extends MethodAccess{
	GetenvSource() {
    exists(Method m | m = this.getMethod() |
      m.hasName("getenv") and
      m.getDeclaringType() instanceof TypeSystem
    )
  }
}

Exercises 4

Using the answers from 2 and 3, write a query which finds all global data flows from getenv to java.net.URL.

import semmle.code.java.dataflow.DataFlow

class GetenvSource extends DataFlow::ExprNode {
  GetenvSource() {
    exists(Method m | m = this.asExpr().(MethodAccess).getMethod() |
      m.hasName("getenv") and
      m.getDeclaringType() instanceof TypeSystem
    )
  }
}

class GetenvToURLConfiguration extends DataFlow::Configuration {
  GetenvToURLConfiguration() {
    this = "GetenvToURLConfiguration"
  }

  override predicate isSource(DataFlow::Node source) {
    source instanceof GetenvSource
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(Call call |
      sink.asExpr() = call.getArgument(0) and
      call.getCallee().(Constructor).getDeclaringType().hasQualifiedName("java.net", "URL")
    )
  }
}

from DataFlow::Node src, DataFlow::Node sink, GetenvToURLConfiguration config
where config.hasFlow(src, sink)
select src, "This environment variable constructs a URL $@.", sink, "here"

Types in Java

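PrimitiveType: primitive types

RefType: reference types (classes, interfaces, arrays, annotations, enums)

The class RefType has the member predicates getASupertype and getASubtype for finding the direct supertypes and subtypes of a reference type.

For example, consider the following program: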

class A {}

interface I {}

class B extends A implements I {}

Find all supertypes of class B:

import java

from Class B
where B.hasName("B")
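select B.getASupertype+()

The results are A, I, and java.lang.Object.

In addition, RefType provides:

The predicate getAMember, which gets the members declared in the type, that is, fields, constructors, and methods.

The predicate inherits(Method m), which checks whether the type declares or inherits the method m.

Example: downcasting arrays

Downcasting an array is usually dangerous and can easily cause a run-time exception.

A dangerous cast:

Object[] o = new Object[] { "Hello", "world" };
String[] s = (String[])o;

The detection idea: a cast from an array type source to an array type target is potentially unsafe if the element type of source is a transitive supertype of the element type of target.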
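
The run-time behavior behind this check can be reproduced with a few lines of Java. In this minimal sketch (class and method names are invented for illustration), the cast compiles in both cases, but it only succeeds when the array was actually created as a String[].

```java
public class ArrayDowncastDemo {
    // Returns true if casting the given array to String[] succeeds at run time.
    static boolean castSucceeds(Object[] array) {
        try {
            String[] s = (String[]) array;
            return s != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Run-time type Object[]: the downcast fails even though every element is a String.
        Object[] objects = new Object[] { "Hello", "world" };
        // Run-time type String[]: the downcast succeeds.
        Object[] strings = new String[] { "Hello", "world" };
        System.out.println(castSucceeds(objects));
        System.out.println(castSucceeds(strings));
    }
}
```

Both casts have the same static types, which is why a purely type-based query can only call them potentially problematic.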

First version:

import java
from CastExpr ce, Array source , Array target
where 
	source = ce.getExpr().getType() and
	target = ce.getType() and
	target.getElementType().(RefType).getASupertype+() = source.getElementType()
select ce,"Potentially problematic array downcast."

However, this version reports a false positive for the following common idiom:

List l = new ArrayList();
// add some elements of type A to l
A[] as = (A[])l.toArray(new A[0]);

To improve the query, model Collection.toArray and every method that overrides it, so that such calls can be excluded.

Improved version:

/** class representing java.util.Collection.toArray(T[]) */
class CollectionToArray extends Method {
    CollectionToArray() {
        this.getDeclaringType().hasQualifiedName("java.util", "Collection") and
        this.hasName("toArray") and
        this.getNumberOfParameters() = 1
    }
}

/** class representing calls to java.util.Collection.toArray(T[]) */
class CollectionToArrayCall extends MethodAccess {
    CollectionToArrayCall() {
        exists(CollectionToArray m |
            this.getMethod().getSourceDeclaration().overridesOrInstantiates*(m)
        )
    }

    /** the call's actual return type, as determined from its argument */
    Array getActualReturnType() {
        result = this.getArgument(0).getType()
    }
}

Final version:

Exclude the casts where toArray is called with an argument of type A[] and the result is then cast to A[].

import java

// Insert the class definitions from above

from CastExpr ce, Array source, Array target
where source = ce.getExpr().getType() and
    target = ce.getType() and
    target.getElementType().(RefType).getASupertype+() = source.getElementType() and
    not ce.getExpr().(CollectionToArrayCall).getActualReturnType() = target
select ce, "Potentially problematic array downcast."

Example: finding mismatched contains checks

Map<Object, Object> zkProp;

// ...

if (zkProp.entrySet().contains("dynamicConfigFile")){
    // ...
}

This check is pointless: entrySet() returns a set of Map.Entry objects, so it can never contain a String.
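
The effect is easy to confirm at run time. The sketch below uses invented names and a made-up property value; it shows that the mismatched check returns false even when the key is present, whereas a lookup by key (presumably what was intended) returns true.

```java
import java.util.HashMap;
import java.util.Map;

public class ContainsMismatchDemo {
    static Map<Object, Object> sampleProps() {
        Map<Object, Object> props = new HashMap<>();
        props.put("dynamicConfigFile", "zoo.cfg.dynamic"); // hypothetical value
        return props;
    }

    // The mismatched check: asks a Set of Map.Entry objects whether it contains a String.
    static boolean mismatchedCheck(Map<Object, Object> props) {
        return props.entrySet().contains("dynamicConfigFile");
    }

    // The presumably intended check: look the key up directly.
    static boolean intendedCheck(Map<Object, Object> props) {
        return props.containsKey("dynamicConfigFile");
    }

    public static void main(String[] args) {
        Map<Object, Object> props = sampleProps();
        System.out.println(mismatchedCheck(props));
        System.out.println(intendedCheck(props));
    }
}
```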

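In general, we want to find calls to Collection.contains (or to any method overriding it in any parameterized instance of Collection) where the element type E of the collection and the type A of the argument to contains are unrelated, that is, they have no common subtype.

Step 1: model java.util.Collection: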

class JavaUtilCollection extends GenericInterface {
    JavaUtilCollection() {
        this.hasQualifiedName("java.util", "Collection")
    }
}

Step 2: model the method java.util.Collection.contains:

class JavaUtilCollectionContains extends Method {
    JavaUtilCollectionContains() {
        this.getDeclaringType() instanceof JavaUtilCollection and
        this.hasStringSignature("contains(Object)")
    }
}

Step 3: model calls to Collection.contains, including calls to any method that overrides it, taking all parameterized instances into account:

class JavaUtilCollectionContainsCall extends MethodAccess {
    JavaUtilCollectionContainsCall() {
        exists(JavaUtilCollectionContains jucc |
            this.getMethod().getSourceDeclaration().overrides*(jucc)
        )
    }
}

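For each call to contains, we care about two things:

the type of the argument
the element type of the collection the method is called on

To capture them, add two member predicates to the class from step 3: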

Type getArgumentType() {
    result = this.getArgument(0).getType()
}

Type getCollectionElementType() {
    exists(RefType D, ParameterizedInterface S |
        D = this.getMethod().getDeclaringType() and
        D.hasSupertype*(S) and S.getSourceDeclaration() instanceof JavaUtilCollection and
        result = S.getTypeArgument(0)
    )
}

Write a predicate that checks whether two given reference types have a common subtype:

predicate haveCommonDescendant(RefType tp1, RefType tp2) {
    exists(RefType commondesc | commondesc.hasSupertype*(tp1) and commondesc.hasSupertype*(tp2))
}

The query:

import java

// Insert the class definitions from above

from JavaUtilCollectionContainsCall juccc, Type collEltType, Type argType
where collEltType = juccc.getCollectionElementType() and argType = juccc.getArgumentType() and
    not haveCommonDescendant(collEltType, argType)
select juccc, "Element type " + collEltType + " is incompatible with argument type " + argType

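Improvements

For many programs, this query yields a large number of false positives because of type variables and wildcards.

To avoid such false positives, require that neither collEltType nor argType is an instance of TypeVariable.
To account for autoboxing (int and Integer are different types), require that collEltType is not the boxed type of argType.
null is a special case: its type in CodeQL is <nulltype>, which must be allowed as an argument type.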
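
The boxing and null cases correspond to ordinary, legitimate Java code, which is why they have to be excluded. A minimal sketch with invented names: an int argument is autoboxed to Integer and can match, a long argument is boxed to Long and never matches an Integer element, and null is an acceptable argument for this list implementation.

```java
import java.util.Arrays;
import java.util.List;

public class BoxingContainsDemo {
    static final List<Integer> NUMBERS = Arrays.asList(1, 2, 3);

    // Argument type int: autoboxed to Integer, so the check is meaningful.
    static boolean containsInt(int i) {
        return NUMBERS.contains(i);
    }

    // Argument type long: boxed to Long, which never equals an Integer.
    static boolean containsLong(long l) {
        return NUMBERS.contains(l);
    }

    // null has the null type; the call is legal and simply finds nothing here.
    static boolean containsNull() {
        return NUMBERS.contains(null);
    }

    public static void main(String[] args) {
        System.out.println(containsInt(1));
        System.out.println(containsLong(1L));
        System.out.println(containsNull());
    }
}
```

Of the three calls, only the one in containsLong is a genuine mismatch, so it is the only one a well-tuned query should still report.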

Final version:

import java

// Insert the class definitions from above

from JavaUtilCollectionContainsCall juccc, Type collEltType, Type argType
where collEltType = juccc.getCollectionElementType() and argType = juccc.getArgumentType() and
    not haveCommonDescendant(collEltType, argType) and
    not collEltType instanceof TypeVariable and not argType instanceof TypeVariable and
    not collEltType = argType.(PrimitiveType).getBoxedType() and
    not argType.hasName("<nulltype>")
select juccc, "Element type " + collEltType + " is incompatible with argument type " + argType

Navigating the call graph

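Call graph classes

CodeQL provides two abstract classes: Callable and Call.

Callable is simply the common superclass of Method and Constructor.

Call is the common superclass of MethodAccess, ClassInstanceExpr, ThisConstructorInvocationStmt and SuperConstructorInvocationStmt.

Put simply, a Callable is something that can be invoked, and a Call is something that invokes a Callable.

Call provides two member predicates:

getCallee: returns the Callable that the call statically resolves to. For an instance method, the method actually invoked at run time may be an override in a subclass.
getCaller: returns the callable that syntactically contains the call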

class Super {
    int x;

    // callable
    public Super() {
        this(23);       // call
    }

    // callable
    public Super(int x) {
        this.x = x;
    }

    // callable
    public int getX() {
        return x;
    }
}

class Sub extends Super {
    // callable
    public Sub(int x) {
        super(x+19);    // call
    }

    // callable
    public int getX() {
        return x-19;
    }
}

class Client {
    // callable
    public static void main(String[] args) {
        Super s = new Sub(42);  // call
        s.getX();               // call 
    }
}

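In this example, getCallee on the second call in Client.main returns Super.getX(), although the method actually invoked at run time is Sub.getX().

Callable provides a large number of member predicates; the two most important are:

calls(Callable target): holds if this callable contains a call whose callee is target
polyCalls(Callable target): holds if this callable may call target at run time, that is, it contains a call whose callee is either target or a method that target overrides

In the example above, Client.main calls the constructor Sub(int) and the method Super.getX(); in addition, it polyCalls the method Sub.getX().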
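
The difference between the static callee and the method that really runs can be observed directly. The following sketch repackages the Super/Sub example as nested classes (the wrapper class DispatchDemo and the method callThroughSuper are invented here) so that it can be run as a single file.

```java
public class DispatchDemo {
    static class Super {
        int x;
        Super() { this(23); }
        Super(int x) { this.x = x; }
        int getX() { return x; }
    }

    static class Sub extends Super {
        Sub(int x) { super(x + 19); }
        @Override
        int getX() { return x - 19; }
    }

    // The call s.getX() statically resolves to Super.getX(), but the receiver
    // is a Sub at run time, so Sub.getX() is the method that executes.
    static int callThroughSuper() {
        Super s = new Sub(42);
        return s.getX();
    }

    public static void main(String[] args) {
        // Sub(42) stores 61 in x; Sub.getX() subtracts 19 again.
        System.out.println(callThroughSuper()); // prints 42
    }
}
```

Had Super.getX() been executed, the result would have been 61. Getting 42 shows the override ran, which is the run-time possibility that polyCalls accounts for and calls does not.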

Example: finding methods that are never called

import java

from Callable callee
where not exists(Callable caller|caller.polyCalls(callee))
select callee

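A typical Java application does not call every library method in the JDK, so this query reports many uninteresting library callables.

We can use the predicate fromSource to check whether a compilation unit is a source file, and refine the query accordingly: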

import java
from Callable callee
where not exists(Callable caller|caller.polyCalls(callee)) and callee.getCompilationUnit().fromSource()
select callee, "Not called."

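The results also include <clinit> methods. These are class initializers, which are never called explicitly anywhere in the code.

The same goes for finalize methods, which are invoked by the garbage collector before an object is reclaimed rather than by explicit calls.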

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource() and
    not callee.hasName("<clinit>") and not callee.hasName("finalize")
select callee, "Not called."

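Public callables are excluded as well, since they may be entry points of an external API.

A further special case is the non-public default constructor: in the singleton pattern, a class is given a private empty constructor to prevent other classes from instantiating it.

Such constructors should not appear in the results, because not being called is exactly what they are designed for.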

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource() and
    not callee.hasName("<clinit>") and not callee.hasName("finalize") and
    not callee.isPublic() and
    not callee.(Constructor).getNumberOfParameters() = 0
select callee, "Not called."

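In practice, many methods can be invoked through reflection, which is generally hard to detect.

However, CodeQL does recognize the test classes of JUnit and other frameworks, whose methods are invoked by a test runner; the class TestClass represents them: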

import java

from Callable callee
where not exists(Callable caller | caller.polyCalls(callee)) and
    callee.getCompilationUnit().fromSource() and
    not callee.hasName("<clinit>") and not callee.hasName("finalize") and
    not callee.isPublic() and
    not callee.(Constructor).getNumberOfParameters() = 0 and
    not callee.getDeclaringType() instanceof TestClass
select callee, "Not called."

Annotations in Java

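CodeQL classes for annotations:

The class Annotatable represents all program elements that can have an annotation attached, such as packages, reference types, fields, methods, and local variables.
The class AnnotationType represents an annotation type, such as java.lang.Override. Annotation types in Java are interfaces.
The class AnnotationElement represents an annotation element, that is, a member of an annotation type.
The class Annotation represents an annotation, such as @Override. The values of an annotation can be accessed with the predicate getValue.

For example, SuppressWarnings is a standard Java annotation that tells the compiler to suppress certain kinds of warnings.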

package java.lang;

public @interface SuppressWarnings {
    String[] value();
}

SuppressWarnings is represented as an AnnotationType, and value is its only AnnotationElement.

Example usage:

class A {
    @SuppressWarnings("rawtypes")
    public A(java.util.List rawlist) {
    }
}

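Here, @SuppressWarnings("rawtypes") is an Annotation and "rawtypes" is the value of its element value, which can be retrieved with the predicate getValue.

Find the @SuppressWarnings annotations attached to constructors, and return both the annotation and its value: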

import java
from Constructor c, Annotation ann, AnnotationType anntp
where ann = c.getAnAnnotation() and
    anntp = ann.getType() and
    anntp.hasQualifiedName("java.lang", "SuppressWarnings")
select ann, ann.getValue("value")

The following query finds all annotation types that have a single annotation element, named value:

import java

from AnnotationType anntp
where forex(AnnotationElement elt |
    elt = anntp.getAnAnnotationElement() |
    elt.getName() = "value"
)
select anntp

Example: finding missing `@Override` annotations

class Super {
    public void m() {}
}

class Sub1 extends Super {
    @Override public void m() {}
}

class Sub2 extends Super {
    public void m() {}
}

We want to find methods like Sub2.m, which override another method and should therefore carry an @Override annotation.

Find all @Override annotations:

import java

from Annotation ann
where ann.getType().hasQualifiedName("java.lang","Override")
select ann

This finds the annotations on many methods like Sub1.m().

Next, encapsulate the @Override annotation in a class:

class OverrideAnnotation extends Annotation {
    OverrideAnnotation() {
        this.getType().hasQualifiedName("java.lang", "Override")
    }
}

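Then use the predicate overrides to check whether one method overrides another, together with the predicate getAnAnnotation (available on any Annotatable) to retrieve the method's annotations: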

import java
from Method overriding, Method overridden
where overriding.overrides(overridden) and
	not overriding.getAnAnnotation() instanceof OverrideAnnotation and
	overriding.fromSource()
select overriding, "Method overrides another method, but does not have an @Override annotation."

Example: finding calls to `@Deprecated` methods

Encapsulate @Deprecated:

class DeprecatedAnnotation extends Annotation {
    DeprecatedAnnotation() {
        this.getType().hasQualifiedName("java.lang", "Deprecated")
    }
}

class DeprecatedMethod extends Method {
    DeprecatedMethod() {
        this.getAnAnnotation() instanceof DeprecatedAnnotation
    }
}

The query:

import java

from Call call
where call.getCallee() instanceof DeprecatedMethod and
	not call.getCaller() instanceof DeprecatedMethod
select call, "This call invokes a deprecated method."

Improvements

class A {
    @Deprecated void m() {}

    @Deprecated void n() {
        m();
    }

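    @SuppressWarnings("deprecation")
    void r() {
        m();
    }
}

In this example, r calls the deprecated method m, but its @SuppressWarnings annotation shows that the call is intentional, so it should not be reported.

Improvement: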

class SuppressDeprecationWarningAnnotation extends Annotation {
    SuppressDeprecationWarningAnnotation() {
        this.getType().hasQualifiedName("java.lang", "SuppressWarnings") and
        this.getAValue().(Literal).getLiteral().regexpMatch(".*deprecation.*")
    }
}

This class represents all @SuppressWarnings annotations in which the string deprecation occurs among the listed values.

Each value is a string literal: after casting it to Literal, getLiteral returns its text.

import java

// Insert the class definitions from above

from Call call
where call.getCallee() instanceof DeprecatedMethod
    and not call.getCaller() instanceof DeprecatedMethod
    and not call.getCaller().getAnAnnotation() instanceof SuppressDeprecationWarningAnnotation
select call, "This call invokes a deprecated method."

AST

CodeQL for AST

The nodes of the abstract syntax tree are statements and expressions.

AST
├── Statement (Stmt)
└── Expression (Expr)
    ├── Literal
    │   ├── BooleanLiteral
    │   ├── IntegerLiteral
    │   ├── LongLiteral
    │   ├── FloatingPointLiteral
    │   ├── DoubleLiteral
    │   ├── CharacterLiteral
    │   ├── StringLiteral
    │   └── NullLiteral
    ├── Unary expression
    ├── Binary expression
    ├── Assignment expression
    ├── Access 
    └── Miscellaneous

Official Java examples

https://codeql.github.com/codeql-query-help/java/

https://github.com/github/codeql/tree/main/java/ql/examples/snippets

GitHub Security Lab

https://securitylab.github.com/research/

Path queries for Java

Template:

/**
 * ...
 * @kind path-problem
 * ...
 */

import <language>
// For some languages (Java/C++/Python) you need to explicitly import the data flow library, such as
// import semmle.code.java.dataflow.DataFlow
import DataFlow::PathGraph
...

from MyConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "<message>"

path query metadata

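For example, @kind path-problem

Generating path explanations

You need to define a query predicate named edges, which defines the edge relations of the graph.

Optionally, you can also define a query predicate named nodes, which specifies the nodes of the graph.

You can also import an existing path graph module, which already defines edges:

import DataFlow::PathGraph

Declaring sources and sinks

Declare the source and sink variables, with their types, in the from clause.

If you import the DataFlow::PathGraph module, you can use the Configuration class directly:

from DataFlow::Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink

DataFlow::Configuration is an abstract class that defines two key predicates: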

isSource() defines where data may flow from.
isSink() defines where data may flow to.

Defining flow conditions

Template:

where config.hasFlowPath(source, sink)

select element, source, sink, string

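Troubleshooting

Determining the specific type of a variable

If you are unfamiliar with the library used in a query, you can use CodeQL to determine what types an entity has. The predicate getAQlClass() returns the most specific QL types of the entity it is called on.

For example, if you are working with a Java database, you might call getAQlClass() on every Expr in a callable named c: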

import java

from Expr e, Callable c
where
    c.getDeclaringType().hasQualifiedName("my.namespace.name", "MyClass")
    and c.getName() = "c"
    and e.getEnclosingCallable() = c
select e, e.getAQlClass()

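The result of this query is a list of the most specific types of every Expr in that callable. An expression that is represented by several types produces several rows, so the result table can be very large.

Use getAQlClass() as a debugging tool, but do not include it in the final version of your query, because it slows the query down.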

Debugging data-flow queries using partial flow

Template for a regular data flow path query:

class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }

  override predicate isSource(DataFlow::Node node) { node instanceof MySource }

  override predicate isSink(DataFlow::Node node) { node instanceof MySink }
}

from MyConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Sink is reached from $@.", source.getNode(), "here"

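Simplified version:

from MyConfig config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink)
select sink, "Sink is reached from $@.", source, "here"

In practice, however, the flow is often broken somewhere between source and sink; partial flow can then be used to debug it.

Step 1: check the sources and sinks

The predicate fieldFlowBranchLimit

The data flow configuration has a parameter called fieldFlowBranchLimit. If the value is set too high, performance may degrade; if it is too low, you may miss results. When debugging data flow, try setting fieldFlowBranchLimit to a high value and see whether the query yields more results. For example, add the following to your configuration:

override int fieldFlowBranchLimit() { result = 5000 }

If there are still no results and performance is still usable, it is best to leave this set to a high value while debugging further.

Step 2: partial flow

The predicate Configuration.hasPartialFlow: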

/**
 * Holds if there is a partial data flow path from `source` to `node`. The
 * approximate distance between `node` and the closest source is `dist` and
 * is restricted to be less than or equal to `explorationLimit()`. This
 * predicate completely disregards sink definitions.
 *
 * This predicate is intended for dataflow exploration and debugging and may
 * perform poorly if the number of sources is too big and/or the exploration
 * limit is set too high without using barriers.
 *
 * This predicate is disabled (has no results) by default. Override
 * `explorationLimit()` with a suitable number to enable this predicate.
 *
 * To use this in a `path-problem` query, import the module `PartialPathGraph`.
 */
final predicate hasPartialFlow(PartialPathNode source, PartialPathNode node, int dist) {

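This predicate is intended for exploring data flow.

To use it in a query of @kind path-problem, import DataFlow::PartialPathGraph. Do not import PathGraph as well, or the query will report an error.

dist is the approximate distance from the source, and the predicate completely ignores your sink definitions.

First, you must override the explorationLimit() predicate:

override int explorationLimit() { result = 5 }

Here 5 is the search radius of the exploration.

A useful way to use it: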

predicate adhocPartialFlow(Callable c, PartialPathNode n, Node src, int dist) {
  exists(MyConfig conf, PartialPathNode source |
    conf.hasPartialFlow(source, n, dist) and
    src = source.getNode() and
    c = n.getNode().getEnclosingCallable()
  )
}

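If you only care about one particular source, the src parameter is redundant and can be replaced by an exists inside the predicate.

You can of course add other columns of interest derived from n, but it is generally advisable to include at least the enclosing callable and the distance to the source, since these are useful columns for sorting the results.

If you see a large number of partial flow results, there are several ways to focus them.

If the flow follows the expected path for a long distance, a lot of uninteresting flow may be included within the exploration radius.

To trim it, you can replace the source definition with a suitable node that appears along the path and restart the partial flow exploration from that node.
You can also use barriers and sanitizers creatively to prune the flow, which also reduces the number of partial flows that need to be explored while debugging.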
