Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -33,15 +33,15 @@ private import DataFlowPrivate
class LocalSourceNode extends Node {
cached
LocalSourceNode() {
not simpleLocalFlowStep(_, this) and
// Currently, we create synthetic post-update nodes for
// - arguments to calls that may modify said argument
// - direct reads a writes of object attributes
// Both of these preserve the identity of the underlying pointer, and hence we exclude these as
// local source nodes.
// We do, however, allow the post-update nodes that arise from object creation (which are non-synthetic).
not this instanceof SyntheticPostUpdateNode
this instanceof ExprNode and
not simpleLocalFlowStep(_, this)
or
// Module variable nodes must be local source nodes, otherwise type trackers cannot step through
// them.
this instanceof ModuleVariableNode
or
// We explicitly include any read of a global variable, as some of these may have local flow going
// into them.
this = any(ModuleVariableNode mvn).getARead()
Comment on lines +39 to 45
Copy link
Copy Markdown
Member

@RasmusWL RasmusWL Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these comments, makes it much easier 👍

Are there any instances where having ModuleVariableNode provides any value instead of just having any(ModuleVariableNode mvn).getARead()? That is, could we write this part as:

    // Module variable nodes must be local source nodes, otherwise type trackers cannot
    // step through them. We explicitly include any read of a global variable, as some
    // of these may have local flow going into them.
    this = any(ModuleVariableNode mvn).getARead()

I tried running the following query on a few databases I had lying around (salt, nova, peewee), but it did not give any results.

import python
import semmle.python.dataflow.new.internal.LocalSources
import semmle.python.dataflow.new.internal.DataFlowPublic
private import semmle.python.dataflow.new.internal.DataFlowPrivate

from ModuleVariableNode mvn, Node flowsToNode
where
  not mvn = flowsToNode and
  mvn.(LocalSourceNode).flowsTo(flowsToNode) and
  not mvn.getARead().(LocalSourceNode).flowsTo(flowsToNode)
select mvn, flowsToNode

I guess I could just try to run our tests with that change, so I've just started doing that 😛

Copy link
Copy Markdown
Member

@RasmusWL RasmusWL Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried out running the tests, and some of them failed. For example

x = m4.blah4 #$ use=moduleImport("a4").getMember("b4").getMember("c4").getMember("blah4")

and the query above doesn't give any results either, so clearly that query is not good enough 😓

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trouble arises when you want to type track from some_global = tracked to sink(some_global). In practice this happens in two steps: first we go from tracked to ModuleVariableNode for some_global and then from this node to the read.

However, since type trackers (in their usual formulation) step from LocalSourceNode to LocalSourceNode, we'll lose the first step unless the ModuleVariableNode is also a LocalSourceNode.

On the other hand, we should eventually be able to get rid of the getARead bit. It's only there right now because we have local flow into global variables, which I think we should get rid of (but we do want it in the global scope, so this is slightly tricky).

For instance we currently have local flow from source to sink in the following bit of code:

def make_safe():
    global g
    g = NON_SOURCE

def foo():
    global g
    g = SOURCE
    make_safe()
    sink(g)

Now, we'll actually have this regardless (unless we put some effort into tracking what globals are modified by what functions, which we might), since there will be the aforementioned step through the module variable node. But we shouldn't have local flow here.

I think based on this discussion that I should maybe expand the comment for ModuleVariableNode a bit. 😅

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#6218 (which is based on this branch) clarifies this situation a bit by creating a specific TypeTrackingNode class that encompasses both LocalSourceNode and ModuleVariableNode. That way, we don't have the somewhat confusing inclusion of ModuleVariableNode as a LocalSourceNode.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this PR be closed then, and we focus on the other one instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to merge this one as is (assuming you're happy with it), as the changes it makes have been tested (performance and otherwise), whereas the performance tests for the other PR are still ongoing. Also, the other changes will affect the shared type tracking library, so are perhaps a bit more controversial.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, fine by me 👍

}

Expand Down