Extract generics#686
Conversation
0db4ec8 to
7b099ef
Compare
smowton
left a comment
There was a problem hiding this comment.
Looking good so far! Worth considering for the future: do we want to extract the same thing for functionName[param, param...] and for typeName[param, param, ...]? Which one will be harder: distinguishing the cases in the extractor, or in QL?
| @@ -0,0 +1,53 @@ | |||
| // // First we need to wrap some database types | |||
| // class Location extends @location { | |||
There was a problem hiding this comment.
Remove, uncomment, or note why this is left here commented
There was a problem hiding this comment.
Oh yes. These were from the scripts that I copied, but I don't think it's needed. I meant to test out doing an upgrade and a downgrade, and if they work I will delete it.
0bac20b to
2c2b64d
Compare
7a959b4 to
360f244
Compare
164d49f to
e0f8c68
Compare
e0f8c68 to
c0e6bb6
Compare
|
There seem to be some non-trivial test failures relating to data flow which I'll look into on Monday. |
smowton
left a comment
There was a problem hiding this comment.
Reviewed up to overrideTypesOfTypeSetLiterals
| populateTypeParamParents(tw, meth.Type().(*types.Signature).TypeParams(), methlbl) | ||
| populateTypeParamParents(tw, meth.Type().(*types.Signature).RecvTypeParams(), methlbl) |
There was a problem hiding this comment.
Add a comment (in both places) clarifying what's going on with this vs. the very similar calls in extractObjects
|
|
||
| // IndexExpr is the type of index expression AST nodes | ||
| // IndexExpr is the type of AST nodes for index expressions and generic type | ||
| // instantiation expressions with one type argument |
There was a problem hiding this comment.
| // instantiation expressions with one type argument | |
| // instantiation expressions with one type argument | |
| // Note that syntactically unambiguous generic instantiations will be extracted | |
| // as one of the Generic...InstantiationExpr expression kinds below. |
|
|
||
| // GenericFunctionInstantiationExpr is the type of AST nodes that represent a instantiation | ||
| // of a generic type. These correspond to some index expression AST nodes and all index | ||
| // list expression AST nodes. |
There was a problem hiding this comment.
| // list expression AST nodes. | |
| // list expression AST nodes. | |
| // Note some syntactically ambiguous instantations are extracted as an `IndexExpr` to be disambiguated in QL later. |
There was a problem hiding this comment.
Actually function instantiations can be distinguished in the extractor.
|
|
||
| // GenericTypeInstantiationExpr is the type of AST nodes that represent an instantiation | ||
| // of a generic type. These correspond to some index expression AST nodes and all index | ||
| // list expression AST nodes. |
There was a problem hiding this comment.
| // list expression AST nodes. | |
| // list expression AST nodes. | |
| // Note some syntactically ambiguous instantations are extracted as an `IndexExpr` to be disambiguated in QL later. |
| n := flattenBinaryExprTree(tw, expr.X, lbl, 0) | ||
| flattenBinaryExprTree(tw, expr.Y, lbl, n) |
There was a problem hiding this comment.
Is this the same as flattenBinaryExprTree(expr)?
| // Go through a `FieldList` and update the types of all type set literals which | ||
| // are not already union types to be union types. We do this by changing the | ||
| // types stored in `tw.Package.TypesInfo.Types`. | ||
| func overrideTypesOfTypeSetLiterals(tw *trap.Writer, fields *ast.FieldList) { |
There was a problem hiding this comment.
| func overrideTypesOfTypeSetLiterals(tw *trap.Writer, fields *ast.FieldList) { | |
| func makeTypeSetLiteralsUnionTyped(tw *trap.Writer, fields *ast.FieldList) { |
Probably add a comment at the use sites explaining the reasoning why we're confident we don't extract a type-set literal prior to this override being applied.
bde455b to
4181beb
Compare
smowton
left a comment
There was a problem hiding this comment.
Suggested tests to add: some basic dataflow tests to make sure we can see flow into and out of an instantiation of a generic function / method and/or via the fields of a generic type.
We should also add some basic control-flow tests to make sure our new expr kinds aren't being picked up and incorporated into the CFG leading to potential dead-ends.
| u instanceof InterfaceType and | ||
| ( | ||
| not u instanceof BasicInterfaceType and | ||
| if exists(u.(InterfaceType).getAnEmbeddedTypeSetLiteral()) | ||
| then | ||
| forall(Type intersectionType | | ||
| intersectionType = u.(InterfaceType).getAnEmbeddedTypeSetLiteral().getATerm().getType() and | ||
| forall(TypeSetLiteralType tslit | | ||
| tslit = u.(InterfaceType).getAnEmbeddedTypeSetLiteral() | ||
| | | ||
| intersectionType = tslit.getATerm().getType() | ||
| ) | ||
| | | ||
| intersectionType.implementsComparable() | ||
| ) | ||
| else u.(InterfaceType).isOrEmbedsComparable() | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Neaten this with exists(InterfaceType uif | uif = u | ... to eliminate all the u.(InterfaceType)
| forall(TypeSetLiteralType tslit | | ||
| tslit = u.(InterfaceType).getAnEmbeddedTypeSetLiteral() | ||
| | | ||
| intersectionType = tslit.getATerm().getType() | ||
| ) |
There was a problem hiding this comment.
Add a comment explaining what this is doing? I don't follow what the nested forall seeks to achieve.
There was a problem hiding this comment.
All types in the intersection of all the embedded type set literals must implement comparable.
There was a problem hiding this comment.
By "add a comment" (here and elsewhere) I mean add a comment in the code for future readers' benefit, not just in the PR
There was a problem hiding this comment.
I did add it in the code. I just copied it here so you could see the wording easily without having to hunt it down in the updated PR.
There was a problem hiding this comment.
Doh right sorry, I clicked through and surprisingly Github took me to the code before your change.
There was a problem hiding this comment.
Looking again, based on the General Interfaces bit of https://go.dev/ref/spec#Interface_types I'm not sure this definition of the intersection is complete? Based on that spec, are you sure we don't need to consider method specifications here for example, to restrict to only those types with the required method defined on them?
| override string toString() { result = "pointer type" } | ||
| } | ||
|
|
||
| private newtype TTerm = |
There was a problem hiding this comment.
As a publicly exposed name this is a bit non-specific (let the record show I avoided calling it 'generic'). How about TypeSetTerm?
| Type getType() { result = tp } | ||
|
|
||
| /** Holds if `t` is in the type set of this term. */ | ||
| predicate hasInTypeSet(Type t) { if tilde = false then t = tp else t.getUnderlyingType() = tp } |
There was a problem hiding this comment.
matchesType or includesType perhaps? I'm thinking hasInTypeSet would apply to a type-set not an individual term
| /** | ||
| * Holds if `tp` is an explicitly embedded type with index `index`. | ||
| * | ||
| * `tp` is either a type set literal type or its underlyign type is an |
There was a problem hiding this comment.
| * `tp` is either a type set literal type or its underlyign type is an | |
| * `tp` is either a type set literal type or its underlying type is an |
Also confirm, are you saying tp can't directly be an interface type? If not (i.e. this is short for 'is type-set-literal OR is interface OR underlying is interface') then perhaps abbreviate this to
if tp (or its underlying type) is a type-set literal or a non-basic interface -- guessing from the context that having an underlying which is a type-set literal is impossible, but that's ok
There was a problem hiding this comment.
It turns out it can be an interface type, if you do something like type t interface{ interface{ Read(p []byte) (n int, err error } }. There is also some confusion caused by the tension between the fact that it is standard to refer to types like Reader as interfaces, when they are in fact named types whose underlying type is an interface. So your suggested wording works.
| * Note that the methods of the embedded interface are already considered | ||
| * as part of the method set of this interface. |
There was a problem hiding this comment.
Not immediately clear what this has to do with this predicate?
There was a problem hiding this comment.
I want to make it clear that even though we are giving the user a way to access an embedded interface, they do not need to do that to access methods on this interface which come from an embedded interface, because we have already processed them so that they are considered methods on this interface.
| * Note that the methods of the embedded interface are already considered | ||
| * as part of the method set of this interface. | ||
| */ | ||
| Type getAnExplicitlyEmbeddedInterface() { |
There was a problem hiding this comment.
Sounds like this might be better called directly rather than explicitly embedded?
| * Gets the type set literal with index `index` from the definition of this | ||
| * interface type. | ||
| * | ||
| * Note that the indexing includes embedded interfaces but not methods. |
There was a problem hiding this comment.
Are you saying the indices might be non-contiguous? If so just say that.
| # 20| 0: [CallExpr] call to Println | ||
| # 20| Type = (int, error) | ||
| # 20| 0: [FunctionName, SelectorExpr] selection of Println | ||
| # 20| Type = func([]interface { }) int, error |
There was a problem hiding this comment.
Can we contrive to avoid this change to eliminate lots of churn?
There was a problem hiding this comment.
Would you be okay with the churn if it was in a separate PR? I would prefer to get rid of the double space, though I can see that it is not necessary to do it at the same time as all these other changes.
There was a problem hiding this comment.
I've done it in a separate PR and rebased this branch on top of it. When I next force-push the churn will be reduced.
|
No further comments from me -- I don't thoroughly understand the ins and outs of the nightmare type system the Golang authors have created, but this seems like we could incrementally fix any mistakes in the event we've both misunderstood. I'm a bit worried about the performance of the doubly-nested forall describing "for all types that are present in the type-sets of all elements of this interface", but I guess we'll only find out about that when we get some nontrivial generic projects in the wild to test this out against. |
|
Testing steps: [ ] DCA |
|
@smowton If it makes you feel better about the doubly-nested forall, there isn't really any reason to ever have more than one type set literal in an interface, so it will almost always boil down to checking all the terms in term union. |
8e3c70e to
c23a496
Compare
3287623 to
95c5077
Compare
1fc8554 to
17fc38a
Compare
|
This pull request introduces 1 alert when merging 17fc38a into b8165d4 - view on LGTM.com new alerts:
|
17fc38a to
c84ef3d
Compare
|
This pull request introduces 1 alert when merging c84ef3d into ee94eb5 - view on LGTM.com new alerts:
|
|
This pull request introduces 1 alert when merging 2959803 into ee94eb5 - view on LGTM.com new alerts:
|
|
This pull request introduces 1 alert when merging dfce34b into ef7363c - view on LGTM.com new alerts:
|
The tests which involve a flow through a receiver with a non-trivial access path currently don't give the right result. This should be fixed in a follow-up issue.
"type declarations inside generic functions are not currently supported"
|
This pull request introduces 1 alert when merging c0fbd03 into ef7363c - view on LGTM.com new alerts:
|
|
DCA and distribution test both suggest this at least harmless. One of the tests has some |
As of github/codeql-go#686 landing we support extracting generics, dataflow analysis in programs that use generics, etc. Note this hasn't gone out in a release yet but I would expect it to be in 2.9.2.
Current status (updated 2022-04-12):
I think this is complete except for performance testing.
Note that for the moment we are ignoring instantiated types, and treating them as the uninstantiated type.