Skip to content
This repository was archived by the owner on Jan 5, 2023. It is now read-only.

Extract generics#686

Merged
smowton merged 46 commits intogithub:mainfrom
owen-mc:extract-generics
May 11, 2022
Merged

Extract generics#686
smowton merged 46 commits intogithub:mainfrom
owen-mc:extract-generics

Conversation

@owen-mc
Copy link
Copy Markdown
Contributor

@owen-mc owen-mc commented Feb 9, 2022

Current status (updated 2022-04-12):

I think this is complete except for performance testing.

Note that for the moment we are ignoring instantiated types, and treating them as the uninstantiated type.

Copy link
Copy Markdown
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far! Worth considering for the future: do we want to extract the same thing for functionName[param, param...] and for typeName[param, param, ...]? Which one will be harder: distinguishing the cases in the extractor, or in QL?

@@ -0,0 +1,53 @@
// // First we need to wrap some database types
// class Location extends @location {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove, uncomment, or note why this is left here commented

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes. These were from the scripts that I copied, but I don't think it's needed. I meant to test out doing an upgrade and a downgrade, and if they work I will delete it.

@owen-mc owen-mc force-pushed the extract-generics branch from 7a959b4 to 360f244 Compare March 2, 2022 16:54
@owen-mc owen-mc force-pushed the extract-generics branch 2 times, most recently from 164d49f to e0f8c68 Compare March 31, 2022 15:59
@owen-mc owen-mc marked this pull request as ready for review March 31, 2022 16:00
@owen-mc owen-mc requested a review from a team as a code owner March 31, 2022 16:00
@owen-mc owen-mc force-pushed the extract-generics branch from e0f8c68 to c0e6bb6 Compare April 1, 2022 13:27
@owen-mc
Copy link
Copy Markdown
Contributor Author

owen-mc commented Apr 1, 2022

There seem to be some non-trivial test failures relating to data flow which I'll look into on Monday.

Copy link
Copy Markdown
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to overrideTypesOfTypeSetLiterals

Comment thread extractor/extractor.go Outdated
Comment on lines +396 to +401
populateTypeParamParents(tw, meth.Type().(*types.Signature).TypeParams(), methlbl)
populateTypeParamParents(tw, meth.Type().(*types.Signature).RecvTypeParams(), methlbl)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment (in both places) clarifying what's going on with this vs. the very similar calls in extractObjects

Comment thread extractor/dbscheme/tables.go Outdated

// IndexExpr is the type of index expression AST nodes
// IndexExpr is the type of AST nodes for index expressions and generic type
// instantiation expressions with one type argument
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// instantiation expressions with one type argument
// instantiation expressions with one type argument
// Note that syntactically unambiguous generic instantiations will be extracted
// as one of the Generic...InstantiationExpr expression kinds below.


// GenericFunctionInstantiationExpr is the type of AST nodes that represent a instantiation
// of a generic type. These correspond to some index expression AST nodes and all index
// list expression AST nodes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// list expression AST nodes.
// list expression AST nodes.
// Note some syntactically ambiguous instantations are extracted as an `IndexExpr` to be disambiguated in QL later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually function instantiations can be distinguished in the extractor.

Comment thread extractor/dbscheme/tables.go Outdated

// GenericTypeInstantiationExpr is the type of AST nodes that represent an instantiation
// of a generic type. These correspond to some index expression AST nodes and all index
// list expression AST nodes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// list expression AST nodes.
// list expression AST nodes.
// Note some syntactically ambiguous instantations are extracted as an `IndexExpr` to be disambiguated in QL later.

Comment thread extractor/extractor.go Outdated
Comment on lines +974 to +975
n := flattenBinaryExprTree(tw, expr.X, lbl, 0)
flattenBinaryExprTree(tw, expr.Y, lbl, n)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as flattenBinaryExprTree(expr)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

Comment thread extractor/extractor.go Outdated
// Go through a `FieldList` and update the types of all type set literals which
// are not already union types to be union types. We do this by changing the
// types stored in `tw.Package.TypesInfo.Types`.
func overrideTypesOfTypeSetLiterals(tw *trap.Writer, fields *ast.FieldList) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func overrideTypesOfTypeSetLiterals(tw *trap.Writer, fields *ast.FieldList) {
func makeTypeSetLiteralsUnionTyped(tw *trap.Writer, fields *ast.FieldList) {

Probably add a comment at the use sites explaining the reasoning why we're confident we don't extract a type-set literal prior to this override being applied.

@owen-mc owen-mc force-pushed the extract-generics branch 2 times, most recently from bde455b to 4181beb Compare April 4, 2022 23:22
Copy link
Copy Markdown
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested tests to add: some basic dataflow tests to make sure we can see flow into and out of an instantiation of a generic function / method and/or via the fields of a generic type.

We should also add some basic control-flow tests to make sure our new expr kinds aren't being picked up and incorporated into the CFG leading to potential dead-ends.

Comment thread extractor/trap/labels.go
Comment thread ql/lib/semmle/go/Types.qll Outdated
Comment on lines +115 to +131
u instanceof InterfaceType and
(
not u instanceof BasicInterfaceType and
if exists(u.(InterfaceType).getAnEmbeddedTypeSetLiteral())
then
forall(Type intersectionType |
intersectionType = u.(InterfaceType).getAnEmbeddedTypeSetLiteral().getATerm().getType() and
forall(TypeSetLiteralType tslit |
tslit = u.(InterfaceType).getAnEmbeddedTypeSetLiteral()
|
intersectionType = tslit.getATerm().getType()
)
|
intersectionType.implementsComparable()
)
else u.(InterfaceType).isOrEmbedsComparable()
)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neaten this with exists(InterfaceType uif | uif = u | ... to eliminate all the u.(InterfaceType)

Comment thread ql/lib/semmle/go/Types.qll Outdated
Comment on lines +122 to +125
forall(TypeSetLiteralType tslit |
tslit = u.(InterfaceType).getAnEmbeddedTypeSetLiteral()
|
intersectionType = tslit.getATerm().getType()
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining what this is doing? I don't follow what the nested forall seeks to achieve.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All types in the intersection of all the embedded type set literals must implement comparable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "add a comment" (here and elsewhere) I mean add a comment in the code for future readers' benefit, not just in the PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did add it in the code. I just copied it here so you could see the wording easily without having to hunt it down in the updated PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh right sorry, I clicked through and surprisingly Github took me to the code before your change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking again, based on the General Interfaces bit of https://go.dev/ref/spec#Interface_types I'm not sure this definition of the intersection is complete? Based on that spec, are you sure we don't need to consider method specifications here for example, to restrict to only those types with the required method defined on them?

Comment thread ql/lib/semmle/go/Types.qll Outdated
override string toString() { result = "pointer type" }
}

private newtype TTerm =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a publicly exposed name this is a bit non-specific (let the record show I avoided calling it 'generic'). How about TypeSetTerm?

Comment thread ql/lib/semmle/go/Types.qll Outdated
Type getType() { result = tp }

/** Holds if `t` is in the type set of this term. */
predicate hasInTypeSet(Type t) { if tilde = false then t = tp else t.getUnderlyingType() = tp }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchesType or includesType perhaps? I'm thinking hasInTypeSet would apply to a type-set not an individual term

Comment thread ql/lib/semmle/go/Types.qll Outdated
/**
* Holds if `tp` is an explicitly embedded type with index `index`.
*
* `tp` is either a type set literal type or its underlyign type is an
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `tp` is either a type set literal type or its underlyign type is an
* `tp` is either a type set literal type or its underlying type is an

Also confirm, are you saying tp can't directly be an interface type? If not (i.e. this is short for 'is type-set-literal OR is interface OR underlying is interface') then perhaps abbreviate this to

if tp (or its underlying type) is a type-set literal or a non-basic interface -- guessing from the context that having an underlying which is a type-set literal is impossible, but that's ok

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out it can be an interface type, if you do something like type t interface{ interface{ Read(p []byte) (n int, err error } }. There is also some confusion caused by the tension between the fact that it is standard to refer to types like Reader as interfaces, when they are in fact named types whose underlying type is an interface. So your suggested wording works.

Comment on lines +745 to +752
* Note that the methods of the embedded interface are already considered
* as part of the method set of this interface.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not immediately clear what this has to do with this predicate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make it clear that even though we are giving the user a way to access an embedded interface, they do not need to do that to access methods on this interface which come from an embedded interface, because we have already processed them so that they are considered methods on this interface.

Comment thread ql/lib/semmle/go/Types.qll Outdated
* Note that the methods of the embedded interface are already considered
* as part of the method set of this interface.
*/
Type getAnExplicitlyEmbeddedInterface() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like this might be better called directly rather than explicitly embedded?

Comment thread ql/lib/semmle/go/Types.qll Outdated
* Gets the type set literal with index `index` from the definition of this
* interface type.
*
* Note that the indexing includes embedded interfaces but not methods.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying the indices might be non-contiguous? If so just say that.

# 20| 0: [CallExpr] call to Println
# 20| Type = (int, error)
# 20| 0: [FunctionName, SelectorExpr] selection of Println
# 20| Type = func([]interface { }) int, error
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we contrive to avoid this change to eliminate lots of churn?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be okay with the churn if it was in a separate PR? I would prefer to get rid of the double space, though I can see that it is not necessary to do it at the same time as all these other changes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that's fine

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done it in a separate PR and rebased this branch on top of it. When I next force-push the churn will be reduced.

@smowton
Copy link
Copy Markdown
Contributor

smowton commented Apr 7, 2022

No further comments from me -- I don't thoroughly understand the ins and outs of the nightmare type system the Golang authors have created, but this seems like we could incrementally fix any mistakes in the event we've both misunderstood.

I'm a bit worried about the performance of the doubly-nested forall describing "for all types that are present in the type-sets of all elements of this interface", but I guess we'll only find out about that when we get some nontrivial generic projects in the wild to test this out against.

@smowton
Copy link
Copy Markdown
Contributor

smowton commented Apr 7, 2022

Testing steps:

[ ] DCA
[ ] LGTM dist build

@owen-mc
Copy link
Copy Markdown
Contributor Author

owen-mc commented Apr 8, 2022

@smowton If it makes you feel better about the doubly-nested forall, there isn't really any reason to ever have more than one type set literal in an interface, so it will almost always boil down to checking all the terms in term union.

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Apr 27, 2022

This pull request introduces 1 alert when merging 17fc38a into b8165d4 - view on LGTM.com

new alerts:

  • 1 for Unreachable statement

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Apr 29, 2022

This pull request introduces 1 alert when merging c84ef3d into ee94eb5 - view on LGTM.com

new alerts:

  • 1 for Unreachable statement

@owen-mc owen-mc force-pushed the extract-generics branch from c84ef3d to 2959803 Compare May 3, 2022 12:18
@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented May 3, 2022

This pull request introduces 1 alert when merging 2959803 into ee94eb5 - view on LGTM.com

new alerts:

  • 1 for Unreachable statement

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented May 10, 2022

This pull request introduces 1 alert when merging dfce34b into ef7363c - view on LGTM.com

new alerts:

  • 1 for Unreachable statement

@smowton smowton force-pushed the extract-generics branch from dfce34b to c0fbd03 Compare May 10, 2022 13:52
@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented May 10, 2022

This pull request introduces 1 alert when merging c0fbd03 into ef7363c - view on LGTM.com

new alerts:

  • 1 for Unreachable statement

@smowton
Copy link
Copy Markdown
Contributor

smowton commented May 11, 2022

DCA and distribution test both suggest this at least harmless. One of the tests has some MISSING: expectations which all relate to taking pointers to generic methods: to check whether this is particular to generics or a general shortcoming of Go's modelling of function pointers. However given this appears at least harmless I will merge this and follow up investigating the function pointer problems.

@smowton smowton merged commit 440b311 into github:main May 11, 2022
smowton added a commit to github/codeql that referenced this pull request May 11, 2022
As of github/codeql-go#686 landing we support extracting generics, dataflow analysis in programs that use generics, etc. Note this hasn't  gone out in a release yet but I would expect it to be in 2.9.2.
@owen-mc owen-mc deleted the extract-generics branch May 12, 2022 06:10
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants