Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 42 additions & 10 deletions extensions/ql-vscode/src/model-editor/yaml.ts
Original file line number Diff line number Diff line change
Expand Up @@ -181,25 +181,40 @@ function createDataExtensionYamlsByGrouping(
>,
createFilename: (method: Method) => string,
): Record<string, string> {
const methodsByFilename: Record<string, Record<string, ModeledMethod[]>> = {};
const actualFilenameByCanonicalFilename: Record<string, string> = {};

const methodsByCanonicalFilename: Record<
string,
Record<string, ModeledMethod[]>
> = {};

// We only want to generate a yaml file when it's a known external API usage
// and there are new modeled methods for it. This avoids us overwriting other
// files that may contain data we don't know about.
for (const method of methods) {
if (method.signature in newModeledMethods) {
methodsByFilename[createFilename(method)] = {};
const filename = createFilename(method);
const canonicalFilename = canonicalizeFilename(filename);

methodsByCanonicalFilename[canonicalFilename] = {};
actualFilenameByCanonicalFilename[canonicalFilename] = filename;
}
}

// First populate methodsByFilename with any existing modeled methods.
for (const [filename, methodsBySignature] of Object.entries(
existingModeledMethods,
)) {
if (filename in methodsByFilename) {
const canonicalFilename = canonicalizeFilename(filename);

if (canonicalFilename in methodsByCanonicalFilename) {
for (const [signature, methods] of Object.entries(methodsBySignature)) {
methodsByFilename[filename][signature] = [...methods];
methodsByCanonicalFilename[canonicalFilename][signature] = [...methods];
}

// Ensure that if a file exists on disk, we use the same capitalization
// as the original file.
actualFilenameByCanonicalFilename[canonicalFilename] = filename;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering what will happen if two files with the same canonicalization already exist on disk.

What I believe will happen is that we'll arbitrarily pick one of them as the true filename. Which one we pick will depend on the order they were listed by codeql resolve extensions when we listed existing packs. We'll merge their data together into one file (unless they contain data for the same method signature, in which the canonical file will win) and write all of this data back out to the canonical file. The non-canonical file will remain untouched.

This could be quite bad actually. When the extension or CodeQL reads in data extensions it'll read data from both the canonical and non-canonical files, but when the extension writes data it'll only update the canonical file. Effectively the non-canonical file will because invisible and untouchable by the model editor, but it'll continue to exist and supply model data to CodeQL.

Do you agree with my understanding here, or have I got it wrong?

I'm not sure what else we could do here. Should we display a warning? Or could we go as far as to delete the other file after we've merged its data into the canonical file?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, it'll just pick one arbitrarily and merge all data into that file. I'm not sure if we should be deleting the file because we could be deleting data that is not in the merged file (this can happen if the CodeQL query doesn't return the method). This can only happen on Linux or on a (non-default) case-sensitive filesystem on Windows/macOS.

I'm not sure if we should be handling this case because having two files with the same canonical filename will already fail on Windows/macOS. For example, if you create a model pack with a.yml and A.yml, I suspect that the CodeQL CLI will already arbitrarily pick one of these files on case-insensitive filesystems. Starting from this PR, we should never create filenames with different capitalizations, and I don't really think this is something the model editor should check.

This bug is also very unlikely to happen in Java since package names are usually lowercase, and even in C# it should be very unlikely. The only reason that I ran into this bug is because one database contained multiple test projects with different capitalization.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug is also very unlikely to happen in Java since package names are usually lowercase, and even in C# it should be very unlikely.

That's a good point. It is unlikely to happen in practice. It's likely only to happen if people manually create a conflicting file. Given we don't have many users right now and it's unlikely someone hit this already I think we can ignore it for now.

we could be deleting data that is not in the merged file

Doesn't change our decision, but I don't think this is true. For every method we're saving we'll load the full contents of any files that canonicalise that way, and then write the canonical files back out again.

Anyway, let's move on for now. We're agreed we want to do this transformation to the canonical file. If any users hit bugs or confusion around this then we can add extra logging or warnings later on.

}
}

Expand All @@ -209,19 +224,29 @@ function createDataExtensionYamlsByGrouping(
const newMethods = newModeledMethods[method.signature];
if (newMethods) {
const filename = createFilename(method);
const canonicalFilename = canonicalizeFilename(filename);

// Override any existing modeled methods with the new ones.
methodsByFilename[filename][method.signature] = [...newMethods];
methodsByCanonicalFilename[canonicalFilename][method.signature] = [
...newMethods,
];

if (!(canonicalFilename in actualFilenameByCanonicalFilename)) {
Comment thread
robertbrignull marked this conversation as resolved.
Outdated
actualFilenameByCanonicalFilename[canonicalFilename] = filename;
}
}
}

const result: Record<string, string> = {};

for (const [filename, methods] of Object.entries(methodsByFilename)) {
result[filename] = createDataExtensionYaml(
language,
Object.values(methods).flatMap((methods) => methods),
);
for (const [canonicalFilename, methods] of Object.entries(
methodsByCanonicalFilename,
)) {
result[actualFilenameByCanonicalFilename[canonicalFilename]] =
createDataExtensionYaml(
language,
Object.values(methods).flatMap((methods) => methods),
);
}

return result;
Expand Down Expand Up @@ -299,6 +324,13 @@ export function createFilenameForPackage(
return `${prefix}${packageName}${suffix}.yml`;
}

function canonicalizeFilename(filename: string) {
// We want to canonicalize filenames so that they are always in the same format
// for comparison purposes. This is important because we want to avoid overwriting
// data extension YAML files on case-insensitive file systems.
return filename.toLowerCase();
}

function validateModelExtensionFile(data: unknown): data is ModelExtensionFile {
modelExtensionFileSchemaValidate(data);

Expand Down
129 changes: 128 additions & 1 deletion extensions/ql-vscode/test/unit-tests/model-editor/yaml.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,9 @@ import {
createFilenameForPackage,
loadDataExtensionYaml,
} from "../../../src/model-editor/yaml";
import { CallClassification } from "../../../src/model-editor/method";
import { CallClassification, Method } from "../../../src/model-editor/method";
import { QueryLanguage } from "../../../src/common/query-language";
import { ModeledMethod } from "../../../src/model-editor/modeled-method";

describe("createDataExtensionYaml", () => {
it("creates the correct YAML file", () => {
Expand Down Expand Up @@ -980,6 +981,132 @@ describe("createDataExtensionYamlsForFrameworkMode", () => {
`,
});
});

describe("with same package names but different capitalizations", () => {
const methods: Method[] = [
{
library: "HostTestAppDbContext",
signature:
"Volo.Abp.TestApp.MongoDb.HostTestAppDbContext#get_FifthDbContextDummyEntity()",
packageName: "Volo.Abp.TestApp.MongoDb",
typeName: "HostTestAppDbContext",
methodName: "get_FifthDbContextDummyEntity",
methodParameters: "()",
supported: false,
supportedType: "none",
usages: [],
},
{
library: "CityRepository",
signature:
"Volo.Abp.TestApp.MongoDB.CityRepository#FindByNameAsync(System.String)",
packageName: "Volo.Abp.TestApp.MongoDB",
typeName: "CityRepository",
methodName: "FindByNameAsync",
methodParameters: "(System.String)",
supported: false,
supportedType: "none",
usages: [],
},
];
const newModeledMethods: Record<string, ModeledMethod[]> = {
"Volo.Abp.TestApp.MongoDb.HostTestAppDbContext#get_FifthDbContextDummyEntity()":
[
{
type: "sink",
input: "Argument[0]",
kind: "sql",
provenance: "df-generated",
signature:
"Volo.Abp.TestApp.MongoDb.HostTestAppDbContext#get_FifthDbContextDummyEntity()",
packageName: "Volo.Abp.TestApp.MongoDb",
typeName: "HostTestAppDbContext",
methodName: "get_FifthDbContextDummyEntity",
methodParameters: "()",
},
],
"Volo.Abp.TestApp.MongoDB.CityRepository#FindByNameAsync(System.String)":
[
{
type: "neutral",
kind: "summary",
provenance: "df-generated",
signature:
"Volo.Abp.TestApp.MongoDB.CityRepository#FindByNameAsync(System.String)",
packageName: "Volo.Abp.TestApp.MongoDB",
typeName: "CityRepository",
methodName: "FindByNameAsync",
methodParameters: "(System.String)",
},
],
};
const modelYaml = `extensions:
- addsTo:
pack: codeql/csharp-all
extensible: sourceModel
data: []

- addsTo:
pack: codeql/csharp-all
extensible: sinkModel
data:
- ["Volo.Abp.TestApp.MongoDb","HostTestAppDbContext",true,"get_FifthDbContextDummyEntity","()","","Argument[0]","sql","df-generated"]

- addsTo:
pack: codeql/csharp-all
extensible: summaryModel
data: []

- addsTo:
pack: codeql/csharp-all
extensible: neutralModel
data:
- ["Volo.Abp.TestApp.MongoDB","CityRepository","FindByNameAsync","(System.String)","summary","df-generated"]
`;

it("creates the correct YAML files when there are existing modeled methods", () => {
const yaml = createDataExtensionYamlsForFrameworkMode(
QueryLanguage.CSharp,
methods,
newModeledMethods,
{},
);

expect(yaml).toEqual({
"models/Volo.Abp.TestApp.MongoDB.model.yml": modelYaml,
});
});

it("creates the correct YAML files when there are existing modeled methods", () => {
const yaml = createDataExtensionYamlsForFrameworkMode(
QueryLanguage.CSharp,
methods,
newModeledMethods,
{
"models/Volo.Abp.TestApp.mongodb.model.yml": {
"Volo.Abp.TestApp.MongoDB.CityRepository#FindByNameAsync(System.String)":
[
{
type: "neutral",
kind: "summary",
provenance: "manual",
signature:
"Volo.Abp.TestApp.MongoDB.CityRepository#FindByNameAsync(System.String)",
packageName: "Volo.Abp.TestApp.MongoDB",
typeName: "CityRepository",
methodName: "FindByNameAsync",
methodParameters: "(System.String)",
},
],
},
},
);

expect(yaml).toEqual({
"models/Volo.Abp.TestApp.mongodb.model.yml": modelYaml,
Comment thread
robertbrignull marked this conversation as resolved.
});
});
});
});

describe("loadDataExtensionYaml", () => {
Expand Down