-
Notifications
You must be signed in to change notification settings - Fork 226
Fix incorrect model files on case-insensitive file systems #3148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering what will happen if two files with the same canonicalization already exist on disk.
What I believe will happen is that we'll arbitrarily pick one of them as the true filename. Which one we pick will depend on the order they were listed by
codeql resolve extensionswhen we listed existing packs. We'll merge their data together into one file (unless they contain data for the same method signature, in which the canonical file will win) and write all of this data back out to the canonical file. The non-canonical file will remain untouched.This could be quite bad actually. When the extension or CodeQL reads in data extensions it'll read data from both the canonical and non-canonical files, but when the extension writes data it'll only update the canonical file. Effectively the non-canonical file will because invisible and untouchable by the model editor, but it'll continue to exist and supply model data to CodeQL.
Do you agree with my understanding here, or have I got it wrong?
I'm not sure what else we could do here. Should we display a warning? Or could we go as far as to delete the other file after we've merged its data into the canonical file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, it'll just pick one arbitrarily and merge all data into that file. I'm not sure if we should be deleting the file because we could be deleting data that is not in the merged file (this can happen if the CodeQL query doesn't return the method). This can only happen on Linux or on a (non-default) case-sensitive filesystem on Windows/macOS.
I'm not sure if we should be handling this case because having two files with the same canonical filename will already fail on Windows/macOS. For example, if you create a model pack with
a.ymlandA.yml, I suspect that the CodeQL CLI will already arbitrarily pick one of these files on case-insensitive filesystems. Starting from this PR, we should never create filenames with different capitalizations, and I don't really think this is something the model editor should check.This bug is also very unlikely to happen in Java since package names are usually lowercase, and even in C# it should be very unlikely. The only reason that I ran into this bug is because one database contained multiple test projects with different capitalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. It is unlikely to happen in practice. It's likely only to happen if people manually create a conflicting file. Given we don't have many users right now and it's unlikely someone hit this already I think we can ignore it for now.
Doesn't change our decision, but I don't think this is true. For every method we're saving we'll load the full contents of any files that canonicalise that way, and then write the canonical files back out again.
Anyway, let's move on for now. We're agreed we want to do this transformation to the canonical file. If any users hit bugs or confusion around this then we can add extra logging or warnings later on.