[BlackboxBenchmarking] Add daily cron job to aggregate fuzzer stats #5265
Open
dylanjew wants to merge 10 commits into dylanj/builtin-index from
Adds a cron job to aggregate fuzzer stats into a daily BigQuery table, `fuzzer_stats.daily_stats`.

Context
We will use this to benchmark our blackbox fuzzers. Previously we couldn't easily join the fuzzing hours from BigQuery with the bugs filed by ClusterFuzz in our dashboards. We need a separate aggregated table because the `fuzzer_stats` `JobRun` tables all live in separate datasets per fuzzer, and we can't simply query across all of those datasets in BigQuery or Plx.

The cron job defaults to yesterday's stats so we can run it after the stats are loaded into BigQuery, but it takes a date flag so we can backfill days as necessary.
Idempotency
Whenever a date is inserted, the job uses `WRITE_TRUNCATE` with a date partition to overwrite all of the rows for that date. So if the job runs multiple times for the same day, it will not add duplicate rows but will overwrite any previous rows for that date.

This simplifies edge cases where the job fails or runs multiple times: we just need the last run of the job to succeed and the data will be correct. Each run pulls in the latest data from the fuzzers' `JobRun` tables.
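The idempotency mechanism above relies on targeting a single date partition. A rough sketch of how that target is formed, assuming standard BigQuery partition-decorator syntax (`table$YYYYMMDD`); the helper name is hypothetical:

```python
from datetime import date

# Table name from the PR description.
DAILY_STATS_TABLE = "fuzzer_stats.daily_stats"


def partition_decorator(table: str, day: date) -> str:
    """Build a partition-decorated table name, e.g. daily_stats$20240101.

    Writing to a single partition with WRITE_TRUNCATE replaces only that
    partition's rows, which is what makes reruns idempotent.
    """
    return f"{table}${day.strftime('%Y%m%d')}"


# With the google-cloud-bigquery client this would look roughly like
# (untested sketch, not the PR's actual code):
#
#   job_config = bigquery.LoadJobConfig(
#       write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)
#   client.load_table_from_json(
#       rows,
#       partition_decorator(DAILY_STATS_TABLE, day),
#       job_config=job_config)
```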
Example query:
The remaining work here is to set up the cron job configuration. This PR only adds the logic for the job. crbug.com/501066151
Related PRs:
These migrate the BigQuery and Datastore schemas to support the new fields:
#5264
#5263
Testing
Ran this against the dev data and verified that the fuzzer stats BigQuery table is populated.
Logs from dev: https://paste.googleplex.com/4884361662038016
After the job inserted the aggregated rows into BigQuery, I was able to compare the aggregated testcase stats and fuzzing hours between fuzzers for a given date range.