Batch mergeability checks on base branch pushes#744
Conversation
There was a problem hiding this comment.
I'm really sorry that it took me so long to take a look at this, I didn't have much time for reviews in the past 2-3 months.
The implementation looks great! I tested it on a live repository and it seems to work as expected, it would be cool to get rid of the thousands useless GH API requests to reload the mergeability of all open PRs.
I left some comments regarding the implementation. I haven't looked at the failing tests yet.
|
|
||
| let mut after = None; | ||
| loop { | ||
| let (fetched_prs, cursor) = repo_state |
There was a problem hiding this comment.
We usually abstract paging within the client calls that deal with the GH API, so let's:
- Rename
get_pull_requests_batchtoget_pull_requests_by_base_branch - Handle paging inside
get_pull_requests_by_base_branch
The most performant way would be to overlap loading the PR pages from GitHub and then querying our database, but I'd leave that for a separate PR, as this change is already complicated enough :)
| PullRequestStatus::Open | PullRequestStatus::Draft => { | ||
| tracing::info!("Mergeability status unknown, scheduling retry."); | ||
| mq_tx.enqueue_retry(mq_item); | ||
| return Ok(()); |
There was a problem hiding this comment.
If we hit a single PR that is in the unknown state, we would stop the whole check, but that is unnecessary. Let's store the PRs that were unknown in a Vec, and then after we go through the whole batch, decide whether we then retry the whole batch (if there are, say, more than 20 such PRs) or if we enqueue the individual PRs one-by-one (to avoid downloading 1000 PRs again if only a few of them are left with unknown mergeability).
| .get_pull_request(repo_state.repository(), pr.number) | ||
| .await? | ||
| { | ||
| update_pr_with_known_mergeability( |
There was a problem hiding this comment.
Can you please wrap this in a span that mentions the PR number? Because currently in the log we only have this:
TRACE Mergeability check{item=Batch(PullRequestBatchToCheck { repo: kobzol/bors-kindergarten2, base_branch: "main", attempt: 2, conflict_source: ... })}
So it's not really possible to see from the log which PR is getting updated.
| tracing::warn!("Cannot find DB pull request for {fetched_pr:?}"); | ||
| } | ||
| }) => { | ||
| if *attempt >= BATCH_MAX_RETRIES { |
There was a problem hiding this comment.
Could you please add logging at the start and end of the batch check? And record the time it took. So that we can easily see in logs how fast it is.
| ) -> anyhow::Result<()> { | ||
| let affected_prs = db | ||
| .set_stale_mergeability_status_by_base_branch(repo_state.repository(), &payload.branch) | ||
| db.set_stale_mergeability_status_by_base_branch(repo_state.repository(), &payload.branch) |
There was a problem hiding this comment.
Let's re-add the previous functionality, where we check the affected_prs, and only schedule the batch check if at least one PR was affected.
| } | ||
|
|
||
| #[derive(Debug)] | ||
| pub struct PullRequestSummary { |
There was a problem hiding this comment.
| pub struct PullRequestSummary { | |
| /// Basic information about a PR needed to handle mergeability checks and label changes. | |
| pub struct PullRequestInfo { |
Closes #707