Export: enhance trimming using build statement metadata#3530
Conversation
setting subincludes at package level instead of at target level
- register subinclude statements in the package metadata - filter subincludes label - export all non build_target related statements
…dary build def with sources the updated test uses non-standard child naming to validate new trimming logic
unclear if this is necessary
| SubrepoName: subrepo, | ||
| targets: map[string]*BuildTarget{}, | ||
| Outputs: map[string]*BuildTarget{}, | ||
| BuildFileMetadata: newNoopPackageMetadata(), |
There was a problem hiding this comment.
Why do we always use Noop here?
There was a problem hiding this comment.
We default to noop and override if the metadata option is set. Please take a look at the unified NewPackage. Does it make sense?
| // RegisterStatement maps a build statement to target in the package. | ||
| func (pkg *Package) RegisterStatement(target *BuildTarget, stmtProvider BuildStatementProvider) { | ||
| pkg.mutex.Lock() | ||
| defer pkg.mutex.Unlock() |
There was a problem hiding this comment.
Why are you only protecting writes with the mutex, and not reads? Go will panic if there is a concurrent read and write.
tbh, I'd expect any locking to be done inside the BuildFileMetadata implementation (which would avoid paying the cost of locking when we're using the Noop implementation). Better still would be to use concurrency-safe datastructures wherever possible (which most likely use locks under the hood anyway, but might not)
There was a problem hiding this comment.
I was going to justify the use of *Package level methods for reusing the mutex. I initially enriched the AddTarget logic with a BuildStatement, reusing that lock but eventually separated into different methods.
Read is currently only done synchronously, since the export doesn't (yet) support multi threading. I fully agreed and will migrate BuildFileMetadata to concurrent-safe maps. Thanks for raising this.
There was a problem hiding this comment.
Updated. I've used RWMutex instead of dedicated data structures. Mostly because of simplicity and avoiding the performance overhead but also to follow the example of Package and BuildTarget. I can be persuaded in the other direction but I think the parsing is sparse and the operation quick enough that we won't be waiting on the locks that often.
| state *core.BuildState | ||
| targetDir string | ||
|
|
||
| exportedTargets map[*core.Package]map[core.BuildLabel]bool |
There was a problem hiding this comment.
Using a pointer as a key always makes me uncomfortable; can we avoid this?
More generally, could we avoid the nested map? core.BuildLabel includes the package name anyway, right?
There was a problem hiding this comment.
We can probably avoid using the pointer has key by using the package label, but we will have to look up in the graph each time. I was using the pointer directly assuming some consistency of no repeated package instances. Should I use the string instead and lookup in the graph each time we want to use it?
The nested map is useful for looping though the exported target per each package (and for efficient verification of visited targets). What's your opinion, should I try to unnest?
| targetPath := filepath.Join(be.targetDir, file) | ||
| if err := os.RemoveAll(targetPath); err != nil { | ||
| log.Fatalf("failed to remove .plzconfig file %s: %v", file, err) | ||
| } | ||
| if err := fs.CopyFile(file, path, 0); err != nil { | ||
| if err := fs.CopyFile(file, targetPath, 0); err != nil { | ||
| log.Fatalf("failed to copy .plzconfig file %s: %v", file, err) | ||
| } |
There was a problem hiding this comment.
Should we be using fs.RecursiveCopy like we do most places below?
There was a problem hiding this comment.
Why should we? Can .plzconfigs be directories?
| } | ||
| if ignoreDirectories[d.Name()] { | ||
| return filepath.SkipDir | ||
| cursor = bStmt.Start |
There was a problem hiding this comment.
Not sure why the linter thinks this is ineffectual; have you got a test case which proves it?
| return "" | ||
| } | ||
|
|
||
| sort.Sort(filteredLabels) |
There was a problem hiding this comment.
In the interests of making minimal changes to the export, I think we should remove this sort?
There was a problem hiding this comment.
Since we use a map to register these labels, the order of insertion is not enforced. Sorting ensures that the output is deterministic. We could possibly retain some of the original order by changing some of the logic in PackageMetadata if you consider this important, however, if we keep the formatting I believe it sorts the subincludes.
| return func() *core.BuildStatement { | ||
| stmtScope := s | ||
| for curr := s; curr != nil; curr = curr.callerScope { | ||
| if curr.pkg != nil && curr.filename == s.pkg.Filename { |
There was a problem hiding this comment.
Think we need more of a comment here and more in the doc comment to explain this condition and why we don't break once it's true (which I'm guessing is to handle somebody defining a function in a BUILD file? Do we have an export test case for that? And an export test case for statements inside loops and if-statements?).
If I understand this correctly, we're effectively looking for the highest-level function call which is inside a BUILD file (as opposed to calls within other functions)?
Can we unit-test this function and ActiveSubincludes?
| if f.nativeCode != nil { | ||
| if f.kwargs { | ||
| return f.callNative(s.NewScope("<builtin code>", 0), c) | ||
| return f.callNative(s.NewScope("", 0), c) |
There was a problem hiding this comment.
Why this change? I think the <builtin code> thing was useful for debugging
There was a problem hiding this comment.
I added that in the previous PR, but since that argument is supposed to be a filename I'm not sure it makes sense. If we attempt to open a file with it, it should fail the same as with using "", I just thought it could be misleading but I'm happy to revert this.
| func (s *scope) ActiveSubincludes() core.SubincludesLabelProvider { | ||
| return func() core.BuildLabels { | ||
| seen := map[core.BuildLabel]bool{} | ||
| for curr := s; curr != nil; curr = curr.callerScope { |
There was a problem hiding this comment.
It is not immediately obvious to me how or why this works. I'd expect us to need to be looking for sibling subinclude statements, rather than traversing through call stacks. Please write a nice explicit and verbose comment
Enhancements to plz export, moving from a basic target-level trimming (using gc.RewriteFile) to build statement-level trimming, including only the required build rules and subincludes.
For consistency, we format all the exported BUILD files.
Changelog:
ToDo: