Skip to content

Export: enhance trimming using build statement metadata#3530

Open
DuBento wants to merge 53 commits into
thought-machine:masterfrom
DuBento:export-from-package-metadata
Open

Export: enhance trimming using build statement metadata#3530
DuBento wants to merge 53 commits into
thought-machine:masterfrom
DuBento:export-from-package-metadata

Conversation

@DuBento
Copy link
Copy Markdown
Contributor

@DuBento DuBento commented May 1, 2026

Enhancements to plz export, moving from a basic target-level trimming (using gc.RewriteFile) to build statement-level trimming, including only the required build rules and subincludes.
For consistency, we format all the exported BUILD files.

Changelog:

  • Introduced PackageMetadata to track the relationship between BUILD file statements, the targets they generate, and the subincludes they require. Made optional and enabled for the export.
  • Refactored src/export/export.go to enforce better separation of the DefaultExporter (for trimming) and NoTrimExporter.
  • Added logic to parse BUILD files and selectively write back statements based on whether the generated targets are part of the export set.
  • Implemented "minimal subinclude" generation, which rewrites subinclude() calls to only include labels actually used by the exported targets.
  • Update and added some of the e2e to reflect the changes in implementation.

ToDo:

  • multi-threaded export.
  • fix lazy parsing for export. Some targets required by the export are missing from the graph.

duarte added 30 commits April 29, 2026 19:28
setting subincludes at package level instead of at target level
- register subinclude statements in the package metadata
- filter subincludes label
- export all non build_target related statements
…dary build def with sources

the updated test uses non-standard child naming to validate new trimming logic
Comment thread src/core/package.go Outdated
Comment thread src/core/package.go Outdated
SubrepoName: subrepo,
targets: map[string]*BuildTarget{},
Outputs: map[string]*BuildTarget{},
BuildFileMetadata: newNoopPackageMetadata(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we always use Noop here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We default to noop and override if the metadata option is set. Please take a look at the unified NewPackage. Does it make sense?

Comment thread src/core/package.go
Comment thread src/core/package.go Outdated
Comment thread src/core/package.go Outdated
// RegisterStatement maps a build statement to target in the package.
func (pkg *Package) RegisterStatement(target *BuildTarget, stmtProvider BuildStatementProvider) {
pkg.mutex.Lock()
defer pkg.mutex.Unlock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you only protecting writes with the mutex, and not reads? Go will panic if there is a concurrent read and write.

tbh, I'd expect any locking to be done inside the BuildFileMetadata implementation (which would avoid paying the cost of locking when we're using the Noop implementation). Better still would be to use concurrency-safe datastructures wherever possible (which most likely use locks under the hood anyway, but might not)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to justify the use of *Package level methods for reusing the mutex. I initially enriched the AddTarget logic with a BuildStatement, reusing that lock but eventually separated into different methods.

Read is currently only done synchronously, since the export doesn't (yet) support multi threading. I fully agreed and will migrate BuildFileMetadata to concurrent-safe maps. Thanks for raising this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I've used RWMutex instead of dedicated data structures. Mostly because of simplicity and avoiding the performance overhead but also to follow the example of Package and BuildTarget. I can be persuaded in the other direction but I think the parsing is sparse and the operation quick enough that we won't be waiting on the locks that often.

Comment thread src/core/package_metadata.go Outdated
Comment thread src/core/package_metadata.go Outdated
Comment thread src/core/package_metadata.go Outdated
Comment thread src/core/package_metadata.go Outdated
Comment thread src/export/export.go Outdated
Comment thread src/export/export.go
state *core.BuildState
targetDir string

exportedTargets map[*core.Package]map[core.BuildLabel]bool
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a pointer as a key always makes me uncomfortable; can we avoid this?

More generally, could we avoid the nested map? core.BuildLabel includes the package name anyway, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably avoid using the pointer has key by using the package label, but we will have to look up in the graph each time. I was using the pointer directly assuming some consistency of no repeated package instances. Should I use the string instead and lookup in the graph each time we want to use it?

The nested map is useful for looping though the exported target per each package (and for efficient verification of visited targets). What's your opinion, should I try to unnest?

Comment thread src/export/export.go
Comment on lines +113 to 119
targetPath := filepath.Join(be.targetDir, file)
if err := os.RemoveAll(targetPath); err != nil {
log.Fatalf("failed to remove .plzconfig file %s: %v", file, err)
}
if err := fs.CopyFile(file, path, 0); err != nil {
if err := fs.CopyFile(file, targetPath, 0); err != nil {
log.Fatalf("failed to copy .plzconfig file %s: %v", file, err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using fs.RecursiveCopy like we do most places below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we? Can .plzconfigs be directories?

Comment thread src/export/export.go Outdated
Comment thread src/export/export.go Outdated
Comment thread src/export/export.go
}
if ignoreDirectories[d.Name()] {
return filepath.SkipDir
cursor = bStmt.Start
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why the linter thinks this is ineffectual; have you got a test case which proves it?

Comment thread src/export/export.go
return ""
}

sort.Sort(filteredLabels)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interests of making minimal changes to the export, I think we should remove this sort?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we use a map to register these labels, the order of insertion is not enforced. Sorting ensures that the output is deterministic. We could possibly retain some of the original order by changing some of the logic in PackageMetadata if you consider this important, however, if we keep the formatting I believe it sorts the subincludes.

Comment thread src/export/export_test.go
Comment thread src/export/export_test.go Outdated
Comment thread src/export/export_test.go Outdated
return func() *core.BuildStatement {
stmtScope := s
for curr := s; curr != nil; curr = curr.callerScope {
if curr.pkg != nil && curr.filename == s.pkg.Filename {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we need more of a comment here and more in the doc comment to explain this condition and why we don't break once it's true (which I'm guessing is to handle somebody defining a function in a BUILD file? Do we have an export test case for that? And an export test case for statements inside loops and if-statements?).

If I understand this correctly, we're effectively looking for the highest-level function call which is inside a BUILD file (as opposed to calls within other functions)?

Can we unit-test this function and ActiveSubincludes?

Comment thread src/parse/asp/objects.go
if f.nativeCode != nil {
if f.kwargs {
return f.callNative(s.NewScope("<builtin code>", 0), c)
return f.callNative(s.NewScope("", 0), c)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? I think the <builtin code> thing was useful for debugging

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that in the previous PR, but since that argument is supposed to be a filename I'm not sure it makes sense. If we attempt to open a file with it, it should fail the same as with using "", I just thought it could be misleading but I'm happy to revert this.

Comment thread src/parse/asp/interpreter.go Outdated
func (s *scope) ActiveSubincludes() core.SubincludesLabelProvider {
return func() core.BuildLabels {
seen := map[core.BuildLabel]bool{}
for curr := s; curr != nil; curr = curr.callerScope {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not immediately obvious to me how or why this works. I'd expect us to need to be looking for sibling subinclude statements, rather than traversing through call stacks. Please write a nice explicit and verbose comment

Comment thread src/parse/asp/interpreter.go Outdated
Comment thread src/parse/asp/targets.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants