Export: enhance trimming using build statement metadata by DuBento · Pull Request #3530 · thought-machine/please

DuBento · 2026-05-01T14:45:16Z

This PR enhances plz export, moving from basic target-level trimming (using gc.RewriteFile) to build-statement-level trimming, so that exported BUILD files include only the required build rules and subincludes.
For consistency, all exported BUILD files are formatted.

Changelog:

Introduced PackageMetadata to track the relationship between BUILD file statements, the targets they generate, and the subincludes they require. Metadata tracking is optional and is enabled for the export.
Refactored src/export/export.go to enforce better separation of the DefaultExporter (for trimming) and NoTrimExporter.
Added logic to parse BUILD files and selectively write back statements based on whether the generated targets are part of the export set.
Implemented "minimal subinclude" generation, which rewrites subinclude() calls to only include labels actually used by the exported targets.
Updated and added e2e tests to reflect the changes in implementation.
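
The statement-level trimming described in the changelog can be pictured roughly as follows. This is a minimal sketch, not the PR's code: `statement`, `trimStatements` and the plain string labels are hypothetical stand-ins for the real PackageMetadata and core.BuildLabel types. It assumes that each top-level statement records the targets it generated, and that statements generating no targets are always kept (in line with the "export all non build_target related statements" commit message).

```go
package main

import "fmt"

// statement is a hypothetical record of one top-level BUILD file statement.
type statement struct {
	Source  string   // original text of the statement
	Targets []string // labels of the targets it generated (empty if none)
}

// trimStatements keeps statements that generate no targets, plus statements
// that generate at least one target in the exported set.
func trimStatements(stmts []statement, exported map[string]bool) []string {
	var kept []string
	for _, st := range stmts {
		if len(st.Targets) == 0 {
			// Not tied to any build target (e.g. an assignment): always keep.
			kept = append(kept, st.Source)
			continue
		}
		for _, t := range st.Targets {
			if exported[t] {
				kept = append(kept, st.Source)
				break
			}
		}
	}
	return kept
}

func main() {
	stmts := []statement{
		{Source: `VERSION = "1.0"`},
		{Source: `go_library(name = "a")`, Targets: []string{"//pkg:a"}},
		{Source: `go_library(name = "b")`, Targets: []string{"//pkg:b"}},
	}
	for _, src := range trimStatements(stmts, map[string]bool{"//pkg:a": true}) {
		fmt.Println(src)
	}
}
```

With only `//pkg:a` in the export set, the sketch keeps the assignment and the first `go_library` call and drops the second.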

ToDo:

Multi-threaded export.
Fix lazy parsing for export: some targets required by the export are missing from the graph.

toastwaffle · 2026-05-08T14:49:25Z

+		SubrepoName:       subrepo,
+		targets:           map[string]*BuildTarget{},
+		Outputs:           map[string]*BuildTarget{},
+		BuildFileMetadata: newNoopPackageMetadata(),


Why do we always use Noop here?

We default to noop and override if the metadata option is set. Please take a look at the unified NewPackage. Does it make sense?
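
The "default to noop, override when the option is set" pattern from the reply above could look something like the sketch below. It is illustrative only: `metadata`, `noopMetadata`, `trackingMetadata`, `withMetadata` and `newPackage` are hypothetical names, and the real NewPackage signature is not shown in this thread. The only assumption taken from the PR is that options are passed variadically (per the "NewPackage with variadic optional functions" commit).

```go
package main

import "fmt"

// metadata is a hypothetical stand-in for the package metadata interface.
type metadata interface {
	Register(target string)
	Count() int
}

// noopMetadata discards everything; it is the default.
type noopMetadata struct{}

func (noopMetadata) Register(string) {}
func (noopMetadata) Count() int      { return 0 }

// trackingMetadata records every registered target.
type trackingMetadata struct{ targets []string }

func (m *trackingMetadata) Register(t string) { m.targets = append(m.targets, t) }
func (m *trackingMetadata) Count() int        { return len(m.targets) }

// pkg is a hypothetical, reduced package type.
type pkg struct {
	Name     string
	Metadata metadata
}

// option mutates a package during construction.
type option func(*pkg)

// withMetadata is a hypothetical option that swaps the noop default
// for a tracking implementation.
func withMetadata() option {
	return func(p *pkg) { p.Metadata = &trackingMetadata{} }
}

// newPackage starts from the noop default and applies any options in order.
func newPackage(name string, opts ...option) *pkg {
	p := &pkg{Name: name, Metadata: noopMetadata{}}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	plain := newPackage("src/core")
	plain.Metadata.Register("//src/core:core")

	exported := newPackage("src/core", withMetadata())
	exported.Metadata.Register("//src/core:core")

	fmt.Println(plain.Metadata.Count(), exported.Metadata.Count()) // prints "0 1"
}
```

Callers that never pass the option keep the noop implementation and do no bookkeeping.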

toastwaffle · 2026-05-08T14:55:57Z

+// RegisterStatement maps a build statement to target in the package.
+func (pkg *Package) RegisterStatement(target *BuildTarget, stmtProvider BuildStatementProvider) {
+	pkg.mutex.Lock()
+	defer pkg.mutex.Unlock()


Why are you only protecting writes with the mutex, and not reads? Go maps are not safe for concurrent use: the runtime can abort with a fatal "concurrent map read and map write" error if a read and a write overlap.

tbh, I'd expect any locking to be done inside the BuildFileMetadata implementation (which would avoid paying the cost of locking when we're using the Noop implementation). Better still would be to use concurrency-safe data structures wherever possible (which most likely use locks under the hood anyway, but might not).

I was going to justify using *Package-level methods on the grounds that they reuse the package mutex. I initially enriched the AddTarget logic with a BuildStatement, reusing that lock, but eventually separated it into different methods.

Reads are currently only done synchronously, since the export doesn't (yet) support multi-threading. I fully agree and will migrate BuildFileMetadata to concurrency-safe maps. Thanks for raising this.

Updated. I've used an RWMutex instead of dedicated data structures, mostly for simplicity and to avoid the performance overhead, but also to follow the example of Package and BuildTarget. I can be persuaded in the other direction, but I think the parsing is sparse and the operations quick enough that we won't be waiting on the locks that often.

toastwaffle · 2026-05-11T09:24:36Z

+	state     *core.BuildState
+	targetDir string
+
+	exportedTargets map[*core.Package]map[core.BuildLabel]bool


Using a pointer as a key always makes me uncomfortable; can we avoid this?

More generally, could we avoid the nested map? core.BuildLabel includes the package name anyway, right?

We can probably avoid using the pointer as a key by using the package label, but we would have to look the package up in the graph each time. I was using the pointer directly, assuming there are no repeated package instances. Should I use the string instead and look it up in the graph each time we need it?

The nested map is useful for looping through the exported targets of each package (and for efficiently checking which targets have been visited). What's your opinion, should I try to unnest?
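
One way to weigh the two options is to sketch the flat alternative. The example below assumes a comparable label type that carries its package name, as the review comment suggests core.BuildLabel does; `label` and `byPackage` are hypothetical names. The flat set still gives constant-time "already visited" checks, and the per-package view is derived on demand.

```go
package main

import (
	"fmt"
	"sort"
)

// label is a hypothetical, comparable stand-in for core.BuildLabel.
type label struct {
	PackageName string
	Name        string
}

func (l label) String() string { return "//" + l.PackageName + ":" + l.Name }

// byPackage groups a flat set of exported labels by package name and sorts
// each group by target name so iteration order is stable.
func byPackage(exported map[label]bool) map[string][]label {
	grouped := map[string][]label{}
	for l := range exported {
		grouped[l.PackageName] = append(grouped[l.PackageName], l)
	}
	for _, ls := range grouped {
		sort.Slice(ls, func(i, j int) bool { return ls[i].Name < ls[j].Name })
	}
	return grouped
}

func main() {
	exported := map[label]bool{
		{PackageName: "src/core", Name: "core"}:     true,
		{PackageName: "src/core", Name: "test"}:     true,
		{PackageName: "src/export", Name: "export"}: true,
	}
	// Visited check on the flat set: a single map lookup.
	fmt.Println(exported[label{PackageName: "src/core", Name: "core"}])
	for _, l := range byPackage(exported)["src/core"] {
		fmt.Println(l)
	}
}
```

The trade-off is that grouping costs one pass over the set, whereas the nested map keeps that structure up to date on every insert.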

toastwaffle · 2026-05-11T09:28:41Z

+		targetPath := filepath.Join(be.targetDir, file)
+		if err := os.RemoveAll(targetPath); err != nil {
+			log.Fatalf("failed to remove .plzconfig file %s: %v", file, err)
 		}
-		if err := fs.CopyFile(file, path, 0); err != nil {
+		if err := fs.CopyFile(file, targetPath, 0); err != nil {
 			log.Fatalf("failed to copy .plzconfig file %s: %v", file, err)
 		}


Should we be using fs.RecursiveCopy like we do most places below?

Why should we? Can .plzconfigs be directories?

toastwaffle · 2026-05-11T15:23:12Z

 			}
-			if ignoreDirectories[d.Name()] {
-				return filepath.SkipDir
+			cursor = bStmt.Start


Not sure why the linter thinks this is ineffectual; have you got a test case which proves it?

toastwaffle · 2026-05-11T15:25:43Z

+		return ""
+	}
+
+	sort.Sort(filteredLabels)


In the interests of making minimal changes to the export, I think we should remove this sort?

Since we use a map to register these labels, insertion order is not preserved. Sorting ensures that the output is deterministic. We could possibly retain some of the original order by changing some of the logic in PackageMetadata if you consider this important; however, if we keep the formatting, I believe it sorts the subincludes anyway.
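
The determinism point can be illustrated with a small sketch. It assumes the used labels are held as map keys and are rendered into a single subinclude() call; `minimalSubinclude` and the string labels are hypothetical, and the empty-string return for no labels only loosely mirrors the `return ""` visible in the diff hunk above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// minimalSubinclude renders a subinclude() call for the labels in used.
// It returns "" when nothing is used.
func minimalSubinclude(used map[string]bool) string {
	if len(used) == 0 {
		return ""
	}
	labels := make([]string, 0, len(used))
	for l := range used {
		labels = append(labels, l)
	}
	// Go randomises map iteration order, so sort to get stable output.
	sort.Strings(labels)

	quoted := make([]string, len(labels))
	for i, l := range labels {
		quoted[i] = fmt.Sprintf("%q", l)
	}
	return "subinclude(" + strings.Join(quoted, ", ") + ")"
}

func main() {
	used := map[string]bool{
		"//build_defs:proto": true,
		"//build_defs:go":    true,
	}
	fmt.Println(minimalSubinclude(used))
	// prints: subinclude("//build_defs:go", "//build_defs:proto")
}
```

Without the sort, two runs over the same package could emit the labels in different orders and produce spurious diffs in the exported BUILD file.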

toastwaffle · 2026-05-12T08:01:10Z

+	return func() *core.BuildStatement {
+		stmtScope := s
+		for curr := s; curr != nil; curr = curr.callerScope {
+			if curr.pkg != nil && curr.filename == s.pkg.Filename {


I think we need more of a comment here, and more in the doc comment, to explain this condition and why we don't break once it's true (which I'm guessing is to handle somebody defining a function in a BUILD file?). Do we have an export test case for that? And an export test case for statements inside loops and if-statements?

If I understand this correctly, we're effectively looking for the highest-level function call which is inside a BUILD file (as opposed to calls within other functions)?

Can we unit-test this function and ActiveSubincludes?
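
The reviewer's reading ("the highest-level function call which is inside a BUILD file") can be sketched as below. This follows that interpretation only and is not confirmed by the PR author; the `scope` struct is a reduced, hypothetical version of the interpreter's scope, and `outermostInFile` is an invented helper. Unlike the loop in the diff, which starts from `s` as the default result, this simplified version returns nil when nothing matches.

```go
package main

import "fmt"

// scope is a hypothetical, simplified interpreter scope.
type scope struct {
	filename    string // file the scope's code comes from
	line        int    // line of the call that created the scope
	callerScope *scope // scope of the caller; nil at the top level
}

// outermostInFile walks the whole caller chain and returns the outermost
// scope whose filename matches buildFile, or nil if there is none.
func outermostInFile(s *scope, buildFile string) *scope {
	var found *scope
	for curr := s; curr != nil; curr = curr.callerScope {
		if curr.filename == buildFile {
			// No break: a caller further up may also be in the BUILD file,
			// e.g. when a function defined in the BUILD file calls a rule.
			found = curr
		}
	}
	return found
}

func main() {
	top := &scope{filename: "pkg/BUILD", line: 12}                                      // top-level call
	helper := &scope{filename: "pkg/BUILD", line: 3, callerScope: top}                  // function defined in the BUILD file
	rule := &scope{filename: "build_defs/go.build_defs", line: 40, callerScope: helper} // rule body

	fmt.Println(outermostInFile(rule, "pkg/BUILD").line) // prints 12
}
```

Breaking at the first match would return the helper at line 3 rather than the top-level statement at line 12, which is presumably why the loop runs to the end of the chain.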

toastwaffle · 2026-05-12T08:04:14Z

 	if f.nativeCode != nil {
 		if f.kwargs {
-			return f.callNative(s.NewScope("<builtin code>", 0), c)
+			return f.callNative(s.NewScope("", 0), c)


Why this change? I think the <builtin code> thing was useful for debugging

I added that in the previous PR, but since that argument is supposed to be a filename, I'm not sure it makes sense. If we attempt to open a file with it, it should fail the same way as with "". I just thought it could be misleading, but I'm happy to revert this.

toastwaffle · 2026-05-12T08:05:46Z

+func (s *scope) ActiveSubincludes() core.SubincludesLabelProvider {
+	return func() core.BuildLabels {
+		seen := map[core.BuildLabel]bool{}
+		for curr := s; curr != nil; curr = curr.callerScope {


It is not immediately obvious to me how or why this works. I'd expect us to need to be looking for sibling subinclude statements, rather than traversing through call stacks. Please write a nice explicit and verbose comment
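
As a possible starting point for the requested comment, here is one guess at what the traversal does, written as a sketch. It assumes each scope records the subinclude labels loaded in it and that a callee's effective subincludes are the union of its own and its callers'; both assumptions, along with the `scope` struct and `activeSubincludes`, are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// scope is a hypothetical, simplified interpreter scope.
type scope struct {
	subincludes []string // labels subincluded directly in this scope (assumed field)
	callerScope *scope   // scope of the caller; nil at the top level
}

// activeSubincludes collects the labels visible from s by walking up the
// caller chain, de-duplicating with a set and returning them sorted.
func activeSubincludes(s *scope) []string {
	seen := map[string]bool{}
	for curr := s; curr != nil; curr = curr.callerScope {
		for _, l := range curr.subincludes {
			seen[l] = true
		}
	}
	labels := make([]string, 0, len(seen))
	for l := range seen {
		labels = append(labels, l)
	}
	sort.Strings(labels)
	return labels
}

func main() {
	buildFile := &scope{subincludes: []string{"//build_defs:go", "//build_defs:proto"}}
	ruleBody := &scope{subincludes: []string{"//build_defs:go"}, callerScope: buildFile}
	fmt.Println(activeSubincludes(ruleBody)) // prints [//build_defs:go //build_defs:proto]
}
```

Whether sibling subinclude statements in the same file are captured this way depends on how the real interpreter records them, which is exactly the point the reviewer asks to have documented.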

duarte added 30 commits April 29, 2026 19:28

Package level metadata that includes a mapping of build statements to…

0ef91bc

… build targets

export using BuildFileMetadata logic

63f1b0d

register caller scope for callstack-like traversal

419bd56

bufio writer for build statements

8e1bd00

rewrite export logic into explicit flow

258b170

export targets related to build statement

4110735

enrich target with subincludes by looping though all scopes

a4252c9

separate buildstmt register from adding target

82f117f

select and write subincludes

eb657de

setting subincludes at package level instead of at target level

skip statement for preloaded subincludes

6535794

test: trim subincludes

8e6c9c0

rename file

9f0b4fe

use slices.collect and sort interface

0e597c6

rewrite export with 2 concrete interfaces: default and notrim

de76e2c

notrim: full copy BUILD file and visit sources

03ed361

double new lines between targets

717aba1

suppress diff output when enforcing repo differences

94160bf

export by filtering the original BUILD file

582319e

- register subinclude statements in the package metadata - filter subincludes label - export all non build_target related statements

subincludes use label short string

e6a5293

Simplify package filtering method into a more explicit "switch" case

ea329c3

skip subrepos and internal packages when writing package file

1a47c22

fix reusing err var resulted in failed filtering

a35119a

doc strings for package metadata

9c1522c

export missing doc strings

9395a59

test: named go_repo and change testify for slimmer UUID

3c3d3d0

export dependencies of subrepos - 13/14 tests passing

13e95cb

test: internal repo test using temp directory to avoid stale data

28b5441

fix: 0 label subinclude

781af3a

adjusting dependency lookup and adjacent target test to include secon…

2f8355c

…dary build def with sources the updated test uses non-standard child naming to validate new trimming logic

test: add custom tool to native test

3e89c85

duarte added 11 commits April 30, 2026 13:32

test: go_test export with several deps

9c5af3e

optional metadata parsing

922da6f

test: minimal subinclude statement

5aeb360

collect map keys on active subinclude labels

517486a

non-fatal warning for missing source while exporting

f8f8e3f

skip internal package export

8aa98e3

unclear if this is necessary

move some fatal to error and continue

5fa64f4

missing docstrings

e60dbb6

run go fmt and plz fmt

c7409ab

move to error and continue for target lookup

be87020

rename new parser method on test files

6f73f78

toastwaffle reviewed May 11, 2026

use pkg.Metadata directly and remove intermediate pkg methods

762c016

toastwaffle reviewed May 11, 2026

toastwaffle reviewed May 12, 2026

DuBento added 11 commits May 12, 2026 11:04

mutex in packagemetadata

406f95a

update stmt provider to avoid dereference

b86ff9f

NewPackage with variadic optional functions

c7cd118

infof to debugf

faf8c33

package metadata doc comments improvements

dd8cfca

improve doc comments and adjust method visibility for export.go

084f233

hide build output when testing export

bf0f734

open and write of exported package file merged into the same method

33d2bc2

update export_test.go with suggestions

54f577c

apply review suggestions to export.go

ae5f520

rename and doc fields for scope

5f0e549

DuBento commented May 1, 2026

3 participants