FEAT: extend profiling to child processes#431
Conversation
|
Did some more tests on local post-#428-merge, maybe it is just legacy Python and dependency versions causing the issues. Will just rebase, force-push, and see what happens. |
f9a37af to
aca4e2c
Compare
|
Unfortunately there's too little context to determine why the tests are failing on other platforms. Heck I can't even replicate the macOS failures on my machine with matching dep versions. Just wrote in more code for extracting the debug outputs, force-pushed, and hopefully I will have more clues for what to work on. |
|
Added a ton of logging/debug messages, a few debug-only config options and a
Footnotes
|
|
WRT to the existing multithreaded case, IIRC the main point of that is just to ensure we don't hang. I could be misremembering. For the forkserver issue, from what I understand that's a long lived process, so maybe some forkerserve state debugging (not sure if your code does this or not, I haven't looked at it yet) def debug_forkserver_state(label):
import os
from multiprocessing import forkserver
fs = forkserver._forkserver # private CPython state
pid = getattr(fs, "_forkserver_pid", None)
# "Did *this process* already know about / launch a forkserver?"
known_forkserver_pid = pid
known_forkserver_started = pid is not None
# "Was *this current process* itself started by a forkserver?"
started_by_forkserver = forkserver.get_inherited_fds() is not None
# Best-effort liveness check for the known forkserver PID, if any.
# This is optional, and mostly useful in the parent/originating process.
forkserver_pid_alive = None
if pid is not None:
try:
os.kill(pid, 0)
except OSError:
forkserver_pid_alive = False
else:
forkserver_pid_alive = True
print(
f"[{label}] "
f"pid={os.getpid()} ppid={os.getppid()} "
f"known_forkserver_pid={known_forkserver_pid!r} "
f"known_forkserver_started={known_forkserver_started} "
f"forkserver_pid_alive={forkserver_pid_alive} "
f"started_by_forkserver={started_by_forkserver}"
)And maybe also check if the main module attributes differ in any meaningful way? getattr(sys.modules['__main__'].__spec__, 'name', None)
getattr(sys.modules['__main__'], '__file__', None)
multiprocessing.get_start_method()But IDK, your guess is probably better than mine at this point. |
|
Thanks for the input! May have to take a closer look at the I have a piece of good news and a half:
Footnotes
|
|
Figured it out, the issue is that:
I think the bug is fixed (will push shortly), but I've noticed significant performance regression (about 100% slowdown of |
|
Tests seem flaky... there is no good reason that 6e60a6a failed on building for |
|
Hi @Erotemic, I think this is ready for review if you have the time... sorry for the metric ton of code. |
Erotemic
left a comment
There was a problem hiding this comment.
It will take me a while to fully go through this, but here is a start.
| ... print("We probably didn't count up to 100 but whatever") | ||
| We probably didn't count up to 100 but whatever | ||
|
|
||
| >>> with ( # doctest: +NORMALIZE_WHITESPACE |
There was a problem hiding this comment.
Note, because we use xdoctest, you should be able to put # doctest: +NORMALIZE_WHITESPACE as a standalone comment above the code for it to apply to every line after. Although I don't often use NORMALIZE_WHITESPACE, so I can't say I'm 100% confident in that.
| dir=get_path('purelib'), | ||
| ) | ||
| try: | ||
| pth_content = 'import {0}; {0}.load_pth_hook({1})'.format( |
There was a problem hiding this comment.
This has the potential to cause some import overhead. IIUC, we are going to effectively import the entire line_profiler.init. I'm not sure if that's going to be noticeable or not. One idea is to make a separate _line_profiler_hooks package that's just a minimal top level module for very fast import, but that may be overengineering. I'd need to think about it and probably test it.
There was a problem hiding this comment.
One data point on my machine, though I think other platforms are most likely gonna behave the same: importing line_profiler._child_process_profiling.pth_hook causes the following to be imported:
- All top-level
line_profilersubmodules except the following:~.cleanup~.curated_profiling~._threading_patches~.ipython_extension
- None of the
line_profiler.autoprofilecomponents, and - None of the
line_profiler._child_process_profilingcomponents except~~.pth_hookitself.
So, as one would intuit, most of the "core" stuff directly used by line_profiler.LineProfiler and @line_profiler.profile. Could be worthwhile to set it up as a separate namespace... and it should be trivial to configure, given that we already create two separate namespaces (line_profiler and kernprof).
The remaining question is more of a design one: should write_pth_hook() stay in said namespace for symmetry with load_pth_hook(), or should it become an instance method of LineProfilingCache's?
| ) | ||
| if not fnames: | ||
| return LineStats.get_empty_instance() | ||
| return LineStats.from_files(*fnames, on_defective='ignore') |
There was a problem hiding this comment.
Maybe on_defective='warn' makes more sense here?
There was a problem hiding this comment.
Could be. The original motivation is that:
- Since the
._setup_in_child_process()code is run in all child processes at startup incl. unused worker processes and the fork-server process, each such process occupies and secures a tempfile name (*.lprof) by touching it. Said tempfile is supposed to be written to when the corresponding Python process terminates. - But then some of these files could result in errors upon reading: processes killed via signals before cleanup is run result in empty tempfiles, and those killed mid-cleanup make for corrupted ones.
- I opted to suppress the error/warning messages given that they originate directly from
pickleand are thus not the most intuitive/helpful.
But maybe that's just indication that we should've vet the input files better before passing them onto pickle. And given that the rest of the PR has seen a lot of iteration after these lines were first written, we should probably get a lot fewer false alarms from the warnings, and where we do have something to warn it will be of actual problematic cases (e.g. unclean exits resulting in incomplete data).
| """ | ||
| # XXX: why can `coverage` get away with not doing all these | ||
| # lock-file hijinks and just patching `BaseProcess._bootstrap()`? | ||
| def get_poller_args( |
There was a problem hiding this comment.
Do these need to be nested functions? Would a helper class make more sense than relying on closures / be more testable?
Do we have a test for a worker that does some non trivial processing and then intentionally terminates so we can have an explicit comparison between the unpatched and patched terminate and make sure they both behave similar enough?
There was a problem hiding this comment.
Yeah I should probably refactor wrap_terminate() for readability. We've barely used closure variables in the nested funcs anyway (just cache in process_has_returned()); still, that can be easily remedied, given that I can just pass cache to the function in the _Poller.poll_until() constructor.
We do test Process.terminate() as shown in the coverage report, albeit indirectly:
- In normal ("happy-path") code execution
Process.terminate()isn't even called – AFAIK it only happens when the parallel workload errored out, which for whatever reason may leave the child process stuck "in limbo", where e.g.atexitcleanup callbacks may or may not be executed, or be in various stages thereof. - Such children are
.terminate()-d as a part of finalization of the parentPool(seePool.__init__()and.terminate()), which in particular does not wait for child-process cleanup, thus necessitating the patch. - In order to test this we have tests1 where the parallel workload can be "set up for failure". Those are where the coverage on
wrap_terminate()comes from. A quick check is how running the test suite with-k "test_child and success"only gives 52% coverage onmultiprocessing_patches.pywhile-k "test_child and failure"gives 72%.
Given that the existing code do call vanilla_impl() in a finally clause, it is probably the closest we can get to the unpatched behavior in that we (1) do eventually send the SIGTERM, and (2) before that, give child processes (as far as possible – which isn't 100% of the time, hence the necessity of the timeout) the required grace to run cleanup and write profiling data. Anyhow, maybe we can look into tests where we explicitly create a Process object and later .terminate() it, if that would be more assuring.
Footnotes
-
Granted, we just sum over consecutive integers in those tests. They have the benefit that the workload size corresponds directly to the num-hits on the loop line, and hence we can keep track of whether the gathered profiling data are complete. If that's too trivial maybe we can do Fibonacci or something instead. ↩
| self, obj: Any, attr: str, value: Any, *, | ||
| name: str | None = None, | ||
| cleanup: bool = True, | ||
| priority: float = 0, |
There was a problem hiding this comment.
Thanks, good catch!
| ) | ||
|
|
||
|
|
||
| @pytest.mark.retry(_NUM_RETRIES, |
There was a problem hiding this comment.
Why is a retry necessary here? Is there something non deterministic going on? Any chance we are hiding a race condition? Similar question with other pytest.mark.retry marked tests.
There was a problem hiding this comment.
The short answer
Yes, workers with failing workload ends up in limbo sometimes, as mentioned in the other comment. It should theoretically be possible to drop the retry on POSIX (and even the whole wrap_terminate() deal) because of the use of SIGTERM handlers, but the consensus seems to be that signal handling is busted on Windows, so we still need the timeout which is by nature flaky.
The long answer
- Again, processes receiving parallel workloads which don't error out seem to always exit cleanly and never need to be
.terminate()-d. Those which don't... don't. Which sets up the potential race condition: the failing child could be cleaning up, stuck in limbo (possibly waiting for communication from the parent?), or whatever; meanwhile the parent don't really care either way, because it already received all the results (i.e. pickled return-value objects or exceptions) from the children, and is ready to.terminate()whichever child process that remain alive. - In earlier iterations of the PR I just set each child process up to manage a lock file, and had the parent process wait without a timeout for said file before sending the
SIGTERM(which is essentially the only thing that the vanillaProcess.terminate()does). Which then caused the tests to sometimes hang indefinitely, meaning that in some failing child processes we never naturally got to the point where thefinally: cache.cleanup()clause is executed. Unfortunately such failure also seems to be nondeterministic, at least at a cursory glance. - Hence the timeout. Which did alleviate the issue in that we do get all of the profiling data most of the time – because the parent process no longer willy-nilly kills the children, at least until the timeout expires. This handles cases where the child process is healthy enough to actually attempt cleanup, but would have been prevented from doing so by the parent's
.terminate()-ing it. - But then of course there are still corner cases for when the children is FUBAR – and I think it can be argued that those cases are what the
Process.terminate()method is for in the first place. Like, in a perfect would the child process would've just executed itsatexithooks and exited non-zero, right after raising the error and pushing it to the communication queue with the parent. But obviously that doesn't always happen. In those cases anything goes... hence the retries.
Notably, coverage also seems to be having trouble with consistently handling data collection when child processes are .terminate()-d, and just gives up while on Windows:
@pytest.mark.skipif(env.WINDOWS, reason="SIGTERM doesn't work the same on Windows")
@pytest.mark.flaky(max_runs=3) # Sometimes a test fails due to inherent randomness. Try more times.
class SigtermTest(CoverageTest):What do we do?
I didn't like having to retry either – but here we are. But maybe we can at least refactor and separate the tests, so that we make sure that the mark is only applied to cases where it might be needed/justified (i.e. tests with failing parallel workloads and on Windows).
That or we just skip (as coverage does) or XFAIL on Windows.
|
I have a new idea for improving determinism in profiling
However, this strat does have complications:
|
|
As it is now, the code (at least on non-Windows platforms) already reliably and deterministically captures all profiling data related to tasks set to the child processes. If the child process gets to execute any task at all, it must have gone through setup which allows for the capturing of profiling data. What however is more difficult to handle consistently, and is causing most of the recent pipeline failures (other than 25785347402, which was entirely on me), is what happens with child processes which terminates without ever having received a task.1 While currently the Such failure calls into question:
Footnotes
|
|
Added the PID bookkeeping, but there seems to be an unrelated Heisenbug which I'm struggling to pin down. Every ≈ 1000–2000 runs of The newest pipeline failures are semi-related:
|
|
@Erotemic yikes, unfortunately job 76507649976 seems to have gotten stuck despite multiple precautions (timeouts for dubious sections, hundreds/thousands of rounds of local tests)... worse, its the Mac job that's stuck (which I suppose is the most expensive on CI), despite that being the most thoroughly tested env, since it's also my local. I don't have the perms to cancel the job; please do so ASAP before it continues to rack up compute. Terribly sorry for this. |
|
Yikes! Just saw the message. I don't see an immediate way to cancel it. Maybe it timed out and it just looks like it is churning? It does say: "The job has exceeded the maximum execution time of 6h0m0s", and in this PR I see: Cancelled after 365m. So I think it is not running. |
|
Yep, I think Actions have a default 6 h limit (not sure if it's the entire pipeline or individual jobs) on GitHub. This happened on Monday so said default timeout expired a while before intervention... it may not be ideal, but we can probably put in loose-ish timeouts for For the (probably1) offending test We can probably mitigate this by folding the cases tested by Footnotes
|
|
When trying to sniff around and trigger and fix the bug I ran into an even weirder issue.
This seems to indicate something being wrong when profiling starts on a non-main thread; in view of that, how |
|
Sorry for the lack of update. Was a bit shellshocked after the last blunder and held off on making pushes until I can diagnose what happened and better reproduce the sporadic hanging bug... ... in which however I am not quite successful. It still only shows up every ≈1000 tests or so, and apparently according to the logs, that is due to the signal handler's somehow not firing on one of the child processes; still trying to figure out how that is at all possible. Still I've made some further changes:
If it isn't as safe and deterministic to temper with signal handling as I've initially hoped, we may just have to report profiling stats more granularly like on Windows, where each multiproc pool task prompts the child process to write the profiling stats to its assigned file. Since one of the patches already patches task/result queues to send the child-process identity back to the parent alongside the task result, it should be easy (though with overhead) to also slip the collected stats in. Will see if that helps. Oh and looks like I broke something else completely related... not a good sign. The lint-job failure seems unrelated though, it was on |
|
Still working on this on-and-off, but I'm unfortunately still stuck with flaky tests... seems that there's something which makes SIGTERM handling inherently unreliable in Unfortunately this means that there's no guarantee for profiling output when a child process is |
fe70770 to
bc3adb9
Compare
- `line_profiler/curated_profiling.py`
New module for setting up profiling in a curated environment
- `ClassifiedPreimportTargets.from_targets()`
Method for creating a `ClassifiedPreimportTargets` instance,
facilitating writing pre-import modules in a replicable and portable
manner
- `ClassifiedPreimportTargets.write_preimport_module()`
Method for writing a pre-import module based on an instance;
also fixed bug where the body of the written module was intercepted
without appearing in the debug output
- `kernprof.py`
- `_gather_preimport_targets()`
Migrated to `line_profiler.curated_profiling`
- `_write_preimports()`
Now using the new `ClassifiedPreimportTargets` class, moving esp.
the logic to the `write_preimport_module()` method
- `kernprof.py::_manage_profiler` `line_profiler/curated_profiling.py::CuratedProfilerContext` New context-manager classes for handling profiler setup and teardown - `kernprof.py::_pre_profile()` Refactored into the above context managers and other private functions (`_prepare_profiler()`, `_prepare_exec_script()`)
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
New class for passing info onto child processes so that profiling
can resume there
line_profiler/pth_hook.py
New submodule for the .pth-file-based solution to propagating
profiling into child processes:
write_pth_hook()
In the main process, write the temporary .pth file to be loaded
in child processes
load_pth_hook()
Called by the .pth in child process, loading the cache and
setting up profiling based thereon
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
Added new `.profile_imports` attribute to correspond to `kernprof`'s
`--prof-imports` flag
line_profiler/_child_process_profiling/meta_path_finder.py
New submodule defining the `RewritingFinder` class, a meta path
finder which rewrites a single module on import
line_profiler/_child_process_profiling/pth_hook.py
write_pth_hook()
Now also handling the `os.fork()` patching/wrapping
_setup_in_child_process()
Now creating a `RewritingFinder` to mirror what
`~.autoprofile.autoprofile.run()` does in the main process
.
line_profiler/_child_process_profiling/multiprocessing_patches/
Split the contents of `__init__.py` into separate submodules:
poller.py::Poller
Migrated from `__init__.py::_Poller`
_queue.py::PutWrapper
Migrated from `__init__.py::_QueuePutWrapper`
config.py::MPConfig
Migrated from `__init__.py::MPConfig`
_infrastructure.py
Patch
- Migrated from `__init__.py::_Patch`
- Added attribute `.priority` to the interface
SingleModulePatch
Migrated from `__init__.py::Patch`
Registry
- Migrated from `__init__.py::_PatchRegistry`
- Added class method `.from_entry_point()` for constructing
an instance from patches loaded from an entry point
_mandatory_patches.py::{RebootForkserver,ResourceTracker,RunPy}Patch
- Migrated from the eponymous classes in `__init__.py`
- Now declared as `line_profiler._multiproc_patches` entry
points
_profiling_patches.py
POOL_PATCH, PROCESS_PATCH
`SingleModulePatch` objects that are now exposed as
`line_profiler._multiproc_patches` entry points
wrap_worker[_{write_on_exit,write_per_task}]()
Migrated from `__init__.py::wrap_worker_pool*()`
wrap_bootstrap(), wrap_terminate()
Migrated from the eponymous functions in `__init__.py`
_optional_patches.py
CHILD_PIDS_PATCH, LOGGING_PATCH
`SingleModulePatch` objects that are now exposed as
`line_profiler._multiproc_patches` entry points
wrap_handle_results(), wrap_process()
Migrated from the eponymous functions in `__init__.py`
wrap_worker()
Migrated from `__init__.py::wrap_worker_pid()`
__init__.py::__all__
Now including the following:
- `MPConfig` (= `config.py::MPConfig`)
- `Poller` (= `poller.py::Poller`)
- `Registry` (= `_infrastructure.py::Registry`)
- `Timeout` (= `poller.py::Poller.Timeout`)
tests/test_child_procs.py
Updated imports from
`line_profiler._child_process_profiling.multiprocessing_patches`
setup.py
Added the aforementioned `line_profiler._multiproc_patches` entry
points
line_profiler/_child_process_profiling/multiprocessing_patches/
_mandatory_patches.py::wrap_{terminate,start,bootstrap}()
_mandatory_patches.py::PROCESS_TERMINATION_PATCH
New patch for `multiprocessing.process.BaseProcess` with
increased priority, which ensures that SIGTERM isn't sent to
child processes until they are finished with setting up;
exposed as entry point `__process_termination`
_mandatory_patches.py::setup_mp_child()
Refactored and migrated from
`_profiling_patches.py::setup_mp_child()`
_profiling_patches.py::wrap_{worker_*,bootstrap}()
Removed basic setup (migrated to `PROCESS_TERMINATION_PATCH`)
setup.py
Added entry point `__process_termination` to
`line_profiler._multiproc_patches`
line_profiler/_child_process_profiling/cache.py
::LineProfilingCache._wrap_os_fork()
`os.fork()` wrapper now clearing the cleanup-callback stacks on the
pre-fork instance in the forked process to avoid clashes with the
new, post-fork instance
line_profiler/rc/line_profiler.toml
Updated comments
We no longer catch and handle SIGTERM because that apparently causes
deadlocks in `multiprocessing` child processes (see CPython GitHub
issues 73945 & 82408, coverage issue 1310)... have to figure out another
way to ensure the writing of profiling data.
line_profiler/_child_process_profiling/
cache.py::LineProfilingCache
Removed private attributes and methods related to signal
handling
multiprocessing_patches/_mandatory_patches.py::setup_mp_child()
No longer adding `SIGTERM` handling
multiprocessing_patches/_profiling_patches.py
dump_stats_quick()
Updated call signature
wrap_worker_write_*(), wrap_bootstrap()
Removed reference to `SIGTERM` handling
multiprocessing_patches/mp_config.py::MPConfig
Removed attribute `.catch_sigterm`
line_profiler/rc/line_profiler.toml
::[tool.line_profiler.child_processes.multiprocessing]
Removed config item `catch_sigterm`
TODO:
Update `_profiling_patches` to mitigate `BaseProcess.terminate()`
line_profiler/_child_process_profiling/multiprocessing_patches
(The `child_pids` optional patch has been reworked into the
`__pool_worker_pid` mandatory patch.)
__init__.py::apply()
Updated docstring
_infrastructure.py::Registry.from_entry_point()
Updated doctest
_mandatory_patches.py
POOL_WORKER_PID_PATCH
Migrated from `_optional_patches.py::CHILD_PIDS_PATCH`
wrap_handle_results(), wrap_process(), wrap_worker()
Migrated from eponymous functions in `_optional_patches.py`
_profiling_patches.py
wrap_worker()
Now always using the previous `wrap_worker_write_per_task()`
implementation, because we're no longer catching SIGTERM
even on POSIX
wrap_terminate()
Now only blocking the call to `BaseProcess.terminate()` if
`self` is a pool worker which has run at least 1 task; this
avoids having idle workers in a "dirty" state interfering
with pool termination
line_profiler/rc/line_profiler.toml
::[tool.line_profiler.child_processes.multiprocessing.patches]
Removed item `child_pids`
setup.py
Renamed entry point: `line_profiler._multiproc_patches.child_pids`
-> `.__pool_worker_pid`
tests/test_child_procs.py
_run_kernprof_main_in_process()
_test_apply_mp_patches()
test_apply_mp_patches_success()
Removed references to the `child_pids` patch
tests/test_child_procs.py::test_apply_mp_patches_failure()
Removed `@pytest.mark.retry` marker
tests/conftest.py, tests/test_child_procs.py
Removed because `@pytest.mark.retry` is no longer used anywhere
|
The lint job failed on EDIT: of course they set |
|
Yeah, I'm going to either pin Let's take this approach. Let's clearly define the boundary between the cases that we guarantee will work, and the unsupported cases. It feels like there are some fundamental limitations.
|
|
Yeah at this point I think I have to accept defeat, that:
Which is a bummer, but I guess it's about time I learned, instead of continuing burning CI compute in flaky tests and letting the PR sit in limbo. As such I'll probably have to:
Other remarks:
|
line_profiler/_child_process_profiling/_cache_logging.py
Updated assertions to emit more helpful error messages when they
fail
line_profiler/_child_process_profiling/multiprocessing_patches
/_mandatory_patches.py
wrap_process()
Removed (superseded by `wrap_terminate_pool()`)
wrap_terminate_pool()
New wrapper for `multiprocessing.pool.Pool._terminate_pool()`
for handling end-of-process bookkeeping, so that workers that
exited before they can be `.terminate()`-ed or `.join()`-ed are
also accounted for
POOL_WORKER_PID_PATCH.targets
Now patching `multiprocessing.pool.Pool._terminate_pool()`
instead of `.Process`
line_profiler/_child_process_profiling/multiprocessing_patches
/_profiling_patches.py
wrap_terminate()
Removed because relying on timeouts is inherently unreliable
PROCESS_PATCH.targets
No longer patching
`multiprocessing.process.BaseProcess.terminate()`
tests/test_child_procs/
Separated `tests.py` into these files:
- `_test_child_procs_utils.py`:
Util constants, functions and classes used by the tests and the
fixtures
- `multiproc_examples/*.py`:
Example scripts that are profiled
- `conftest.py`:
Fixtures and fixture functions
- `test_child_procs.py`:
Actual tests
TODO: Fixes; new tests for non-`Pool`-based parallelism
line_profiler/_child_process_profiling/multiprocessing_patches
/_profiling_patches.py
wrap_process()
New wrapper for `Pool.Process` that marks a worker process
wrap_bootstrap()
Now avoiding calling end-of function cleanup for marked
pool-worker processes, because they may be terminated mid-write
to the profiling-stats file (which should already be written to
on a per-task basis), causing data loss/corruption
_get_terminate_poller(), _process_has_returned()
Removed dead code
tests/test_child_procs/test_child_procs.py
test_apply_mp_patches_pool_{failure,success}()
- Renamed from `test_apply_mp_patches_*()`
- Removed subtests for when only the `process` patch is applied
test_pool_multiproc_script_sanity_check()
test_running_pool_multiproc_script()
test_profiling_pool_multiproc_script_{success,failure}()
Renamed from resp. tests without the `pool_` infix
TODO: new tests for `BaseProcess`-based parallelism
tests/test_child_procs/multiproc_examples/
external_module.py::split_workload()
New function for generating parallel workloads (lists of
consecutive numbers)
pool_test_module.py
my_local_sum()
Removed superfluous call to `reverse()` (because pyutils#424 has
been fixed)
sum_in_child_procs()
Now using `split_workload()`
main()
Now permitting `--start-method=dummy`
process_test_module.py
New test module for testing directly managing
`multiprocessing.process.BaseProcess` instances and eschewing
`multiprocessing.pool.Pool`
Worker
Wrapper around a `multiprocessing.process.BaseProcess` which
allows for executing a callable in another process and
retrieving the result
my_local_sum(), sum_in_child_procs(), main()
Functions and CLI with the same interface as those in
`pool_test_module.py`
tests/test_child_procs/conftest.py
process_test_module[_object]
New fixture representing
`tests/test_child_procs/multiproc_examples/process_test_module.py`
ext_module_object, pool_test_module_object
Updated docstrings
tests/test_child_procs/multiproc_examples/pool_test_module.py
Fixed imports of `external_module` utilities, placing individual
import targets in their own import statements (see issue pyutils#433)
tests/test_child_procs/multiproc_examples/process_test_module.py
- Fixed import... (ditto)
- Fixes to `Worker` when `start_method='dummy'`:
- No longer errors out when `daemon` is passed to `Worker.new()`
(see GH issue python/cpython#85716)
- No longer re-raising caught exceptions (in dummy worker backed
by thread objects) so that we don't get a
`PytestUnhandledThreadExceptionWarning`
tests/test_child_procs/test_child_procs.py
test_runpy_patches()
Updated because of the additonal import in `pool_test_module.py`
test_apply_mp_patches_{success,failure}()
test_multiproc_script_sanity_check()
test_profiling_multiproc_script_{success,failure}()
Refactored from the resp. tests infixed with `pool_`, so as to
also test `process_test_module.py`
tests/test_child_procs/_test_child_procs_utils.py
search_cache_logs()
Updated error messages to include info about the `re.Match`
objects (where appropriate)
add_timeout(), CheckWarnings
Fixed error-type representations in doctests
tests/test_child_procs/multiproc_examples/process_test_module.py
::Worker
Added `xdoctest` skip directive to the doctest, because it gives a
pickling error (of unknown reasons) when run with `xdoctest` but not
`doctest`
XXX:
Try to replicate and mitigate the last test failure, where
`search_cache_logs()` seemed to be complaining about a nonexistent
regex match
tests/test_child_procs/test_child_procs.py
::test_apply_mp_patches_{success,failure}()
- Removed `add_timeout()` decorator around inner function because
the code hindering `BaseProcess.terminate()` is no longer used
- Removed warning suppression because we no longer expect the
warning about forking in a multithreaded process
bc3adb9 to
7737fbb
Compare
line_profiler/_child_process_profiling/multiprocessing_patches/
__init__.py
No longer exposing the `Poller` and `Timeout` names (as they no
longer exist)
_mandatory_patches.py
PROCESS_SETUP_PATCH
Renamed and refactored from `PROCESS_TERMINATION_PATCH`: no
longer going through the polling-for-lock-file routine
because it should not be necessary (setup code should have
been run before moving onto the parallel workload thanks to
the .pth file anyway)
wrap_start(), wrap_terminate()
Removed
mp_config.py::MPConfig
Removed the `.polling` field
poller.py
Removed
line_profiler/rc/line_profiler.toml
::[tool.line_profiler.child_processes.multiprocessing.polling]
Removed subtable
setup.py
Updated `line_profiler._multiproc_patches` plugin paths
tests/test_child_procs/_test_child_procs_utils.py::@add_timeout
- Added optional `name` parameter so that the background thread can
be retrieved for debug purposes
- Fixed doctest so that it doesn't create a long-lived background
thread which pollutes the thread pool and causes subsequent tests
using `os.fork()` to issue the forking-multithreaded-process
`DeprecationWarning`
|
Whew, now that we have no platform-dependent solutions and no poor attempts at synchronization, we should've torn all the wonky stuff out and it should be a lot more stable. Couple things though:
|
…filing) Full review of the --prof-child-procs feature branch plus a broader repo/CI/test-suite audit. Identifies three merge-blocking defects, each reproduced locally: - fork-based children re-dump inherited parent stats, double-counting all pre-fork hits/times in the merged output - a silently-failed child .pth hook deadlocks patched pools via a result-protocol mismatch - unwritable site-packages hard-crashes kernprof instead of degrading Includes a prioritized remediation plan with per-task acceptance criteria, test-suite and packaging/xcookie findings, and pre-existing core bugs found along the way. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… docs Splits the single PR pyutils#431 review into four dated documents: - fable-review-pr431-2026-07-05.md: frozen PR-specific findings/evidence - fable-review-fullrepo-2026-07-05.md: frozen repo-wide findings - fable-pr431-plan-2026-07-05.md: live task board for the PR remediation - fable-fullrepo-plan-2026-07-05.md: live task board for repo-wide work, including the new AST-pipeline robustness program and the .pyi-removal (modern inline annotations) program The plan files are the coordination hub: agents claim tasks on the status board, record outcomes in the agent log, and escalate decisions via open questions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
See: TTsangSC#5 |
This PR adds support for
kernprofto profile code execution in child Python processes, building on ongoing work (see Credits).Usage
The EXPERIMENTAL new flags
--no-prof-child-procsand--prof-child-procs[=...]are added to kernprof. By setting--prof-child-procsto true, child Python processes created by the profiled process are also profiled:1Note how the
sum_worker()calls are profiled:Highlights
os.system()andsubprocess.run()multiprocessing2multiprocessing"start methods" ('fork','forkserver', and'spawn') tested to be compatible, where available on the platform--preimportsor via test-code rewriting) replicated in child processesExplanation
line_profiler._child_process_profiling.cache.LineProfilingCache) is created by the main process, containing session config information (e.g. values for--prod-modand--preimports) so that profiling can be replicated in child processes..pthfile is created; Python processes inheriting the right environment will thus go through profiling setup, while those without the env var (and just happens to share the Python executable) will be minimally affected.os.fork()(where available) is patched with a wrapper which ensures consistent global states.coverage.multiproc, variousmultiprocessingcomponents are patched (line_profiler._child_process_profiling.multiprocessing_patches.apply()) so that child processes can retrieve the cache and report profiling data appropriately. Patches are inherited by forked child processes and reapplied by spawned ones. Extra care is taken in ensuring that profiling is not affected even when the parallel workload errors out.3kernprofthen gather and merge with the profiling result in main process.Code changes
New code (click to expand)
_line_profiler_hook.pyNew module installed along with
line_profilerfor managing the temporary.pthfiles used for setting up profiling in child processes; this module is kept as lightweight as possible to minimize the amount of startup code run as the mere result of having said.pthfile(s) (presumably) in the virtual environment'ssite-packages.load_pth_hook():For processes inheriting a matching "parent PID" from the environment (see
LineProfilingCachebelow), load the cache and set up theLineProfilerinstance used, like how the mainkernprofprocess does.line_profiler/_threading_patches.pyNew submodule patching
threadingfor the consistent gathering of profiling data between tracing modes.apply():When legacy tracing is used (Python < 3.12 or
LINE_PROFILER_CORE=ctrace), patchthreading.Thread.__init__()so that the profiler's.enable_countis synced to the new thread; this is necessary for correctness, ensuring that profiling continues on the new thread.line_profiler/cleanup.pyNew submodule defining the
Cleanupclass, which handles various setup/cleanup tasks like:line_profiler/curated_profiling.pyNew submodule containing mostly relocated code from
kernprof, so that child processes can more easily reestablish profiling:ClassifiedPreimportTargets:Object resolving and classifying the
--prof-mods, and writing a corresponding preimport moduleCuratedProfilerContext:Context manager managing the state of the
LineProfiler, e.g.:line_profiler.profileon startupthreading(see_threading_patchesabove) so that the profiler stays enabled on newly spawned threads.enable_counts on teardownline_profiler/_child_process_profiling/New private subpackage for maintaining the states, setting up the hooks, and performing the patches which makes it possible to profile child processes:
/cache.py::LineProfilingCache:"Session state" object. It:
Can be auto-(de-)serialized in the main and child processes based on env-var values, managing setup (module patches,
.pthtempfile creation, profiler curation, eager pre-imports) and cleanup (tempfile management, dumping and gathering of profiling results) in each process.Injects the following environment variables, which are inherited by child processes:
${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID}: main-process PID${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_<PID>}: location of the cache directoryFrom the combination of both, child processes can retrieve the cache by calling
.load()./multiprocessing_patches/:Sub-sub-package for the patching of
multiprocessingfacilities so that child processes managed thereby are properly profiled. The main entities are:/__init__.py::apply():Select and apply the appropriate patches below to
multiprocessingmodule components./_mandatory_patches.py::PROCESS_SETUP_PATCH:Patch: perform setups specific to
multiprocessing-managed child processes./_mandatory_patches.py::POOL_WORKER_PID_PATCH:Patch: keep track of which
multiprocessing.pooltasks are sent to whichPool-worker child processes, so that we don't expect profiling output from idle workers./_mandatory_patches.py::RebootForkserverPatch:Patch: manage the fork-server process (from which child processes are, well, forked; see
multiprocessing.forkserver), ensuring that (1) when profiling, it is rebooted with the proper profiling tooling which its forked children can inherit, and (2) when profiling ends, it is rebooted so that said tooling doesn't leak into future child processes./_mandatory_patches.py::ResourceTrackerPatch:Patch: make sure that we are aware of the resource-tracker server process (
multiprocessing.resource_tracker) and don't expect profiling output therefrom./_mandatory_patches.py::RunpyPatch:Patch the copy of
runpythatmultiprocessing.spawnuses; necessary for profiling to function in non-eager-preimports mode (--no-preimports)./_profiling_patches.py::POOL_PATCH:Patch
multiprocessing.poolso thatPool-worker child processes dump profiling data after every task received,3 regardless of whether the parallel workload succeeded or errored out./_profiling_patches.py::PROCESS_PATCH:Patch
multiprocessing.process.BaseProcessso that child processes dump profiling data after the parallel workload (Process(target=...)) is executed./_optional_patches.py::LOGGING_PATCH:Patch the various logging functions in
multiprocessing.util(e.g.multiprocessing.util.info()) so that their log entries are also visible in theLineProfilingCachedebug logs./runpy_patches.py::create_runpy_wrapper():Make a clone of the
runpymodule which checks if the code executed is the code to be profiled; if so, it goes through the same code-rewriting facilities thatline_profiler.autoprofile.autoprofile.run()uses to set up profiling. (SeeRunpyPatchabove.)tests/test_child_procs/Refactored and greatly extended into a package from the previous
tests/test_child_procs.py; the tests proper are now found attests/test_child_procs/test_child_procs.py, while the example scripts/modules to be profiled have been moved to their own respective files intests/test_child_procs/multiproc_examples/./_test_child_procs_utils.py:Define various utilities, e.g.:
ModuleFixture:Helper object to be created as
pytestfixtures which encapsulates the/multiproc_examples/scripts, making them temporarily available as modules in the current and/or child processes.Params:Helper object extending
@pytest.mark.parametrize()which handles concatenation and Cartesian products of parametrizations.CheckWarnings:Helper object analogous to
@pytest.warns()allowing for easy checks for/against certain warnings issued.@add_timeout:Decorator for isolating a function in a new thread so that it can be timed out.
run_subproc():Wrapper around
subprocess.run()which provide extra debugging output (standard streams, timing info, etc.)/multiproc_examples/:The example scripts to be profiled.
/pool_test_module.py,/external_module.py:The existing example script (and the module it imports), where
multiprocessing.pool.Poolis used for parallelism./process_test_module.py:New example script mirroring
/pool_test_module.pywhich instead directly manages instances ofmultiprocessing.process.BaseProcessand communication of parallel workloads and results therewith./conftest.py:Set up
_test_child_procs_utils.py::ModuleFixturefixtures for the/multiproc_examples/scripts so that they can be imported as modules in tests./test_child_procs.pyline_profiler._child_processing_profilingcomponents, or as close as is possible thereto:test_runpy_patches():Test the functionality of
~.runpy_patches.create_runpy_wrapper().test_cache_dump_load():Test the functionalities of
~.cache.LineProfilingCache.dump()and.load().test_cache_setup_main_process():Test the functionality of
~.cache.LineProfilingCache._setup_in_main_process().test_cache_setup_child():Test the functionality of
~.cache.LineProfilingCache._setup_in_child_process().test_load_pth_hook():Test the functionality of
~.pth_hook.load_pth_hook().test_apply_mp_patches_{success,failure}():Test the functionality of
~.multiprocessing_patches.apply().test_profiling_multiproc_script_{success,failure}():"Main" new tests for running the test scripts with
kernprof --prof-child-procs; heavily parametrized to check for profiling-result correctness in different contexts:success|failure: whether the parallel workload errors outrun_func: execution modes (kernprof <script>,kernprof -m <module>, andkernprof -c <code>)test_module: whether to runpool_test_module.py(testingmultiprocessing.pool) orprocess_test_module.py(testingmultiprocessing.process)prof_child_procs: whether to use child-process profiling (--[no-]prof-child-procs)preimports: eager vs. on-import profiling (--[no-]preimports)use_local_func: whether the parallel workload is locally defined in the executed code or imported from external modulesstart_method:multiprocessing"start methods" ('fork','forkserver', and'spawn')test_profiling_bare_python():New test for profiling child processes where the code run by
kernprof --prof-child-procsspins up another Python process via non-multiprocessingmeans (e.g.os.system()orsubprocess.run()).Modified code (fixes; click to expand)
pyproject.toml::[tool.ty.terminal]Now explicitly setting
error-on-warningto false because the default behavior changed inty v0.0.52. (Note that this conflicts with identical change made in #434.)line_profiler/line_profiler.py::LineStatsFixed doctest in multiple methods (
.__eq__(),.__add__(),.__iadd__(),.from_stats_objects()) which may give the wrong impression of the layout of the.timings; the dict keys are supposed to be of the layout(filename: str, lineno: int, func_name: str), not(func_name, lineno, filename).line_profiler/toml_config.py::ConfigSource.get_subconfig()Fixed bug where the subtable is not deep-copied even with
copy=True.kernprof.py::_prepare_profiler()Fixed bug in the pre-refactor
_pre_profile()wheresys.argvis replaced with another list, preventing the@_restore.sequence(sys.argv)decorator from correctly restoring thesys.argventries; also see the section below.Modified code (others; click to expand)
.github/workflows/tests.ymlAdded timeouts to job stages where
pytestis invoked, so that ifmultiprocessingcauses any of the new tests to deadlock the whole pipeline doesn't get stuck in limbo for hours.Build sdist -> Test full loose dist: 10 minutes<OS>, arch=<ARCH> -> Build binary wheels: 60 minutes<PYTHON_VERS> on <OS>, arch=<ARCH> with <EXTRA> -> Test wheel <EXTRA>: 10 minutessetup.py_line_profiler_hooksto the installedpy_modules.line_profiler._multiproc_patches, where the patches living in variousline_profiler._child_process_profiling.multiprocessing_patchessubmodules are centrally declared, so thatline_profiler._child_process_profiling.multiprocessing_patches.apply()can consistently find and apply all of them as needed.line_profiler/line_profiler.py::LineStats.get_empty_instance():New convenience class method for creating an instance with no profiling data and the platform-appropriate
.unit..from_files():Added new parameters
on_defective | on_empty: Literal['ignore', 'warn', 'error'], allowing for passing over bad (empty/malformed) files with optional warnings. The old behavior (on_defective=on_empty='error') remains the default.line_profiler/line_profiler_utils.pyAdded new utilties:
CallbackRepr:reprlib.Reprsubclass helpful for formatting calls (via its.format_call()method) and adjacent objects (e.g.functools.partial).block_indent():Block-indent a multi-line string to sit flush with a prefix.
make_tempfile():Convenience wrapper around
tempfile.mkstemp()which returns apathlib.Path.line_profiler/rc/line_profiler.toml[tool.line_profiler.kernprof]:New key-value pair
prof-child-procs = <bool>for the default of thekernprof --[no-]prof-child-procsflag.[tool.line_profiler.child_processes]:New key-value pairs for controlling the setup of profiling in child processes:
pth_files = { prefix = <str>, suffix = <str>}:Instructions on how the temporary .pth files responsible for setting up shop in child processes are created.
multiprocessing = { patches = { pool = <bool>, process = <bool>, logging = <bool>} }:Toggles for which of the
multiprocessingpatches to apply.The
child_processestable and its contents are as of yet considered private implementation details.kernprof.py_add_core_parser_arguments():Now adding these flags to the parser:
--prof-child-procs[=...],--no-prof-child-procs:Whether to use the feature implemented in this PR; the default is read from
[tool.line_profiler.kernprof]::prof-child-procs.--debug-log=...:Undocumented (private) flag for writing out the
LineProfilingCachedebug logs._write_preimports():Refactored to use the new/relocated facilities at
line_profiler.curated_profiling._dump_filtered_stats():extra_line_stats: LineStats | Noneallows for handling and combining the profiling stats gathered elsewhere (e.g. child processes)._dump_filtered_line_stats()which it now calls._manage_profiler:Context manager refactored from the old
_pre_profile()for more Pythonic handling of setups and teardowns._prepare_child_profiling_cache()._prepare_profiler(),_prepare_exec_script())._post_profile()on context exit so that we no longer have to explicitlytry: ... finally: ...in_main_profile()._post_profile():extra_line_stats: LineStats | Noneallows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).line_profiler.curated_profiling.tests/test_child_procs/test_child_procs.pyMass-relocated code which isn't directly the test functions into other locations in the package (see New Code above). Plus the following changes (among others):
/pool_test_module.py:Added the following command-line flags:
--start-methodselects a specificmultiprocessing"start method", includingdummywhich usesmultiprocessing.dummy.--localtoggles between using a sum function defined locally in/pool_test_module.pyor the one defined externally in/external_module.py.--force-failuretoggles whether the sum function should return normally or raise an error./conftest.py::pool_test_module:Supersede the previous
test_module./_test_child_procs_utils.py::ModuleFixture, allowing for easy setup as a temporarily import-able module in the current and/or child processes (see above).pool_test_module_clone(same source code but separate module) andprocess_test_module(same CLI but different model of parallelism)/_test_child_procs_utils.py::_run_as_{script,module}():_run_as_literal_code()to also testkernprof -c ....test_moduleas aModuleFixtureinstead of a path, and handling its installation./_test_child_procs_utils.py::_run_test_module():run_module = partial(_run_test_module, _run_as_module), etc. now available for more convenient testing ofkernprofexecution modes as test parametrization.profiled_code_is_tempfile: boolhelps with constructing thekernprofcommand line in cases where the code is anonymous (kernprof -c ...).use_local_func: bool,fail: bool, andstart_method: Literal['fork', 'forkserver', 'spawn'] | Noneallows for fuzzing code execution with the aforementionedtest_moduleCLI flags (resp.--local,--force-failure, and--start-method).nhits: dict[str, int] | None, when provided, checks that the line-hit stats are as expected (all calls traced with--prof-child-procs, only those in the main process without).subproc: boolfor toggling whether to run the test module in-process or in a subprocess.failis true, thekernprofsubprocess should fail..pthfiles created bykernprof --prof-child-procsshould be cleaned up.nhits(where available).subprocis false, that certain warnings aren't issued (e.g.LineStatswarning that we're trying to read profiling stats from an empty file), paralleling the checks in/test_child_procs.py::test_apply_mp_patches_{success,failure}().LineProfilingCachedebug logs and printing them for debug purposes./test_child_procs.py::test_multiproc_script_sanity_check():/process_test_module.py.use_local_func,fail, andstart_method, to ensure that the test scripts are fully functional in vanilla Python.as_module: boolwithrun_func: Callable[..., CompletedProcess], allowing for more flexible testing of execution modes (python ...,python -m ..., and the newpython -cadded via the aforementioned_run_as_literal_code())./test_child_procs.py::test_running_multiproc_script():New parametrization
run_funcallows for absorbing the oldtest_running_multiproc_module()into the same test as additional parametrization, as well as testingkernprof -c.Caveats
.pthfile created is course benign and as mentioned tries to be as out of the way as possible, but I just figured that the use of.pthfiles should be called out, given their recent spotlight in a CVE vulnerability..pthfile is written tosys.get_path('purelib'), it depends on said directory being writable. If we aren't in a venv or a similarly isolated environment (which is increasingly unlikely nowadays), all processes using the same Python will have to import and run_line_profiler_hooks.load_pth_hook(). While the function itself should quit rather quickly when we're not in a child process, and without causing any ofline_profilerto be loaded intosys.modules, it does still incur a certain overhead for interpreter boot-up.What didn't work
multiprocessing-managed child processes have the usualatexithook which dumps profiling data disabled. On top of that themultiprocessingpatch "pool" (~._child_process_profiling.multiprocessing_patches._profilling_patches.py::POOL_PATCH) also disables the dumping of profiling data at the end ofmultiprocessing.process.BaseProcess._bootstrap()patched in by the patch "process" ([...]::PROCESS_PATCH). This was the conclusion after much head-scratching, where processes terminated (presumably) in the middle of aLineStats.write()call results in corrupted/incomplete data.SIGTERMthatBaseProcess.terminate()sends to the child and only dumped the stats then and there (along with normal end-of-callback stats-dumping), which sounds much more efficient than dumping stats for every task submitted to aPool. However:SIGTERMon Windows is generally noted to be inconsistent. See e.g. thiscoverage.control.Coverage._init_for_start()comment and this SO discussion that it refers to.multiprocessing-managed children is known to be messy and causes occasional deadlocks; a lesson I found out way too late and after much wailing and gnashing of teeth. See e.g. Deadlock in multiprocessing.pool.Pool on terminate python/cpython#73945, Setting a signal handler gets multiprocessing.Pool stuck python/cpython#82408, and Process hang with Coverage 6.3 coveragepy/coveragepy#1310. Guess that's whycoveragemade theirSIGTERMhandler opt-in after originally having that on by default.multiprocessing.poolis so gung-ho in killing off worker processes and replacing them, without much regard of what actually went down down there.3TODO
line_profiler._child_process_profilingto become public API?Credits
pytest-autoprofile, which in turn was based on the solution implemented incoverage.multiproc. And the longer I've worked on this, the more the similarities seem to have diminished ...LineStatsaggregation #380kernprofto always run code insys.modules['__main__']'s namespace #423Notes
Welp. This took way longer than I expected. (EDIT: I think I wrote this sentence like two months ago.) The main friction points were that:
line_profiler._child_process_profiling.cache.LineProfilingCacheclass tackles this issue.pytest-autoprofile::tests/test_subprocess.py::_test_inner()), but apparently I only made the tests fail there, not the parallel functions themselves. Figuring out how to do so consistently took the better part of these months.Footnotes
Note however that the equivalent vanilla Python command (
python -c ...) would error out, because functions sent tomultiprocessingmust be pickle-able and thus must reside in a physical file. This is sidestepped bykernprof's always writing code received bykernprof -c ...and... | kernprof -to a tempfile (ENH: auto-profilestdinor literal snippets #338). ↩In the test suite, process creation is tested both with the most common
multiprocessing[.get_context(...)].Pooland with self-managedProcessobjects. Different patches tomultiprocessingare responsible for profiling-data collection between the two situations, and they have been tested to work both (1) individually on their respective use-cases and (2) together without stepping on one another. Seetests/test_child_procs/test_child_procs.py::test_profiling_multiproc_script_{success,failure}(). ↩Since
multiprocessing.pool.Pool-managed child processes ("workers") are regularlyand wantonlyterminated, which bypasses Python control flow and prevents e.g.atexithooks from executing, we've taken to report profiling data fromPoolworkers on a per-task basis. ↩ ↩2 ↩3