FEAT: extend profiling to child processes #431

Open
TTsangSC wants to merge 148 commits into pyutils:main from TTsangSC:profile-child-processes

@TTsangSC TTsangSC commented Apr 14, 2026

Collaborator

This PR adds support for kernprof to profile code execution in child Python processes, building on ongoing work (see Credits).

Usage

The EXPERIMENTAL new flags --no-prof-child-procs and --prof-child-procs[=...] are added to kernprof. By setting --prof-child-procs to true, child Python processes created by the profiled process are also profiled:1

$ kernprof -lv --prof-child-procs -c "if True:
    import itertools
    import multiprocessing
    from collections.abc import Collection

    def sum_worker(nums: Collection[int]) -> int:
        result = 0
        for x in nums:
            result += x
        return result

    def sum_parallel(nums: Collection[int], nprocs: int) -> int:
        size_ = len(nums) / nprocs
        size = int(size_)
        if size_ > size:
            size += 1
        with multiprocessing.Pool(nprocs) as pool:
            sub_sums = pool.map(sum_worker, itertools.batched(nums, size))  # 3.12+
            pool.close()
            pool.join()
        return sum_worker(sub_sums)

    if __name__ == '__main__':
        print(sum_parallel(range(1, 1001), 3))"
500500
Wrote profile results to 'kernprof-command-<...>.lprof'
Timer unit: 1e-06 s

Total time: 0.000312 s
File: <...>/kernprof-command.py
Function: sum_worker at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     6                                               def sum_worker(nums: Collection[int]) -> int:
     7         4          3.0      0.8      1.0          result = 0
     8      1007        155.0      0.2     49.7          for x in nums:
     9      1003        153.0      0.2     49.0              result += x
    10         4          1.0      0.2      0.3          return result

Total time: 0.100223 s
File: <...>/kernprof-command.py
Function: sum_parallel at line 12

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                               def sum_parallel(nums: Collection[int], nprocs: int) -> int:
    13         1          1.0      1.0      0.0          size_ = len(nums) / nprocs
    14         1          1.0      1.0      0.0          size = int(size_)
    15         1          0.0      0.0      0.0          if size_ > size:
    16         1          0.0      0.0      0.0              size += 1
    17         2      21685.0  10842.5     21.6          with multiprocessing.Pool(nprocs) as pool:
    18         1      68692.0  68692.0     68.5              sub_sums = pool.map(sum_worker, itertools.batched(nums, size))  # 3.12+
    19         1         27.0     27.0      0.0              pool.close()
    20         1       9800.0   9800.0      9.8              pool.join()
    21         1         17.0     17.0      0.0          return sum_worker(sub_sums)

Note how the sum_worker() calls are profiled:

  • The main process contributes 1 call and 3 loops summing the sub-sums.
  • The 3 child processes each contribute 1 call, and together they loop over all 1000 items.
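
The hit counts in the `sum_worker()` table can be cross-checked by hand. Here is a minimal sketch that recomputes them without any profiler, assuming (as in the example) that the 1000 items are split into chunks of `ceil(1000 / 3) = 334` and that a `for` line is hit once per item plus once more when the iterator is exhausted. `expected_hits` is a name made up for this illustration.

```python
import math


def expected_hits(n_items: int, nprocs: int) -> dict[str, int]:
    """Predict line hits for sum_worker() across the children and the main process."""
    size = math.ceil(n_items / nprocs)
    # Chunk lengths as produced by batching n_items into groups of `size`.
    chunks = [min(size, n_items - start) for start in range(0, n_items, size)]
    # One call per chunk (children), plus one call in the main process
    # that sums the sub-sums (one item per chunk).
    calls = [*chunks, len(chunks)]
    return {
        "calls": len(calls),                      # `result = 0` / `return result`
        "for_line": sum(n + 1 for n in calls),    # `for x in nums:`
        "add_line": sum(calls),                   # `result += x`
    }


hits = expected_hits(1000, 3)
print(hits)
```

The three values correspond to the 4, 1007, and 1003 hits reported for lines 7, 8, and 9 above.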

Highlights

  • Children created by (including but not limited to) these methods can be profiled:
    • os.system() and subprocess.run()
    • multiprocessing2
  • All three multiprocessing "start methods" ('fork', 'forkserver', and 'spawn') tested to be compatible, where available on the platform
  • Profiling unaffected by whether the profiled function run in child processes:
    • Is locally defined in the profiled code or imported
    • Executes cleanly or errors out
  • Mode of profiling (with eager --preimports or via test-code rewriting) replicated in child processes

Explanation

  • A serializable cache object (line_profiler._child_process_profiling.cache.LineProfilingCache) is created by the main process, containing session config information (e.g. values for --prof-mod and --preimports) so that profiling can be replicated in child processes.
  • In the main process, environment variables are injected, so that it and its children would have access to its PID and the cache-directory location.
  • A temporary .pth file is created; Python processes inheriting the right environment will thus go through profiling setup, while those without the env var (which just happen to share the Python executable) will be minimally affected.
  • os.fork() (where available) is patched with a wrapper which ensures consistent global states.
  • As with coverage.multiproc, various multiprocessing components are patched (line_profiler._child_process_profiling.multiprocessing_patches.apply()) so that child processes can retrieve the cache and report profiling data appropriately. Patches are inherited by forked child processes and reapplied by spawned ones. Extra care is taken in ensuring that profiling is not affected even when the parallel workload errors out.3
  • When properly set up, child processes write profiling output on exit to the session cache directory, which kernprof then gathers and merges with the profiling results from the main process.
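
To illustrate the .pth mechanism in isolation: lines in a .pth file that start with `import` are executed when `site` processes the directory, so a hook can be gated on an environment variable and do nothing otherwise. Below is a rough sketch of that idea only, not the hook shipped in this PR. The variable names `EXAMPLE_PARENT_PID` and `EXAMPLE_HOOK_RAN`, the file name, and both helper functions are assumptions for the demo.

```python
import os
import site
import sys
import tempfile
from pathlib import Path

GATE_VAR = "EXAMPLE_PARENT_PID"  # hypothetical: set only in profiled sessions
FLAG_VAR = "EXAMPLE_HOOK_RAN"    # hypothetical: marker the hook leaves behind


def write_pth(directory: Path) -> Path:
    """Write a one-line .pth file whose hook only acts when GATE_VAR is set."""
    hook = (
        "import os; "
        f"os.environ.get({GATE_VAR!r}) and "
        f"os.environ.__setitem__({FLAG_VAR!r}, 'yes')\n"
    )
    path = directory / "example_hook.pth"
    path.write_text(hook)
    return path


def run_pth_hooks(gate_value):
    """Process the .pth file the way `site` does; report whether the hook acted."""
    os.environ.pop(FLAG_VAR, None)
    if gate_value is None:
        os.environ.pop(GATE_VAR, None)
    else:
        os.environ[GATE_VAR] = gate_value
    with tempfile.TemporaryDirectory() as tmp:
        write_pth(Path(tmp))
        site.addsitedir(tmp)  # executes the `import ...` line in the .pth file
        if tmp in sys.path:   # undo the sys.path side effect of addsitedir()
            sys.path.remove(tmp)
    ran = FLAG_VAR in os.environ
    os.environ.pop(FLAG_VAR, None)
    os.environ.pop(GATE_VAR, None)
    return ran


print(run_pth_hooks("1234"), run_pth_hooks(None))
```

With the gate variable absent, the hook line still runs but only pays for one dictionary lookup, which is the "minimally affected" case described above.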

Code changes

New code (click to expand)

_line_profiler_hook.py

New module installed along with line_profiler for managing the temporary .pth files used for setting up profiling in child processes; this module is kept as lightweight as possible to minimize the amount of startup code run as the mere result of having said .pth file(s) (presumably) in the virtual environment's site-packages.

  • load_pth_hook():
    For processes inheriting a matching "parent PID" from the environment (see LineProfilingCache below), load the cache and set up the LineProfiler instance used, like how the main kernprof process does.

line_profiler/_threading_patches.py

New submodule patching threading for the consistent gathering of profiling data between tracing modes.

  • apply():
    When legacy tracing is used (Python < 3.12 or LINE_PROFILER_CORE=ctrace), patch threading.Thread.__init__() so that the profiler's .enable_count is synced to the new thread; this is necessary for correctness, ensuring that profiling continues on the new thread.

line_profiler/cleanup.py

New submodule defining the Cleanup class, which handles various setup/cleanup tasks like:

  • Registering/Calling callbacks
  • Creation/Deletion of tempfiles
  • Insertion/Reversion of environment variables
  • Patching/Restoration of object attributes
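
For a feel of what such a class has to do, here is a minimal sketch built on `contextlib.ExitStack`, which runs registered callbacks in reverse order. `MiniCleanup` and its methods are hypothetical and much smaller than the `Cleanup` class in the PR; the sketch only covers environment variables and attribute patches.

```python
import os
from contextlib import ExitStack


class MiniCleanup:
    """Hypothetical stand-in: undo env-var and attribute changes in reverse order."""

    def __init__(self):
        self._stack = ExitStack()

    def set_env(self, name, value):
        old = os.environ.get(name)
        os.environ[name] = value
        if old is None:
            # The variable did not exist before: remove it on cleanup.
            self._stack.callback(os.environ.pop, name, None)
        else:
            # The variable existed: put the old value back on cleanup.
            self._stack.callback(os.environ.__setitem__, name, old)

    def patch(self, obj, attr, value):
        old = getattr(obj, attr)
        setattr(obj, attr, value)
        self._stack.callback(setattr, obj, attr, old)

    def cleanup(self):
        self._stack.close()  # runs callbacks last-in, first-out
```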

line_profiler/curated_profiling.py

New submodule containing mostly relocated code from kernprof, so that child processes can more easily reestablish profiling:

  • ClassifiedPreimportTargets:
    Object resolving and classifying the --prof-mods, and writing a corresponding preimport module
  • CuratedProfilerContext:
    Context manager managing the state of the LineProfiler, e.g.:
    • Slipping it into line_profiler.profile on startup
    • Patching threading (see _threading_patches above) so that the profiler stays enabled on newly spawned threads
    • Purging its .enable_counts on teardown

line_profiler/_child_process_profiling/

New private subpackage for maintaining the states, setting up the hooks, and performing the patches which make it possible to profile child processes:

  • /cache.py::LineProfilingCache:
    "Session state" object. It:
    • Can be auto-(de-)serialized in the main and child processes based on env-var values, managing setup (module patches, .pth tempfile creation, profiler curation, eager pre-imports) and cleanup (tempfile management, dumping and gathering of profiling results) in each process.

    • Injects the following environment variables, which are inherited by child processes:

      • ${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID}: main-process PID
      • ${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_<PID>}: location of the cache directory

      From the combination of both, child processes can retrieve the cache by calling .load().

  • /multiprocessing_patches/:
    Sub-sub-package for the patching of multiprocessing facilities so that child processes managed thereby are properly profiled. The main entities are:
    • /__init__.py::apply():
      Select and apply the appropriate patches below to multiprocessing module components.
    • /_mandatory_patches.py::PROCESS_SETUP_PATCH:
      Patch: perform setups specific to multiprocessing-managed child processes.
    • /_mandatory_patches.py::POOL_WORKER_PID_PATCH:
      Patch: keep track of which multiprocessing.pool tasks are sent to which Pool-worker child processes, so that we don't expect profiling output from idle workers.
    • /_mandatory_patches.py::RebootForkserverPatch:
      Patch: manage the fork-server process (from which child processes are, well, forked; see multiprocessing.forkserver), ensuring that (1) when profiling, it is rebooted with the proper profiling tooling which its forked children can inherit, and (2) when profiling ends, it is rebooted so that said tooling doesn't leak into future child processes.
    • /_mandatory_patches.py::ResourceTrackerPatch:
      Patch: make sure that we are aware of the resource-tracker server process (multiprocessing.resource_tracker) and don't expect profiling output therefrom.
    • /_mandatory_patches.py::RunpyPatch:
      Patch the copy of runpy that multiprocessing.spawn uses; necessary for profiling to function in non-eager-preimports mode (--no-preimports).
    • /_profiling_patches.py::POOL_PATCH:
      Patch multiprocessing.pool so that Pool-worker child processes dump profiling data after every task received,3 regardless of whether the parallel workload succeeded or errored out.
    • /_profiling_patches.py::PROCESS_PATCH:
      Patch multiprocessing.process.BaseProcess so that child processes dump profiling data after the parallel workload (Process(target=...)) is executed.
    • /_optional_patches.py::LOGGING_PATCH:
      Patch the various logging functions in multiprocessing.util (e.g. multiprocessing.util.info()) so that their log entries are also visible in the LineProfilingCache debug logs.
  • /runpy_patches.py::create_runpy_wrapper():
    Make a clone of the runpy module which checks if the code executed is the code to be profiled; if so, it goes through the same code-rewriting facilities that line_profiler.autoprofile.autoprofile.run() uses to set up profiling. (See RunpyPatch above.)
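
The two-step environment lookup that `LineProfilingCache` relies on (PID first, then the PID-specific directory variable) can be sketched as follows. The variable names are the ones listed above; `find_cache_dir` is a hypothetical helper, and how `.load()` actually performs the lookup is not shown here.

```python
import os
from pathlib import Path

PID_VAR = "LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID"
DIR_VAR_TEMPLATE = "LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_{pid}"


def find_cache_dir(environ=None):
    """Return the session cache directory, or None outside a profiled session."""
    environ = os.environ if environ is None else environ
    pid = environ.get(PID_VAR)
    if pid is None:
        return None  # not a descendant of a profiling main process
    cache_dir = environ.get(DIR_VAR_TEMPLATE.format(pid=pid))
    return Path(cache_dir) if cache_dir else None
```

Keying the directory variable on the PID means a stale directory variable from another session is ignored unless the PID variable points at it.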

tests/test_child_procs/

Refactored and greatly extended into a package from the previous tests/test_child_procs.py; the tests proper are now found at tests/test_child_procs/test_child_procs.py, while the example scripts/modules to be profiled have been moved to their own respective files in tests/test_child_procs/multiproc_examples/.

  • /_test_child_procs_utils.py:
    Define various utilities, e.g.:
    • ModuleFixture:
      Helper object to be created as pytest fixtures which encapsulates the /multiproc_examples/ scripts, making them temporarily available as modules in the current and/or child processes.
    • Params:
      Helper object extending @pytest.mark.parametrize() which handles concatenation and Cartesian products of parametrizations.
    • CheckWarnings:
      Helper object analogous to @pytest.warns() allowing for easy checks for/against certain warnings issued.
    • @add_timeout:
      Decorator for isolating a function in a new thread so that it can be timed out.
    • run_subproc():
      Wrapper around subprocess.run() which provides extra debugging output (standard streams, timing info, etc.)
  • /multiproc_examples/:
    The example scripts to be profiled.
    • /pool_test_module.py, /external_module.py:
      The existing example script (and the module it imports), where multiprocessing.pool.Pool is used for parallelism.
    • /process_test_module.py:
      New example script mirroring /pool_test_module.py which instead directly manages instances of multiprocessing.process.BaseProcess and communication of parallel workloads and results therewith.
  • /conftest.py:
    Set up _test_child_procs_utils.py::ModuleFixture fixtures for the /multiproc_examples/ scripts so that they can be imported as modules in tests.
  • /test_child_procs.py
    • Added "unit tests" for the line_profiler._child_process_profiling components, or as close as is possible thereto:
      • test_runpy_patches():
        Test the functionality of ~.runpy_patches.create_runpy_wrapper().
      • test_cache_dump_load():
        Test the functionalities of ~.cache.LineProfilingCache.dump() and .load().
      • test_cache_setup_main_process():
        Test the functionality of ~.cache.LineProfilingCache._setup_in_main_process().
      • test_cache_setup_child():
        Test the functionality of ~.cache.LineProfilingCache._setup_in_child_process().
      • test_load_pth_hook():
        Test the functionality of ~.pth_hook.load_pth_hook().
      • test_apply_mp_patches_{success,failure}():
        Test the functionality of ~.multiprocessing_patches.apply().
    • Other new tests:
      • test_profiling_multiproc_script_{success,failure}():
        "Main" new tests for running the test scripts with kernprof --prof-child-procs; heavily parametrized to check for profiling-result correctness in different contexts:
        • success|failure: whether the parallel workload errors out
        • run_func: execution modes (kernprof <script>, kernprof -m <module>, and kernprof -c <code>)
        • test_module: whether to run pool_test_module.py (testing multiprocessing.pool) or process_test_module.py (testing multiprocessing.process)
        • prof_child_procs: whether to use child-process profiling (--[no-]prof-child-procs)
        • preimports: eager vs. on-import profiling (--[no-]preimports)
        • use_local_func: whether the parallel workload is locally defined in the executed code or imported from external modules
        • start_method: multiprocessing "start methods" ('fork', 'forkserver', and 'spawn')
      • test_profiling_bare_python():
        New test for profiling child processes where the code run by kernprof --prof-child-procs spins up another Python process via non-multiprocessing means (e.g. os.system() or subprocess.run()).
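
As an aside on the `@add_timeout` helper listed among the test utilities above: the general technique of running a function in a separate thread so that the caller can give up on it looks roughly like this. `timeout_in_thread` is a hypothetical name and signature, not the decorator from the test suite.

```python
import functools
import threading
import time


def timeout_in_thread(seconds: float):
    """Run the decorated function in a daemon thread; raise TimeoutError if it overruns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = {}

            def target():
                try:
                    outcome["value"] = func(*args, **kwargs)
                except BaseException as exc:  # re-raised in the calling thread
                    outcome["error"] = exc

            thread = threading.Thread(target=target, daemon=True)
            thread.start()
            thread.join(seconds)
            if thread.is_alive():
                # The worker keeps running in the background; we just stop waiting.
                raise TimeoutError(f"{func.__name__} exceeded {seconds} s")
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]
        return wrapper
    return decorator


@timeout_in_thread(1.0)
def quick():
    return 42


@timeout_in_thread(0.05)
def slow():
    time.sleep(0.5)
```

Note that the timed-out function is abandoned rather than killed, which is usually acceptable in a test that is about to fail anyway.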

Modified code (fixes; click to expand)

pyproject.toml::[tool.ty.terminal]

Now explicitly setting error-on-warning to false because the default behavior changed in ty v0.0.52. (Note that this conflicts with the identical change made in #434.)

line_profiler/line_profiler.py::LineStats

Fixed doctest in multiple methods (.__eq__(), .__add__(), .__iadd__(), .from_stats_objects()) which may give the wrong impression of the layout of the .timings; the dict keys are supposed to be of the layout (filename: str, lineno: int, func_name: str), not (func_name, lineno, filename).
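
For reference, the key layout in question looks like this with plain dicts. The sketch merges two timings mappings the way combining main-process and child-process stats would require. The per-line entry layout `(lineno, nhits, time)` is my understanding of `.timings`, and `merge_timings` plus all the numbers are made up for illustration.

```python
from collections import defaultdict

# Key layout: (filename: str, lineno: int, func_name: str)
KEY = ("kernprof-command.py", 6, "sum_worker")

main_timings = {KEY: [(7, 1, 1), (8, 4, 2), (9, 3, 1), (10, 1, 0)]}
child_timings = {KEY: [(7, 1, 1), (8, 335, 50), (9, 334, 50), (10, 1, 0)]}


def merge_timings(*all_timings):
    """Add up hits and times line by line for matching (filename, lineno, func_name) keys."""
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for timings in all_timings:
        for key, entries in timings.items():
            for lineno, nhits, time in entries:
                slot = totals[key][lineno]
                slot[0] += nhits
                slot[1] += time
    return {
        key: [(lineno, nhits, time) for lineno, (nhits, time) in sorted(lines.items())]
        for key, lines in totals.items()
    }


merged = merge_timings(main_timings, child_timings)
print(merged[KEY])
```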

line_profiler/toml_config.py::ConfigSource.get_subconfig()

Fixed bug where the subtable is not deep-copied even with copy=True.
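
The class of bug is easy to reproduce with nested dicts: a shallow copy shares the inner tables with the original, so edits to the "copy" leak back. This is a generic illustration with `copy.copy()` and `copy.deepcopy()`, not the `ConfigSource` code itself.

```python
import copy

config = {"child_processes": {"multiprocessing": {"patches": {"pool": True}}}}

# Shallow copy: the nested "patches" table is the same object as in `config`.
shallow = copy.copy(config["child_processes"])
shallow["multiprocessing"]["patches"]["pool"] = False
leaked = config["child_processes"]["multiprocessing"]["patches"]["pool"]

# Reset, then repeat with a deep copy: the original stays untouched.
config["child_processes"]["multiprocessing"]["patches"]["pool"] = True
deep = copy.deepcopy(config["child_processes"])
deep["multiprocessing"]["patches"]["pool"] = False
isolated = config["child_processes"]["multiprocessing"]["patches"]["pool"]

print(leaked, isolated)  # False True
```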

kernprof.py::_prepare_profiler()

Fixed bug in the pre-refactor _pre_profile() where sys.argv is replaced with another list, preventing the @_restore.sequence(sys.argv) decorator from correctly restoring the sys.argv entries; also see the section below.
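
The failure mode can be shown with a plain list. The sketch assumes that the restoring decorator snapshots the list object it is handed and later writes the snapshot back into that same object; if the attribute is rebound to a new list in the meantime, the restoration lands on the orphaned old list. `run_and_restore` and `fake_sys` are stand-ins invented for this demo.

```python
import types


def run_and_restore(seq, body):
    """Snapshot `seq`, run `body`, then restore the snapshot into the same list object."""
    saved = list(seq)
    try:
        body()
    finally:
        seq[:] = saved


fake_sys = types.SimpleNamespace(argv=["kernprof", "script.py"])  # stand-in for `sys`


def rebind():
    fake_sys.argv = ["script.py"]     # new list object: the restorer never sees it


def mutate_in_place():
    fake_sys.argv[:] = ["script.py"]  # same list object: the restorer can undo it


run_and_restore(fake_sys.argv, rebind)
after_rebind = list(fake_sys.argv)    # still the replaced entries

fake_sys.argv = ["kernprof", "script.py"]
run_and_restore(fake_sys.argv, mutate_in_place)
after_mutation = list(fake_sys.argv)  # original entries are back

print(after_rebind, after_mutation)
```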

Modified code (others; click to expand)

.github/workflows/tests.yml

Added timeouts to job stages where pytest is invoked, so that if multiprocessing causes any of the new tests to deadlock the whole pipeline doesn't get stuck in limbo for hours.

  • Build sdist -> Test full loose dist: 10 minutes
  • <OS>, arch=<ARCH> -> Build binary wheels: 60 minutes
  • <PYTHON_VERS> on <OS>, arch=<ARCH> with <EXTRA> -> Test wheel <EXTRA>: 10 minutes

setup.py

  • Added _line_profiler_hooks to the installed py_modules.
  • Added a new entry point line_profiler._multiproc_patches, where the patches living in various line_profiler._child_process_profiling.multiprocessing_patches submodules are centrally declared, so that line_profiler._child_process_profiling.multiprocessing_patches.apply() can consistently find and apply all of them as needed.

line_profiler/line_profiler.py::LineStats

  • .get_empty_instance():
    New convenience class method for creating an instance with no profiling data and the platform-appropriate .unit.
  • .from_files():
    Added new parameters on_defective | on_empty: Literal['ignore', 'warn', 'error'], allowing for passing over bad (empty/malformed) files with optional warnings. The old behavior (on_defective=on_empty='error') remains the default.

line_profiler/line_profiler_utils.py

Added new utilities:

  • CallbackRepr:
    reprlib.Repr subclass helpful for formatting calls (via its .format_call() method) and adjacent objects (e.g. functools.partial).
  • block_indent():
    Block-indent a multi-line string to sit flush with a prefix.
  • make_tempfile():
    Convenience wrapper around tempfile.mkstemp() which returns a pathlib.Path.

line_profiler/rc/line_profiler.toml

  • [tool.line_profiler.kernprof]:
    New key-value pair prof-child-procs = <bool> for the default of the kernprof --[no-]prof-child-procs flag.

  • [tool.line_profiler.child_processes]:
    New key-value pairs for controlling the setup of profiling in child processes:

    • pth_files = { prefix = <str>, suffix = <str>}:
      Instructions on how the temporary .pth files responsible for setting up shop in child processes are created.
    • multiprocessing = { patches = { pool = <bool>, process = <bool>, logging = <bool>} }:
      Toggles for which of the multiprocessing patches to apply.

    The child_processes table and its contents are as of yet considered private implementation details.

kernprof.py

  • _add_core_parser_arguments():
    Now adding these flags to the parser:
    • --prof-child-procs[=...], --no-prof-child-procs:
      Whether to use the feature implemented in this PR; the default is read from [tool.line_profiler.kernprof]::prof-child-procs.
    • --debug-log=...:
      Undocumented (private) flag for writing out the LineProfilingCache debug logs.
  • _write_preimports():
    Refactored to use the new/relocated facilities at line_profiler.curated_profiling.
  • _dump_filtered_stats():
    • New argument extra_line_stats: LineStats | None allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
    • Partially split off into the new _dump_filtered_line_stats() which it now calls.
  • _manage_profiler:
    Context manager refactored from the old _pre_profile() for more Pythonic handling of setups and teardowns.
    • Added setup for the session cache via calling _prepare_child_profiling_cache().
    • The old function body is split off into smaller components (_prepare_profiler(), _prepare_exec_script()).
    • Now calling _post_profile() on context exit so that we no longer have to explicitly try: ... finally: ... in _main_profile().
  • _post_profile():
    • New argument extra_line_stats: LineStats | None allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
    • Simplified because some of the cleanup is relocated to line_profiler.curated_profiling.

tests/test_child_procs/test_child_procs.py

Mass-relocated code which isn't directly the test functions into other locations in the package (see New Code above). Plus the following changes (among others):

  • /pool_test_module.py:
    Added the following command-line flags:
    • --start-method selects a specific multiprocessing "start method", including dummy which uses multiprocessing.dummy.
    • --local toggles between using a sum function defined locally in /pool_test_module.py or the one defined externally in /external_module.py.
    • --force-failure toggles whether the sum function should return normally or raise an error.
  • /conftest.py::pool_test_module:
    Supersede the previous test_module.
    • Now a /_test_child_procs_utils.py::ModuleFixture, allowing for easy setup as a temporarily import-able module in the current and/or child processes (see above).
    • Now joined by pool_test_module_clone (same source code but separate module) and process_test_module (same CLI but different model of parallelism)
  • /_test_child_procs_utils.py::_run_as_{script,module}():
    • Now joined by a _run_as_literal_code() to also test kernprof -c ....
    • Now taking test_module as a ModuleFixture instead of a path, and handling its installation.
  • /_test_child_procs_utils.py::_run_test_module():
    • New convenience wrappers run_module = partial(_run_test_module, _run_as_module), etc. now available for more convenient testing of kernprof execution modes as test parametrization.
    • New parameters:
      • profiled_code_is_tempfile: bool helps with constructing the kernprof command line in cases where the code is anonymous (kernprof -c ...).
      • use_local_func: bool, fail: bool, and start_method: Literal['fork', 'forkserver', 'spawn'] | None allows for fuzzing code execution with the aforementioned test_module CLI flags (resp. --local, --force-failure, and --start-method).
      • nhits: dict[str, int] | None, when provided, checks that the line-hit stats are as expected (all calls traced with --prof-child-procs, only those in the main process without).
      • subproc: bool for toggling whether to run the test module in-process or in a subprocess.
    • Added checks:
      • If fail is true, the kernprof subprocess should fail.
      • Temporary .pth files created by kernprof --prof-child-procs should be cleaned up.
      • Profiling output is consistent with the provided nhits (where available).
      • If subproc is false, that certain warnings aren't issued (e.g. LineStats warning that we're trying to read profiling stats from an empty file), paralleling the checks in /test_child_procs.py::test_apply_mp_patches_{success,failure}().
    • Now retrieving the LineProfilingCache debug logs and printing them for debug purposes.
  • /test_child_procs.py::test_multiproc_script_sanity_check():
    • Now also testing the new /process_test_module.py.
    • Now fuzzing the parametrizations use_local_func, fail, and start_method, to ensure that the test scripts are fully functional in vanilla Python.
    • Superseded the argument as_module: bool with run_func: Callable[..., CompletedProcess], allowing for more flexible testing of execution modes (python ..., python -m ..., and the new python -c added via the aforementioned _run_as_literal_code()).
  • /test_child_procs.py::test_running_multiproc_script():
    New parametrization run_func allows for absorbing the old test_running_multiproc_module() into the same test as additional parametrization, as well as testing kernprof -c.

Caveats

  • The temporary .pth file created is of course benign and, as mentioned, tries to stay as out of the way as possible, but I figured that the use of .pth files should be called out, given their recent spotlight in a CVE vulnerability.
  • Since the .pth file is written to sysconfig.get_path('purelib'), it depends on said directory being writable. If we aren't in a venv or a similarly isolated environment (which is increasingly unlikely nowadays), all processes using the same Python will have to import and run _line_profiler_hooks.load_pth_hook(). While the function itself should quit rather quickly when we're not in a child process, and without causing any of line_profiler to be loaded into sys.modules, it does still incur a certain overhead for interpreter boot-up.

What didn't work

  • Currently multiprocessing-managed child processes have the usual atexit hook which dumps profiling data disabled. On top of that the multiprocessing patch "pool" (~._child_process_profiling.multiprocessing_patches._profiling_patches.py::POOL_PATCH) also disables the dumping of profiling data at the end of multiprocessing.process.BaseProcess._bootstrap() patched in by the patch "process" ([...]::PROCESS_PATCH). This was the conclusion after much head-scratching, where processes terminated (presumably) in the middle of a LineStats.write() call result in corrupted/incomplete data.
  • In theory one could've caught the SIGTERM that BaseProcess.terminate() sends to the child and only dumped the stats then and there (along with normal end-of-callback stats-dumping), which sounds much more efficient than dumping stats for every task submitted to a Pool. However:
    • All attempts at cross-process synchronization (e.g. setting up lock files so that child processes can signal to the parent that they are ready for whatever) turned out... badly. There seem to be a million ways for things to go sideways in child processes, and I guess now I understand why multiprocessing.pool is so gung-ho in killing off worker processes and replacing them, without much regard of what actually went down down there.3
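
The per-task dumping that was settled on instead boils down to a `try: ... finally: ...` around each unit of work, so that stats are written whether the task succeeds or raises. Here is a minimal sketch of that pattern. `dump_after_each_task` and `flaky_sum` are hypothetical, and the real patches hook into `multiprocessing.pool` rather than decorating user functions.

```python
import functools


def dump_after_each_task(dump):
    """Wrap a task so that `dump()` runs after every call, even a failing one."""
    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            try:
                return task(*args, **kwargs)
            finally:
                dump()  # runs on success and on error alike
        return wrapper
    return decorator


dumps = []  # stand-in for profiling files written to the cache directory


@dump_after_each_task(lambda: dumps.append("dumped"))
def flaky_sum(nums, fail=False):
    if fail:
        raise RuntimeError("forced failure")
    return sum(nums)
```

The cost is one dump per task instead of one per process, which is the efficiency trade-off mentioned above.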

TODO

  • Add documentation on this new feature, but I guess we should wait until we're happy with the feature and the code.
  • Maybe we should indicate this feature to be experimental...
  • Would it make more sense for any of the content in line_profiler._child_process_profiling to become public API?

Credits

Notes

Welp. This took way longer than I expected. (EDIT: I think I wrote this sentence like two months ago.) The main friction points were that:

  • There isn't a pre-existing "global-ish" state object that I can leverage, and which can be easily replicated in subprocesses. The new line_profiler._child_process_profiling.cache.LineProfilingCache class tackles this issue.
  • I had a very hard time trying to make profiling results consistent even when the parallelly-executed function errors out. Would have thought that I already took care of that in the other project (see pytest-autoprofile::tests/test_subprocess.py::_test_inner()), but apparently I only made the tests fail there, not the parallel functions themselves. Figuring out how to do so consistently took the better part of these months.

Footnotes

  1. Note however that the equivalent vanilla Python command (python -c ...) would error out, because functions sent to multiprocessing must be pickle-able and thus must reside in a physical file. This is sidestepped by kernprof's always writing code received by kernprof -c ... and ... | kernprof - to a tempfile (ENH: auto-profile stdin or literal snippets #338).

  2. In the test suite, process creation is tested both with the most common multiprocessing[.get_context(...)].Pool and with self-managed Process objects. Different patches to multiprocessing are responsible for profiling-data collection between the two situations, and they have been tested to work both (1) individually on their respective use-cases and (2) together without stepping on one another. See tests/test_child_procs/test_child_procs.py::test_profiling_multiproc_script_{success,failure}().

  3. Since multiprocessing.pool.Pool-managed child processes ("workers") are regularly and wantonly terminated, which bypasses Python control flow and prevents e.g. atexit hooks from executing, we've taken to reporting profiling data from Pool workers on a per-task basis.

@TTsangSC TTsangSC changed the title FEAT: extend profiling to child processes [Draft] FEAT: extend profiling to child processes Apr 14, 2026
@TTsangSC

Collaborator Author

Did some more tests on local post-#428-merge, maybe it is just legacy Python and dependency versions causing the issues. Will just rebase, force-push, and see what happens.

@TTsangSC TTsangSC force-pushed the profile-child-processes branch 2 times, most recently from f9a37af to aca4e2c Compare April 16, 2026 21:16
@TTsangSC

Collaborator Author

Unfortunately there's too little context to determine why the tests are failing on other platforms. Heck I can't even replicate the macOS failures on my machine with matching dep versions. Just wrote in more code for extracting the debug outputs, force-pushed, and hopefully I will have more clues for what to work on.

@TTsangSC

TTsangSC commented Apr 21, 2026

Collaborator Author

Added a ton of logging/debug messages, a few debug-only config options and a kernprof flag,1 some unit tests of the individual components, but apparently two problems remain:

  • tests/test_child_procs.py::test_apply_mp_patches(start_method='dummy') is failing out of the gate (on 3.10), because I assumed that the profiler would just catch the data from within the same process. Which it did on my machine, but I was on 3.14. Further tests revealed that it passes on 3.12+ but only when LINE_PROFILER_CORE="ctrace" is not set, indicating some inconsistencies in how threading is handled between the legacy trace system and sys.monitoring.

    • Even weirder is that we kinda already have preexisting tests covering multithreaded cases (which use concurrent.futures.ThreadPoolExecutor, i.e. threading in the backend):

      • tests/test_complex_case.py::test_varied_complex_invocations() tests that kernprof runs the code and writes a profiling-stats file, but doesn't really test the content of said file. (We should probably update this test once the PR matures.)
      • tests/test_sys_trace.py::test_wrapping_thread_local_callbacks() mainly tests the use of trace callbacks, but also inadvertently the collection of profiling data across threads, because the target function calls one profiled function on the main thread and another on the worker thread, and we expect and test for profiling outputs from both.

    This raises the question: we already know from the second test that we can collect data from a worker thread in the same process, even before this PR, and consistently so across all our supported Python versions. So why is it different when multiprocessing.dummy (i.e. a wrapper around threading) is used?

  • Particularly on Linux, the tests/test_child_procs.py::test_profiling_multiproc_script(preimports=False, start_method='forkserver') tests have been consistently failing for a few pushes. From the debug output, the issue is apparently that multiprocessing.spawn.prepare() in the child process isn't calling runpy.run_path(), which would've set profiling up. My guess is that we're somehow failing the check inside _fixup_main_from_path()... but that would imply that (1) the child processes are somehow inheriting sys.modules['__main__'] from the fork-server process from which they are forked, and (2) said fork-server process doesn't already have profiling set up. Neither of those seems to be true on MacOS when using forkserver so IDK what's happening. Maybe something to do with how the default start methods are different between the platforms...

Footnotes

  1. Maybe I added a bit too much stuff and should tear some of that out when we're done debugging...

@Erotemic

Member

WRT to the existing multithreaded case, IIRC the main point of that is just to ensure we don't hang. I could be misremembering.

For the forkserver issue, from what I understand that's a long-lived process, so maybe some forkserver state debugging would help (not sure if your code does this or not, I haven't looked at it yet):

def debug_forkserver_state(label):
    import os
    from multiprocessing import forkserver

    fs = forkserver._forkserver  # private CPython state
    pid = getattr(fs, "_forkserver_pid", None)

    # "Did *this process* already know about / launch a forkserver?"
    known_forkserver_pid = pid
    known_forkserver_started = pid is not None

    # "Was *this current process* itself started by a forkserver?"
    started_by_forkserver = forkserver.get_inherited_fds() is not None

    # Best-effort liveness check for the known forkserver PID, if any.
    # This is optional, and mostly useful in the parent/originating process.
    forkserver_pid_alive = None
    if pid is not None:
        try:
            os.kill(pid, 0)
        except OSError:
            forkserver_pid_alive = False
        else:
            forkserver_pid_alive = True

    print(
        f"[{label}] "
        f"pid={os.getpid()} ppid={os.getppid()} "
        f"known_forkserver_pid={known_forkserver_pid!r} "
        f"known_forkserver_started={known_forkserver_started} "
        f"forkserver_pid_alive={forkserver_pid_alive} "
        f"started_by_forkserver={started_by_forkserver}"
    )

And maybe also check if the main module attributes differ in any meaningful way?

import multiprocessing
import sys

getattr(sys.modules['__main__'].__spec__, 'name', None)
getattr(sys.modules['__main__'], '__file__', None)
multiprocessing.get_start_method()

But IDK, your guess is probably better than mine at this point.

@TTsangSC

Collaborator Author

Thanks for the input! May have to take a closer look at the ForkServer object as you've suggested.

I have a piece of good news and a half:

  • Apparently the first discrepancy is due to how LineProfiler.enable_count is managed – somehow (seriously IDK why) with legacy trace but not with sys.monitoring, the profiler starts out not being .enable()-d in the new thread.1 I have yet to push the fix, but patching threading.Thread.__init__() iff we're using legacy trace, so that the .enable_count in the new thread is synced, seems to fix the bug.
  • The discrepancy between my local tests and the CI is apparently not a platform issue but a Python-version one – I'm on 3.13.3 while CI is on 3.13.13, and I managed to replicate the failing pattern in test_profiling_multiproc_script after brew upgrade python@3.13. Still scouring through the diffs to figure out what exactly changed between the versions... but at least I can test out the fix before pushing. That said, differing behavior between patch versions is a big red flag and a PITA to catch and fix...
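
As a rough illustration of the Thread.__init__() patch described in the first bullet, here is a minimal sketch. It is not the code in this PR: FakeProfiler is a hypothetical stand-in whose enable_count is thread-local, and patch_thread_init() is an assumed helper name. It only shows the idea of copying the creating thread's count into the new thread before its workload runs.

```python
import threading


class FakeProfiler:
    """Hypothetical stand-in for a profiler whose enable count is per-thread."""

    def __init__(self):
        self._local = threading.local()

    @property
    def enable_count(self):
        # Threads that never set the count see 0, mimicking the reported bug
        return getattr(self._local, 'count', 0)

    @enable_count.setter
    def enable_count(self, value):
        self._local.count = value


def patch_thread_init(profiler):
    """Make new threads start out with the creating thread's enable count."""
    vanilla_init = threading.Thread.__init__

    def patched_init(self, *args, **kwargs):
        vanilla_init(self, *args, **kwargs)
        count = profiler.enable_count  # read in the creating thread
        vanilla_run = self.run

        def run():
            profiler.enable_count = count  # written in the new thread
            vanilla_run()

        self.run = run  # instance attribute shadows the class-level method

    threading.Thread.__init__ = patched_init
    return vanilla_init  # hand back the original so it can be restored
```

The patch is global, so a real implementation would presumably restore the returned original once profiling ends.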

Footnotes

  1. Meanwhile the aforementioned test_wrapping_thread_local_callbacks consistently worked, because the profiled function is (1) explicitly wrapped by the profiler inside the concurrent workload, and (2) called through the profiler's wrapper. These factors ensured that the profiler was enabled from within the new threads.

@TTsangSC

Collaborator Author

Figured it out, the issue is that:

  • Prior to gh-126631: fix pre-loading of __main__ python/cpython#135295 (merged since 3.13.8 and 3.14.1), the 'init_main_from_path' key from multiprocessing.spawn.get_preparation_data() was erroneously ignored by multiprocessing.forkserver.ForkServer.ensure_running() and not passed to multiprocessing.forkserver.main().
  • The side effect was that:
    • The fork-server process was started without calling multiprocessing.spawn.import_main_path(), and thus child worker processes forked therefrom started out without a sys.modules['__main__'].
    • This in turn means that when the child processes run multiprocessing.spawn.prepare(), they had to set __main__ up via runpy.run_path() themselves, thus running the code which sets up the rewrite-based profiling (like line_profiler.autoprofile.autoprofile.run()).
  • Meanwhile, post-135295 said __main__ setup has been done in the fork-server process itself. However, since the fork-server process is neither our main process nor a child worker process, the setup code path is different and doesn't involve patching multiprocessing.runpy.run_path(), and hence no rewriting is done.
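
For anyone wanting to poke at this locally, the following sketch lists which __main__-setup key multiprocessing.spawn.get_preparation_data() would hand to a child from the current process. The helper name main_setup_keys() and the 'demo-child' process name are made up for illustration. It only inspects the preparation data and says nothing about what the fork server does with it on a given Python version.

```python
import multiprocessing.spawn


def main_setup_keys(name='demo-child'):
    """Return the `init_main_from_*` entries of the child preparation data."""
    # Note: this call also fixes the default start method as a side effect
    data = multiprocessing.spawn.get_preparation_data(name)
    return {
        key: value for key, value in data.items()
        if key.startswith('init_main_from_')
    }


if __name__ == '__main__':
    # Typically shows 'init_main_from_path' for a script run by filename
    print(main_setup_keys())
```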

I think the bug is fixed (will push shortly), but I've noticed a significant performance regression (about 100% slowdown of test_profiling_multiproc_script()) between up-to-date patch versions of Python 3.13 and 3.14 and older ones. Gotta try to iron that out...

@TTsangSC

TTsangSC commented Apr 22, 2026

Collaborator Author

Tests seem flaky... there is no good reason that 6e60a6a failed on building for 3.12 ARM manylinux while d2203a0 succeeded.

@codecov

codecov Bot commented Apr 26, 2026


Codecov Report

❌ Patch coverage is 86.51267% with 181 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.85%. Comparing base (4940bca) to head (c818cc3).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
line_profiler/_child_process_profiling/cache.py | 83.38% | 39 Missing and 18 partials ⚠️
line_profiler/curated_profiling.py | 71.28% | 20 Missing and 9 partials ⚠️
...ofiling/multiprocessing_patches/_infrastructure.py | 83.87% | 17 Missing and 8 partials ⚠️
...ling/multiprocessing_patches/_mandatory_patches.py | 88.37% | 9 Missing and 6 partials ⚠️
...rofiler/_child_process_profiling/_cache_logging.py | 93.79% | 5 Missing and 3 partials ⚠️
line_profiler/_threading_patches.py | 80.00% | 7 Missing and 1 partial ⚠️
...ess_profiling/multiprocessing_patches/mp_config.py | 78.12% | 7 Missing ⚠️
...ling/multiprocessing_patches/_profiling_patches.py | 85.29% | 4 Missing and 1 partial ⚠️
...profiler/_child_process_profiling/runpy_patches.py | 91.52% | 4 Missing and 1 partial ⚠️
line_profiler/cleanup.py | 96.15% | 5 Missing ⚠️
... and 5 more
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
+ Coverage   82.40%   85.85%   +3.45%     
==========================================
  Files          20       33      +13     
  Lines        2256     3592    +1336     
  Branches      359      501     +142     
==========================================
+ Hits         1859     3084    +1225     
- Misses        300      359      +59     
- Partials       97      149      +52     
Files with missing lines | Coverage Δ
...iling/multiprocessing_patches/_optional_patches.py | 100.00% <100.00%> (ø)
line_profiler/line_profiler_utils.py | 96.07% <97.53%> (+9.12%) ⬆️
line_profiler/toml_config.py | 91.17% <50.00%> (-1.31%) ⬇️
...cess_profiling/multiprocessing_patches/__init__.py | 85.71% <85.71%> (ø)
...rocess_profiling/multiprocessing_patches/_queue.py | 82.60% <82.60%> (ø)
...ling/multiprocessing_patches/_profiling_patches.py | 85.29% <85.29%> (ø)
...profiler/_child_process_profiling/runpy_patches.py | 91.52% <91.52%> (ø)
line_profiler/cleanup.py | 96.15% <96.15%> (ø)
line_profiler/line_profiler.py | 94.87% <85.29%> (-0.91%) ⬇️
...ess_profiling/multiprocessing_patches/mp_config.py | 78.12% <78.12%> (ø)
... and 6 more

... and 9 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f8e40f6...c818cc3.

@TTsangSC TTsangSC changed the title from "[Draft] FEAT: extend profiling to child processes" to "FEAT: extend profiling to child processes" Apr 29, 2026
@TTsangSC

TTsangSC commented May 2, 2026

Collaborator Author

Hi @Erotemic, I think this is ready for review if you have the time... sorry for the metric ton of code.

@Erotemic Erotemic left a comment

Member

It will take me a while to fully go through this, but here is a start.

... print("We probably didn't count up to 100 but whatever")
We probably didn't count up to 100 but whatever

>>> with ( # doctest: +NORMALIZE_WHITESPACE

Member

Note, because we use xdoctest, you should be able to put # doctest: +NORMALIZE_WHITESPACE as a standalone comment above the code for it to apply to every line after. Although I don't often use NORMALIZE_WHITESPACE, so I can't say I'm 100% confident in that.

dir=get_path('purelib'),
)
try:
pth_content = 'import {0}; {0}.load_pth_hook({1})'.format(

Member

This has the potential to cause some import overhead. IIUC, we are going to effectively import the entire line_profiler.__init__. I'm not sure if that's going to be noticeable or not. One idea is to make a separate _line_profiler_hooks package that's just a minimal top-level module for very fast import, but that may be overengineering. I'd need to think about it and probably test it.

Collaborator Author

One data point on my machine, though I think other platforms are most likely gonna behave the same: importing line_profiler._child_process_profiling.pth_hook causes the following to be imported:

  • All top-level line_profiler submodules except the following:
    • ~.cleanup
    • ~.curated_profiling
    • ~._threading_patches
    • ~.ipython_extension
  • None of the line_profiler.autoprofile components, and
  • None of the line_profiler._child_process_profiling components except ~~.pth_hook itself.

So, as one would intuit, most of the "core" stuff directly used by line_profiler.LineProfiler and @line_profiler.profile. Could be worthwhile to set it up as a separate namespace... and it should be trivial to configure, given that we already create two separate namespaces (line_profiler and kernprof).

The remaining question is more of a design one: should write_pth_hook() stay in said namespace for symmetry with load_pth_hook(), or should it become an instance method of LineProfilingCache?
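
A data point like the one above can be reproduced with a small helper that imports a module in a fresh interpreter and reports what newly landed in sys.modules. This is only a sketch: modules_pulled_in_by() is a made-up name, and the demo target is the stdlib json package rather than line_profiler, so that it runs without the package installed.

```python
import subprocess
import sys

# Runs in a fresh interpreter so that already-imported modules don't skew
# the result; prints one newly imported module name per line
_SNIPPET = '''
import importlib, sys
before = set(sys.modules)
importlib.import_module(sys.argv[1])
print("\\n".join(sorted(set(sys.modules) - before)))
'''


def modules_pulled_in_by(module_name):
    """List the modules newly imported as a result of importing `module_name`."""
    proc = subprocess.run(
        [sys.executable, '-c', _SNIPPET, module_name],
        check=True, capture_output=True, text=True,
    )
    return proc.stdout.splitlines()


if __name__ == '__main__':
    for name in modules_pulled_in_by('json'):
        print(name)
```

Swapping 'json' for the hook module's dotted name should give the import footprint that a .pth-triggered import would incur. Timing the same subprocess call would give a crude estimate of the startup cost.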

)
if not fnames:
return LineStats.get_empty_instance()
return LineStats.from_files(*fnames, on_defective='ignore')

Member

Maybe on_defective='warn' makes more sense here?

Collaborator Author

Could be. The original motivation is that:

  • Since the ._setup_in_child_process() code is run in all child processes at startup incl. unused worker processes and the fork-server process, each such process occupies and secures a tempfile name (*.lprof) by touching it. Said tempfile is supposed to be written to when the corresponding Python process terminates.
  • But then some of these files could result in errors upon reading: processes killed via signals before cleanup is run result in empty tempfiles, and those killed mid-cleanup make for corrupted ones.
  • I opted to suppress the error/warning messages given that they originate directly from pickle and are thus not the most intuitive/helpful.

But maybe that's just an indication that we should've vetted the input files better before passing them on to pickle. And given that the rest of the PR has seen a lot of iteration after these lines were first written, we should probably get a lot fewer false alarms from the warnings, and where we do have something to warn about, it will be an actually problematic case (e.g. unclean exits resulting in incomplete data).
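
One way the vetting could look, sketched under assumptions: load_pickles_leniently() is a hypothetical helper, not LineStats.from_files(), and the warning texts are illustrative. Empty files (process killed before cleanup) and corrupted files (process killed mid-cleanup) each get a targeted warning instead of a raw pickle error.

```python
import os
import pickle
import warnings

# Exceptions the `pickle` docs list as possible when unpickling bad data
_UNPICKLING_ERRORS = (
    pickle.UnpicklingError, EOFError, AttributeError, ImportError, IndexError,
)


def load_pickles_leniently(paths):
    """Load each pickle file, warning about (and skipping) defective ones."""
    loaded = []
    for path in paths:
        if os.path.getsize(path) == 0:
            warnings.warn(f'{path}: empty file, process killed before cleanup?')
            continue
        try:
            with open(path, 'rb') as fobj:
                loaded.append(pickle.load(fobj))
        except _UNPICKLING_ERRORS as e:
            warnings.warn(
                f'{path}: unreadable data ({type(e).__name__}), '
                'process killed mid-cleanup?'
            )
    return loaded
```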

"""
# XXX: why can `coverage` get away with not doing all these
# lock-file hijinks and just patching `BaseProcess._bootstrap()`?
def get_poller_args(

Member

Do these need to be nested functions? Would a helper class make more sense than relying on closures / be more testable?

Do we have a test for a worker that does some nontrivial processing and then intentionally terminates, so we can have an explicit comparison between the unpatched and patched terminate and make sure they both behave similarly enough?

Collaborator Author

Yeah I should probably refactor wrap_terminate() for readability. We've barely used closure variables in the nested funcs anyway (just cache in process_has_returned()); still, that can be easily remedied, given that I can just pass cache to the function in the _Poller.poll_until() constructor.

We do test Process.terminate() as shown in the coverage report, albeit indirectly:

  • In normal ("happy-path") code execution Process.terminate() isn't even called – AFAIK it only happens when the parallel workload errored out, which for whatever reason may leave the child process stuck "in limbo", where e.g. atexit cleanup callbacks may or may not be executed, or be in various stages thereof.
  • Such children are .terminate()-d as a part of finalization of the parent Pool (see Pool.__init__() and .terminate()), which in particular does not wait for child-process cleanup, thus necessitating the patch.
  • In order to test this we have tests1 where the parallel workload can be "set up for failure". Those are where the coverage on wrap_terminate() comes from. A quick check is how running the test suite with -k "test_child and success" only gives 52% coverage on multiprocessing_patches.py while -k "test_child and failure" gives 72%.

Given that the existing code does call vanilla_impl() in a finally clause, it is probably the closest we can get to the unpatched behavior in that we (1) do eventually send the SIGTERM, and (2) before that, give child processes (as far as possible – which isn't 100% of the time, hence the necessity of the timeout) the required grace to run cleanup and write profiling data. Anyhow, maybe we can look into tests where we explicitly create a Process object and later .terminate() it, if that would be more reassuring.
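
The shape of that "wait with a timeout, then terminate in a finally clause" pattern can be sketched as follows. The names poll_until() and with_grace_period() are illustrative only and are not the PR's wrap_terminate() or _Poller. The cleanup_done callable is assumed to report whether the child has finished writing its data (e.g. by checking a lock file).

```python
import time


def poll_until(predicate, timeout, interval=0.01):
    """Poll `predicate` until it is true or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False  # timed out
        time.sleep(interval)
    return True


def with_grace_period(vanilla_terminate, cleanup_done, timeout=1.0):
    """Wrap a terminate callable so the child first gets time to clean up."""
    def terminate():
        try:
            poll_until(cleanup_done, timeout)
        finally:
            # Always fall through to the original behavior, even if the
            # polling itself errored out or timed out
            vanilla_terminate()
    return terminate
```

Because the original terminate callable runs unconditionally, the wrapper can only delay termination by at most the timeout, never suppress it.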

Footnotes

  1. Granted, we just sum over consecutive integers in those tests. They have the benefit that the workload size corresponds directly to the num-hits on the loop line, and hence we can keep track of whether the gathered profiling data are complete. If that's too trivial maybe we can do Fibonacci or something instead.

Comment thread line_profiler/cleanup.py
self, obj: Any, attr: str, value: Any, *,
name: str | None = None,
cleanup: bool = True,
priority: float = 0,

Member

unused argument.

Collaborator Author

Thanks, good catch!

Comment thread tests/test_child_procs.py Outdated
)


@pytest.mark.retry(_NUM_RETRIES,

Member

Why is a retry necessary here? Is there something non deterministic going on? Any chance we are hiding a race condition? Similar question with other pytest.mark.retry marked tests.

Collaborator Author

The short answer

Yes, workers with failing workloads sometimes end up in limbo, as mentioned in the other comment. It should theoretically be possible to drop the retry on POSIX (and even the whole wrap_terminate() deal) because of the use of SIGTERM handlers, but the consensus seems to be that signal handling is busted on Windows, so we still need the timeout, which is by nature flaky.

The long answer

  • Again, processes receiving parallel workloads which don't error out seem to always exit cleanly and never need to be .terminate()-d. Those which don't... don't. Which sets up the potential race condition: the failing child could be cleaning up, stuck in limbo (possibly waiting for communication from the parent?), or whatever; meanwhile the parent doesn't really care either way, because it already received all the results (i.e. pickled return-value objects or exceptions) from the children, and is ready to .terminate() whichever child processes remain alive.
  • In earlier iterations of the PR I just set each child process up to manage a lock file, and had the parent process wait without a timeout for said file before sending the SIGTERM (which is essentially the only thing that the vanilla Process.terminate() does). Which then caused the tests to sometimes hang indefinitely, meaning that in some failing child processes we never naturally got to the point where the finally: cache.cleanup() clause is executed. Unfortunately such failure also seems to be nondeterministic, at least at a cursory glance.
  • Hence the timeout. Which did alleviate the issue in that we do get all of the profiling data most of the time – because the parent process no longer willy-nilly kills the children, at least until the timeout expires. This handles cases where the child process is healthy enough to actually attempt cleanup, but would have been prevented from doing so by the parent's .terminate()-ing it.
  • But then of course there are still corner cases for when the child is FUBAR – and I think it can be argued that those cases are what the Process.terminate() method is for in the first place. Like, in a perfect world the child process would've just executed its atexit hooks and exited non-zero, right after raising the error and pushing it to the communication queue with the parent. But obviously that doesn't always happen. In those cases anything goes... hence the retries.

Notably, coverage also seems to be having trouble with consistently handling data collection when child processes are .terminate()-d, and just gives up on Windows:

@pytest.mark.skipif(env.WINDOWS, reason="SIGTERM doesn't work the same on Windows")
@pytest.mark.flaky(max_runs=3)  # Sometimes a test fails due to inherent randomness. Try more times.
class SigtermTest(CoverageTest):

What do we do?

I didn't like having to retry either – but here we are. But maybe we can at least refactor and separate the tests, so that we make sure that the mark is only applied to cases where it might be needed/justified (i.e. tests with failing parallel workloads and on Windows).

That or we just skip (as coverage does) or XFAIL on Windows.

@TTsangSC

TTsangSC commented May 5, 2026

Collaborator Author

I have a new idea for improving determinism in profiling multiprocessing stuff.

  • The main culprit for failure in profiling child processes is that:
    • While by wrapping Process._bootstrap() we are "guaranteed" on Python level that cleanup will occur after the parallel workload exits, it is important to note that the processes created by Pool don't directly have the callables we sent to the pool (via e.g. .map(), .starmap(), or .apply()) as their workloads. Instead their workload is the multiprocessing.pool.worker() function, which fetches tasks (i.e. our callables and their arguments) from an inqueue and pushes the results (either the return values of the parallel workloads or the errors they raised) back into an outqueue.
    • The Pool object in the main process communicates with the children via said queues, and sees fit to (OS-level) terminate them as long as all the results have been collected – even if the workload (i.e. worker()) hasn't exited per se.
  • To maintain Python-level control, we must write profiling results on the task level, before the result is handed back to the pool. This can be achieved via patching the methods where tasks are pushed onto the queues, slipping in calls to LineProfilingCache.load().profiler.dump_stats() – namely Pool._get_tasks() and Pool._guarded_task_generation().
  • Unfortunately everything that needs to be communicated between processes has to be pickle-able, meaning that we can't simply create wrapper functions around the original callable. Still, if we wrote a helper class which does the wrapping on its .__call__() method it should be workable.

However, this strat does have complications:

  • Generally we have one profiler instance (LineProfilingCache.load().profiler) per child process. Since each process can handle multiple tasks, we must be careful against double-counting profiling data. Maintaining a single per-process filename handle to which profiling data is written and overwritten may be a solution. That or we reset the profiler after every task, but we haven't merged feat: Add reset_stats method to LineProfiler for resetting accumulated profiling data #322 yet.
  • This works fine and all with Pool-based parallelism, but fails if one sidestepped the Pool and otherwise managed their own Processes. Maybe we can have a switch somewhere for choosing whether to patch Pool or Process (or both, but we'll have to make sure the patches don't clash and cause double-counting), like how coverage allows for configuring which patch(es) to apply.
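
To make the helper-class idea concrete, here is a minimal sketch under stated assumptions. DumpStatsAfterTask, its get_stats callable, and the JSON file format are all hypothetical stand-ins (the real thing would call into LineProfilingCache and the profiler's own dump routine). The point is that a module-level class with a __call__() method stays picklable as long as its attributes are, and that overwriting one per-process file with cumulative stats sidesteps double-counting.

```python
import json
import os


class DumpStatsAfterTask:
    """Callable wrapper which writes out stats after every task.

    Instances pickle fine as long as `func` and `get_stats` are themselves
    picklable (e.g. importable module-level functions).
    """

    def __init__(self, func, get_stats, out_dir):
        self.func = func
        self.get_stats = get_stats  # returns the *cumulative* stats so far
        self.out_dir = out_dir

    def __call__(self, *args, **kwargs):
        try:
            return self.func(*args, **kwargs)
        finally:
            self._dump()  # runs before the result is handed back

    def _dump(self):
        # One file per process, overwritten after each task: the latest
        # write already includes all earlier tasks, so nothing is counted
        # twice when the files are merged
        path = os.path.join(self.out_dir, f'stats-{os.getpid()}.json')
        tmp_path = path + '.tmp'
        with open(tmp_path, 'w') as fobj:
            json.dump(self.get_stats(), fobj)
        os.replace(tmp_path, path)  # swap in the complete file in one step
```

Writing to a temporary name and renaming means that a process killed mid-write leaves the previous complete file in place rather than a truncated one.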

@TTsangSC

TTsangSC commented May 14, 2026

Collaborator Author

As it is now, the code (at least on non-Windows platforms) already reliably and deterministically captures all profiling data related to tasks sent to the child processes. If the child process gets to execute any task at all, it must have gone through setup which allows for the capturing of profiling data.

What however is more difficult to handle consistently, and is causing most of the recent pipeline failures (other than 25785347402, which was entirely on me), is what happens with child processes which terminate without ever having received a task.1 While currently the SIGTERM handler (which ensures that the profiling stats are written before the child process kicks the bucket) is set up rather early (at the start of multiprocessing.pool.worker() or multiprocessing.process.BaseProcess._bootstrap()), it is apparently not always early enough, as shown by the tests failing on account of the empty-prof-stats-file warnings.2

Such failure calls into question:

  • Can any child-process setup be guaranteed to happen early enough that we can ensure the writing of profiling data before the parent process can terminate it? (Probably not.)
  • Is more bookkeeping the only option (e.g. having each child report the number of parallel tasks executed, and special-casing those that didn't report any)?
  • Is that even a worthwhile pursuit, given that more bookkeeping = more overhead? Should we just accept the occasional empty-file warnings as long as any actual workload executed can be profiled?
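
For readers unfamiliar with the mechanism under discussion, a SIGTERM handler of this kind can be sketched as below. install_sigterm_dump() and the dump callable are hypothetical names, and the exit code 128 + signum is just the common shell convention, not something taken from this PR. The sketch also shows why the race exists: the handler only protects the process once signal.signal() has actually run.

```python
import signal
import sys


def install_sigterm_dump(dump):
    """Install a SIGTERM handler that calls `dump()` and then exits.

    Must be called from the main thread. Returns the previous handler so
    that it can be restored later.
    """
    def handler(signum, frame):
        try:
            dump()  # e.g. write the profiling stats to the assigned file
        finally:
            # Exit via SystemExit so that `finally` clauses and atexit
            # hooks still get a chance to run
            sys.exit(128 + signum)

    return signal.signal(signal.SIGTERM, handler)
```

A SIGTERM arriving before this function has been called gets the default disposition, which kills the process without running any Python-level cleanup.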

Footnotes

  1. This seems to only happen when a task fails (e.g. in the test_*_failure() tests), but that may also be an artifact of our setup (which creates exactly as many tasks as there are child processes).

  2. I must note that the code has been extensively tested on local without failing like this, albeit without pytest-cov. It seems that the overhead of setting up coverage however delayed the setting up of the SIGTERM handler in child processes, hence preventing the profiling data (or the lack thereof) from being written. So it is indeed a race condition... between the parent and child processes that is.

@TTsangSC

Collaborator Author

Added the PID bookkeeping, but there seems to be an unrelated Heisenbug which I'm struggling to pin down. Every ≈ 1000–2000 runs of test_apply_mp_patches_failure() with start_method=fork or forkserver on my local would lock up and fail... and the bug vanishes when I also apply the logging patch to tee the multiprocessing internal logs. So much for determinism... 🤦‍♂️

The newest pipeline failures are semi-related:

  • Job 76436927210 failed on test_child_procs.py::test_profiling_multiproc_script_failure[2000-3-run_func0-script-True-with-child-prof-fork-False-no-preimports-False-external] – makes sense, since this is just the integration-test and isolated-in-its-own-subprocess version of test_apply_mp_patches_failure().
  • Job 76436927202 failed on test_child_procs.py::test_apply_mp_patches_failure[100-2-False-True-process-only-spawn], but that was because the poller we use for Pool.terminate() on Windows timed out, and I forgot that pytest.raises() would catch the timeout but fail it with an AssertionError because of mismatching error messages. Guess that I'll just use an explicit try-except to fish for the specific RuntimeError I want.

@TTsangSC

Collaborator Author

@Erotemic yikes, unfortunately job 76507649976 seems to have gotten stuck despite multiple precautions (timeouts for dubious sections, hundreds/thousands of rounds of local tests)... worse, it's the Mac job that's stuck (which I suppose is the most expensive on CI), despite that being the most thoroughly tested env, since it's also my local.

I don't have the perms to cancel the job; please do so ASAP before it continues to rack up compute.

Terribly sorry for this.

@Erotemic

Member

Yikes! Just saw the message. I don't see an immediate way to cancel it. Maybe it timed out and it just looks like it is churning? It does say: "The job has exceeded the maximum execution time of 6h0m0s", and in this PR I see: Cancelled after 365m. So I think it is not running.

@TTsangSC

Collaborator Author

Yep, I think Actions have a default 6 h limit (not sure if it's the entire pipeline or individual jobs) on GitHub. This happened on Monday so said default timeout expired a while before intervention... it may not be ideal, but we can probably put in loose-ish timeouts for build_and_test_sdist, build_binpy_wheels, and test_binpy_wheels, like maybe 1 h for build_binpy_wheels and 10 m for the others.

For the (probably1) offending test test_apply_mp_patches_failure(), unfortunately I'm still struggling to replicate the failure – not even the hanging part, and much less how it hung despite the critical part's being supposedly isolated in a thread on a timer. (Doesn't exactly inspire confidence for the PR I know...) Before I apparently "fixed" similar failures on local (and thus felt confident enough to push after much testing), the logs seemed to be stuck after the call to Process.terminate() before resuming after the timeout expired. My guess is that Process.join() hung, but that must have meant that Process.terminate() somehow failed to nail the processes. But beyond that IDK.

We can probably mitigate this by folding the cases tested by test_apply_mp_patches_*() back into test_multiproc_script_sanity_check() so that we're "protected" by the kernprof subprocess, but of course (1) we're not supposed to have to do that, (2) we lose granularity and coverage by not running the patched code in-process, and (3) more subprocess-based tests means more overhead... but any such overhead is probably minimal compared to spending 6 h stuck in limbo. (Again, very sorry.)

Footnotes

  1. Because the stuck job wasn't on pytest --verbose and pytest output is kinda line-buffered, the test where we were stuck could've been any of the latter ones in tests/test_child_procs.py. But that one is the most suspicious because the kernprof ones use subprocess.run(timeout=...) instead of my jury-rigged thread-based timeout solution, which is probably more robust.

@TTsangSC

Collaborator Author

When trying to sniff around and trigger and fix the bug I ran into an even weirder issue.

  • In principle, the @_timeout decorator does nothing other than to run the function on a new (daemon) thread, pickle and return/raise the result if it finished execution within the time limit, and just raise a timeout error otherwise.
  • Moving said decorator out from the function using multiprocessing to the entire test function, the first subtest passes while the following ones mostly fail, on account of the profiling stats being inconsistent.
  • Interestingly, the failure patterns seem to be that the stats are:
    • Correctly collected in child threads and/or processes when using multiprocessing.dummy (i.e. threads) or the start methods forkserver or spawn, but
    • Not collected when using the start method fork, and
    • Not collected in the "current" thread on which the test function is run.
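
For context, a thread-based timeout decorator along the lines described in the first bullet might look like the sketch below. This is not the @_timeout used in the test suite: run_with_timeout() is an assumed name, and the pickling of the result mentioned above is left out for brevity. It does make the relevant property visible, namely that the decorated function body runs on a non-main thread.

```python
import functools
import threading


def run_with_timeout(seconds):
    """Run the decorated function on a daemon thread with a time limit."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = {}

            def target():
                # Note: `func` executes here, i.e. NOT on the calling thread
                try:
                    outcome['result'] = func(*args, **kwargs)
                except BaseException as e:
                    outcome['error'] = e

            thread = threading.Thread(target=target, daemon=True)
            thread.start()
            thread.join(seconds)
            if thread.is_alive():
                # The thread cannot be killed; it is merely abandoned
                raise TimeoutError(
                    f'{func.__name__}() exceeded {seconds} s'
                )
            if 'error' in outcome:
                raise outcome['error']
            return outcome['result']

        return wrapper
    return decorator
```

Anything that keeps per-thread state (such as a profiler's enable count) therefore sees a fresh thread on every decorated call.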

This seems to indicate something being wrong when profiling starts on a non-main thread; in view of that, how start_method='fork' completely falls apart kinda makes sense, given Python's warnings on mixing process forking and multithreading. But then this still doesn't explain why we're losing stats on the current thread from the second subtest onwards. There's possibly some pollution in thread-local states which I'll have to diagnose...

@TTsangSC

TTsangSC commented Jun 9, 2026

Collaborator Author

Sorry for the lack of update. Was a bit shellshocked after the last blunder and held off on making pushes until I can diagnose what happened and better reproduce the sporadic hanging bug...

... in which however I am not quite successful. It still only shows up every ≈1000 tests or so, and apparently according to the logs, that is due to the signal handler's somehow not firing on one of the child processes; still trying to figure out how that is at all possible. Still I've made some further changes:

  • Added workflow-level timeouts to all the test jobs to avert the worst-case scenario of something being stuck for six hours.
  • Fixed a bug in LineProfilingCache which causes global-state pollution after kernprof.main() has been called.
  • Updated tests/test_child_procs.py::test_profiling_multiproc_{success,failure}() so that most of the subtests are run with kernprof.main() in-process.

If it isn't as safe and deterministic to tamper with signal handling as I'd initially hoped, we may just have to report profiling stats more granularly like on Windows, where each multiproc pool task prompts the child process to write the profiling stats to its assigned file. Since one of the patches already patches task/result queues to send the child-process identity back to the parent alongside the task result, it should be easy (though with overhead) to also slip the collected stats in. Will see if that helps.

Oh and looks like I broke something else completely related... not a good sign. The lint-job failure seems unrelated though, it was on line_profiler/autoprofile/profmod_extractor.py and I didn't touch that file. Seems an artifact of our not pinning the ty version – on 0.0.31 it still type-checked and now on 0.0.46 it doesn't. Will fix both either way.

@TTsangSC

Collaborator Author

Still working on this on-and-off, but I'm unfortunately still stuck with flaky tests... seems that there's something which makes SIGTERM handling inherently unreliable in multiprocessing child processes, causing them to randomly hang (see python/cpython#73945, python/cpython#82408, and coveragepy/coveragepy#1310).

Unfortunately this means that there's no guarantee for profiling output when a child process is BaseProcess.terminate()-ed, since an unhandled SIGTERM breaks the Python runtime (try-finally clauses, atexit hooks, etc.). Seems that the only workaround is to avoid having children terminated at all, will look more into that.

@TTsangSC TTsangSC force-pushed the profile-child-processes branch from fe70770 to bc3adb9 Compare June 26, 2026 01:47
TTsangSC added 5 commits June 26, 2026 04:26
- `line_profiler/curated_profiling.py`
  New module for setting up profiling in a curated environment

  - `ClassifiedPreimportTargets.from_targets()`
    Method for creating a `ClassifiedPreimportTargets` instance,
    facilitating writing pre-import modules in a replicable and portable
    manner
  - `ClassifiedPreimportTargets.write_preimport_module()`
    Method for writing a pre-import module based on an instance;
    also fixed bug where the body of the written module was intercepted
    without appearing in the debug output

- `kernprof.py`
  - `_gather_preimport_targets()`
    Migrated to `line_profiler.curated_profiling`
  - `_write_preimports()`
    Now using the new `ClassifiedPreimportTargets` class, moving esp.
    the logic to the `write_preimport_module()` method
- `kernprof.py::_manage_profiler`
  `line_profiler/curated_profiling.py::CuratedProfilerContext`
  New context-manager classes for handling profiler setup and teardown
- `kernprof.py::_pre_profile()`
  Refactored into the above context managers and other private functions
  (`_prepare_profiler()`, `_prepare_exec_script()`)
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    New class for passing info onto child processes so that profiling
    can resume there

line_profiler/pth_hook.py
    New submodule for the .pth-file-based solution to propagating
    profiling into child processes:

    write_pth_hook()
        In the main process, write the temporary .pth file to be loaded
        in child processes
    load_pth_hook()
        Called by the .pth in child process, loading the cache and
        setting up profiling based thereon
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
    Added new `.profile_imports` attribute to correspond to `kernprof`'s
    `--prof-imports` flag

line_profiler/_child_process_profiling/meta_path_finder.py
    New submodule defining the `RewritingFinder` class, a meta path
    finder which rewrites a single module on import
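
For readers unfamiliar with the mechanism, a meta path finder that rewrites a single module on import can be sketched as follows. This is not the `RewritingFinder` implementation (which isn't shown in this thread); `SingleModuleRewriter`, `_RewritingLoader`, and the string-replacing transform are hypothetical.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import tempfile
from pathlib import Path


class _RewritingLoader(importlib.machinery.SourceFileLoader):
    """Source loader that passes the module source through `transform`."""

    def __init__(self, fullname, path, transform):
        super().__init__(fullname, path)
        self._transform = transform

    def get_code(self, fullname):
        # Overriding get_code() also sidesteps the bytecode cache, so the
        # rewritten code is never written to __pycache__.
        source = self._transform(self.get_source(fullname))
        return compile(source, self.get_filename(fullname), 'exec')


class SingleModuleRewriter(importlib.abc.MetaPathFinder):
    """Meta path finder that only intercepts the module named `target`."""

    def __init__(self, target, transform):
        self.target = target
        self.transform = transform

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.target:
            return None  # let the regular finders handle everything else
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or not isinstance(
                spec.loader, importlib.machinery.SourceFileLoader):
            return None
        spec.loader = _RewritingLoader(fullname, spec.origin, self.transform)
        return spec


# Usage: import a throwaway module and rewrite one assignment on the way in
with tempfile.TemporaryDirectory() as tmpdir:
    Path(tmpdir, 'rewrite_demo_module.py').write_text('VALUE = 1\n')
    finder = SingleModuleRewriter(
        'rewrite_demo_module', lambda src: src.replace('= 1', '= 2'))
    sys.path.insert(0, tmpdir)
    sys.meta_path.insert(0, finder)
    try:
        demo = importlib.import_module('rewrite_demo_module')
    finally:
        sys.meta_path.remove(finder)
        sys.path.remove(tmpdir)
```

In a profiling context the transform would presumably insert profiling hooks rather than do a string replacement.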

line_profiler/_child_process_profiling/pth_hook.py
    write_pth_hook()
        Now also handling the `os.fork()` patching/wrapping
    _setup_in_child_process()
        Now creating a `RewritingFinder` to mirror what
        `~.autoprofile.autoprofile.run()` does in the main process

.
TTsangSC added 6 commits June 26, 2026 04:26
line_profiler/_child_process_profiling/multiprocessing_patches/
    Split the contents of `__init__.py` into separate submodules:

    poller.py::Poller
        Migrated from `__init__.py::_Poller`
    _queue.py::PutWrapper
        Migrated from `__init__.py::_QueuePutWrapper`
    config.py::MPConfig
        Migrated from `__init__.py::MPConfig`
    _infrastructure.py
        Patch
            - Migrated from `__init__.py::_Patch`
            - Added attribute `.priority` to the interface
        SingleModulePatch
            Migrated from `__init__.py::Patch`
        Registry
            - Migrated from `__init__.py::_PatchRegistry`
            - Added class method `.from_entry_point()` for constructing
              an instance from patches loaded from an entry point
    _mandatory_patches.py::{RebootForkserver,ResourceTracker,RunPy}Patch
        - Migrated from the eponymous classes in `__init__.py`
        - Now declared as `line_profiler._multiproc_patches` entry
          points
    _profiling_patches.py
        POOL_PATCH, PROCESS_PATCH
            `SingleModulePatch` objects that are now exposed as
            `line_profiler._multiproc_patches` entry points
        wrap_worker[_{write_on_exit,write_per_task}]()
            Migrated from `__init__.py::wrap_worker_pool*()`
        wrap_bootstrap(), wrap_terminate()
            Migrated from the eponymous functions in `__init__.py`
    _optional_patches.py
        CHILD_PIDS_PATCH, LOGGING_PATCH
            `SingleModulePatch` objects that are now exposed as
            `line_profiler._multiproc_patches` entry points
        wrap_handle_results(), wrap_process()
            Migrated from the eponymous functions in `__init__.py`
        wrap_worker()
            Migrated from `__init__.py::wrap_worker_pid()`
    __init__.py::__all__
        Now including the following:
        - `MPConfig` (= `config.py::MPConfig`)
        - `Poller` (= `poller.py::Poller`)
        - `Registry` (= `_infrastructure.py::Registry`)
        - `Timeout` (= `poller.py::Poller.Timeout`)

tests/test_child_procs.py
    Updated imports from
    `line_profiler._child_process_profiling.multiprocessing_patches`

setup.py
    Added the aforementioned `line_profiler._multiproc_patches` entry
    points
line_profiler/_child_process_profiling/multiprocessing_patches/
    _mandatory_patches.py::wrap_{terminate,start,bootstrap}()
    _mandatory_patches.py::PROCESS_TERMINATION_PATCH
        New patch for `multiprocessing.process.BaseProcess` with
        increased priority, which ensures that SIGTERM isn't sent to
        child processes until they are finished with setting up;
        exposed as entry point `__process_termination`
    _mandatory_patches.py::setup_mp_child()
        Refactored and migrated from
        `_profiling_patches.py::setup_mp_child()`
    _profiling_patches.py::wrap_{worker_*,bootstrap}()
        Removed basic setup (migrated to `PROCESS_TERMINATION_PATCH`)

setup.py
    Added entry point `__process_termination` to
    `line_profiler._multiproc_patches`
line_profiler/_child_process_profiling/cache.py
::LineProfilingCache._wrap_os_fork()
    `os.fork()` wrapper now clearing the cleanup-callback stacks on the
    pre-fork instance in the forked process to avoid clashes with the
    new, post-fork instance
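
A rough sketch of that idea, assuming the cleanup-callback stack behaves like a `contextlib.ExitStack` (an assumption; the actual `LineProfilingCache` internals are not shown here). `Cache` and `wrap_fork()` are hypothetical names, and a fake `fork()` is used so that the example runs without actually forking.

```python
import contextlib
import functools


class Cache:
    """Stand-in for a per-process cache owning a stack of cleanup callbacks."""

    def __init__(self):
        self.cleanup = contextlib.ExitStack()


def wrap_fork(fork, cache):
    """Wrap a fork()-like callable so that the child discards the
    callbacks it inherited from the parent's `cache`."""
    @functools.wraps(fork)
    def wrapper():
        pid = fork()
        if pid == 0:
            # Child: pop_all() empties the stack WITHOUT running callbacks
            cache.cleanup.pop_all()
        return pid
    return wrapper


def run(fake_pid):
    """Register one callback, 'fork', then close the stack; return the
    callbacks that actually ran."""
    ran = []
    cache = Cache()
    cache.cleanup.callback(ran.append, 'pre-fork cleanup')
    wrap_fork(lambda: fake_pid, cache)()
    cache.cleanup.close()
    return ran


ran_in_child = run(0)       # fake fork() returning 0, i.e. "we are the child"
ran_in_parent = run(1234)   # fake fork() returning a child PID
```

In real use one would wrap `os.fork` itself; the parent keeps its callbacks, while the child starts with a clean stack for its own post-fork instance.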

line_profiler/rc/line_profiler.toml
    Updated comments
We no longer catch and handle SIGTERM because that apparently causes
deadlocks in `multiprocessing` child processes (see CPython GitHub
issues 73945 & 82408, coverage issue 1310)... have to figure out another
way to ensure the writing of profiling data.

line_profiler/_child_process_profiling/
    cache.py::LineProfilingCache
        Removed private attributes and methods related to signal
        handling
    multiprocessing_patches/_mandatory_patches.py::setup_mp_child()
        No longer adding `SIGTERM` handling
    multiprocessing_patches/_profiling_patches.py
        dump_stats_quick()
            Updated call signature
        wrap_worker_write_*(), wrap_bootstrap()
            Removed reference to `SIGTERM` handling
    multiprocessing_patches/mp_config.py::MPConfig
        Removed attribute `.catch_sigterm`

line_profiler/rc/line_profiler.toml
::[tool.line_profiler.child_processes.multiprocessing]
    Removed config item `catch_sigterm`

TODO:
    Update `_profiling_patches` to mitigate `BaseProcess.terminate()`
line_profiler/_child_process_profiling/multiprocessing_patches
    (The `child_pids` optional patch has been reworked into the
    `__pool_worker_pid` mandatory patch.)

    __init__.py::apply()
        Updated docstring
    _infrastructure.py::Registry.from_entry_point()
        Updated doctest
    _mandatory_patches.py
        POOL_WORKER_PID_PATCH
            Migrated from `_optional_patches.py::CHILD_PIDS_PATCH`
        wrap_handle_results(), wrap_process(), wrap_worker()
            Migrated from eponymous functions in `_optional_patches.py`
    _profiling_patches.py
        wrap_worker()
            Now always using the previous `wrap_worker_write_per_task()`
            implementation, because we're no longer catching SIGTERM
            even on POSIX
        wrap_terminate()
            Now only blocking the call to `BaseProcess.terminate()` if
            `self` is a pool worker which has run at least 1 task; this
            avoids having idle workers in a "dirty" state interfering
            with pool termination

line_profiler/rc/line_profiler.toml
::[tool.line_profiler.child_processes.multiprocessing.patches]
    Removed item `child_pids`

setup.py
    Renamed entry point: `line_profiler._multiproc_patches.child_pids`
    -> `.__pool_worker_pid`

tests/test_child_procs.py
    _run_kernprof_main_in_process()
    _test_apply_mp_patches()
    test_apply_mp_patches_success()
        Removed references to the `child_pids` patch
tests/test_child_procs.py::test_apply_mp_patches_failure()
    Removed `@pytest.mark.retry` marker

tests/conftest.py, tests/test_child_procs.py
    Removed because `@pytest.mark.retry` is no longer used anywhere
@TTsangSC

TTsangSC commented Jun 26, 2026

Collaborator Author

The lint job failed on ty with 5 diagnostics, but they are all warnings (redundant casts) and I don't think we're supposed to fail on those...

EDIT: of course they set error-on-warning to true in 0.0.52.

@Erotemic

Erotemic commented Jun 26, 2026

Member

Yeah, I'm going to either pin ty or use a more stable linter for CI. Don't worry much about that.

Let's take this approach.

Let's clearly define the boundary between the cases that we guarantee will work, and the unsupported cases.

It feels like there are some fundamental limitations.

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored

https://man7.org/linux/man-pages/man7/signal.7.html
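
The limitation quoted above is easy to check from Python on a POSIX system. The helper name `can_install_handler()` is illustrative only.

```python
import signal


def can_install_handler(signum):
    """Try to install a no-op Python handler for `signum`, then restore the
    previous one. Must be called from the main thread."""
    try:
        previous = signal.signal(signum, lambda signum, frame: None)
    except (OSError, ValueError):
        return False  # the OS (or Python) refused to install the handler
    if previous is not None:
        signal.signal(signum, previous)
    return True


catchable = {
    name: can_install_handler(getattr(signal, name))
    for name in ('SIGTERM', 'SIGKILL', 'SIGSTOP')
}
```

So even a perfectly reliable SIGTERM handler would leave SIGKILL-ed children without a chance to write their stats, which is one natural boundary for the "unsupported" cases.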

@TTsangSC

Collaborator Author

Yeah at this point I think I have to accept defeat, that:

  • Per-task profiling (as implemented in the pool patch) is the only way to consistently profile Pool workloads (i.e. "tasks");
  • A worker process can arbitrarily hang between tasks, probably because it is waiting to acquire a lock on a shared queue, which is either held by the parent as a part of pool termination, or maybe by another worker which got terminated before it can properly release the lock.
  • We probably shouldn't try to block/defer the call to BaseProcess.terminate() (as implemented in the process patch) as it's bound to either be nondeterministic or cause hangs.
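
As a sketch of what per-task writing amounts to (not the pool patch itself): the wrapper below dumps the accumulated stats after every task, so a worker that later hangs or gets terminated between tasks has already persisted everything up to its last completed task. `write_per_task()`, the JSON format, and the `stats` dict are stand-ins for the real stats dump.

```python
import functools
import json
import os
import tempfile


def write_per_task(task, get_stats, path):
    """Wrap `task` so that the current stats are written to `path` after
    each call, whether the task succeeded or raised."""
    @functools.wraps(task)
    def wrapper(*args, **kwargs):
        try:
            return task(*args, **kwargs)
        finally:
            # Write to a scratch file first and swap it in, so that a kill
            # mid-write cannot leave a truncated stats file behind
            scratch = f'{path}.{os.getpid()}.tmp'
            with open(scratch, 'w') as fobj:
                json.dump(get_stats(), fobj)
            os.replace(scratch, path)
    return wrapper


stats = {'tasks': 0}


def task(x):
    stats['tasks'] += 1
    return x * 2


with tempfile.TemporaryDirectory() as tmpdir:
    stats_file = os.path.join(tmpdir, 'worker-stats.json')
    wrapped = write_per_task(task, lambda: stats, stats_file)
    results = [wrapped(1), wrapped(2)]
    with open(stats_file) as fobj:
        written = json.load(fobj)
```

The cost is one write per task, which is the overhead traded for not depending on the worker's exit path.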

Which is a bummer, but I guess it's about time I learned, instead of continuing to burn CI compute on flaky tests and letting the PR sit in limbo. As such I'll probably have to:

  • Remove the subtests where we apply only the process patch but not the pool patch, because the underlying test script is ultimately using pool-based parallelism.
  • Keep the subtests that have both applied (i.e. the default behavior) to ensure that there are no adverse interactions between the two, and
  • Write new tests that directly create and manage BaseProcess objects without going through a Pool, and test that the process patch works as expected there.

Other remarks:

  • All the Linux failures in the last pipeline, however, happened on a single other test, and were caused by an edge case that I hadn't yet run into: an idle worker process (having run 0 tasks) which had nonetheless died naturally before Pool._terminate_pool() was called. Since the process wasn't alive, neither BaseProcess.terminate() nor .join() was called, and thus we didn't register that the process was idle and that we expected its profiling-stats file to be empty. I have a local fix where we patch Pool._terminate_pool() instead of Pool.Process() for the bookkeeping, which should fix the issues.
  • And the Windows job... unfortunately it errored out when processing the logs, which makes it a bit hard to figure out what exactly happened. Even more curious is how it failed on a *_success test (i.e. one where the parallel workload doesn't raise an error), while usually it's the *_failure tests where all hell breaks loose. Guess that we can fix the bare asserts and put in more helpful error messages.
  • The MacOS failure particularly stings because the offending tests have been run into the ground on my local. Trust me, I wouldn't have pushed if the tests were a tenth as flaky in my testing as they behave on CI. But alas, works-on-my-machine-itis is a thing...

TTsangSC added 8 commits June 27, 2026 21:43
line_profiler/_child_process_profiling/_cache_logging.py
    Updated assertions to emit more helpful error messages when they
    fail
line_profiler/_child_process_profiling/multiprocessing_patches
/_mandatory_patches.py
    wrap_process()
        Removed (superseded by `wrap_terminate_pool()`)
    wrap_terminate_pool()
        New wrapper for `multiprocessing.pool.Pool._terminate_pool()`
        for handling end-of-process bookkeeping, so that workers that
        exited before they can be `.terminate()`-ed or `.join()`-ed are
        also accounted for
    POOL_WORKER_PID_PATCH.targets
        Now patching `multiprocessing.pool.Pool._terminate_pool()`
        instead of `.Process`
line_profiler/_child_process_profiling/multiprocessing_patches
/_profiling_patches.py
    wrap_terminate()
        Removed because relying on timeouts is inherently unreliable
    PROCESS_PATCH.targets
        No longer patching
        `multiprocessing.process.BaseProcess.terminate()`
tests/test_child_procs/
    Separated `tests.py` into these files:
    - `_test_child_procs_utils.py`:
      Util constants, functions and classes used by the tests and the
      fixtures
    - `multiproc_examples/*.py`:
      Example scripts that are profiled
    - `conftest.py`:
      Fixtures and fixture functions
    - `test_child_procs.py`:
      Actual tests

TODO: Fixes; new tests for non-`Pool`-based parallelism
line_profiler/_child_process_profiling/multiprocessing_patches
/_profiling_patches.py
    wrap_process()
        New wrapper for `Pool.Process` that marks a worker process
    wrap_bootstrap()
        Now avoiding calling end-of-function cleanup for marked
        pool-worker processes, because they may be terminated mid-write
        to the profiling-stats file (which should already be written to
        on a per-task basis), causing data loss/corruption
    _get_terminate_poller(), _process_has_returned()
        Removed dead code
tests/test_child_procs/test_child_procs.py
    test_apply_mp_patches_pool_{failure,success}()
        - Renamed from `test_apply_mp_patches_*()`
        - Removed subtests for when only the `process` patch is applied
    test_pool_multiproc_script_sanity_check()
    test_running_pool_multiproc_script()
    test_profiling_pool_multiproc_script_{success,failure}()
        Renamed from resp. tests without the `pool_` infix

TODO: new tests for `BaseProcess`-based parallelism
tests/test_child_procs/multiproc_examples/
    external_module.py::split_workload()
        New function for generating parallel workloads (lists of
        consecutive numbers)
    pool_test_module.py
        my_local_sum()
            Removed superfluous call to `reverse()` (because pyutils#424 has
            been fixed)
        sum_in_child_procs()
            Now using `split_workload()`
        main()
            Now permitting `--start-method=dummy`
    process_test_module.py
        New test module for testing directly managing
        `multiprocessing.process.BaseProcess` instances and eschewing
        `multiprocessing.pool.Pool`

        Worker
            Wrapper around a `multiprocessing.process.BaseProcess` which
            allows for executing a callable in another process and
            retrieving the result
        my_local_sum(), sum_in_child_procs(), main()
            Functions and CLI with the same interface as those in
            `pool_test_module.py`
tests/test_child_procs/conftest.py
    process_test_module[_object]
        New fixture representing
        `tests/test_child_procs/multiproc_examples/process_test_module.py`
    ext_module_object, pool_test_module_object
        Updated docstrings

tests/test_child_procs/multiproc_examples/pool_test_module.py
    Fixed imports of `external_module` utilities, placing individual
    import targets in their own import statements (see issue pyutils#433)

tests/test_child_procs/multiproc_examples/process_test_module.py
    - Fixed import... (ditto)
    - Fixes to `Worker` when `start_method='dummy'`:
      - No longer errors out when `daemon` is passed to `Worker.new()`
        (see GH issue python/cpython#85716)
      - No longer re-raising caught exceptions (in dummy worker backed
        by thread objects) so that we don't get a
        `PytestUnhandledThreadExceptionWarning`

tests/test_child_procs/test_child_procs.py
    test_runpy_patches()
        Updated because of the additional import in `pool_test_module.py`
    test_apply_mp_patches_{success,failure}()
    test_multiproc_script_sanity_check()
    test_profiling_multiproc_script_{success,failure}()
        Refactored from the resp. tests infixed with `pool_`, so as to
        also test `process_test_module.py`
TTsangSC added 2 commits June 30, 2026 03:32
tests/test_child_procs/_test_child_procs_utils.py
    search_cache_logs()
        Updated error messages to include info about the `re.Match`
        objects (where appropriate)
    add_timeout(), CheckWarnings
        Fixed error-type representations in doctests

tests/test_child_procs/multiproc_examples/process_test_module.py
::Worker
    Added `xdoctest` skip directive to the doctest, because it gives a
    pickling error (for unknown reasons) when run with `xdoctest` but not
    `doctest`

XXX:
    Try to replicate and mitigate the last test failure, where
    `search_cache_logs()` seemed to be complaining about a nonexistent
    regex match
tests/test_child_procs/test_child_procs.py
::test_apply_mp_patches_{success,failure}()
    - Removed `add_timeout()` decorator around inner function because
      the code hindering `BaseProcess.terminate()` is no longer used
    - Removed warning suppression because we no longer expect the
      warning about forking in a multithreaded process
@TTsangSC TTsangSC force-pushed the profile-child-processes branch from bc3adb9 to 7737fbb Compare July 2, 2026 01:24
TTsangSC added 2 commits July 3, 2026 00:21
line_profiler/_child_process_profiling/multiprocessing_patches/
    __init__.py
        No longer exposing the `Poller` and `Timeout` names (as they no
        longer exist)
    _mandatory_patches.py
        PROCESS_SETUP_PATCH
            Renamed and refactored from `PROCESS_TERMINATION_PATCH`: no
            longer going through the polling-for-lock-file routine
            because it should not be necessary (setup code should have
            been run before moving onto the parallel workload thanks to
            the .pth file anyway)
        wrap_start(), wrap_terminate()
            Removed
    mp_config.py::MPConfig
        Removed the `.polling` field
    poller.py
        Removed

line_profiler/rc/line_profiler.toml
::[tool.line_profiler.child_processes.multiprocessing.polling]
    Removed subtable

setup.py
    Updated `line_profiler._multiproc_patches` plugin paths
tests/test_child_procs/_test_child_procs_utils.py::@add_timeout
    - Added optional `name` parameter so that the background thread can
      be retrieved for debug purposes
    - Fixed doctest so that it doesn't create a long-lived background
      thread which pollutes the thread pool and causes subsequent tests
      using `os.fork()` to issue the forking-multithreaded-process
      `DeprecationWarning`
@TTsangSC

TTsangSC commented Jul 3, 2026

Collaborator Author

Whew, now that we have no platform-dependent solutions and no poor attempts at synchronization, we should've torn all the wonky stuff out and it should be a lot more stable.

Couple things though:

  • While the current subtests shouldn't have (much) overlap between them, test_apply_mp_patches_*() and test_profiling_multiproc_script_*() do still consist of a ton of subtests (between testing (1) the mechanisms of parallelism (Pool vs self-managed Processes), (2) the multiprocessing start methods, (3) whether to use eager pre-imports, (4) which patches to apply, and (5) which kernprof mode to use (normal, -c, and -m)), and it may be worth looking into further cutting down on them while retaining coverage.
  • 7f35b11 contains the config update to shut ty up about redundant typing.cast(), the same as found in FIX: handling multi-target (from-)import statements #434 (58de3d0), and thus would make for a conflict when that is merged – assuming that the same change isn't already patched into main by a smaller PR, that is.

Erotemic added a commit to Erotemic/line_profiler that referenced this pull request Jul 5, 2026
…filing)

Full review of the --prof-child-procs feature branch plus a broader
repo/CI/test-suite audit. Identifies three merge-blocking defects, each
reproduced locally:

- fork-based children re-dump inherited parent stats, double-counting
  all pre-fork hits/times in the merged output
- a silently-failed child .pth hook deadlocks patched pools via a
  result-protocol mismatch
- unwritable site-packages hard-crashes kernprof instead of degrading

Includes a prioritized remediation plan with per-task acceptance
criteria, test-suite and packaging/xcookie findings, and pre-existing
core bugs found along the way.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Erotemic added a commit to Erotemic/line_profiler that referenced this pull request Jul 5, 2026
… docs

Splits the single PR pyutils#431 review into four dated documents:
- fable-review-pr431-2026-07-05.md: frozen PR-specific findings/evidence
- fable-review-fullrepo-2026-07-05.md: frozen repo-wide findings
- fable-pr431-plan-2026-07-05.md: live task board for the PR remediation
- fable-fullrepo-plan-2026-07-05.md: live task board for repo-wide work,
  including the new AST-pipeline robustness program and the .pyi-removal
  (modern inline annotations) program

The plan files are the coordination hub: agents claim tasks on the
status board, record outcomes in the agent log, and escalate decisions
via open questions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Erotemic

Erotemic commented Jul 5, 2026

Member

See: TTsangSC#5

