Very slow tests during profile generation

The recent Python releases, `3.13.13`, `3.14.4`, and `3.15.0a8`, are all slower when running tests for the profile generation (`--enable-optimizations`).

Comparisons of profile task tests "Total Duration":

| `--build` (cpu)                                | `3.14.3`     | `3.14.4`       |
|------------------------------------------------|--------------|----------------|
| `x86_64-linux-gnu` (12th Gen Intel i9)         | 56.3 sec     | 2 min 32 sec   |
| `i686-linux-gnu` (12th Gen Intel i9)           | 10 min 2 sec | 28 min 19 sec  |
| `x86_64-linux-gnu` (Intel Xeon Platinum 8488C) | 1 min 8 sec  | 3 min 21 sec   |
|                                                |              |                |
| `i686-linux-gnu` (Intel Xeon Platinum 8488C)   | 12 min       | 9 hour 27 min  |
| `i686-linux-gnu` (AMD EPYC 9R14)               |              | 27 min 36 sec  |
| `i686-linux-musl` (Intel Xeon Platinum 8488C)  |              | 27 hour 24 min |
|                                                |              |                |
| `aarch64-linux-gnu` (AWS Graviton3E)           | 1 min 35 sec | 3 min 28 sec   |
| `arm-linux-gnueabihf` (AWS Graviton3E)         | 2 min 53 sec | 5 min 5 sec    |
| `arm-linux-gnueabi` (AWS Graviton3E)           | 3 min 39 sec | 3 min 43 sec * |
| `powerpc64le-linux-gnu` (Power9)               | 3 min 10 sec | 12 min 10 sec  |
| `s390x-linux-gnu` (IBM/S390)                   | 2 min 5 sec  | 2 min 53 sec   |

\* warning: target does not support atomic profile update, single mode is selected


While this is mostly an inconvenience, it is really exacerbated on some CPUs and build architectures. As seen in the table, when compiled for `i686` on Linux (Debian or Alpine) the tests are extraordinarily slow on an Intel Xeon Platinum 8488C (AWS `c7i.2xlarge`). But it is not as bad if compiled and run on an AMD EPYC 9R14 (AWS `c7a.2xlarge`) with the same configure and make flags.

If built without `--enable-optimizations` on configure, then a manual test run is speedy (`LD_LIBRARY_PATH=/usr/src/python ./python -m test --pgo --timeout=`). My guess is that this is related to the addition of `-fprofile-update=atomic` from https://github.com/python/cpython/pull/145802.

I understand that 32-bit Linux is not a PEP-11 supported architecture and just hope that this is interesting enough to fix. If not, we can workaround this by dropping `--enable-optimizations` for `i386` like we do on Alpine-based images or perhaps changing the instance type that we use for building.

---

Full build setup (either build with `--platform=linux/i386` or swap `FROM` to `FROM i386/debian:trixie-slim`): [Dockerfile](https://github.com/docker-library/python/blob/ced4ac7ca9f8f8bdbb113f06fe02c42895875aa4/3.14/slim-trixie/Dockerfile).

`configure` and `make` invocations:

```console
$ ./configure --build=i686-linux-gnu --enable-loadable-sqlite-extensions --enable-optimizations --enable-option-checking=fatal --enable-shared --with-lto --with-ensurepip
...
$ make -j 8 EXTRA_CFLAGS=-'g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/usr/src/python=. -fstack-protector-strong -Wformat -Werror=format-security' LDFLAGS='-Wl,-z,relro -Wl,--strip-all'

$ # alpine uses a different build arg for configure: --build=i686-linux-musl
$ # and used a shorter make command:
$ make -j 8 'EXTRA_CFLAGS=-DTHREAD_STACK_SIZE=0x100000' 'LDFLAGS= -Wl,--strip-all'
```

Example fast run on AMD EPYC:

```console
+ make -j 8 EXTRA_CFLAGS=-g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/usr/src/python=. -fstack-protector-strong -Wformat -Werror=format-security LDFLAGS=-Wl,-z,relro -Wl,--strip-all
...
# Next, run the profile task to generate the profile information.
LD_LIBRARY_PATH=/usr/src/python ./python -m test --pgo --timeout=
Using random seed: 2335155114
0:00:00 load avg: 3.86 Run 43 tests sequentially in a single process
0:00:00 load avg: 3.86 [ 1/43] test_array
0:00:45 load avg: 2.35 [ 1/43] test_array passed in 45.7 sec
0:00:45 load avg: 2.35 [ 2/43] test_base64
0:00:52 load avg: 2.24 [ 2/43] test_base64 passed
0:00:52 load avg: 2.24 [ 3/43] test_binascii
0:00:55 load avg: 2.14 [ 3/43] test_binascii passed
0:00:55 load avg: 2.14 [ 4/43] test_binop
0:00:58 load avg: 2.14 [ 4/43] test_binop passed
0:00:58 load avg: 2.14 [ 5/43] test_bisect
0:01:01 load avg: 2.05 [ 5/43] test_bisect passed
0:01:01 load avg: 2.05 [ 6/43] test_bytes
0:03:54 load avg: 1.06 [ 6/43] test_bytes passed in 2 min 52 sec
0:03:54 load avg: 1.06 [ 7/43] test_bz2
0:04:02 load avg: 1.05 [ 7/43] test_bz2 passed
0:04:02 load avg: 1.05 [ 8/43] test_cmath
0:04:08 load avg: 1.05 [ 8/43] test_cmath passed
0:04:08 load avg: 1.05 [ 9/43] test_codecs
0:04:36 load avg: 1.03 [ 9/43] test_codecs passed
0:04:36 load avg: 1.03 [10/43] test_collections
0:04:57 load avg: 1.02 [10/43] test_collections passed
0:04:57 load avg: 1.02 [11/43] test_complex
0:05:04 load avg: 1.02 [11/43] test_complex passed
0:05:04 load avg: 1.02 [12/43] test_dataclasses
0:05:24 load avg: 1.01 [12/43] test_dataclasses passed
0:05:24 load avg: 1.01 [13/43] test_datetime
0:06:34 load avg: 1.48 [13/43] test_datetime passed in 1 min 9 sec
0:06:34 load avg: 1.48 [14/43] test_decimal
0:07:57 load avg: 1.30 [14/43] test_decimal passed in 1 min 23 sec
0:07:57 load avg: 1.30 [15/43] test_difflib
0:08:21 load avg: 1.20 [15/43] test_difflib passed
0:08:21 load avg: 1.20 [16/43] test_float
0:08:29 load avg: 1.18 [16/43] test_float passed
0:08:29 load avg: 1.18 [17/43] test_fstring
0:08:49 load avg: 1.13 [17/43] test_fstring passed
0:08:49 load avg: 1.13 [18/43] test_functools
0:09:54 load avg: 1.04 [18/43] test_functools passed in 1 min 5 sec
0:09:54 load avg: 1.04 [19/43] test_generators
0:10:11 load avg: 1.03 [19/43] test_generators passed
0:10:11 load avg: 1.03 [20/43] test_hashlib
0:10:22 load avg: 1.02 [20/43] test_hashlib passed
0:10:22 load avg: 1.02 [21/43] test_heapq
0:10:44 load avg: 1.02 [21/43] test_heapq passed
0:10:44 load avg: 1.02 [22/43] test_int
0:11:04 load avg: 1.01 [22/43] test_int passed
0:11:04 load avg: 1.01 [23/43] test_itertools
0:13:02 load avg: 1.00 [23/43] test_itertools passed in 1 min 57 sec
0:13:02 load avg: 1.00 [24/43] test_json
0:13:34 load avg: 1.00 [24/43] test_json passed in 31.9 sec
0:13:34 load avg: 1.00 [25/43] test_long
0:14:28 load avg: 1.00 [25/43] test_long passed in 54.1 sec
0:14:28 load avg: 1.00 [26/43] test_lzma
0:14:34 load avg: 1.00 [26/43] test_lzma passed
0:14:34 load avg: 1.00 [27/43] test_math
0:15:24 load avg: 1.00 [27/43] test_math passed in 50.3 sec
0:15:24 load avg: 1.00 [28/43] test_memoryview
0:15:59 load avg: 1.00 [28/43] test_memoryview passed in 34.2 sec
0:15:59 load avg: 1.00 [29/43] test_operator
0:16:04 load avg: 1.00 [29/43] test_operator passed
0:16:04 load avg: 1.00 [30/43] test_ordered_dict
0:16:56 load avg: 1.00 [30/43] test_ordered_dict passed in 51.8 sec
0:16:56 load avg: 1.00 [31/43] test_patma
0:17:04 load avg: 1.00 [31/43] test_patma passed
0:17:04 load avg: 1.00 [32/43] test_pickle
0:19:37 load avg: 1.00 [32/43] test_pickle passed in 2 min 32 sec
0:19:37 load avg: 1.00 [33/43] test_pprint
0:19:45 load avg: 1.00 [33/43] test_pprint passed
0:19:45 load avg: 1.00 [34/43] test_re
0:20:10 load avg: 1.08 [34/43] test_re passed
0:20:10 load avg: 1.08 [35/43] test_set
0:22:25 load avg: 1.01 [35/43] test_set passed in 2 min 15 sec
0:22:25 load avg: 1.01 [36/43] test_sqlite3
0:22:46 load avg: 1.00 [36/43] test_sqlite3 passed
0:22:46 load avg: 1.00 [37/43] test_statistics
0:23:56 load avg: 1.00 [37/43] test_statistics passed in 1 min 10 sec
0:23:56 load avg: 1.00 [38/43] test_str
0:25:29 load avg: 1.00 [38/43] test_str passed in 1 min 32 sec
0:25:29 load avg: 1.00 [39/43] test_struct
0:25:47 load avg: 1.00 [39/43] test_struct passed
0:25:47 load avg: 1.00 [40/43] test_tabnanny
0:25:58 load avg: 1.00 [40/43] test_tabnanny passed
0:25:58 load avg: 1.00 [41/43] test_time
0:26:08 load avg: 1.00 [41/43] test_time passed
0:26:08 load avg: 1.00 [42/43] test_xml_etree
0:26:46 load avg: 1.00 [42/43] test_xml_etree passed in 38.3 sec
0:26:46 load avg: 1.00 [43/43] test_xml_etree_c
0:27:36 load avg: 1.00 [43/43] test_xml_etree_c passed in 50.1 sec

Total duration: 27 min 36 sec
Total tests: run=9,827 skipped=260
Total test files: run=43/43
Result: SUCCESS
```

Example slow run on Intel Xeon:

```console
+ make -j 8 EXTRA_CFLAGS=-g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/usr/src/python=. -fstack-protector-strong -Wformat -Werror=format-security LDFLAGS=-Wl,-z,relro -Wl,--strip-all
...
# Next, run the profile task to generate the profile information.
LD_LIBRARY_PATH=/usr/src/python ./python -m test --pgo --timeout=
Using random seed: 2791744646
0:00:00 load avg: 1.03 Run 43 tests sequentially in a single process
0:00:00 load avg: 1.03 [ 1/43] test_array
0:22:04 load avg: 1.00 [ 1/43] test_array passed in 22 min 4 sec
0:22:04 load avg: 1.00 [ 2/43] test_base64
0:25:48 load avg: 1.00 [ 2/43] test_base64 passed in 3 min 43 sec
0:25:48 load avg: 1.00 [ 3/43] test_binascii
0:27:04 load avg: 1.00 [ 3/43] test_binascii passed in 1 min 15 sec
0:27:04 load avg: 1.00 [ 4/43] test_binop
0:28:11 load avg: 1.00 [ 4/43] test_binop passed in 1 min 7 sec
0:28:11 load avg: 1.00 [ 5/43] test_bisect
0:29:26 load avg: 1.00 [ 5/43] test_bisect passed in 1 min 14 sec
0:29:26 load avg: 1.00 [ 6/43] test_bytes
1:26:49 load avg: 1.00 [ 6/43] test_bytes passed in 57 min 22 sec
1:26:49 load avg: 1.00 [ 7/43] test_bz2
1:29:52 load avg: 1.00 [ 7/43] test_bz2 passed in 3 min 2 sec
1:29:52 load avg: 1.00 [ 8/43] test_cmath
1:31:42 load avg: 1.06 [ 8/43] test_cmath passed in 1 min 50 sec
1:31:42 load avg: 1.06 [ 9/43] test_codecs
1:38:25 load avg: 1.00 [ 9/43] test_codecs passed in 6 min 43 sec
1:38:25 load avg: 1.00 [10/43] test_collections
1:42:24 load avg: 1.00 [10/43] test_collections passed in 3 min 58 sec
1:42:24 load avg: 1.00 [11/43] test_complex
1:47:13 load avg: 1.00 [11/43] test_complex passed in 4 min 49 sec
1:47:13 load avg: 1.00 [12/43] test_dataclasses
1:53:54 load avg: 1.00 [12/43] test_dataclasses passed in 6 min 40 sec
1:53:54 load avg: 1.00 [13/43] test_datetime
2:15:44 load avg: 1.03 [13/43] test_datetime passed in 21 min 50 sec
2:15:44 load avg: 1.03 [14/43] test_decimal
2:39:28 load avg: 1.00 [14/43] test_decimal passed in 23 min 44 sec
2:39:28 load avg: 1.00 [15/43] test_difflib
2:41:48 load avg: 1.00 [15/43] test_difflib passed in 2 min 19 sec
2:41:48 load avg: 1.00 [16/43] test_float
2:44:24 load avg: 1.00 [16/43] test_float passed in 2 min 36 sec
2:44:24 load avg: 1.00 [17/43] test_fstring
2:49:09 load avg: 1.05 [17/43] test_fstring passed in 4 min 45 sec
2:49:09 load avg: 1.05 [18/43] test_functools
3:07:18 load avg: 1.00 [18/43] test_functools passed in 18 min 8 sec
3:07:18 load avg: 1.00 [19/43] test_generators
3:14:05 load avg: 1.00 [19/43] test_generators passed in 6 min 47 sec
3:14:05 load avg: 1.00 [20/43] test_hashlib
3:17:40 load avg: 1.00 [20/43] test_hashlib passed in 3 min 34 sec
3:17:40 load avg: 1.00 [21/43] test_heapq
3:28:38 load avg: 1.00 [21/43] test_heapq passed in 10 min 58 sec
3:28:38 load avg: 1.00 [22/43] test_int
3:48:18 load avg: 1.00 [22/43] test_int passed in 19 min 39 sec
3:48:18 load avg: 1.00 [23/43] test_itertools
4:10:19 load avg: 1.00 [23/43] test_itertools passed in 22 min 1 sec
4:10:19 load avg: 1.00 [24/43] test_json
4:18:34 load avg: 1.00 [24/43] test_json passed in 8 min 14 sec
4:18:34 load avg: 1.00 [25/43] test_long
4:25:29 load avg: 1.00 [25/43] test_long passed in 6 min 54 sec
4:25:29 load avg: 1.00 [26/43] test_lzma
4:27:52 load avg: 1.00 [26/43] test_lzma passed in 2 min 22 sec
4:27:52 load avg: 1.00 [27/43] test_math
4:39:35 load avg: 1.00 [27/43] test_math passed in 11 min 42 sec
4:39:35 load avg: 1.00 [28/43] test_memoryview
4:53:47 load avg: 1.00 [28/43] test_memoryview passed in 14 min 12 sec
4:53:47 load avg: 1.00 [29/43] test_operator
4:55:55 load avg: 1.00 [29/43] test_operator passed in 2 min 7 sec
4:55:55 load avg: 1.00 [30/43] test_ordered_dict
5:19:04 load avg: 1.00 [30/43] test_ordered_dict passed in 23 min 8 sec
5:19:04 load avg: 1.00 [31/43] test_patma
5:21:45 load avg: 1.00 [31/43] test_patma passed in 2 min 41 sec
5:21:45 load avg: 1.00 [32/43] test_pickle
6:29:32 load avg: 1.00 [32/43] test_pickle passed in 1 hour 7 min
6:29:32 load avg: 1.00 [33/43] test_pprint
6:33:25 load avg: 1.01 [33/43] test_pprint passed in 3 min 52 sec
6:33:25 load avg: 1.01 [34/43] test_re
6:45:49 load avg: 1.03 [34/43] test_re passed in 12 min 24 sec
6:45:49 load avg: 1.03 [35/43] test_set
7:19:18 load avg: 1.00 [35/43] test_set passed in 33 min 28 sec
7:19:18 load avg: 1.00 [36/43] test_sqlite3
test_sqlite3: testing with SQLite version 3.46.1
...
test_ctx_mgr_rollback_if_commit_failed (test.test_sqlite3.test_dbapi.MultiprocessTests.test_ctx_mgr_rollback_if_commit_failed) ... ERROR
...

======================================================================
ERROR: test_ctx_mgr_rollback_if_commit_failed (test.test_sqlite3.test_dbapi.MultiprocessTests.test_ctx_mgr_rollback_if_commit_failed)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/python/Lib/test/test_sqlite3/test_dbapi.py", line 1972, in test_ctx_mgr_rollback_if_commit_failed
    proc.communicate(input="end", timeout=SHORT_TIMEOUT)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/subprocess.py", line 1221, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
                     ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/subprocess.py", line 2154, in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/python/Lib/subprocess.py", line 1268, in _check_timeout
    raise TimeoutExpired(
    ...<2 lines>...
            stderr=b''.join(stderr_seq) if stderr_seq else None)
subprocess.TimeoutExpired: Command '['/usr/src/python/python', '-c', 'if 1:\n            import sqlite3\n            def wait():\n                print("started")\n                assert "database is locked" in input()\n\n            cx = sqlite3.connect("@test_14264_tmpæ", timeout=0)\n            cx.create_function("wait", 0, wait)\n            with cx:\n                cx.execute("create table t(t)")\n            try:\n                # execute two transactions; both will try to lock the db\n                cx.executescript(\'\'\'\n                    -- start a transaction and wait for parent\n                    begin transaction;\n                    select * from t;\n                    select wait();\n                    rollback;\n\n                    -- start a new transaction; would fail if parent holds lock\n                    begin transaction;\n                    select * from t;\n                    rollback;\n                \'\'\')\n            finally:\n                cx.close()\n        ']' timed out after 30.0 seconds

----------------------------------------------------------------------
Ran 503 tests in 385.278s

FAILED (errors=1, skipped=4)
test test_sqlite3 failed
7:28:56 load avg: 1.00 [36/43] test_sqlite3 failed (1 error) in 9 min 38 sec
7:28:56 load avg: 1.00 [37/43] test_statistics
7:56:12 load avg: 1.00 [37/43] test_statistics passed in 27 min 15 sec
7:56:12 load avg: 1.00 [38/43] test_str
8:30:31 load avg: 1.00 [38/43] test_str passed in 34 min 19 sec
8:30:31 load avg: 1.00 [39/43] test_struct
8:38:50 load avg: 1.01 [39/43] test_struct passed in 8 min 18 sec
8:38:50 load avg: 1.01 [40/43] test_tabnanny
8:43:37 load avg: 1.00 [40/43] test_tabnanny passed in 4 min 46 sec
8:43:37 load avg: 1.00 [41/43] test_time
8:46:45 load avg: 0.99 [41/43] test_time passed in 3 min 8 sec
8:46:45 load avg: 0.99 [42/43] test_xml_etree
9:03:20 load avg: 1.04 [42/43] test_xml_etree passed in 16 min 35 sec
9:03:20 load avg: 1.04 [43/43] test_xml_etree_c
9:27:26 load avg: 1.00 [43/43] test_xml_etree_c passed in 24 min 5 sec

Total duration: 9 hour 27 min
Total tests: run=9,827 skipped=260
Total test files: run=43/43 failed=1
Result: FAILURE
```

The sqlite random test failure is likely irrelevant, but kept for completeness.

----

Aborted full build logs (auto cancelled after 3 hours, but you can see the slow tests at the end):
 - 3.15.0a8 (Debian Bookworm): https://doi-janky.infosiftr.net/job/meta/view/builds/job/i386/job/build/16353/consoleFull
 - 3.14.4: https://doi-janky.infosiftr.net/job/meta/view/builds/job/i386/job/build/16357/consoleFull
 - 3.13.13: https://doi-janky.infosiftr.net/job/meta/view/builds/job/i386/job/build/16338/consoleFull


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Very slow tests during profile generation #148535

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

`--build` (cpu)	`3.14.3`	`3.14.4`
`x86_64-linux-gnu` (12th Gen Intel i9)	56.3 sec	2 min 32 sec
`i686-linux-gnu` (12th Gen Intel i9)	10 min 2 sec	28 min 19 sec
`x86_64-linux-gnu` (Intel Xeon Platinum 8488C)	1 min 8 sec	3 min 21 sec

`i686-linux-gnu` (Intel Xeon Platinum 8488C)	12 min	9 hour 27 min
`i686-linux-gnu` (AMD EPYC 9R14)		27 min 36 sec
`i686-linux-musl` (Intel Xeon Platinum 8488C)		27 hour 24 min

`aarch64-linux-gnu` (AWS Graviton3E)	1 min 35 sec	3 min 28 sec
`arm-linux-gnueabihf` (AWS Graviton3E)	2 min 53 sec	5 min 5 sec
`arm-linux-gnueabi` (AWS Graviton3E)	3 min 39 sec	3 min 43 sec *
`powerpc64le-linux-gnu` (Power9)	3 min 10 sec	12 min 10 sec
`s390x-linux-gnu` (IBM/S390)	2 min 5 sec	2 min 53 sec

Uh oh!

Very slow tests during profile generation #148535

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions