IcoswISC240_WOA23_performance_test failing on Chicoma #882

@altheaden

Description

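IcoswISC240_WOA23_performance_test is failing on Chicoma. The prognostic_ice_shelf_melt step fails when ocean_model calls MPI_Abort on rank 0, and the srun command returns a non-zero exit status.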
Test log:

compass calling: compass.ocean.tests.global_ocean.performance_test.PerformanceTest.run()
  inherited from: compass.testcase.TestCase.run()
  in /users/althea/code/compass/main/compass/testcase.py

compass calling: compass.run.serial._run_test()
  in /users/althea/code/compass/main/compass/run/serial.py

Running steps:
  prognostic_ice_shelf_melt
  data_ice_shelf_melt

  * step: prognostic_ice_shelf_melt

compass calling: compass.ocean.tests.global_ocean.forward.ForwardStep.runtime_setup()
  in /users/althea/code/compass/main/compass/ocean/tests/global_ocean/forward.py

Warning: replacing namelist options in namelist.ocean
config_dt = 02:00:00
config_btr_dt = 00:06:00

compass calling: compass.ocean.tests.global_ocean.forward.ForwardStep.run()
  in /users/althea/code/compass/main/compass/ocean/tests/global_ocean/forward.py

Warning: replacing namelist options in namelist.ocean
config_pio_num_iotasks = 1
config_pio_stride = 36
Running: gpmetis graph.info 36
******************************************************************************
METIS 5.0 Copyright 1998-13, Regents of the University of Minnesota
 (HEAD: , Built on: Jan  8 2025, 16:43:49)
 size of idx_t: 64bits, real_t: 64bits, idx_t *: 64bits

Graph Information -----------------------------------------------------------
 Name: graph.info, #Vertices: 7301, #Edges: 21002, #Parts: 36

Options ---------------------------------------------------------------------
 ptype=kway, objtype=cut, ctype=shem, rtype=greedy, iptype=metisrb
 dbglvl=0, ufactor=1.030, no2hop=NO, minconn=NO, contig=NO, nooutput=NO
 seed=-1, niter=10, ncuts=1

Direct k-way Partitioning ---------------------------------------------------
 - Edgecut: 1446, communication volume: 1535.

 - Balance:
     constraint #0:  1.026 out of 0.005

 - Most overweight partition:
     pid: 25, actual: 208, desired: 202, ratio: 1.03.

 - Subdomain connectivity: max: 6, min: 2, avg: 4.33

 - Each partition is contiguous.

Timing Information ----------------------------------------------------------
  I/O:          		   0.004 sec
  Partitioning: 		   0.016 sec   (METIS time)
  Reporting:    		   0.001 sec

Memory Information ----------------------------------------------------------
  Max memory used:		   1.575 MB
******************************************************************************

Running: srun -c 1 -N 1 -n 36 ./ocean_model -n namelist.ocean -s streams.ocean
PE 0: MPICH processor detected:
PE 0:   AMD Rome (23:49:0) (family:model:stepping)
MPI VERSION    : CRAY MPICH version 8.1.28.29 (ANL base 3.4a2)
MPI BUILD INFO : Wed Nov 15 20:57 2023 (git hash 1cde46f) (CH4)
PE 0: MPICH environment settings =====================================
PE 0:   MPICH_ENV_DISPLAY                              = 1
PE 0:   MPICH_VERSION_DISPLAY                          = 1
PE 0:   MPICH_ABORT_ON_ERROR                           = 0
PE 0:   MPICH_CPUMASK_DISPLAY                          = 0
PE 0:   MPICH_STATS_DISPLAY                            = 0
PE 0:   MPICH_RANK_REORDER_METHOD                      = 1
PE 0:   MPICH_RANK_REORDER_DISPLAY                     = 0
PE 0:   MPICH_MEMCPY_MEM_CHECK                         = 0
PE 0:   MPICH_USE_SYSTEM_MEMCPY                        = 0
PE 0:   MPICH_OPTIMIZED_MEMCPY                         = 1
PE 0:   MPICH_ALLOC_MEM_PG_SZ                          = 4096
PE 0:   MPICH_ALLOC_MEM_POLICY                         = PREFERRED
PE 0:   MPICH_ALLOC_MEM_AFFINITY                       = SYS_DEFAULT
PE 0:   MPICH_MALLOC_FALLBACK                          = 0
PE 0:   MPICH_MEM_DEBUG_FNAME                          = 
PE 0:   MPICH_INTERNAL_MEM_AFFINITY                    = SYS_DEFAULT
PE 0:   MPICH_NO_BUFFER_ALIAS_CHECK                    = 0
PE 0:   MPICH_COLL_SYNC                                = MPI_Bcast
PE 0:   MPICH_SINGLE_HOST_ENABLED                        = 1
PE 0:   MPICH_USE_PERSISTENT_TOPS                      = 0
PE 0:   MPICH_DISABLE_PERSISTENT_RECV_TOPS             = 0
PE 0:   MPICH_MAX_TOPS_COUNTERS                        = 0
PE 0:   MPICH_ENABLE_ACTIVE_WAIT                       = 0
PE 0: MPICH/RMA environment settings =================================
PE 0:   MPICH_RMA_MAX_PENDING                          = 128
PE 0:   MPICH_RMA_SHM_ACCUMULATE                       = 0
PE 0: MPICH/Dynamic Process Management environment settings ==========
PE 0:   MPICH_DPM_DIR                                  = 
PE 0:   MPICH_LOCAL_SPAWN_SERVER                       = 0
PE 0:   MPICH_SPAWN_USE_RANKPOOL                       = 0
PE 0: MPICH/SMP environment settings =================================
PE 0:   MPICH_SMP_SINGLE_COPY_MODE                     = XPMEM
PE 0:   MPICH_SMP_SINGLE_COPY_SIZE                     = 8192
PE 0:   MPICH_SHM_PROGRESS_MAX_BATCH_SIZE              = 8
PE 0: MPICH/COLLECTIVE environment settings ==========================
PE 0:   MPICH_COLL_OPT_OFF                             = 0
PE 0:   MPICH_BCAST_ONLY_TREE                          = 1
PE 0:   MPICH_BCAST_INTERNODE_RADIX                    = 4
PE 0:   MPICH_BCAST_INTRANODE_RADIX                    = 4
PE 0:   MPICH_ALLTOALL_SHORT_MSG                       = 64-512
PE 0:   MPICH_ALLTOALL_SYNC_FREQ                       = 1-24
PE 0:   MPICH_ALLTOALLV_THROTTLE                       = 8
PE 0:   MPICH_ALLGATHER_VSHORT_MSG                     = 1024-4096
PE 0:   MPICH_ALLGATHERV_VSHORT_MSG                    = 1024-4096
PE 0:   MPICH_GATHERV_SHORT_MSG                        = 131072
PE 0:   MPICH_GATHERV_MIN_COMM_SIZE                    = 64
PE 0:   MPICH_GATHERV_MAX_TMP_SIZE                     = 536870912
PE 0:   MPICH_GATHERV_SYNC_FREQ                        = 16
PE 0:   MPICH_IGATHERV_MIN_COMM_SIZE                   = 1000
PE 0:   MPICH_IGATHERV_SYNC_FREQ                       = 100
PE 0:   MPICH_IGATHERV_RAND_COMMSIZE                   = 2048
PE 0:   MPICH_IGATHERV_RAND_RECVLIST                   = 0
PE 0:   MPICH_SCATTERV_SHORT_MSG                       = 2048-8192
PE 0:   MPICH_SCATTERV_MIN_COMM_SIZE                   = 64
PE 0:   MPICH_SCATTERV_MAX_TMP_SIZE                    = 536870912
PE 0:   MPICH_SCATTERV_SYNC_FREQ                       = 16
PE 0:   MPICH_SCATTERV_SYNCHRONOUS                     = 0
PE 0:   MPICH_ALLREDUCE_MAX_SMP_SIZE                   = 262144
PE 0:   MPICH_ALLREDUCE_BLK_SIZE                       = 716800
PE 0:   MPICH_GPU_ALLGATHER_VSHORT_MSG_ALGORITHM       = 1
PE 0:   MPICH_GPU_ALLREDUCE_USE_KERNEL                 = 0
PE 0:   MPICH_GPU_COLL_STAGING_BUF_SIZE                = 1048576
PE 0:   MPICH_GPU_ALLREDUCE_STAGING_THRESHOLD          = 256
PE 0:   MPICH_ALLREDUCE_NO_SMP                         = 0
PE 0:   MPICH_REDUCE_NO_SMP                            = 0
PE 0:   MPICH_REDUCE_SCATTER_COMMUTATIVE_LONG_MSG_SIZE = 524288
PE 0:   MPICH_REDUCE_SCATTER_MAX_COMMSIZE              = 1000
PE 0:   MPICH_SHARED_MEM_COLL_OPT                      = 1
PE 0:   MPICH_SHARED_MEM_COLL_NCELLS                   = 8
PE 0:   MPICH_SHARED_MEM_COLL_CELLSZ                   = 256
PE 0: MPICH MPIIO environment settings ===============================
PE 0:   MPICH_MPIIO_HINTS_DISPLAY                      = 0
PE 0:   MPICH_MPIIO_HINTS                              = NULL
PE 0:   MPICH_MPIIO_ABORT_ON_RW_ERROR                  = disable
PE 0:   MPICH_MPIIO_CB_ALIGN                           = 2
PE 0:   MPICH_MPIIO_DVS_MAXNODES                       = -1
PE 0:   MPICH_MPIIO_AGGREGATOR_PLACEMENT_DISPLAY       = 0
PE 0:   MPICH_MPIIO_AGGREGATOR_PLACEMENT_STRIDE        = -1
PE 0:   MPICH_MPIIO_MAX_NUM_IRECV                      = 50
PE 0:   MPICH_MPIIO_MAX_NUM_ISEND                      = 50
PE 0:   MPICH_MPIIO_MAX_SIZE_ISEND                     = 10485760
PE 0:   MPICH_MPIIO_OFI_STARTUP_CONNECT                = disable
PE 0:   MPICH_MPIIO_OFI_STARTUP_NODES_AGGREGATOR        = 2
PE 0: MPICH MPIIO statistics environment settings ====================
PE 0:   MPICH_MPIIO_STATS                              = 0
PE 0:   MPICH_MPIIO_TIMERS                             = 0
PE 0:   MPICH_MPIIO_WRITE_EXIT_BARRIER                 = 1
PE 0: MPICH Thread Safety settings ===================================
PE 0:   MPICH_ASYNC_PROGRESS                           = 0
PE 0:   MPICH_OPT_THREAD_SYNC                          = 1
PE 0:   rank 0 required = funneled, was provided = funneled
MPICH ERROR [Rank 0] [job id 21208684.35] [Fri Jan 10 09:53:02 2025] [nid001265] - Abort(1734831948) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1734831948) - process 0

aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1734831948) - process 0
srun: error: nid001265: task 0: Exited with exit code 255
srun: Terminating StepId=21208684.35
slurmstepd: error: *** STEP 21208684.35 ON nid001265 CANCELLED AT 2025-01-10T09:53:02 ***
srun: error: nid001265: tasks 1-35: Terminated
srun: Force Terminated StepId=21208684.35

      Failed
Exception raised while running the steps of the test case
Traceback (most recent call last):
  File "/users/althea/code/compass/main/compass/run/serial.py", line 322, in _log_and_run_test
    _run_test(test_case, available_resources)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/althea/code/compass/main/compass/run/serial.py", line 419, in _run_test
    _run_step(test_case, step, test_case.new_step_log_file,
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              available_resources)
              ^^^^^^^^^^^^^^^^^^^^
  File "/users/althea/code/compass/main/compass/run/serial.py", line 470, in _run_step
    step.run()
    ~~~~~~~~^^
  File "/users/althea/code/compass/main/compass/ocean/tests/global_ocean/forward.py", line 224, in run
    run_model(self, update_pio=update_pio)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/althea/code/compass/main/compass/model.py", line 60, in run_model
    run_command(args=args, cpus_per_task=cpus_per_task, ntasks=ntasks,
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                openmp_threads=openmp_threads, config=config, logger=logger)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/althea/code/compass/main/compass/parallel.py", line 149, in run_command
    check_call(command_line_args, logger, env=env)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/althea/miniforge3/envs/dev_compass_1.7.0-alpha.1/lib/python3.13/site-packages/mpas_tools/logging.py", line 59, in check_call
    raise subprocess.CalledProcessError(process.returncode,
                                        print_args)
subprocess.CalledProcessError: Command 'srun -c 1 -N 1 -n 36 ./ocean_model -n namelist.ocean -s streams.ocean' returned non-zero exit status 143.
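
A note on reading the final status: the traceback reports exit status 143, while the srun output shows task 0 exiting with code 255 after MPI_Abort. Under the common 128 + signal-number convention, 143 corresponds to SIGTERM (15), which would be consistent with srun terminating tasks 1-35 after rank 0 aborted. If that reading is right, the 143 reflects the cleanup rather than the original failure. A minimal Python sketch of that decoding, where describe_exit_status is a hypothetical helper and not part of compass or mpas_tools:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a process exit status using the 128 + signal convention.

    Hypothetical helper for illustration only. A status above 128 is
    treated as "killed by signal (status - 128)" when that number is a
    signal known to this platform; anything else is reported as a plain
    exit code.
    """
    if status > 128:
        try:
            name = signal.Signals(status - 128).name
        except ValueError:
            # Not a known signal number, so treat it as an ordinary exit code.
            return f"exited with code {status}"
        return f"terminated by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 143 is the status in the CalledProcessError; 255 is task 0's exit code.
    for status in (143, 255):
        print(status, "->", describe_exit_status(status))
```

The convention is ambiguous in general, since a program can also return a value above 128 on its own, so this is a hint for reading the log and not a definitive diagnosis.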

Labels: bug
