Segmentation fault after successful transfer #3

@drewhemm

Output on server:

# hdrdmacp -s -n 32 -m 8GB
Looking for IB devices ...

=============================================
Found 1 devices
---------------------------------------------
   device 0 : mlx4_0 : uverbs0 : IB : InfiniBand channel adapter : Num. ports=2 : port num=1 : lid=2
=============================================

Device mlx4_0 opened. num_comp_vectors=32
Port attributes:
           state: 4
         max_mtu: 5
      active_mtu: 5
  port_cap_flags: 38865002
      max_msg_sz: 1073741824
    active_width: 2
    active_speed: 4
      phys_state: 5
      link_layer: 1
buff_len_GB: 8
num_buff_sections: 32
We got this far...
Created 32 buffers of 250MB (8GB total)
Listening for connections on port ... 10470
=== [10 sec avg.] 0 GB/s  --  0 TB total received
=== [10 sec avg.] 0 GB/s  --  0 TB total received
=== [10 sec avg.] 0 GB/s  --  0 TB total received
Receiving file: /root/windows.iso
hi->flags: 0x1


Message from syslogd@HOSTNAME at May  9 16:45:46 ...
 kernel:[15768.841039] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]
^C^C^C # tried to exit the program here, but to no avail
Message from syslogd@HOSTNAME at May  9 16:46:14 ...
 kernel:[15796.840997] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]
Segmentation fault

Output on client:

# ./hdrdmacp windows.iso 192.168.19.1:/root/windows.iso
Looking for IB devices ...

=============================================
Found 1 devices
---------------------------------------------
   device 0 : mlx4_0 : uverbs0 : IB : InfiniBand channel adapter : Num. ports=2 : port num=1 : lid=1
=============================================

Device mlx4_0 opened. num_comp_vectors=96
Port attributes:
           state: 4
         max_mtu: 5
      active_mtu: 5
  port_cap_flags: 38865000
      max_msg_sz: 1073741824
    active_width: 2
    active_speed: 4
      phys_state: 5
      link_layer: 1
Created 4 buffers of 250MB (1GB total)
IP address: 192.168.19.1 (192.168.19.1)
Connected to 192.168.19.1:10470
Sending file: windows.iso-> (192.168.19.1:)/root/windows.iso   (5.50971 GB)
  queued 9MB (5509/5509 MB -- 100%  - 11.3267 Gbps)   ps)
  waiting for final 1 transfers to complete ...
  Transferred 5.50971 GB in 2.71587 sec  (16.2297 Gbps)
  I/O rate reading from file: 1.65955 sec  (26.56 Gbps)

Transfer from the client side looked good and I checked that the file size and md5sum of the destination file matched the source. If I can find a solution for the seg fault, I'll be a happy man!
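For anyone repeating the size and md5sum check described above, here is a minimal sketch in Python. The helper names `size_and_md5` and `files_match` are made up for illustration and are not part of hdrdmacp. It assumes the function can be run where the file lives; for a transfer between two machines, run `size_and_md5` on each host and compare the two results by hand.

```python
import hashlib
import os


def size_and_md5(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex_digest) for the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a multi-GB file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def files_match(source_path, dest_path):
    """True when both files have the same size and the same MD5 digest.

    Only usable when both paths are reachable from one machine
    (for example over a shared mount); that is an assumption here.
    """
    return size_and_md5(source_path) == size_and_md5(dest_path)
```

A matching size and digest only shows that the data landed intact on disk. It says nothing about the crash on the server side, which happens after the data has been written.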

It looks like the IB connection itself, as seen by opensm, was interrupted:

# opensm
-------------------------------------------------
OpenSM 5.7.2.MLNX20201014.9378048
Command Line Arguments:
 Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 5.7.2.MLNX20201014.9378048

Using default GUID 0x2c903004bfc0b
Entering DISCOVERING state

Entering MASTER state


Message from syslogd@HOSTNAME at May  9 16:45:46 ...
 kernel:[15768.841039] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]

Message from syslogd@HOSTNAME at May  9 16:46:14 ...
 kernel:[15796.840997] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [hdrdmacp:130454]
SM port is down

Entering DISCOVERING state
