Significant performance degradation in dpctl.tensor.sum #1461

@antonwolfy

After #1446 was merged, dpctl.tensor.sum became significantly slower, by roughly 350x in the measurements below (observed while running the L2-norm benchmark for dpnp on PVC).
Before the PR:

import dpctl, dpctl.tensor as dpt, numpy

dpctl.__version__
# Out: '0.15.1dev0+62.g2eba93eac'

sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.67 ms ± 9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.64 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

After the PR:

import dpctl, dpctl.tensor as dpt, numpy

dpctl.__version__
# Out: '0.15.1dev0+63.g03fd73794'

sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 6.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
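
To put the two sets of timings side by side, here is a small sketch that derives the slowdown factor and a rough effective memory bandwidth from the numbers reported above. The helper names are hypothetical, and the bandwidth figure assumes the input array is read once and the result written once, which may not match what the kernel actually does.

```python
# Timings copied from the %timeit output in this report (first run of each).
T_BEFORE_S = 6.67e-3  # dpctl 0.15.1dev0+62.g2eba93eac
T_AFTER_S = 2.35      # dpctl 0.15.1dev0+63.g03fd73794

ROWS, COLS = 134217728, 3  # shape used in the reproducer
ITEMSIZE = 4               # bytes per float32 element


def slowdown(before_s, after_s):
    """Return how many times slower the second timing is than the first."""
    return after_s / before_s


def effective_bandwidth_gb_s(elapsed_s):
    """Rough bandwidth estimate in GB/s.

    Assumption: every input element is read once and every output
    element (one per row for axis=1) is written once.
    """
    bytes_moved = (ROWS * COLS + ROWS) * ITEMSIZE
    return bytes_moved / elapsed_s / 1e9


print(f"slowdown: {slowdown(T_BEFORE_S, T_AFTER_S):.0f}x")
print(f"before:   {effective_bandwidth_gb_s(T_BEFORE_S):.1f} GB/s")
print(f"after:    {effective_bandwidth_gb_s(T_AFTER_S):.2f} GB/s")
```

Under that assumption the reduction drops from hundreds of GB/s to below 1 GB/s of effective throughput.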

Devices info:

$ python -m dpctl -f
Platform  0 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 LINUX
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Xeon(R) Platinum 8469 CPU @2.00GHz
        Version             2023.16.6.0.22_223734
        Filter string       opencl:cpu:0
Platform  1 ::
    Name        Intel(R) OpenCL Graphics
    Version     OpenCL 3.0
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Data Center GPU Max 1100
        Version             23.35.27191.25
        Filter string       opencl:gpu:0
Platform  2 ::
    Name        Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Version     OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) FPGA Emulation Device
        Version             2023.16.6.0.22_223734
        Filter string       opencl:accelerator:0
Platform  3 ::
    Name        Intel(R) Level-Zero
    Version     1.3
    Vendor      Intel(R) Corporation
    Backend     ext_oneapi_level_zero
    Num Devices 1
      # 0
        Name                Intel(R) Data Center GPU Max 1100
        Version             1.3.27191
        Filter string       level_zero:gpu:0

Host info:

$ uname -a
Linux DUT7050PVC 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
