JIT: Accelerate Vector.Dot for all base types by saucecontrol · Pull Request #111853 · dotnet/runtime · GitHub
JIT: Accelerate Vector.Dot for all base types #111853

Merged
EgorBo merged 7 commits into dotnet:main from saucecontrol:vdot on Mar 11, 2025
Conversation

@saucecontrol

@saucecontrol saucecontrol commented Jan 27, 2025

Member

Resolves #85207

  • Replaces the SSE4.1 fallback for long vector multiply with a faster SSE2 version, and removes the restrictions on the op_Multiply and MultiplyAddEstimate intrinsics, since these can now always be accelerated.
  • Removes AVX2 requirement for Vector256.Sum to be treated as intrinsic (only AVX instructions are used).
  • Removes restrictions on byte and long types so that Dot can be treated as intrinsic for all types.
  • Adds Vector512.Dot as intrinsic.
     
    Diffs look good. The only regressions are due to inlining or the slightly larger (but faster) SSE2 multiply code.
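
For readers unfamiliar with the APIs named in the list above, here is a minimal usage sketch of `Vector128.Dot` on `long` and `byte` elements and of `Vector512.Dot`, checked against a scalar loop. The class and method names (`DotSketch`, `ScalarDot`, `Dot128`, `DotBytes`, `Dot512`) are hypothetical and are not taken from the PR. It assumes .NET 8 or later, and it only shows the API surface: whether a given call is actually accelerated depends on the runtime version and the hardware.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class DotSketch
{
    // Scalar reference: sum of element-wise products, wrapping on overflow.
    public static long ScalarDot(long[] a, long[] b)
    {
        long sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            sum = unchecked(sum + a[i] * b[i]);
        }
        return sum;
    }

    // Dot product of two 2-element long vectors (arrays must hold at least 2 items).
    public static long Dot128(long[] a, long[] b) =>
        Vector128.Dot(Vector128.Create(a), Vector128.Create(b));

    // Dot product of two byte vectors with all 16 lanes set to x and y.
    public static byte DotBytes(byte x, byte y) =>
        Vector128.Dot(Vector128.Create(x), Vector128.Create(y));

    // Dot product of two 512-bit long vectors with all 8 lanes set to x and y.
    public static long Dot512(long x, long y) =>
        Vector512.Dot(Vector512.Create(x), Vector512.Create(y));

    public static void Main()
    {
        long[] a = { 3, -7 };
        long[] b = { 5, 11 };

        Console.WriteLine(Dot128(a, b) == ScalarDot(a, b));
        Console.WriteLine(DotBytes(3, 5));
        Console.WriteLine(Dot512(2, 3));
    }
}
```

The same calls work through the software fallback on hardware without the relevant instruction sets, so the results should not depend on which ISAs are enabled.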

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 27, 2025
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Jan 27, 2025
Comment thread src/coreclr/jit/gentree.cpp Outdated
@saucecontrol saucecontrol marked this pull request as ready for review January 27, 2025 20:27
@saucecontrol

saucecontrol commented Jan 27, 2025

Member Author


@EgorBot -amd -intel --envvars DOTNET_EnableAVX512F:0

using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes;

using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using System.Runtime.InteropServices;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

public unsafe class LongBench
{
    private const int nitems = 1 << 10;
    private long* data;

    [GlobalSetup]
    public void Setup()
    {
        const int len = sizeof(long) * nitems;
        data = (long*)NativeMemory.AlignedAlloc(len, 64);
        Random.Shared.NextBytes(new Span<byte>(data, len));
    }

    [Benchmark]
    public Vector128<long> Multiply128()
    {
        long* ptr = data, end = ptr + nitems - Vector128<long>.Count;
        var res = Vector128<long>.Zero;

        while (ptr < end)
        {
            res ^= Vector128.LoadAligned(ptr) * Vector128.LoadAligned(ptr + Vector128<long>.Count);
            ptr += Vector128<long>.Count;
        }

        return res;
    }

    [Benchmark]
    public Vector256<long> Multiply256()
    {
        long* ptr = data, end = ptr + nitems - Vector256<long>.Count;
        var res = Vector256<long>.Zero;

        while (ptr < end)
        {
            res ^= Vector256.LoadAligned(ptr) * Vector256.LoadAligned(ptr + Vector256<long>.Count);
            ptr += Vector256<long>.Count;
        }

        return res;
    }

    [Benchmark]
    public Vector<long> MultiplyVectorT()
    {
        long* ptr = data, end = ptr + nitems - Vector<long>.Count;
        var res = Vector<long>.Zero;

        while (ptr < end)
        {
            // Offset by the Vector<long> width so the second load stays in bounds for any vector size.
            res ^= Vector.Load(ptr) * Vector.Load(ptr + Vector<long>.Count);
            ptr += Vector<long>.Count;
        }

        return res;
    }
}
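
The benchmark above exercises the 64-bit multiply path. As background, a 64-bit lane multiply can be built from 32-bit multiplies using only SSE2, via the identity a*b mod 2^64 = aLo*bLo + ((aHi*bLo + aLo*bHi) << 32). The sketch below illustrates that general technique; it is an assumption for illustration and not necessarily the instruction sequence the JIT emits after this PR. `Sse2MultiplySketch` and `MultiplyLow64` are hypothetical names.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class Sse2MultiplySketch
{
    // Low 64 bits of the per-lane product, using only SSE2 operations.
    public static Vector128<ulong> MultiplyLow64(Vector128<ulong> a, Vector128<ulong> b)
    {
        if (!Sse2.IsSupported)
        {
            // Portable path for non-x86 hardware.
            return a * b;
        }

        // pmuludq multiplies the low 32 bits of each 64-bit lane: aLo * bLo.
        Vector128<ulong> lo = Sse2.Multiply(a.AsUInt32(), b.AsUInt32());

        // Move the high 32 bits of each lane into the low half.
        Vector128<ulong> aHi = Sse2.ShiftRightLogical(a, 32);
        Vector128<ulong> bHi = Sse2.ShiftRightLogical(b, 32);

        // Cross terms: aHi * bLo and aLo * bHi.
        Vector128<ulong> cross1 = Sse2.Multiply(aHi.AsUInt32(), b.AsUInt32());
        Vector128<ulong> cross2 = Sse2.Multiply(a.AsUInt32(), bHi.AsUInt32());

        // Only the low 32 bits of the summed cross terms survive the shift.
        Vector128<ulong> cross = Sse2.ShiftLeftLogical(Sse2.Add(cross1, cross2), 32);

        return Sse2.Add(lo, cross);
    }

    public static void Main()
    {
        Vector128<ulong> a = Vector128.Create(0xFFFFFFFFFFFFFFFFUL, 0x123456789ABCDEF0UL);
        Vector128<ulong> b = Vector128.Create(0x0FEDCBA987654321UL, 3UL);

        // Compare against the built-in operator, which wraps the same way.
        Console.WriteLine(MultiplyLow64(a, b) == (a * b));
    }
}
```

The aHi*bHi term is omitted because it only contributes to bits 64 and above, which are discarded. The same bit pattern is correct for signed `long` lanes, since the low 64 bits of a product do not depend on signedness.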

@saucecontrol

Member Author

cc @EgorBo I believe you were the last to touch most of this

@EgorBo EgorBo self-requested a review March 10, 2025 19:43
@EgorBo

EgorBo commented Mar 11, 2025

Member

/azp run Fuzzlyn, runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-avx512

@EgorBo EgorBo left a comment

Member

LGTM, thanks!

@EgorBo EgorBo merged commit f565711 into dotnet:main Mar 11, 2025
@saucecontrol saucecontrol deleted the vdot branch March 11, 2025 20:59
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 11, 2025

Development

Successfully merging this pull request may close these issues.

Finish Avx512 specific lightup for Vector128/256/512<T>

3 participants