.NET 8 Update: Hardware Intrinsics by pCYSl5EDgo · Pull Request #1744 · MessagePack-CSharp/MessagePack-CSharp · GitHub

.NET 8 Update: Hardware Intrinsics #1744

Closed
pCYSl5EDgo wants to merge 28 commits into MessagePack-CSharp:develop from pCYSl5EDgo:net8-hi

pCYSl5EDgo commented Jan 24, 2024

Contributor

This is a follow-up to pull request #988.

Goals

Improve the (U?Int(16|32|64)|Single|Double|Boolean)ArrayFormatter classes with SIMD instructions, making them about twice as fast on .NET 8.
Formatters for List<T>, ArraySegment<T>, and (ReadOnly)?Memory<T> can be accelerated in the same way.

Even without SIMD, on .NET 6, this proposal generally improves performance.

History

Three years have passed since then, and .NET 7 introduced many convenient SIMD hardware intrinsics, as I explained in this Japanese article.

From .NET Core 3.1 through .NET 6, SIMD intrinsics required a fixed statement and unsafe pointer operations.
In .NET 7, Vector.LoadUnsafe(ref T source) appeared, which takes a reference of type T instead of a pointer.
You still end up doing pseudo-pointer arithmetic with the Unsafe class, but not needing a fixed statement is an advantage.
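
To illustrate the difference described above, here is a minimal sketch, not code from this PR. The class and method names are hypothetical, and a simple sum stands in for the formatter logic. For brevity the pointer version uses the cross-platform Vector128.Load(T*), which itself arrived in .NET 7; code from the .NET Core 3.1 era would have called a platform-specific load instead. It assumes .NET 7 or later and AllowUnsafeBlocks enabled for the pointer version.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration only.
static class LoadStyles
{
    // Pointer style: needs 'unsafe' and a fixed statement to pin the array.
    public static unsafe int SumWithFixed(int[] values)
    {
        int sum = 0;
        int i = 0;
        fixed (int* p = values)
        {
            if (Vector128.IsHardwareAccelerated)
            {
                Vector128<int> acc = Vector128<int>.Zero;
                for (; i <= values.Length - Vector128<int>.Count; i += Vector128<int>.Count)
                {
                    acc += Vector128.Load(p + i);
                }

                sum = Vector128.Sum(acc);
            }

            // Scalar tail for the remaining elements.
            for (; i < values.Length; i++)
            {
                sum += p[i];
            }
        }

        return sum;
    }

    // Reference style: no fixed statement, but offsets still go through
    // the Unsafe class and LoadUnsafe, with no bounds checks.
    public static int SumWithRef(ReadOnlySpan<int> values)
    {
        ref int start = ref MemoryMarshal.GetReference(values);
        int sum = 0;
        int i = 0;
        if (Vector128.IsHardwareAccelerated)
        {
            Vector128<int> acc = Vector128<int>.Zero;
            for (; i <= values.Length - Vector128<int>.Count; i += Vector128<int>.Count)
            {
                acc += Vector128.LoadUnsafe(ref start, (nuint)i);
            }

            sum = Vector128.Sum(acc);
        }

        // Scalar tail for the remaining elements.
        for (; i < values.Length; i++)
        {
            sum += Unsafe.Add(ref start, i);
        }

        return sum;
    }
}
```

Both methods return the same result; the second accepts any span and compiles without the unsafe keyword, although the caller is still responsible for keeping offsets in range.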

.NET 7 also added many cross-platform SIMD APIs. There is no longer any need to write a lot of platform-dependent branches!
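
As a hedged sketch of what "cross-platform" means here: the routine below reverses the byte order of 32-bit values using only Vector128 APIs, so the same source runs on x86 and Arm without Ssse3 or AdvSimd branches. It is an illustration under assumptions, not the PR's implementation; the class name is hypothetical and the two spans are assumed not to overlap.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration only.
static class CrossPlatformSwap
{
    public static void ReverseEndianness(ReadOnlySpan<uint> source, Span<uint> destination)
    {
        if (destination.Length < source.Length)
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }

        ref uint src = ref MemoryMarshal.GetReference(source);
        ref uint dst = ref MemoryMarshal.GetReference(destination);

        // Byte indices that reverse each 4-byte lane.
        Vector128<byte> mask = Vector128.Create(
            (byte)3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12);

        int i = 0;
        if (Vector128.IsHardwareAccelerated)
        {
            for (; i <= source.Length - Vector128<uint>.Count; i += Vector128<uint>.Count)
            {
                Vector128<byte> bytes = Vector128.LoadUnsafe(ref src, (nuint)i).AsByte();
                Vector128.Shuffle(bytes, mask).AsUInt32().StoreUnsafe(ref dst, (nuint)i);
            }
        }

        // Scalar tail, and the whole input when SIMD is unavailable.
        for (; i < source.Length; i++)
        {
            destination[i] = BinaryPrimitives.ReverseEndianness(source[i]);
        }
    }
}
```

The runtime maps Vector128.Shuffle to whatever instruction the current CPU offers, which is the part that previously required one branch per instruction set.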

Changes

As a result, I can now write code that is, I hope, more understandable to others than it was before.

I ran BenchmarkDotNet on my machine. On short arrays, where SIMD does not help much, this implementation performs about the same as the previous one; on long arrays, where SIMD works well, it is 2 to 5 times faster.

Annotation

This Draft Pull Request is for performance measurement and is not intended to be actually merged.
I will prepare a clean commit log Pull Request when you give the go-ahead.

pCYSl5EDgo commented Jan 24, 2024

Contributor Author

pCYSl5EDgo changed the title from ".NET 8 Update:" to ".NET 8 Update: Hardware Intrinsics" on Jan 24, 2024
pCYSl5EDgo mentioned this pull request on Jan 24, 2024
Comment thread src/MessagePack.Experimental/MessagePack.Experimental.csproj

AArnott left a comment

Collaborator

> I will prepare a clean commit log Pull Request when you give the go-ahead.

I'll wait for this.

pCYSl5EDgo commented Jan 25, 2024

Contributor Author

I've rewritten the formatters in C# 9 and moved them into the main MessagePack directory, because this PR aims to improve the default ArrayFormatters.

Unsafe.AsRef was silently changed from ref T AsRef<T> (scoped ref T source) to ref T AsRef<T> (scoped ref readonly T source). On .NET 8, it requires C# 12.

I found that this PR depends on #1734.

SIMD does not support integer division and 64bit integer multiplication
So DateTime serialization is very poor
pCYSl5EDgo commented

Contributor Author

This comment is written for later developers and explains why DateTimeArrayFormatter does not use SIMD.

DateTimeArrayFormatter has now been added, but it does not use any SIMD functions, because SIMD has no integer division or modulo operation.
In addition, before AVX-512 (available on Zen 4 machines), there is no API for 64-bit integer multiplication.

For reference, see the SharpLab assembly code.
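
To make the division requirement concrete, here is a minimal scalar sketch. It assumes the MessagePack timestamp extension layout (seconds and nanoseconds since the Unix epoch); the class and method names are hypothetical, and the formatter in this PR may be structured differently.

```csharp
using System;

// Hypothetical helper for illustration only.
static class TimestampParts
{
    // Splits a DateTime into whole seconds and nanoseconds since the Unix epoch.
    public static (long Seconds, int Nanoseconds) Split(DateTime value)
    {
        long ticks = value.ToUniversalTime().Ticks - DateTime.UnixEpoch.Ticks;

        // One tick is 100 ns, so this needs a 64-bit division and remainder
        // per element, which is the operation SIMD does not provide.
        long seconds = Math.DivRem(ticks, TimeSpan.TicksPerSecond, out long remainder);
        if (remainder < 0)
        {
            // Floor toward negative infinity for values before the epoch.
            seconds--;
            remainder += TimeSpan.TicksPerSecond;
        }

        return (seconds, (int)(remainder * 100));
    }
}
```

The ToUniversalTime call also does extra work for values whose Kind is not Utc.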

Benchmark report: DateTime serialization without SIMD (x2 in both)
| Method | Job | Setting | Mean | StdDev | Median | Ratio |
|---|---|---|---|---|---|---|
| Old | Scalar | 0 | 75.78 ns | 7.245 ns | 74.02 ns | 1.00 |
| Simd | Scalar | 0 | 67.42 ns | 2.760 ns | 67.27 ns | 0.88 |
| Old | Vector | 0 | 66.37 ns | 4.844 ns | 64.94 ns | 0.88 |
| Simd | Vector | 0 | 63.57 ns | 1.288 ns | 63.37 ns | 0.80 |
| Old | Scalar | 1 rand | 75.78 ns | 2.390 ns | 75.85 ns | 1.00 |
| Simd | Scalar | 1 rand | 73.43 ns | 3.087 ns | 72.82 ns | 0.97 |
| Old | Vector | 1 rand | 75.10 ns | 1.925 ns | 74.83 ns | 0.98 |
| Simd | Vector | 1 rand | 95.68 ns | 2.165 ns | 95.50 ns | 1.25 |
| Old | Scalar | 1 utc | 74.75 ns | 1.509 ns | 75.14 ns | 1.00 |
| Simd | Scalar | 1 utc | 74.45 ns | 1.426 ns | 74.37 ns | 1.00 |
| Old | Vector | 1 utc | 70.76 ns | 1.946 ns | 69.86 ns | 0.95 |
| Simd | Vector | 1 utc | 71.09 ns | 1.399 ns | 70.95 ns | 0.95 |
| Old | Scalar | 3 rand | 141.65 ns | 19.168 ns | 140.26 ns | 1.00 |
| Simd | Scalar | 3 rand | 174.33 ns | 11.602 ns | 171.55 ns | 1.24 |
| Old | Vector | 3 rand | 139.78 ns | 4.395 ns | 139.68 ns | 1.01 |
| Simd | Vector | 3 rand | 148.41 ns | 12.839 ns | 144.56 ns | 1.06 |
| Old | Scalar | 3 utc | 102.32 ns | 11.200 ns | 97.79 ns | 1.00 |
| Simd | Scalar | 3 utc | 85.21 ns | 5.066 ns | 83.90 ns | 0.84 |
| Old | Vector | 3 utc | 95.13 ns | 6.573 ns | 93.21 ns | 0.94 |
| Simd | Vector | 3 utc | 81.06 ns | 2.696 ns | 81.77 ns | 0.75 |
| Old | Scalar | 8 rand | 264.54 ns | 13.817 ns | 259.70 ns | 1.00 |
| Simd | Scalar | 8 rand | 255.52 ns | 9.032 ns | 256.24 ns | 0.95 |
| Old | Vector | 8 rand | 258.59 ns | 8.122 ns | 259.58 ns | 0.95 |
| Simd | Vector | 8 rand | 182.73 ns | 1.917 ns | 182.52 ns | 0.68 |
| Old | Scalar | 8 utc | 131.87 ns | 2.304 ns | 131.94 ns | 1.00 |
| Simd | Scalar | 8 utc | 107.83 ns | 2.043 ns | 108.59 ns | 0.82 |
| Old | Vector | 8 utc | 146.33 ns | 3.718 ns | 147.20 ns | 1.10 |
| Simd | Vector | 8 utc | 106.37 ns | 0.322 ns | 106.37 ns | 0.81 |
| Old | Scalar | 16 rand | 379.41 ns | 7.893 ns | 379.08 ns | 1.00 |
| Simd | Scalar | 16 rand | 402.33 ns | 14.135 ns | 401.98 ns | 1.06 |
| Old | Vector | 16 rand | 452.44 ns | 11.477 ns | 448.50 ns | 1.19 |
| Simd | Vector | 16 rand | 336.69 ns | 11.269 ns | 334.30 ns | 0.89 |
| Old | Scalar | 16 utc | 214.99 ns | 6.397 ns | 212.61 ns | 1.00 |
| Simd | Scalar | 16 utc | 151.15 ns | 4.020 ns | 152.62 ns | 0.70 |
| Old | Vector | 16 utc | 270.33 ns | 36.246 ns | 256.22 ns | 1.25 |
| Simd | Vector | 16 utc | 157.51 ns | 3.501 ns | 156.49 ns | 0.73 |
| Old | Scalar | 31 rand | 719.14 ns | 3.350 ns | 718.38 ns | 1.00 |
| Simd | Scalar | 31 rand | 682.40 ns | 6.484 ns | 681.68 ns | 0.95 |
| Old | Vector | 31 rand | 804.97 ns | 3.056 ns | 804.61 ns | 1.12 |
| Simd | Vector | 31 rand | 778.17 ns | 57.820 ns | 765.90 ns | 1.11 |
| Old | Scalar | 31 utc | 419.78 ns | 6.257 ns | 417.69 ns | 1.00 |
| Simd | Scalar | 31 utc | 300.25 ns | 14.616 ns | 297.95 ns | 0.76 |
| Old | Vector | 31 utc | 449.02 ns | 12.661 ns | 450.35 ns | 1.07 |
| Simd | Vector | 31 utc | 307.18 ns | 8.844 ns | 304.63 ns | 0.74 |
| Old | Scalar | 64 rand | 1,705.32 ns | 59.621 ns | 1,681.02 ns | 1.00 |
| Simd | Scalar | 64 rand | 1,398.57 ns | 29.381 ns | 1,397.54 ns | 0.81 |
| Old | Vector | 64 rand | 1,696.00 ns | 34.492 ns | 1,683.47 ns | 0.98 |
| Simd | Vector | 64 rand | 1,077.99 ns | 4.281 ns | 1,078.17 ns | 0.63 |
| Old | Scalar | 64 utc | 809.06 ns | 18.949 ns | 815.92 ns | 1.00 |
| Simd | Scalar | 64 utc | 498.54 ns | 21.903 ns | 495.30 ns | 0.63 |
| Old | Vector | 64 utc | 886.79 ns | 12.981 ns | 888.45 ns | 1.09 |
| Simd | Vector | 64 utc | 564.98 ns | 82.444 ns | 533.36 ns | 0.65 |
| Old | Scalar | 4096 rand | 125,547.24 ns | 1,213.114 ns | 125,271.29 ns | 1.00 |
| Simd | Scalar | 4096 rand | 99,558.42 ns | 1,035.725 ns | 99,378.20 ns | 0.79 |
| Old | Vector | 4096 rand | 122,202.52 ns | 631.395 ns | 122,173.18 ns | 0.97 |
| Simd | Vector | 4096 rand | 100,926.67 ns | 1,528.048 ns | 100,785.58 ns | 0.80 |
| Old | Scalar | 4096 utc | 62,545.58 ns | 721.018 ns | 62,293.51 ns | 1.00 |
| Simd | Scalar | 4096 utc | 38,460.85 ns | 1,240.379 ns | 37,988.86 ns | 0.63 |
| Old | Vector | 4096 utc | 65,894.73 ns | 795.250 ns | 65,698.10 ns | 1.05 |
| Simd | Vector | 4096 utc | 35,719.59 ns | 445.582 ns | 35,522.50 ns | 0.57 |
| Old | Scalar | 4194304 rand | 141,690,505.77 ns | 1,478,730.033 ns | 141,738,825.00 ns | 1.00 |
| Simd | Scalar | 4194304 rand | 112,130,300.00 ns | 1,334,293.852 ns | 112,089,200.00 ns | 0.79 |
| Old | Vector | 4194304 rand | 141,831,608.33 ns | 381,558.216 ns | 141,931,612.50 ns | 1.00 |
| Simd | Vector | 4194304 rand | 113,438,212.00 ns | 1,037,760.445 ns | 113,063,760.00 ns | 0.80 |
| Old | Scalar | 4194304 utc | 86,847,380.92 ns | 4,397,553.959 ns | 84,739,441.67 ns | 1.00 |
| Simd | Scalar | 4194304 utc | 43,334,727.78 ns | 867,568.775 ns | 42,976,520.83 ns | 0.50 |
| Old | Vector | 4194304 utc | 87,281,693.71 ns | 2,851,344.404 ns | 86,920,320.00 ns | 0.99 |
| Simd | Vector | 4194304 utc | 44,852,673.57 ns | 371,590.583 ns | 44,774,720.00 ns | 0.52 |

AArnott commented Mar 31, 2024

Collaborator

@pCYSl5EDgo I haven't merged this as it's still marked Draft. I'm curious what your intention is for this PR going forward.

pCYSl5EDgo commented Apr 1, 2024

Contributor Author
