Skip to content

[NFC] Inline more core HeapType methods#8681

Merged
kripken merged 2 commits intoWebAssembly:mainfrom
kripken:inline.type.moar
May 8, 2026
Merged

[NFC] Inline more core HeapType methods#8681
kripken merged 2 commits intoWebAssembly:mainfrom
kripken:inline.type.moar

Conversation

@kripken
Copy link
Copy Markdown
Member

@kripken kripken commented May 7, 2026

This makes Unsubtyping over 2x faster on a large Dart testcase. This was the
slowest pass there by far. On -Os this saves 5.5% of total time.

I am not totally happy about moving this code into headers, but the benefit
is so big that I think it makes sense, unless we can think of an alternative? If
it helps, the key slowdowns this fixes are HeapType::getShared() took
33% of total runtime in Unsubtyping (!) and HeapType::getKind() took 6%.

@kripken kripken requested a review from tlively May 7, 2026 18:35
@kripken kripken requested a review from a team as a code owner May 7, 2026 18:35
Copy link
Copy Markdown
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for such significant performance wins. A couple other things that might be interesting to try: 1) LTO, 2) putting just the fast paths that check bit patterns in the header and leaving everything that touches the HeapTypeInfo in the .cpp file.

@tlively
Copy link
Copy Markdown
Member

tlively commented May 7, 2026

We could also split some of this stuff into a separate wasm-type-impl.h header, but I'm not sure that gets us very much.

@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 7, 2026

I suspect LTO might help here, yeah. Though the hard part would be getting LTO enabled in all our builds. Also, the more LTO differs from normal builds, the more we'd need to always remember to profile on LTO builds locally, which could be annoying.

Hmm, I'd rather not move only some methods into the header. There could be more bottlenecks we run into later. In fact this is a followup to the first set of methods we moved, so that first set was insufficient...

@tlively
Copy link
Copy Markdown
Member

tlively commented May 7, 2026

I don't mean moving some of these methods into the header and others not, I mean taking a method like this:

inline Shareability HeapType::getShared() const {
  if (isBasic()) {
    return (getID() & SharedMask) != 0 ? Shared : Unshared;
  }
  return getHeapTypeInfo(*this)->share;
}

And keeping this in the header:

inline Shareability HeapType::getShared() const {
  if (isBasic()) {
    return (getID() & SharedMask) != 0 ? Shared : Unshared;
  }
  return getSharedImpl();
}

And adding an additional private method Shareability getSharedImpl() const with this implementation in the .cpp file:

Shareability HeapType::getShared() const {
  return getHeapTypeInfo(*this)->share;
}

If we do that for all the methods, then the fast paths will remain inlineable and we would still have HeapTypeInfo hidden in the .cpp file. I expect this will get us most of the performance benefits because the optimizer won't generally be able to see through the load from the HeapTypeInfo anyway.

But if there's still a significant performance downside with that approach, then I agree the current change is the best choice.

@kripken
Copy link
Copy Markdown
Member Author

kripken commented May 7, 2026

Now I see your point, thanks. Ok, interesting, but this turns out not to work well. I used this diff:

diff --git a/src/wasm-type.h b/src/wasm-type.h
index 972688928..471a9cbc5 100644
--- a/src/wasm-type.h
+++ b/src/wasm-type.h
@@ -164,6 +164,7 @@ public:
   bool isShared() const;
 
   Shareability getShared() const;
+  Shareability getSharedCompound() const;
 
   // Check if the type is a given basic heap type, while ignoring whether it is
   // shared or not.
@@ -1206,7 +1207,7 @@ inline Shareability HeapType::getShared() const {
   if (isBasic()) {
     return (getID() & SharedMask) != 0 ? Shared : Unshared;
   }
-  return getHeapTypeInfo(*this)->share;
+  return getSharedCompound();
 }
 
 inline bool HeapType::isBottom() const {
diff --git a/src/wasm/wasm-type.cpp b/src/wasm/wasm-type.cpp
index ae4983daa..be4c2661f 100644
--- a/src/wasm/wasm-type.cpp
+++ b/src/wasm/wasm-type.cpp
@@ -1345,6 +1345,10 @@ FeatureSet HeapType::getFeatures() const {
   return collector.feats;
 }
 
+Shareability HeapType::getSharedCompound() const {
+  return getHeapTypeInfo(*this)->share;
+}
+
 HeapType RecGroup::Iterator::operator*() const {
   if (parent->id & 1) {
     // This is a trivial recursion group. Mask off the low bit to recover the

That does the idea for getShared() which was the by-far highest in the profile. With this diff, most of the benefit of this PR goes away: before the PR we have 18.9 seconds, with the PR 8.4, and with the PR + this diff, 14.3.

I believe the reason is that the call overhead is significant. That is, getHeapTypeInfo(*this)->share is extremely fast. Adding a call on top of it is the source of the slowness.

@tlively
Copy link
Copy Markdown
Member

tlively commented May 7, 2026

Interesting, thanks for checking. LGTM as-is.

@kripken kripken merged commit 3180c6f into WebAssembly:main May 8, 2026
16 checks passed
@kripken kripken deleted the inline.type.moar branch May 8, 2026 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants