@@ -102,6 +102,78 @@ Returns the unique group ID.
102102"""
103103function get_group_id end
104104
105+ """
106+     get_sub_group_size()::UInt32
107+
108+ Returns the number of work-items in the sub-group.
109+
110+ !!! note
111+     Backend implementations **must** implement:
112+     ```
113+     @device_override get_sub_group_size()::UInt32
114+     ```
115+ """
116+ function get_sub_group_size end
117+
118+ """
119+     get_max_sub_group_size()::UInt32
120+
121+ Returns the maximum sub-group size for sub-groups in the current workgroup.
122+
123+ !!! note
124+     Backend implementations **must** implement:
125+     ```
126+     @device_override get_max_sub_group_size()::UInt32
127+     ```
128+ """
129+ function get_max_sub_group_size end
130+
131+ """
132+     get_num_sub_groups()::UInt32
133+
134+ Returns the number of sub-groups in the current workgroup.
135+
136+ !!! note
137+     Backend implementations **must** implement:
138+     ```
139+     @device_override get_num_sub_groups()::UInt32
140+     ```
141+ """
142+ function get_num_sub_groups end
143+
144+ """
145+     get_sub_group_id()::UInt32
146+
147+ Returns the sub-group ID within the current workgroup.
148+
149+ !!! note
150+     1-based.
151+
152+ !!! note
153+     Backend implementations **must** implement:
154+     ```
155+     @device_override get_sub_group_id()::UInt32
156+     ```
157+ """
158+ function get_sub_group_id end
159+
160+ """
161+     get_sub_group_local_id()::UInt32
162+
163+ Returns the work-item ID within the current sub-group.
164+
165+ !!! note
166+     1-based.
167+
168+ !!! note
169+     Backend implementations **must** implement:
170+     ```
171+     @device_override get_sub_group_local_id()::UInt32
172+     ```
173+ """
174+ function get_sub_group_local_id end
175+
176+
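The five device-side queries above are related by simple index arithmetic. The sketch below emulates them on the host in plain Julia, so the 1-based conventions can be checked without a GPU. `emulate_sub_group_queries` is a hypothetical helper, not part of the package, and it assumes that work-items are packed into sub-groups in contiguous blocks by linear local index, which a backend is not required to do.

```julia
# Hypothetical host-side emulation of the sub-group queries; not package API.
# Assumption: work-items are packed into sub-groups in contiguous blocks of
# `max_size` by 1-based linear local index. Real backends may map differently.
function emulate_sub_group_queries(local_idx::Integer, workgroup_size::Integer,
                                   max_size::Integer)
    max_size >= 1 || throw(ArgumentError("max_size must be positive"))
    1 <= local_idx <= workgroup_size ||
        throw(ArgumentError("local_idx must lie in 1:workgroup_size"))

    sub_group_id = fld1(local_idx, max_size)        # 1-based sub-group ID
    sub_group_local_id = mod1(local_idx, max_size)  # 1-based ID inside it
    num_sub_groups = cld(workgroup_size, max_size)

    # The last sub-group is smaller when max_size does not divide the workgroup.
    first_idx = (sub_group_id - 1) * max_size + 1
    sub_group_size = min(max_size, workgroup_size - first_idx + 1)

    return (
        sub_group_size = UInt32(sub_group_size),
        max_sub_group_size = UInt32(max_size),
        num_sub_groups = UInt32(num_sub_groups),
        sub_group_id = UInt32(sub_group_id),
        sub_group_local_id = UInt32(sub_group_local_id),
    )
end

q = emulate_sub_group_queries(70, 80, 32)
```

Under these assumptions, with a workgroup of 80 work-items and a maximum sub-group size of 32, work-item 70 lands in the third sub-group as its sixth member, and that trailing sub-group holds only 16 work-items.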
105177"""
106178 localmemory(::Type{T}, dims)
107179
@@ -139,6 +211,29 @@ function barrier()
139211    error("Group barrier used outside kernel or not captured")
140212end
141213
214+ """
215+     sub_group_barrier()
216+
217+ After a `sub_group_barrier()` call, all reads and writes to global and local memory
218+ from each thread in the sub-group are visible to all other threads in the
219+ sub-group.
220+
221+ This does **not** guarantee that a write from a thread in a certain sub-group will
222+ be visible to a thread in a different sub-group.
223+
224+ !!! note
225+     `sub_group_barrier()` must be encountered by all work-items of a sub-group executing the kernel or by none at all.
226+
227+ !!! note
228+     Backend implementations **must** implement:
229+     ```
230+     @device_override sub_group_barrier()
231+     ```
232+ """
233+ function sub_group_barrier()
234+     error("Sub-group barrier used outside kernel or not captured")
235+ end
236+
142237"""
143238 _print(args...)
144239
@@ -220,6 +315,22 @@ kernel launch with too big a workgroup is attempted.
220315"""
221316function max_work_group_size end
222317
318+ """
319+     sub_group_size(backend)::Int
320+
321+ Returns a reasonable sub-group size supported by the currently
322+ active device for the specified backend. This is typically
323+ 32, or 64 on devices that do not support 32.
324+
325+ !!! note
326+     Backend implementations **must** implement:
327+     ```
328+     sub_group_size(backend::NewBackend)::Int
329+     ```
330+     The on-device sub-group functionality must be implemented as well.
331+ """
332+ function sub_group_size end
333+
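As an illustration of the host-side contract described in the docstring, here is a standalone sketch of how a backend might choose the value it reports. `ToyBackend`, its `native_sub_group_sizes` field, and `sub_groups_per_workgroup` are hypothetical names, and the local `sub_group_size` stub stands in for the package function so the snippet runs without any dependencies. Falling back to the smallest native size when neither 32 nor 64 is available is an assumption of this sketch, not something the docstring specifies.

```julia
# Standalone sketch; `ToyBackend` and this local `sub_group_size` are
# hypothetical stand-ins, not the package's own definitions.
struct ToyBackend
    native_sub_group_sizes::Vector{Int}  # sizes the imaginary device supports
end

function sub_group_size end  # mirrors the method-less declaration in the diff

function sub_group_size(backend::ToyBackend)::Int
    sizes = backend.native_sub_group_sizes
    isempty(sizes) && throw(ArgumentError("device reports no sub-group sizes"))
    32 in sizes && return 32  # preferred size, per the docstring
    64 in sizes && return 64  # used when 32 is unsupported
    return minimum(sizes)     # assumption: otherwise take the smallest size
end

# Number of sub-groups needed to cover a workgroup of the given size.
sub_groups_per_workgroup(backend, workgroup_size::Integer) =
    cld(workgroup_size, sub_group_size(backend))

n = sub_groups_per_workgroup(ToyBackend([64]), 256)
```

Host code can use the reported size to pick a workgroup size that is a multiple of it, which avoids a partially filled trailing sub-group.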
223334"""
224335 multiprocessor_count(backend::NewBackend)::Int
225336