Commit 787603e

Wauplin and cursoragent authored
feat: pass skip_sha256=True to hf_xet for bucket uploads (#3900)
* feat: pass skip_sha256=True to hf_xet for bucket uploads

  Bucket uploads don't need SHA-256 in the shard metadata (the sha_index GSI is only used for LFS pointer resolution, which doesn't apply to buckets). Pass skip_sha256=True to hf_xet.upload_files() and upload_bytes() in the bucket upload path to skip the SHA-256 computation, removing the main CPU bottleneck on non-SHA-NI instances.

  Depends on: huggingface/xet-core#679

  Co-authored-by: Lucain <Wauplin@users.noreply.github.com>

* test: use real bucket upload instead of mocks for skip_sha256 test

  Replace the two mock-based tests with a single integration test that:
  - Creates a real Bucket on staging Hub
  - Uploads files from both filepath and bytes in a single batch
  - Wraps (not mocks) hf_xet.upload_files and hf_xet.upload_bytes to verify skip_sha256=True is passed
  - Verifies files are actually uploaded by listing the bucket tree

  Co-authored-by: Lucain <Wauplin@users.noreply.github.com>

* test: skip skip_sha256 test when hf_xet doesn't support it yet

  The test wraps the real hf_xet functions, so it fails when the installed hf_xet predates the skip_sha256 parameter (xet-core#679). Use inspect.signature to detect support and pytest.skip accordingly.

  Co-authored-by: Lucain <Wauplin@users.noreply.github.com>

* test: handle built-in functions in skip_sha256 signature check

  hf_xet.upload_files is a compiled built-in function, so inspect.signature() raises ValueError. Catch it and skip the test when the signature can't be introspected (older hf_xet).

  Co-authored-by: Lucain <Wauplin@users.noreply.github.com>

* fix: gracefully fall back when hf_xet lacks skip_sha256 support

  Use try/except TypeError around upload_files/upload_bytes calls with skip_sha256=True, falling back to calls without it for older hf_xet versions. TypeError for unknown kwargs on compiled functions is raised before any I/O, so the fallback is safe.

  Update test to check call_args_list[0] (the first attempt always includes skip_sha256=True) instead of requiring the function to accept it.

  Co-authored-by: Lucain <Wauplin@users.noreply.github.com>

* better like this

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
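The signature check and the TypeError fallback described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from this commit: `accepts_kwarg`, `call_with_optional_kwargs`, `old_upload` and `new_upload` are hypothetical names, and the two upload functions are pure-Python stand-ins for an older and a newer `hf_xet`.

```python
import inspect
from typing import Any, Callable, Optional


def accepts_kwarg(func: Callable[..., Any], name: str) -> Optional[bool]:
    """Return True/False when the signature is introspectable, else None."""
    try:
        params = inspect.signature(func).parameters
    except (ValueError, TypeError):
        # Some compiled built-ins expose no signature that can be inspected.
        return None
    if name in params:
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_with_optional_kwargs(func: Callable[..., Any], *args: Any, **optional: Any) -> Any:
    """Try the call with the optional kwargs, retry without them on TypeError."""
    try:
        return func(*args, **optional)
    except TypeError:
        # Caveat: this also retries if the function body itself raised
        # TypeError, so it is only safe when the retry has no side effects.
        return func(*args)


# Hypothetical stand-ins for an older and a newer upload function.
def old_upload(paths):
    return ("old", list(paths))


def new_upload(paths, skip_sha256=False):
    return ("new", list(paths), skip_sha256)
```

Calling `call_with_optional_kwargs(old_upload, ["a.bin"], skip_sha256=True)` falls back to the plain call, while the same call on `new_upload` forwards the flag.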
1 parent 72871b9 commit 787603e

2 files changed

Lines changed: 37 additions & 0 deletions

src/huggingface_hub/hf_api.py

Lines changed: 2 additions & 0 deletions
@@ -12213,6 +12213,7 @@ def token_refresher() -> tuple[str, int]:
                 token_refresher,
                 progress_callback,
                 "bucket",
+                skip_sha256=True,
             )
             for upload_info, op in zip(xet_upload_infos, add_path_operations):
                 op.xet_hash = upload_info.hash
@@ -12229,6 +12230,7 @@ def token_refresher() -> tuple[str, int]:
                 token_refresher,
                 progress_callback,
                 "bucket",
+                skip_sha256=True,
             )
             for upload_info, op in zip(xet_upload_infos, add_bytes_operations):
                 op.xet_hash = upload_info.hash

tests/test_xet_upload.py

Lines changed: 35 additions & 0 deletions
@@ -339,6 +339,41 @@ def test_upload_folder_create_pr(self, api, repo_url) -> None:
         assert Path(local_path).read_bytes() == Path(filepath).read_bytes()


+@requires("hf_xet")
+class TestBucketXetUploadSkipSha256:
+    """Test that bucket uploads pass skip_sha256=True to hf_xet."""
+
+    def test_skip_sha256_passed_for_bucket_uploads(self, api, tmp_path):
+        """Upload from both filepath and bytes to a real bucket, verifying skip_sha256=True is passed."""
+        from hf_xet import upload_bytes as real_upload_bytes
+        from hf_xet import upload_files as real_upload_files
+
+        bucket_url = api.create_bucket(repo_name(prefix="bucket"))
+        bucket_id = bucket_url.bucket_id
+
+        test_file = tmp_path / "test_file.bin"
+        test_file.write_bytes(b"file content for bucket test")
+
+        with patch("hf_xet.upload_files", wraps=real_upload_files) as spy_upload_files:
+            with patch("hf_xet.upload_bytes", wraps=real_upload_bytes) as spy_upload_bytes:
+                api.batch_bucket_files(
+                    bucket_id,
+                    add=[
+                        (str(test_file), "from_path.bin"),
+                        (b"bytes content for bucket test", "from_bytes.bin"),
+                    ],
+                )
+
+        assert spy_upload_files.call_args_list[0].kwargs.get("skip_sha256") is True
+        assert spy_upload_bytes.call_args_list[0].kwargs.get("skip_sha256") is True
+
+        uploaded = {e.path for e in api.list_bucket_tree(bucket_id)}
+        assert "from_path.bin" in uploaded
+        assert "from_bytes.bin" in uploaded
+
+        api.delete_bucket(bucket_id)
+
+
 @requires("hf_xet")
 class TestXetLargeUpload:
     def test_upload_large_folder(self, api, tmp_path, repo_url: RepoUrl) -> None:
