Text attributes data truncation#893
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughTEXT-family byte limits are centralized in ChangesByte-safe TEXT validation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/unit/Validator/StructureTest.php (2)
412-413: 💤 Low valueMinor: Multibyte test lacks error message assertion.
For consistency with the ASCII overflow test on line 409, consider asserting the error message for the multibyte test as well.
🔍 Add assertion for consistency
// Multi-byte content over the limit is rejected the same way. $multibyte = \str_repeat('📝', 20000); $this->assertEquals(false, $validator->isValid(new Document($base + ['blocks_json' => $multibyte]))); + $this->assertEquals('Invalid document structure: Attribute "blocks_json" has invalid type. Value must be a valid string and no longer than 16383 chars', $validator->getDescription());🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/Validator/StructureTest.php` around lines 412 - 413, The multibyte overflow test starting at the line with str_repeat and '📝' is missing an assertion for the error message, unlike the ASCII overflow test on line 409. After the assertEquals call that validates isValid returns false for the multibyte Document, add an additional assertion to check the error message returned by the validator, following the same pattern as the ASCII overflow test for consistency.
995-1047: 💤 Low valueConsider adding test case for byte-safe limit edge.
While
testTextByteSafeValidationcomprehensively tests the 16,383-character byte-safe limit for legacy attributes,testTextValidationcould benefit from a similar edge-case test. Currently it tests 16,383 chars (pass) and 65,536 chars (fail due to declared size), but doesn't verify that 16,384 chars would fail due to the byte-safe limit.🧪 Optional test case to add
Add this test case between the existing assertions to verify byte-safe enforcement:
$this->assertEquals(false, $validator->isValid(new Document([ '$collection' => ID::custom('posts'), 'title' => 'Demo Title', 'description' => 'Demo description', 'rating' => 5, 'price' => 1.99, 'published' => true, 'tags' => ['dog', 'cat', 'mouse'], 'feedback' => 'team@appwrite.io', 'text_field' => \str_repeat('a', 16384), '$createdAt' => '2000-04-01T12:00:00.000+00:00', '$updatedAt' => '2000-04-01T12:00:00.000+00:00' ]))); $this->assertEquals('Invalid document structure: Attribute "text_field" has invalid type. Value must be a valid string and no longer than 16383 chars', $validator->getDescription());🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/Validator/StructureTest.php` around lines 995 - 1047, The testTextValidation method should include an additional test case to verify the byte-safe character limit edge case. Add a new assertion between the existing test cases that validates a Document with text_field set to \str_repeat('a', 16384) should fail (assertEquals false), and verify the corresponding error message from getDescription indicates the byte-safe limit of 16383 characters, similar to the pattern already established in the method with the 65536 character test case.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Database/Database.php`:
- Around line 118-121: The MAX_LONGTEXT_BYTES constant in the Database class is
set to a value that exceeds the 32-bit signed integer maximum, causing it to be
converted to a float in 32-bit PHP environments. This breaks the intdiv call in
downstream code that expects integer arguments. Fix this by explicitly casting
MAX_LONGTEXT_BYTES to int using the (int) type cast operator to ensure it
remains an integer on all PHP platforms, including 32-bit environments.
---
Nitpick comments:
In `@tests/unit/Validator/StructureTest.php`:
- Around line 412-413: The multibyte overflow test starting at the line with
str_repeat and '📝' is missing an assertion for the error message, unlike the
ASCII overflow test on line 409. After the assertEquals call that validates
isValid returns false for the multibyte Document, add an additional assertion to
check the error message returned by the validator, following the same pattern as
the ASCII overflow test for consistency.
- Around line 995-1047: The testTextValidation method should include an
additional test case to verify the byte-safe character limit edge case. Add a
new assertion between the existing test cases that validates a Document with
text_field set to \str_repeat('a', 16384) should fail (assertEquals false), and
verify the corresponding error message from getDescription indicates the
byte-safe limit of 16383 characters, similar to the pattern already established
in the method with the 65536 character test case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 338e4613-aa3d-423d-af38-75776ac9f9df
📒 Files selected for processing (4)
src/Database/Database.phpsrc/Database/Validator/Structure.phptests/e2e/Adapter/Scopes/DocumentTests.phptests/unit/Validator/StructureTest.php
Greptile SummaryThis PR adds a new
Confidence Score: 3/5The byte-based validation fix is correct for MariaDB/MySQL, but Structure.php now enforces MariaDB TEXT column byte ceilings unconditionally — both for adapters that map VAR_TEXT to unbounded storage and for array-typed text attributes stored as JSON rather than as TEXT columns — meaning valid data can be rejected by the generic validator in those configurations. The new ByteLength validator and the dual-validator pattern are well-implemented and fix the core silent-truncation problem. However, Structure.php wires in hard-coded MariaDB-specific byte caps without checking which adapter is active or whether the attribute value is destined for a TEXT column vs a JSON column. On adapters where VAR_TEXT maps to an unbounded type, or for any text attribute marked as an array, values well within what the backing store can hold will now be rejected. These paths affect the core document write and update flows. src/Database/Validator/Structure.php — the TEXT-family byte ceilings are applied without adapter context or array-storage awareness Important Files Changed
Reviews (12): Last reviewed commit: "Merge branch 'main' of github.com:utopia..." | Re-trigger Greptile |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/unit/Validator/StructureTest.php (1)
441-445: ⚡ Quick winAssert the exact failure reason in the multibyte overflow test.
This test currently validates only
false; asserting the description too will lock the byte-limit contract and catch regressions in failure source.Suggested test hardening
$multibyte = \str_repeat('📝', 20000); $this->assertEquals(false, $validator->isValid(new Document($base + ['text' => $multibyte]))); +$this->assertEquals( + 'Invalid document structure: Attribute "text" has invalid type. Value must be a valid string no longer than 65535 bytes', + $validator->getDescription() +);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/Validator/StructureTest.php` around lines 441 - 445, The multibyte overflow test in the test method currently only asserts that the validator returns false but does not verify the specific failure reason. Strengthen this test by adding an additional assertion that checks the failure description or error message from the validator after the isValid call with the multibyte document. This will ensure that the specific byte-limit contract is properly validated and catch any regressions where the failure source changes. Reference the validator's method to retrieve the error description (commonly methods like getError, getFailureDescription, or similar) to assert the expected error message alongside the false result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@phpunit.xml`:
- Line 10: The stopOnFailure attribute in the phpunit.xml configuration is set
to "true", which causes the test suite to halt immediately upon encountering the
first failure. This masks subsequent failures and reduces diagnostic value in CI
environments. Change stopOnFailure from "true" to "false" to allow the full test
suite to execute and report all failures in a single run, providing better
visibility into the overall test health.
In `@src/Database/Validator/ByteLength.php`:
- Line 53: The file is missing a trailing newline at the end of the file, which
violates PSR-12 standards that require a single blank line at the end of every
file. Add a single newline character after the closing brace at the end of the
ByteLength.php file to satisfy the Pint code style checker and resolve the
single_blank_line_at_eof CI failure.
In `@tests/e2e/Adapter/Base.php`:
- Around line 26-27: Uncomment the two trait usage statements in the Base class
by removing the leading // from the lines containing CollectionTests and
CustomDocumentTypeTests. These commented-out trait usages need to be re-enabled
so that their test methods are included in the test discovery and execution,
restoring the full E2E coverage that these traits provide to the adapter tests.
---
Nitpick comments:
In `@tests/unit/Validator/StructureTest.php`:
- Around line 441-445: The multibyte overflow test in the test method currently
only asserts that the validator returns false but does not verify the specific
failure reason. Strengthen this test by adding an additional assertion that
checks the failure description or error message from the validator after the
isValid call with the multibyte document. This will ensure that the specific
byte-limit contract is properly validated and catch any regressions where the
failure source changes. Reference the validator's method to retrieve the error
description (commonly methods like getError, getFailureDescription, or similar)
to assert the expected error message alongside the false result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7d8392e1-0183-4343-9aef-8defb270cf4f
📒 Files selected for processing (6)
phpunit.xmlsrc/Database/Validator/ByteLength.phpsrc/Database/Validator/Structure.phptests/e2e/Adapter/Base.phptests/e2e/Adapter/Scopes/DocumentTests.phptests/unit/Validator/StructureTest.php
| private const MARIADB_TEXT_BYTES = '' . Database::MAX_TEXT_BYTES; | ||
| private const MARIADB_MEDIUMTEXT_BYTES = '' . Database::MAX_MEDIUMTEXT_BYTES; | ||
| private const MARIADB_LONGTEXT_BYTES = '' . Database::MAX_LONGTEXT_BYTES; |
| case Database::VAR_LONGTEXT: | ||
| $validators[] = new Text($size, min: 0); | ||
| $validators[] = new ByteLength(Database::MAX_LONGTEXT_BYTES); | ||
| break; |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/Database/Database.php (1)
9784-9789: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winUse a stronger digest for cache-key derivation.
md5is collision-prone; for cache field/key hashing, prefer SHA-256 to reduce collision-based cache poisoning/mix risks.Suggested patch
- $schemaHash = \md5( + $schemaHash = \hash( + 'sha256', \json_encode($collection->getAttribute('attributes', [])) . \json_encode($collection->getAttribute('indexes', [])) . \json_encode($collection->getAttribute('$permissions', [])) . \json_encode($collection->getAttribute('documentSecurity', false)) ); @@ - \md5(\json_encode($queryPayload) ?: ''), + \hash('sha256', \json_encode($queryPayload) ?: ''),Also applies to: 9792-9797
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Database/Database.php` around lines 9784 - 9789, The cache-key derivation in the schema hash uses a collision-prone digest and should be strengthened. Update the schema hash logic in Database::setCollection (the block building $schemaHash from attributes, indexes, $permissions, and documentSecurity) to use SHA-256 instead of md5, and make the same change in the other schema-hash block referenced by the review so both cache field derivations use the stronger digest consistently.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/Database/Database.php`:
- Around line 9784-9789: The cache-key derivation in the schema hash uses a
collision-prone digest and should be strengthened. Update the schema hash logic
in Database::setCollection (the block building $schemaHash from attributes,
indexes, $permissions, and documentSecurity) to use SHA-256 instead of md5, and
make the same change in the other schema-hash block referenced by the review so
both cache field derivations use the stronger digest consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d0f0028a-1410-443e-85e5-62fb04a7b23e
📒 Files selected for processing (5)
src/Database/Adapter/SQLite.phpsrc/Database/Database.phpsrc/Database/Validator/Structure.phptests/e2e/Adapter/Scopes/DocumentTests.phptests/unit/Validator/StructureTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/e2e/Adapter/Scopes/DocumentTests.php
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests