bagit.py and broken soft links #115

@ajnelson-nist

Hello,

Today bagit.py failed on a broken soft link and stopped bagging an otherwise intact file system tree. The directory tree I was bagging included a soft-linked file whose target is apparently meant to be created at a later date (e.g. .../foo.cfg pointing to .../config/populated_by_user/foo.cfg, similar to what the Apache web server does with config files). The execution environment in this case is a POSIX file system.

The problem in bagit.py appears to stem from the function _can_read, on (today's) Line 1362. The broken soft link in this case pointed at a non-existent directory, raising an error on Line 206.
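
To illustrate why a dangling link can look like an unreadable file, here is a minimal sketch using only the Python standard library. It is not code from bagit.py: the helper name `dangling_link_checks` is hypothetical, and whether `_can_read` uses exactly these calls is an assumption. It only shows that checks which follow symlinks report a broken link as missing and not readable.

```python
import os
import tempfile


def dangling_link_checks(workdir):
    """Create a broken soft link in workdir and report how common checks see it.

    Hypothetical helper for illustration only; not part of bagit.py.
    """
    link = os.path.join(workdir, "foo.cfg")
    # The target directory is never created, so the link dangles.
    target = os.path.join(workdir, "config", "populated_by_user", "foo.cfg")
    os.symlink(target, link)
    return {
        "islink": os.path.islink(link),        # inspects the link itself
        "exists": os.path.exists(link),        # follows the link to its target
        "readable": os.access(link, os.R_OK),  # also follows the link
    }


with tempfile.TemporaryDirectory() as workdir:
    print(dangling_link_checks(workdir))
```

On a POSIX system this reports `islink` as True and both `exists` and `readable` as False, so a readability check alone cannot tell a broken link apart from a file with missing permissions.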

I suggest that a broken soft link should not prevent a directory tree from being bagged. It may be better for _can_read to only report actual directories and files that are unreadable, possibly with broken links as a new third output.
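
A rough sketch of what I mean by a third output follows. The function name `find_unreadable` and its return shape are hypothetical, and this is not a patch against the real `_can_read`. It assumes a walk via os.walk and treats any symlink whose target cannot be found as a broken link instead of an unreadable file.

```python
import os


def find_unreadable(test_dir):
    """Walk test_dir and return (unreadable_dirs, unreadable_files, broken_links).

    Hypothetical sketch for discussion; not the bagit.py implementation.
    """
    unreadable_dirs = []
    unreadable_files = []
    broken_links = []

    for dirpath, dirnames, filenames in os.walk(test_dir):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it and
            # returns False when the target cannot be found.
            if os.path.islink(full_path) and not os.path.exists(full_path):
                broken_links.append(full_path)
            elif not os.access(full_path, os.R_OK):
                if os.path.isdir(full_path):
                    unreadable_dirs.append(full_path)
                else:
                    unreadable_files.append(full_path)

    return unreadable_dirs, unreadable_files, broken_links
```

A caller could then refuse to bag only when the first two lists are non-empty, and log or otherwise record the broken links without halting.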

If helpful, there is a script that converts a file system walk (via os.walk) to DFXML, and it has an if-ladder that goes through all file-system-level file types, not just directories and regular files. See the walk_to_dfxml.py function filepath_to_fileobject, and all assignment statements matching name_type = (starting on Line 36 today). You may want _can_read to skip operating on other file types as well.
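
For reference, an if-ladder of that kind can be sketched with the standard `stat` module. This is not the code from walk_to_dfxml.py: the function name `classify_path` and the returned labels are hypothetical, chosen only to show the idea of checking every file type without following symlinks.

```python
import os
import stat


def classify_path(path):
    """Return a label for the file type of path, without following symlinks.

    Hypothetical sketch; the labels are illustrative and not DFXML values.
    """
    # lstat() describes the link itself rather than its target, so a
    # broken soft link is still classified instead of raising an error.
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "unknown"
```

With something like this, a readability check could be applied only to paths classified as "directory" or "regular", and everything else could be skipped or reported separately.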

--Alex
