Hello!
While gleefully using cloudpathlib, I needed to recursively iterate over the files in a directory. This directory is large (6238 files), so my first approach, a recursive `iterdir()` + `is_file()`, took waaaay too long (likely due to #176).
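For reference, the naive approach looks roughly like this minimal sketch (the helper name `walk_files` is just for illustration). It works on a plain `pathlib.Path`; since `CloudPath` mirrors the pathlib API, I'm assuming it behaves the same there, except that each `is_file()`/`is_dir()` check may cost its own request.

```python
from pathlib import Path
from typing import Iterator


def walk_files(root) -> Iterator:
    """Naive recursive walk: one iterdir() per directory, plus an
    is_file()/is_dir() check per entry (hypothetical helper name)."""
    for child in root.iterdir():
        if child.is_file():
            yield child
        elif child.is_dir():
            # Recurse into subdirectories one level at a time.
            yield from walk_files(child)
```

On a cloud path, the per-entry checks are what I suspect adds up over thousands of files.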
I remembered that `glob` made better use of the cloud list calls, so I tried `list(p.rglob("*"))`. In my directory, that took 17m 34s.
I then tried to 'cheat' and call `[f for f, is_dir in p.client._list_dir(p, recursive=True) if not is_dir]`. It took 1.435s.
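Wrapped as a reusable helper, the one-liner above might look like this sketch. `list_files_recursive` is a hypothetical name; it calls the private `client._list_dir(path, recursive=True)` exactly as in that one-liner, assuming it yields `(path, is_dir)` pairs, so it may break without notice in a future release.

```python
from typing import Iterator


def list_files_recursive(cloud_path) -> Iterator:
    """Yield every file under cloud_path from a single recursive listing.

    Relies on the PRIVATE client method _list_dir(path, recursive=True),
    assumed to yield (path, is_dir) pairs; not a public cloudpathlib API.
    """
    for path, is_dir in cloud_path.client._list_dir(cloud_path, recursive=True):
        if not is_dir:
            yield path
```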
I looked at the glob logic, but I still can't understand the reason for the discrepancy (I'm using Google Cloud). This may warrant a separate issue.
In the meantime, though, I wonder whether it might be a good idea to make `_list_dir` a public function as a workaround.
Another option is to add `recursive` and/or `files_only` keyword arguments to `iterdir`. This deviates from the pathlib API, but since these are added keywords, it might be OK?
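To make the keyword proposal above concrete, here is a sketch written as a standalone function rather than a method. The signature is hypothetical (it is not part of cloudpathlib), and it assumes the private `_list_dir` yields `(path, is_dir)` pairs for both recursive and non-recursive listings.

```python
from typing import Iterator


def iterdir(path, *, recursive: bool = False, files_only: bool = False) -> Iterator:
    """Hypothetical extended iterdir(); keyword names taken from the proposal.

    Keyword-only arguments with defaults keep a plain iterdir() call
    behaving as it does today.
    """
    for child, is_dir in path.client._list_dir(path, recursive=recursive):
        if files_only and is_dir:
            continue  # skip directory entries when only files are wanted
        yield child
```

Making the new arguments keyword-only keeps any call that looks like pathlib's `iterdir()` unaffected.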
I'm suggesting these options because, although solving #176 would probably fix most of these issues, these solutions are much simpler.
I'd of course be happy to send a PR.
WDYT?