Example/Documentation unclear for low level reading of file size. #255
Comments
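I would highly recommend you use the high level API, specifically smbclient.scandir, to enumerate entries on a directory. There's not too much that you really gain by using the low level API here, as I've tried to make the high level one as efficient as possible for the operations needed. Even just things like opening a file/directory can be done with the high level API, and the raw file open object can then be used for low level operations that might not be exposed in the high level API.
Ultimately I can't help you write your actual application. I can help if you have specific questions about smbprotocol, but that's about it. If you don't have a specific question or query then I'll close this issue tomorrow.
|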
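For the file-size question in the title, a minimal sketch with the high level API could look like the following. It assumes the entries yielded by smbclient.scandir behave like os.DirEntry objects (is_file(), stat().st_size). The helper name list_file_sizes and the injectable scandir parameter are made up for illustration; the parameter is only there so the function can be tried without a server.

```python
from typing import Callable, Dict, Iterable, Optional


def list_file_sizes(directory: str,
                    scandir: Optional[Callable[[str], Iterable]] = None) -> Dict[str, int]:
    """Return {file name: size in bytes} for the files directly inside `directory`."""
    if scandir is None:
        # Imported lazily so the sketch can be exercised without smbprotocol installed.
        import smbclient
        scandir = smbclient.scandir

    sizes = {}
    for entry in scandir(directory):
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
    return sizes


# Hypothetical usage (server, share and credentials are placeholders):
#   import smbclient
#   smbclient.register_session("server", username="user", password="pass")
#   print(list_file_sizes(r"\\server\share\folder"))
```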
Thanks for your response. Does the high level API support using a filter
pattern?
Getting the top level folder share listing takes 30+ minutes as it contains
tens of thousands of folders.
Thank you
…On Tue, 5 Dec 2023 at 09:26, Jordan Borean ***@***.***> wrote:
I would highly recommend you use the high level API, specifically
smbclient.scandir to enumerate entries on a directory. There's not too
much that you really gain by using the low level API here as I've tried to
make the high level one as efficient as possible for the operations needed.
Even just things like opening a file/directory can be done with the high
level API and then using the raw file open object can be used for low level
operations that might not be exposed in the high level API.
Ultimately I can't help you write your actual application, I can help if
you have specific questions about smbprotocol that you may have but
that's about it. If you don't have a specific question or query then I'll
close this issue tomorrow.
|
Yep, the search_pattern kwarg (smbprotocol/src/smbclient/_os.py, line 526 in 37512ee) supports the normal server side filtering with * and ? that the underlying SMB server supports.
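A rough sketch of how the search_pattern keyword of smbclient.scandir might be used for the top-level folder filter from the original post (P100*). The helper name list_top_folders and the injectable scandir parameter are assumptions for illustration, and the share path is a placeholder.

```python
from typing import Callable, Iterable, List, Optional


def list_top_folders(share: str, pattern: str = "P100*",
                     scandir: Optional[Callable[..., Iterable]] = None) -> List[str]:
    """Names of the directories directly under `share` that match `pattern`.

    The pattern is handed to the server through `search_pattern`, so the
    filtering happens server side instead of after a full listing.
    """
    if scandir is None:
        # Imported lazily so the sketch can be exercised without smbprotocol installed.
        import smbclient
        scandir = smbclient.scandir
    return [entry.name for entry in scandir(share, search_pattern=pattern) if entry.is_dir()]


# Hypothetical usage, after smbclient.register_session(...) for the server:
#   list_top_folders(r"\\server\share")
```

Whether this removes the 30+ minute wait depends on how the server evaluates the pattern, so it is worth timing against the real share.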
|
Awesome! thanks!
I am quite proud that I actually managed to get my first version using the
smbprotocol to work well enough for my purposes.
In the future I'll rely on smbclient for sure!
One last question, you might easily be able to answer for me. Is there a
record of the username or owner who uploaded/created the file in the samba
protocol?
Thanks again!
…On Tue, 5 Dec 2023 at 19:41, Jordan Borean ***@***.***> wrote:
Yep, the search_pattern kwarg
https://github.com/jborean93/smbprotocol/blob/37512ee0648ad64f98755833382fea790d9b2df6/src/smbclient/_os.py#L526
supports the normal server side filtering with * and ? that the
underlying SMB server supports.
--
Bjorn Heijligers
+31620106733
|
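The closest there is is the "Owner" of the file in the security descriptor. Unfortunately it's not reliable, as on Windows this could be the Administrators group or whatever is set in the user's group SIDs as the owner. Plus, getting that value will only give you the SID string in Python; you still need a separate process to translate that to an account name, which this library does not do.
|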
Thanks! SID might actually be enough. I'm only interested in knowing which
files were created by the same users, not necessarily the name of the user.
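If the SID string is all that is needed, the grouping itself is plain Python. The sketch below assumes you already have (path, owner SID) pairs from somewhere; how to fetch the owner from the security descriptor is not shown here, and the SIDs in the usage comment are made up. Keep in mind the caveat that the owner may be a group such as Administrators rather than the creating user.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_owner(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group file paths by their owner SID string, keeping input order per group."""
    groups = defaultdict(list)
    for path, sid in records:
        groups[sid].append(path)
    return dict(groups)


# Hypothetical usage with made-up SIDs:
#   group_by_owner([
#       (r"\\server\share\a.txt", "S-1-5-21-1-2-3-1001"),
#       (r"\\server\share\b.txt", "S-1-5-21-1-2-3-1002"),
#   ])
```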
…On Thu, 7 Dec 2023 at 22:56, Jordan Borean ***@***.***> wrote:
The closest there is is the "Owner" of the file in the security
descriptor. Unfortunately it's not reliable as on Windows this could be the
Administrators group or whatever is set in the user's group sids as the
owner. Plus getting that value will only give you the SID string in python,
you still need a separate process to translate that to an account name
which this library does not do.
|
I'm trying to use GPT-4 to implement a Python SMB crawler that has to connect over a VERY SLOW connection to a Synology NAS with MILLIONS of files. Luckily I only need a subset of the folders and of the file types.
Can someone help me get a basic version up and running? Despite using various GPT tools and trying to parse the low level source code myself, I haven't managed to get the following software design and reference implementation to work:
Prototype 3:
- smbprotocol
- PyYAML
- loguru
- single-threaded
- tenacity (for retry logic)
Yaml.conf:
top_folder_filter: P100*
file_copy_extention_filter:
  - .xml
  - .dat
  - .txt
  - .doc
  - .docx
local_directory: ../download
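As a sketch of how these settings could be applied once loaded, the two checks below use only the standard library. The dict mirrors the YAML above (including the original key spelling), while the helper names and the choice to compare case-insensitively are assumptions, not something the post specifies.

```python
import fnmatch
import os

# Same values as the YAML above, as yaml.safe_load would return them.
config = {
    "top_folder_filter": "P100*",
    "file_copy_extention_filter": [".xml", ".dat", ".txt", ".doc", ".docx"],
    "local_directory": "../download",
}


def matches_top_folder(name: str, cfg: dict) -> bool:
    """True if a top-level folder name matches the configured wildcard."""
    # Lower-casing both sides is an assumption: SMB names are usually case-insensitive.
    return fnmatch.fnmatchcase(name.lower(), cfg["top_folder_filter"].lower())


def should_copy(filename: str, cfg: dict) -> bool:
    """True if the file's extension is in the configured list."""
    extension = os.path.splitext(filename)[1].lower()
    wanted = {ext.lower() for ext in cfg["file_copy_extention_filter"]}
    return extension in wanted
```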
Intended pseudocode:
`
- Initialize:
  - Load configuration from 'config.yaml'
  - Establish connection to Samba server using 'server_ip', 'server_user', 'server_password'
  - Initialize logging framework
- Main Process:
- Recursive Folder Crawl (folder):
- Error Handling:
- Finalize:
`
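The "Recursive Folder Crawl" and "Error Handling" steps could be sketched roughly as below. This is not the tenacity-based version from the prototype list: the retry is a plain loop with a fixed delay, swapped in to keep the example dependency-free. The function name crawl and its parameters are made up, and scandir is expected to be something like smbclient.scandir (entries with is_dir() and path).

```python
import logging
import time
from typing import Callable, Iterable, Iterator

log = logging.getLogger("crawler")


def crawl(folder: str, scandir: Callable[[str], Iterable],
          attempts: int = 3, delay: float = 1.0) -> Iterator[str]:
    """Yield the path of every file below `folder`, depth first.

    A folder whose listing keeps failing is logged and skipped, so one bad
    directory does not abort the whole crawl.
    """
    entries = None
    for attempt in range(1, attempts + 1):
        try:
            # Materialise the listing so a failure surfaces inside the try block.
            entries = list(scandir(folder))
            break
        except OSError as exc:
            log.warning("listing %s failed (attempt %d/%d): %s",
                        folder, attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay)
    if entries is None:
        return

    for entry in entries:
        if entry.is_dir():
            yield from crawl(entry.path, scandir, attempts, delay)
        else:
            yield entry.path
```

Because it is a generator, files can be filtered and copied as they are discovered instead of waiting for the full listing of millions of entries.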
Attempt 1:
'
import fnmatch
import logging
import os
import threading
import uuid

import yaml
from smbprotocol.connection import Connection
from smbprotocol.exceptions import SMBResponseException
from smbprotocol.open import Open, CreateDisposition, FileAttributes, CreateOptions, DirectoryAccessMask
from smbprotocol.session import Session
from smbprotocol.tree import TreeConnect
from tenacity import retry, stop_after_attempt, wait_fixed

# Configure logging
logging.basicConfig(level=logging.INFO)


def main():
    pass  # the function body was not included in the post


if __name__ == "__main__":
    main()
'
Attempt 2 (incomplete):
'
import sys
import uuid
from contextlib import contextmanager
from io import BytesIO

import yaml
from loguru import logger
from tenacity import retry, stop_after_attempt, wait_exponential
from smbprotocol.connection import Connection
from smbprotocol.session import Session
from smbprotocol.tree import TreeConnect
from smbprotocol.open import (
    CreateDisposition, CreateOptions, DirectoryAccessMask, FileAttributes,
    FileInformationClass, FilePipePrinterAccessMask, ImpersonationLevel, Open,
    ShareAccess,
)


def smb_b_open(tree, mode='r', share='r', username=None, password=None, encrypt=True):
    """
    Functions similar to the builtin open() method where it will create an open handle to a file over SMB. This can be
    used to read and/or write data to the file using the methods exposed by the Open() class in smbprotocol. Read and
    write operations only support bytes and not text strings.
    """
    # The function body was not included in the post.


class FileEntry(object):
    pass  # the class body was not included in the post


# Define _listdir helper function for applying a filter pattern and recursion to listing the content of a samba share,
# specified by the tree variable
def _listdir(tree, path, pattern, recurse):
    full_path = tree.share_name
    if path != "":
        # SMB paths are backslash separated; the separator was missing in the post.
        full_path += r"\%s" % path
    # The rest of the function was not included in the post.


def main1():
    # Load configuration
    with open('config.yaml', 'r') as file:
        config = yaml.safe_load(file)

    # Samba client configuration
    server_ip = config['server_ip']
    username = config['server_user']
    password = config['server_password']
    share_name = config['share_name']
    top_folder_filter = config['top_folder_filter']
    file_copy_extention_filter = config['file_copy_extention_filter']


if __name__ == "__main__":
    main1()
'
Software Design Specification for a Remote Samba Share Crawler
Overview
The Remote Samba Share Crawler is designed to connect to a Samba share, crawl through its directories and files, and download specified files to a local directory. It supports various features like recursive crawling, threading, logging, and error handling.
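The download step could be sketched with the high level API as follows. It assumes smbclient.open_file(path, mode="rb") returns a readable binary file object usable as a context manager. The helper name download and the injectable open_remote parameter are made up for illustration, the latter so the copy logic can be tried without a server.

```python
import os
import shutil
from typing import BinaryIO, Callable, Optional


def download(remote_path: str, local_dir: str,
             open_remote: Optional[Callable[[str], BinaryIO]] = None) -> str:
    """Copy one remote file into `local_dir` and return the local path."""
    if open_remote is None:
        # Imported lazily so the sketch can be exercised without smbprotocol installed.
        import smbclient

        def open_remote(path):
            return smbclient.open_file(path, mode="rb")

    os.makedirs(local_dir, exist_ok=True)
    # SMB paths use backslashes; keep only the final component as the file name.
    name = remote_path.replace("/", "\\").rsplit("\\", 1)[-1]
    local_path = os.path.join(local_dir, name)
    with open_remote(remote_path) as src, open(local_path, "wb") as dst:
        # Streams in chunks, so large files are not held in memory at once.
        shutil.copyfileobj(src, dst)
    return local_path
```

Note that this flattens everything into one local directory, so files with the same name in different remote folders would overwrite each other; mirroring the remote folder structure is left out to keep the sketch short.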
Functional Requirements
Non-functional Requirements
Proposed Architecture
1. Classes and Modules
- Crawler: Main class handling connection, crawling, downloading, and state management.
- FileEntry: Class representing a file or directory in the Samba share.
- … (yaml or json).
- … (logging module or an alternative).
2. External Libraries
- … (smbprotocol, pysmb, or an equivalent).
- … (PyYAML or json).
- … (logging module or an equivalent like loguru).
3. Configuration
4. Logging
5. Error Handling and Retry Logic
6. Threading and Concurrency