Commit 69ec6b9

Refactor file detection and scanning logic to fix commit file handling (#101)
* Refactor file detection and scanning logic to fix commit file handling
  - Fix file argument parsing to handle list, string, and JSON formats more robustly
  - Clarify git repository detection and file selection logic with better separation of concerns
  - Add force_api_mode to handle cases where no supported manifest files are found
  - Replace ambiguous should_skip_scan logic with clearer file detection flow
  - Add create_full_scan_with_report_url method to Core for API-mode scanning
  - Improve logging messages and remove unused code (get_all_scores method)
  - Ensure consistent diff object initialization and ID handling
  - Automatically enable disable_blocking when no supported files are detected
* Add debugging options and lazy file loading to prevent file descriptor exhaustion
  - Add --save-submitted-files-list option to output JSON with list of scanned files, sizes, and metadata for debugging
  - Add --save-manifest-tar option to create tar.gz archive of all manifest files with original directory structure
  - Implement lazy file loading to prevent 'Too many open files' errors when scanning large numbers of manifest files
  - Add system resource utilities to check file descriptor limits and warn when approaching ulimit -n
  - Update .gitignore to exclude AI testing files and verification scripts
  - Update README with comprehensive documentation for new debugging features and examples
1 parent 9a1d030 commit 69ec6b9
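
Reviewer note: the commit message mentions lazy file loading and a file descriptor limit check, but those changes are not among the hunks shown here. The following is only a minimal sketch of the general technique, assuming a Unix-like system where the standard `resource` module is available. The names `near_fd_limit` and `iter_file_contents` and the 80% margin are hypothetical and are not taken from the repository.

```python
import resource
from typing import Iterable, Iterator, Tuple


def near_fd_limit(file_count: int, margin: float = 0.8) -> bool:
    """Return True if file_count is close to the soft `ulimit -n` value.

    Hypothetical helper: the 0.8 margin is an illustrative assumption.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # No soft limit configured, so there is nothing to approach.
        return False
    return file_count >= soft * margin


def iter_file_contents(paths: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) pairs, opening only one file at a time.

    Each handle is closed before the next file is opened, so the number of
    open descriptors stays constant regardless of how many paths there are.
    """
    for path in paths:
        with open(path, "rb") as handle:
            yield path, handle.read()
```

Opening every manifest up front holds one descriptor per file until the upload finishes. A generator like this trades that for one open handle at a time, which is presumably the kind of behavior the "lazy file loading" item is after.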

9 files changed: +668 -86 lines changed


.gitignore

Lines changed: 3 additions & 1 deletion
@@ -24,4 +24,6 @@ file_generator.py
 .env.local
 Pipfile
 test/
-logs
+logs
+ai_testing/
+verify_find_files_lazy_loading.py

README.md

Lines changed: 80 additions & 11 deletions
@@ -7,10 +7,10 @@ The Socket Security CLI was created to enable integrations with other tools like
 ```` shell
 socketcli [-h] [--api-token API_TOKEN] [--repo REPO] [--integration {api,github,gitlab}] [--owner OWNER] [--branch BRANCH]
 [--committers [COMMITTERS ...]] [--pr-number PR_NUMBER] [--commit-message COMMIT_MESSAGE] [--commit-sha COMMIT_SHA]
-[--target-path TARGET_PATH] [--sbom-file SBOM_FILE] [--files FILES] [--default-branch] [--pending-head]
-[--generate-license] [--enable-debug] [--enable-json] [--enable-sarif] [--disable-overview] [--disable-security-issue]
-[--allow-unverified] [--ignore-commit-files] [--disable-blocking] [--scm SCM] [--timeout TIMEOUT]
-[--exclude-license-details]
+[--target-path TARGET_PATH] [--sbom-file SBOM_FILE] [--files FILES] [--save-submitted-files-list SAVE_SUBMITTED_FILES_LIST]
+[--default-branch] [--pending-head] [--generate-license] [--enable-debug] [--enable-json] [--enable-sarif]
+[--disable-overview] [--disable-security-issue] [--allow-unverified] [--ignore-commit-files] [--disable-blocking]
+[--scm SCM] [--timeout TIMEOUT] [--exclude-license-details]
 ````
 
 If you don't want to provide the Socket API Token every time then you can use the environment variable `SOCKET_SECURITY_API_KEY`
@@ -40,13 +40,15 @@ If you don't want to provide the Socket API Token every time then you can use th
 | --commit-sha | False | "" | Commit SHA |
 
 #### Path and File
-| Parameter | Required | Default | Description |
-|:----------------------|:---------|:----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| --target-path | False | ./ | Target path for analysis |
-| --sbom-file | False | | SBOM file path |
-| --files | False | [] | Files to analyze (JSON array string) |
-| --excluded-ecosystems | False | [] | List of ecosystems to exclude from analysis (JSON array string). You can get supported files from the [Supported Files API](https://docs.socket.dev/reference/getsupportedfiles) |
-| --license-file-name | False | `license_output.json` | Name of the file to save the license details to if enabled |
+| Parameter | Required | Default | Description |
+|:----------------------------|:---------|:----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| --target-path | False | ./ | Target path for analysis |
+| --sbom-file | False | | SBOM file path |
+| --files | False | [] | Files to analyze (JSON array string) |
+| --excluded-ecosystems | False | [] | List of ecosystems to exclude from analysis (JSON array string). You can get supported files from the [Supported Files API](https://docs.socket.dev/reference/getsupportedfiles) |
+| --license-file-name | False | `license_output.json` | Name of the file to save the license details to if enabled |
+| --save-submitted-files-list | False | | Save list of submitted file names to JSON file for debugging purposes |
+| --save-manifest-tar | False | | Save all manifest files to a compressed tar.gz archive with original directory structure |
 
 #### Branch and Scan Configuration
 | Parameter | Required | Default | Description |
@@ -133,6 +135,73 @@ The CLI determines which files to scan based on the following logic:
 - **Using `--files`**: If you specify `--files '["package.json"]'`, the CLI will check if this file exists and is a manifest file before triggering a scan.
 - **Using `--ignore-commit-files`**: This forces a scan of all manifest files in the target path, regardless of what's in your commit.
 
+## Debugging and Troubleshooting
+
+### Saving Submitted Files List
+
+The CLI provides a debugging option to save the list of files that were submitted for scanning:
+
+```bash
+socketcli --save-submitted-files-list submitted_files.json
+```
+
+This will create a JSON file containing:
+- Timestamp of when the scan was performed
+- Total number of files submitted
+- Total size of all files (in bytes and human-readable format)
+- Complete list of file paths that were found and submitted for scanning
+
+Example output file:
+```json
+{
+  "timestamp": "2025-01-22 10:30:45 UTC",
+  "total_files": 3,
+  "total_size_bytes": 2048,
+  "total_size_human": "2.00 KB",
+  "files": [
+    "./package.json",
+    "./requirements.txt",
+    "./Pipfile"
+  ]
+}
+```
+
+This feature is useful for:
+- **Debugging**: Understanding which files the CLI found and submitted
+- **Verification**: Confirming that expected manifest files are being detected
+- **Size Analysis**: Understanding the total size of manifest files being uploaded
+- **Troubleshooting**: Identifying why certain files might not be included in scans or if size limits are being hit
+
+> **Note**: This option works with both differential scans (when git commits are detected) and full scans (API mode).
+
+### Saving Manifest Files Archive
+
+For backup, sharing, or analysis purposes, you can save all manifest files to a compressed tar.gz archive:
+
+```bash
+socketcli --save-manifest-tar manifest_files.tar.gz
+```
+
+This will create a compressed archive containing all the manifest files that were found and submitted for scanning, preserving their original directory structure relative to the scanned directory.
+
+Example usage with other options:
+```bash
+# Save both files list and archive
+socketcli --save-submitted-files-list files.json --save-manifest-tar backup.tar.gz
+
+# Use with specific target path
+socketcli --target-path ./my-project --save-manifest-tar my-project-manifests.tar.gz
+```
+
+The manifest archive feature is useful for:
+- **Backup**: Creating portable backups of all dependency manifest files
+- **Sharing**: Sending the exact files being analyzed to colleagues or support
+- **Analysis**: Examining the dependency files offline or with other tools
+- **Debugging**: Verifying file discovery and content issues
+- **Compliance**: Maintaining records of scanned dependency files
+
+> **Note**: The tar.gz archive preserves the original directory structure, making it easy to extract and examine the files in their proper context.
+
 ## Development
 
 This project uses `pyproject.toml` as the primary dependency specification.
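
Reviewer note: the README hunk documents the output of `--save-submitted-files-list` and `--save-manifest-tar`, but the code that produces it is not part of the hunks shown. As a rough illustration only, the sketch below builds a report with the same keys as the README example and writes a tar.gz with paths relative to a base directory. The function names `format_size`, `build_submitted_files_report`, and `write_manifest_tar` are hypothetical; the CLI's actual implementation may differ.

```python
import os
import tarfile
from datetime import datetime, timezone
from typing import Dict, List


def format_size(num_bytes: int) -> str:
    """Render a byte count like the README example, e.g. 2048 -> '2.00 KB'."""
    units = ["B", "KB", "MB", "GB"]
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.2f} {units[index]}"


def build_submitted_files_report(files: List[str]) -> Dict[str, object]:
    """Build a dict with the keys shown in the README's example output."""
    total = sum(os.path.getsize(path) for path in files)
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC"),
        "total_files": len(files),
        "total_size_bytes": total,
        "total_size_human": format_size(total),
        "files": list(files),
    }


def write_manifest_tar(files: List[str], base_dir: str, output_path: str) -> None:
    """Write files to a gzip-compressed tar, keeping paths relative to base_dir."""
    with tarfile.open(output_path, "w:gz") as archive:
        for path in files:
            # arcname controls the path stored inside the archive.
            archive.add(path, arcname=os.path.relpath(path, base_dir))
```

The report dict can be passed to `json.dump` to produce a file shaped like the README example. Using `arcname` relative to the scanned directory is one way to get the "original directory structure" behavior the README describes.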

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "socketsecurity"
-version = "2.1.21"
+version = "2.1.23"
 requires-python = ">= 3.10"
 license = {"file" = "LICENSE"}
 dependencies = [

socketsecurity/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 __author__ = 'socket.dev'
-__version__ = '2.1.21'
+__version__ = '2.1.23'

socketsecurity/config.py

Lines changed: 16 additions & 0 deletions
@@ -57,6 +57,8 @@ class CliConfig:
     jira_plugin: PluginConfig = field(default_factory=PluginConfig)
     slack_plugin: PluginConfig = field(default_factory=PluginConfig)
     license_file_name: str = "license_output.json"
+    save_submitted_files_list: Optional[str] = None
+    save_manifest_tar: Optional[str] = None
 
     @classmethod
     def from_args(cls, args_list: Optional[List[str]] = None) -> 'CliConfig':
@@ -101,6 +103,8 @@ def from_args(cls, args_list: Optional[List[str]] = None) -> 'CliConfig':
             'repo_is_public': args.repo_is_public,
             "excluded_ecosystems": args.excluded_ecosystems,
             'license_file_name': args.license_file_name,
+            'save_submitted_files_list': args.save_submitted_files_list,
+            'save_manifest_tar': args.save_manifest_tar,
             'version': __version__
         }
         try:
@@ -262,6 +266,18 @@ def create_argument_parser() -> argparse.ArgumentParser:
         metavar="<string>",
         help="SBOM file path"
     )
+    path_group.add_argument(
+        "--save-submitted-files-list",
+        dest="save_submitted_files_list",
+        metavar="<path>",
+        help="Save list of submitted file names to JSON file for debugging purposes"
+    )
+    path_group.add_argument(
+        "--save-manifest-tar",
+        dest="save_manifest_tar",
+        metavar="<path>",
+        help="Save all manifest files to a compressed tar.gz archive with original directory structure"
+    )
     path_group.add_argument(
         "--files",
         metavar="<json>",
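
Reviewer note: the commit message says the `--files` argument parsing now handles list, string, and JSON formats, but that code is not in the hunks shown. The sketch below is one possible way to normalize such input, not the repository's implementation. The function name `parse_files_arg` is hypothetical, and the comma-separated fallback for plain strings is an assumption.

```python
import json
from typing import Any, List


def parse_files_arg(value: Any) -> List[str]:
    """Normalize a --files value into a list of path strings.

    Accepts an existing list, a JSON array string, or a plain string.
    Hypothetical helper: the comma-splitting fallback is an assumption.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    text = str(value).strip()
    if not text:
        return []
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON: treat it as one path or a comma-separated list.
        return [part.strip() for part in text.split(",") if part.strip()]
    if isinstance(parsed, list):
        return [str(item) for item in parsed]
    # A single JSON scalar such as '"package.json"'.
    return [str(parsed)]
```

For example, `parse_files_arg('["package.json"]')` and `parse_files_arg("package.json")` both give a one-element list, so downstream code only has to deal with one shape.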
