How To Use

Run

List all available arguments:

python -m credsweeper --help
usage: python -m credsweeper [-h] (--path PATH [PATH ...] | --diff_path PATH [PATH ...]) [--rules [PATH]] [--ml_validation] [--ml_threshold FLOAT_OR_STR] [-b POSITIVE_INT] [--api_validation] [-j POSITIVE_INT] [--skip_ignored] [--save-json [PATH]] [-l LOG_LEVEL]

optional arguments:
-h, --help            show this help message and exit
--path PATH [PATH ...]
                        file or directory to scan
--diff_path PATH [PATH ...]
                        git diff file to scan
--rules [PATH]        path of rule config file (default: credsweeper/rules/config.yaml)
--ml_validation       Use credential ml validation option. Machine Learning is used to reduce FP (by far).
--ml_threshold FLOAT_OR_STR
                        setup threshold for the ml model. The lower the threshold - the more credentials will be reported. Allowed values: float between 0 and 1, or any of ['lowest', 'low', 'medium',
                        'high', 'highest'] (default: medium)
-b POSITIVE_INT, --ml_batch_size POSITIVE_INT
                        batch size for model inference (default: 16)
--api_validation      Add credential api validation option to credsweeper pipeline. External API is used to reduce FP for some rule types.
-j POSITIVE_INT, --jobs POSITIVE_INT
                        number of parallel processes to use (default: 1)
--skip_ignored        parse .gitignore files and skip credentials from ignored objects
--save-json [PATH]    save result to json file (default: output.json)
-l LOG_LEVEL, --log LOG_LEVEL
                        provide logging level. Example --log debug, (default: 'warning'),
                        detailed log config: credsweeper/secret/log.yaml

Note

Validation by the ML model classifier greatly reduces false positives (FP), but it may increase false negatives (FN) and execution time. --ml_validation is therefore recommended unless your priority is to minimize FN.

A typical false positive: password = "template_password"

API validation is also used to reduce FP, but only for some rule types.
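The trade-off between FP and FN described in this note can be illustrated with a small sketch. The scores below are hypothetical, and the rule "report when the score is at least the threshold" is an assumption made for illustration; it is not taken from the CredSweeper source. It only shows why a lower threshold reports more candidates.

```python
# Hypothetical (line, score) pairs. Real scores come from the CredSweeper
# ML model; these numbers are invented for illustration only.
scored = [
    ('password = "template_password"', 0.08),
    ('password = "cackle!"', 0.91),
    ('token = "x9Lq2"', 0.55),
]


def reported(candidates, threshold):
    """Keep candidates whose score is at least the threshold (assumed rule)."""
    return [line for line, score in candidates if score >= threshold]


low = reported(scored, 0.3)   # lower threshold: more reports, fewer FN
high = reported(scored, 0.7)  # higher threshold: fewer reports, fewer FP
print(len(low), len(high))
```

With a float threshold, as allowed by --ml_threshold, the same idea applies: everything reported at the higher threshold is also reported at the lower one.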

Get output as JSON file:

python -m credsweeper --ml_validation --path tests/samples/password --save-json output.json

To inspect the JSON file, run:

cat output.json
[
    {
        "rule": "Password",
        "severity": "medium",
        "line_data_list": [
            {
                "line": "password = \"cackle!\"",
                "line_num": 1,
                "path": "tests/samples/password",
                "entropy_validation": false
            }
        ],
        "api_validation": "NOT_AVAILABLE",
        "ml_validation": "VALIDATED_KEY"
    }
]
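The saved report can also be post-processed in Python. The following is a minimal sketch that relies only on the fields visible in the sample output above (rule, line_data_list, path, line_num); the summarize helper is a hypothetical name, not part of CredSweeper. The sample is embedded as a string so the snippet runs on its own.

```python
import json
from collections import Counter


def summarize(report):
    """Count findings per rule and collect (path, line_num) locations."""
    per_rule = Counter(item["rule"] for item in report)
    locations = [
        (line["path"], line["line_num"])
        for item in report
        for line in item["line_data_list"]
    ]
    return per_rule, locations


# Same structure as the output.json shown above. For a real file, use:
#     with open("output.json") as f:
#         report = json.load(f)
report = json.loads(r'''
[
    {
        "rule": "Password",
        "severity": "medium",
        "line_data_list": [
            {
                "line": "password = \"cackle!\"",
                "line_num": 1,
                "path": "tests/samples/password",
                "entropy_validation": false
            }
        ],
        "api_validation": "NOT_AVAILABLE",
        "ml_validation": "VALIDATED_KEY"
    }
]
''')

per_rule, locations = summarize(report)
print(dict(per_rule), locations)
```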

Get CLI output only:

python -m credsweeper --ml_validation --path tests/samples/password
rule: Password / severity: medium / line_data_list: [line : 'password = "cackle!"' / line_num : 1 / path : tests/samples/password / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: VALIDATED_KEY
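The same commands can be launched from a script. This is a sketch using only the flags shown above (--ml_validation, --path, --save-json); build_command and run_scan are hypothetical helper names, and run_scan assumes CredSweeper is installed in the current Python environment.

```python
import subprocess
import sys


def build_command(path, save_json=None, ml_validation=True):
    """Assemble the credsweeper command line as a list of arguments."""
    cmd = [sys.executable, "-m", "credsweeper"]
    if ml_validation:
        cmd.append("--ml_validation")
    cmd += ["--path", path]
    if save_json is not None:
        cmd += ["--save-json", save_json]
    return cmd


def run_scan(path, save_json=None):
    """Run the scan and return the completed process (needs CredSweeper installed)."""
    return subprocess.run(
        build_command(path, save_json), capture_output=True, text=True
    )


cmd = build_command("tests/samples/password", save_json="output.json")
print(" ".join(cmd[1:]))
```

Passing the arguments as a list avoids shell quoting issues when a path contains spaces.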

Tests

To run all tests:

python -m pytest --cov=credsweeper --cov-report=term-missing -s tests/

To run only the tests that do not depend on an external API:

python -m pytest -m "not api_validation" --cov=credsweeper --cov-report=term-missing -s tests/

Use as a Python library

Minimal example for scanning a list of lines:

from credsweeper import CredSweeper, StringContentProvider


to_scan = ["line one", "password='in_line_2'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password / severity: medium / line_data_list: [line: 'password='in_line_2'' / line_num: 2 / path:  / value: 'in_line_2' / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: NOT_AVAILABLE

Minimal example for scanning bytes:

from credsweeper import CredSweeper, ByteContentProvider


to_scan = b"line one\npassword='in_line_2'"
cred_sweeper = CredSweeper()
provider = ByteContentProvider(to_scan)
results = cred_sweeper.file_scan(provider)
for r in results:
    print(r)
rule: Password / severity: medium / line_data_list: [line: 'password='in_line_2'' / line_num: 2 / path:  / value: 'in_line_2' / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: NOT_AVAILABLE
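Both examples report the credential on line 2 because the byte input contains the same two lines as the list input. The standard-library check below illustrates that correspondence; it makes no claim about how ByteContentProvider decodes or splits its input internally, and UTF-8 is assumed here.

```python
to_scan_lines = ["line one", "password='in_line_2'"]
to_scan_bytes = b"line one\npassword='in_line_2'"

# Decode and split on line breaks; line numbers are counted from 1.
decoded_lines = to_scan_bytes.decode("utf-8").splitlines()
for line_num, line in enumerate(decoded_lines, start=1):
    print(line_num, line)
```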

Minimal example of ML validation:

from credsweeper import CredSweeper, StringContentProvider, MlValidator, ThresholdPreset


to_scan = ["line one", "secret='fgELsRdFA'", "secret='template'"]
cred_sweeper = CredSweeper()
provider = StringContentProvider(to_scan)

# Select a lower or higher threshold to get more or fewer reports, respectively
threshold = ThresholdPreset.medium
validator = MlValidator(threshold=threshold)

results = cred_sweeper.file_scan(provider)
for candidate in results:
    # Each candidate detected by CredSweeper can be validated with MlValidator
    is_credential, with_probability = validator.validate(candidate)
    if is_credential:
        print(candidate)

Note that "secret='template'" is not reported because it fails the MlValidator check.

rule: Secret / severity: medium / line_data_list: [line: 'secret='fgELsRdFA'' / line_num: 2 / path:  / value: 'fgELsRdFA' / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: NOT_AVAILABLE

Benchmark

We maintain a dataset for testing credential scanners, called CredData. To test CredSweeper against this dataset, see the CredData project.

Configurations