ml_model package

Submodules

ml_model.features module

Most rules are described in ‘Secrets in Source Code: Reducing False Positives Using Machine Learning’.

class credsweeper.ml_model.features.Feature[source]

Bases: abc.ABC

Base class for features.

abstract extract(candidate)[source]
Return type

Any
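
Example (illustrative sketch, not CredSweeper code): the interface above amounts to an abstract base class with a single extract method that concrete features override. The block uses local stand-ins so it runs without the package; FakeCandidate, its fields, and ValueLength are hypothetical names introduced only for this illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeCandidate:
    """Hypothetical stand-in for a credential candidate (not the real class)."""
    line: str
    value: str
    path: str


class Feature(ABC):
    """Local stand-in that mirrors the documented interface."""

    @abstractmethod
    def extract(self, candidate: Any) -> Any:
        """Compute the feature for one candidate."""
        raise NotImplementedError


class ValueLength(Feature):
    """Hypothetical feature: length of the candidate value."""

    def extract(self, candidate: FakeCandidate) -> int:
        return len(candidate.value)


candidate = FakeCandidate(line='password = "hunter2"', value="hunter2", path="config.py")
length = ValueLength().extract(candidate)
```

Because extract is abstract, instantiating the base class directly raises TypeError; only subclasses that implement extract can be created.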

class credsweeper.ml_model.features.FileExtension(extensions)[source]

Bases: credsweeper.ml_model.features.Feature

Categorical feature of file type.

Parameters

extensions (List[str]) – extension labels

extract(candidate)[source]
Return type

Any
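
Example (illustrative sketch): a categorical feature built from a list of labels is commonly encoded as a one-hot vector. Whether FileExtension uses exactly this encoding is an assumption; one_hot and the sample labels below are hypothetical.

```python
from typing import List


def one_hot(label: str, labels: List[str]) -> List[int]:
    """Return a vector with 1 at the position of `label`, 0 elsewhere.

    Assumption: a label missing from `labels` maps to an all-zero vector.
    """
    return [1 if label == known else 0 for known in labels]


extensions = [".py", ".json", ".yaml"]  # hypothetical extension labels
encoded = one_hot(".json", extensions)
unknown = one_hot(".exe", extensions)
```

Here encoded is [0, 1, 0] and unknown is [0, 0, 0].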

class credsweeper.ml_model.features.HartleyEntropy(base, norm=False)[source]

Bases: credsweeper.ml_model.features.RenyiEntropy

Hartley entropy feature.

class credsweeper.ml_model.features.HasHtmlTag[source]

Bases: credsweeper.ml_model.features.Feature

Feature is true if line has HTML tags (HTML file).

extract(candidate)[source]
Return type

bool

class credsweeper.ml_model.features.IsSecretNumeric[source]

Bases: credsweeper.ml_model.features.Feature

Feature is true if candidate value is a numerical value.

extract(candidate)[source]
Return type

bool
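
Example (illustrative sketch): one simple way to test whether a value is numerical is to try parsing it as a float. This is an assumption about the approach, not a statement of how IsSecretNumeric is implemented; is_numeric is a hypothetical helper.

```python
def is_numeric(value: str) -> bool:
    """Return True if `value` parses as a Python float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


numeric_flags = [is_numeric(v) for v in ("12345", "3.14", "abc123", "")]
```

With these inputs numeric_flags is [True, True, False, False].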

class credsweeper.ml_model.features.PossibleComment[source]

Bases: credsweeper.ml_model.features.Feature

Feature is true if the candidate line starts with #, * or /* (possible comment).

extract(candidate)[source]
Return type

bool
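
Example (illustrative sketch): a prefix check for the comment markers #, * and /* can be written with str.startswith and a tuple of prefixes. Stripping leading whitespace first is an assumption made here; is_possible_comment is a hypothetical helper.

```python
COMMENT_PREFIXES = ("#", "*", "/*")


def is_possible_comment(line: str) -> bool:
    """Return True if the line starts with a comment-like prefix.

    Assumption: indentation is ignored before the check.
    """
    return line.lstrip().startswith(COMMENT_PREFIXES)


comment_flags = [
    is_possible_comment("# api_key = 'example'"),
    is_possible_comment("   /* token: example */"),
    is_possible_comment("api_key = 'example'  # trailing comment"),
]
```

Only the first two lines begin with a comment marker, so comment_flags is [True, True, False].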

class credsweeper.ml_model.features.RenyiEntropy(base, alpha, norm=False)[source]

Bases: credsweeper.ml_model.features.Feature

Rényi entropy.

For details, see: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf

Parameters
  • base – number base; a key of CHARS ('base36', 'base64' or 'hex')

  • alpha (float) – entropy parameter

  • norm – set True to normalize output probabilities

CHARS = {'base36': 'abcdefghijklmnopqrstuvwxyz1234567890', 'base64': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=', 'hex': '1234567890abcdefABCDEF'}

estimate_entropy(p_x)[source]

Calculate the Rényi entropy of the probability sequence ‘p_x’.

The function follows the definition of Rényi entropy for an arbitrary probability distribution. For details, see: https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s4_v1_article-27.pdf

Return type

float

extract(candidate)[source]
Return type

ndarray

get_probabilities(data)[source]

Get the probabilities of the alphabet’s characters that occur in the input string.

Return type

ndarray
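
Example (illustrative sketch): the standard Rényi entropy of order alpha is log2(sum(p_i ** alpha)) / (1 - alpha), with Hartley entropy as the alpha = 0 case and Shannon entropy as the limit alpha → 1. The block below is a pure-Python rendering of that definition under stated assumptions; it is not the library's code, it uses lists instead of ndarray, and the way norm is applied is a guess based on the parameter description above.

```python
import math
from typing import List

HEX_CHARS = "1234567890abcdefABCDEF"  # the 'hex' entry of CHARS above


def get_probabilities(data: str, alphabet: str, norm: bool = False) -> List[float]:
    """Relative frequency of each alphabet character that occurs in `data`."""
    if not data:
        return []
    p_x = [data.count(ch) / len(data) for ch in alphabet if ch in data]
    if norm:
        # Assumption: normalizing rescales the probabilities to sum to 1.
        total = sum(p_x)
        p_x = [p / total for p in p_x]
    return p_x


def renyi_entropy(p_x: List[float], alpha: float) -> float:
    """Renyi entropy in bits for strictly positive probabilities `p_x`."""
    if not p_x:
        return 0.0
    if math.isclose(alpha, 0.0, abs_tol=1e-9):
        return math.log2(len(p_x))  # Hartley entropy
    if math.isclose(alpha, 1.0):
        return -sum(p * math.log2(p) for p in p_x)  # Shannon entropy
    return math.log2(sum(p ** alpha for p in p_x)) / (1.0 - alpha)


p_x = get_probabilities("aabb", HEX_CHARS)
hartley = renyi_entropy(p_x, 0.0)
shannon = renyi_entropy(p_x, 1.0)
collision = renyi_entropy(p_x, 2.0)
```

For "aabb" the two characters are equally likely, so p_x is [0.5, 0.5] and all three entropies equal 1 bit. When the input contains characters outside the alphabet, the unnormalized probabilities sum to less than 1.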

class credsweeper.ml_model.features.RuleName(rule_names)[source]

Bases: credsweeper.ml_model.features.Feature

Categorical feature that corresponds to rule name.

Parameters

rule_names (List[str]) – rule name labels

extract(candidate)[source]
Return type

Any

class credsweeper.ml_model.features.ShannonEntropy(base, norm=False)[source]

Bases: credsweeper.ml_model.features.RenyiEntropy

Shannon entropy feature.

class credsweeper.ml_model.features.WordInLine(words)[source]

Bases: credsweeper.ml_model.features.Feature

Feature is true if line contains at least one word from predefined list.

extract(candidate)[source]
Return type

bool

class credsweeper.ml_model.features.WordInPath(words)[source]

Bases: credsweeper.ml_model.features.Feature

Feature is true if candidate path contains at least one word from predefined list.

extract(candidate)[source]
Return type

bool

class credsweeper.ml_model.features.WordInSecret(words)[source]

Bases: credsweeper.ml_model.features.Feature

Feature is true if candidate value contains at least one word from predefined list.

extract(candidate)[source]
Return type

bool
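
Example (illustrative sketch): WordInLine, WordInPath and WordInSecret share one idea, a substring test of a predefined word list against the line, the path, or the candidate value. The helper below shows that idea; contains_any is hypothetical, and the case-insensitive comparison is an assumption.

```python
from typing import Iterable


def contains_any(text: str, words: Iterable[str]) -> bool:
    """Return True if `text` contains at least one of `words`.

    Assumption: matching ignores case.
    """
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)


words = ["example", "test", "dummy"]  # hypothetical word list
in_line = contains_any('password = "ExamplePass1"', words)
in_path = contains_any("src/app/settings.py", words)
in_value = contains_any("dummy_token_123", words)
```

With this word list, in_line and in_value are True and in_path is False.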

ml_model.ml_validator module

class credsweeper.ml_model.ml_validator.MlValidator[source]

Bases: object

classmethod encode(line, char_to_index)[source]
Return type

ndarray

classmethod extract_common_features(candidates)[source]

Extract features that are guaranteed to be the same for all candidates on the same line with the same value.

Return type

ndarray

classmethod extract_unique_features(candidates)[source]

Extract features that can be different between candidates. Join them with the OR operator.

Return type

ndarray
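
Example (illustrative sketch): joining per-candidate boolean features with OR means a feature is set for the group if it is set for at least one candidate. The block shows this with plain lists rather than ndarray; join_with_or and the sample rows are hypothetical.

```python
from typing import List


def join_with_or(feature_rows: List[List[bool]]) -> List[bool]:
    """Element-wise OR across rows; each row holds one candidate's features."""
    return [any(column) for column in zip(*feature_rows)]


rows = [
    [True, False, False],  # features of the first candidate
    [False, False, True],  # features of the second candidate
]
joined = join_with_or(rows)
```

Here joined is [True, False, True]: the middle feature is False for every candidate, so it stays False.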

classmethod get_group_features(value, candidates)[source]
Return type

Tuple[ndarray, ndarray]

classmethod validate(candidate)[source]

Validate single credential candidate.

Return type

Tuple[bool, float]

classmethod validate_groups(group_list, batch_size)[source]

Apply the ML model to a list of candidate groups.

Parameters
Return type

Tuple[ndarray, ndarray]

Returns

Boolean numpy array with decision based on the threshold, and numpy array with probability predicted by the model
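
Example (illustrative sketch): the documented return value pairs a thresholded decision with the predicted probability for each group. The block below shows batched prediction followed by thresholding, using plain lists and a fake model. predict_in_batches, fake_model, the threshold value, and the use of >= for the comparison are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def predict_in_batches(
    items: Sequence[str],
    predict: Callable[[Sequence[str]], List[float]],
    batch_size: int,
    threshold: float,
) -> Tuple[List[bool], List[float]]:
    """Run `predict` over `items` in chunks and threshold the probabilities.

    `batch_size` must be a positive integer.
    """
    probabilities: List[float] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        probabilities.extend(predict(batch))
    decisions = [p >= threshold for p in probabilities]
    return decisions, probabilities


def fake_model(batch: Sequence[str]) -> List[float]:
    """Hypothetical model: longer values score higher, capped at 1.0."""
    return [min(len(value) / 10.0, 1.0) for value in batch]


decisions, probabilities = predict_in_batches(
    ["abc", "abcdefgh", "abcdefghijkl"], fake_model, batch_size=2, threshold=0.5
)
```

The three inputs are processed as one batch of two and one batch of one, giving probabilities [0.3, 0.8, 1.0] and decisions [False, True, True].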

Module contents