API

This part of the documentation lists the full API reference of all public classes and functions.

tartufo.config

tartufo.config.compile_path_rules(patterns)

Take a list of regex strings and compile them into patterns.

Any line starting with # will be ignored.

Parameters

patterns (Iterable[str]) – The list of patterns to be compiled

Return type

List[Pattern]
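
As a rough illustration of the behavior described above, the following sketch compiles a list of regex strings and skips comment lines. The function name compile_path_patterns_sketch is hypothetical, and skipping blank lines and stripping surrounding whitespace are assumptions; this is not tartufo's implementation.

```python
import re
from typing import Iterable, List, Pattern


def compile_path_patterns_sketch(patterns: Iterable[str]) -> List[Pattern]:
    """Hypothetical stand-in for illustration; not tartufo's own code."""
    compiled = []
    for raw in patterns:
        line = raw.strip()
        # Lines starting with '#' are treated as comments and ignored.
        # Skipping blank lines is an assumption made for this sketch.
        if not line or line.startswith("#"):
            continue
        compiled.append(re.compile(line))
    return compiled


rules = compile_path_patterns_sketch(["# lock files", r".*\.lock$", r"docs/.*"])
```

Here rules holds two compiled patterns, because the comment line is dropped before compilation.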

tartufo.config.compile_rules(patterns)

Take a list of regex strings with paths and compile them into a list of Rule objects.

Parameters

patterns (Iterable[Dict[str, str]]) – The list of patterns to be compiled

Return type

List[Rule]

Returns

List of Rule objects

tartufo.config.configure_regexes(include_default=True, rule_patterns=None, rules_repo=None, rules_repo_files=None)

Build a set of regular expressions to be used during a regex scan.

Parameters
  • include_default (bool) – Whether to include the built-in set of regexes

  • rules_files – A list of files to load rules from

  • rule_patterns (Optional[Iterable[Dict[str, str]]]) – A set of previously-collected rules

  • rules_repo (Optional[str]) – A separate git repository to load rules from

  • rules_repo_files (Optional[Iterable[str]]) – A set of patterns used to find files in the rules repo

Return type

Set[Rule]

Returns

Set of Rule objects to be used for regex scans

tartufo.config.load_config_from_path(config_path, filename=None, traverse=True)

Scan a path for a configuration file, and return its contents.

All key names are normalized to remove leading “-”/“--” and replace “-” with “_”. For example, “--repo-path” becomes “repo_path”.
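
The normalization rule can be sketched in a couple of lines. The helper name normalize_key is hypothetical and only illustrates the rule stated above; it is not taken from tartufo.

```python
def normalize_key(key: str) -> str:
    """Hypothetical helper: strip leading dashes, then turn '-' into '_'."""
    return key.lstrip("-").replace("-", "_")


normalized = normalize_key("--repo-path")
```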

In addition to checking the specified path, if traverse is True, this will traverse up through the directory structure, looking for a configuration file in parent directories. For example, given this directory structure:

working_dir/
|- tartufo.toml
|- group1/
|  |- project1/
|  |  |- tartufo.toml
|  |- project2/
|- group2/
   |- tartufo.toml
   |- project1/
   |- project2/
      |- tartufo.toml

The following config_path values will load the configuration files at the corresponding paths:

config_path                     file
working_dir/group1/project1/    working_dir/group1/project1/tartufo.toml
working_dir/group1/project2/    working_dir/tartufo.toml
working_dir/group2/project1/    working_dir/group2/tartufo.toml
working_dir/group2/project2/    working_dir/group2/project2/tartufo.toml
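
The upward traversal can be sketched as follows. This is a simplified illustration under stated assumptions, not tartufo's implementation: the name find_config_sketch is hypothetical, the sketch only checks that a candidate file exists, and it does not parse TOML or inspect file contents. It rebuilds the directory layout above in a temporary directory and records which file is found for each starting path.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_config_sketch(
    config_path: Path,
    names: Sequence[str] = ("tartufo.toml", "pyproject.toml"),
    traverse: bool = True,
) -> Optional[Path]:
    """Hypothetical helper: return the nearest config file, walking upward."""
    current = config_path.resolve()
    while True:
        for name in names:
            candidate = current / name
            if candidate.is_file():
                return candidate
        # Stop at the filesystem root, or at once when traversal is disabled.
        if not traverse or current.parent == current:
            return None
        current = current.parent


projects = ("group1/project1", "group1/project2", "group2/project1", "group2/project2")
configs = (
    "tartufo.toml",
    "group1/project1/tartufo.toml",
    "group2/tartufo.toml",
    "group2/project2/tartufo.toml",
)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve() / "working_dir"
    for folder in projects:
        (root / folder).mkdir(parents=True)
    for cfg in configs:
        (root / cfg).touch()
    # Map each starting folder to the config file found, relative to root.
    found = {
        folder: find_config_sketch(root / folder).relative_to(root).as_posix()
        for folder in projects
    }
```

The found mapping mirrors the lookup results listed above, with paths given relative to working_dir.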

Parameters
  • config_path (Path) – The path to search for configuration files

  • filename (Optional[str]) – A specific filename to look for. By default, this will look for tartufo.toml first and then pyproject.toml.

  • traverse (bool) – Whether to also search parent directories for a configuration file

Raises
Return type

Tuple[Path, MutableMapping[str, Any]]

Returns

A tuple consisting of the config file that was discovered, and the contents of that file loaded in as TOML data

tartufo.config.load_rules_from_file(rules_file)

Load a set of JSON rules from a file and return them as compiled patterns.

Parameters

rules_file (TextIO) – An open file handle containing a JSON dictionary of regexes

Raises

ValueError – If the rules contain invalid JSON

Return type

Set[Rule]
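
A minimal sketch of this kind of loader is shown below. It assumes, purely for illustration, that the file holds a flat JSON object mapping rule names to regex strings, and it returns a plain dictionary of compiled patterns rather than Rule objects. The name load_rules_sketch is hypothetical and this is not tartufo's code.

```python
import io
import json
import re
from typing import Dict, Pattern, TextIO


def load_rules_sketch(rules_file: TextIO) -> Dict[str, Pattern]:
    """Hypothetical loader: parse JSON and compile each regex string."""
    try:
        raw = json.load(rules_file)
    except json.JSONDecodeError as exc:
        # Surface malformed input as a ValueError with some context.
        raise ValueError(f"Invalid JSON in rules file: {exc}") from exc
    # Assumed layout for this sketch: {"rule name": "regex string", ...}
    return {name: re.compile(pattern) for name, pattern in raw.items()}


loaded = load_rules_sketch(io.StringIO('{"AWS key": "AKIA[0-9A-Z]{16}"}'))
```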

tartufo.config.read_pyproject_toml(ctx, _param, value)

Read config values from a file and load them as defaults.

Parameters
  • ctx (Context) – A context from a currently executing Click command

  • _param (Parameter) – The command parameter that triggered this callback

  • value (str) – The value passed to the command parameter

Raises

click.FileError – If there was a problem loading the configuration

Return type

Optional[str]

tartufo.scanner

class tartufo.scanner.FolderScanner(global_options, target, recurse)

Bases: ScannerBase

Used to scan a folder.

Parameters
  • global_options (GlobalOptions) – The options provided to the top-level tartufo command

  • target (str) – The local filesystem path to scan

  • recurse (bool) – Whether to recurse into sub-folders of the target

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

static calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present. By doing this, we can tell how random a string appears to be.

Adapted from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Return type

float

Returns

The amount of entropy detected in the data
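
For reference, Shannon entropy over the characters of a string is H = -Σ p(c) · log2 p(c), where p(c) is the fraction of the string made up of character c. The sketch below is a generic implementation of that formula; the name shannon_entropy is hypothetical and the code is not necessarily identical to what tartufo does internally.

```python
import math
from collections import Counter


def shannon_entropy(data: str) -> float:
    """Generic Shannon entropy in bits per character (illustrative sketch)."""
    if not data:
        return 0.0
    length = len(data)
    counts = Counter(data)
    # Sum p * log2(p) over each distinct character, then negate.
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


low = shannon_entropy("aaaa")
high = shannon_entropy("abcd")
```

A string made of one repeated character scores 0 bits per character, while four distinct, equally frequent characters score 2 bits per character. Higher values indicate a more random-looking string.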

property chunks: Generator[Chunk, None, None]

Yield the individual files in the target directory.

property completed: bool

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Return type

float

Returns

Entropy detection threshold scaled to the input bitrate

property config_data: MutableMapping[str, Any]

Supplemental configuration to be merged into the *_options information.

entropy_string_is_excluded(string, line, path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Return type

bool

Returns

True if excluded, False otherwise

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check entropy string using entropy characters and score.

Parameters
  • chunk (Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Return type

Generator[Issue, None, None]

Returns

Generator of issues flagged

property excluded_entropy: List[Rule]

Get a list of rules used to exclude entropy findings from the scan results.

property excluded_paths: List[Pattern]

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures

Get a list of the signatures of findings to be excluded from the scan results.

Returns

The signatures to be excluded from scan results

global_options: GlobalOptions
hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths: List[Pattern]

Get a list of regexes used as an exclusive list of paths to scan

property issue_count: int
property issue_file: IO
property issues: List[Issue]

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

load_issues()
Return type

Generator[Issue, None, None]

logger: Logger
recurse: bool
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Return type

bool

Returns

True if string and path matched, False otherwise.

property rules_regexes: Set[Rule]

Get a set of regular expressions to scan the code for.

Raises

types.ConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe; if multiple concurrent scans are requested, the first will run to completion while other callers are blocked (after which they will each execute in turn, yielding cached issues without repeating the underlying repository scan).

Raises

types.ConfigException – If there were problems with the scanner’s configuration

Return type

Generator[Issue, None, None]
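
The locking and caching behavior described for scan() can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class CachedScanSketch, its scan_runs counter, and the trivial "secret" check are hypothetical. Unlike the description above, this sketch collects the issues under the lock and yields them only after releasing it.

```python
import threading
from typing import Generator, Iterable, List


class CachedScanSketch:
    """Hypothetical scanner: the first caller scans, later callers reuse results."""

    def __init__(self, chunks: Iterable[str]) -> None:
        self._chunks = list(chunks)
        self._lock = threading.Lock()
        self._completed = False
        self._issues: List[str] = []
        self.scan_runs = 0  # counts how often the underlying scan really ran

    def scan(self) -> Generator[str, None, None]:
        with self._lock:
            if not self._completed:
                self.scan_runs += 1
                for chunk in self._chunks:
                    if "secret" in chunk:  # stand-in for the real checks
                        self._issues.append(chunk)
                self._completed = True
            cached = list(self._issues)
        # Yield outside the lock so a slow consumer cannot block other callers.
        yield from cached


scanner = CachedScanSketch(["user=alice", "secret=hunter2"])
threads = [threading.Thread(target=lambda: list(scanner.scan())) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

After all four threads finish, scanner.scan_runs is 1: only the first caller performed the scan and the rest received the cached issues.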

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

should_scan(file_path)

Check if a file path should be included in analysis.

If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.

Parameters

file_path (str) – The file path to check for inclusion

Return type

bool

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True
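
The precedence rules above can be sketched as a standalone function. The name should_scan_sketch is hypothetical, the pattern lists are passed in explicitly instead of being read from self, and the use of Pattern.match (anchored at the start of the path) is an assumption; tartufo may match differently.

```python
import re
from typing import List, Pattern


def should_scan_sketch(
    file_path: str, included: List[Pattern], excluded: List[Pattern]
) -> bool:
    """Hypothetical illustration of the include/exclude precedence."""
    # A non-empty include list acts as an allow-list and is checked first.
    if included and not any(p.match(file_path) for p in included):
        return False
    # A non-empty exclude list then removes any path it matches.
    if excluded and any(p.match(file_path) for p in excluded):
        return False
    # With no applicable rule, every path is scanned.
    return True


included_paths = [re.compile(r"src/")]
excluded_paths = [re.compile(r".*\.lock$")]
decisions = {
    path: should_scan_sketch(path, included_paths, excluded_paths)
    for path in ("src/app.py", "src/poetry.lock", "docs/readme.md")
}
```

With these patterns, only src/app.py is scanned: src/poetry.lock is matched by the exclusion, and docs/readme.md is not matched by any inclusion.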

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

store_issue(issue)
Return type

None

Parameters

issue (Issue) –

target: str
class tartufo.scanner.GitPreCommitScanner(global_options, repo_path, include_submodules)

Bases: GitScanner

For use in a git pre-commit hook.

Parameters
  • global_options (GlobalOptions) – The options provided to the top-level tartufo command

  • repo_path (str) – The local filesystem path pointing to the repository

  • include_submodules (bool) – Whether to scan git submodules in the repository

_iter_diff_index(diff)

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff (Diff) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

static calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present. By doing this, we can tell how random a string appears to be.

Adapted from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Return type

float

Returns

The amount of entropy detected in the data

property chunks

Yield the individual file changes currently staged for commit.

property completed: bool

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Return type

float

Returns

Entropy detection threshold scaled to the input bitrate

property config_data: MutableMapping[str, Any]

Supplemental configuration to be merged into the *_options information.

entropy_string_is_excluded(string, line, path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Return type

bool

Returns

True if excluded, False otherwise

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check entropy string using entropy characters and score.

Parameters
  • chunk (Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Return type

Generator[Issue, None, None]

Returns

Generator of issues flagged

property excluded_entropy: List[Rule]

Get a list of rules used to exclude entropy findings from the scan results.

property excluded_paths: List[Pattern]

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures

Get a list of the signatures of findings to be excluded from the scan results.

Returns

The signatures to be excluded from scan results

filter_submodules(repo)

Exclude all git submodules and their contents from being scanned.

Parameters

repo (Repository) – The repository being scanned

Return type

None

global_options: GlobalOptions
static header_length(diff)

Compute the length of the git diff header text.

Parameters

diff (str) – The diff being inspected for a header

Return type

int

hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths: List[Pattern]

Get a list of regexes used as an exclusive list of paths to scan

property issue_count: int
property issue_file: IO
property issues: List[Issue]

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

load_issues()
Return type

Generator[Issue, None, None]

load_repo(repo_path)

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

Repository

logger: Logger
repo_path: str
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Return type

bool

Returns

True if string and path matched, False otherwise.

property rules_regexes: Set[Rule]

Get a set of regular expressions to scan the code for.

Raises

types.ConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe; if multiple concurrent scans are requested, the first will run to completion while other callers are blocked (after which they will each execute in turn, yielding cached issues without repeating the underlying repository scan).

Raises

types.ConfigException – If there were problems with the scanner’s configuration

Return type

Generator[Issue, None, None]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

should_scan(file_path)

Check if a file path should be included in analysis.

If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.

Parameters

file_path (str) – The file path to check for inclusion

Return type

bool

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

store_issue(issue)
Return type

None

Parameters

issue (Issue) –

class tartufo.scanner.GitRepoScanner(global_options, git_options, repo_path)

Bases: GitScanner

Used for scanning a full clone of a git repository.

Parameters
  • global_options (GlobalOptions) – The options provided to the top-level tartufo command

  • git_options (GitOptions) – The options specific to interacting with a git repository

  • repo_path (str) – The local filesystem path pointing to the repository

_iter_diff_index(diff)

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff (Diff) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

static calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present. By doing this, we can tell how random a string appears to be.

Adapted from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Return type

float

Returns

The amount of entropy detected in the data

property chunks: Generator[Chunk, None, None]

Yield individual diffs from the repository’s history.

Raises

types.GitRemoteException – If there was an error fetching branches

property completed: bool

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Return type

float

Returns

Entropy detection threshold scaled to the input bitrate

property config_data: MutableMapping[str, Any]

Supplemental configuration to be merged into the *_options information.

entropy_string_is_excluded(string, line, path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Return type

bool

Returns

True if excluded, False otherwise

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check entropy string using entropy characters and score.

Parameters
  • chunk (Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Return type

Generator[Issue, None, None]

Returns

Generator of issues flagged

property excluded_entropy: List[Rule]

Get a list of rules used to exclude entropy findings from the scan results.

property excluded_paths: List[Pattern]

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures

Get a list of the signatures of findings to be excluded from the scan results.

Returns

The signatures to be excluded from scan results

filter_submodules(repo)

Exclude all git submodules and their contents from being scanned.

Parameters

repo (Repository) – The repository being scanned

Return type

None

git_options: GitOptions
global_options: GlobalOptions
static header_length(diff)

Compute the length of the git diff header text.

Parameters

diff (str) – The diff being inspected for a header

Return type

int

hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths: List[Pattern]

Get a list of regexes used as an exclusive list of paths to scan

property issue_count: int
property issue_file: IO
property issues: List[Issue]

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

load_issues()
Return type

Generator[Issue, None, None]

load_repo(repo_path)

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

Repository

logger: Logger
repo_path: str
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Return type

bool

Returns

True if string and path matched, False otherwise.

property rules_regexes: Set[Rule]

Get a set of regular expressions to scan the code for.

Raises

types.ConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe; if multiple concurrent scans are requested, the first will run to completion while other callers are blocked (after which they will each execute in turn, yielding cached issues without repeating the underlying repository scan).

Raises

types.ConfigException – If there were problems with the scanner’s configuration

Return type

Generator[Issue, None, None]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

should_scan(file_path)

Check if a file path should be included in analysis.

If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.

Parameters

file_path (str) – The file path to check for inclusion

Return type

bool

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

store_issue(issue)
Return type

None

Parameters

issue (Issue) –

class tartufo.scanner.GitScanner(global_options, repo_path)

Bases: ScannerBase, ABC

A base class for scanners looking at git history.

This is a lightweight base class that provides the basic functionality needed by all scanners that interact with git history.

Parameters
  • global_options (GlobalOptions) – The options provided to the top-level tartufo command

  • repo_path (str) – The local filesystem path pointing to the repository

_iter_diff_index(diff)

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff (Diff) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

static calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present. By doing this, we can tell how random a string appears to be.

Adapted from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Return type

float

Returns

The amount of entropy detected in the data

abstract property chunks: Generator[Chunk, None, None]

Yield “chunks” of data to be scanned.

Examples of “chunks” would be individual git commit diffs, or the contents of individual files.

property completed: bool

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Return type

float

Returns

Entropy detection threshold scaled to the input bitrate

property config_data: MutableMapping[str, Any]

Supplemental configuration to be merged into the *_options information.

entropy_string_is_excluded(string, line, path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Return type

bool

Returns

True if excluded, False otherwise

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check entropy string using entropy characters and score.

Parameters
  • chunk (Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Return type

Generator[Issue, None, None]

Returns

Generator of issues flagged

property excluded_entropy: List[Rule]

Get a list of rules used to exclude entropy findings from the scan results.

property excluded_paths: List[Pattern]

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures

Get a list of the signatures of findings to be excluded from the scan results.

Returns

The signatures to be excluded from scan results

filter_submodules(repo)

Exclude all git submodules and their contents from being scanned.

Parameters

repo (Repository) – The repository being scanned

Return type

None

global_options: GlobalOptions
static header_length(diff)

Compute the length of the git diff header text.

Parameters

diff (str) – The diff being inspected for a header

Return type

int

hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths: List[Pattern]

Get a list of regexes used as an exclusive list of paths to scan

property issue_count: int
property issue_file: IO
property issues: List[Issue]

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

load_issues()
Return type

Generator[Issue, None, None]

abstract load_repo(repo_path)

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

Repository

logger: Logger
repo_path: str
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Return type

bool

Returns

True if string and path matched, False otherwise.

property rules_regexes: Set[Rule]

Get a set of regular expressions to scan the code for.

Raises

types.ConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe; if multiple concurrent scans are requested, the first will run to completion while other callers are blocked (after which they will each execute in turn, yielding cached issues without repeating the underlying repository scan).

Raises

types.ConfigException – If there were problems with the scanner’s configuration

Return type

Generator[Issue, None, None]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

should_scan(file_path)

Check if a file path should be included in analysis.

If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.

Parameters

file_path (str) – The file path to check for inclusion

Return type

bool

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

store_issue(issue)
Return type

None

Parameters

issue (Issue) –

class tartufo.scanner.Issue(issue_type, matched_string, chunk)

Bases: object

Represent an issue found while scanning a target.

Parameters
  • issue_type (IssueType) – What type of scan identified this issue

  • matched_string (str) – The string that was identified as a potential issue

  • chunk (Chunk) – The chunk of data where the match was found

OUTPUT_SEPARATOR: str = '~~~~~~~~~~~~~~~~~~~~~'
as_dict(compact=False)

Return a dictionary representation of an issue.

This is primarily meant to aid in JSON serialization.

Parameters

compact – True to return a dictionary with fewer fields.

Return type

Dict[str, Optional[str]]

Returns

A JSON serializable dictionary representation of this issue

chunk: Chunk
issue_detail: Optional[str]
issue_type: IssueType
matched_string: str
property signature: str

Generate a stable hash-based signature uniquely identifying this issue.

class tartufo.scanner.ScannerBase(options)[source]

Bases: ABC

Provide the base, generic functionality needed by all scanners.

In fact, this contains all of the actual scanning logic. This part of the application should never differ; the part that differs, and the part that is left abstract here, is what content is provided to the various scans. For this reason, the chunks property is left abstract. It is up to the various scanners to implement this property, in the form of a generator, to yield all the individual pieces of content to be scanned.

Parameters

options (GlobalOptions) – A set of options to control the behavior of the scanner

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

static calculate_entropy(data)[source]

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present. By doing this, we can tell how random a string appears to be.

Adapted from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Return type

float

Returns

The amount of entropy detected in the data
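
As a rough illustration of the calculation described above, here is a minimal sketch of Shannon entropy over the characters of a string. The function name shannon_entropy is hypothetical and this is not tartufo's implementation; it only shows the standard formula, with the result expressed in bits per character.

```python
import math
from collections import Counter


def shannon_entropy(data: str) -> float:
    """Return the Shannon entropy of data, in bits per character."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum -p * log2(p), where p is the observed frequency of each character.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy("aaaaaaaa"))  # 0.0 (one repeated character, no randomness)
print(shannon_entropy("abcdefgh"))  # 3.0 (eight equally likely characters)
```

A string drawn uniformly from a 16-character hexadecimal alphabet approaches 4 bits per character, and one drawn from a 64-character base64 alphabet approaches 6.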

abstract property chunks: Generator[Chunk, None, None]

Yield “chunks” of data to be scanned.

Examples of “chunks” would be individual git commit diffs, or the contents of individual files.
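
The split between shared scanning logic and an abstract chunks property can be sketched with the standard abc module. The classes below (BaseScanner, ListScanner) and the toy "password" check are hypothetical stand-ins, not tartufo classes; they only illustrate a subclass supplying content through a generator property.

```python
from abc import ABC, abstractmethod
from typing import Generator, List


class BaseScanner(ABC):
    """Hypothetical base class: owns the scan logic, not the content."""

    @property
    @abstractmethod
    def chunks(self) -> Generator[str, None, None]:
        """Yield each piece of content to be scanned."""

    def scan(self) -> List[str]:
        # The scanning logic is shared; only the source of chunks differs.
        return [chunk for chunk in self.chunks if "password" in chunk]


class ListScanner(BaseScanner):
    """Hypothetical scanner that reads chunks from an in-memory list."""

    def __init__(self, items: List[str]) -> None:
        self._items = items

    @property
    def chunks(self) -> Generator[str, None, None]:
        yield from self._items


print(ListScanner(["x = 1", "password = 'abc'"]).scan())
```

Instantiating BaseScanner directly raises TypeError, because the abstract property has no implementation.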

property completed: bool

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)[source]

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Return type

float

Returns

Entropy detection threshold scaled to the input bitrate
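
The reference does not state the scaling formula. One plausible reading, shown here purely as an assumption, is a linear scaling of the 0 to 100 entropy_sensitivity value by the maximum bits per character of the encoding. The function name scaled_entropy_limit and the sample sensitivity of 75 are hypothetical.

```python
def scaled_entropy_limit(sensitivity: int, maximum_bitrate: float) -> float:
    """Scale a 0-100 sensitivity to an entropy threshold (assumed linear)."""
    if not 0 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 0 and 100")
    return sensitivity / 100.0 * maximum_bitrate


# Hexadecimal characters carry at most 4 bits each; base64 characters 6.
print(scaled_entropy_limit(75, 4.0))  # 3.0
print(scaled_entropy_limit(75, 6.0))  # 4.5
```

Under this assumption a sensitivity of 0 gives a threshold of 0, so every candidate string qualifies, while 100 requires the theoretical maximum entropy for the encoding.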

property config_data: MutableMapping[str, Any]

Supplemental configuration to be merged into the *_options information.

entropy_string_is_excluded(string, line, path)[source]

Find whether a high-entropy string has been excluded in configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Return type

bool

Returns

True if excluded, False otherwise

evaluate_entropy_string(chunk, line, string, min_entropy_score)[source]

Check entropy string using entropy characters and score.

Parameters
  • chunk (Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Return type

Generator[Issue, None, None]

Returns

Generator of issues flagged

property excluded_entropy: List[Rule]

Get a list of rules used to exclude matching strings from entropy scan findings.

property excluded_paths: List[Pattern]

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures

Get a list of the signatures of findings to be excluded from the scan results.

Returns

The signatures to be excluded from scan results

global_options: GlobalOptions
hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths: List[Pattern]

Get a list of regexes used as an exclusive list of paths to scan

property issue_count: int
property issue_file: IO
property issues: List[Issue]

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

load_issues()[source]
Return type

Generator[Issue, None, None]

logger: Logger
static rule_matches(rule, string, line, path)[source]

Match string and path against rule.

Parameters
  • rule (Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Return type

bool

Returns

True if string and path matched, False otherwise.

property rules_regexes: Set[Rule]

Get a set of regular expressions to scan the code for.

Raises

types.ConfigException – If there was a problem compiling the rules

scan()[source]

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe; if multiple concurrent scans are requested, the first will run to completion while other callers are blocked (after which they will each execute in turn, yielding cached issues without repeating the underlying repository scan).

Raises

types.ConfigException – If there were problems with the scanner’s configuration

Return type

Generator[Issue, None, None]
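
The locking and caching behavior described for scan can be sketched as follows. CachingScanner is a hypothetical stand-in rather than ScannerBase itself, and the "secret" substring check is a placeholder for the real scans; the sketch only shows one way a lock and a completion flag can make repeated or concurrent calls replay cached results.

```python
import threading
from typing import Iterator, List


class CachingScanner:
    """Hypothetical stand-in illustrating a lock-protected, cached scan."""

    def __init__(self, chunks: List[str]) -> None:
        self._chunks = chunks
        self._lock = threading.Lock()
        self._issues: List[str] = []
        self._completed = False
        self.passes = 0  # how many times the underlying data was scanned

    def scan(self) -> Iterator[str]:
        # Only one caller at a time may run the scan; others block here.
        with self._lock:
            if not self._completed:
                self.passes += 1
                for chunk in self._chunks:
                    if "secret" in chunk:
                        self._issues.append(chunk)
                self._completed = True
            cached = list(self._issues)
        # Every caller, first or later, receives the cached findings.
        yield from cached


scanner = CachingScanner(["x = 1", "secret = 'hunter2'"])
first = list(scanner.scan())
second = list(scanner.scan())
print(first == second, scanner.passes)  # True 1
```

Copying the cached list before yielding means the lock is released even if a caller never exhausts the generator.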

scan_entropy(chunk)[source]

Scan a chunk of data for apparent high entropy.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

scan_regex(chunk)[source]

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (Chunk) – The chunk of data to be scanned

Return type

Generator[Issue, None, None]

should_scan(file_path)[source]

Check whether a file path should be included in analysis.

If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths are undefined or empty, they will have no effect, respectively. All file paths are included by this function when no inclusions or exclusions exist.

Parameters

file_path (str) – The file path to check for inclusion

Return type

bool

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True
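
The precedence rules above can be sketched as a standalone function. This version takes the pattern lists as arguments instead of reading them from self, and it assumes patterns are applied with re.Pattern.match (anchored at the start of the path); both choices are illustrative assumptions, not a statement of how tartufo implements the check.

```python
import re
from typing import Sequence


def should_scan(
    file_path: str,
    included: Sequence[re.Pattern],
    excluded: Sequence[re.Pattern],
) -> bool:
    """Apply include patterns first, then exclude patterns."""
    # A non-empty include list is exclusive: unmatched paths are dropped.
    if included and not any(p.match(file_path) for p in included):
        return False
    # A non-empty exclude list drops any path it matches.
    if excluded and any(p.match(file_path) for p in excluded):
        return False
    return True


included = [re.compile(r"src/")]
excluded = [re.compile(r"src/vendor/")]
print(should_scan("src/app.py", included, excluded))         # True
print(should_scan("docs/index.md", included, excluded))      # False
print(should_scan("src/vendor/lib.py", included, excluded))  # False
print(should_scan("docs/index.md", [], []))                  # True
```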

signature_is_excluded(blob, file_path)[source]

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

store_issue(issue)[source]
Return type

None

Parameters

issue (Issue) –

tartufo.types

exception tartufo.types.BranchNotFoundException[source]

Raised if a branch was not found

class tartufo.types.Chunk(contents, file_path, metadata, is_diff)[source]

A single “chunk” of text to be inspected during a scan

Parameters
  • contents (str) – The actual text contents of the chunk

  • file_path (str) – The file path that is being inspected

  • metadata (Dict[str, Any]) – Commit/file metadata for the chunk being inspected

  • is_diff (bool) – True if contents is diff output (vs raw data)

exception tartufo.types.ConfigException[source]

Raised if there is a problem with the configuration

exception tartufo.types.GitException[source]

Raised if there is a problem interacting with git

exception tartufo.types.GitLocalException[source]

Raised if there is an error interacting with a local git repository

class tartufo.types.GitOptions(since_commit, max_depth, branch, include_submodules, progress)[source]

Configuration options specific to git-based scans

Parameters
  • since_commit (Optional[str]) – A commit hash to treat as a starting point in history for the scan

  • max_depth (int) – A maximum depth, or maximum number of commits back in history, to scan

  • branch (Optional[str]) – A specific branch to scan

  • include_submodules (bool) – Whether to also scan submodules of the repository

  • progress (bool) –

exception tartufo.types.GitRemoteException[source]

Raised if there is an error interacting with a remote git repository

class tartufo.types.GlobalOptions(rule_patterns, default_regexes, entropy, regex, scan_filenames, include_path_patterns, exclude_path_patterns, exclude_entropy_patterns, exclude_signatures, output_dir, temp_dir, buffer_size, git_rules_repo, git_rules_files, config, verbose, quiet, log_timestamps, output_format, entropy_sensitivity)[source]

Configuration options for controlling scans and output

Parameters
  • rule_patterns (Tuple[Dict[str, str], ...]) – Dictionaries containing regex patterns to match against

  • default_regexes (bool) – Whether to include built-in regex patterns in the scan

  • entropy (bool) – Whether to enable entropy scans

  • regex (bool) – Whether to enable regular expression scans

  • scan_filenames (bool) – Whether to check filenames for potential secrets

  • include_path_patterns (Tuple[Dict[str, str], ...]) – An exclusive list of paths to be scanned

  • exclude_path_patterns (Tuple[Dict[str, str], ...]) – A list of paths to be excluded from the scan

  • exclude_entropy_patterns (Tuple[Dict[str, str], ...]) – Patterns to be excluded from entropy matches

  • exclude_signatures (Tuple[Dict[str, str], ...]) – Signatures of previously found findings to be excluded from the list of current findings

  • exclude_findings – Signatures of previously found findings to be excluded from the list of current findings

  • output_dir (Optional[str]) – A directory where detailed findings results will be written

  • temp_dir (Optional[str]) – A directory where temporary files will be written

  • buffer_size (int) – Maximum number of issues that will be buffered on the heap

  • git_rules_repo (Optional[str]) – A remote git repository where additional rules can be found

  • git_rules_files (Tuple[str, ...]) – The files in the remote rules repository to load the rules from

  • config (Optional[TextIO]) – A configuration file from which default values are pulled

  • verbose (int) – How verbose the scanner should be with its logging

  • quiet (bool) – Whether to suppress all output

  • log_timestamps (bool) – Whether to include timestamps in log output

  • output_format (Optional[OutputFormat]) – What format should be output from the scan

  • entropy_sensitivity (int) – A number from 0 - 100 representing the sensitivity of entropy scans. A value of 0 will detect totally non-random values, while a value of 100 will detect only wholly random values.

class tartufo.types.IssueType(value)[source]

The method by which an issue was detected

class tartufo.types.LogLevel(value)[source]

The various Python logging levels

class tartufo.types.MatchType(value)[source]

What regex method to use when looking for a match

class tartufo.types.OutputFormat(value)[source]

The formats in which tartufo is able to output issue summaries

class tartufo.types.Rule(name, pattern, path_pattern, re_match_type, re_match_scope)[source]

A regular expression rule to be used for inspecting text during a scan

Parameters
  • name (Optional[str]) – A unique name for the rule

  • pattern (Pattern) – The regex pattern to be used by the rule

  • path_pattern (Optional[Pattern]) – A regex pattern to match against the file path(s)

  • re_match_type (MatchType) – What type of regex operation to perform

  • re_match_scope (Optional[Scope]) – What scope to perform the match against

exception tartufo.types.ScanException[source]

Raised if there is a problem encountered during a scan

class tartufo.types.Scope(value)[source]

The scope to search for a regex match

exception tartufo.types.TartufoException[source]

Base class for all package exceptions

tartufo.util

tartufo.util.clone_git_repo(git_url, target_dir=None)[source]

Clone a remote git repository and return its filesystem path.

Parameters
  • git_url (str) – The URL of the git repository to be cloned

  • target_dir (Optional[Path]) – Where to clone the repository to

Return type

Tuple[Path, str]

Returns

Filesystem path of local clone and name of remote source

Raises

types.GitRemoteException – If there was an error cloning the repository

tartufo.util.del_rw(_func, name, _exc)[source]

Attempt to grant permission to and force deletion of a file.

This is used as an error handler for shutil.rmtree.

Parameters
  • _func (Callable) – The original calling function

  • name (str) – The name of the file to try removing

  • _exc (Exception) – The exception originally raised when removal of the file failed

Return type

None
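
A handler of this shape can be sketched as below. The name force_remove is hypothetical, and the body (make the file writable, then delete it) is an assumption about what such a handler typically does, not a copy of del_rw. The three-argument signature matches what shutil.rmtree passes to its onerror callback; newer Python versions prefer the onexc parameter.

```python
import os
import stat
import tempfile


def force_remove(_func, name, _exc):
    """Make a file writable, then delete it (hypothetical handler)."""
    os.chmod(name, stat.S_IWRITE)
    os.remove(name)


# Typical wiring: shutil.rmtree(some_dir, onerror=force_remove)
# Here the handler is called directly on a read-only temporary file.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, stat.S_IREAD)
force_remove(os.remove, path, None)
print(os.path.exists(path))  # False
```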

tartufo.util.echo_result(options, scanner, repo_path, output_dir)[source]

Print all found issues out to the console, optionally as JSON.

Parameters
  • options (GlobalOptions) – Global options object

  • scanner (ScannerBase) – ScannerBase containing issues and excluded paths from config tree

  • repo_path (str) – The path to the repository the issues were found in

  • output_dir (Optional[Path]) – The directory that issue details were written out to

Return type

None

tartufo.util.extract_commit_metadata(commit, branch_name)[source]

Grab a consistent set of metadata from a git commit, for user output.

Parameters
  • commit (Commit) – The commit to extract the data from

  • branch_name (str) – What branch the commit was found on

Return type

Dict[str, Any]

tartufo.util.fail(msg, ctx, code=1)[source]

Print out a styled error message and exit.

Parameters
  • msg (str) – The message to print out to the user

  • ctx (Context) – A context from a currently executing Click command

  • code (int) – The exit code to use; must be >= 1

Return type

NoReturn

tartufo.util.find_strings_by_regex(text, regex, threshold=20)[source]

Locate strings (“words”) of interest in input text

Each returned string must have a length, at minimum, equal to threshold. This is meant to return longer strings which are likely to be things like auto-generated passwords, tokens, hashes, etc.

Parameters
  • text (str) – The text string to be analyzed

  • regex (Pattern) – A pattern which matches all character sequences of interest

  • threshold (int) – The minimum acceptable length of a matching string

Return type

Generator[str, None, None]
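
The length filter described above can be sketched in a few lines. The function name find_long_strings and the hexadecimal pattern are hypothetical; the sketch assumes candidates are found with re.Pattern.finditer and kept only when they reach the threshold length.

```python
import re
from typing import Iterator


def find_long_strings(
    text: str, regex: re.Pattern, threshold: int = 20
) -> Iterator[str]:
    """Yield each regex match whose length is at least threshold."""
    for match in regex.finditer(text):
        candidate = match.group()
        if len(candidate) >= threshold:
            yield candidate


hex_word = re.compile(r"[0-9a-fA-F]+")
text = "id=ab12 token=0123456789abcdef0123456789abcdef"
print(list(find_long_strings(text, hex_word)))
# ['0123456789abcdef0123456789abcdef']
```

Short matches such as "ab12" are discarded, which keeps ordinary words and identifiers out of the later entropy checks.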

tartufo.util.generate_signature(snippet, filename)[source]

Generate a stable hash signature for an issue found in a commit.

These signatures are used for configuring excluded/approved issues, such as secrets intentionally embedded in tests.

Parameters
  • snippet (str) – A string which was found as a potential issue during a scan

  • filename (str) – The file where the issue was found

Return type

str
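
A stable signature of this kind can be sketched with hashlib. The choice of SHA-256 and the "$$" separator between snippet and filename are assumptions made for illustration, and stable_signature is a hypothetical name; consult the linked source for the exact scheme tartufo uses.

```python
import hashlib


def stable_signature(snippet: str, filename: str) -> str:
    """Hash a snippet together with its file name (assumed scheme)."""
    # Assumption: SHA-256 over "<snippet>$$<filename>", UTF-8 encoded.
    payload = f"{snippet}$${filename}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


sig_a = stable_signature("AKIAEXAMPLEKEY", "tests/fixtures/creds.txt")
sig_b = stable_signature("AKIAEXAMPLEKEY", "src/settings.py")
print(sig_a == stable_signature("AKIAEXAMPLEKEY", "tests/fixtures/creds.txt"))  # True
print(sig_a == sig_b)  # False
```

Because the file name is part of the hashed input, the same string found in two files produces two signatures, so excluding one occurrence does not silently exclude the other.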

tartufo.util.is_shallow_clone(repo)[source]

Determine whether a repository is a shallow clone

This is used to work around https://github.com/libgit2/libgit2/issues/3058. Basically, any time a git repository is a “shallow” clone (it was cloned with --depth N), git will create a file at .git/shallow. So we simply need to test whether that file exists to know whether we are interacting with a shallow repository.

Parameters

repo (Repository) – The repository to check for “shallowness”

Return type

bool
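
The file-existence test described above can be sketched with pathlib. Unlike is_shallow_clone, this hypothetical helper (looks_shallow) takes a working-tree path rather than a Repository object, and it assumes a conventional layout with the .git directory directly under that path.

```python
import tempfile
from pathlib import Path


def looks_shallow(repo_dir: str) -> bool:
    """Return True if repo_dir/.git/shallow exists (hypothetical helper)."""
    return (Path(repo_dir) / ".git" / "shallow").is_file()


with tempfile.TemporaryDirectory() as repo_dir:
    git_dir = Path(repo_dir) / ".git"
    git_dir.mkdir()
    before = looks_shallow(repo_dir)  # no marker file yet
    (git_dir / "shallow").touch()
    after = looks_shallow(repo_dir)   # marker file now present

print(before, after)  # False True
```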

tartufo.util.path_contains_git(path)[source]

Determine whether a filesystem path contains a git repository.

Parameters

path (str) – The fully qualified path to be checked

Return type

bool

tartufo.util.process_issues(repo_path, scan, options)[source]

Handle post-scan processing/reporting of a batch of issues.

Parameters
  • repo_path (str) – The repository that was scanned

  • scan (ScannerBase) – The scanner that performed the scan

  • options (GlobalOptions) – The options to use for determining output

Return type

None

tartufo.util.write_outputs(issues, output_dir)[source]

Write details of the issues to individual files in the specified directory.

Parameters
  • issues (Generator[Issue, None, None]) – The issues to be written out

  • output_dir (Path) – The directory where the files should be written

Return type

List[str]