API

This part of the documentation lists the full API reference of all public classes and functions.

tartufo.config

tartufo.config.compile_path_rules(patterns)[source]

Take a list of regex strings and compile them into patterns.

Any line starting with # will be ignored.

Parameters

patterns (Iterable[str]) – The list of patterns to be compiled

Return type

List[Pattern]
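The sketch below illustrates the comment-skipping behavior described above. It uses a hypothetical helper, compile_path_rules_sketch, rather than tartufo's own code. It also assumes that surrounding whitespace is stripped and blank lines are skipped, which the documentation does not state.

```python
import re
from typing import Iterable, List, Pattern


def compile_path_rules_sketch(patterns: Iterable[str]) -> List[Pattern]:
    """Hypothetical helper: compile regex strings, skipping comment lines."""
    compiled = []
    for raw in patterns:
        pattern = raw.strip()
        # Lines starting with "#" are comments, as documented; skipping blank
        # lines is an extra assumption of this sketch.
        if not pattern or pattern.startswith("#"):
            continue
        compiled.append(re.compile(pattern))
    return compiled


rules = compile_path_rules_sketch(["# vendored code", r"^vendor/", r"\.min\.js$", ""])
print([rule.pattern for rule in rules])  # ['^vendor/', '\\.min\\.js$']
```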

tartufo.config.compile_rules(patterns)[source]

Take a list of regex strings with paths and compile them into a list of Rule objects.

Parameters

patterns (Iterable[Dict[str, str]]) – The list of patterns to be compiled

Returns

List of Rule objects

Return type

List[tartufo.types.Rule]

tartufo.config.configure_regexes(include_default=True, rules_files=None, rule_patterns=None, rules_repo=None, rules_repo_files=None)[source]

Build a set of regular expressions to be used during a regex scan.

Parameters
  • include_default (bool) – Whether to include the built-in set of regexes

  • rules_files (Optional[Iterable[TextIO]]) – A list of files to load rules from

  • rule_patterns (Optional[Iterable[Dict[str, str]]]) – A set of previously-collected rules

  • rules_repo (Optional[str]) – A separate git repository to load rules from

  • rules_repo_files (Optional[Iterable[str]]) – A set of patterns used to find files in the rules repo

Returns

Set of Rule objects to be used for regex scans

Return type

Set[tartufo.types.Rule]

tartufo.config.load_config_from_path(config_path, filename=None, traverse=True)[source]

Scan a path for a configuration file, and return its contents.

All key names are normalized to remove leading "-"/"--" and replace "-" with "_". For example, "--repo-path" becomes "repo_path".
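Here is a minimal sketch of the key normalization rule. It uses a hypothetical normalize_keys helper, which is not part of tartufo's API.

```python
from typing import Any, Dict, Mapping


def normalize_keys(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the key normalization described above."""
    # Strip leading dashes, then turn the remaining dashes into underscores.
    return {key.lstrip("-").replace("-", "_"): value for key, value in config.items()}


print(normalize_keys({"--repo-path": ".", "exclude-path-patterns": []}))
# {'repo_path': '.', 'exclude_path_patterns': []}
```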

In addition to checking the specified path, if traverse is True, this will traverse up through the directory structure, looking for a configuration file in parent directories. For example, given this directory structure:

working_dir/
|- tartufo.toml
|- group1/
|  |- project1/
|  |  |- tartufo.toml
|  |- project2/
|- group2/
   |- tartufo.toml
   |- project1/
   |- project2/
      |- tartufo.toml

The following config_path values will load the configuration files at the corresponding paths:

config_path                     file
------------------------------  ----------------------------------------
working_dir/group1/project1/    working_dir/group1/project1/tartufo.toml
working_dir/group1/project2/    working_dir/tartufo.toml
working_dir/group2/project1/    working_dir/group2/tartufo.toml
working_dir/group2/project2/    working_dir/group2/project2/tartufo.toml

Parameters
  • config_path (pathlib.Path) – The path to search for configuration files

  • filename (Optional[str]) – A specific filename to look for. By default, this will look for both tartufo.toml and then pyproject.toml.

  • traverse (bool) – Whether to also search parent directories for a configuration file if none is found at config_path

Raises
Returns

A tuple consisting of the config file that was discovered, and the contents of that file loaded in as TOML data

Return type

Tuple[pathlib.Path, MutableMapping[str, Any]]
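The parent-directory search described above can be sketched with pathlib. find_config_file is a hypothetical helper that only locates a file. It returns None when nothing is found, and it does not load the TOML data that the real function returns. It also assumes that any file named tartufo.toml or pyproject.toml counts as a match.

```python
import pathlib
import tempfile
from typing import Optional, Sequence


def find_config_file(
    config_path: pathlib.Path,
    filename: Optional[str] = None,
    traverse: bool = True,
) -> Optional[pathlib.Path]:
    """Hypothetical helper: return the first config file found, or None."""
    names: Sequence[str] = (filename,) if filename else ("tartufo.toml", "pyproject.toml")
    directory = config_path.resolve()
    while True:
        # Check the current directory, preferring tartufo.toml over pyproject.toml.
        for name in names:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        # Stop after one directory when not traversing, or at the filesystem root.
        if not traverse or directory.parent == directory:
            return None
        directory = directory.parent


# Rebuild the example tree from the documentation in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    working_dir = pathlib.Path(tmp)
    projects = ("group1/project1", "group1/project2", "group2/project1", "group2/project2")
    for project in projects:
        (working_dir / project).mkdir(parents=True)
    for config in ("tartufo.toml", "group1/project1/tartufo.toml",
                   "group2/tartufo.toml", "group2/project2/tartufo.toml"):
        (working_dir / config).write_text("")
    for project in projects:
        found = find_config_file(working_dir / project)
        print(project, "->", found.relative_to(working_dir.resolve()))
```

This walk reproduces the table above. Preferring tartufo.toml before pyproject.toml in each directory follows the filename parameter description.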

tartufo.config.load_rules_from_file(rules_file)[source]

Load a set of JSON rules from a file and return them as compiled patterns.

Parameters

rules_file (TextIO) – An open file handle containing a JSON dictionary of regexes

Raises

ValueError – If the rules contain invalid JSON

Return type

Set[tartufo.types.Rule]
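The following sketch shows the JSON-loading step. It assumes the file holds a flat JSON object that maps rule names to regex strings. load_json_rules is hypothetical and returns a dict of compiled patterns, not the Set[tartufo.types.Rule] that the real function returns.

```python
import io
import json
import re
from typing import Dict, Pattern, TextIO


def load_json_rules(rules_file: TextIO) -> Dict[str, Pattern]:
    """Hypothetical helper: parse a JSON object of name -> regex and compile each value."""
    try:
        raw_rules = json.load(rules_file)
    except json.JSONDecodeError as exc:
        # JSONDecodeError already subclasses ValueError; re-raise with a clearer message.
        raise ValueError(f"Error loading rules from file: {exc}") from exc
    return {name: re.compile(pattern) for name, pattern in raw_rules.items()}


rules = load_json_rules(io.StringIO('{"AWS key": "AKIA[0-9A-Z]{16}"}'))
print(bool(rules["AWS key"].search("id=AKIAABCDEFGHIJKLMNOP")))  # True
```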

tartufo.config.read_pyproject_toml(ctx, _param, value)[source]

Read config values from a file and load them as defaults.

Parameters
  • ctx (click.core.Context) – A context from a currently executing Click command

  • _param (click.core.Parameter) – The command parameter that triggered this callback

  • value (str) – The value passed to the command parameter

Raises

click.FileError – If there was a problem loading the configuration

Return type

Optional[str]

tartufo.scanner

class tartufo.scanner.FolderScanner(global_options, target, recurse)[source]

Bases: tartufo.scanner.ScannerBase

Used for scanning a folder.

Parameters
  • global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

  • target (str) – The local filesystem path to scan

  • recurse (bool) – Whether to descend into subdirectories of target

Return type

None

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This estimates the probability of each character in data being present and combines those probabilities into a single score. This tells us how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Returns

The amount of entropy detected in the data

Return type

float

property chunks

Yield the individual files in the target directory.

Return type

Generator[Chunk, None, None]

property completed

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Returns

Entropy detection threshold scaled to the input bitrate

Return type

float

property config_data
entropy_string_is_excluded(string, line, path)

Find whether a high-entropy string, in the context of its source line and path, has been excluded by configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Returns

True if excluded, False otherwise

Return type

bool

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check a candidate string for high entropy, flagging it if its entropy reaches min_entropy_score.

Parameters
  • chunk (tartufo.types.Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Returns

Iterator of issues flagged

Return type

Generator[tartufo.scanner.Issue, None, None]

property excluded_entropy

Get a list of patterns used to exclude high-entropy strings from the scan results.

Return type

List[Pattern]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures
global_options: tartufo.types.GlobalOptions
hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths

Get a list of regexes used as an exclusive list of paths to scan

property issues

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

logger: logging.Logger
recurse: bool
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (tartufo.types.Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Returns

True if string and path matched, False otherwise.

Return type

bool
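The following sketch makes the matching rule concrete. It uses a hypothetical SketchRule with only the pattern and path_pattern fields named in the parameter descriptions. It assumes that a rule without a path_pattern applies to every path, and it uses re.search. It also ignores the line argument, which the real method receives.

```python
import re
from typing import NamedTuple, Optional, Pattern


class SketchRule(NamedTuple):
    """Hypothetical stand-in for tartufo.types.Rule with only the fields used here."""

    name: str
    pattern: Pattern
    path_pattern: Optional[Pattern]


def rule_matches_sketch(rule: SketchRule, string: str, path: str) -> bool:
    # The string must match the rule's pattern...
    if not rule.pattern.search(string):
        return False
    # ...and, when a path pattern is set, the path must match it too.
    return rule.path_pattern is None or bool(rule.path_pattern.search(path))


rule = SketchRule("test fixtures", re.compile(r"^[0-9a-f]{40}$"), re.compile(r"^tests/"))
print(rule_matches_sketch(rule, "a" * 40, "tests/fixtures.py"))  # True
print(rule_matches_sketch(rule, "a" * 40, "src/app.py"))  # False
```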

property rules_regexes

Get a set of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe: if multiple scans are requested concurrently, the first runs to completion while the other callers are blocked. Those callers then run one at a time, each yielding the cached issues instead of repeating the underlying scan.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

Generator[tartufo.scanner.Issue, None, None]
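The locking behavior described above can be sketched with threading.Lock. CachedScan is a hypothetical class, not tartufo's implementation. The first caller to acquire the lock runs the scan and records its issues, and later callers replay the recorded issues.

```python
import threading
from typing import Callable, Generator, Iterable, List, Optional


class CachedScan:
    """Hypothetical sketch of a scan that runs once and replays cached results."""

    def __init__(self, run_scan: Callable[[], Iterable[str]]) -> None:
        self._run_scan = run_scan
        self._lock = threading.Lock()
        self._issues: Optional[List[str]] = None

    def scan(self) -> Generator[str, None, None]:
        # The lock is held while the generator runs, so concurrent callers wait here.
        with self._lock:
            if self._issues is not None:
                # A previous caller finished the scan: replay its results.
                yield from self._issues
                return
            found: List[str] = []
            for issue in self._run_scan():
                found.append(issue)
                yield issue
            # Only cache once the scan has completed.
            self._issues = found


calls: List[str] = []


def fake_scan() -> Iterable[str]:
    calls.append("scan")  # Records how many times the expensive work runs.
    yield "issue-1"
    yield "issue-2"


scanner = CachedScan(fake_scan)
results: List[List[str]] = []
workers = [threading.Thread(target=lambda: results.append(list(scanner.scan()))) for _ in range(3)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(calls)  # ['scan']
```

Holding the lock for the life of the generator is what serializes callers. A consumer that stops iterating early releases the lock when its generator is closed, but it leaves the cache empty, so the next caller scans again.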

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

should_scan(file_path)

Check whether a file path should be included in the analysis.

If self.included_paths is non-empty, it takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even if no pattern in self.excluded_paths matches it. An empty or undefined self.included_paths or self.excluded_paths has no effect. When there are no inclusions and no exclusions, every file path is included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True
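The precedence rules above can be sketched as a standalone function. should_scan_sketch is hypothetical: it takes the pattern lists as arguments instead of reading self.included_paths and self.excluded_paths. It also assumes that patterns are applied with re.match, which anchors them at the start of the path.

```python
import re
from typing import List, Pattern


def should_scan_sketch(
    file_path: str, included_paths: List[Pattern], excluded_paths: List[Pattern]
) -> bool:
    """Hypothetical: apply inclusion and exclusion patterns to a file path."""
    # A non-empty inclusion list acts as an allow-list and takes precedence.
    if included_paths and not any(p.match(file_path) for p in included_paths):
        return False
    # Exclusions only matter for paths that passed the inclusion check.
    if any(p.match(file_path) for p in excluded_paths):
        return False
    # With no inclusions and no exclusions, every path is scanned.
    return True


included = [re.compile(r"src/")]
excluded = [re.compile(r"src/vendor/")]
print(should_scan_sketch("src/app.py", included, excluded))  # True
print(should_scan_sketch("src/vendor/lib.py", included, excluded))  # False
print(should_scan_sketch("docs/index.md", included, excluded))  # False
print(should_scan_sketch("docs/index.md", [], []))  # True
```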

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

target: str
class tartufo.scanner.GitPreCommitScanner(global_options, repo_path, include_submodules)[source]

Bases: tartufo.scanner.GitScanner

For use in a git pre-commit hook.

Parameters
  • global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

  • repo_path (str) – The local filesystem path pointing to the repository

  • include_submodules (bool) – Whether to include git submodules in the scan

Return type

None

_iter_diff_index(diff)

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff (_pygit2.Diff) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This estimates the probability of each character in data being present and combines those probabilities into a single score. This tells us how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Returns

The amount of entropy detected in the data

Return type

float

property chunks

Yield the individual file changes currently staged for commit.

Return type

Generator[Chunk, None, None]

property completed

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Returns

Entropy detection threshold scaled to the input bitrate

Return type

float

property config_data
entropy_string_is_excluded(string, line, path)

Find whether a high-entropy string, in the context of its source line and path, has been excluded by configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Returns

True if excluded, False otherwise

Return type

bool

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check a candidate string for high entropy, flagging it if its entropy reaches min_entropy_score.

Parameters
  • chunk (tartufo.types.Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Returns

Iterator of issues flagged

Return type

Generator[tartufo.scanner.Issue, None, None]

property excluded_entropy

Get a list of patterns used to exclude high-entropy strings from the scan results.

Return type

List[Pattern]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures
filter_submodules(repo)

Exclude all git submodules and their contents from being scanned.

Parameters

repo (pygit2.repository.Repository) – The repository whose submodules should be excluded

Return type

None

global_options: tartufo.types.GlobalOptions
static header_length(diff)

Compute the length of the git diff header text

Parameters

diff (str) – The text of the git diff

Return type

int

hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths

Get a list of regexes used as an exclusive list of paths to scan

property issues

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

load_repo(repo_path)[source]

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

pygit2.repository.Repository

logger: logging.Logger
repo_path: str
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (tartufo.types.Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Returns

True if string and path matched, False otherwise.

Return type

bool

property rules_regexes

Get a set of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe: if multiple scans are requested concurrently, the first runs to completion while the other callers are blocked. Those callers then run one at a time, each yielding the cached issues instead of repeating the underlying scan.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

Generator[tartufo.scanner.Issue, None, None]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

should_scan(file_path)

Check whether a file path should be included in the analysis.

If self.included_paths is non-empty, it takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even if no pattern in self.excluded_paths matches it. An empty or undefined self.included_paths or self.excluded_paths has no effect. When there are no inclusions and no exclusions, every file path is included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

class tartufo.scanner.GitRepoScanner(global_options, git_options, repo_path)[source]

Bases: tartufo.scanner.GitScanner

Used for scanning a full clone of a git repository.

Parameters
  • global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

  • git_options (tartufo.types.GitOptions) – The options specific to interacting with a git repository

  • repo_path (str) – The local filesystem path pointing to the repository

Return type

None

_iter_diff_index(diff)

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff (_pygit2.Diff) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This estimates the probability of each character in data being present and combines those probabilities into a single score. This tells us how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Returns

The amount of entropy detected in the data

Return type

float

property chunks

Yield individual diffs from the repository’s history.

Return type

Generator[Chunk, None, None]

Raises

types.GitRemoteException – If there was an error fetching branches

property completed

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Returns

Entropy detection threshold scaled to the input bitrate

Return type

float

property config_data
entropy_string_is_excluded(string, line, path)

Find whether a high-entropy string, in the context of its source line and path, has been excluded by configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Returns

True if excluded, False otherwise

Return type

bool

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check a candidate string for high entropy, flagging it if its entropy reaches min_entropy_score.

Parameters
  • chunk (tartufo.types.Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Returns

Iterator of issues flagged

Return type

Generator[tartufo.scanner.Issue, None, None]

property excluded_entropy

Get a list of patterns used to exclude high-entropy strings from the scan results.

Return type

List[Pattern]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures
filter_submodules(repo)

Exclude all git submodules and their contents from being scanned.

Parameters

repo (pygit2.repository.Repository) – The repository whose submodules should be excluded

Return type

None

git_options: tartufo.types.GitOptions
global_options: tartufo.types.GlobalOptions
static header_length(diff)

Compute the length of the git diff header text

Parameters

diff (str) – The text of the git diff

Return type

int

hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths

Get a list of regexes used as an exclusive list of paths to scan

property issues

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

load_repo(repo_path)[source]

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

pygit2.repository.Repository

logger: logging.Logger
repo_path: str
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (tartufo.types.Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Returns

True if string and path matched, False otherwise.

Return type

bool

property rules_regexes

Get a set of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe: if multiple scans are requested concurrently, the first runs to completion while the other callers are blocked. Those callers then run one at a time, each yielding the cached issues instead of repeating the underlying scan.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

Generator[tartufo.scanner.Issue, None, None]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

should_scan(file_path)

Check whether a file path should be included in the analysis.

If self.included_paths is non-empty, it takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even if no pattern in self.excluded_paths matches it. An empty or undefined self.included_paths or self.excluded_paths has no effect. When there are no inclusions and no exclusions, every file path is included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

class tartufo.scanner.GitScanner(global_options, repo_path)[source]

Bases: tartufo.scanner.ScannerBase, abc.ABC

A base class for scanners looking at git history.

This is a lightweight base class that provides some basic functionality needed by all scanners that interact with git history.

Parameters
  • global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

  • repo_path (str) – The local filesystem path pointing to the repository

Return type

None

_iter_diff_index(diff)[source]

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff (_pygit2.Diff) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

calculate_entropy(data)

Calculate the Shannon entropy for a piece of data.

This estimates the probability of each character in data being present and combines those probabilities into a single score. This tells us how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Returns

The amount of entropy detected in the data

Return type

float

abstract property chunks

Yield “chunks” of data to be scanned.

Examples of “chunks” would be individual git commit diffs, or the contents of individual files.

Return type

Generator[Chunk, None, None]

property completed

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – How many bits does each character represent?

Returns

Entropy detection threshold scaled to the input bitrate

Return type

float

property config_data
entropy_string_is_excluded(string, line, path)

Find whether a high-entropy string, in the context of its source line and path, has been excluded by configuration.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Returns

True if excluded, False otherwise

Return type

bool

evaluate_entropy_string(chunk, line, string, min_entropy_score)

Check a candidate string for high entropy, flagging it if its entropy reaches min_entropy_score.

Parameters
  • chunk (tartufo.types.Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Returns

Iterator of issues flagged

Return type

Generator[tartufo.scanner.Issue, None, None]

property excluded_entropy

Get a list of patterns used to exclude high-entropy strings from the scan results.

Return type

List[Pattern]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures
filter_submodules(repo)[source]

Exclude all git submodules and their contents from being scanned.

Parameters

repo (pygit2.repository.Repository) – The repository whose submodules should be excluded

Return type

None

global_options: tartufo.types.GlobalOptions
static header_length(diff)[source]

Compute the length of the git diff header text

Parameters

diff (str) – The text of the git diff

Return type

int
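One way to find where a git diff header ends is to look for the "+++" line that names the new file. header_length_sketch is a hypothetical helper built on that assumption, and tartufo's actual method may compute the length differently.

```python
def header_length_sketch(diff: str) -> int:
    """Hypothetical: treat everything up to and including the "+++" line as the header."""
    try:
        # The "+++ b/<path>" line is the last line of a standard git file header.
        plus_line = diff.index("\n+++")
        return diff.index("\n", plus_line + 1) + 1
    except ValueError:
        # No "+++" line (for example, a binary diff): treat the whole text as header.
        return len(diff)


diff = (
    "diff --git a/app.py b/app.py\n"
    "index 1234567..89abcde 100644\n"
    "--- a/app.py\n"
    "+++ b/app.py\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
)
print(repr(diff[header_length_sketch(diff):]))  # '@@ -1 +1 @@\n-old\n+new\n'
```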

hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths

Get a list of regexes used as an exclusive list of paths to scan

property issues

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

abstract load_repo(repo_path)[source]

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

pygit2.repository.Repository

logger: logging.Logger
repo_path: str
static rule_matches(rule, string, line, path)

Match string and path against rule.

Parameters
  • rule (tartufo.types.Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Returns

True if string and path matched, False otherwise.

Return type

bool

property rules_regexes

Get a set of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe: if multiple scans are requested concurrently, the first runs to completion while the other callers are blocked. Those callers then run one at a time, each yielding the cached issues instead of repeating the underlying scan.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

Generator[tartufo.scanner.Issue, None, None]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

should_scan(file_path)

Check whether a file path should be included in the analysis.

If self.included_paths is non-empty, it takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even if no pattern in self.excluded_paths matches it. An empty or undefined self.included_paths or self.excluded_paths has no effect. When there are no inclusions and no exclusions, every file path is included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

class tartufo.scanner.Issue(issue_type, matched_string, chunk)[source]

Bases: object

Represent an issue found while scanning a target.

Parameters
  • issue_type (tartufo.types.IssueType) – What type of scan identified this issue

  • matched_string (str) – The string that was identified as a potential issue

  • chunk (tartufo.types.Chunk) – The chunk of data where the match was found

Return type

None

OUTPUT_SEPARATOR
as_dict(compact=False)[source]

Return a dictionary representation of an issue.

This is primarily meant to aid in JSON serialization.

Parameters

compact (bool) – True to return a dictionary with fewer fields

Returns

A JSON serializable dictionary representation of this issue

Return type

Dict[str, Optional[str]]

chunk
issue_detail
issue_type
logger
matched_string
property signature

Generate a stable hash-based signature uniquely identifying this issue.

Return type

str
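A stable signature only needs a deterministic hash over the fields that identify an issue. issue_signature_sketch is hypothetical. It assumes that the matched string and the chunk's file path are the identifying fields. The SHA-256 choice and the "$$" separator are illustrative and are not confirmed details of tartufo.

```python
import hashlib


def issue_signature_sketch(matched_string: str, file_path: str) -> str:
    """Hypothetical: hash the matched string together with the file path."""
    # A separator between the fields makes accidental collisions less likely.
    payload = f"{matched_string}$${file_path}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


signature = issue_signature_sketch("AKIAABCDEFGHIJKLMNOP", "config/settings.py")
print(len(signature))  # 64
```

Because the same inputs always produce the same hex digest, a signature like this can be recorded once and recognized again on later scans.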

class tartufo.scanner.ScannerBase(options)[source]

Bases: abc.ABC

Provide the base, generic functionality needed by all scanners.

This class contains all of the actual scanning logic, which should never differ between scanners. What does differ, and what is left abstract here, is the content provided to the various scans. For this reason, the chunks property is abstract: each scanner must implement it as a generator that yields the individual pieces of content to be scanned.

Parameters

options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

Return type

None

b64_entropy_limit

Returns low entropy limit for suspicious base64 encodings

calculate_entropy(data)[source]

Calculate the Shannon entropy for a piece of data.

This estimates the probability of each character in data being present and combines those probabilities into a single score. This tells us how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters

data (str) – The data to be scanned for its entropy

Returns

The amount of entropy detected in the data

Return type

float
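As a rough illustration of the calculation described above, the sketch below computes Shannon entropy in bits per character: H = -sum over each distinct character c of p(c) * log2(p(c)), where p(c) is the fraction of characters equal to c. The function name shannon_entropy is hypothetical, and this is a minimal standard implementation, not necessarily the exact code in tartufo.

```python
import math
from collections import Counter


def shannon_entropy(data: str) -> float:
    """Return the Shannon entropy of ``data`` in bits per character (hypothetical helper)."""
    if not data:
        return 0.0
    length = len(data)
    entropy = 0.0
    # Characters absent from ``data`` have p(c) = 0 and contribute nothing,
    # so iterating over only the characters present is sufficient.
    for count in Counter(data).values():
        probability = count / length
        entropy -= probability * math.log2(probability)
    return entropy
```

For example, a string made of one repeated character scores 0.0, "ab" scores 1.0, and "abcd" scores 2.0: the more evenly spread the characters, the more random the string appears.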

abstract property chunks

Yield “chunks” of data to be scanned.

Examples of “chunks” would be individual git commit diffs, or the contents of individual files.

Return type

Generator[Chunk, None, None]
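The split described for ScannerBase (shared scanning logic in the base class, content supplied by each subclass's chunks generator) can be sketched with the standard abc module. Everything here is hypothetical: SketchChunk mirrors only the three fields of tartufo.types.Chunk, and the placeholder scan simply flags chunks containing the word "secret".

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Generator, List


@dataclass
class SketchChunk:
    """Hypothetical stand-in for tartufo.types.Chunk (same three fields)."""
    contents: str
    file_path: str
    metadata: Dict[str, Any]


class SketchScannerBase(ABC):
    """Hypothetical base class: generic scan logic, abstract content source."""

    @property
    @abstractmethod
    def chunks(self) -> Generator[SketchChunk, None, None]:
        """Subclasses yield every individual piece of content to be scanned."""

    def scan(self) -> List[str]:
        # The shared logic only ever sees chunks; it never knows where they came from.
        # Placeholder check: report the path of any chunk mentioning "secret".
        return [chunk.file_path for chunk in self.chunks if "secret" in chunk.contents]


class InMemoryScanner(SketchScannerBase):
    """Hypothetical scanner yielding one chunk per (path, text) pair."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = files

    @property
    def chunks(self) -> Generator[SketchChunk, None, None]:
        for path, text in self._files.items():
            yield SketchChunk(contents=text, file_path=path, metadata={})
```

Because chunks is abstract, SketchScannerBase itself cannot be instantiated; a new content source (commit diffs, files on disk) only needs to supply its own generator.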

property completed

Return True if scan has completed

Returns

True if scan has completed; False if scan is in progress

compute_scaled_entropy_limit(maximum_bitrate)[source]

Determine low entropy cutoff for specified bitrate

Parameters

maximum_bitrate (float) – The number of bits each character represents

Returns

Entropy detection threshold scaled to the input bitrate

Return type

float
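A base64 character encodes 6 bits and a hexadecimal character encodes 4 bits, which is presumably what maximum_bitrate represents for b64_entropy_limit and hex_entropy_limit. The sketch below assumes, without confirmation from this reference, that the cutoff is the entropy_sensitivity option read as a percentage of that bitrate; the function and constant names are hypothetical.

```python
BASE64_BITS_PER_CHAR = 6.0  # 64 symbols = 2**6
HEX_BITS_PER_CHAR = 4.0     # 16 symbols = 2**4


def scaled_entropy_limit(entropy_sensitivity: int, maximum_bitrate: float) -> float:
    """Hypothetical: scale a percentage sensitivity to a per-encoding entropy cutoff.

    ASSUMPTION: the cutoff is linear in the sensitivity, i.e. the given
    percentage of the maximum number of bits a single character can carry.
    """
    if not 0 <= entropy_sensitivity <= 100:
        raise ValueError("entropy_sensitivity is assumed to be a percentage (0-100)")
    return entropy_sensitivity / 100.0 * maximum_bitrate
```

Under that assumption, a sensitivity of 75 gives cutoffs of 4.5 bits per character for base64 strings and 3.0 for hexadecimal strings.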

property config_data
entropy_string_is_excluded(string, line, path)[source]

Find whether a string of interest has been excluded from entropy scanning by the configured entropy exclusion rules.

Parameters
  • string (str) – String to check against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – Path to check against rule path pattern

Returns

True if excluded, False otherwise

Return type

bool

evaluate_entropy_string(chunk, line, string, min_entropy_score)[source]

Check string against min_entropy_score, and flag it as an issue if its entropy is high enough.

Parameters
  • chunk (tartufo.types.Chunk) – The chunk of data to check

  • line (str) – Source line containing string of interest

  • string (str) – String to check

  • min_entropy_score (float) – Minimum entropy score to flag

Returns

Iterator of issues flagged

Return type

Generator[tartufo.scanner.Issue, None, None]

property excluded_entropy

Get a list of patterns used to exclude matching high-entropy strings from the scan results.

Return type

List[Pattern]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan

excluded_signatures
global_options: tartufo.types.GlobalOptions
hex_entropy_limit

Returns low entropy limit for suspicious hexadecimal encodings

property included_paths

Get a list of regexes used as an exclusive list of paths to scan

property issues

Get a list of issues found during the scan.

If the scan is still in progress, force it to complete first.

Returns

Any issues found during the scan.

logger: logging.Logger
static rule_matches(rule, string, line, path)[source]

Match string and path against rule.

Parameters
  • rule (tartufo.types.Rule) – Rule to perform match

  • string (str) – string to match against rule pattern

  • line (str) – Source line containing string of interest

  • path (str) – path to match against rule path_pattern

Returns

True if string and path matched, False otherwise.

Return type

bool
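The sketch below shows one plausible reading of how re_match_type and re_match_scope (mirroring the MatchType and Scope enumerations in tartufo.types) interact: the scope picks whether the pattern is applied to the single string or to the whole line, and the match type picks re.match (anchored at the start) or re.search (anywhere). SketchRule and rule_matches here are hypothetical stand-ins, not tartufo's own classes.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchRule:
    """Hypothetical stand-in for tartufo.types.Rule."""
    pattern: re.Pattern
    path_pattern: Optional[re.Pattern] = None
    re_match_type: str = "search"  # "match" or "search", as in MatchType
    re_match_scope: str = "line"   # "line" or "word", as in Scope


def rule_matches(rule: SketchRule, string: str, line: str, path: str) -> bool:
    # Scope decides which text the pattern is applied to.
    target = string if rule.re_match_scope == "word" else line
    if rule.re_match_type == "match":
        # re.match only succeeds at the very start of the text.
        matched = rule.pattern.match(target) is not None
        if rule.path_pattern is not None:
            matched = matched and rule.path_pattern.match(path) is not None
    else:
        # re.search succeeds anywhere in the text.
        matched = rule.pattern.search(target) is not None
        if rule.path_pattern is not None:
            matched = matched and rule.path_pattern.search(path) is not None
    return matched
```

With a path_pattern set, both the content and the path must match; without one, only the content is checked.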

property rules_regexes

Get a set of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

scan()[source]

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

The scan method is thread-safe; if multiple concurrent scans are requested, the first will run to completion while other callers are blocked (after which they will each execute in turn, yielding cached issues without repeating the underlying repository scan).

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

Generator[tartufo.scanner.Issue, None, None]
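The concurrency behaviour described for scan() can be approximated with a threading.Lock and a cached list of results. This is a hypothetical sketch (CachedScan and its runs counter are illustrative names); unlike a streaming generator, it collects all results before yielding any, which keeps the locking simple.

```python
import threading
from typing import Callable, Generator, Iterable, List, Optional


class CachedScan:
    """Hypothetical: run an expensive scan once, then replay its cached results."""

    def __init__(self, run_scan: Callable[[], Iterable[str]]) -> None:
        self._run_scan = run_scan
        self._lock = threading.Lock()
        self._issues: Optional[List[str]] = None
        self.runs = 0  # how many times the underlying scan actually executed

    @property
    def completed(self) -> bool:
        return self._issues is not None

    def scan(self) -> Generator[str, None, None]:
        # Concurrent callers block here until the first caller's scan finishes.
        with self._lock:
            if self._issues is None:
                self.runs += 1
                # Results are published only once the whole scan has finished,
                # so ``completed`` never reports a partial scan.
                self._issues = list(self._run_scan())
        # Every caller, including the first, replays the cached list.
        yield from self._issues
```

However many threads call scan() at once, the underlying scan runs a single time and every caller receives the same issues.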

scan_entropy(chunk)[source]

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]
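A minimal sketch of what an entropy scan over a chunk might look like, assuming (not confirmed by this reference) that candidate strings are base64-like or hex-like runs of at least 20 characters, found much as find_strings_by_regex describes, and that each candidate is compared with a per-encoding limit such as b64_entropy_limit or hex_entropy_limit. The regexes, helper names, and the choice to split lines on whitespace are all illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Generator, List, Tuple

# ASSUMPTION: character classes for "base64-like" and "hex-like" strings.
BASE64_RE = re.compile(r"[A-Za-z0-9+/=]+")
HEX_RE = re.compile(r"[0-9A-Fa-f]+")


def entropy(data: str) -> float:
    # Shannon entropy in bits per character.
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values()) if n else 0.0


def long_matches(text: str, regex: re.Pattern, threshold: int = 20) -> Generator[str, None, None]:
    # Keep only matches at least ``threshold`` characters long.
    for match in regex.finditer(text):
        if len(match.group()) >= threshold:
            yield match.group()


def scan_entropy_sketch(contents: str, b64_limit: float, hex_limit: float) -> List[Tuple[str, float]]:
    """Hypothetical: return (string, score) pairs whose entropy exceeds the limit."""
    flagged = []
    for line in contents.splitlines():
        for word in line.split():
            for candidate in long_matches(word, BASE64_RE):
                score = entropy(candidate)
                if score > b64_limit:
                    flagged.append((candidate, score))
            for candidate in long_matches(word, HEX_RE):
                score = entropy(candidate)
                if score > hex_limit:
                    flagged.append((candidate, score))
    return flagged
```

Every hex string also matches the base64 character class, so a string with enough entropy could be reported twice; a fuller implementation would need to deduplicate.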

scan_regex(chunk)[source]

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

Generator[tartufo.scanner.Issue, None, None]

should_scan(file_path)[source]

Check whether a file path should be included in analysis.

If non-empty, self.included_paths takes precedence over self.excluded_paths: a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If self.included_paths or self.excluded_paths is undefined or empty, it has no effect. When neither inclusions nor exclusions exist, all file paths are included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is not matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty); otherwise True
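The precedence rules above reduce to two checks. In this hypothetical sketch, the patterns are applied with re.match (anchored at the start of the path); whether tartufo uses match or search here is not stated in this reference.

```python
import re
from typing import List


def should_scan(file_path: str, included: List[re.Pattern], excluded: List[re.Pattern]) -> bool:
    """Hypothetical sketch of the include/exclude precedence."""
    # A non-empty inclusion list acts as an allowlist: unmatched paths are skipped.
    if included and not any(p.match(file_path) for p in included):
        return False
    # Exclusions then apply to whatever survived the inclusion check.
    if excluded and any(p.match(file_path) for p in excluded):
        return False
    # No rule rejected the path (this also covers "no rules at all").
    return True
```

For example, including src/ while excluding src/vendor/ scans src/app.py, skips src/vendor/lib.py, and also skips docs/readme.md because it falls outside the inclusion list.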

signature_is_excluded(blob, file_path)[source]

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

tartufo.types

exception tartufo.types.BranchNotFoundException[source]

Raised if a branch was not found

class tartufo.types.Chunk(contents: str, file_path: str, metadata: Dict[str, Any])[source]
Parameters
  • contents (str) –

  • file_path (str) –

  • metadata (Dict[str, Any]) –

Return type

None

contents
file_path
metadata
exception tartufo.types.ConfigException[source]

Raised if there is a problem with the configuration

exception tartufo.types.GitException[source]

Raised if there is a problem interacting with git

exception tartufo.types.GitLocalException[source]

Raised if there is an error interacting with a local git repository

class tartufo.types.GitOptions(since_commit: Union[str, NoneType], max_depth: int, branch: Union[str, NoneType], include_submodules: bool)[source]
Parameters
  • since_commit (Optional[str]) –

  • max_depth (int) –

  • branch (Optional[str]) –

  • include_submodules (bool) –

Return type

None

branch
include_submodules
max_depth
since_commit
exception tartufo.types.GitRemoteException[source]

Raised if there is an error interacting with a remote git repository

class tartufo.types.GlobalOptions(rules: Tuple[TextIO, ...], rule_patterns: Tuple[Dict[str, str], ...], default_regexes: bool, entropy: bool, regex: bool, scan_filenames: bool, include_path_patterns: Union[Tuple[str, ...], Tuple[Dict[str, str], ...]], exclude_path_patterns: Union[Tuple[str, ...], Tuple[Dict[str, str], ...]], exclude_entropy_patterns: Tuple[Dict[str, str], ...], exclude_signatures: Union[Tuple[Dict[str, str], ...], Tuple[str, ...]], output_dir: Union[str, NoneType], git_rules_repo: Union[str, NoneType], git_rules_files: Tuple[str, ...], config: Union[TextIO, NoneType], verbose: int, quiet: bool, log_timestamps: bool, output_format: Union[str, NoneType], b64_entropy_score: float, hex_entropy_score: float, entropy_sensitivity: int)[source]
Parameters
  • rules (Tuple[TextIO, ...]) –

  • rule_patterns (Tuple[Dict[str, str], ...]) –

  • default_regexes (bool) –

  • entropy (bool) –

  • regex (bool) –

  • scan_filenames (bool) –

  • include_path_patterns (Union[Tuple[str, ...], Tuple[Dict[str, str], ...]]) –

  • exclude_path_patterns (Union[Tuple[str, ...], Tuple[Dict[str, str], ...]]) –

  • exclude_entropy_patterns (Tuple[Dict[str, str], ...]) –

  • exclude_signatures (Union[Tuple[Dict[str, str], ...], Tuple[str, ...]]) –

  • output_dir (Optional[str]) –

  • git_rules_repo (Optional[str]) –

  • git_rules_files (Tuple[str, ...]) –

  • config (Optional[TextIO]) –

  • verbose (int) –

  • quiet (bool) –

  • log_timestamps (bool) –

  • output_format (Optional[str]) –

  • b64_entropy_score (float) –

  • hex_entropy_score (float) –

  • entropy_sensitivity (int) –

Return type

None

b64_entropy_score
config
default_regexes
entropy
entropy_sensitivity
exclude_entropy_patterns
exclude_path_patterns
exclude_signatures
git_rules_files
git_rules_repo
hex_entropy_score
include_path_patterns
log_timestamps
output_dir
output_format
quiet
regex
rule_patterns
rules
scan_filenames
verbose
class tartufo.types.IssueType(value)[source]

An enumeration.

Entropy = 'High Entropy'
RegEx = 'Regular Expression Match'
class tartufo.types.LogLevel(value)[source]

An enumeration.

DEBUG = 3
ERROR = 0
INFO = 2
WARNING = 1
class tartufo.types.MatchType(value)[source]

An enumeration.

Match = 'match'
Search = 'search'
class tartufo.types.OutputFormat(value)[source]

An enumeration.

Compact = 'compact'
Json = 'json'
Text = 'text'
class tartufo.types.Rule(name: Union[str, NoneType], pattern: Pattern, path_pattern: Union[Pattern, NoneType], re_match_type: Union[str, tartufo.types.MatchType], re_match_scope: Union[str, tartufo.types.Scope, NoneType])[source]
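Parameters
  • name (Optional[str]) –

  • pattern (Pattern) –

  • path_pattern (Optional[Pattern]) –

  • re_match_type (Union[str, tartufo.types.MatchType]) –

  • re_match_scope (Union[str, tartufo.types.Scope, None]) –

Return type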

None

name
path_pattern
pattern
re_match_scope
re_match_type
exception tartufo.types.ScanException[source]

Raised if there is a problem encountered during a scan

class tartufo.types.Scope(value)[source]

An enumeration.

Line = 'line'
Word = 'word'
exception tartufo.types.TartufoException[source]

Base class for all package exceptions

tartufo.util

tartufo.util.clone_git_repo(git_url, target_dir=None)[source]

Clone a remote git repository and return its filesystem path.

Parameters
  • git_url (str) – The URL of the git repository to be cloned

  • target_dir (Optional[pathlib.Path]) – Where to clone the repository to

Returns

Filesystem path of local clone and name of remote source

Raises

types.GitRemoteException – If there was an error cloning the repository

Return type

Tuple[pathlib.Path, str]

tartufo.util.del_rw(_func, name, _exc)[source]

Attempt to grant permission to and force deletion of a file.

This is used as an error handler for shutil.rmtree.

Parameters
  • _func (Callable) – The original calling function

  • name (str) – The name of the file to try removing

  • _exc (Exception) – The exception raised originally when the file was removed

Return type

None
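A common shape for this kind of handler clears the read-only bit and retries the delete; this matters mainly on Windows, where read-only files cannot be removed. The sketch uses only os and stat, has a hypothetical name, and is not claimed to be tartufo's exact code.

```python
import os
import stat


def del_rw_sketch(_func, name, _exc):
    """Hypothetical shutil.rmtree error handler: clear read-only, then delete.

    ``_func`` and ``_exc`` are accepted to match the handler signature
    shutil.rmtree expects, but are not used.
    """
    # Grant write permission so the delete is allowed (mainly matters on Windows).
    os.chmod(name, stat.S_IWRITE)
    # Retry the removal. This assumes ``name`` is a file, not a directory.
    os.remove(name)
```

It would be passed as shutil.rmtree(path, onerror=del_rw_sketch). Python 3.12 deprecates onerror in favour of onexc, whose handler receives the exception instance instead of a sys.exc_info() tuple.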

tartufo.util.echo_result(options, scanner, repo_path, output_dir)[source]

Print all found issues out to the console, optionally as JSON.

Parameters
  • options – Global options object

  • scanner – ScannerBase containing issues and excluded paths from config tree

  • repo_path – The path to the repository the issues were found in

  • output_dir – The directory that issue details were written out to

Return type

None

tartufo.util.extract_commit_metadata(commit, branch_name)[source]

Grab a consistent set of metadata from a git commit, for user output.

Parameters
  • commit (_pygit2.Commit) – The commit to extract the data from

  • branch_name (str) – What branch the commit was found on

Return type

Dict[str, Any]

tartufo.util.fail(msg, ctx, code=1)[source]

Print out a styled error message and exit.

Parameters
  • msg (str) – The message to print out to the user

  • ctx (click.core.Context) – A context from a currently executing Click command

  • code (int) – The exit code to use; must be >= 1

Return type

None

tartufo.util.find_strings_by_regex(text, regex, threshold=20)[source]

Locate strings (“words”) of interest in input text

Each returned string must have a length, at minimum, equal to threshold. This is meant to return longer strings which are likely to be things like auto-generated passwords, tokens, hashes, etc.

Parameters
  • text (str) – The text string to be analyzed

  • regex (Pattern) – A pattern which matches all character sequences of interest

  • threshold (int) – The minimum acceptable length of a matching string

Return type

Generator[str, None, None]
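The length filter described above can be expressed with re.finditer. This is a sketch with a hypothetical name, not tartufo's implementation.

```python
import re
from typing import Generator


def find_strings_sketch(text: str, regex: re.Pattern, threshold: int = 20) -> Generator[str, None, None]:
    """Yield every match of ``regex`` in ``text`` at least ``threshold`` characters long."""
    for match in regex.finditer(text):
        candidate = match.group()
        # Short matches are unlikely to be generated secrets, so skip them.
        if len(candidate) >= threshold:
            yield candidate
```

With the default threshold of 20, a short identifier such as "abc" is ignored while a 24-character token is returned.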

tartufo.util.generate_signature(snippet, filename)[source]

Generate a stable hash signature for an issue found in a commit.

These signatures are used for configuring excluded/approved issues, such as secrets intentionally embedded in tests.

Parameters
  • snippet (str) – A string which was found as a potential issue during a scan

  • filename (str) – The file where the issue was found

Return type

str
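Any stable hash of the snippet and file name would serve this purpose. The sketch below assumes SHA-256 over the two values joined by an arbitrary separator; the input format tartufo actually hashes may differ, so signatures from this hypothetical function should not be expected to match real tartufo signatures.

```python
import hashlib


def generate_signature_sketch(snippet: str, filename: str) -> str:
    # ASSUMPTION: the "$$$" separator and SHA-256 are illustrative choices.
    # Including the filename means the same secret in two files gets two signatures.
    data = f"{snippet}$$${filename}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

Because the output depends only on its inputs, the same finding produces the same signature on every run, which is what makes it usable as an exclusion entry.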

tartufo.util.is_shallow_clone(repo)[source]

Determine whether a repository is a shallow clone

Parameters

repo (pygit2.repository.Repository) – The repository to check for “shallowness”

Return type

bool

This is used to work around https://github.com/libgit2/libgit2/issues/3058. Any time a git repository is a “shallow” clone (for example, one cloned with --depth N), git creates a file at .git/shallow, so we simply need to test whether that file exists to know whether we are interacting with a shallow repository.
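The check described above amounts to a file-existence test. This hypothetical sketch takes the path to a repository's .git directory rather than a pygit2 repository object, so it runs without pygit2.

```python
from pathlib import Path


def is_shallow_clone_sketch(git_dir: Path) -> bool:
    """Hypothetical: report whether a repository is a shallow clone.

    ``git_dir`` is assumed to be the repository's .git directory; git
    writes a ``shallow`` file there when the history is truncated.
    """
    return (git_dir / "shallow").is_file()
```

With pygit2, the directory could presumably be taken from the repository's path attribute, which for a non-bare repository typically points at the .git directory.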

tartufo.util.write_outputs(found_issues, output_dir)[source]

Write details of the issues to individual files in the specified directory.

Parameters
  • found_issues (List[Issue]) – A list of issues to be written out

  • output_dir (pathlib.Path) – The directory where the files should be written

Return type

List[str]