API

This part of the documentation lists the full API reference of all public classes and functions.

tartufo.config

tartufo.config.compile_path_rules(patterns)[source]

Take a list of regex strings and compile them into patterns.

Any line starting with # will be ignored.

Parameters

patterns (Iterable[str]) – The list of patterns to be compiled

Return type

List[Pattern]
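
As a rough illustration of the behaviour described above, the following sketch compiles an iterable of regex strings while skipping comment lines. It is a stand-alone approximation rather than tartufo's own implementation; the helper name compile_patterns_sketch and the decision to also skip blank lines are assumptions made for the example.

```python
import re
from typing import Iterable, List


def compile_patterns_sketch(patterns: Iterable[str]) -> List[re.Pattern]:
    """Compile regex strings, skipping comment lines (hypothetical helper)."""
    compiled = []
    for line in patterns:
        stripped = line.strip()
        # Lines starting with "#" are treated as comments and ignored.
        # Skipping blank lines as well is an assumption of this sketch.
        if not stripped or stripped.startswith("#"):
            continue
        compiled.append(re.compile(stripped))
    return compiled


rules = compile_patterns_sketch(["# vendored code", r"vendor/.*", r"docs/.*\.md$"])
```

Here the comment line is dropped and the two remaining strings are returned as compiled patterns, in their original order.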

tartufo.config.configure_regexes(include_default=True, rules_files=None, rules_repo=None, rules_repo_files=None)[source]

Build a set of regular expressions to be used during a regex scan.

Parameters
  • include_default (bool) – Whether to include the built-in set of regexes

  • rules_files (Optional[Iterable[TextIO]]) – A list of files to load rules from

  • rules_repo (Optional[str]) – A separate git repository to load rules from

  • rules_repo_files (Optional[Iterable[str]]) – A set of patterns used to find files in the rules repo

Return type

Dict[str, tartufo.types.Rule]

tartufo.config.load_config_from_path(config_path, filename=None, traverse=True)[source]

Scan a path for a configuration file, and return its contents.

All key names are normalized to remove leading "-"/"--" and replace "-" with "_". For example, "--repo-path" becomes "repo_path".
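
The key normalization can be pictured with a small sketch like the one below. The function name normalize_keys_sketch and the sample keys are hypothetical; the sketch only mirrors the rule stated above and is not taken from tartufo's source.

```python
from typing import Any, Dict


def normalize_keys_sketch(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Strip leading dashes, then turn remaining dashes into underscores."""
    return {key.lstrip("-").replace("-", "_"): value for key, value in raw.items()}


normalized = normalize_keys_sketch({"--repo-path": ".", "-q": True, "max-depth": 10})
```

With this rule, option-style keys and plain keys end up in the same form, so they can be matched against parameter names directly.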

In addition to checking the specified path, if traverse is True, this will traverse up through the directory structure, looking for a configuration file in parent directories. For example, given this directory structure:

working_dir/
|- tartufo.toml
|- group1/
|  |- project1/
|  |  |- tartufo.toml
|  |- project2/
|- group2/
   |- tartufo.toml
   |- project1/
   |- project2/
      |- tartufo.toml

The following config_path values will load the configuration files at the corresponding paths:

config_path                     file
working_dir/group1/project1/    working_dir/group1/project1/tartufo.toml
working_dir/group1/project2/    working_dir/tartufo.toml
working_dir/group2/project1/    working_dir/group2/tartufo.toml
working_dir/group2/project2/    working_dir/group2/project2/tartufo.toml
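
The upward search can be sketched as follows. This is an assumption-laden approximation, not the library's code: find_config_sketch is a hypothetical helper, and checking every candidate filename in one directory before moving to its parent is a choice made for this example. The demo rebuilds the directory tree shown above in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_config_sketch(
    config_path: Path,
    filenames: Sequence[str] = ("tartufo.toml", "pyproject.toml"),
    traverse: bool = True,
) -> Optional[Path]:
    """Return the first matching config file, walking upward if requested."""
    start = config_path.resolve()
    directories = [start, *start.parents] if traverse else [start]
    for directory in directories:
        for name in filenames:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None


PROJECTS = ("group1/project1", "group1/project2", "group2/project1", "group2/project2")

# Rebuild the example tree in a temporary directory and query it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve() / "working_dir"
    for sub in PROJECTS:
        (root / sub).mkdir(parents=True)
    for cfg_dir in ("", "group1/project1", "group2", "group2/project2"):
        (root / cfg_dir / "tartufo.toml").write_text("")
    found = {
        sub: find_config_sketch(root / sub).relative_to(root).as_posix()
        for sub in PROJECTS
    }
    not_traversed = find_config_sketch(root / "group1/project2", traverse=False)
```

With traverse disabled, the lookup for group1/project2 finds nothing, because that directory has no configuration file of its own.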

Parameters
  • config_path (pathlib.Path) – The path to search for configuration files

  • filename (Optional[str]) – A specific filename to look for. By default, this will look for both tartufo.toml and then pyproject.toml.

  • traverse (bool) – Whether to search parent directories when no configuration file is found in config_path

Raises
Returns

A tuple consisting of the config file that was discovered, and the contents of that file loaded in as TOML data

Return type

Tuple[pathlib.Path, MutableMapping[str, Any]]

tartufo.config.load_rules_from_file(rules_file)[source]

Load a set of JSON rules from a file and return them as compiled patterns.

Parameters

rules_file (TextIO) – An open file handle containing a JSON dictionary of regexes

Raises

ValueError – If the rules contain invalid JSON

Return type

Dict[str, tartufo.types.Rule]
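
A minimal sketch of this kind of loader is shown below. It assumes the simplest possible file layout, a JSON object mapping a rule name to a regex string, and it returns plain compiled patterns instead of tartufo.types.Rule objects. The name load_rules_sketch and the sample rule are hypothetical.

```python
import io
import json
import re
from typing import Dict, TextIO


def load_rules_sketch(rules_file: TextIO) -> Dict[str, re.Pattern]:
    """Parse a JSON object of name -> regex and compile each value."""
    try:
        raw = json.load(rules_file)
    except json.JSONDecodeError as exc:
        # JSONDecodeError already subclasses ValueError; re-raising keeps
        # the documented exception type while adding some context.
        raise ValueError(f"Rules file is not valid JSON: {exc}") from exc
    return {name: re.compile(pattern) for name, pattern in raw.items()}


rules = load_rules_sketch(io.StringIO('{"Example Token": "tok_[0-9a-f]{8}"}'))
```

Any object with a read() method works as the file handle here, which is why the example can use io.StringIO in place of a file on disk.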

tartufo.config.read_pyproject_toml(ctx, _param, value)[source]

Read config values from a file and load them as defaults.

Parameters
  • ctx (click.core.Context) – A context from a currently executing Click command

  • _param (click.core.Parameter) – The command parameter that triggered this callback

  • value (str) – The value passed to the command parameter

Raises

click.FileError – If there was a problem loading the configuration

Return type

Optional[str]

tartufo.scanner

class tartufo.scanner.GitPreCommitScanner(global_options, repo_path)[source]

Bases: tartufo.scanner.GitScanner

For use in a git pre-commit hook.

Parameters
  • global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

  • repo_path (str) – The local filesystem path pointing to the repository

Return type

None

_iter_diff_index(diff_index)

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff_index (git.diff.DiffIndex) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

calculate_entropy(data, char_set)

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present, based on the characters in char_set. By doing this, we can tell how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters
  • data (str) – The data to be scanned for its entropy

  • char_set (str) – The character set used as a basis for the calculation

Returns

The amount of entropy detected in the data.

Return type

float

property chunks

Yield the individual file changes currently staged for commit.

Return type

Generator[Chunk, None, None]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan.

Return type

List[Pattern]

global_options: tartufo.types.GlobalOptions
property included_paths

Get a list of regexes used as an exclusive list of paths to scan.

Return type

List[Pattern]

property issues

Get a list of issues found during the scan.

If a scan has not yet been run, run it.

Returns

Any issues found during the scan.

Return type

List[Issue]

load_repo(repo_path)[source]

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

git.repo.base.Repo

repo_path: str
property rules_regexes

Get a dictionary of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

Return type

Dict[str, Pattern]

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

List[tartufo.scanner.Issue]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

should_scan(file_path)

Check if a file path should be included in analysis.

If non-empty, self.included_paths takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even when it is not matched by any pattern in self.excluded_paths. If self.included_paths or self.excluded_paths is undefined or empty, it has no effect. When no inclusions or exclusions exist, all file paths are included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

class tartufo.scanner.GitRepoScanner(global_options, git_options, repo_path)[source]

Bases: tartufo.scanner.GitScanner

Used for scanning a full clone of a git repository.

Parameters
  • global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

  • git_options (tartufo.types.GitOptions) – The options specific to interacting with a git repository

  • repo_path (str) – The local filesystem path pointing to the repository

Return type

None

_iter_branch_commits(repo, branch)[source]

Iterate over and yield the commits on a branch.

Parameters
Return type

Generator[Tuple[git.objects.commit.Commit, git.objects.commit.Commit], None, None]

_iter_diff_index(diff_index)

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff_index (git.diff.DiffIndex) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

calculate_entropy(data, char_set)

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present, based on the characters in char_set. By doing this, we can tell how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters
  • data (str) – The data to be scanned for its entropy

  • char_set (str) – The character set used as a basis for the calculation

Returns

The amount of entropy detected in the data.

Return type

float

property chunks

Yield individual diffs from the repository’s history.

Return type

Generator[Chunk, None, None]

Raises

types.GitRemoteException – If there was an error fetching branches

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan.

Return type

List[Pattern]

git_options: tartufo.types.GitOptions
global_options: tartufo.types.GlobalOptions
property included_paths

Get a list of regexes used as an exclusive list of paths to scan.

Return type

List[Pattern]

property issues

Get a list of issues found during the scan.

If a scan has not yet been run, run it.

Returns

Any issues found during the scan.

Return type

List[Issue]

load_repo(repo_path)[source]

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

git.repo.base.Repo

repo_path: str
property rules_regexes

Get a dictionary of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

Return type

Dict[str, Pattern]

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

List[tartufo.scanner.Issue]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

should_scan(file_path)

Check if a file path should be included in analysis.

If non-empty, self.included_paths takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even when it is not matched by any pattern in self.excluded_paths. If self.included_paths or self.excluded_paths is undefined or empty, it has no effect. When no inclusions or exclusions exist, all file paths are included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

class tartufo.scanner.GitScanner(global_options, repo_path)[source]

Bases: tartufo.scanner.ScannerBase, abc.ABC

A base class for scanners looking at git history.

This is a lightweight base class that provides the basic functionality needed by all scanners that interact with git history.

Parameters
  • global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command

  • repo_path (str) – The local filesystem path pointing to the repository

Return type

None

_iter_diff_index(diff_index)[source]

Iterate over a “diff index”, yielding the individual file changes.

A “diff index” is essentially analogous to a single commit in the git history. So what this does is iterate over a single commit, and yield the changes to each individual file in that commit, along with its file path. This will also check the file path and ensure that it has not been excluded from the scan by configuration.

Note that binary files are wholly skipped.

Parameters

diff_index (git.diff.DiffIndex) – The diff index / commit to be iterated over

Return type

Generator[Tuple[str, str], None, None]

calculate_entropy(data, char_set)

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present, based on the characters in char_set. By doing this, we can tell how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters
  • data (str) – The data to be scanned for its entropy

  • char_set (str) – The character set used as a basis for the calculation

Returns

The amount of entropy detected in the data.

Return type

float

abstract property chunks

Yield “chunks” of data to be scanned.

Examples of “chunks” would be individual git commit diffs, or the contents of individual files.

Return type

Generator[Chunk, None, None]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan.

Return type

List[Pattern]

global_options: tartufo.types.GlobalOptions
property included_paths

Get a list of regexes used as an exclusive list of paths to scan.

Return type

List[Pattern]

property issues

Get a list of issues found during the scan.

If a scan has not yet been run, run it.

Returns

Any issues found during the scan.

Return type

List[Issue]

abstract load_repo(repo_path)[source]

Load and return the repository to be scanned.

Parameters

repo_path (str) – The local filesystem path pointing to the repository

Raises

types.GitLocalException – If there was a problem loading the repository

Return type

git.repo.base.Repo

repo_path: str
property rules_regexes

Get a dictionary of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

Return type

Dict[str, Pattern]

scan()

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

List[tartufo.scanner.Issue]

scan_entropy(chunk)

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

scan_regex(chunk)

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

should_scan(file_path)

Check if a file path should be included in analysis.

If non-empty, self.included_paths takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even when it is not matched by any pattern in self.excluded_paths. If self.included_paths or self.excluded_paths is undefined or empty, it has no effect. When no inclusions or exclusions exist, all file paths are included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True

signature_is_excluded(blob, file_path)

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

class tartufo.scanner.Issue(issue_type, matched_string, chunk)[source]

Bases: object

Represent an issue found while scanning a target.

Parameters
  • issue_type (tartufo.types.IssueType) – What type of scan identified this issue

  • matched_string (str) – The string that was identified as a potential issue

  • chunk (tartufo.types.Chunk) – The chunk of data where the match was found

Return type

None

OUTPUT_SEPARATOR: str = '~~~~~~~~~~~~~~~~~~~~~'
as_dict()[source]

Return a dictionary representation of an issue.

This is primarily meant to aid in JSON serialization.

Returns

A JSON serializable dictionary representation of this issue

Return type

Dict[str, Optional[str]]

chunk: tartufo.types.Chunk
issue_detail: Optional[str] = None
issue_type: tartufo.types.IssueType
matched_string: str = ''
property signature

Generate a stable hash-based signature uniquely identifying this issue.

Return type

str

class tartufo.scanner.ScannerBase(options)[source]

Bases: abc.ABC

Provide the base, generic functionality needed by all scanners.

In fact, this contains all of the actual scanning logic. This part of the application should never differ; the part that differs, and the part that is left abstract here, is what content is provided to the various scans. For this reason, the chunks property is left abstract. It is up to the various scanners to implement this property, in the form of a generator, to yield all the individual pieces of content to be scanned.

Parameters

options (tartufo.types.GlobalOptions) –

Return type

None

calculate_entropy(data, char_set)[source]

Calculate the Shannon entropy for a piece of data.

This essentially calculates the overall probability for each character in data to be present, based on the characters in char_set. By doing this, we can tell how random a string appears to be.

Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html

Parameters
  • data (str) – The data to be scanned for its entropy

  • char_set (str) – The character set used as a basis for the calculation

Returns

The amount of entropy detected in the data.

Return type

float
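
For readers who want to see the calculation spelled out, here is a minimal sketch of Shannon entropy over a character set, H = -sum(p * log2(p)), where p is the relative frequency of each character of char_set within data. The function name shannon_entropy_sketch and the HEX_CHARS constant are assumptions for the example; the scanner's actual method may differ in details.

```python
import math


def shannon_entropy_sketch(data: str, char_set: str) -> float:
    """Shannon entropy (bits per character) of data over char_set."""
    if not data:
        return 0.0
    entropy = 0.0
    for char in char_set:
        # Relative frequency of this character within the data.
        prob = data.count(char) / len(data)
        if prob > 0:
            entropy -= prob * math.log2(prob)
    return entropy


HEX_CHARS = "0123456789abcdef"  # assumed character set for the example
low = shannon_entropy_sketch("aaaaaaaa", HEX_CHARS)
high = shannon_entropy_sketch("0123456789abcdef", HEX_CHARS)
```

A string made of one repeated character carries no entropy, while a string that uses all sixteen hexadecimal digits equally often reaches the maximum of 4 bits per character for that character set.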

abstract property chunks

Yield “chunks” of data to be scanned.

Examples of “chunks” would be individual git commit diffs, or the contents of individual files.

Return type

Generator[Chunk, None, None]

property excluded_paths

Get a list of regexes used to match paths to exclude from the scan.

Return type

List[Pattern]

global_options: tartufo.types.GlobalOptions
property included_paths

Get a list of regexes used as an exclusive list of paths to scan.

Return type

List[Pattern]

property issues

Get a list of issues found during the scan.

If a scan has not yet been run, run it.

Returns

Any issues found during the scan.

Return type

List[Issue]

property rules_regexes

Get a dictionary of regular expressions to scan the code for.

Raises

types.TartufoConfigException – If there was a problem compiling the rules

Return type

Dict[str, Pattern]

scan()[source]

Run the requested scans against the target data.

This will iterate through all chunks of data as provided by the scanner implementation, and run all requested scans against it, as specified in self.global_options.

Raises

types.TartufoConfigException – If there were problems with the scanner’s configuration

Return type

List[tartufo.scanner.Issue]

scan_entropy(chunk)[source]

Scan a chunk of data for apparent high entropy.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

scan_regex(chunk)[source]

Scan a chunk of data for matches against the configured regexes.

Parameters

chunk (tartufo.types.Chunk) – The chunk of data to be scanned

Return type

List[tartufo.scanner.Issue]

should_scan(file_path)[source]

Check if a file path should be included in analysis.

If non-empty, self.included_paths takes precedence over self.excluded_paths: a file path that is not matched by any pattern in self.included_paths is excluded, even when it is not matched by any pattern in self.excluded_paths. If self.included_paths or self.excluded_paths is undefined or empty, it has no effect. When no inclusions or exclusions exist, all file paths are included.

Parameters

file_path (str) – The file path to check for inclusion

Returns

False if the file path is _not_ matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty), otherwise returns True
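
The inclusion and exclusion rules described for this method can be sketched as a free function. The name should_scan_sketch, the explicit included and excluded arguments, and the use of Pattern.match (anchored at the start of the path) are all assumptions of this example, not confirmed details of the scanner.

```python
import re
from typing import List


def should_scan_sketch(
    file_path: str,
    included: List[re.Pattern],
    excluded: List[re.Pattern],
) -> bool:
    """Apply inclusion patterns first, then exclusion patterns."""
    # A non-empty inclusion list acts as an allow-list.
    if included and not any(p.match(file_path) for p in included):
        return False
    # A non-empty exclusion list removes anything it matches.
    if excluded and any(p.match(file_path) for p in excluded):
        return False
    return True


include = [re.compile(r"src/")]
exclude = [re.compile(r".*\.lock$")]
results = {
    path: should_scan_sketch(path, include, exclude)
    for path in ("src/app.py", "docs/readme.md", "src/poetry.lock")
}
```

Only src/app.py passes both checks: docs/readme.md is not on the inclusion list, and src/poetry.lock is included but then excluded.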

signature_is_excluded(blob, file_path)[source]

Find whether the signature of some data has been excluded in configuration.

Parameters
  • blob (str) – The piece of data which is being scanned

  • file_path (str) – The path and file name for the data being scanned

Return type

bool

tartufo.types

class tartufo.types.Chunk(contents: str, file_path: str, metadata: Dict[str, Any])[source]
Parameters
  • contents (str) –

  • file_path (str) –

  • metadata (Dict[str, Any]) –

Return type

None

contents
file_path
metadata
exception tartufo.types.ConfigException[source]

Raised if there is a problem with the configuration

exception tartufo.types.GitException[source]

Raised if there is a problem interacting with git

exception tartufo.types.GitLocalException[source]

Raised if there is an error interacting with a local git repository

class tartufo.types.GitOptions(since_commit: Union[str, NoneType], max_depth: int, branch: Union[str, NoneType], fetch: bool)[source]
Parameters
  • since_commit (Optional[str]) –

  • max_depth (int) –

  • branch (Optional[str]) –

  • fetch (bool) –

Return type

None

branch
fetch
max_depth
since_commit
exception tartufo.types.GitRemoteException[source]

Raised if there is an error interacting with a remote git repository

class tartufo.types.GlobalOptions(json: bool, rules: Tuple[TextIO, ...], default_regexes: bool, entropy: bool, regex: bool, include_paths: Union[TextIO, NoneType], exclude_paths: Union[TextIO, NoneType], exclude_signatures: Tuple[str, ...], output_dir: Union[str, NoneType], git_rules_repo: Union[str, NoneType], git_rules_files: Tuple[str, ...], config: Union[TextIO, NoneType], verbose: int, quiet: bool)[source]
Parameters
  • json (bool) –

  • rules (Tuple[TextIO, ...]) –

  • default_regexes (bool) –

  • entropy (bool) –

  • regex (bool) –

  • include_paths (Optional[TextIO]) –

  • exclude_paths (Optional[TextIO]) –

  • exclude_signatures (Tuple[str, ...]) –

  • output_dir (Optional[str]) –

  • git_rules_repo (Optional[str]) –

  • git_rules_files (Tuple[str, ...]) –

  • config (Optional[TextIO]) –

  • verbose (int) –

  • quiet (bool) –

Return type

None

config
default_regexes
entropy
exclude_paths
exclude_signatures
git_rules_files
git_rules_repo
include_paths
json
output_dir
quiet
regex
rules
verbose
class tartufo.types.IssueType(value)[source]

An enumeration.

Entropy = 'High Entropy'
RegEx = 'Regular Expression Match'
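
As a small illustration, an enumeration with the two members listed above behaves as follows. IssueTypeSketch is a stand-in defined only for this example; it copies the documented member names and values but is not the library class itself.

```python
import enum


class IssueTypeSketch(enum.Enum):
    """Stand-in enumeration mirroring the documented members."""

    Entropy = "High Entropy"
    RegEx = "Regular Expression Match"


# Calling the enumeration with a value looks up the matching member.
by_value = IssueTypeSketch("High Entropy")
label = IssueTypeSketch.RegEx.value
```
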
class tartufo.types.Rule(name: Union[str, NoneType], pattern: Pattern, path_pattern: Union[Pattern, NoneType])[source]
Parameters
  • name (Optional[str]) –

  • pattern (Pattern) –

  • path_pattern (Optional[Pattern]) –

Return type

None

name
path_pattern
pattern
exception tartufo.types.ScanException[source]

Raised if there is a problem encountered during a scan

exception tartufo.types.TartufoException[source]

Base class for all package exceptions

tartufo.util

tartufo.util.clone_git_repo(git_url, target_dir=None)[source]

Clone a remote git repository and return its filesystem path.

Parameters
  • git_url (str) – The URL of the git repository to be cloned

  • target_dir (Optional[pathlib.Path]) – Where to clone the repository to

Raises

types.GitRemoteException – If there was an error cloning the repository

Return type

pathlib.Path

tartufo.util.del_rw(_func, name, _exc)[source]

Attempt to grant permission to and force deletion of a file.

This is used as an error handler for shutil.rmtree.

Parameters
  • _func (Callable) – The original calling function

  • name (str) – The name of the file to try removing

  • _exc (Exception) – The exception raised originally when the file was removed

Return type

None
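
An error handler of this kind might look roughly like the sketch below: make the path writable, then retry the removal. The name del_rw_sketch is hypothetical and the body is an assumption about the general technique, not a copy of tartufo's code. The demo calls the handler directly on a read-only file.

```python
import os
import stat
import tempfile


def del_rw_sketch(_func, name, _exc):
    """Make the path writable, then retry the removal."""
    os.chmod(name, stat.S_IWRITE)
    os.remove(name)


# Demonstrate on a read-only file by invoking the handler directly.
workdir = tempfile.mkdtemp()
locked = os.path.join(workdir, "locked.txt")
with open(locked, "w") as handle:
    handle.write("read-only")
os.chmod(locked, stat.S_IREAD)

del_rw_sketch(os.remove, locked, None)
removed = not os.path.exists(locked)
os.rmdir(workdir)
```

In practice such a function is passed to shutil.rmtree as its error handler (the onerror parameter, or onexc on Python 3.12 and later), so that files rmtree fails to delete get a second attempt.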

tartufo.util.echo_result(options, scanner, repo_path, output_dir)[source]

Print all found issues out to the console, optionally as JSON.

Parameters
  • options – Global options object

  • scanner – ScannerBase containing issues and excluded paths from config tree

  • repo_path – The path to the repository the issues were found in

  • output_dir – The directory that issue details were written out to

Return type

None

tartufo.util.extract_commit_metadata(commit, branch)[source]

Grab a consistent set of metadata from a git commit, for user output.

Parameters
Return type

Dict[str, Any]

tartufo.util.fail(msg, ctx, code=1)[source]

Print out a styled error message and exit.

Parameters
  • msg (str) – The message to print out to the user

  • ctx (click.core.Context) – A context from a currently executing Click command

  • code (int) – The exit code to use; must be >= 1

Return type

None

tartufo.util.generate_signature(snippet, filename)[source]

Generate a stable hash signature for an issue found in a commit.

These signatures are used for configuring excluded/approved issues, such as secrets intentionally embedded in tests.

Parameters
  • snippet (str) – A string which was found as a potential issue during a scan

  • filename (str) – The file where the issue was found

Return type

str
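
One way to build a stable signature of this kind is to hash the snippet together with the file name, as in the sketch below. The SHA-256 digest, the "$$" separator, and the name signature_sketch are assumptions of this example; consult the source for the exact scheme tartufo uses.

```python
import hashlib


def signature_sketch(snippet: str, filename: str) -> str:
    """Hash the snippet and file name into a hex digest."""
    # The "$$" separator and SHA-256 are assumptions of this sketch.
    combined = f"{snippet}$${filename}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


first = signature_sketch("hunter2-example", "tests/fixtures/secrets.py")
second = signature_sketch("hunter2-example", "tests/fixtures/secrets.py")
other = signature_sketch("hunter2-example", "src/app.py")
```

The same snippet in the same file always yields the same signature, while the same snippet in a different file yields a different one, which is what makes per-file exclusions possible.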

tartufo.util.get_strings_of_set(word, char_set, threshold=20)[source]

Split a “word” into a set of “strings”, based on a given character set.

The returned strings must have a length, at minimum, equal to threshold. This is meant for extracting long strings which are likely to be things like auto-generated passwords, tokens, hashes, etc.

Parameters
  • word (str) – The word to be analyzed

  • char_set (Iterable[str]) – The set of characters used to compose the strings (i.e. hex)

  • threshold (int) – The minimum length for what is accepted as a string

Return type

List[str]
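
The splitting described above can be sketched as a single pass over the word that collects runs of allowed characters. This is an illustrative approximation: strings_of_set_sketch and HEX_CHARS are hypothetical names, and the sketch follows the wording above by keeping runs whose length is at least threshold.

```python
from typing import Iterable, List


def strings_of_set_sketch(
    word: str, char_set: Iterable[str], threshold: int = 20
) -> List[str]:
    """Collect runs of characters from char_set of length >= threshold."""
    allowed = set(char_set)
    strings: List[str] = []
    current = ""
    for char in word:
        if char in allowed:
            current += char
            continue
        # A character outside the set ends the current run.
        if len(current) >= threshold:
            strings.append(current)
        current = ""
    # Do not forget a run that reaches the end of the word.
    if len(current) >= threshold:
        strings.append(current)
    return strings


HEX_CHARS = "0123456789abcdefABCDEF"  # assumed character set for the example
found = strings_of_set_sketch("token=deadbeefdeadbeefdeadbeef;id=abc", HEX_CHARS)
```

Short runs such as the lone "e" in "token" or the trailing "abc" fall below the threshold and are discarded, leaving only the 24-character hexadecimal run.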

tartufo.util.write_outputs(found_issues, output_dir)[source]

Write details of the issues to individual files in the specified directory.

Parameters
  • found_issues (List[Issue]) – The list of issues to be written out

  • output_dir (pathlib.Path) – The directory where the files should be written

Return type

List[str]