API
This part of the documentation lists the full API reference of all public classes and functions.
tartufo.config
-
tartufo.config.compile_path_rules(patterns)
Take a list of regex strings and compile them into patterns.
Any line starting with # will be ignored.
- Parameters
patterns (Iterable[str]) – The list of patterns to be compiled
- Return type
List[Pattern]
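The comment-skipping behavior above can be illustrated with a minimal sketch that uses only the standard library re module. Stripping surrounding whitespace and skipping blank lines are assumptions beyond what this reference states, so the real implementation may differ.

```python
import re
from typing import Iterable, List, Pattern


def compile_path_rules(patterns: Iterable[str]) -> List[Pattern]:
    """Compile path regexes, ignoring comment lines (sketch)."""
    # Assumption: surrounding whitespace (e.g. trailing newlines read from a
    # file) is stripped, and blank lines are skipped just like comments.
    stripped = (pattern.strip() for pattern in patterns)
    return [
        re.compile(pattern)
        for pattern in stripped
        if pattern and not pattern.startswith("#")
    ]


rules = compile_path_rules(["# vendored code\n", "docs/.*\n", "\n", r"tests/fixtures/.*"])
print([rule.pattern for rule in rules])  # ['docs/.*', 'tests/fixtures/.*']
```

Because the input is any iterable of strings, an open file of path rules can be passed directly. Each line it yields becomes one candidate pattern.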
-
tartufo.config.configure_regexes(include_default=True, rules_files=None, rules_repo=None, rules_repo_files=None)
Build a set of regular expressions to be used during a regex scan.
- Parameters
include_default (bool) – Whether to include the built-in set of regexes
rules_files (Optional[Iterable[TextIO]]) – A list of files to load rules from
rules_repo (Optional[str]) – A separate git repository to load rules from
rules_repo_files (Optional[Iterable[str]]) – A set of patterns used to find files in the rules repo
- Return type
Dict[str, Pattern]
-
tartufo.config.load_config_from_path(config_path, filename=None, traverse=True)
Scan a path for a configuration file, and return its contents.
All key names are normalized to remove leading “-”/“--” and to replace “-” with “_”. For example, “--repo-path” becomes “repo_path”.
In addition to checking the specified path, if traverse is True, this will traverse up through the directory structure, looking for a configuration file in parent directories. For example, given this directory structure:
    working_dir/
    |- tartufo.toml
    |- group1/
    |  |- project1/
    |  |  |- tartufo.toml
    |  |- project2/
    |- group2/
       |- tartufo.toml
       |- project1/
       |- project2/
          |- tartufo.toml
The following config_path values will load the configuration files at the corresponding paths:
    config_path                      file
    working_dir/group1/project1/     working_dir/group1/project1/tartufo.toml
    working_dir/group1/project2/     working_dir/tartufo.toml
    working_dir/group2/project1/     working_dir/group2/tartufo.toml
    working_dir/group2/project2/     working_dir/group2/project2/tartufo.toml
- Parameters
config_path (pathlib.Path) – The path to search for configuration files
filename (Optional[str]) – A specific filename to look for. By default, this will look for tartufo.toml first and then pyproject.toml.
traverse (bool) – Whether to also search parent directories for a configuration file
- Raises
FileNotFoundError – If no config file was found
types.ConfigException – If a config file was found, but could not be read
- Returns
A tuple consisting of the config file that was discovered, and the contents of that file loaded in as TOML data
- Return type
Tuple[pathlib.Path, MutableMapping[str, Any]]
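The discovery rules and key normalization described above can be sketched with pathlib. This sketch is deliberately narrow: it only locates the file and normalizes keys. It does not parse TOML. It also does not check whether a pyproject.toml actually contains tartufo settings, which the real function may do. The names find_config_file and normalize_key are hypothetical.

```python
import pathlib
from typing import Optional, Sequence

# Per the reference above: tartufo.toml is preferred, then pyproject.toml.
DEFAULT_FILENAMES: Sequence[str] = ("tartufo.toml", "pyproject.toml")


def find_config_file(
    config_path: pathlib.Path, filename: Optional[str] = None, traverse: bool = True
) -> pathlib.Path:
    """Return the first matching config file, walking up parents if asked (sketch)."""
    candidates = (filename,) if filename else DEFAULT_FILENAMES
    start = config_path.resolve()
    # Check the given directory first, then each parent in turn.
    directories = [start, *start.parents] if traverse else [start]
    for directory in directories:
        for name in candidates:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    raise FileNotFoundError(f"No config file found in {config_path}")


def normalize_key(key: str) -> str:
    """Turn a CLI-style option name such as "--repo-path" into "repo_path"."""
    return key.lstrip("-").replace("-", "_")


print(normalize_key("--repo-path"))  # repo_path
```

The search stops at the nearest directory that holds a candidate file. That is why working_dir/group1/project1/ uses its own tartufo.toml, while working_dir/group1/project2/ falls back to the top-level one.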
-
tartufo.config.load_rules_from_file(rules_file)
Load a set of JSON rules from a file and return them as compiled patterns.
- Parameters
rules_file (TextIO) – An open file handle containing a JSON dictionary of regexes
- Raises
ValueError – If the rules contain invalid JSON
- Return type
Dict[str, Pattern]
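A minimal sketch of this behavior follows. It assumes the file holds a single JSON object that maps rule names to regex strings, which is how "a JSON dictionary of regexes" is read here. The rule name in the test is purely illustrative.

```python
import json
import re
from typing import Dict, Pattern, TextIO


def load_rules_from_file(rules_file: TextIO) -> Dict[str, Pattern]:
    """Load a JSON object of {rule name: regex} and compile each regex (sketch)."""
    try:
        rules = json.load(rules_file)
    except json.JSONDecodeError as exc:
        # JSONDecodeError already subclasses ValueError; re-raising adds context.
        raise ValueError(f"Error loading rules from file: {exc}") from exc
    return {name: re.compile(pattern) for name, pattern in rules.items()}
```

Any object with a read() method works as rules_file, so an in-memory io.StringIO is convenient for testing.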
-
tartufo.config.read_pyproject_toml(ctx, _param, value)
Read config values from a file and load them as defaults.
- Parameters
ctx (click.core.Context) – A context from a currently executing Click command
_param (click.core.Parameter) – The command parameter that triggered this callback
value (str) – The value passed to the command parameter
- Raises
click.FileError – If there was a problem loading the configuration
- Return type
Optional[str]
tartufo.scanner
-
class tartufo.scanner.GitPreCommitScanner(global_options, repo_path)
Bases: tartufo.scanner.GitScanner
For use in a git pre-commit hook.
- Parameters
global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command
repo_path (str) – The local filesystem path pointing to the repository
-
_iter_diff_index(diff_index)
Iterate over a “diff index”, yielding the individual file changes.
A “diff index” is essentially analogous to a single commit in the git history. This method iterates over a single commit and yields the changes to each individual file in that commit, along with its file path. It also checks each file path to ensure that it has not been excluded from the scan by configuration.
Note that binary files are wholly skipped.
- Parameters
diff_index (git.diff.DiffIndex) – The diff index / commit to be iterated over
-
calculate_entropy(data, char_set)
Calculate the Shannon entropy for a piece of data.
This essentially calculates the probability of each character in char_set being present in data, and combines those probabilities into a single score. By doing this, we can tell how random a string appears to be.
Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html
-
property chunks
Yield the individual file changes currently staged for commit.
-
property excluded_paths
Get a list of regexes used to match paths to exclude from the scan.
- Return type
List[Pattern]
-
global_options: tartufo.types.GlobalOptions
-
property included_paths
Get a list of regexes used as an exclusive list of paths to scan.
- Return type
List[Pattern]
-
property issues
Get a list of issues found during the scan.
If a scan has not yet been run, run it.
- Returns
Any issues found during the scan.
- Return type
List[Issue]
-
load_repo(repo_path)
Load and return the repository to be scanned.
- Parameters
repo_path (str) – The local filesystem path pointing to the repository
- Raises
types.GitLocalException – If there was a problem loading the repository
-
property rules_regexes
Get a dictionary of regular expressions to scan the code for.
- Raises
types.ConfigException – If there was a problem compiling the rules
- Return type
Dict[str, Pattern]
-
scan()
Run the requested scans against the target data.
This iterates through all chunks of data provided by the scanner implementation and runs all requested scans against each one, as specified in self.global_options.
- Raises
types.ConfigException – If there were problems with the scanner’s configuration
- Return type
List[tartufo.scanner.Issue]
-
scan_entropy(chunk)
Scan a chunk of data for apparent high entropy.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
scan_regex(chunk)
Scan a chunk of data for matches against the configured regexes.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
should_scan(file_path)
Check if a file path should be included in analysis.
If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.
- Parameters
file_path (str) – The file path to check for inclusion
- Returns
False if the file path is not matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty); otherwise True
-
class tartufo.scanner.GitRepoScanner(global_options, git_options, repo_path)
Bases: tartufo.scanner.GitScanner
Used for scanning a full clone of a git repository.
- Parameters
global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command
git_options (tartufo.types.GitOptions) – The options specific to interacting with a git repository
repo_path (str) – The local filesystem path pointing to the repository
-
_iter_branch_commits(repo, branch)
Iterate over and yield the commits on a branch.
- Parameters
repo (git.repo.base.Repo) – The repository from which to extract the branch and commits
branch (git.remote.FetchInfo) – The branch to iterate over
- Return type
Generator[Tuple[git.objects.commit.Commit, git.objects.commit.Commit], None, None]
-
_iter_diff_index(diff_index)
Iterate over a “diff index”, yielding the individual file changes.
A “diff index” is essentially analogous to a single commit in the git history. This method iterates over a single commit and yields the changes to each individual file in that commit, along with its file path. It also checks each file path to ensure that it has not been excluded from the scan by configuration.
Note that binary files are wholly skipped.
- Parameters
diff_index (git.diff.DiffIndex) – The diff index / commit to be iterated over
-
calculate_entropy(data, char_set)
Calculate the Shannon entropy for a piece of data.
This essentially calculates the probability of each character in char_set being present in data, and combines those probabilities into a single score. By doing this, we can tell how random a string appears to be.
Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html
-
property chunks
Yield individual diffs from the repository’s history.
- Raises
types.GitRemoteException – If there was an error fetching branches
-
property excluded_paths
Get a list of regexes used to match paths to exclude from the scan.
- Return type
List[Pattern]
-
git_options: tartufo.types.GitOptions
-
global_options: tartufo.types.GlobalOptions
-
property included_paths
Get a list of regexes used as an exclusive list of paths to scan.
- Return type
List[Pattern]
-
property issues
Get a list of issues found during the scan.
If a scan has not yet been run, run it.
- Returns
Any issues found during the scan.
- Return type
List[Issue]
-
load_repo(repo_path)
Load and return the repository to be scanned.
- Parameters
repo_path (str) – The local filesystem path pointing to the repository
- Raises
types.GitLocalException – If there was a problem loading the repository
-
property rules_regexes
Get a dictionary of regular expressions to scan the code for.
- Raises
types.ConfigException – If there was a problem compiling the rules
- Return type
Dict[str, Pattern]
-
scan()
Run the requested scans against the target data.
This iterates through all chunks of data provided by the scanner implementation and runs all requested scans against each one, as specified in self.global_options.
- Raises
types.ConfigException – If there were problems with the scanner’s configuration
- Return type
List[tartufo.scanner.Issue]
-
scan_entropy(chunk)
Scan a chunk of data for apparent high entropy.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
scan_regex(chunk)
Scan a chunk of data for matches against the configured regexes.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
should_scan(file_path)
Check if a file path should be included in analysis.
If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.
- Parameters
file_path (str) – The file path to check for inclusion
- Returns
False if the file path is not matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty); otherwise True
-
class tartufo.scanner.GitScanner(global_options, repo_path)
Bases: tartufo.scanner.ScannerBase, abc.ABC
A base class for scanners looking at git history.
This is a lightweight base class to provide some basic functionality needed across all scanners that interact with git history.
- Parameters
global_options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command
repo_path (str) – The local filesystem path pointing to the repository
-
_iter_diff_index(diff_index)
Iterate over a “diff index”, yielding the individual file changes.
A “diff index” is essentially analogous to a single commit in the git history. This method iterates over a single commit and yields the changes to each individual file in that commit, along with its file path. It also checks each file path to ensure that it has not been excluded from the scan by configuration.
Note that binary files are wholly skipped.
- Parameters
diff_index (git.diff.DiffIndex) – The diff index / commit to be iterated over
-
calculate_entropy(data, char_set)
Calculate the Shannon entropy for a piece of data.
This essentially calculates the probability of each character in char_set being present in data, and combines those probabilities into a single score. By doing this, we can tell how random a string appears to be.
Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html
-
abstract property chunks
Yield “chunks” of data to be scanned.
Examples of “chunks” would be individual git commit diffs, or the contents of individual files.
-
property excluded_paths
Get a list of regexes used to match paths to exclude from the scan.
- Return type
List[Pattern]
-
global_options: tartufo.types.GlobalOptions
-
property included_paths
Get a list of regexes used as an exclusive list of paths to scan.
- Return type
List[Pattern]
-
property issues
Get a list of issues found during the scan.
If a scan has not yet been run, run it.
- Returns
Any issues found during the scan.
- Return type
List[Issue]
-
abstract load_repo(repo_path)
Load and return the repository to be scanned.
- Parameters
repo_path (str) – The local filesystem path pointing to the repository
- Raises
types.GitLocalException – If there was a problem loading the repository
-
property rules_regexes
Get a dictionary of regular expressions to scan the code for.
- Raises
types.ConfigException – If there was a problem compiling the rules
- Return type
Dict[str, Pattern]
-
scan()
Run the requested scans against the target data.
This iterates through all chunks of data provided by the scanner implementation and runs all requested scans against each one, as specified in self.global_options.
- Raises
types.ConfigException – If there were problems with the scanner’s configuration
- Return type
List[tartufo.scanner.Issue]
-
scan_entropy(chunk)
Scan a chunk of data for apparent high entropy.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
scan_regex(chunk)
Scan a chunk of data for matches against the configured regexes.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
should_scan(file_path)
Check if a file path should be included in analysis.
If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.
- Parameters
file_path (str) – The file path to check for inclusion
- Returns
False if the file path is not matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty); otherwise True
-
class tartufo.scanner.Issue(issue_type, matched_string, chunk)
Bases: object
Represent an issue found while scanning a target.
- Parameters
issue_type (tartufo.types.IssueType) – What type of scan identified this issue
matched_string (str) – The string that was identified as a potential issue
chunk (tartufo.types.Chunk) – The chunk of data where the match was found
-
as_dict()
Return a dictionary representation of an issue.
This is primarily meant to aid in JSON serialization.
-
chunk: tartufo.types.Chunk
-
issue_type: tartufo.types.IssueType
-
class tartufo.scanner.ScannerBase(options)
Bases: abc.ABC
Provide the base, generic functionality needed by all scanners.
In fact, this contains all of the actual scanning logic. This part of the application should never differ; the part that differs, and the part that is left abstract here, is what content is provided to the various scans. For this reason, the chunks property is left abstract. It is up to the various scanners to implement this property, in the form of a generator, to yield all the individual pieces of content to be scanned.
- Parameters
options (tartufo.types.GlobalOptions) – The options provided to the top-level tartufo command
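The split described above can be illustrated with a much-simplified sketch: shared scanning logic lives in the base class, and subclasses only supply an abstract chunks property. The class names, the string-based issues, and the regex-only scan are hypothetical simplifications. The real ScannerBase also runs entropy scans, applies path filtering, and returns Issue objects.

```python
import abc
import re
from typing import Dict, Iterator, List, Optional, Pattern


class MiniScannerBase(abc.ABC):
    """Hypothetical analogue of ScannerBase: scanning logic lives here,
    while subclasses only decide which content ("chunks") to provide."""

    def __init__(self, rules: Dict[str, Pattern]) -> None:
        self.rules = rules
        self._issues: Optional[List[str]] = None

    @property
    @abc.abstractmethod
    def chunks(self) -> Iterator[str]:
        """Yield the pieces of content to be scanned."""

    @property
    def issues(self) -> List[str]:
        # Run the scan lazily the first time issues are requested, then cache.
        if self._issues is None:
            self._issues = self.scan()
        return self._issues

    def scan(self) -> List[str]:
        found: List[str] = []
        for chunk in self.chunks:
            for name, pattern in self.rules.items():
                found.extend(f"{name}: {m.group()}" for m in pattern.finditer(chunk))
        return found


class StringListScanner(MiniScannerBase):
    """Hypothetical concrete scanner over an in-memory list of strings."""

    def __init__(self, rules: Dict[str, Pattern], texts: List[str]) -> None:
        super().__init__(rules)
        self.texts = texts

    @property
    def chunks(self) -> Iterator[str]:
        # Implemented as a generator, as the description above suggests.
        yield from self.texts


scanner = StringListScanner({"four digits": re.compile(r"\d{4}")}, ["pin 1234", "no match", "5678"])
print(scanner.issues)  # ['four digits: 1234', 'four digits: 5678']
```

A git-based scanner in this style would differ only in how chunks is produced, for example by yielding one chunk per file diff.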
-
calculate_entropy(data, char_set)
Calculate the Shannon entropy for a piece of data.
This essentially calculates the probability of each character in char_set being present in data, and combines those probabilities into a single score. By doing this, we can tell how random a string appears to be.
Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html
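Concretely, the score is H = -Σ p(c) · log2 p(c), summed over the characters c in char_set. Here p(c) is the fraction of characters in data equal to c.

A minimal sketch follows. It assumes base-2 logarithms (as in the referenced blog post) and returns 0 for empty input. The HEX_CHARS set is illustrative, not necessarily the one tartufo uses.

```python
import math


def calculate_entropy(data: str, char_set: str) -> float:
    """Shannon entropy (in bits per character) of data over char_set (sketch)."""
    if not data:
        return 0.0
    entropy = 0.0
    for char in char_set:
        # Probability that a character of data is this particular char.
        p_x = data.count(char) / len(data)
        if p_x > 0:
            entropy -= p_x * math.log2(p_x)
    return entropy


HEX_CHARS = "0123456789abcdefABCDEF"
print(calculate_entropy("aaaaaaaa", HEX_CHARS))  # 0.0
print(calculate_entropy("0123456789abcdef", HEX_CHARS))  # 4.0
```

A repeated character carries no information (0 bits). Sixteen distinct, equally frequent hex digits reach the maximum of log2(16) = 4 bits per character, which is why long random-looking tokens stand out.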
-
abstract property chunks
Yield “chunks” of data to be scanned.
Examples of “chunks” would be individual git commit diffs, or the contents of individual files.
-
property excluded_paths
Get a list of regexes used to match paths to exclude from the scan.
- Return type
List[Pattern]
-
global_options: tartufo.types.GlobalOptions
-
property included_paths
Get a list of regexes used as an exclusive list of paths to scan.
- Return type
List[Pattern]
-
property issues
Get a list of issues found during the scan.
If a scan has not yet been run, run it.
- Returns
Any issues found during the scan.
- Return type
List[Issue]
-
property rules_regexes
Get a dictionary of regular expressions to scan the code for.
- Raises
types.ConfigException – If there was a problem compiling the rules
- Return type
Dict[str, Pattern]
-
scan()
Run the requested scans against the target data.
This iterates through all chunks of data provided by the scanner implementation and runs all requested scans against each one, as specified in self.global_options.
- Raises
types.ConfigException – If there were problems with the scanner’s configuration
- Return type
List[tartufo.scanner.Issue]
-
scan_entropy(chunk)
Scan a chunk of data for apparent high entropy.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
scan_regex(chunk)
Scan a chunk of data for matches against the configured regexes.
- Parameters
chunk (tartufo.types.Chunk) – The chunk of data to be scanned
- Return type
List[tartufo.scanner.Issue]
-
should_scan(file_path)
Check if a file path should be included in analysis.
If non-empty, self.included_paths has precedence over self.excluded_paths, such that a file path that is not matched by any of the defined self.included_paths will be excluded, even when it is not matched by any of the defined self.excluded_paths. If either self.included_paths or self.excluded_paths is undefined or empty, it has no effect. All file paths are included by this function when no inclusions or exclusions exist.
- Parameters
file_path (str) – The file path to check for inclusion
- Returns
False if the file path is not matched by self.included_paths (when non-empty) or if it is matched by self.excluded_paths (when non-empty); otherwise True
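The inclusion and exclusion rules above reduce to two checks. In this sketch the pattern lists are passed in explicitly rather than read from self.included_paths and self.excluded_paths. Matching with re.match (anchored at the start of the path) is an assumption, since whether tartufo uses match or search is not stated here.

```python
import re
from typing import List, Pattern


def should_scan(
    file_path: str, included_paths: List[Pattern], excluded_paths: List[Pattern]
) -> bool:
    """Apply the inclusion/exclusion rules described above (sketch)."""
    # A non-empty inclusion list acts as an allow-list.
    if included_paths and not any(p.match(file_path) for p in included_paths):
        return False
    # Exclusions still apply to paths that passed the inclusion check.
    if excluded_paths and any(p.match(file_path) for p in excluded_paths):
        return False
    return True


include = [re.compile(r"src/")]
exclude = [re.compile(r"src/vendor/")]
print(should_scan("src/app.py", include, exclude))         # True
print(should_scan("src/vendor/lib.py", include, exclude))  # False
print(should_scan("README.md", include, exclude))          # False
print(should_scan("README.md", [], []))                    # True
```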
tartufo.types
-
class tartufo.types.Chunk(contents: str, file_path: str, metadata: Dict[str, Any] = <factory>)
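The <factory> default in this signature is how Python's dataclasses render a field declared with default_factory. The signature suggests Chunk is a standard dataclass, but this reference does not say so. Under that assumption, it corresponds roughly to the definition below. The "commit" metadata key is a hypothetical example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Chunk:
    contents: str
    file_path: str
    # "<factory>" indicates a default_factory: each Chunk gets its own fresh
    # dict instead of sharing one mutable default between instances.
    metadata: Dict[str, Any] = field(default_factory=dict)


first = Chunk("password = hunter2", "settings.py")
second = Chunk("token = abc", "config.py")
first.metadata["commit"] = "example"
print(second.metadata)  # {}
```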
-
exception tartufo.types.ConfigException
Raised if there is a problem with the configuration.
-
exception tartufo.types.GitLocalException
Raised if there is an error interacting with a local git repository.
-
class tartufo.types.GitOptions(since_commit: Optional[str], max_depth: int, branch: Optional[str])
-
exception tartufo.types.GitRemoteException
Raised if there is an error interacting with a remote git repository.
-
class tartufo.types.GlobalOptions(json: bool, rules: Tuple[TextIO, ...], default_regexes: bool, entropy: bool, regex: bool, include_paths: Optional[TextIO], exclude_paths: Optional[TextIO], exclude_signatures: Tuple[str, ...], output_dir: Optional[str], git_rules_repo: Optional[str], git_rules_files: Tuple[str, ...], config: Optional[TextIO])
- Parameters
json (bool) –
rules (Tuple[TextIO, ...]) –
default_regexes (bool) –
entropy (bool) –
regex (bool) –
include_paths (Optional[TextIO]) –
exclude_paths (Optional[TextIO]) –
exclude_signatures (Tuple[str, ...]) –
output_dir (Optional[str]) –
git_rules_repo (Optional[str]) –
git_rules_files (Tuple[str, ...]) –
config (Optional[TextIO]) –
-
config: Optional[TextIO]
-
exclude_paths: Optional[TextIO]
-
include_paths: Optional[TextIO]
-
rules: Tuple[TextIO, ...]
-
class tartufo.types.IssueType(value)
An enumeration of the types of scan that can identify an issue.
-
Entropy = 'High Entropy'
-
RegEx = 'Regular Expression Match'
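The two members above behave like a standard enum.Enum whose values are human-readable labels. The sketch below assumes no additional members or behavior. It shows how a stored label can be mapped back to its member.

```python
import enum


class IssueType(enum.Enum):
    Entropy = "High Entropy"
    RegEx = "Regular Expression Match"


print(IssueType.Entropy.value)  # High Entropy
print(IssueType("Regular Expression Match") is IssueType.RegEx)  # True
```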
tartufo.util
-
tartufo.util.clone_git_repo(git_url, target_dir=None)
Clone a remote git repository and return its filesystem path.
- Parameters
git_url (str) – The URL of the git repository to be cloned
target_dir (Optional[pathlib.Path]) – Where to clone the repository to
- Raises
types.GitRemoteException – If there was an error cloning the repository
-
tartufo.util.del_rw(_func, name, _exc)
Attempt to grant permission to and force deletion of a file.
This is used as an error handler for shutil.rmtree.
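A minimal sketch of such an rmtree error handler follows. It assumes "grant permission" means adding the owner write bit, which read-only files (for example git object files on Windows) typically lack. Python 3.12 deprecated rmtree's onerror argument in favor of onexc. This handler ignores its third argument, so it can be passed to either.

```python
import os
import shutil
import stat
import sys
import tempfile


def del_rw(_func, name, _exc):
    """rmtree error handler: make the file writable, then delete it (sketch)."""
    # Assumption: the failure was caused by a missing write permission.
    os.chmod(name, stat.S_IWRITE)
    os.remove(name)


# Usage: remove a directory tree containing a read-only file.
target = tempfile.mkdtemp()
readonly_file = os.path.join(target, "readonly.txt")
with open(readonly_file, "w") as handle:
    handle.write("example")
os.chmod(readonly_file, stat.S_IREAD)
if sys.version_info >= (3, 12):
    shutil.rmtree(target, onexc=del_rw)
else:
    shutil.rmtree(target, onerror=del_rw)
print(os.path.exists(target))  # False
```

On POSIX systems, deleting a read-only file inside a writable directory usually succeeds without invoking the handler at all. The handler matters mainly on Windows.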
-
tartufo.util.echo_issues(issues, as_json, repo_path, output_dir)
Print all found issues out to the console, optionally as JSON.
- Parameters
issues (List[Issue]) – The list of issues to be printed out
as_json (bool) – Whether the output should be formatted as JSON
repo_path (str) – The path to the repository the issues were found in
output_dir (Optional[pathlib.Path]) – The directory that issue details were written out to
-
tartufo.util.extract_commit_metadata(commit, branch)
Grab a consistent set of metadata from a git commit, for user output.
- Parameters
commit (git.objects.commit.Commit) – The commit to extract the data from
branch (git.remote.FetchInfo) – What branch the commit was found on
- Return type
Dict[str, Any]
-
tartufo.util.generate_signature(snippet, filename)
Generate a stable hash signature for an issue found in a commit.
These signatures are used for configuring excluded/approved issues, such as secrets intentionally embedded in tests.
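Because signatures are meant to be listed as exclusions, the key property is determinism: the same snippet in the same file always yields the same value. This sketch assumes a BLAKE2s digest over the snippet and filename joined with a separator. The hash function and separator are assumptions, so signatures produced by this sketch should not be expected to match tartufo's own.

```python
from hashlib import blake2s


def generate_signature(snippet: str, filename: str) -> str:
    """Return a stable hex digest identifying a finding in a given file (sketch)."""
    # Assumption: snippet and filename are joined with a separator, then hashed.
    return blake2s(f"{snippet}$${filename}".encode("utf-8")).hexdigest()


sig = generate_signature("AKIAEXAMPLEEXAMPLE12", "tests/test_config.py")
print(len(sig))  # 64
```

Including the filename means the same secret in two different files gets two different signatures, so approving one occurrence does not silently approve the other.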
-
tartufo.util.get_strings_of_set(word, char_set, threshold=20)
Split a “word” into a set of “strings”, based on a given character set.
The returned strings must have a length, at minimum, equal to threshold. This is meant for extracting long strings which are likely to be things like auto-generated passwords, tokens, hashes, etc.
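A minimal sketch of this splitting follows. It keeps every maximal run of consecutive char_set characters whose length is at least threshold, following the "at minimum, equal to threshold" wording above. The HEX_CHARS set and sample input are illustrative.

```python
from typing import List


def get_strings_of_set(word: str, char_set: str, threshold: int = 20) -> List[str]:
    """Return runs of consecutive char_set characters at least threshold long (sketch)."""
    strings: List[str] = []
    current = ""
    for char in word:
        if char in char_set:
            current += char
            continue
        # A character outside the set ends the current run.
        if len(current) >= threshold:
            strings.append(current)
        current = ""
    # Flush the final run, which may extend to the end of the word.
    if len(current) >= threshold:
        strings.append(current)
    return strings


HEX_CHARS = "0123456789abcdefABCDEF"
print(get_strings_of_set("key=3f2a9c0d77e1b4a8, id=42", HEX_CHARS, threshold=16))
# ['3f2a9c0d77e1b4a8']
```

Short runs such as "e" (from "key") and "42" fall below the threshold and are dropped. The extracted strings are natural inputs for calculate_entropy.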
-
tartufo.util.write_outputs(found_issues, output_dir)
Write details of the issues to individual files in the specified directory.
- Parameters
found_issues (List[Issue]) – The list of issues to be written out
output_dir (pathlib.Path) – The directory where the files should be written
- Return type
List[str]