Secure data in Git with the clean/smudge filter

When working on public Git repositories, you need to pay close attention so that you don't accidentally push secret information such as tokens, private server addresses, personal email addresses, and the like. One of the tools that can help you is Git's clean/smudge filter.

Clean and smudge your Git repository

The clean/smudge filter is quite simple to use. Create a filter driver with two commands—clean and smudge—and then apply the filter per record in the .gitattributes file.

This places the filter between the working directory and the staging area for files that match the .gitattributes record. When you add content from the working directory to the staging area with git add, matching files go through the clean filter. When Git writes content back into the working directory, for example during git checkout or the checkout step of git pull, those same files go through the smudge filter.

Create the filter driver

Let's create a filter driver named cleanPass that uses sed expressions to replace the value secretpassword with the value hiddenpassword and vice versa:

# mac users should use gsed instead of sed
git config filter.cleanPass.clean "sed -e 's/secretpassword/hiddenpassword/g'"
git config filter.cleanPass.smudge "sed -e 's/hiddenpassword/secretpassword/g'"

Tip: Add --global to the config command to store the filter driver in ~/.gitconfig, where it is available to all repositories, instead of in the repository's .git/config.
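
Because Git passes the file content to the filter command on standard input and reads the result from standard output, the two sed expressions can be tried in a plain pipe before they are wired into Git. The following is only a quick sanity check; the sample JSON line is made up for illustration.

```shell
# Hypothetical sample content; any text containing the secret works.
sample='{"password": "secretpassword"}'

# What the clean command would store in Git.
cleaned=$(printf '%s\n' "$sample" | sed -e 's/secretpassword/hiddenpassword/g')

# What the smudge command would write back to the working directory.
smudged=$(printf '%s\n' "$cleaned" | sed -e 's/hiddenpassword/secretpassword/g')

printf '%s\n%s\n' "$cleaned" "$smudged"
```

If the second line printed matches the original sample, the pair of expressions is reversible for that input.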

As an example, let's apply the cleanPass filter driver to the JSON file type. In the repository's .gitattributes, look for the JSON file type record and modify it. The result should look something like this, depending on the original configuration:

*.json text eol=lf filter=cleanPass

Tip: To keep this change out of the repository, add the record to .git/info/attributes instead, which takes precedence over .gitattributes. Just be sure not to drop any of the original record's settings, such as eol.
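
To confirm that a record really applies to a given path, git check-attr can report the effective attribute value. This is a small sketch in a throwaway repository created with mktemp; the file name config.json is hypothetical and does not need to exist for the check.

```shell
# Throwaway repository so no real configuration is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# The record from the example above.
echo '*.json text eol=lf filter=cleanPass' > .gitattributes

# Ask Git which filter it would use for a JSON path.
attr=$(git check-attr filter -- config.json)
echo "$attr"
```

The command reports the value Git resolves for the path, whichever attributes file the record comes from.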

From now on, every new or modified JSON file in the repository will go through the cleanPass filter. Figure 1 provides an animated look at what happens next.

Figure 1. The clean filter in action.
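
The same behavior can be reproduced by hand. The following is a minimal sketch, assuming a throwaway repository created with mktemp and a hypothetical file named config.json. It compares the content stored in the staging area with the content on disk, then restores the file from the staging area to trigger the smudge filter.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Same filter driver and attribute as in the article, without the eol settings.
git config filter.cleanPass.clean "sed -e 's/secretpassword/hiddenpassword/g'"
git config filter.cleanPass.smudge "sed -e 's/hiddenpassword/secretpassword/g'"
echo '*.json filter=cleanPass' > .gitattributes

echo '{"password": "secretpassword"}' > config.json
git add config.json

staged=$(git show :config.json)   # content in the staging area (after clean)
working=$(cat config.json)        # content on disk (left unchanged by git add)

rm config.json
git checkout -- config.json       # restore from the staging area (runs smudge)
restored=$(cat config.json)

printf 'staged:   %s\nworking:  %s\nrestored: %s\n' "$staged" "$working" "$restored"
```

The staged copy should contain hiddenpassword, while both the original and the restored working copies contain secretpassword.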

One filter to rule them all

That was cool, but using this filter can get quite cumbersome at times. For instance, on a recent task, I worked with multiple repositories using multiple (but identical) services from my team's Red Hat OpenShift lab. To add to the complexity, I usually work from two different computers—my desktop at my office and my laptop at home.

All of this meant that I needed to create multiple filters (or one complex one) and apply them to multiple file types, for multiple repositories, on multiple computers. That gave me a headache, especially once I realized that this workflow would probably recur in future tasks.

Eventually, I decided to create a very simple script that handles multiple value replacements in both directions, save it on each of my computers, and configure it as a global filter.

Configure a global filter

Adding values to be replaced in this script is super easy, and attaching it to .gitattributes records, as seen above, is quite simple as well.

First, create a script in a path available to all repositories. For instance, you could put it in your home directory at ~/scripts/git-smudge-clean-filter.sh:

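#!/bin/bash
# Requires bash 4 or later for associative arrays (declare -A).

declare -A mapArr

# key = real value kept in the working directory, value = placeholder stored in Git
mapArr["my-work-private-server.mywork.com"]="<redacted-work-server>"
mapArr["my-personal-private-server.myowndomain.org"]="<redacted-personal-server>"
mapArr["A*&#QAADDA(77##F"]="super-secret-token"
mapArr["oops@mypersonal.email"]="support@correct.email"

# Escape characters that are special in a sed pattern (escPat) or in a sed
# replacement (escRep), so that keys and values are treated as literal text.
escPat() { printf '%s' "$1" | sed -e 's|[][\\/.^$*]|\\&|g'; }
escRep() { printf '%s' "$1" | sed -e 's|[\\/&]|\\&|g'; }

# mac users should use gsed instead of sed
sedArgs=()
if [[ "$1" == "clean" ]]; then
  # Working directory -> Git: replace each real value with its placeholder.
  for key in "${!mapArr[@]}"; do
    sedArgs+=(-e "s/$(escPat "$key")/$(escRep "${mapArr[$key]}")/g")
  done
elif [[ "$1" == "smudge" ]]; then
  # Git -> working directory: replace each placeholder with its real value.
  for key in "${!mapArr[@]}"; do
    sedArgs+=(-e "s/$(escPat "${mapArr[$key]}")/$(escRep "$key")/g")
  done
else
  echo "use smudge/clean as the first argument" >&2
  exit 1
fi

# Filter standard input to standard output, as Git expects from a filter.
sed "${sedArgs[@]}"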

Add as many pairs as you like to mapArr.

Note: Because every replacement has to be reversible, each key and each value can be used only once in mapArr.
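
To see why reuse is a problem, consider what happens when two different secrets share one placeholder. This is a small illustration with made-up host names, using plain sed pipes rather than the script itself.

```shell
# Two different (hypothetical) secrets mapped to the same placeholder.
original='hosts: host-a.example.com host-b.example.com'

# Clean direction: both hosts collapse into one placeholder.
cleaned=$(printf '%s\n' "$original" |
  sed -e 's/host-a\.example\.com/<redacted-host>/g' \
      -e 's/host-b\.example\.com/<redacted-host>/g')

# Smudge direction: the first expression consumes every placeholder,
# so the second one never matches.
restored=$(printf '%s\n' "$cleaned" |
  sed -e 's/<redacted-host>/host-a.example.com/g' \
      -e 's/<redacted-host>/host-b.example.com/g')

printf '%s\n%s\n' "$cleaned" "$restored"
```

After the round trip, both hosts come back as host-a.example.com, so the working copy no longer matches what was originally written.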

Next, create the filter driver globally (note the script argument):

git config --global filter.reductScript.smudge "~/scripts/git-smudge-clean-filter.sh smudge"
git config --global filter.reductScript.clean "~/scripts/git-smudge-clean-filter.sh clean"

Now, in any repository, every file that matches a .gitattributes record with filter=reductScript will go through the script, and every value specified in mapArr will be redacted (or restored).
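
One case worth checking is a file that was already staged before the filter was attached, since its stored copy was never cleaned. The sketch below assumes Git 2.16 or later, where git add accepts --renormalize, and uses a throwaway repository with a stand-in sed filter named demoFilter and a file named app.conf, both hypothetical. It only rewrites the staging area; anything already committed remains in the history.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Stage a file BEFORE any filter exists: the real value lands in the index.
echo 'server=my-work-private-server.mywork.com' > app.conf
git add app.conf
before=$(git show :app.conf)

# Configure a stand-in filter (hypothetical name) and attach it to *.conf.
git config filter.demoFilter.clean "sed -e 's/my-work-private-server\.mywork\.com/<redacted-work-server>/g'"
git config filter.demoFilter.smudge "sed -e 's/<redacted-work-server>/my-work-private-server.mywork.com/g'"
echo '*.conf filter=demoFilter' > .gitattributes

# Apply the clean filter freshly to all tracked files.
git add --renormalize .
after=$(git show :app.conf)

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

The staged copy should change from the real server name to the placeholder, while the file on disk keeps the real value.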

Conclusion

That's all there is to it. I hope you find this useful. We'll cover other Git tips in future articles.


Last updated: September 7, 2022
