Proxy tools and configs to coexist in corp land and at home


Let’s face it, having a corporate proxy with NTLM-based authentication really sucks.  The forced authentication against an AD-backed directory makes sense, but it creates pain when you work in a terminal and unix/linux world.  Thankfully, over the past decade or so some amazing people have created tooling to help navigate the NTLM proxy landscape.  Among these tools, cntlm is a fantastic solution to this problem.  While cntlm solves 90% of your problems, that last 10% involves turning proxy settings off easily when you are away from the office.  There are some sophisticated ways to solve the in-office/out-of-office detection dilemma, but I have opted for a “simple” one inspired by a ServerFault post.

The solution presented here assumes the following:

  • Stuck using Windows 7/10 as your host OS
    • Note: If you have the luxury of using Mac OSX as a host OS, then also try Fiddler
  • Stuck with a corporate NTLM authenticated proxy
  • Using Virtualbox for development and virtualization tooling
  • Using a Linux VM in Virtualbox
  • Desire to seamlessly move in and out of corp offices and home without the need to edit the cntlm.ini or autodetect network locations
  • You have local admin privileges

Solution Design

  • Cntlm configured with:
    1. Corp proxy
    2. Local squid proxy on alternative port
  • Linux VM bash profile configured to use the cntlm proxy


  • When connected to the corp network, connections will use the corp proxy via the authentication creds configured in cntlm
  • When at home or away from the office, connections will first try the corp proxy, then fail over to the second local squid proxy, which uses a direct internet connection
               |                          |
          corp proxy              direct internet
               |                          |
 ###########################
 # HOST (Windows 7/10/etc) #
 # VirtualBox              #       ##############
 # cntlm (port 3128)       #-------# VM (Linux) #
 #  --proxy1 set to corp   #       ##############
 #  --proxy2 set to squid  #       .bash_profile (export proxy set to
 # squid (port 3129)       #        the vbox NAT default gateway)
 #  --using no proxy       #
 ###########################

This keeps things simple, and you don’t have to change all your proxy settings all the time.  Just use the local cntlm proxy and it will provide seamless functionality for out-of-office, non-proxy direct connections by failing over to the local squid proxy.  The downside is having to install squid and do the extra configuration, but it works nicely once set up.
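The failover order can be illustrated with a small bash sketch.  This is not part of cntlm; cntlm walks its configured Proxy lines natively, using the first parent that accepts a connection, and this helper just mimics that order for illustration:

```shell
# Try each parent host:port in turn; echo the first one that accepts
# a TCP connection (roughly the order cntlm applies with two Proxy lines).
pick_proxy() {
  local p host port
  for p in "$@"; do
    host=${p%:*}
    port=${p#*:}
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}

# Example: prefer the corp proxy, fall back to the local squid
# (corpproxy.mycorp.com is a hypothetical hostname):
# pick_proxy corpproxy.mycorp.com:8080 localhost:3129
```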


  • Cntlm.ini file pre-configured, but you need to add your user account and password hash, domain, and proxy
  • Squid.conf pre-configured to bind to port 3129
  • Example .bash_profile relevant config parts
  • Example yum/apt/dnf config
  • Example wget config

Start Services (Cntlm and Squid for Windows)


  1. Launch the Squid proxy (use Windows services or the Squid systray tool)
  2. Start Cntlm from an administrator-enabled command prompt:  net start cntlm

Get Configs

  • Cntlm.ini file
    • #
      # Cntlm Authentication Proxy Configuration
      # NOTE: all values are parsed literally, do NOT escape spaces,
      # do not quote. Use 0600 perms if you use plaintext password.
      # NOTE: Use plaintext password only at your own risk
      # Use hashes instead. You can use a "cntlm -M" and "cntlm -H"
      # command sequence to get the right config for your environment.
      # See cntlm man page
      Username MYUSERNAME
      #Password clear_text_password_not_recommended_use_hash
      #Construct hash as follows: cntlm -H -a NTLMv2 -d MYCORPDOMAIN -u MYUSERNAME
      Auth NTLMv2
      PassLM 1AD35398BE6565DDB5C4EF70C0593492
      PassNT 77B9081511704EE852F94227CF48A793
      PassNTLMv2 D5826E9C665C37C80B53397D5C07BBCB # Only for user 'MYUSERNAME', domain 'MYCORPDOMAIN'
      # Specify the netbios hostname cntlm will send to the parent
      # proxies. Normally the value is auto-guessed.
      # Workstation netbios_hostname
      # List of parent proxies to use. More proxies can be defined
      # one per line in format <proxy_ip>:<proxy_port>
      #Added a failover local squid proxy that use direct internet to allow for seamless access outside corp offices
      Proxy localhost:3129
      # List addresses you do not want to pass to parent proxies
      # * and ? wildcards can be used
      #NoProxy localhost, 127.0.0.*, 10.*, 192.168.*
      #Use NoProxy * for the equivalent of direct internet for all with no proxies, but this setting is not dynamic and must be edited and services restarted each time
      #NoProxy *
      # Specify the port cntlm will listen on
      # You can bind cntlm to specific interface by specifying
      # the appropriate IP address also in format <local_ip>:<local_port>
      # Cntlm listens on 127.0.0.1:3128 by default
      Listen 3128
      # If you wish to use the SOCKS5 proxy feature as well, uncomment
      # the following option. It can be used several times
      # to have SOCKS5 on more than one port or on different network
      # interfaces (specify explicit source address for that).
      # WARNING: The service accepts all requests, unless you use
      # SOCKS5User and make authentication mandatory. SOCKS5User
      # can be used repeatedly for a whole bunch of individual accounts.
      #SOCKS5Proxy 8010
      #SOCKS5User dave:password
      # Use -M first to detect the best NTLM settings for your proxy.
      # Default is to use the only secure hash, NTLMv2, but it is not
      # as available as the older stuff.
      # This example is the most universal setup known to man, but it
      # uses the weakest hash ever. I won't have its usage on my
      # conscience. :) Really, try -M first.
      #Auth LM
      #Flags 0x06820000
      # Enable to allow access from other computers
      Gateway yes
      # Useful in Gateway mode to allow/restrict certain IPs
      # Specify individual IPs or subnets one rule per line.
      #Deny 0/0
      # GFI WebMonitor-handling plugin parameters, disabled by default
      #ISAScannerSize 1024
      #ISAScannerAgent Wget/
      #ISAScannerAgent APT-HTTP/
      #ISAScannerAgent Yum/
      # Headers which should be replaced if present in the request
      #Header User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
      # Tunnels mapping local port to a machine behind the proxy.
      # The format is <local_port>:<remote_host>:<remote_port>
  • Squid.conf pre-configured to bind to port 3129
    • #
      # Recommended minimum configuration:
      # Example rule allowing access from your local networks.
      # Adapt to list your (internal) IP networks from where browsing
      # should be allowed
      acl localnet src # RFC1918 possible internal network
      acl localnet src # RFC1918 possible internal network
      acl localnet src # RFC1918 possible internal network
      acl localnet src fc00::/7 # RFC 4193 local private network range
      acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
      acl SSL_ports port 443
      acl Safe_ports port 80 # http
      acl Safe_ports port 21 # ftp
      acl Safe_ports port 443 # https
      acl Safe_ports port 70 # gopher
      acl Safe_ports port 210 # wais
      acl Safe_ports port 1025-65535 # unregistered ports
      acl Safe_ports port 280 # http-mgmt
      acl Safe_ports port 488 # gss-http
      acl Safe_ports port 591 # filemaker
      acl Safe_ports port 777 # multiling http
      acl CONNECT method CONNECT
      # Recommended minimum Access Permission configuration:
      # Only allow cachemgr access from localhost
      http_access allow localhost manager
      http_access deny manager
      # Deny requests to certain unsafe ports
      http_access deny !Safe_ports
      # Deny CONNECT to other than secure SSL ports
      http_access deny CONNECT !SSL_ports
      # We strongly recommend the following be uncommented to protect innocent
      # web applications running on the proxy server who think the only
      # one who can access services on "localhost" is a local user
      #http_access deny to_localhost
      # Example rule allowing access from your local networks.
      # Adapt localnet in the ACL section to list your (internal) IP networks
      # from where browsing should be allowed
      http_access allow localnet
      http_access allow localhost
      # And finally deny all other access to this proxy
      http_access deny all
      # Squid normally listens to port 3128
      #http_port 3128
      http_port 3129
      # Uncomment the line below to enable disk caching - path format is /cygdrive/<full path to cache folder>, i.e.
      #cache_dir aufs /cygdrive/d/squid/cache 3000 16 256
      # Leave coredumps in the first cache dir
      coredump_dir /var/cache/squid
      # Add any of your own refresh_pattern entries above these.
      refresh_pattern ^ftp: 1440 20% 10080
      refresh_pattern ^gopher: 1440 0% 1440
      refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
      refresh_pattern . 0 20% 4320
      max_filedescriptors 3200
  • Example .bash_profile relevant config parts
    • #################
      # On a Windows machine using Cygwin, use this:
      # PROXIES
      export http_proxy="localhost:3128"
      # On a Linux VM from Virtualbox with NAT, use the default Vbox gateway this:
      # PROXIES
      export http_proxy=""
      view raw gistfile1.txt hosted with ❤ by GitHub
  • Example yum/apt/dnf config
    • #CentOS yum proxy setup
      #Edit (as sudo or root) the /etc/yum.conf or /etc/dnf/dnf.conf file to make use of the cntlm proxy from a VirtualBox Linux VM using NAT
      #( is the VirtualBox NAT default gateway)
      proxy=
    • #Ubuntu Apt proxy setup
      #Create (as sudo or root) the /etc/apt/apt.conf file to make use of the cntlm proxy from a VirtualBox Linux VM using NAT
      #( is the VirtualBox NAT default gateway)
      Acquire::http::Proxy "";
      Acquire::ftp::Proxy "";
  • Example wget config
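A minimal ~/.wgetrc pointing wget at the same local cntlm listener; the gateway address assumes the VirtualBox NAT default (, so adjust for your setup:

```
# ~/.wgetrc (inside the Linux VM) — point wget at the cntlm proxy
# reachable on the VirtualBox NAT default gateway
use_proxy = on
http_proxy =
https_proxy =
```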

Multiple Git Identities & SSH

Having multiple Git configurations is inevitable for the busy developer.  It is likely a good thing (you are using version control in multiple settings).  However, setting up your development workstation for multiple Git accounts can be difficult.  Also, if you want to use Git with SSH, you need to be able to differentiate multiple SSH accounts from each other.  Furthermore, you might desire to use multiple Git platforms at once (public SaaS GitHub, GitHub Enterprise, Git-o-lite, or Bitbucket), each with both personal and professional repos and accounts.  There are several concepts you need to understand before getting started with using multiple Git identities with SSH:
  • Git protocols
  • SSH passwordless logins
  • Git remotes
Let’s talk about each of these with respect to what you need to know to get working.

Git Protocols:
Git can use multiple protocols for data transfer communications.  According to the official Git-scm documentation, there are four:  Local, HTTP, SSH, and Git.  Local is only used, as the name implies, for local Git operations on local disk or even NFS or CIFS mounts.  So our options for over-the-network/internet communications are the other three.  Since the bare Git protocol does not offer much in terms of security (no authentication, no encryption), the optimal choices are HTTPS or SSH.  As the Git documentation mentions, there are some drawbacks to HTTPS (server setup, credential caching and storage, etc.), but it is certainly widely used because services such as GitHub make the server-side part simple, leaving only the workstation-side setup to deal with (and for OS X users, the keychain makes that part nice).  Finally, SSH is ubiquitous and provides good security.  When combined with password-less logins, SSH becomes very convenient as well.

SSH Passwordless Logins:
SSH provides great transport security as well as authentication.  Due to its use of public/private key pairs and a configuration file, you can create multiple SSH identities, each of which can use its own public/private key pair.  By exchanging your public key with the remote servers you connect to, combined with a tracking file called known_hosts which records machine network addresses for added security, SSH allows you to safely log in without passwords.  The public/private key pair exchange with the remote host is superior security to text passwords.
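Generating a dedicated keypair per identity looks like this; the file name and comment are hypothetical examples, and the demo writes to a temp dir rather than ~/.ssh:

```shell
# Generate a separate ed25519 keypair for one Git identity.
# (Normally you would write these into ~/.ssh; a temp dir is used here.)
keydir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N "" -f "$keydir/github_personal_id_ed25519" -C "me@personal.example"
ls -1 "$keydir"
```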

Git Remotes:
Git uses the concept of a remote to track the location of the central repository per project.  While the beauty of Git is that the project repos are distributed to all contributors on a project, the best practice is to use a central server to which everyone pushes their changes for easier tracking.  This central source server in Git is called “origin” by convention.
When you clone a repo on a remote Git server, there is a git dotfile directory created with a config file inside.  This config file tracks the remote origin URL used to locate the Git server.  This configuration file can be edited to provide the functionality you need even after you clone a repo.

Configuring Multiple Git Identities with SSH
  • You have already created SSH keys and organized them in a way you like.
    • Note:  I prefer to have different SSH keys for personal versus professional.  I also find using a naming scheme other than the default helpful in identifying which keypairs are for which use.  I realize this might be considered overkill by some but I prefer this and it also helps make the example more illustrative by showing different keys with different accounts.
  • You have already setup GitHub public, GitHub Enterprise, BitBucket, Stash, Git-o-lite, or GitLab accounts and added your desired SSH public keys to them.
  • You are using OS X or Linux.
    • No offense to my Windows buddies here, but things aren’t the same config wise on Windows and I wanted to keep this post simple.
  • For purposes of this example, I will use multiple accounts in GitHub public, GitHub Enterprise, BitBucket, and Git-o-lite.
    • Note:  this SSH config file is abridged to just show the Git relevant portions.
1) Create a SSH Config file for configuring multiple SSH identities.
cd ~/  
touch .ssh/config
2) Open the new config file and edit it as follows (customizing for your accounts/keys/etc):
vim ~/.ssh/config

#NOTE: the Host aliases and key file names below are examples; pick your own.

#SSH Github Personal Public Account
#Maps to a per repo .git/config email setting to work
Host github-personal
HostName
User git
IdentityFile ~/.ssh/github_personal_id_ed25519
IdentitiesOnly yes
#LogLevel DEBUG3

#SSH Github Personal Public Account GISTS
#Maps to a per repo .git/config email setting to work
Host github-gists-personal
HostName gist.github.com
User git
IdentityFile ~/.ssh/github_personal_id_ed25519
IdentitiesOnly yes

#SSH Github Professional Public Account
#Maps to a per repo .git/config email setting to work
Host github-professional
HostName
User git
IdentityFile ~/.ssh/github_professional_id_ed25519
IdentitiesOnly yes
#LogLevel DEBUG3

#SSH Github Professional Public Account GISTS
#Maps to a per repo .git/config email setting to work
Host github-gists-professional
HostName gist.github.com
User git
IdentityFile ~/.ssh/github_professional_id_ed25519
IdentitiesOnly yes

#SSH Bitbucket Personal Public Account
#Maps to a per repo .git/config email setting to work
Host bitbucket-personal
HostName
User git
IdentityFile ~/.ssh/bitbucket_personal_id_ed25519
IdentitiesOnly yes
#LogLevel DEBUG3

#SSH Github Enterprise Account
#Maps to a per repo .git/config email setting to work
Host github-enterprise
#HostName below is a placeholder for your GHE server
HostName github.mycorp.com
User myldapuserid
IdentityFile ~/.ssh/github_enterprise_id_ed25519
IdentitiesOnly yes
#LogLevel DEBUG3

#SSH Private Git-o-lite
#Maps to a per repo .git/config email setting to work
Host gitolite
#HostName below is a placeholder for your gitolite server
HostName gitolite.mycorp.com
User gitolite
IdentityFile ~/.ssh/gitolite_id_ed25519
IdentitiesOnly yes

3) Finally, you will need to modify the .git/config file in each repo you have cloned (and will clone) so that the URL in the remote “origin” section uses the name you specified in the Host line of the .ssh/config file.
This is because SSH matches the hostname portion of the URL against the Host entries in your SSH config to pick the right key.  Think of this as kind of a DNS lookup (probably a poor analogy) to locate the SSH host entry.
Here are some examples mapping to the SSH config file above.  This is especially helpful when you have multiple user accounts on the same Git provider platform (such as GitHub) and need to differentiate user accounts (and associated SSH keys) when the hostname is the same (, etc.).
cat ~/git-repos/github-public/personal/project1/.git/config

--- snipped ---

[remote "origin"]
    url = git@github-personal:myuser/project1.git
    fetch = +refs/heads/*:refs/remotes/origin/*

--- snipped ---

cat ~/git-repos/github-public/professional/project2/.git/config

--- snipped ---

[remote "origin"]
    url = git@github-professional:myworkuser/project2.git
    fetch = +refs/heads/*:refs/remotes/origin/*

--- snipped ---

cat ~/git-repos/bitbucket/personal/project3/.git/config

--- snipped ---

[remote "origin"]
    url = git@bitbucket-personal:myuser/project3.git
    fetch = +refs/heads/*:refs/remotes/origin/*

--- snipped ---

cat ~/git-repos/github-enterprise/professional/project4/.git/config

--- snipped ---

[remote "origin"]
    url = github-enterprise:myteam/project4.git
    fetch = +refs/heads/*:refs/remotes/origin/*

--- snipped ---

cat ~/git-repos/git-o-lite/professional/project5/.git/config

--- snipped ---

[remote "origin"]
    url = gitolite:project5.git
    fetch = +refs/heads/*:refs/remotes/origin/*

--- snipped ---

Closing thoughts:
The need to edit your .git/config files on new or cloned repos when the remote origin path needs modification to match your associated SSH config is a little annoying.
I am sure there is a way to automate this to match the paths setup in the .ssh/config file, but I have not embarked to do this just yet.
The problem is you would need to define what SSH identity you want to use by name and then do a post-git-clone “hook” of sorts.  Git clone post hooks don’t exist, so you are left with using some form of git template combined with a script which updates the remote origin URL with the SSH Host identity path you desire.  I suppose you could do a sed operation via a bash script executed from some git init customizations combined with a git clone --template, but remembering all these setup steps made me just opt for using the simple git command line manually after I clone.
(Note: for all existing cloned git repos you have, you’ll need to go into each one and update the remote origin URL in the .git/config file to take advantage of any custom SSH host name path you are using).
For now, you can use the following command to update the per repo .git/config remote origin URL path:
git remote -v

git remote set-url origin git@<host-alias>:<user>/<repo>.git

git remote -v

Note that in the SSH config file examples I provided, there is a line saying #LogLevel DEBUG3.  This is a great way to see live SSH authentication debugging information as you try to authenticate to each remote Git server.  I left them commented out so I could turn them on if I ever needed to troubleshoot a connection.  As you learn how to use these SSH identity approaches to multiple Git identities, you’ll find this debug setting super helpful.


Chef Vault with Large Teams

Chef-vault is a tool created by Nordstrom and adopted by Chef as the de facto way to handle secrets management using the Chef platform.  Chef-vault builds on the original Chef encrypted data bags concept.  Rather than a single shared decryption key, chef-vault creates a separate copy of the data bag item for each node that is granted access, using the existing RSA public/private key pair normally used for Chef API authentication.  According to Noah Kantrowitz, who sums it up nicely, “This means you no longer have to worry about distributing the decryption keys, but it also moves chef-vault into the gray area between online and offline storage systems.  Once a server is granted access to a secret, it can continue to access it in an online manner.  However granting or revoking new servers requires human interaction.  This means chef-vault is incompatible with cloud-based auto-scaling or self-healing systems where nodes come/go frequently.  It also inherits the same issues with lack of audit logging as all other data-bag driven approaches”.
Summary of Chef-Vault Functionality:
  • Uses encrypted data_bags, but adds layer of mgmt on top
  • Creates separate copy of encrypted data_bag item for each node granted access
    • No single shared decryption key
  • Uses the existing public/private key pair for the Chef API authentication (node public key)
  • Handles distribution of keys for you using Chef
  • Granting/revoking servers access is a separate human task
    • Cloud-based systems which build and destroy nodes require that vault access be updated each time node status changes
    • Requires using the chef-vault local knife commands
How can teams of developers manage secrets in a common manner using chef-vault?
Of course you could say that each individual team has designated individual(s) with knife access to Chef servers, and they use knife vault to get the secrets up to Chef and call it a day.  However, this is not always as cut and dried as one might hope.  For example, in my company we chose years ago to tightly restrict knife write access to Chef servers to a CI/CD server via a pipeline.  In other words, there are SDLC process limits and security process limits put in place to limit the risks of “wild west knife access”.  Another reason is that members of teams in larger orgs come and go, and often you need systems where things are well documented and stored in ways that others can pick up and work where another left off.
Due to the need to have a consistent process as well as limit knife access to the pipeline, we needed a mechanism to get secrets into Vault without funneling them all to a central team every time a new secret is added or removed.  Because Chef Vault uses data_bags which are predominantly stored in version control, the seductive answer is to just throw them on your private version control system like GitHub Enterprise.  However, the moment you do that, your secrets are no longer secret.  What was needed was a way to encrypt secrets in GitHub in a seamless manner such that a CI/CD pipeline server can ingest them for the express purpose of using knife vault to get them securely to Chef.
The design solution was to use a combination of technologies:  git-crypt + GPG + github + Jenkins + Chef vault.  Let’s dig into the design details.
Solution Summary:
  • Create a team based “GPG WoT” (web of trust)
  • Limit exposure of the public key to the folks that need it (dev team, Chef admins, CI server admin) in a localized web of trust.
  • Use git-crypt + the GPG key to seamlessly encrypt your desired secrets for storage in the data_bags/chef-vault git repo.
  • Allow a private CI/CD server instance (like Jenkins) to have your team GPG public key so it can grab your encrypted data_bags/chef-vault secrets and decrypt them
  • Have the CI/CD server instance then run a job which takes the unencrypted data_bags/chef-vault secrets and then uses knife vault to secure them again on the Chef server.

MyOrg Web of Trust (WoT) Using GPG

In order to sanely place secrets like passwords, certificates, and keys in a DVCS like GitHub with only authorized people having access, we need to use GPG keys.  The design is such that the authorized “web of trust” user accounts, like an Infrastructure CI/CD server ldap user, Infrastructure Chef admins, or specific Security Engineering team members, get a copy of a given team’s public GPG key.  An authorized user account will then be able to grab the data and decrypt it using the GPG keys.  Depending on your Web of Trust preferences, you may desire to only allow access to your GPG public key to a smaller group.
Essentially, the design is such that each team can have a designated email address or name as an identifier which is used with the GPG public key.  This public key is then sent to ONLY those in your Org Web of Trust (WoT) that are authorized to get the public keys.  In this case we are not actually freely giving out team public GPG keys, but selectively choosing who can get them.  I realize that the beauty of the GPG key in the public internet use case is to make the public key widely available.  Here, we are using it selectively within a given org for purposes of limiting who has access to decrypt.  You are taking some steps to protect your public keys within a team for purposes of getting the shared secrets into a data_bag store (usually Git) securely.  The trick here is to have the data_bag items encrypted in the version control system like a private Github repository such as Github enterprise using GPG keys + Git-Crypt.
The process of getting them into the Chef server as a knife vault encrypted data_bag item then can be handled by a secured job on a CI server such as a private Jenkins instance.
The CI/CD server job must do 2 things:
  1. Grab the GPG encrypted data_bag items from version control (it has the GPG public keys for each team).
  2. Decrypt the GPG encrypted data_bag item and then use knife vault to upload and encrypt using normal Chef Vault processes.
This job workflow allows a distributed team to centrally store secrets in a way that a central job can grab it, decrypt it, and then re-encrypt it using native Chef vault processes.  Of course, when people leave the team, it behooves you ideally to generate a new team based GPG key and then re-encrypt your team secrets using the new GPG key and ensure the CI/CD server job gets the new key for its purposes.  This design also assumes you own a private DVCS system like Github enterprise or a private Github account.  I don’t recommend implementing this design for any open public internet projects, on a non-private Github or bitbucket for example, where mistakes in your workflow and WoT could potentially expose your secrets.  This is a weakness of the design in terms of security and management, but a fair trade off from the alternative of non-standard secrets management among large teams before they get it into chef vault.


  • CI Server & Admins:
    • Working Chef11/12 workstation environment (chef, knife, etc)
      • Chef DK
      • This includes you being a valid Chef user
      • Knife is best reserved for very few people.  Let a CI/CD pipeline do knife tasks.
    • Installed chef-vault gem on your workstation (or just use ChefDK)
      • Recommended to use RVM or rbenv to avoid installing gems in your system ruby or just use ChefDK
  • Org Team members & CI Server & Admins:

Workflow and Dev setup

  • Install git-crypt on your local machine in order to encrypt your data bags.
  • Generate and share your team GPG pub key with the central Chef Admins for purposes of getting it to the private CI/CD pipeline server(s).
    • Export your pub key using:
      gpg --export -a "User Name" > public.key
    • This will create a file called public.key with the ASCII representation of the public key for User Name (using an LDAP userid or email address is useful in large orgs)
  • Configure git-crypt for your repo
    • Fork a copy of the chef-repo/myChef repo to your personal GitHub repo (this is the repo where you store data_bags).
    • Clone your forked copy of the myChef repo to your local workstation.
    • Navigate to the data_bags directory and then to the chef-vault data_bag directory, then create a directory for your app tied to the role name
cd /chefRepo/data_bags/chef-vault/
mkdir myApproleProd
  • Create a .gitattributes file to tell git-crypt what files to encrypt in a repo directory
cat > .gitattributes << EOL
secretfile filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
EOL
  • Initialize the repo directory for git-crypt
    git-crypt init
  • Add your GPG key to git-crypt
    git-crypt add-gpg-user USER_ID
Workflow Summary:  Common Chef Repo Data_bags or Chef 12 Organizations
  • After Chef Admin team adds your team public key, follow the steps below to add a new secret to the chef-vault store within the data_bags root directory in Git.
  1. cd /chefRepo/data_bags/chef-vault
  2. git pull origin master
  3. git-crypt unlock
  4. Add your new data bag item (make sure it ends with .key and the id matches the file name; steps listed below)
    • A vault data bag needs to be checked in a specific directory structure standard (see below for details).
  5. git commit -am "adding encrypted chef-vault item"
  6. git push
  7. Submit pull request to accept additions to the repo
  8. Wait for the Chef CI/CD pipeline server job to:
    • Ingest the new items via the CI server job
    • Decrypt using your team GPG key
    • Use knife vault to re-encrypt them with chef-vault on the given Chef server
Chef-Vault Directory Design Standards
The data_bags directory structures allow data_bags that contain data_bag items.  This can exist in a sane root directory structure such that we can organize things in a Github repo in a way that is visually appealing.  Once the data_bags and their items are ingested by Chef to SOLR, the root directory structure in GitHub is not present.  Instead there are data_bags (keys) and data_bag items (values).
See details on how Chef views the data_bag directory structure.  Per the Chef docs, “A data bag is a container of related data bag items, where each individual data bag item is a JSON file.  knife can load a data bag item by specifying the name of the data bag to which the item belongs and then the filename of the data bag item.”
Essentially, the form is:
data_bag directory
     data_bag_item json file
You can choose to name the JSON data_bag_item in Chef vault with a .key extension, but the content should have an “id” and “key” section:
{
  "id": "my_secret_item_name",
  "key": "value"
}
An example directory structure is below:
The bag_name should be updated based on your application role.
├── data_bags
│   ├── chef-vault
│   │   └── bag_name_matching_role
│   │      └── fooapp-secrets.key
│   │      └── barapp-secrets.key
  • We (the CI/CD job) expect the bag_name to be added under the chef-vault directory
  • We (the CI/CD job) also expect the bag_name to match the role name.
    • The vault data_bag is created and made available only for nodes with the corresponding chef role.
      • e.g: If you expect a vault item to be applied to a node with a chef role of myAppenvProd
      • The structure would look like this:
├── data_bags
│   ├── chef-vault
│   │   └── myApproleProd
│   │      └── myApp-secrets.key
  • For the item to be encrypted the data_bag_item should end with file extension “.key” and the contents should be in JSON format.
    • This is a design preference (.key), but the JSON format is a Chef data_bag requirement (see the Chef data_bag docs).

Design for sharing encrypted Items in a DVCS for Chef using chef-vault



Here are some common use cases of knife vault that the CI/CD server job logic can execute.
Chef-vault commands used by the CI Server automation (Examples)
Create Vault Item
knife vault create fooapp fooapp-secrets \
  -J data_bags/chef-vault/apps/fooapp/fooapp-secrets.json \
  -A "adminuser1,adminuser2" -S "role:fooapp-server"
Show Vault Item
knife vault show fooapp fooapp-secrets -F json
Delete Vault Item
knife vault delete fooapp fooapp-secrets
Delete old node
knife vault update fooapp fooapp-secrets \
  -S "role:fooapp-server"
Update list of Admins
First see who has access:
knife search fooapp 'id:fooapp-secrets' -a clients
Next change the membership:
knife vault update fooapp fooapp-secrets \
  -J data_bags/chef-vault/apps/fooapp/fooapp-secrets.json \
  -A "adminuser3,adminuser2" -S "role:fooapp-server"

Update/Rotate/Refresh Keys/secrets
Rotate keys for a single vault item:

knife vault rotate fooapp fooapp-secrets

Rotate all keys for all vault items:

knife vault rotate all keys


Containers Don’t Really Boot

Docker has been a great advancement for mass consumption of Linux-based containers.  The maturation of the virtual machine boom that has been happening since the early 2000s led to mass acceptance and deployment in public and private clouds.  To be sure, asking for bare metal today can be seen as a faux pas without some well-defined use case (like super high IO).  So, now that folks have accepted that slices of CPU, memory, and disk are good enough through well-known hypervisors (kvm, esxi, xen) for most workloads, taking the next step to containers is not that big of a leap.  Except that now it is more common to run containers on VMs than on bare metal, so we get a slice of a slice of a slice of resources!
Virtual machines are just what their name implies:  full machines that are virtualized.  This means they have virtual hardware that virtually boots an OS kernel and mounts a filesystem.  The OS doesn’t really know that the hardware it is running on is not real.  Containers, on the other hand, are not virtual machines.  Containers are fully sandboxed processes using the host machine OS kernel.  So, when running on a VM, they are slices of VM vCPU, memory, and disk for fully sandboxed processes.  This is the part that had me perplexed for a while until I ventured to understand exactly what happens when an LXC container starts versus when a virtual machine boots.

Boot or Start?

Let’s compare boots of CentOS Linux on virtual machines versus containers:
Virtual Machine/Bare Metal:
  • Power on, the system BIOS loads and looks for sector 0, cylinder 0 of the boot drive (Typically /dev/hda, or /dev/sda)
  • The boot drive contains the MBR which then uses a boot loader such as GRUB (typically in /boot) which locates the kernel and loads it (based on GRUB config)
  • The kernel (vmlinuz) then uncompresses itself into memory
  • Load the temporary RO root filesystem via initramfs (configured in GRUB)
  • The Kernel locates and launches the /init program from within initramfs (/sbin/init)
  • Init determines run level via /etc/inittab and executes startup scripts
  • Per fstab entry, root filesystem completes integrity check and then is re-mounted as RW.
  • You get a shell via /bin/sh
Docker Container:
  • Docker tells LXC (now libcontainer) to start a new container using the config in your Dockerfile
    sudo docker run -i -t centos7 /bin/bash
    • Runs lxc-create or the libcontainer equivalent with params (example)
      lxc-create -t mytemplate -f lxc.conf -n mycontainer
  • Docker on rhel/centos/fedora systems uses device mapper, which uses a sparse file for holding container images
  • Docker starts the base image (directory structure) as read only, and creates the new RW (CoW) layer on top of it
  • The Docker-configured union filesystem (AUFS, devicemapper, overlayfs) is used for the mounted root filesystem
  • Docker gives you a shell via /bin/sh (if you asked for it in the docker run command or a Dockerfile config)

“It is perhaps more precise to say that a linux environment is started rather than booted with containers.”

The entire linux “boot” process that a normal virtual machine goes through is essentially skipped, and only the last steps where the root filesystem is loaded and a shell is launched happen.  It is perhaps more precise to say that a linux environment is “started rather than booted”.  I was also further confused by the Docker terminology which uses the word “image” to describe something different from cloud images.  When I hear “image” I think of AMI style full virtual machine images as used in clouds.  These images are different from container images used by Docker.  Docker uses the term “image” to describe what is really a minimal layered root filesystem (union mounted).  This is all a bit confusing at first until you remember that everything in Linux is a file.  If you dig around and take a look at some of the utilities to create these “base images” such as febootstrap/supermin or debootstrap you will see that they are creating clean, consistent directories and files for the linux distro output in various formats such as .img or .tar.  So, the docker “images” are really nothing more than directories and files pre-populated with the minimum viable set of linux components and utilities you need for a functional linux system.

“This is all a bit confusing at first until you remember that everything in Linux is a file.”
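To make the “everything is a file” point concrete, here is a toy sketch of what a “base image” really is: a directory tree packed into a tarball. The names (toy-linux, toy-image.tar) are made up for illustration; tools like debootstrap produce the same shape at distro scale.

```shell
# Work in a temp dir so the sketch is self-contained and disposable.
workdir="$(mktemp -d)"

# A "base image" skeleton: just directories and files.
mkdir -p "$workdir/rootfs/bin" "$workdir/rootfs/etc" \
         "$workdir/rootfs/usr/bin" "$workdir/rootfs/var"
echo 'NAME=toy-linux' > "$workdir/rootfs/etc/os-release"

# Pack the tree into a tarball -- structurally, that's the whole image.
tar -C "$workdir/rootfs" -cf "$workdir/toy-image.tar" .

# List the "image" contents: nothing but the directories and files above.
tar -tf "$workdir/toy-image.tar"
```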

When Docker LXC/libcontainer based containers boot they are really just starting a kind of super “chroot” of processes with totally sandboxed storage and networking.  They don’t need to do a full boot since they are just leveraging the OS kernel of the host system.  All they seem to need are the minimum viable linux system directory structure, tools, and utilities.  Additionally, because Docker caches content, things run even faster since there is less to download.  These are reasons why containers “boot”, or more precisely “start”, incredibly fast.  Because you don’t have to go through a fully virtualized system boot process like a virtual machine or bare metal machine, you get productive “process-wise” rapidly in your own super sandboxed linux environment.

Union File Systems and the Neo Image Zeitgeist

One cool thing Docker introduced is the use of union mount layered file systems to control the craziness of working with images.  When I say image “craziness” I might need to clarify with a refresher for those youngsters who didn’t go through the whole “anti-image” zeitgeist of the past 5 years.  Let’s rewind to the early 2000’s when people discovered the ability to create sector by sector disk copies and saved all the time of installing apps over and over again (Ghost anyone?).  Everyone was in love with images then and naturally started using them in VMs when vmware was new and hot.  It was only after years of dealing with homegrown image versions and patching problems that folks started becoming more and more anti-image.  To be sure, many people made many horrible images (especially for Windows machines) that didn’t properly get sanitized (with sysprep or similar tools) before use as a VM template which only served to exacerbate things.

Fast forward to 2005-ish when CM tools like Puppet and later Chef in 2008 were formed in the “anti-image” rebellion.  What people wanted in these modern CM tools was the ability to repeatedly build pristine machines literally from bootstrap.  What this meant to many was no images ever: PXE the world and then Chef it.  As the public cloud took off so did people’s needs to build servers at an increasingly rapid pace. PXE bootstrapping everything was just too slow and often not possible in multi-tenant cloud environments like AWS.  The compromise answer was to create super minimal “base images” (also called JEOS or Stem Cell images) which were super pristine and small.  These base images for full virtual machines booted much faster, and the fact that they had very little on them didn’t matter anymore since we could reliably and cleanly install everything declaratively in code using Puppet or Chef.

Fast forward to today and folks often find that full VMs booted and installed with cookbooks are again not fast enough for them.  Also, the complexity of using some CM tools meant that it was a fair amount of work to stand up your Puppet or Chef environments before they paid off in speed and as a single source of truth.  Enter containers.  However, just getting a container only gives you a pristine base image if you start out with one.  Beyond any pristine base container image, any customizations you might need (like installing your app software components) would require you to get back to the days of image sprawl unless you used modern CM like Puppet or Chef to provision on top of your container base images.  Docker decided to fix this old problem with a new twist by using a union mount for layered or copy on write filesystems.  What they did was take the concept of the pristine base image (which we’ve learned is nothing more than minimum viable linux distro directories and files with components and utilities) and allow you to leave it in a pristine shape.  They then allow you to layer components on top of each other that you need, leaving each layer as a read-only thin delta of changes.  The automation (infra as code) offered by Docker is via the Dockerfile where machine DNA is located.  What is still yet to be determined is whether the Dockerfile is enough to get your container in the end state you desire.  For example, will layered “super thin delta images” be a replacement for modern CM tools?  Or, more likely, will CM tools like Chef be used to build each thin layer delta image (TLDI)?
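As a concrete picture of those thin read-only deltas, consider a hypothetical Dockerfile (the fooapp/ directory is made up for illustration). Each instruction produces one layer containing only the files that instruction changed, stacked on top of the pristine base:

```dockerfile
# Pristine base layer: a minimal read-only root filesystem.
FROM centos:7

# Thin delta layer: only the files this install adds.
RUN yum -y install httpd

# Another thin delta: just the application files (fooapp/ is hypothetical).
COPY fooapp/ /var/www/html/

# Metadata only; no filesystem delta.
CMD ["httpd", "-DFOREGROUND"]
```

Rebuilding after changing only fooapp/ reuses the cached base and httpd layers, which is exactly the speed win the union mount design is after.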

Thoughts on Losing Weight

Rather than buy a bunch of books and other BS, let me drop some science on you straight from Silicon Valley on weight loss.

Note:  This is anecdotal science straight from a sample of 1 at its best.

  1. Eat less food
    • (cut overall intake at least 30%)
  2. Eat better food
    • (cut anything with sugar or that metabolizes to sugar: sodas, breads, pasta, fast food)
  3. Exercise at least 5 days a week
    • (so sorry to bust your bubble, but this is key)
  4. Get more sleep
    • (yep, really… like 7+ hours min)
  5. Drink a buttload of water
    • (scientific amount to be sure)
  6. Reduce your stress
    • (I’m so effing serious here. The exercise and sleep does help some here)
  7. If you got this far, then you should drop like 10lbs every 2 months.

Now, that you have the harsh truth, get out there and get it done.

Note to self:  Please follow your own advice 🙂

Sources:  Dr. Padge

Path it your way

I really like the cool products the team at HashiCorp cook up.  Unless you have been frozen in carbonite for the past 3+ years, you’ve likely used or heard about Vagrant.  What a wonderful tool that lets you rapidly build and test on VMs and containers on VMs using Virtualbox.  As if that wasn’t awesome enough, this past year Mitchell and his team have introduced multiple awesome projects including terraform, consul, serf, and packer.  I wanted to get my hands on these tools and use them locally on my Mac.  As with Vagrant, each of these products is packaged into a nice downloadable binary which you can then extract to the directory of your choice for use.  However, after reading the documentation carefully, you’ll find that setting your path to these executables is a key part of your environment setup.

Setting path variables is not a big deal at all.  However, I wanted all my HashiCorp products executables to live in a specific directory configuration.

mkdir -p $HOME/hashicorp/{consul,packer,serf,terraform}

$ tree $HOME/hashicorp -d
├── consul
├── packer
├── serf
└── terraform

Since there are multiple projects from the same company to be used, I wanted each one in its own subdirectory.  Rather than list each HashiCorp product in my .bash_profile PATH statements, and to prevent other things from getting broken in the process of editing the paths (RVM, etc), I decided to dig around on our good ole friend StackExchange and found some good tips for doing paths for multiple subdirectories.

So, I did the following in .bash_profile to have the shell iterate over each subdirectory and add to the $PATH export.

$ vim .bash_profile
for p in $HOME/hashicorp/*/; do
  PATH="$PATH:${p%/}"
done
export PATH
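Since .bash_profile edits are tedious to verify, here is a self-contained sketch of the same idea that builds the directory layout in a temp dir and appends each subdirectory to PATH; in real use the root would be $HOME/hashicorp.

```shell
# Build a throwaway copy of the directory layout.
root="$(mktemp -d)/hashicorp"
mkdir -p "$root"/consul "$root"/packer "$root"/serf "$root"/terraform

# Append each subdirectory to PATH; ${p%/} strips the trailing slash
# that the */ glob leaves on each match.
for p in "$root"/*/; do
  PATH="$PATH:${p%/}"
done
export PATH

# Show the entries that were added.
printf '%s\n' "$PATH" | tr ':' '\n' | grep hashicorp
```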

Note: Adding a whole tree of directories to PATH is likely a security anti-pattern, since any malicious code that landed in this directory structure would be picked up automatically; I accept that risk because I am setting up a specific path root for only the tools from HashiCorp that I intend to drop into it. Again, this structure is just a personal preference to keep everything in a single root directory with release versions. I could have chosen to do symlinks or symlink tools like stow, but I just wanted something quick and functional with the directory structure I desired.

Now, the next awesome product that the folks at HashiCorp release that I want to use, I can simply drop it into the subdirectory of my choice with the versioning directory I like, and I’m good to go.

Generation wuss article…

This article from Bret Easton Ellis on Millennials is nothing short of brilliant and spot on.  Check it out.

« Generation Wuss » by Bret Easton Ellis

For Whom the Batphone Calls

A Runbook for Enterprise Adoption of Open Source Software 


Enterprise leaders wanting to do a “Dev and Ops” pivot need to own up to the fact that you have to earn your keep today when choosing this path.
If shit is broke, YOU should have the power to fix it.  This idea of outsourcing all problems to a paid closed source vendor support call can be naive and lazy in many cases.  You should have the power to be your own bat phone.  When you call Batman, you are in essence calling yourself.

Retirement Home Enterprise          

For the longest time, infrastructure and application support teams have operated in a super old workflow that was uninspiring.  They got paid well to be effectively “glorified maintenance/janitorial staff.”  An operations spreadsheet/app had a name on it that says they support X and Y.  When X or Y breaks, they put a ticket in a queue and call people.  These people look at some super basic stuff like 1) Is the server up and pingable?, 2) Is the app process running?, or 3) Is there a vague, nefarious network issue?  If none of these things get hits, they shrug their shoulders and open a ticket with the vendor that they pay for support, and then wait.  They then report to some person that reports to some other ops person that they are “working on the issue” with the vendor.

The vendor then has a support engineer (L1/L2) work the call, ask for sane things like logs, and read the case notes.  The enterprise support person just acts like a drone in this case, typing whatever the vendor support engineer says and accepting whatever the vendor provides as an answer.  Unless an answer cannot be found, the enterprise support person just accepts the answer/fix and closes the case.  If no fix can be found, it is effectively like reporting a bug (if it is not already well known).  The fix for the bug may come soon or take indefinitely long.  Unless the enterprise customer is huge (spends loads on support & products) the bug fix doesn’t get addressed fast.  It gets addressed on the schedule most convenient to the vendor.  These support contracts are not cheap.  This is such a lazy, lame approach.  I call this the “Batphone Mentality” where the vendor software support superhero is always a support phone call away.  This is especially true for closed source COTS apps whose heavy-handed approach actually creates a self-reinforcing cycle of enterprise retirement home “Batphone” mentalities.  The problem is that Batman worked for free for Gotham, while enterprises have to pay salaries for multiple external vendor Batmans.

“closed source COTS apps … heavy handed approach actually creates a self-reinforcing cycle of enterprise retirement home “Batphone” mentalities.” 

The Emerging Open-Source Enterprise 

When you decide that you want to control your destiny, get things fixed faster, or add features and functionality to applications and tools you need to run your business, you may be ready to make the open-source switch.  If paying loads of money for support contracts is leaving you beholden to a vendor that is not transparent and open about product bug fixes or feature roadmaps, you are effectively trapped.  Major commercial software vendors with closed source products maintain a kind of hegemony over enterprise customers in the following ways: 

  • Forcing the customer to follow often stringent tech-stack requirements to install and use the product.
  • Forcing the customer to officially forgo support if they deviate from any part of the tech-stack.
  • Making money off parts of this “required” tech-stack with onerous licensing.
  • Releasing new versions that require forklift upgrades that are uber costly and are often not backward compatible.
  • Purposely not being open with support documentation and requiring pay/account gateways to get it.
  • Forcing a god forsaken license key or licensing process that makes installs painful and limited.
  • Dropping a product completely leaving the customer with no option. 

“Major commercial software vendors with closed source products maintain a kind of hegemony over enterprise customers.”

For years enterprises just accepted this in large part due to the following reasons:

  • They used closed source OS platforms (Windows/AIX/Mainframes) since viable open-source alternatives were not available.
  • They bought software that was closed source because developing the software was hard and costly.
  • Tools needed to write software for these closed-source platforms were not easily accessible or free ($1000+ for Visual Studio is pathetic).
  • Technology perhaps was not a key part of their business (non-tech companies).
  • Staff that had requisite skills to overcome (1-4) were not widely available. 

Enter the modern tech era circa 2007 – 2014 where open-source software is almost an American birthright and as normal and accepted as apple pie.  Open source operating systems, applications, and software development tools are widely available that can solve many (if not most) of the things an enterprise might need (Linux, OpenStack, Nagios, Hadoop, Tomcat, Apache, Rails, NodeJS, and on and on).  Public code sharing repositories like Github, Gitlab, Gitorious, Cloudforge, Sourceforge, and GoogleCode have been a revolution in open-source workflow and project accessibility.  Github in particular has been a big part of this inflection point in social coding.  There has likely been more open-source momentum in the past 7 years than in the past 30.  It is truly an amazing time. Decades from now, folks may look back and ask all us old-timers starry-eyed questions about “what was it like to live during the second open-source revolution?” like we lived through the equivalent of the roaring twenties or some other historically significant time.

“There has likely been more open-source momentum in the past 7 years than in that past 30.” 

While the open-source movement may have not started precisely in the San Francisco bay area, the area has been home to many leaders in the movement.  Partly perhaps due to the roots (even today) of silicon valley as a haven for free thinkers and futurists.  A place also where it was ok to be distrustful of governments and “the man.”  The bay area hippie culture of the 60’s may have also contributed in part to a continuing tradition of folks wanting things to be free or more accessible.  Today, there is still a vein of this benevolent revolutionary mindset behind every open-source project.  The interesting thing that has happened is that the hippies have been replaced by benevolent libertarian capitalists.  This is very cool.  Many startups in the valley today often release their product day one as an open-source project.  They simultaneously offer paid support for customers.  In many cases, their products are based on open-source products but sold also as SaaS or cloud offerings.  This is an amazing amount of new freedom for the enterprise.

“…the hippies have been replaced by benevolent libertarian capitalists.” 

Be your own Batphone 

If you want to control your destiny, get things fixed faster, or add features and functionality you need on your own schedule, you need to be using open-source software and hiring staff that are not drones or retirees.  Here is a proposal for how the modern enterprise can call their own Batphone.

1) Use open-source software
     – get access to the source code now (there is a boatload out there)
     – check lists of open-source alternatives like this
2) Hire good developers for Apps and good developers for Infra
     – Pay them well
     – Empower them
3) Be very strategic about what commercial, closed source software you buy and use
     – It needs to provide a better ROI for functionality, speed, reliability than an OSS alternative.
     – Try to pick closed source software that at least is largely standards based (it can ease a transition off the product later if needed).
     – Put pressure on the vendor to allow flexibility of the tech stack to use open-source projects (tomcat, node, etc).
4) Train your staff and engineers well
5) Accept the fact that you can now truly control your destiny.

On this last point, it is a function of having skilled developers and engineers that use open-source projects and have a specific mentality.  The mentality needed is really an “Anti-Batphone” mentality.  Developers and engineers have to own up to the fact that they are the last person standing.  They are the last soldier in the bunker and have to be MacGyver or Wil Wheaton to get it working.  They have to believe there is no fucking Batphone.  They are the Batphone.

A Sample Enterprise anti-Batphone Support Call Flow

1) Some important app is broke
2) Open-source monitoring software alerts the app owner/developer team 
     – Operations sees that same alert and opens a ticket
3) Internal support engineer starts working on it
     – logs, processes, system level debugging
4) Something is broken with the code (a bug)
     – Look at the damn source code, you have it!
5) Commit code changes with the fix to your CI/CD pipeline.
     – Push to prod for the fix
     – close the internal support case.

Paying for Support with an Open Source Vendor 

Paying for support with a vendor that has an open-source version of its product is truly a win-win.
You get the luxury of official support.
The open-source vendor gets to make money and you still get the source code.
What if you cannot fix the bug with your internal staff?  What if you need to pay for support?
No problem, do the following: 

1) Locate a vendor that provides the product platform you need as open-source + support
2) Pay these good people for support.
3) Use their product and do the following for support issues
          – Open issues on Github for problems and bugs
          – Fix the bugs yourself and submit pull requests directly to the vendor github
4) Influence product roadmaps by submitting pull requests and collaborating with the vendor:
          – this can be from your own internal staff if they are skilled enough.
          – this can be through skilled contractors that you pay to develop modifications to submit as pull requests to the vendor.
          – this can be through outright paying the vendor for adding the features directly.
5) If for whatever reason the open-source vendor decides to stop working on the project, closes shop, or will not accept your pull requests for product modifications, you still own the source code.
          – you can take over the project
          – you can fork the project and do your own thing

This is an awesome new world with loads of new freedom.  Enterprises need to embrace this and control their destiny by using open-source software and using open-source vendors.  You should have the power to be your own bat phone.  When you call Batman, you are in essence calling yourself. 

Being an Apprentice is better than a Scrum Master (Tools do not make thou work with Agility)

Buzzwords. They seem to make their way into the vernacular via the process of regurgitation from folks farther and farther removed from the source. They start out as simple ways for tech folks to quickly communicate a concept to other tech folk and eventually become shorthand for explaining something to the business. The process of using analogy to describe something is not a bad thing, but when a shorthand term itself creates a cottage industry, one might want to dig deeper. In particular, let’s talk about the now-hackneyed term “Agile.” With the growing drumbeat of organizations trying to squeeze more efficiency out of teams, they have turned to the “magical pixie dust” delivered through the now buzzword of Agile. Heck, even people that helped create the Agile Manifesto are now having to respond to it (see Dave Thomas’ post here). All these certified scrum-master courses that are popping up like weeds in a bed of mulch after a rain are perhaps the tipping point for the overuse of a shorthand term.
Look, the guys that put together the Agile manifesto were just trying to cut through all the bullshit that happens in traditional corporate spaces to just ship some decent software fast. Taking simple concepts and creating a certification (Scrum Master) for it seems counterintuitive to me since, if it were actually simple, then why would there be a need for certification rigor?  It makes more sense to me to have a mindset of being an apprentice where you are constantly reminding yourself how to work efficiently by…

  • Valuing Individuals and interactions over processes and tools
  • Preferring Working software over comprehensive documentation
  • Valuing Customer collaboration over contract negotiation
  • Responding to change over following a plan

The reason the guys behind the Agile Manifesto came up with these simple terms was precisely because all the damn project managers at large organizations were so locked into doing massive “waterfall” style project plans and Gantt charts, and dedicating people to technical writing in bloated 6MB Word documents, that software developers just couldn’t work quickly and without friction. Dave Thomas made the point that people should stop using the term Agile as a noun, and go back to using it as an action statement more akin to an adjective.
Dave says the following brilliant statement: “It’s easy to tack the word “agile” onto just about anything. Agility is harder to misappropriate.” So for him you aren’t an agile programmer – you are a programmer that programs with AGILITY. You don’t work on an agile team, you work on a team that exhibits AGILITY. You don’t use “agile tools”, you use tools that enhance your AGILITY.

Since I myself have been trying to make people on my team act and work with more “agility”, I try to keep myself tuned into ideas and people that “get it.”  So when someone tries to help convert traditional IT project managers and technical managers into working with more “agility”, it is worth taking note. Kamal Manglani’s book is trying to do just that. He is trying to get people to work with more agility. Having worked with Kamal in the past, I can tell you that he does get it. Kamal starts shaking his head when standups are taking too long and people turn them into bitchfests or pseudo-spikes. He is known for being a bulldog that will search people out and sit with them at their desk in person to solve a problem, and won’t leave them alone until the problem gets attention or resolution.  He’s the guy that frowns on large bloated Word Docs in favor of a wiki or a readme file in Git. His book at least doesn’t misuse the term agile, it’s called ‘The Apprentice and the Project Manager‘.  So, while the devs hopefully get it, at least someone is trying to get those pesky project managers thinking and acting with more AGILITY, not being “agile”.  If this helps teams ship things faster and gets teams more productive by reducing the number of folks in their path, I’m cool with it.  Since we have to keep reminding ourselves how to do work with agility, perhaps being permanent apprentices is better than a scrum “master” of anything.  Is anyone ever a master?

Thoughts on having an opinion on tech

regurgitate != knowledge 
explore+test = have_an_opinion 
requires_work = 1 
sed -i 's/no_time/make_time/g' /my/schedule