Encrypting data with AWS KMS
Darrell Mozingo | February 9th, 2018

We recently implemented a new personally identifiable information (PII) storage capability in our app at work. Here’s how we handled encrypting the data at rest.

Key Management

Strong encryption algorithms are well known and widely available in most languages & frameworks, so the larger complexity with encryption these days comes down to managing the master keys used by those algorithms. How do you store the keys securely, restrict access to only authorised applications and use cases, deal with the fallout of accidental leaks, rotate them on a regular cadence to add an extra hurdle for attackers, and so on down a long list of similar challenges?

Historically one of the more secure, though expensive, options for key security was a hardware security module (HSM – you may have seen one make a recent appearance on Mr. Robot!). In the cloud world we have CloudHSM and AWS Key Management Service (KMS). KMS is essentially CloudHSM with some extra resiliency handled by AWS, offered at a cheaper price via shared HSM tenancy.

The main idea behind KMS (and its underlying HSM functionality) is that the master key never leaves its system boundary. You send the raw data that you want encrypted to KMS, along with an identifier of which key you want to use, then it does its maths magic and returns some encrypted gibberish. This architecture greatly simplifies what you need to think about with regards to potential attack vectors and securing the master keys. KMS is fully PCI DSS compliant, and AWS have a detailed whitepaper describing the various algorithms and internal audit controls that safeguard master keys, if you want to geek out over it. AWS also offer a best practices guide, which we've followed quite closely throughout this work.
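To make the shape of that API concrete, here's a minimal sketch using boto3 in Python. The key alias is a placeholder rather than our real one, and the real calls in our app live behind more plumbing:

```python
import boto3

kms = boto3.client("kms")

# Encrypt a small secret (e.g. a password) directly against the master key.
# The alias is a placeholder for whatever key you've created.
ciphertext = kms.encrypt(
    KeyId="alias/my-app-pii",
    Plaintext=b"super secret value",
)["CiphertextBlob"]

# Decryption doesn't even need the key id; KMS works it out from the blob.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
```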

Envelope Encryption

The challenge with KMS is that it limits you to encrypting at most 4 KB of data per call, for performance reasons (you don't want to send it a 2 MB file to encrypt!). That's fine if you're encrypting smaller strings like passwords, but for larger amounts of data you have to use a pattern known as envelope encryption. Here's the encryption flow (sketched in code after the list):

  1. Ask KMS for a new data-key, specifying which master key you want to use
  2. Get back both the clear-text data key, and that same key in an encrypted form using your specified master key
  3. Send the clear-text data key and whatever you want to encrypt to your magical encryption algorithm
  4. Get back your fully encrypted data from the maths magic
  5. Bin the clear-text data key, you don’t want it hanging around in-memory or elsewhere!
  6. Save both the encrypted data-key from KMS and your encrypted data to some data store
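A rough sketch of that flow, using boto3 and the cryptography library's AES-GCM primitive. The key alias and returned field names are placeholders, not our actual implementation:

```python
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")

def encrypt_record(plaintext: bytes) -> dict:
    # 1 & 2. Ask KMS for a fresh data-key wrapped by our master key.
    resp = kms.generate_data_key(KeyId="alias/my-app-pii", KeySpec="AES_256")
    data_key, encrypted_data_key = resp["Plaintext"], resp["CiphertextBlob"]

    # 3 & 4. Encrypt the payload locally with the clear-text data-key
    # (AES-256-GCM via the cryptography library in this sketch).
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)

    # 5. Bin the clear-text key as soon as we're done with it.
    del data_key

    # 6. Persist only the wrapped data-key alongside the encrypted payload.
    return {"encrypted_data_key": encrypted_data_key, "nonce": nonce, "ciphertext": ciphertext}
```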

Notice that we ask KMS for a new data-key for each encryption operation, so every user gets their own personalised key! That's a really nice perk we get for free with this setup: even if we had a data leak, each user's data is encrypted with a separate key that would need to be cracked individually. The only thing that gives access to all the data is the master key, which sits comfortably inside KMS and never leaves. For performance and cost reasons AWS recommend caching those data-keys for a small number of reuses (e.g. 5 minutes or 10 uses), which we'll look into as we grow the user base and hit scaling issues.

Decryption is basically the reverse of the above (again sketched in code after the list):

  1. Retrieve the encrypted data-key and your encrypted data from your data store
  2. Ask KMS to decrypt the small encrypted data-key using the specified master key
  3. Get back the clear-text data key
  4. Use the clear-text data key to decrypt your encrypted data using the same algorithm you used before
  5. Get back your decrypted data
  6. Send the decrypted data to your user
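The matching decryption sketch, under the same placeholder names as the encryption example:

```python
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")

def decrypt_record(record: dict) -> bytes:
    # 1 & 2. Ask KMS to unwrap the stored data-key; it works out which
    # master key was used from the encrypted blob itself.
    data_key = kms.decrypt(CiphertextBlob=record["encrypted_data_key"])["Plaintext"]

    # 3-5. Use the clear-text data-key locally to recover the original payload,
    # then bin the key.
    plaintext = AESGCM(data_key).decrypt(record["nonce"], record["ciphertext"], None)
    del data_key
    return plaintext
```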

This whole process is actually exactly what credstash does under the bonnet!

Master key rotation inside KMS is handled automatically by AWS. The ARN we use to identify the master key is internally wired to actual key material that we never see. With automatic rotation enabled, AWS generates new backing key material yearly and associates it with that same ARN. New encryption calls use the new material, while older ciphertexts still decrypt because KMS keeps all the previous versions around indefinitely (all tied to the ARN you use when talking to KMS). We also have the option of rotating manually if we want more frequent rotations or need to respond to a security breach; that process involves pointing an alias at a newly generated key, fully re-encrypting all our data, and removing the old key.
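For illustration, enabling the AWS-managed rotation and doing the alias-repointing half of a manual rotation look roughly like this in boto3 (the key id and alias are placeholders, and the re-encryption sweep isn't shown):

```python
import boto3

kms = boto3.client("kms")

# AWS-managed yearly rotation: one call against the key. Rotation calls take
# a key id or ARN rather than an alias; this id is a placeholder.
kms.enable_key_rotation(KeyId="1234abcd-12ab-34cd-56ef-1234567890ab")

# Manual rotation sketch: mint a fresh master key and repoint the alias at it.
# Re-encrypting existing data and retiring the old key happen separately.
new_key_id = kms.create_key(Description="PII master key v2")["KeyMetadata"]["KeyId"]
kms.update_alias(AliasName="alias/my-app-pii", TargetKeyId=new_key_id)
```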

We've also discussed how much control we have over the master key. If KMS completely broke, AWS went bust, or (more likely) we decided to move encryption providers, we wouldn't have access to our master keys: we can't export them, which is an intentional design decision of KMS. We could generate our own set of keys and use them alongside KMS (or import them into KMS), but that raises many of the same issues KMS is designed to address: how do we secure those keys, guard access to them, and so on? For the time being we've decided to accept the risk and rely on KMS completely, knowing that a change would require a full re-encryption of all users' data.

Encryption Context

Authenticated Encryption with Associated Data (AEAD) is a class of encryption algorithms that helps solve the problem of “even though I know my encrypted data can't be read, how do I know it hasn't been moved about or tampered with?” For example, we can encrypt a user's profile, but if someone with access to the data store itself copies the whole encrypted blob from some random record onto their own record, then logs in with their normal credentials, they'll see the other user's data! Most popular encryption protocols and tools use AEAD to mitigate this, including TLS, IPsec, and SSH (using hostnames). Encryption Context is KMS' form of associated data that, in part, helps solve this issue. Think of it like a signature for your encrypted blob, helping verify it hasn't been messed with.

Encryption Context is a plaintext key/value pair that is stored in the clear and cryptographically bound to your encrypted blob. These key/value pairs shouldn't be secret, and for auditability they're logged alongside all encrypt/decrypt operations via CloudTrail. For the value you'll typically use something related to the context that can't change, such as a record's primary key or a message's destination address; we use the user id. This way, if some nefarious internal admin moves that encrypted blob onto their own record, the ids won't match and it's still worthless to them.
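In boto3 terms the context is just an extra dictionary passed on both sides of the call; the user id value below is made up:

```python
import boto3

kms = boto3.client("kms")

# Bind the data-key to the owning record via an encryption context.
context = {"user_id": "12345"}

resp = kms.generate_data_key(
    KeyId="alias/my-app-pii",
    KeySpec="AES_256",
    EncryptionContext=context,
)

# The same context must be presented on decrypt; a blob copied onto another
# user's record won't decrypt under that user's id.
data_key = kms.decrypt(
    CiphertextBlob=resp["CiphertextBlob"],
    EncryptionContext=context,
)["Plaintext"]
```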

Resiliency

We have some super secure encryption with all the above bits, but it does us no good if KMS itself is down, unreachable, or unacceptably slow in a given region. Users wouldn't be able to create, update, or read any of their personal information! Remember also that the master key, which ultimately encrypts all of our data-keys, cannot leave a given KMS region's boundary, so two different regions can never share the same master key for us to fall back on. How can we support failover without duplicating the encrypted text in each region, and therefore increasing our storage costs? It's envelope encryption back to save the day!

  1. Ask our local KMS for a data-key, just like we did for encryption above
  2. Send that clear-text data-key to KMS in other remote regions, asking each of them to encrypt only the small data-key with its own master key
  3. Encrypt the data with the clear-text data-key as usual
  4. Store the encrypted data-key for every region, along with our encrypted data itself, in the data store

Since we’re only storing that extra 4kB worth of encrypted data (the data-key) per region, the overhead of extra regions is minimal. This allows us to try our local KMS region to decrypt, and if it fails for whatever reason, try the next in the list using the encrypted data-key from its region. No matter which KMS region we use, we get back the same clear-text data-key, which we use to decrypt our encrypted data. Nice!

We use the great encryption SDK provided by AWS to do most of the heavy lifting described in this article. It doesn't support multi-region encryption just yet though, so we do that part ourselves. We also added a simple minimum-region setting so we're not going all over the world encrypting, just a couple of extra regions in our geographic area. Doing the regional calls in parallel and other enhancements are possible, but haven't been needed so far.
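For reference, basic single-region use of the Python flavour of that SDK looked roughly like this at the time of writing. The key ARN is a dummy value in the style of the AWS docs, and newer SDK versions have since moved to a client-object API, so treat this as a sketch only:

```python
import aws_encryption_sdk

# Placeholder key ARN; the provider can take several ARNs if you want the
# data-key wrapped under more than one master key.
provider = aws_encryption_sdk.KMSMasterKeyProvider(
    key_ids=["arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"]
)

# The SDK handles data-key generation, local encryption, and packaging the
# wrapped key into the returned message for you.
ciphertext, encrypt_header = aws_encryption_sdk.encrypt(
    source=b"pii payload", key_provider=provider
)
plaintext, decrypt_header = aws_encryption_sdk.decrypt(
    source=ciphertext, key_provider=provider
)
```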

Wrap-up

We've learned a lot about KMS while working on our new identity system, including usage patterns, failure scenarios, key permission management, and a host of other topics around encryption. Hopefully it's never put to the test by a data leak, but we're confident it'll protect our users' personal data if needed!

Starting a new job
Darrell Mozingo | December 26th, 2016

I started a new position with Skyscanner during the summer. It made me realise that over the years of starting new jobs, and of course being on teams when others joined, I've made and seen plenty of annoying mistakes that hurt relationships, trust, and respect with colleagues before they even had a chance to form. I think this is especially true for senior engineers, where you come in with more experience and opinions, and perhaps more of a desire to “prove” your new role/salary. Here are some tips that might help both yourself and your new team with onboarding to the new job:

  1. Humility – You were hired in part because you’re good at your job, having the right set of experience and technical skills that the organisation needs. Don’t let it get to your head though. Remember that you’re going into this organisation in part to learn from some other smart people, and there’s a lot you’re not going to know. Perhaps certain ways of doing things from your last role are done differently & better here. This is particularly important if you were coming from a “big fish small pond” situation as I have a few times. That skill reset towards the bottom is the best way to learn, and I feel staying humble keeps you the most open to it.
  2. Don’t try changing everything day 1 – Sure, fix problems you see and make things better – it’s part of the reason you were hired. Just take a breather and don’t try to fix all the things on day 1! It’s part taking time to understand why things are in their current state or are done a certain way, and part building relationships/trust/respect with your new colleagues before going forward. Coming in on day 1 and trying to change a good chunk of things will put your colleagues on the defensive, making the possibility of introducing change that much harder.
  3. “Lightly” question everything – Similar to above, do question things. Don’t just assume things are already done the best way possible, but ask why they are that way. If you think it can be improved, don’t push back too much right away but take note and come back to it after a bit, when you’re more familiar with the team and system. Don’t push back on the decisions or your opinion too much right off the bat.
  4. Prime Directive – Taking a nod from the retrospective prime directive, I've found it the most constructive way to approach new systems/code/processes. It's best to assume things were done for a valid reason in their given context, not maliciously. Our profession, the new business you're at, personal skill levels, and most importantly technology itself, are constantly changing. What's “correct” today in any of those areas is bound to be “incorrect” next week. Even the decisions you've made in the past in these areas probably seem like crap to you now. ORMs, WebForms, heavy handed message broker SOA… these were all valid & common decisions at different points in time. Maybe the system you're coming onto didn't implement these patterns perfectly, but did we always get it right ourselves? Rather than slag off the code or the anonymous author, try to realise that they were like you right now, trying to make the best decision in their context. Perhaps try to understand what those contexts were, then take it for what it is, and move on with improving things.
Retrospective tips
Darrell Mozingo | July 14th, 2015

My friend Jeremy wrote an excellent post about spicing up retrospectives. I started writing this up as a comment to post there but it got a little long, so thought I’d break it out as a blog post.

Jeremy’s experiences mirror mine exactly from running and participating in many retros over the years. Actively making sure they’re not getting routine and becoming an afterthought is an absolute must. Here are a few additional tips we use to run, spice up, and manage retros:

  • Retro bag: We keep a small bag in the office filled with post-its, sharpies, markers, bluetack, etc, to make retro facilitator’s lives easier – they can just grab and go. We also have a print copy of Jeremy’s linked retr-o-mat in it.
  • Facilitator picker: A small internal app which lets teams enter their retro info and randomly selects someone to facilitate. It favours those who haven’t done one recently and are available for the needed time span. Sure saves on walking around and asking for a facilitator!
  • Cross-company retros: We’ve gotten great value out of doing larger cross-company retros after big projects. These are larger (upwards of 20 people) representing as many of the teams involved as possible (developers, systems, product owners, management, sales, client ops, etc). We used the mailbox technique Jeremy mentioned and had attendees generate ideas beforehand to get everything in, limiting the retro to 1.5 hours. Making sure everyone knew the prime directive was also a must, as many hadn’t been involved in retros before. Actions that came out ended up being for future similar projects, and were assigned to a team to champion. Sure enough they came in very handy a few months later as we embarked on a similarly large project.
  • Retro ideas: (I don’t remember where I got these, but they’re not original!)
    1. Only listing 3 of the good things that happened in a given period. At first I didn’t think focusing purely on the good would result in any actionable outcomes, but the perspective brought about some interesting ideas
    2. Making a “treasure map” of the retro time period, with some members adding a “mountain of tech debt”, “bog of infrastructure”, and “sunny beach of automation”. Fun take on the situation to get at new insights
    3. Amazon reviews of the period with a star rating and “customer feedback”
    4. I’m excited to try out story cubes at the next retro I run – sounds good!