Sunday, 4 November 2012

Interesting SSL challenge between Dev/QA and production

So, as we have another small issue with SSL in one of TeamMentor's live servers (see Problem loading content via SSL in docs.teammentor.net), I'm trying to figure out a way to make SSL deployment more robust.

At the moment there is already a setting in TM to force SSL redirects for all requests. One issue, though, is that during Dev and QA we normally don't test on SSL sites, which means that this setting tends to be disabled. Yes, this is something that we should be changing, but I really like the light 'dev environment setup' that TM currently has, and adding SSL to it will increase its complexity.

This is actually one of those cases where the 'individual' solution(s) are easy and understood. The problem is in getting them to work in a streamlined and 'check-list-driven' mode. And that is one of the big challenges of Security. Every change has side effects, and ensuring that all moving parts happen at the same time and in the right order is a big challenge.

So when we (the security side) say 'you should run your whole site in SSL', we should understand that there is a lot more to it than just flipping a switch in a webserver. See Etsy's Scaling User Security post for some of the challenges that they had when moving to SSL (and I also agree with their decision that the control of SSL should rely on the application and not on the infrastructure).

Another big problem that exists is getting accurate feedback on what is going on. Our UnitTest coverage in TM is not as good as it should be, and we still don't have live website monitoring with an 'is it still ok' service (which is something we need to look into next).

I'm thinking about ways to address this in code:

  • Detect if the current server supports SSL (and don't allow SSL redirects if that is not working)
  • Allow SSL to be enabled and disabled via a special 'admin only' REST command (which will be a challenge to do if the whole site is currently redirecting to SSL)
  • Create a special 'Current Stats' REST command that provides information about the current server
  • Find out if there is a way to access IIS logs from TM so that we can use TM to detect that something is going wrong with IIS (for example, the 404 errors in the current problem are IIS ones, and they never reach the .NET HTTP pipeline)
  • Create a set of SAST rules specific to each release so that these 'check-list' items can be checked programmatically (and as part of the deployment workflow)
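
To make the first two ideas a bit more concrete, here is a rough sketch of how the 'only redirect when SSL actually works' guard could look. It is written in Python purely for illustration (TM itself is .NET), and every name in it (server_supports_ssl, should_redirect_to_ssl, the /rest/admin/ssl path) is a made-up placeholder and not part of TM:

```python
import socket
import ssl

# Hypothetical path of the 'admin only' REST command that toggles SSL.
ADMIN_SSL_TOGGLE_PATH = "/rest/admin/ssl"


def server_supports_ssl(host, port=443, timeout=3.0):
    """Return True if a TLS handshake with host:port completes and the
    certificate validates against the default trust store."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and
        # ssl.SSLError (which is a subclass of OSError).
        return False


def should_redirect_to_ssl(path, is_secure, force_ssl, ssl_available):
    """Decide whether a request should be redirected to HTTPS.

    path          -- request path, e.g. "/article/123"
    is_secure     -- True if the request already arrived over SSL
    force_ssl     -- the 'force SSL redirects' setting
    ssl_available -- result of an earlier server_supports_ssl() probe
    """
    if not force_ssl or is_secure:
        return False
    if not ssl_available:
        # Never redirect to an SSL endpoint that is known to be broken.
        return False
    if path == ADMIN_SSL_TOGGLE_PATH:
        # Keep the toggle reachable so an admin can switch SSL off again.
        return False
    return True
```

The probe would probably run once at startup (and maybe on a timer) rather than on every request, with the result cached and passed in as ssl_available. Note that leaving the toggle path out of the redirect means admin credentials could travel over plain HTTP, so that part is a trade-off that would need more thought.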