The Wrath of Krang: When a deployment doesn't work, and logs don't show anything: Restart it! (AKA It was working on the other QA box!)

Thursday, 6 August 2015


PROBLEM:

We have a number of QA (Quality Assurance, i.e. testing) boxes. Release candidates (RC) go on a specific QA box, one that is more prod-like. The problem was an application that worked on the other QA boxes but not on the RC QA box.

We tailed the log files while testing the functionality: no errors.

We had a look at the libraries that were deployed. They were as expected.

We called the service endpoints manually using curl, and got a 404!

Okay, now what's going on?

Was it a mistake in the jar we released? In the past, some code hasn't been merged into master during a release. We opened the jar: yup, the contents were there.

Was it a mistake in the URL? Nope. It worked on the other QA box.

Running "ps" would usually show the libraries on the classpath, but this app was different: its command line didn't list them.
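When an app is launched with an explicit -cp (or -classpath) argument, the classpath can be read straight off the process's command line. The Python sketch below reads /proc/<pid>/cmdline on Linux and splits out the classpath; the function names are made up for illustration. If the app is started with "java -jar" instead, the JVM ignores -cp and takes the classpath from the jar's manifest, so this returns an empty list. That may be why "ps" showed nothing useful here, although the post doesn't say how the app was launched.

```python
import os

# Java launcher options that take the classpath as the next argument.
CLASSPATH_FLAGS = ("-cp", "-classpath", "--class-path")


def read_cmdline(pid):
    """Return a Linux process's argv; /proc/<pid>/cmdline is NUL-separated."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def classpath_from_cmdline(argv):
    """Return classpath entries from a java command line, or [] if none given.

    With "java -jar app.jar" the classpath comes from the jar's manifest
    instead, so nothing shows up here.
    """
    # Stop one short of the end so argv[i + 1] always exists.
    for i, arg in enumerate(argv[:-1]):
        if arg in CLASSPATH_FLAGS:
            return argv[i + 1].split(os.pathsep)
    return []
```

For example, classpath_from_cmdline(read_cmdline(pid)) for the app's PID lists the jars it was started with, which you can compare against what's in the deployment directory.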

Could it be that the *old* version was still running? We had no quick way of checking. But the webapp not knowing about the new endpoint, even though the new jar was on the filesystem, pointed that way.
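One cheap way to answer "is the old version still running?" is to compare when the process started with when the jar on disk was last modified: if the jar is newer, the process must have started from an older copy. A minimal Linux-only sketch, assuming the app runs as its own JVM process and you know its PID and jar path (the function names are hypothetical):

```python
import os


def process_start_time(pid):
    """Return when a Linux process started, in seconds since the epoch."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only split the text after the last ')'; that text begins at field 3.
    fields = stat[stat.rindex(")") + 2:].split()
    start_ticks = int(fields[22 - 3])  # field 22: start time, in clock ticks after boot
    with open("/proc/stat") as f:
        boot_time = next(int(line.split()[1]) for line in f if line.startswith("btime "))
    return boot_time + start_ticks / os.sysconf("SC_CLK_TCK")


def is_stale(process_started_at, jar_modified_at):
    """True if the jar changed after the process started, i.e. it started from an older copy."""
    return jar_modified_at > process_started_at


def is_running_old_code(pid, jar_path):
    return is_stale(process_start_time(pid), os.path.getmtime(jar_path))
```

Copying with cp -p or rsync -a preserves the original modification time, which would defeat this check; the release-info endpoint described later in this post doesn't have that weakness.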

So we restarted the process. That fixed it! Everything worked, both through direct calls via curl and through the webapp.

LESSON:

When investigating something that should already be working, the steps should be:

1) Check the logs.

2) Check that it's actually deployed.

3) If steps 1 and 2 don't show anything wrong, restart the application.

Saves so much time.

The culprit was a Jenkins job that deployed the package but didn't successfully kill the old process.
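The post doesn't show the Jenkins job, but the general fix is to make the deploy step confirm that the old process has actually exited before starting the new one, and to fail the build if it hasn't. A sketch in Python, assuming the job already knows the old PID (for example from a pid file, which is an assumption here); stop_process and is_alive are hypothetical names:

```python
import os
import signal
import time


def is_alive(pid):
    """True if the process exists and hasn't exited."""
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    if not os.path.exists("/proc/self/stat"):
        return True  # no procfs (not Linux): trust the signal-0 check
    # On Linux an exited-but-unreaped process (zombie, state "Z") still
    # answers signal 0, so check its state in /proc/<pid>/stat.
    try:
        with open(f"/proc/{pid}/stat") as f:
            state = f.read().rsplit(")", 1)[1].split()[0]
    except FileNotFoundError:
        return False  # exited between the two checks
    return state != "Z"


def stop_process(pid, timeout=30.0, poll=0.5):
    """Send SIGTERM, wait up to timeout seconds, then escalate to SIGKILL.

    Raises RuntimeError if the process is still alive at the end, so the
    deploy fails loudly instead of carrying on with the old code running.
    """
    if not is_alive(pid):
        return
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return
        time.sleep(poll)
    os.kill(pid, signal.SIGKILL)
    time.sleep(poll)
    if is_alive(pid):
        raise RuntimeError(f"process {pid} did not exit")
```

Raising instead of carrying on means a stuck process turns into a failed Jenkins build rather than a QA box quietly serving old code.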

How do we check that the process is running the new code? Our plan is to have an endpoint that can be queried to show release information, so we know which libraries the currently running application is using. Our Jenkins job will then check this URL after it deploys the package, and send out an alert if it doesn't get the expected results.
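The real endpoint would live inside the (Java) application itself; the Python sketch below only shows the shape of the idea using the standard library's http.server. The /release-info path, the field names and the version strings are all placeholders, not anything from the original setup. Including the PID and a start timestamp lets you tell a freshly restarted process from a survivor even if nobody bumped the version string.

```python
import json
import os
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STARTED_AT = time.time()  # recorded at import time, as a proxy for process start

# Placeholder release metadata. A real build would generate this (for example
# from the jar manifest); these names and versions are invented.
RELEASE_INFO = {
    "version": "1.4.2-rc1",
    "libraries": {"service-core": "1.4.2-rc1", "client-lib": "3.0.1"},
    "config": {"profile": "qa-rc"},
}


class ReleaseInfoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/release-info":
            self.send_error(404)
            return
        payload = {**RELEASE_INFO, "pid": os.getpid(), "started_at": STARTED_AT}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_server(port=8080):
    """Serve /release-info on localhost in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ReleaseInfoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A real service would bind to an address the Jenkins agent can reach rather than 127.0.0.1.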

4) Have an endpoint showing the state of the running process: library versions and configuration.

5) Get Jenkins to check the release info endpoint after it deploys a package.
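On the Jenkins side, the post-deploy check can be a small script that polls the endpoint until it reports the version just deployed. A minimal sketch, assuming the endpoint returns JSON with a "version" field (a hypothetical shape) and the job passes in the URL and expected version; check_release is a made-up name:

```python
import http.client
import json
import time
import urllib.request


def check_release(url, expected_version, retries=10, delay=3.0, timeout=5.0):
    """Poll a release-info URL until it reports expected_version.

    Returns False if it never does, e.g. because the old process is still
    the one answering requests.
    """
    for attempt in range(retries):
        if attempt:
            time.sleep(delay)  # give a freshly started app time to come up
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                info = json.load(resp)
        except (OSError, ValueError, http.client.HTTPException):
            continue  # connection refused, HTTP error, timeout or non-JSON body
        if isinstance(info, dict) and info.get("version") == expected_version:
            return True
    return False
```

In a Jenkins shell step, the script can exit non-zero when check_release returns False; a failing step marks the build as failed, which can then trigger whatever failure notifications the job already sends.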
