Change of tack and new reading materials

My mission has hit a bump in the road.  I have clearly made a pig’s ear of my Riemann and Graphite exercises: I have been totally spammed by emails from one of my EC2 instances.  Thousands of emails, so many that I think Google has disabled my alerting email account.  That hasn’t helped, as I now get spammed by the bounce-backs.

This has led me to the conclusion that I have gone too deep into the weeds.  A little too much too soon.
As a result I have made a decision to tear down my six instances (three for Riemann, three for Graphite – this will make more sense to those who have gone through The Art of Monitoring) and instead restart The Docker Book.  It is motivating me as I will not be starting from scratch.

As per my previous blogs, I intend to get to the point where I have Docker images for Riemann and Graphite, test them, then roll them out across multiple instances.

It’s not been a lot of fun copying and pasting config from one iTerm window to another.  Of course, there have been many lessons learnt from building these by hand.

With a growing feeling of being burnt out, I have also decided to change my approach to reading material.  I had got into the habit of reading mission-related materials on every commute apart from Monday mornings.  I commute Monday to Thursday, so that’s seven sessions a week.

So this week I gave up and read for fun.  Next week I am going to start The Goal and read it alongside David Bowie, A Life to try to set a more sustainable pace.


Two new Art of Monitoring Riemann issues

I have been adding Riemann alerts for high CPU/disk usage/memory and encountered two issues that need fixing.  That’s the downside.  The upside is my Clojure is improving.

Anyway, error 1:

ERROR [2017-09-16 17:09:59,617] main - riemann.bin - Couldn't start
clojure.lang.Compiler$CompilerException: java.lang.RuntimeException: Invalid token: /percent_bytes-used, compiling:(/etc/riemann/riemann.config:58:52)

No idea what is going on there.  Can be hard to google this stuff.  Might be a typo somewhere.
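
For what it is worth, the (/etc/riemann/riemann.config:58:52) at the end of that trace is the Clojure compiler’s file:line:column position, so line 58, somewhere around column 52, is the place to start looking.  Below is a minimal shell sketch for printing that line with a marker under the column.  show_location is a hypothetical helper name of my own, not something from Riemann or the book, and the marker will drift if the line contains tabs.

```shell
#!/bin/sh
# show_location FILE LINE COL
# Hypothetical helper: print one line of FILE, then a caret under column COL.
show_location() {
    awk -v line="$2" -v col="$3" '
        NR == line {
            print
            printf("%" col "s\n", "^")   # right-align the caret to column COL
            exit
        }
    ' "$1"
}

# Example, using the position reported in the error above:
#   show_location /etc/riemann/riemann.config 58 52
```

A bare token that starts with a slash is not a valid Clojure symbol, so one thing worth checking at that spot is whether the service name sits entirely inside its double quotes.  That is a guess rather than a diagnosis.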

And error 2:

WARN [2017-09-16 17:56:17,775] defaultEventExecutorGroup-2-1 - riemann.config -$mailer$make_stream__9273$stream__9275@552dcd79 threw
com.sun.mail.util.MailConnectException: Couldn't connect to host, port:, 25; timeout -1

Wonder what is going on there?  Same stuff works on the other two instances.  Another argument for Dockerising the whole thing.  Did wonder whether this is due to one of my nodes spamming Gmail.  Tomorrow’s problem.  Made some progress but now time for a beer.

Book Blog 1 “Release It!”

I have decided to blog about some of the books I have been reading.   Reading is a vital part of my mission.

I am not going to “review” books, I am going to comment on them and how they have related to my education.

I am about to finish Release It! (author’s exclamation mark, not mine).  When the same book is referenced in a few other books you have been reading, it is probably worth a look.  I think that maybe I have tackled this one a bit too early.  It definitely taught me a few things, but it went into some technical depths that I am not yet equipped to deal with.  It is also one of the few tech books that has made me laugh.  More than once.  It has helped me make the case for failing fast, timeouts, bulkheads and circuit breakers on a current assignment too.

Next up, I may continue with Site Reliability Engineering.  Another option is re-reading The DevOps Handbook.  I finished it in April but, given it was such an easy read, I would like to try it again to see if it offers any insights I missed last time around.  I am also considering re-reading The Docker Book for similar reasons.

Continuing Riemann logging madness and Scaffolding

My full disks and Riemann logging issues continued over the past few days but appear to have calmed down.  Sadly I am not sure why.  I have a couple of theories though.

Firstly, after running through section 6.2 of The Art of Monitoring (checking processes are running), I pasted in new riemann.config files.  Not 100% sure but wonder if that has corrected a previous error – all the more reason for automation/Puppet/Docker eh?

Which brings me onto theory two.  I wonder if I stopped midway between sections and needed to do further work to stop this happening.  This has happened to me before with The Docker Book when I exposed the Docker API publicly.  That’s a subject for another blog.

Of course, I may not have fixed this issue at all.  If I have, I would like to know what the fix was.  Chapters coming up will graph disk usage I believe.

On another note, it has occurred to me that there is probably huge value in revisiting the books I have gone through recently now that I know more.  That kind of makes my heart sink given my mission’s target date.  I have a continuing sense that I am learning different things to what the books intend.  Still, learning in this space is surely going to be useful.

Finally, this pulled up outside my house this week.  Someone is trying to tell me something.  Insert your own Unikernel joke here.

The Medusa Touch

Over the past two weeks I have become a bit like Richard Burton in The Medusa Touch.  In this film he plays a character who has visions of disasters before they happen.

It seems that I only have to read about how unwise it is to share a database between customer and reporting traffic in Sam Newman’s Microservices before a slow-running reporting query creates issues for customers.

On another occasion, I read about circuit breakers and fail-fast timeouts in Release It! and almost immediately afterwards hit an issue that would have been avoidable if circuit breakers and fail-fast timeouts had been in place.

And then, shortly after listening to three principles of CI on this podcast (whilst dog-walking, naturally), I ran into issues with devs checking in code whilst the build pipeline was down.

For the time being, my team have asked me to stop reading about things that can go wrong or at least warn them in advance.

riemannmc still in a bit of a state

I am realising that I am learning a lot about Unix and infra, but could learn all of this without ever learning anything about DevOps.  I can see where DevOps could make my life easier, but the books I am following strip out the automation and leave in the duplication in order to explain config and the like.

Anyway, my Riemann Mission Control is still in a state even when I free up disk space.  In today’s lesson I was setting up the collectd write_riemann plugin and had errors as follows on my problem-child host:

/etc/collectd.d$ sudo service collectd start
 * Starting statistics collection and monitoring daemon 
collectd ERROR: lt_dlopen ("/usr/lib/collectd/write_riemann.so") failed: file not found. The most common cause for this problem is missing dependencies. Use ldd(1) to check the dependencies of the plugin / shared object.

This led to my first contact with the ldd command, which revealed: => not found
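
Since ldd marks every library it cannot resolve with "=> not found", filtering its output for that marker gives a quick list of what is missing.  A minimal sketch, assuming the plugin path from the error above; missing_libs is a hypothetical helper name of my own.

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print the names of the
# libraries that ldd could not resolve (hypothetical helper).
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example, using the plugin path from the collectd error:
#   ldd /usr/lib/collectd/write_riemann.so | missing_libs
```

Running the same check on every host that is supposed to be identical would show straight away whether one of them is missing something the others have.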

Finally, after a bit of research, the following fixed my issue:

sudo apt-get install protobuf-c-compiler protobuf-compiler libprotobuf-c0 libprotobuf-c0-dev

Having configured this plugin on four Ubuntu hosts, I wonder why this dependency was missing on only one.  Two theories.  One – whatever is busting my disk space may be related to the absence of a working dependency.  Two – I may have missed a step on one of the four hosts.

Either way, if I had configuration management tools or containerised, automated builds, I suspect this error might not have occurred at all, or at the very least would have been fixable before building four hosts from the same image/container/scripts.

One thought I had was that it would be great to rebuild riemannmc.  Of course, retracing my steps would take a while unless I had a Docker image to help me out (for example).

The other issue I had today is that collectd on my Red Hat hosts is not logging.  One for tomorrow.

Dog-walking learning, Riemann mystery continues and the return of LCD Soundsystem

Dog-walk learning

Quiet-ish couple of days on the mission front.  Rugby and work filling a lot of time.  However, in recent months I have learnt to regain time lost on dog-walking by listening to the following wonderful podcasts:


All brilliant in their own ways.  My introduction to James Turnbull and his books came from the DevOps Cafe.  The Kelsey Hightower episode was a classic too.  The list of reminders I add to my phone to look at post-walk grows every time.

If anyone can recommend any more podcasts, please let me know.

Riemann mystery continues

Not had time to look at this in any detail, but the issue remains.  I clear out the huge files and get an email the following day when the next log file grows to consume all of my space.
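
Until the cause turns up, it at least helps to know which files are doing the growing.  A minimal sketch that lists the largest files under a directory; largest_files is a hypothetical helper name of my own, and /var/log in the example is an assumption about where the offending logs live.

```shell
#!/bin/sh
# largest_files DIR [N]
# Hypothetical helper: print the N largest files under DIR (default 10),
# biggest first, one per line as "<disk usage in KiB> <path>".
largest_files() {
    find "$1" -type f -exec du -k {} + | sort -rn | head -n "${2:-10}"
}

# Example (the directory is an assumption, and may need sudo to read):
#   largest_files /var/log 5
```

Running it before clearing out the files and again the next day would show which log is the one that keeps coming back.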

It appears the issue occurs, in Art of Monitoring speak, on my Riemann Mission Control host.  So maybe the other Riemann hosts are spamming it.  We will see.  The other two Riemann hosts and all Carbon hosts are performing well.

LCD Soundsystem are back

Finally, it is always a good week when this lot put out new music.  Not related to my mission but I am sure it will soundtrack much of my studying in the coming weeks.