This presentation will be about unicorns!
Actually, no. It will be about statistics
But, man, I wish it was about unicorns!
CDS in statistics:
How do we know what our users do?
It's simple, everything is in the logs, right?
Ok, but at least we know about errors and exceptions, right?
But we can use command line to find what we need there, right?
- Hey, can you tell me how many people from the US visited this record on CDS in February?
- Sure, let me quickly search for it.
pv -Webrapt -l apache.log | \
grep 'record/123456' | \
grep '/Feb/' | \
sed -r 's/^(([0-9]+\.){3}[0-9]+) .*$/\1/' | \
xargs -n 1 geoiplookup | \
grep 'US' | wc -l
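For comparison, the same one-off question can be sketched in Python. This is only an illustration, assuming the log uses the common Apache access log format; `count_visits` and `lookup_country` are hypothetical names, and `lookup_country` stands in for `geoiplookup` or any GeoIP library.

```python
import re

# Common/combined Apache log format: IP, identd, user, [timestamp], "request", ...
LOG_LINE = re.compile(
    r'^(?P<ip>(?:\d{1,3}\.){3}\d{1,3}) \S+ \S+ '
    r'\[(?P<day>\d{2})/(?P<month>[A-Za-z]{3})/(?P<year>\d{4})[^\]]*\] '
    r'"(?P<request>[^"]*)"'
)


def count_visits(lines, record_path, month, country, lookup_country):
    """Count log lines for record_path in the given month and country.

    lookup_country is a hypothetical callable (IP string -> country code);
    it is an assumption of this sketch, not part of any real tool.
    """
    hits = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("month") != month:
            continue  # note: the year is not checked, just like in the shell version
        if record_path not in match.group("request"):
            continue
        if lookup_country(match.group("ip")) == country:
            hits += 1
    return hits
```

Like the shell pipeline, this counts requests rather than distinct people, and it still has to read the whole log file for every new question.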
And that's just for access logs and errors.
To see how users are using our system, we had to add two more systems on top of that.
Piwik - to see the statistics of our visitors
And the Invenio webstat module - for custom statistics (like the number of loans in the library)
Oh, come on! There has to be a better way!
We decided to switch to one system for all our logs.
Since Elasticsearch was becoming more and more popular, we decided to use it.
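Once the visits are indexed, the question from the earlier dialogue becomes a single query. Here is a rough sketch of what the request body could look like in the Elasticsearch query DSL. The field names `record_id` and `country_code` and the function name are assumptions made for illustration; the actual CDS mapping may look different.

```python
def visits_query(record_id, start, end, country="US"):
    """Build a query body matching visits to one record from one country.

    Field names (record_id, country_code, @timestamp) are hypothetical;
    they depend on how the documents were indexed.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"record_id": record_id}},
                    {"term": {"country_code": country}},
                    # half-open time range: start <= timestamp < end
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                ]
            }
        }
    }
```

A body like this could be sent to the `_count` endpoint of the relevant index, with the first day of February as `start` and the first day of March as `end`.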
Elasticsearch is part of the ELK stack (Elasticsearch, Logstash, Kibana):
Write how many machines we have used, how much data we store, what the load is, and how long it will last (before we have to scale up).
Write about custom improvements:
Describe Lumberjack - custom plugin that allows us to send data to Elasticsearch from any place in Invenio
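To give a feeling for the idea (this is not the actual Lumberjack code), here is a minimal sketch of a Python logging handler that buffers events and formats them as a body for the Elasticsearch bulk API. The class name, the index name, and the `event` attribute are all made up for this example, and the exact action metadata depends on the Elasticsearch version in use.

```python
import json
import logging


class BulkBufferHandler(logging.Handler):
    """Hypothetical handler: collects log records as Elasticsearch documents."""

    def __init__(self, index_name):
        super().__init__()
        self.index_name = index_name
        self.buffer = []

    def emit(self, record):
        doc = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        # Structured fields passed via logger.info(..., extra={"event": {...}})
        event = getattr(record, "event", None)
        if isinstance(event, dict):
            doc.update(event)
        self.buffer.append(doc)

    def bulk_body(self):
        """Return the buffered documents as a bulk API body and clear the buffer."""
        if not self.buffer:
            return ""
        lines = []
        for doc in self.buffer:
            # Each document is preceded by an action line naming the target index.
            lines.append(json.dumps({"index": {"_index": self.index_name}}))
            lines.append(json.dumps(doc))
        self.buffer = []
        # The bulk API expects newline-delimited JSON ending with a newline.
        return "\n".join(lines) + "\n"
```

With a handler like this attached to a logger, any module could record an event with a plain `logger.info("loan created", extra={"event": {"record_id": 123456}})`, and a separate step would send the result of `bulk_body()` to the `_bulk` endpoint.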
Kibana works very well for administrators, but we might need something more in the future (probably a module integrated directly in Invenio, with some predefined parameters and Role Based Access Control).
We are still in transition between the old way (MySQL and Piwik) and the new one (Elasticsearch), since we still support both versions of CDS, but we can already clearly see the benefits of this change.
There are no bad parts, but be careful not to treat Elasticsearch as an error-proof black box that you install once and that works no matter what. Some resources will be depleted quicker than others (memory, for example), so spend some time configuring it properly to make the most of your Elasticsearch installation and avoid trouble in the future.
You were an awesome audience!
I wish I could be there, but you know, I'm taking the last chance to see Black Sabbath live, so I think you will understand.
I just hope Esteban won't screw up this awesome presentation.