Monthly Archives: September 2014

5 Top tips for Excellent Graphs

Following on from my discussion of Edward Tufte’s design ideas, and before my post on maths and graphs, I give you my 5 top tips for excellent graphs!

1. Decide your Statement: Absolutely essential! What is it you want to say about the data set you are discussing? Although graphs can show a huge amount of detail and provide fantastic insights into the behaviours being investigated, when presenting a graph it is essential to have one clear statement in mind that you want your audience to take away from it. They might take away more, but they must take away at least that argument. Apply this statement to your graph, make sure that this is what the graph says, and remove anything that distracts from it.

2. Could a table represent the data better? Why are you including this graph? Just because you have a nice data set and think a bar chart would break things up? Because you want to show off the fact that you have worked out how to do histograms?! NO! STOP! Your job as a data scientist is not to churn out endless graphs to fill some inner quota. Your job is to analyse the data at hand and help your audience fully understand your work. A table may well do that better than a graph.

3. Remove chart junk: Remove all those gridlines! Why do you have all those colours? Remove background colours, data point colour variations and data point shape variations. Remove data labels! Simplify your legend!

4. 3D Visualisations: 3D visualisations should only be used to represent things that are actually 3D! If you’re showing an MRI of a patient or temperature fluctuations throughout your manifold, go ahead. But 3D visualisations should not be used to make things look cool, keep your audience awake or fill more space on your page.

5. Colours: The most effective way to draw the eye to the story a graph tells is to use colour distinctions. Strong, bold colours against a background of muted colours make data stand out. Too many colours will distract the reader and make the graph unreadable. For data sets of trials where one variable is changed, varying the shade (lightness) of one main colour can help describe the relationship between that variable and the output.
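Tip 5 suggests using shades of one main colour for a series of trials. Below is a minimal Python sketch of one way to build such a palette, assuming you pick colours yourself rather than using a plotting library's built-in colormap. It keeps the hue and saturation of a base colour and steps only the lightness, using the standard library's `colorsys` module. The base value `#1f77b4` is just an arbitrary blue chosen for illustration, and the function name `single_hue_palette` and its default lightness range are my own choices, not any library's API.

```python
import colorsys


def single_hue_palette(base_hex: str, n: int,
                       light_min: float = 0.25, light_max: float = 0.8) -> list[str]:
    """Return n hex colours sharing the hue of base_hex, ordered dark to light.

    Only the HLS lightness channel is varied; hue and saturation are kept
    from the base colour, so the shades read as one colour at many strengths.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    base_hex = base_hex.lstrip("#")
    # Parse "rrggbb" into three floats in the range 0..1.
    r, g, b = (int(base_hex[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, _, s = colorsys.rgb_to_hls(r, g, b)
    colours = []
    for i in range(n):
        # Spread lightness evenly between light_min and light_max.
        t = i / (n - 1) if n > 1 else 0.0
        light = light_min + t * (light_max - light_min)
        rr, gg, bb = colorsys.hls_to_rgb(h, light, s)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(rr * 255), round(gg * 255), round(bb * 255)))
    return colours


# Hypothetical example: five trials where one input variable is stepped up,
# each drawn in a different shade of the same blue.
print(single_hue_palette("#1f77b4", 5))
```

If your trials are sorted by the variable you changed, assigning these colours in order gives a dark-to-light progression that the eye tends to read as "more" or "less" of that variable.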

Scottish Independence: Analytics Review

Last Thursday, about 85% of Scotland’s registered voters voted on whether Scotland should become an independent country. The results were announced today and are now being represented online by various media outlets. Below is my review of how well each site has done in terms of data representation.

http://www.theguardian.com/politics/ng-interactive/2014/sep/18/-sp-scottish-independence-referendum-results-in-full

Pros: clear, concise, informative. Cons: slightly boring; could show more information on population. The yes/no nature of this vote makes the data quite easy to represent and allows the use of the pie chart’s sister: the doughnut!
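Because a doughnut is just a pie chart with the centre cut out, the underlying arithmetic is the same: each category gets a slice of the 360° circle proportional to its count. Here is a minimal Python sketch, assuming slices are laid out one after another from a starting angle of 0°. The counts below are made-up round numbers, not the real referendum results, and `doughnut_segments` is a hypothetical helper rather than part of any charting library.

```python
def doughnut_segments(counts: dict[str, int]) -> list[tuple[str, float, float, float]]:
    """Turn category counts into (label, share %, start angle, sweep angle) tuples.

    Angles are in degrees. A doughnut uses exactly the same angles as a pie
    chart; it simply leaves the centre empty.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    segments = []
    start = 0.0
    for label, count in counts.items():
        sweep = 360.0 * count / total  # slice size proportional to the count
        segments.append((label, 100.0 * count / total, start, sweep))
        start += sweep  # the next slice begins where this one ends
    return segments


# Hypothetical round numbers for illustration only, not the real results.
for label, share, start, sweep in doughnut_segments({"No": 550, "Yes": 450}):
    print(f"{label}: {share:.1f}% starts at {start:.1f} deg, sweeps {sweep:.1f} deg")
```

With only two categories the chart is essentially one split of the ring, which is part of why a binary vote suits this format so well.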

A nice, easy-to-understand interactive visualisation. Unfortunately it makes the vote look far more “NO” than “YES”, but it does allow clear comparison between councils.

http://www.ft.com/ig/sites/2014/scottish-results/

Weighted bar charts of councils from the Financial Times: a pretty poor show from the FT, whose visualisations are usually much more interesting than this. It does make good use of the populations of the different councils, but it is difficult to compare how each council affects the overall vote.
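The FT chart raises the question of how much each council contributes to the national result. Here is a minimal Python sketch of that arithmetic, using two invented councils (`Big City` and `Small Isle` and their vote counts are hypothetical). It shows that a council's local Yes percentage and its share of the national Yes vote can tell very different stories, and that the national figure is a vote-weighted average rather than a plain average of council percentages.

```python
from dataclasses import dataclass


@dataclass
class Council:
    name: str
    yes: int  # votes cast for Yes
    no: int   # votes cast for No

    @property
    def total(self) -> int:
        return self.yes + self.no


def council_contributions(councils: list[Council]) -> list[tuple[str, float, float]]:
    """For each council return (name, local Yes %, share of national Yes vote %).

    The local Yes % is what an unweighted bar shows; the share of the national
    Yes vote weights it by how many votes the council actually cast.
    """
    national_yes = sum(c.yes for c in councils)
    return [
        (c.name, 100.0 * c.yes / c.total, 100.0 * c.yes / national_yes)
        for c in councils
    ]


def national_yes_share(councils: list[Council]) -> float:
    """Overall Yes % is the vote-weighted mean of council Yes %, not the plain mean."""
    total_yes = sum(c.yes for c in councils)
    total_votes = sum(c.total for c in councils)
    return 100.0 * total_yes / total_votes


# Hypothetical councils for illustration only (not real results).
councils = [Council("Big City", 60_000, 40_000), Council("Small Isle", 2_000, 8_000)]
for row in council_contributions(councils):
    print("{}: {:.1f}% Yes locally, {:.1f}% of all Yes votes".format(*row))
print(f"National Yes: {national_yes_share(councils):.1f}%")
```

With these made-up numbers the plain average of the two local Yes percentages is 40%, while the vote-weighted national figure is about 56.4%, because Big City cast ten times as many votes as Small Isle.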

http://www.ft.com/cms/s/2/2a5bdce0-c4a4-11e3-b2fb-00144feabdc0.html#axzz3DltADIgK

Interactive graph showing yes/no/undecided poll results as the campaign has progressed. This is more like it, FT! Good use of colour and lots of poll results neatly displayed. It allows comparison between different polls taken on the same date, as well as an overall view of how the campaign has changed the vote. Very interesting that they included the “undecided” vote in the mix, as it has been one of the pivotal points of both sides’ campaign strategies.

http://vis.oobrien.com/indyref/

A nice interactive map from a researcher at the UCL Department of Geography. Good use of colours that show the yes/no vote better than the Guardian visualisation does. Excellent use of metrics (turnout % etc.) in the bottom corner. The text is maybe a bit too large, but I’m picking holes really. The really fantastic thing is that it displays the vote, the time at which each result came in and the previous SNP vote! A very interesting take on how the campaign has potentially affected the SNP vote.

http://marcellison.com/bbc/tweetmap/index.html

Another map and another take on the data. This time, it shows how interested the rest of the world was in the debate. Nice use of size and colour for markers, with a muted world map so as not to distract the reader from the point at hand.

Malcolm Gladwell and the Art of Graphs

Detective of fads and emerging subcultures, chronicler of jobs-you-never-knew-existed, Malcolm Gladwell’s work is toppling the popular understanding of bias, crime, food, marketing, race, consumers and intelligence.

If you’ve not read Malcolm Gladwell’s books or his articles in The New Yorker, he’s worth a look. He is very good at combining data and social critique in a way that is immensely readable. He has a habit of complimenting the reader and taking them on a journey to the other side of his original argument, which is very entertaining. However, what I’m here to talk about today is his book “David and Goliath”, in which he effortlessly presents some nice little snippets of data. If Edward Tufte read it, he would be quite pleased (see my previous My Data Heroes post; more on old Tufte later in this series).

Gladwell takes Tufte’s ideas of concise data visualisation and uses them really well. At one point he explains that not all relationships are linear (easy to understand if you’ve done a little more than secondary school stats). To do this, he first shows a linear graph: just a simple straight line with two axes and their upper and lower limits. No gridlines, no “vibrations”, no excessive labelling or useless legend. Then he does the same again, but this time shows a parabolic curve. Easy.

He has captured the idea behind Tufte’s work: decide what you want to tell your audience, tell them that, and don’t complicate it with anything else. Then they will understand what you are trying to say. This, although immensely simple, is incredibly difficult for the detail-orientated data scientist. But if data scientists can capture this idea and work it into their presentations and reports, they will reap the rewards.
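To make the "two axes and a line" idea concrete, here is a minimal sketch that draws a bare graph as an SVG string using only the Python standard library, with no gridlines, tick labels, legend or background. It is my own illustration of the style, not a reproduction of Gladwell’s figures; `minimal_svg`, its padding and its default sizes are arbitrary choices.

```python
def minimal_svg(f, x_min: float, x_max: float, width: int = 300,
                height: int = 200, samples: int = 50) -> str:
    """Draw y = f(x) as a bare SVG: two axes and the curve, nothing else.

    No gridlines, tick labels, legend or background colour are drawn.
    """
    if samples < 2 or x_max <= x_min:
        raise ValueError("need samples >= 2 and x_max > x_min")
    pad = 10  # margin in pixels around the plotting area
    xs = [x_min + (x_max - x_min) * i / (samples - 1) for i in range(samples)]
    ys = [f(x) for x in xs]
    y_min, y_max = min(ys), max(ys)
    y_span = (y_max - y_min) or 1.0  # avoid dividing by zero for a flat line

    def to_px(x: float, y: float) -> str:
        # Map data coordinates to pixels; SVG's y axis grows downwards.
        px = pad + (x - x_min) / (x_max - x_min) * (width - 2 * pad)
        py = height - pad - (y - y_min) / y_span * (height - 2 * pad)
        return f"{px:.1f},{py:.1f}"

    points = " ".join(to_px(x, y) for x, y in zip(xs, ys))
    bottom = height - pad
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        # x axis along the bottom, y axis up the left-hand side
        f'<line x1="{pad}" y1="{bottom}" x2="{width - pad}" y2="{bottom}" stroke="black"/>'
        f'<line x1="{pad}" y1="{pad}" x2="{pad}" y2="{bottom}" stroke="black"/>'
        f'<polyline points="{points}" fill="none" stroke="black"/>'
        "</svg>"
    )


# A straight line and a parabola (an inverted U), drawn in the same bare style.
linear_svg = minimal_svg(lambda x: x, 0, 10)
parabola_svg = minimal_svg(lambda x: -(x - 5) ** 2, 0, 10)
```

Write either string to a file ending in `.svg` and open it in a browser to see the straight line and the parabola in the same stripped-back style.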

Multitasking While Running VBA

A large amount of the work I do is in Excel, be it simple tables, pivot reports or VBA macros. One hurdle I’ve been facing recently is how to get other work done in Excel while Excel is busy running a macro.

Say your macro runs through a big loop. Even though you’ve optimised it for speed and simplicity (see my Top Tips for VBA Run Speed), it is doing a big job, and big jobs take time! This is often a brilliant excuse to go and make yourself a cuppa and catch up on the water cooler gossip. Regular breaks make you work better and stop you turning into a Square Eyed Zombie. However, I have cracked a way to keep working in Excel while it does your number crunching.

Super simple:

1. BEFORE you run your macro, open a new window of Excel. This is not a new document, but a whole new window (a separate instance of Excel). Do this by right-clicking the Excel icon (which I’m sure you have pinned to your taskbar if you’re a geek like me) and clicking the entry marked “Microsoft Excel”. Check this has worked by viewing two different documents in the two windows: you should have two independent sets of menus.

2. Run your macro in the first window.

3. Work as normal in the second window!

These two windows aren’t linked, so even once your macro has finished you’ll have some limitations when working across both windows (e.g. copying and pasting between them will only paste values, not formulas).