5 Top Tips for Excellent Graphs

Following on from my discussions of the design ideas of Edward Tufte, and before my discussion on mathematics, maths and graphs, I give you my 5 top tips for excellent graphs!

1. Decide your Statement: Absolutely essential! What is it you want to say about the data set you are discussing? Although graphs can show a huge amount of detail and provide fantastic insights into the behaviours being investigated, when presenting a graph it is essential to have one clear statement you would like your audience to take away from it. You might find they take away more, but they have to take away at least that argument. Apply this statement to your graph, make sure that this is what it says, and remove anything in the graph that distracts from it.

2. Could a table represent the data better?: Why are you including this graph? Just because you have a nice data set and think a bar chart would break things up? Because you want to show off the fact that you have worked out how to do histograms?! NO! STOP! Your job as a data scientist is not to produce endless graphs until your inner quota is fulfilled. Your job is to analyse the data at hand and allow your audience to fully understand your work. A table may well be better for this than a graph.

3. Remove chart junk: Remove all those grid-lines! Why do you have all those colours? Remove background colours, data point colour variations and data point shape variations. Remove data labels! Simplify your legend!

4. 3D Visualisations: 3D visualisations should only be used to represent things that are actually 3D! If you’re showing an MRI of a patient or temperature fluctuations throughout your manifold, go ahead. But 3D visualisations should not be used to make things look cool, keep your audience awake or fill more space on your page.

5. Colours: The most effective way to draw the eye to the story a graph tells is to use colour distinctions. Strong, bold colours against a background of muted colours make data stand out. Too many colours will distract the reader and the graph will become unreadable. For data sets of trials where one variable is changed, a change in hue of one main colour can help describe the relationship between the variable and the output.
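Since most of my charts live in Excel, here is a rough sketch of how tip 3 might be automated with a small macro. The macro name RemoveChartJunk is my own invention, and it assumes you have selected a chart before running it. Treat it as a starting point rather than the one true way to tidy a chart.

```vba
Option Explicit

' Hypothetical helper: strips common chart junk from the selected chart.
Sub RemoveChartJunk()
    Dim cht As Chart
    Dim srs As Series

    ' Assumption: a chart is selected before the macro is run
    If ActiveChart Is Nothing Then
        MsgBox "Select a chart first."
        Exit Sub
    End If
    Set cht = ActiveChart

    ' Remove grid-lines on the value axis, if this chart type has one
    If cht.HasAxis(xlValue, xlPrimary) Then
        cht.Axes(xlValue, xlPrimary).HasMajorGridlines = False
        cht.Axes(xlValue, xlPrimary).HasMinorGridlines = False
    End If

    ' Remove background fills from the chart and plot areas
    cht.ChartArea.Format.Fill.Visible = msoFalse
    cht.PlotArea.Format.Fill.Visible = msoFalse

    ' Remove data labels from every series
    For Each srs In cht.SeriesCollection
        srs.HasDataLabels = False
    Next srs

    ' A legend adds nothing when there is only one series
    If cht.SeriesCollection.Count = 1 Then cht.HasLegend = False
End Sub
```

It only removes things. Deciding which colours and labels deserve to stay is still your job.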


Scottish Independence: Analytics Review

Last Thursday, about 85% of Scotland’s registered voters voted on whether Scotland should become an independent country. Today the results are revealed and represented online through various media outlets. Below is my review of how each site has done in terms of data representation.

scotland donut
http://www.theguardian.com/politics/ng-interactive/2014/sep/18/-sp-scottish-independence-referendum-results-in-full

Pros: Clear, concise, informative. Cons: Slightly boring, could show more information on population. The yes/no nature of this vote makes the data quite easy to represent and allows for use of the sister to the pie-chart: the doughnut!

scotland councils

A nice, easy-to-understand and interactive visualisation. Unfortunately it makes the vote look far more “NO” than “YES”, but it does allow for clear comparison of councils.

scotland bar ft

http://www.ft.com/ig/sites/2014/scottish-results/

Weighted bar charts of councils from the Financial Times: Pretty poor show from the FT, whose visualisations are usually much more interesting than this. However, it makes good use of the populations of the different councils. It is difficult to see quite how each council affects the whole vote.

scotland interactive poll ft

http://www.ft.com/cms/s/2/2a5bdce0-c4a4-11e3-b2fb-00144feabdc0.html#axzz3DltADIgK

Interactive graph showing poll results of yes/no/undecided as the campaign has proceeded. This is more like it, FT! Good use of colour and lots of poll results neatly displayed. It allows for comparisons between different polls taken on the same date, and an overall comparison of how the campaign has changed the vote. Very interesting that they included the “undecided” vote in the mix, as this has been one of the pivotal points of the campaign strategy on both sides.

scottish obrian map

http://vis.oobrien.com/indyref/

A nice interactive map from a researcher at the UCL Department of Geography. Good use of colours that show the yes/no vote in a better way than the Guardian visualisation. Excellent use of metrics (turnout % etc.) in the bottom corner. Text is maybe a bit too large, but I’m picking holes really. The really fantastic thing is the display of the vote, the time at which the vote was collected and the previous SNP vote! A very interesting take on how the campaign has potentially affected the SNP vote.

scotland interest worldwide

http://marcellison.com/bbc/tweetmap/index.html

Another map and another take on the data. This time, showing how interested the rest of the world was in the debate. Nice use of size and colour for markers, with a muted world map so as not to distract the reader from the point at hand.

Malcolm Gladwell and the Art of Graphs

Detective of fads and emerging subcultures, chronicler of jobs-you-never-knew-existed, Malcolm Gladwell’s work is toppling the popular understanding of bias, crime, food, marketing, race, consumers and intelligence

If you’ve not read Malcolm Gladwell’s books, or his articles in the New Yorker, he’s worth a look. He is very good at combining data and social critique in a way that is immensely readable. He has a habit of complimenting the reader and taking them on a journey to the other side of his original argument, which is very entertaining. However, what I’m here to talk about today is his work “David and Goliath”, in which he effortlessly presents some nice little snippets of data. If Edward Tufte read his work, he would be quite pleased (see my previous My Data Heroes post, and more on old Tufte later in this series).

Gladwell takes Tufte’s ideas of concise data visualisations and uses them really well. At one point he is explaining that not all relationships are linear (easy to understand if you’ve done a little more than secondary school stats). To do this, he shows a linear graph which is just a simple line with two axes and the upper and lower limits. No grid lines, no “vibrations”, no excessive labelling or useless legend. Then he does the same but shows a parabolic curve instead. Easy.

He has captured the idea behind Tufte’s work. Decide what you want to tell your audience, then tell them that and don’t complicate it with anything else. Then they will understand what you are trying to say. This, although immensely simple, is incredibly difficult for the detail-orientated data scientist. But if they can capture this idea and work it into presentations and reports, they will reap rewards.

Multi Tasking with Running VBA

A large amount of the work I do is in Excel, be it simple tables, pivot reports or VBA macros. One hurdle I’ve been facing recently is what to do when Excel is running a macro and you have other work to be done in Excel.

Say your macro runs through a big loop and, whilst you’ve improved it for speed and simplicity (see my Top Tips for VBA Run Speed), it is doing a big job and big jobs take time! This is often a brilliant excuse to go make yourself a cuppa and check up on the water cooler gossip. Regular breaks make you work better and stop you turning into a Square Eyed Zombie. However, I have cracked a way to work with Excel whilst it’s working on your number crunching.

Super simple:

1. BEFORE you run your macro, open up a new window of Excel. This is not a new document, but a whole new window. Do this by right-clicking on the Excel icon (which I’m sure you have pinned to your taskbar if you’re a geek like me) and clicking the entry marked “Microsoft Excel”. Check this has happened properly by trying to view two different documents in the two windows (you should have two unique sets of menus).

2. Run your macro in the first window

3. Work as normal in the second window!

These two windows aren’t linked, so when your macro is finished you’ll have some limits when working across both windows (e.g. copying and pasting will only paste values and not formulas).

Top Speed Tips for VBA

Five top tips for improving your VBA speed

1. Write good code: Your script is only going to be as fast as its weakest link. Create good habits by keeping your code neat and concise, and your macro will run fast.

2. Screen Updating: Turn off screen updating whilst your macro works. An added benefit is that you won’t have a flashing screen as documents are opened and closed.

3. Automatic Calculations: Turn these off and it’ll stop your formulas recalculating every single time you add to your workbook.

4. Enable Events: This is a bit obscure, but it has saved more than one macro from the recycling bin. Turning events off stops event procedures (such as Worksheet_Change) from firing every time your macro changes something. It might cause you problems if your workbook relies on those events, but it really improves speed.

5. Active Workbooks: Stop activating workbooks. It’s not necessary and will slow your code down. Instead, define your workbooks or ranges or what have you, and reference the locations directly.
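To show how tips 2 to 5 fit together, here is a minimal sketch. The macro name, the sheet name “Data” and the job itself (doubling the numbers in column A) are all made up for illustration. The settings it switches are the standard Application properties.

```vba
Option Explicit

' Hypothetical example: doubles every number in column A of a sheet named "Data".
Sub RunFastLoop()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim oldCalc As XlCalculation

    ' Tip 5: hold a reference instead of activating or selecting anything
    Set ws = ThisWorkbook.Worksheets("Data")   ' assumed sheet name

    ' Remember the current calculation mode so it can be restored afterwards
    oldCalc = Application.Calculation

    On Error GoTo CleanUp
    Application.ScreenUpdating = False              ' Tip 2
    Application.Calculation = xlCalculationManual   ' Tip 3
    Application.EnableEvents = False                ' Tip 4

    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For i = 1 To lastRow
        ' Skip blanks and anything that is not a number
        If Not IsEmpty(ws.Cells(i, "A").Value) Then
            If IsNumeric(ws.Cells(i, "A").Value) Then
                ws.Cells(i, "A").Value = ws.Cells(i, "A").Value * 2
            End If
        End If
    Next i

CleanUp:
    ' Always switch everything back on, even if the loop failed
    Application.EnableEvents = True
    Application.Calculation = oldCalc
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then MsgBox "Macro stopped: " & Err.Description
End Sub
```

The important design choice is the CleanUp label: if the macro falls over halfway through, the settings are still restored, so you are not left with a workbook that has quietly stopped recalculating.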

My Data Heroes

 

Ada Lovelace and Betty Holberton: The first women in computer programming started as “computers”. They were seen as glorified secretaries who would switch out leads into sockets and leave the “thinking work” to the men. However, many women, including Ada Lovelace (a century earlier) and Betty Holberton, were pivotal programmers, even in the face of reduced opportunity and sexism in the workplace (and probably in their home environment) that would make us scream blue murder today.

On her first day of lectures at the University of Pennsylvania, Holberton’s maths professor asked her if she wouldn’t be better off at home raising children [1]

Betty Holberton worked on BINAC, FORTRAN and COBOL (along with another amazing programmer, Grace Hopper). She wrote the first statistical package and sorted out the keyboard, which we still use today.

Ada Lovelace is seen not just as the first female programmer: she is widely credited with writing the first ever computer program. Astounding!

Chandoo

Purna Duggirala: Runs chandoo.org, a fantastic site all about Excel tips and tricks. His website is fantastically comprehensive, easy to read and inspiring. I am always discovering some new trick with Excel from his site, and his challenges are both fun and interesting. The real gem of his site is that it is so focused. He doesn’t overcomplicate things or insist you use VBA (although you should sometimes). He shows you the power of Excel in its raw form.


Edward Tufte: You cannot mention data science visualisation without talking about Eddie. His work is part statistics, part design and part angry rant. He is the yardstick by which many data scientists measure their work. He champions concise, data-rich visualisations that speak to an intelligent audience in a clear manner, with an aim to debunk the phrase “lies, damned lies and statistics”. Many websites have sprung from his popular phrases: Junk Charts is just one example.

Colleagues and Friends: You are my true heroes. Pushing me to learn as fast as you are, to work in sectors as cool as yours, to impress you with my speed and skill and know-how. It’s you girls who inspire me. The fact I can talk to you about work and the meaning of life only impresses me further. Some notable examples of the amazing people I know will be featured in future blog posts. Watch this space.

Who are your data heroes? Who inspires you?

[1] http://en.wikipedia.org/wiki/Betty_Holberton

Starting the Journey

Hello data friends,

This blog is a bid to consolidate and profile my learning experiences and interests in the fields of data visualisation, programming and Excel. My plan is to help others out by detailing the problems I’ve been having, in the hope that other people can learn from my mistakes!