Author: Oliver O'Brien

Visualisation of Census Data

Post author By Oliver O'Brien
Post date 8 October 2009

This is the seventh in a series detailing the projects I have worked on at UCL in the last academic year.

My final mashup over the last year is not a “fully working” website, but rather an interactive mockup of the sort of web application that I could potentially be building over the next year. It takes some of the data tables from the 2001 UK census, creates choropleths of them and shows them on an OpenLayers-powered map. You can jump to an area by searching on a postcode, and show schools as vector points with name popups. If the site looks similar to some of my other work, that’s because it is – it was cobbled together from existing code, as a “quick” demonstration, rather than being a polished product.

censusprofiler

Data Graphics

Data Graphics and Beer Don’t Mix

Post date 7 October 2009

Here’s an example of an outstandingly misleading data-graphic, appearing in this week’s LondonStudent freepaper.

beergraphic

It attempts to show the disparity of bar staff pay across London universities, but:

The “empty” pint glass does not correspond to £0.00. To the casual observer it makes it look like CSSD work for free, until the figures are noticed. In fact, the text of the article mentions that, at two other London university bars, the staff do in fact work for free (or for beer – ouch) – but these are not shown on the graphic.
The graphic is a 2D (i.e. print) representation of a 3D object (a pint glass, tilted slightly towards the viewer) but the scale appears to vary in 1D – the values form a straight line across the “glass”. Hence the graphic has a large “Lie Factor”, the concept discussed in detail in E. Tufte’s totemic book The Visual Display of Quantitative Information (p57 for those making notes!)
LSE’s amount bizarrely isn’t represented at all in the graphic, but appears in a text box above it.
The numbers are on their side, even though there is plenty of room to show them horizontally – making the real values harder to read, so the reader concentrates on the misleading graphic representation instead.
The actual levels don’t bear any resemblance to the values – the ordering is correct but the relative value differences don’t correspond to the “beer levels”. For example, the drop between £5.90 and £5.95 is larger than the £5.95 to £6.25 drop.
Why’s beer being used to represent pay anyway?

Data Graphics Mashups

HEFCE Funding Map

This is the sixth in a series detailing the projects I have worked on at UCL in the last academic year.

hefce

This was a quick mashup to show on a map the latest HEFCE funding round – HEFCE is the government body that decides and awards research funding to the universities around the UK.

This is a vector-based mashup, once again using OpenLayers. For each point, representing a university’s “main” campus, I request a pie-chart from the Google Charts API, and use the resulting image as the marker for the point. There’s no simplification or other generalisation, so, for example, you’ll need to zoom in quite far if you want to make out the different universities in London.
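As a rough illustration of the marker approach described above, here is a minimal Python sketch that builds a pie-chart image URL for one point. It assumes the text-encoded (`chd=t:`) form of the Google Image Charts API, which has since been deprecated. The function name, default size and colours are hypothetical, as are the funding values used to call it; the post does not say what the slices represent.

```python
from urllib.parse import urlencode

# Endpoint of the (now deprecated) Google Image Charts API.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"


def pie_marker_url(values, size_px=60, colours=("FF0000", "0000FF", "00AA00")):
    """Return the URL of a pie-chart image to use as one map marker.

    `values` are the raw slice amounts (hypothetical funding figures here).
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("need at least one positive value")
    # Text encoding (chd=t:) expects values on a 0-100 scale.
    percentages = [round(100.0 * v / total, 1) for v in values]
    params = {
        "cht": "p",                                # chart type: pie
        "chs": f"{size_px}x{size_px}",             # image size in pixels
        "chd": "t:" + ",".join(str(p) for p in percentages),
        "chco": "|".join(colours[: len(values)]),  # one colour per slice
        "chf": "bg,s,00000000",                    # transparent background
    }
    # Keep the API's own separators readable instead of percent-encoding them.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,|")
```

The returned string would then be set as the external graphic for the point's marker style, one URL per university.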

It was cobbled together in about a day; the Thematic Mapping blog was particularly useful for getting the images working as markers.

You can see the mashup here.

Leisure

Finsbury Parkrun

[Updated] North London finally has its own parkrun – a free 5K timed race, starting at 9am every Saturday in Finsbury Park. The inaugural is on 31 October.

And unlike Bushy Park, this one has hills…

Update: It looks like a Victoria parkrun might be on its way too, permissions pending… and also a Greenwich parkrun, which could be just as hilly as Finsbury.

Mashups OpenLayers

Manchester Map

This is the fifth in a series detailing the projects I have worked on at UCL in the last academic year.

manchester

This was born out of Alex acquiring an old (1904) map of Manchester showing the different housing types that existed back then. I put together some image tiles he created of the map, in OpenLayers, and combined them with some existing modern OAC demographic map tiles that I had created for a separate project, and Google mapping and aerial imagery. Vectors are not used on the map, but you can switch between, combine and reorder the raster layers, for quick visual comparison between the maps.

Alex blogs the creation process here, and you can see the map itself here.

Orienteering

The Amazing Maze

I ran in a rather unusual orienteering event yesterday – a maze cut in a field of maize. It was the Maize Maze-O Challenge, taking place at the National Forest Maize Maze in the Midlands and organised by Stodgetta of WCH. I’d never heard of maize mazes before, but there are several throughout England. They generally have a new design each year: the crop is planted, the pathways have their crop removed, the field grows and the maize appears, before being cut down for animal feed at the beginning of October.

The format of the race was two qualifying runs, both less than a kilometre straight-line but well over 2km by the shortest path. These were in the afternoon. There was then a pause while the sun went down (and the competitors took advantage of the local catering – burgers from cows fed on last year’s maize) and then the finals – in the dark! A first at night-orienteering for me, but I just about made it around, along with around 100 other connoisseurs of unusual orienteering races!

The map and pic (of Mike G from SLOW entering the maize maze for the night final) are on the WCH website. The GPS trace is of my night final route.

Data Graphics Mashups OpenLayers

HE Profiler

This is the fourth in a series detailing the projects I have worked on at UCL in the last academic year.

The HE Profiler is the last of the three “core” school-profiling map mashups that I have developed over the last year – this has been developed over the last few weeks and indeed was finished only today, my final project of the year.

It is designed to be used by university widening-participation administrators, as a graphical tool to discover and evaluate the schools to target for campaigns to encourage university application. To do this, it makes use of two metrics – the OAC demographics of pupils attending each school, and the POLAR score of their postcode – in simple terms a National Statistics demographic describing the likelihood that people from this postcode go to university.

Again it is powered by OpenLayers, displaying point-based vector information on top of Google Maps image tiles, using NPE data for geocoding postcodes. The most interesting thing about this application is that I’ve started to explore the very powerful rule/attribute-based symbolisation for points available in OpenLayers. This sort of symbolisation will be, I expect, very useful in next year’s project. I am very impressed with what can be done – some quite GIS-like properties present in a popular and freely available web application.

heprofiler

The graphic above shows target schools for a central-London university, based on the proportion of POLAR1/2 pupils (least likely to go to universities) compared with the rest. Schools with a majority of pupils in this category are coloured red. The area of each circle represents the number of such pupils present. The poor representation at university of the Thames Gateway region can be clearly seen. As an aside, the OAC demographic, not shown here, does not work well for London due to its size – the OAC is calibrated across the whole UK, and it is likely a more specific demographic analysis for London (e.g. LOAC) for schools there, would be more useful.
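The symbolisation rule described above (circle area proportional to the number of POLAR1/2 pupils, red when they form a majority) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the OpenLayers style rules used in the application; the function name, the maximum radius and the grey colour for non-majority schools are all assumptions.

```python
import math


def school_symbol(polar_low_pupils, total_pupils, max_pupils, max_radius_px=20.0):
    """Return (radius_px, colour) for one school marker.

    polar_low_pupils  pupils at the school in POLAR1/2
    total_pupils      all pupils at the school
    max_pupils        largest POLAR1/2 count on the map (gets max_radius_px)
    """
    if total_pupils <= 0 or max_pupils <= 0:
        raise ValueError("pupil counts must be positive")
    # Area proportional to the count means radius proportional to its square root.
    radius = max_radius_px * math.sqrt(polar_low_pupils / max_pupils)
    # Schools where POLAR1/2 pupils are the majority are highlighted in red;
    # the colour for the remaining schools is an assumption.
    colour = "red" if polar_low_pupils / total_pupils > 0.5 else "grey"
    return radius, colour
```

Scaling the radius by the square root matters: scaling the radius directly by the count would make a school with four times the pupils look sixteen times as large.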

Mashups

School Catchments

This is the third in a series detailing the projects I have worked on at UCL in the last academic year.

School Catchments was developed to complement the Education Profiler. Like the Education Profiler, it is powered by OpenLayers, but uses Google Maps tiles as its background rather than custom OpenStreetMap-based ones. For each of the schools in the database, it loads in “catchment maps” for the GCSE and A-Level students. OpenLayers’ vector mapping is used to display them on the Google Map. These reveal the “real” catchment areas for a school, which may or may not correlate with the school’s official catchment area, if available (many schools choose not to publish this information).

Most schools are roughly in the centre of their catchment area, but geographies, spatial distribution of population – and geodemographics – can all act to distort the contour’s shape. In general, the Roman Catholic and other faith schools have noticeably large catchment areas. Of the regular schools, poorly performing schools often have very small catchments. It should be noted that we only had geodemographic data for the state school sector.

The process of developing the catchment maps was interesting – the known postcodes for each pupil are geocoded and the resulting grid of locations is simplified into a single “contour”, generally surrounding at least 60% of the pupils. One requirement was that only a single contour is produced – problematic when one school serves two small and well-spaced villages. The contours were created in R, a statistical programming language popular in the academic community – it’s popular in the department, but quite alien to my Java/PHP/Python past. I found the learning curve rather steep and am still a “non-proficient” R coder. Most available documentation for R assumes a level of experience I do not have. Searching for R information in Google is a pain too, thanks to its name.
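To give a feel for the idea of a single outline around a fixed share of pupils, here is a deliberately simplified Python sketch. It is not the R method used for the real contours: it swaps in a convex hull around the pupils living nearest the school. The function names are hypothetical, and coordinates are assumed to be planar (for example grid eastings and northings).

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each chain ends where the other starts, so drop the duplicates.
    return lower[:-1] + upper[:-1]


def catchment_outline(pupil_xy, school_xy, share=0.6):
    """Single outline around the `share` of pupils living nearest the school."""
    keep = math.ceil(share * len(pupil_xy))
    nearest = sorted(pupil_xy, key=lambda p: math.dist(p, school_xy))[:keep]
    return convex_hull(nearest)
```

A convex hull always yields one shape, which satisfies the single-contour requirement, but it handles the two-villages case crudely by spanning the empty space between them.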

catchmentgen

The generation process is highlighted in the second half of this screen video that my boss produced for a presentation – in the context of the School Profiler that also included the catchments.

The website is also the first eCommerce site I’ve developed (my boss is planning on the catchment vectors being available only through a subscription.) It was interesting getting to grips with the eCommerce paradigm. After struggling for a while to try and make sense of Google Checkout (having assumed, incorrectly in this case, that Google=easy) I switched to using NoChex as the payment processor. It is impressively simple to use, if not well documented, and I got the end-to-end flow set up in only a couple of days.

The site is still in “test” mode and needs some polishing (and the underlying data updated) before a public release.

Mashups OpenLayers

The Education Atlas

This is the second in a series detailing the projects I have worked on at UCL in the last academic year.

This is a mashup, powered by OpenLayers, and using network data from OpenStreetMap (OSM) to provide a “contextual window” on top of choropleths (colour-region maps) representing various educational attributes. Both the choropleths and the OSM maps were created using Mapnik. Data from the NPEMap project is used to provide geocoding (locating from postcodes). Schools from the ShowUsABetterWay competition are available as a simple point-based vector layer.

This project has been through various iterations before ending up as a (sort-of) finished product. An earlier version was briefly demoed at the GISRUK 2009 conference in April at Durham. This was an “all-singing, all-dancing” mashup, which wowed the judges at the conference (it was entered in, and won, the Mashup Challenge competition) with its many layers and features, but was probably too complicated for the intended end use.

The functionality has been split into three different mashups – the first, the choropleths, form the Education Atlas. The school catchment contours are in a separate mashup, School Catchments, which I’ll talk about in a future post. The detailed metrics about each individual school are in a third application.

The choropleths mainly relate to academic attainment and geodemographic background (for GCSE pupils) and A-Level subject choice. Some interesting patterns emerge, for example French is particularly popular in Kent (funny that…) and Geography is more popular in the rural north of England than in the cities – as shown below. The demographic maps show a characteristic pattern of city poverty/underachievement compared with rural areas.

eduatlas

The resulting slimmed-down application is available at http://atlas.publicprofiler.org/. However, it is only soft-launched, as the data is quite old and there are some noticeable gaps in coverage, particularly in Manchester and Hampshire, where state school pupils generally don’t have any sixth-form provision in their secondary schools.

Notable features, apart from the bespoke black-and-white “network” layer, are the keys, which change depending on the choropleth selected.

I presented some screenshots of the mashup, and talked about how it was made, at the RGS conference in Manchester, in August.

A screenshot of the mashup forms the banner of this blog.

Data Graphics Technical

Spatial Interaction Modelling for Access to Higher Education

Post date 29 September 2009

This is the first in a series detailing the projects I have worked on at UCL in the last academic year.

My main project through the last year has been to test a hypothesis, developed by Professor AG Wilson, that the flows of students moving from school to university can be approximated by spatial interaction modelling (SIM). Put simply, SIM is a variant of the 300-odd-year-old Newton’s Law of Universal Gravitation, i.e. the attraction between two masses depends on each of their masses and the distance between them. Replace the masses with the number of final-year pupils at a school and a university’s capacity, make the distance decay exponential instead of inverse-square, and that’s the basics of the model. A similar theory has been applied to great effect by Joel Dearden of CASA, in his retail SIM, which has shown a “tipping point” explaining how supermarkets and out-of-town retail developments have become attractive to shoppers over the last forty years.

Of course, it’s a little more complicated than that, and even with the more complex model I’ve tested, a large number of simplifying assumptions have to be made.

The two main extra parameters that are added to the model are (1) that universities have an “attractiveness factor” above and beyond their size. I have used one of the common university league tables to provide values for this factor. And (2) the distance-decay is not uniform across all types of school students, but varies by their background. By splitting up the final-year school students by demographic, the variation in the distance-decay can be seen, and this is used to calibrate the model.
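To make the shape of such a model concrete, here is a minimal Python sketch of a gravity-style flow calculation with exponential distance decay and an attractiveness weight. It is an illustration under stated assumptions rather than the model tested in the project: the function name, the argument layout and the choice to scale each school's flows so they sum to its pupil count are all mine.

```python
import math


def predict_flows(pupils, capacity, attractiveness, distance_km, beta):
    """Predicted flows from each school to each university.

    pupils[i]          final-year pupils at school i
    capacity[j]        places at university j
    attractiveness[j]  league-table style weight for university j
    distance_km[i][j]  school-to-university distance
    beta               distance-decay parameter (per km)
    """
    flows = []
    for i, n_pupils in enumerate(pupils):
        # Unnormalised pull of each university on this school:
        # size times attractiveness, damped exponentially with distance.
        pull = [
            capacity[j] * attractiveness[j] * math.exp(-beta * distance_km[i][j])
            for j in range(len(capacity))
        ]
        total_pull = sum(pull)
        # Scale so that the school's flows add up to its pupil count.
        flows.append([n_pupils * p / total_pull for p in pull])
    return flows
```

To reflect a distance decay that varies by background, one option would be to call the function once per demographic group, passing that group's pupil counts and its own `beta`.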

simdecay2b

The seven OAC demographic supergroups are shown here – the horizontal scale is distance and is the same in each graph. (Only English-based school students going to English universities are considered in the study.) The vertical scale is the proportion of students, of that OAC supergroup, in each distance bucket. The actual number of students in each supergroup varies dramatically and this is not shown in the graphs.

The graphs show there is indeed considerable variation between supergroups in the “beta value” of the drop-off if approximated as exponential, and also in the “R-squared” fit to true exponential decay.

Blue collar.
City living – this group strongly favours London, Birmingham and Manchester, i.e. the same or other “big cities” in England, hence characteristic peaks appear at these distances – accentuated by the relatively small school-age population in this group.
Countryside – this group rises before falling, as there is a minimum distance they need to travel to get to even their nearest university.
Prospering suburbs – the lowest beta-value, in other words this group attaches the least importance to school-university distance.
Constrained by circumstance – similar to the first group.
Typical traits – the “average” group which encouragingly also has an average looking graph.
Multi-cultural – more distance-sensitive than the others – hence the very steep drop-off. This shows that people living in areas classified as multi-cultural will more strongly desire going to a university that is very local to their home.
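The “beta value” and “R-squared” figures mentioned above can be estimated in several ways; the post does not say which was used. One common approach, sketched here in Python as an assumption, is an ordinary least-squares fit of the logarithm of the proportion against distance. The function name is hypothetical, and the R-squared it returns is measured on the log scale.

```python
import math


def fit_exponential_decay(distances, proportions):
    """Fit proportion = a * exp(-beta * distance) by log-linear least squares.

    Returns (beta, r_squared). Buckets with a zero proportion are skipped,
    because their logarithm is undefined.
    """
    pairs = [(d, math.log(p)) for d, p in zip(distances, proportions) if p > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two non-empty distance buckets")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    slope = sxy / sxx
    # For a straight-line fit, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared
```

A group whose graph rises before falling, like the Countryside one, would show up here as a lower R-squared, since a single exponential cannot describe the initial rise.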

Prof Wilson’s theory also factors in the subject that the student is studying (not all universities offer all subjects, and most are strong in certain subjects and weak in others), and their attainment at school (i.e. they might really want to study Maths at Oxford, and be at a school very nearby, but if they get a D in Maths at A-Level, they aren’t going to be able to do that).

Universities also come in two types – “recruiting”, where there are more places than students genuinely intending to study there, and “selective”, where there are more prospective students than places. One interesting effect of the recent economic downturn is the massive increase in people applying for university in 2009-10 – UCL saw a 12% increase for undergraduate courses, for example. This has had the effect of making more universities selective.

In order to consider two types in the same model, it was necessary to develop what is known as a “partially constrained” SIM. The details are for a future article, but, put simply, an iterative approach, assigning students to a university and then reassigning the weakest for over-capacity universities, is taken.
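Pending that fuller write-up, the general flavour of an assign-then-reassign loop can be sketched in Python. This is a toy illustration and not the partially constrained SIM itself: it assumes each student has a ranked list of universities and a single attainment score that decides who counts as “weakest”. The function name and the university codes in the usage note are hypothetical.

```python
def assign_with_capacity(preferences, attainment, capacity):
    """Iteratively place students, bumping the weakest from full universities.

    preferences[s]  list of university ids for student s, most preferred first
    attainment[s]   score used to decide who is weakest
    capacity[u]     number of places at university u
    Returns a dict mapping each student to a university, or None if unplaced.
    """
    next_choice = {s: 0 for s in preferences}  # index of the next option to try
    placed = {u: [] for u in capacity}
    unassigned = list(preferences)
    while unassigned:
        # Step 1: every unassigned student is sent to their next option.
        # Students with no options left simply drop out of the loop.
        for s in unassigned:
            options = preferences[s]
            if next_choice[s] < len(options):
                placed[options[next_choice[s]]].append(s)
                next_choice[s] += 1
        # Step 2: over-capacity universities keep their strongest students
        # and release the rest to be reassigned on the next pass.
        unassigned = []
        for u, students in placed.items():
            if len(students) > capacity[u]:
                students.sort(key=lambda s: attainment[s], reverse=True)
                unassigned.extend(students[capacity[u]:])
                del students[capacity[u]:]
    result = {s: None for s in preferences}
    for u, students in placed.items():
        for s in students:
            result[s] = u
    return result
```

With one place at a selective university `SEL` and five at a recruiting one `REC`, three students who all prefer `SEL` end up with the strongest at `SEL`, the next at `REC`, and anyone who listed only `SEL` unplaced. A recruiting university never fills, so it never triggers the second step.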

I built a GUI in Java – it’s the language I’m most comfortable with for “proper” programming – to quickly visualise the results and compare them with real-life flows. Here’s a bit of it:

simpredicted

This shows the perhaps not very surprising prediction that BIRM7s (multi-cultural school students living in Birmingham) are pretty likely to also go to university in Birmingham (AST = Aston, BCU = Birmingham City University, BIR = University of Birmingham), rather than elsewhere in the country.

When compared with the actual flows:
simactual
…the model under-predicts the flow to Birmingham City University, possibly because BCU’s desirability amongst this demographic group is mis-calibrated. Further-education students are also not present in the predicted model, but are included in the actual flows, so the two are not, as presented, normalised.

The model needs to be developed further before it can be presented formally. In particular, attainment is almost certainly a necessary component.