Adventures in the Dark Web of Government Data

Thanks, everyone, for coming. I'm excited to have the opportunity to share some of my adventures in, and my unreal fascination and passion for, public data. I'm Mark, from New York City originally. Sometimes I go around the city with my laptop and a large antenna and tune into some of the fun things that can be overheard on the radio spectrum. As mentioned, I do a lot of work with public and government data, and at Enigma, the company I started, we have a big open-source search engine called Enigma Public for all of this material that we aggregate and bring together.

To kick things off, it would be helpful to get some clarity on terms: what exactly is government data, and does it really have a dark web? One of the easiest ways to think about this more expanded idea of government data is that it's the thing that's produced every time you come up against regulation. We have these sprawling bureaucracies at the federal, state, and local levels, and every time you touch them they have a way of kicking off some data exhaust. From a reconnaissance and open-source intelligence perspective, this can be really good.

One of the really interesting maps, I think, at least at the U.S.
federal level, to what's going on from a data collection perspective came out of a 1980 law called the Paperwork Reduction Act. What happened in the '70s is that there was a massive proliferation of forms and other government information collection instruments, and it rose to the level where Congress passed a law. What that law said was, basically, that every time the federal government wants to make a new form, it has to fill out a form itself and register it with the Office of Management and Budget, which is part of the executive branch.

Just to show you: this is an ordinary tax return, a 1040, and it has this OMB control number on it. This is great: any time you have a U.S. federal government form, it will definitely have an OMB number on it somewhere. When a government agency wants to make a new form, it has to apply to the OMB, and one of the things it has to do is justify why it needs the form and estimate what are called the burden hours of the form. Right now there are maybe about 10,000 unique forms registered with the federal government, and what I find extraordinarily remarkable is that, according to the government's own estimates, they require 11.3 billion hours of people's time each year to fill out. So we can certainly extrapolate that there's a lot of information being produced here.

Just to flag this, in case it's interesting to you and you want to explore further: it's kind of hard to Google for, but it's called the current inventory report. I made a little bit.ly link that will drop you right into the proper government site. It's kind of fun because there is an XML file that has all of this structured in it, and you can go and play with it.

And so, just to flesh out
what this real spectrum of information the government produces looks like: if you've come into the U.S. from abroad, you've probably seen this form. It's one of the most filled out, with over 300 million of them a year. Then there are things like the W-2, the tax form you get if you're on payroll somewhere; about a quarter billion of those are produced a year. I was surprised to see that about 90 million of these friction ridge cards are filled out every year, and I suppose it's not strictly for people being arrested. This one, which I just found on Google Images, is for someone applying to be a pyrotechnic operator, so I suppose these things are produced in lots of ways.

Those are some of the more common forms, but there is also a really long tail here: everything from the 20 or 30 companies that fish off the Alaska coast near Russia and have a specific form they need to fill out, to the importation of shelled peas from Kenya, to things like the Petroleum Supply Reporting System, which I'm not sure exactly what it is, but it does sound like it could be interesting and juicy in some ways.

Once you know that there is a form out there, you can start to go out and collect information. They're of course not all public, but there's a really interesting tranche of them that are. Just as an example, this is what a Federal Election Commission form looks like. I don't know if you can see it, but this is a line-item-by-line-item disbursement schedule for all of the things that the Trump campaign spent money on, so we have a $140 Uber credit there.

I'll talk a little more about this data set later, but the FCC licenses all commercial radios in the country in some way, and you can use that data set to actually find every McDonald's drive-thru in the country and also what frequencies it's linked
to the restaurant with. Certainly, if you've been in elevators, you'll have seen these little inspection placards. Those are run at the state level, but that's the sort of data you can get and use to learn all about what's going on inside of a building. Aircraft registrations are really interesting too: they have tail numbers and all sorts of interesting joins you can do with radios.

This is an example: it's just the deed for the hotel that we're in. When you take it a step further, it's kind of cool, because whenever building permit applications are filed for changes of use in the space or for renovation, there are often architectural drawings and things like that. So this is also from the hotel.

The Department of Labor collects a lot of information. This is, I think, from OSHA, the Occupational Safety and Health Administration; it's a bit of a sad case of someone who fell down an elevator shaft here, but there's a lot of information produced. H-1B visas: I couldn't show it very clearly here, but these are the 14 or 15, or whatever it is, H-1B visas that Caesars applied for in 2018, and you can see they're mostly tech, programming-looking people. This I just found and thought was kind of funny: it's the 401(k) plan that DEF CON Communications has for the four people who are enrolled in it.

And this is actually one of my all-time favorite pieces. This is a customs declaration from the 1960s that the Apollo 11 mission filed upon coming back with moon rocks. It's just a lovely artifact of bureaucracy, I think, and it gives us some sense of the kinds of things that appear hidden away in the state.

I think the takeaway that I want to leave you with, having blown through all of that, is that government bureaucracy can really be your friend. There's certainly a key set of sources of government data
that are in our toolkits, be they real estate records or corporate registrations or whatever, but this is a really deep and massive well of resources. By thinking about what the processes are and how they potentially get reflected in data, you can start to develop all sorts of new avenues for research and exploration.

I have a personal interest in software-defined radios and the EM spectrum, and I was really curious to see how the public data that's available around usage of the electromagnetic spectrum could be used to ask different questions of the world. I'm sure this won't be a surprise to anybody in this room, but radio waves are all around us. They're in that really cool spectrum that takes us from the visible light we see around us to the FM stations in our car and the Wi-Fi, and all of these things are just waves of different lengths.

Marconi is often credited as being one of the inventors of radio, and in its early days it was, not surprisingly, a terribly unregulated and quite chaotic technology. People were just broadcasting and creating all sorts of interference. Actually, a lot of the regulatory regimes that we now have in the U.S.
are said to have come about as a result of the sinking of the Titanic, in part because the Titanic, being a new ship, did have a radio operator on it and was sending out SOS messages, but the thought was that there was so much interference from the land-based stations that a lot of those messages weren't received. That led, in 1912, to Congress passing what was called the Radio Act, which became the precursor to the FCC regulatory regime that we have today. So we now live in a world where there is a lot more attention to and regulation of the radio spectrum, and that's actually really cool and exciting when it comes to trying to understand how this spectrum is being used.

I'm just curious, show of hands: has anyone seen this map before? It looks like maybe 20% of people. I'll keep coming back to it throughout the remainder of the talk because I think it's a really good touchstone for understanding how a lot of these things exist next to each other. I know it's a little difficult to get much detail on the screen, but it goes from maybe 3 kilohertz at the top to 300 gigahertz at the bottom, and each one of those little blocks is a reserved set of uses that that bit of the spectrum can be used for. You can see here the FM radio band, roughly 88 megahertz to 108 megahertz, blocked off there. What's interesting is that these things exist next to and alongside other uses of the spectrum. Further down, in the 150s to 160s megahertz range, is where this thing called AIS, which is maritime ship positioning data, is transmitted. And then further down, in the next block, at ten hundred and
ninety megahertz, or just about a gigahertz, is where all of the aircraft are broadcasting their positioning data. I call that out to show how these different protocols and uses of the spectrum have a kind of continuity to them.

Of course there's a ton of politics and money at stake here. As we know, as analog television has been shut down, that spectrum is getting sold off, and just last year twenty billion dollars was spent, mostly by the big telco companies, to get access to some of what was freed up. Needless to say, this is a very high-stakes, if somewhat obscure and invisible, place where data is produced.

In the U.S. the FCC is the main regulatory body here, and it collects a ton of information and releases it in two different ways. The first one, which has the most data, is the Universal Licensing System. There are maybe fifteen or sixteen different kinds of licenses that wind up being granted, and each one has a lot of detailed information associated with it. As part of an open data initiative, the FCC has done some work to unify all of that into a database called License View. I think there are maybe a hundred columns that are harmonized across all of these things. What's nice is that it collects all of these different licenses in one place and pops out as one CSV file.

This is a bit.ly link to a GitHub repo I made. If you have a Postgres server running, you can run the script and it will download the most recent version of this database, geocode it, geo-index it, and make it searchable for you. The cool thing is that once you do that, you can start to use this data to ask really targeted and
specific questions about your local environment. Here I did a search of a one-kilometer radius around the Caesars hotel and asked: of all the licenses that have been given out within a kilometer of here, who has the most of them, and what are the rank-ordered counts of how the spectrum is being used? Probably not super surprisingly, the top three are all Nextel; this is your cell phone stuff. But digging in, it was interesting for me to start to learn where I am and what's going on around here.

Perini Building Company is a legitimate construction firm that has no ties with the Mafia, but it has done a lot of the casino construction in Las Vegas and is certainly one of the biggest holders around here. Drilling down, we see that a bunch of the casinos themselves are really big recipients of licenses. I was surprised to see this firm ReconRobotics, which will come up later in the talk and which, by its own tagline, is "the world leader in tactical micro-robot and personal sensor systems." It has a good 32 licenses right in this part of Las Vegas. That in fact puts it on par with DEF CON, which I was very impressed to see is quite fastidious about making sure that the official FCC licenses are all filled out.

One other thing I'd call out, which I think is really important to keep in mind when you're working with these government data sets, is that there can often be a lot of confusion and difficulty when it comes to entity recognition and resolution. Towards the bottom here we have PHWLV LLC, and I wondered, what is that? In fact it's the parent company, Planet Hollywood Holdings, for this casino and many others.

So now that you can start to identify what's going on around you geographically,
how can you start to use and apply that? It's been quite amazing to see, over the last several years, how cheap software-defined radios have gotten and how much that has opened up. For those of you who don't know, for literally 20 bucks you can get a little USB dongle that will let you tune into a pretty broad spectrum. I think these go from maybe 50 or 60 megahertz to just over a gigahertz, something like this. They're really cool and very easy to get started with. This is a program called Gqrx, which is just a really simple tuner: if you plug in one of these USB dongles and put in a frequency, you can listen to whatever might be coming across it.

What's interesting is that we can not only look at the clustering of radio licenses around us but actually dig into them more specifically. What's really nice about these is that you get some very high-resolution information about how organizations operate and function. This one is for the Caesars hotel; it's one of many that they have. What's interesting is that the person who actually filled out this license is named Eric Dominguez, who is the VP of facilities and engineering here, and what's also included is his phone number and an email address. It is his direct line. I called him, so I know that to be true. These things become interesting when you're trying to think about other ways of understanding a target or a place of interest, and finding things that give you a lot of base knowledge about what's going on.

If anyone's interested, these are a big tranche of the radio frequencies that Caesars Palace itself has licenses for. There are other ones under other entities that came up in my first search, but they can be ferreted out. And just to remind us
to keep all of this in context: we can see that these Caesars Palace radios are in the 450 megahertz zone, but just a little bit down the spectrum we've got the radio frequencies being used for the control infrastructure around the water system in Las Vegas. So it's a very rich and crowded space.

Of course this isn't limited to these sorts of things. NOAA-19 is a weather satellite that's flying around overhead, and it operates in the 137 megahertz range. A friend of mine in New York built an antenna and set a Google Calendar reminder so that whenever this weather satellite is over the eastern seaboard, he can bring the antenna outside and actually download the images, because these satellite transmissions are coming down unencrypted and are there for the gathering. That's the URL for it, if anyone's interested.

I was also very curious to see in what ways different kinds of public data could be joined with what we know is available on the radio spectrum in order to do things like, maybe, look inside of a cargo ship. Today ships are really diverse radio stations in and of themselves. You can see here that you have GPS antennas and maybe satellite TV and ham radio antennas, but importantly, up here on the top left, is an AIS antenna. AIS stands for Automatic Identification System, and it's basically a radio protocol used for navigation and safety. Whenever a ship is under way, it broadcasts some information on this channel. It all lives around 160 to 162 megahertz, I guess; there are two different channels that it goes on. What's interesting is that if you have a line of sight or a decent antenna, you can actually, using one of these $20 dongles as an example, receive the AIS messages that the ship is sending off. Here you can see in this
text box what the raw demodulated packets look like. There's a great Python library called libais, and there are many others, where people have taken the spec and implemented all the decoding. The data you're getting when you're listening to these ships breaks down to what you're seeing here. It tells you things like the position, the heading, the rate of turn, and things like that, but importantly it also has this thing called an MMSI. MMSI stands for Maritime Mobile Service Identity; it's basically like the cell phone number of the ship, and you can use it to join with a second-order piece of government data. I wrote an API, linked in the repo that I showed earlier, that connects to the International Telecommunication Union to take the MMSI of the ship and turn it into the vessel name and some other information about the ship itself.

Once you have these few pieces, you can get to the place where you can actually look inside of a ship. The conceit here is to take bills of lading data, which often get filed before a ship hits the port and which explain, for the purposes of customs taxation, everything that's inside of the ship. That data is made available in a very crazy way: the only way anyone can get access to it is by going to the Customs and Border Protection office in Washington, DC, giving them a $100 certified check, and getting a CD in return. But through Enigma Public we actually gather all of that; it's free, with an API on it, and so you're able to stack all of these things together.

I'm just checking on time. Okay. So that's one example. Another one I'll quickly talk about uses ADS-B data, which is very similar to AIS but is for aircraft, and
there's a really interesting piece of work that was done by BuzzFeed, specifically around looking at the extent to which governments were using Stingray devices, which are often put in aircraft and flown in circles when they're going after a target. Stingrays, of course, are a way to track and intercept very specific cell phones. What they did that was really smart was take all of the ADS-B flight data, which companies like FlightAware and others aggregate for the whole U.S., and apply some basic analytics to it to look for all of the flight patterns over cities where planes were flying in circles a lot. Based on that, they were able to identify not only airplanes that were very clearly registered to Homeland Security or to a police department but also all of these new companies that were shell companies being used by the government, which they were able to back into once they knew that those companies were potentially of interest because of these unusual flight patterns.

When we think about all of the different radio devices that surround us all the time, there are a lot of opportunities for and examples of taking this contextual public data and applying it to those devices.

In closing, since we're coming up on time, I want to tell you about another investigation that I did, around trying to understand the surveillance infrastructure along the U.S.-Mexico border. What you're looking at here is a slightly interpolated map of all the radio licenses that are within 10 kilometers of the U.S.-Mexico border. When I was looking at them, I did see the normal dispersion patterns around cities, where of course the radio towers and uses are all over the place, but what I was very
interested in was seeing, out in some of these more remote desert frontier places, very regularly spaced towers being put up along the border. This one in particular was put up by a company called IMSAR, so I started looking into what IMSAR does. Well, they make the radar packages, the ground radar packages, that go on Predator drones and other things like that. So I thought it could be interesting to dig in and get a sense of who and what else is happening along the border.

This is just a count of all of the entities that are showing up doing experimental work specifically along the border. I've called out that company ReconRobotics, the one I mentioned earlier, which is also doing a lot of work around this hotel. But then I actually wanted to look at all of these companies, and found, not so surprisingly, that the vast majority of them are defense contractors of different stripes. Starting to go through and look at who these companies are and what they're doing, I stumbled upon all of this really fascinating technology. TCOM makes these aerostat blimps that are used as surveillance platforms. Leonardo DRS is an Italian defense contractor, and they purport to have the most widely used ground surveillance radar. You see a lot of these interesting packages. Elta is an Israeli defense company that does a lot of border security work and is also working there, as is Elbit Systems.

What's really interesting is that you can again pivot from these very specific licenses, or these aggregates of licenses, to go and look at where the sites are and where these sorts of things are happening. It was kind of incredible for me to then actually be able to go over to
Google Maps, punch these things up, and start to see all of the sites where these bits of exploration and prototyping of this virtual fence are starting to happen. As a last piece of context, a bunch of these were part of an older program that Boeing had, which wound up being a massive disaster. They were supposed to be able to cover the entire border for 7 billion dollars but wound up spending a billion dollars to do only 50 miles, and the thing didn't even work.

The thing that I'll leave you with, and which hopefully came across in the talk and through these examples of what's possible with data more generally, is to really think about not only where these deeper, perhaps unseen bits of data are, but how they can be put together to tell us broader stories. So anyway, thank you very much. [Applause]
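The one-kilometer license search described in the talk can be sketched roughly as follows. This is not the speaker's script, which is described as loading the FCC data into a Postgres server; it is a minimal pure-Python illustration of the same idea, assuming each license record has already been reduced to a licensee name and a latitude/longitude. The record layout, the licensee names, and the center coordinates are all hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def licensees_within(records, lat, lon, radius_km):
    """Rank licensees by number of licenses within radius_km of (lat, lon)."""
    counts = Counter(
        rec["licensee"]
        for rec in records
        if haversine_km(lat, lon, rec["lat"], rec["lon"]) <= radius_km
    )
    return counts.most_common()


# Hypothetical records; real rows would come from the FCC license CSV.
SAMPLE_LICENSES = [
    {"licensee": "EXAMPLE CASINO LLC", "lat": 36.1170, "lon": -115.1750},
    {"licensee": "EXAMPLE CASINO LLC", "lat": 36.1155, "lon": -115.1738},
    {"licensee": "EXAMPLE BUILDERS INC", "lat": 36.1181, "lon": -115.1762},
    {"licensee": "FAR AWAY RADIO CO", "lat": 36.1700, "lon": -115.1400},
]

# Approximate, assumed coordinates for a search center on the Las Vegas Strip.
CENTER_LAT, CENTER_LON = 36.1162, -115.1745

if __name__ == "__main__":
    ranked = licensees_within(SAMPLE_LICENSES, CENTER_LAT, CENTER_LON, 1.0)
    for name, count in ranked:
        print(f"{count:3d}  {name}")
```

For the full national data set a spatial index in the database would be the more practical choice; the linear scan here is only meant to make the query logic visible.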
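To make the AIS step in the talk concrete, here is a small hand-rolled sketch of how the MMSI can be pulled out of a received sentence. It does not use libais or reproduce its API; it only follows the general AIS framing, in which each payload character carries six bits, the first six bits give the message type, and bits 8 through 37 give the MMSI. The sample sentence is an illustrative one and the function names are made up for this example. Checksum validation and multi-part messages are deliberately left out.

```python
def payload_to_bits(payload: str) -> str:
    """Expand an AIS armored payload into a string of '0'/'1' characters."""
    bits = []
    for ch in payload:
        value = ord(ch) - 48
        if value > 40:          # skip the gap in the AIS character table
            value -= 8
        bits.append(format(value, "06b"))
    return "".join(bits)


def decode_header(sentence: str) -> dict:
    """Return the message type and MMSI from a single-part AIVDM sentence."""
    fields = sentence.split(",")
    payload = fields[5]         # sixth comma-separated field holds the payload
    bits = payload_to_bits(payload)
    return {
        "msg_type": int(bits[0:6], 2),   # bits 0-5
        "mmsi": int(bits[8:38], 2),      # bits 8-37, a 30-bit integer
    }


# Illustrative sample sentence (assumed input, not captured for this document).
SAMPLE_SENTENCE = "!AIVDM,1,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C"

if __name__ == "__main__":
    header = decode_header(SAMPLE_SENTENCE)
    print(header["msg_type"], header["mmsi"])
```

The MMSI that comes out of this step is the join key the talk describes: it is what gets looked up against the ITU ship records to recover a vessel name.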

The Open Government Data Revolution

So I think what I want to say dovetails very well with the previous talk. Many of the examples you were given of effective modern governance at the city level are going to end up drawing on this foundation of open data that we've been engaged in for a while now around the world. The UK was one of the initial leaders in this work and is still trying to push the envelope, as I'll try to describe. I'm from a university background, in that I head up a group at Southampton on web and Internet science, but I'm also an open data adviser to the government, and I helped set up the original data.gov.uk portal with Tim Berners-Lee back in 2009. I'll talk about that a little bit.

Just a few things to say first. The keynote this morning talked about data being the new oil. People think there is a superabundance of data, and indeed there is, but the extraordinary thing about that superabundance is that in itself it's extraordinarily powerful. People think it's an unalloyed problem, but if you get the right data organized at scale, it takes on remarkable properties. One of my favorite examples is this one from Google's research, where they took the search query logs of a very large number of American users and set out to predict, from the search terms being used, the outbreak of seasonal flu, an epidemic of flu essentially. In the United States it takes about two weeks using traditional methods to get physicians' data back to the Centers for Disease Control to plot the actual trend; you can see that CDC data in the orange plot here. They were able to build a model, essentially a knowledge-based model of which terms were being used, to precisely match that outbreak, and in the end they were doing it in real time. They could show real-time tracking of the flu outbreak, and of course that's because people collectively are going to be searching, at times of
flu outbreaks, as they're breaking out in the community, for particular sets of key terms and objects of interest to them and suchlike. That seems to me an extremely powerful indicator of how something as fundamental as public health policy or well-being in a community can be driven by this data. Of course the realistic question is whose data this is, just how easy it is for anybody other than a very large search engine company to do this, and what the terms and conditions would be under which that data could be released back.

I love these examples. In my other favorite, you can just about make out what appears to be a light pollution map of the UK, of Europe actually. Each of those luminosities is the geocode of a picture uploaded to Flickr; each of those points of brightness is a Flickr upload. If we look at that at higher resolution, what do you see? It's a map of London. There are the major bridges crossing the River Thames, you can see the major thoroughfares, you can see these densities here. Every one of those points is a geocoded Flickr photograph. And obligingly, the people who take those photographs have been busy tagging them as well. So when I get this freely available open data from Flickr, which I can do, and download it (one of Jon Kleinberg's students did this), it comes already marked up with the most frequently photographed and labeled tourist destinations in the city. That level of immediate intelligence, rendered off very low-level data, is a world that I think has huge possibilities and opportunities for us, and I haven't touched government data per se. It's interesting to think, though, of the range of datasets that can become available for us to use and exploit.

The other example I often use is well known, though to many people it's still news. This is a map from an open-source product called OpenStreetMap, a map of Port-au-Prince, the Haitian
capital. Before the earthquake there was no map of that capital city, which is bad news when your capital city has been destroyed and you've got to work out where to put relief. They actually crowdsourced the construction of an incredibly high-resolution map. Here it is, in 12 days. Twelve days, because people were on the ground with GPS receivers and laptops, uploading those coordinates to an open-source platform with open data formats and open licenses. When you see that happen, you realize that we can truly crowdsource, in the same way we were hearing about earlier, remarkable intelligence about the cities and environments we live in.

So the power of open is, I believe, very profound indeed, and the exciting thing is that we're now applying it to government data itself. This was the state of affairs in about November 2009, when Tim Berners-Lee and I were asked by the prime minister to start opening up government data in the UK. We produced what we called the Postcode Paper. Here's a postcode; we published it at the Guardian newspaper's headquarters. We took all sorts of local data, public data generated by national and local government, and made it into a newspaper for that postcode. The problem was that 80% of the content of that newspaper was illegally reproduced. Illegally reproduced: even the postcodes we weren't allowed to use in that form. We'd have had to pay Ordnance Survey for the privilege of recording and using that piece of information.

So there was a lot to shift, but the dial has been turned. In three months we had our first portal, data.gov.uk, up and running. It used open-source software, and it did something rather heretical in terms of government IT: it was a beta site in constant development. In just 24 months we have this site here on data.gov.uk, where you put in your postcode and access the data sets that are available for that particular region. Postcodes cover about 8 to 12 residential addresses. You can
now find out data about the crimes occurring in that area, the educational attainment rates, where the bus stops are, a bunch of stuff. That has been happening because we've had a real sea change in the whole approach to data release and publication.

We had some friendly competition along the way. The U.S. in particular began this work back in 2009: the Obama administration's first executive order, just about, was on openness, and they set up data.gov. We followed suit a little later in 2009. We now have over 8,000 data sets available on data.gov.uk. Of course the granularity of the data sets is an object of much friendly competition. We count entire maps of the UK as one data set; if you parcel them up, you can get a very good score on the amount of data you're making available. There's much to say about that.

The interesting thing about our support for open data in government is that it's been led from the top, from the middle out by civil servants who are engaged in this, and by activists. The top-level political support we've had has been really important. Neelie Kroes here, the vice president of the European Commission, was extolling the virtues of European open data just a few months ago. I'd be very interested to see how much actually, materially, gets released, because despite all of this goodness there are challenges around open government data that I want to come on to and address.

The reason for doing this is that it's a powerful idea, whether it's mapping a capital city that has no detailed maps, finding out what the state of public health is, looking for snowfalls, or working out which streetlights need fixing. There are many examples. This is a photograph of cholera bacteria. Famously, when a particular surgeon in the 19th century plotted death rates on a map of London, he discovered that the people who died from cholera were all clustering around a particular water well. They didn't know that cholera was a waterborne disease at
that point. It changed the whole perception of public health. Similarly, this is a picture of MRSA. This is the hospital-acquired infection that does for a good number of people up and down the country. It certainly used to do for a lot more, and then we started to publish infection rates and death rates in hospitals as a league table. And of course that data led to a rather dramatic change in behavior at those hospitals, and it was one of the major instruments that led to a sharp decline in hospital-acquired infections, that and deep cleaning and other actual policy actions. The deep cleaning, of course, came because people were seeing very clearly the effect and impact of this sort of information. And actually we talk about transparency and accountability, improving public service delivery, improved efficiency: these are all reasons why you would want to release open government data. There are these, and we again heard them in the previous talk, around engagement, citizen engagement. But also we get data improvement. Government's data is no better than many corporate datasets. When we published bus stop data, where the bus stops in the UK were (we finally got the UK to publish those 360,000 bus stop positions), 17,000 of them weren't where the government thought they were, which is a bit tedious if you're trying to build an app or turn up for a bus. Very soon after that was published, a crowdsourced site was developed where people could enter the actual positions. So now a challenge for government is how it does open government 2.0: how do you write back data in a way that it becomes in a sense official data, data that has a provenance that is backed both by the crowd and by government? But also we're seeing it in terms of economic value and societal value, and I'll come on to that in a moment. So open the data, and people's experience is that the applications do flow. Whether they're flowing fast enough, or whether they're making the difference, is of course a question we're now asking
ourselves two years into the experiment. So are all these good, sustainable citizen engagement tools? Are these tools to help us manage understanding of how a city is functioning, or a nation state or a region? How do we drive both demand for the data and utilization of the data, and build the ecosystem around open data? And maybe we're going to discover that actually data, like everything else in this new economy, has a long tail, and that some datasets are highly reused by very, very large numbers of apps and people, and some data has a bit of interest to a very small constituency. But remember, the lesson of the long tail is that an awful lot of utility and use lives under the bottom of the tail distribution. So just seeing that your data set is the most used does not mean that substantial amounts of data don't have utility. And in fact the assumption we have in doing this work is presume to publish, make publishing the default, and then unanticipated reuse, which makes much of the rest of the magic that we observe on the web, becomes a fact for open data. So we get data at all scales. It's not just nation states, it's regions and cities that are releasing, and here we've got examples of Redbridge, a regional council in London, and London's data store itself, all good stuff. And increasing numbers of countries, from Singapore (I just returned from Singapore this last weekend, looking at their data), Kenya, Chile, the English-speaking democracies: a whole range of open data efforts now growing up. And we've achieved a lot. We can say, I think, that we have seen significant data sets released, and the licenses that are essential to this. Certainly one of the lessons from the UK data release is you've got to allow your licenses to be unrestrictive, not surrounded by minor terms and conditions. I go and see lots of data sites that claim to be open, and somewhere in the background of a particular chunk of data there's a little restrictive covenant: you shouldn't use it to
do this, or you shouldn't use it to do that, or you can use it but you can't use it in a commercial reuse context. Open is open. Look, we've seen developer communities grow up, and we've seen a degree of international collaboration start to emerge, all good things. And there's something particularly compelling about the city, or the urban conurbation, as a data user. A lot of the cities get open data and have been some of the earliest advocates and exponents. Many of your best apps are urban. Rather irritatingly, the apps are good but then they kind of run out when you pass the city limit. We've got some great examples in transportation in the UK which work great in London, because the Mayor and TfL had the mandate and authority to get the data out there. Get across the city line and your immediate bus finder, your best-route bus finder app, falls to pieces. So urban conurbations have a kind of coherence, such that if you're there it's still good news for you, because they have authority over their data. And there's a network effect. All data sets have a network effect, but as we saw again in the previous talk, around transportation, utilities, education, public service provision, data sets tend to supplement and support one another. If you're trying to work out where you want to live, to buy a house, you'd like to know about the crime rates, how effective transportation is, where the actual schools are, how they are doing. People can make decisions both at the governance level and in terms of an individual citizen's choice because of the interconnected nature of much city data. And it always comes down to location, location, location. So it turns out that geographic, geospatial, open geospatial data is a linchpin, and whenever we think we've got enough data openly available there's some other data set that people want. And the ongoing ruckus at the moment in the UK is for a comprehensive address file that will give you the actual register, not of the people who live at the
addresses, but the addresses of all the businesses and all the people who would be visited or submit a census form, for example. A, there has never been a comprehensive list, and B, currently the proposals are that you can charge for this, will charge for it. But the amount of location-based specific services that would be empowered by a release of comprehensive addressing data, I believe, would be very large indeed. So although we've achieved a lot in the UK, there's always more to go for, in my opinion. And these are some of the products that the Ordnance Survey now support for mapping, and good they are. I mean, we're in a very much better place than we were just a couple of years ago, and these get routinely used in a range of open data applications. So looking at London again, here we have a rather good illustration using the OS OpenData mapping product. What we can see here is essentially that thicker lines are more journeys by hired bicycles, and the red blotches you can see (and you can inspect this on the high-resolution version) are pollution levels measured by lead emissions. Now this has been put together by a team at a spatial analytics research unit at UCL. They're doing this on a weekly basis: people looking at new kinds of information mashup that will have a direct interest to you, the rider of a bicycle in London, or you, the public health consultant. This is a similar example, and both of these were very recently made available, just in January this year. This is a map of what are called the indices of multiple deprivation, basically an indication of how wealthy or affluent, or not so affluent, a region or an area is. Now Charles Booth in the 19th century actually built a wonderful map of urban deprivation in London, literally visiting every household, doing a survey. Many of the same insights can be derived from data that is now held and is now openly published. These become important policy tools, important planning tools, important tools to
mobilize a community. And this is data on London's daytime population. The remarkable thing about this, just taken from the London Datastore and undertaken by a researcher in Sheffield: the daytime density of the City of London is 350,000 people per square kilometre. Extraordinary. About 11,000 people live in that area, or are registered as living there. But you start to see these peaks and flows, these ebbs and flows, this kind of sense of what you can learn from the statistics made available. And in a way the one that is perhaps most impressive, because this holds politicians' feet to the fire: we've had spending data published in the UK at a very low level of detail, everything in excess of 500 pounds, published every month by every regional authority, 360 of them. It gives you an exquisite picture of what's being paid for by local authorities, what's being spent, whether one authority is paying more for its fleet hire than another, for example. But this is giving you, by crime type, at street level, every month, reported crimes in the UK. For England and Wales, for England and Wales: those are the constabularies that have signed up to this. You type in a Scottish postcode, you get nothing back, which is an interesting issue, I think, if you're a Scottish citizen. Because what does this tell you? Well, this is actually an application my group built in Southampton. This is a heat map, essentially. I've taken the ExCeL: this is E16 1AA, which is the postcode for the ExCeL Centre, and I'm visualizing here, low to high, it's a heat map, reported antisocial behavior. This is a month's worth of data. This was the first set of data published, in December 2010. And you have a bit of a filmic experience, and I'm just going to scroll through: that's December, January, February, March, April. Can you see a certain constancy in the location of antisocial behavior and where it's occurring, and what that might do to your sense of what you police or pay attention to, if you're a resident who
you complain to, or how you try and get a sense of what's happening here? Why is antisocial behavior like this? If you actually knew exactly when that was reported you would find, because we know this exists, exquisite temporal periodicity. Friday nights, at a particular time, you'll see a bunch of antisocial behavior in a bunch of particular places that are associated with, of course, chucking-out times at the local pubs or entry times at local nightclubs. Some of this stuff isn't so surprising. But as a tool (and I could have visualized burglaries or vehicle crime or shoplifting) this is a tool for empowerment. But it's also a tool for others. Suppose you're an insurance company: what are you going to make of this data? And of course, if you're then a campaigning group for the digitally disenfranchised, you start to ask yourself: if we start to have postcode insurance premiums, how do those who don't have a voice get their voice heard? But the data tells you the story very clearly. And other countries are doing this: the States, Singapore. As I say, they don't have everything solved, you'll be glad to know, in the world's intelligent city. An awful lot of their data is not available under unrestricted licenses, and a lot of it is simply national statistics, and if I look at their crime data all I can find are sets of numbers by month across a huge swathe of the city, so I've no real insight as to what's happening there. And indeed there's great stuff in the US, but there's no federal equivalent, there's no coverage across the country. So complex as this is, it's good. We've been trying to work in the UK to improve the quality of data, not just in terms of what's there but how it's linked together. Data can be linked, and place and space are good places to do it. So geography allows us to do that: if you can represent the data in the latest open formats used on the web for linking data, we can begin to link other datasets together much more easily. This is the ambition of so
called linked data approaches on the web. I'll talk more about that perhaps in the session. But it does allow us now to produce visualizations. This is a particular postcode in Southampton, and we're just looking here at the postcode, then the immediately surrounding postcodes to the north and south and east and west, and then the sets around those, concentric postcodes. We can look at crimes, crime types, we can look at transportation access points, we can look at educational attainment, absenteeism from schools. We can begin to do a range of information integration that was only ever available if it was asked for by policy makers within our statistics offices. Now we can argue about whether we're making the right interpretations around this, and we need a new cadre of data literacy, but it allows us to have the discussion and it allows us to think powerfully about how we might exploit it. So this does amount to a great revolution, and in the UK the process is continuing. We have significant datasets in transport and weather. Believe it or not, in the UK you didn't have open access to weather data. The predictions four days, five days out, every three hours, the Met Office publishes for 5,000 points in the UK: three-hourly predictive weather. Now you could look it up on their website as a picture, but you couldn't get the raw data. If I get the raw data I can build services around secondary insurance, for rain insurance, or events planning. There's a million things I could do that don't require me now just to go through one point of access, the Met Office. We're going to see rather dramatic releases of health data, everything from what GPs are prescribing every month, the drugs they're prescribing, through to the outcomes they're detecting. How does that vary by postcode? How does that vary by multiple deprivation indices? This is powerful stuff. And most recently, particularly for Tim Berners-Lee and myself in our role in this work, we've had announced an Open Data Institute, to be funded,
based in Shoreditch, not very far from here at all, to look at the commercial potential and exploitation of open data. So how can we take those micro-businesses, those startups, and help build businesses based around these kinds of data releases, and drive more data out of government? How can we use the experience we have to get public services and the public sector to deliver their data more effectively for reuse? And how can we educate and help developers in corporations large and small to live in this open data environment that's so fast evolving? And again, speaking back to the original keynote this morning, transparency and open data are two possible important components for capitalism 2.0, and I think it really is an interesting challenge to ourselves: is that the case? Thank you very much.
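A note on the postcode-based linking described in the talk: the following is a minimal Python sketch, not the speaker's actual application, of how separately published datasets might be grouped on a shared postcode key so that crimes, bus stops and so on can be viewed together for one area. The dataset names, the field names, the helper functions `normalise` and `link_by_postcode`, and every sample postcode and record are assumptions invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; the postcodes and values are invented.
crimes = [
    {"postcode": "ZZ1 1AA", "month": "2010-12", "category": "anti-social behaviour"},
    {"postcode": "zz11aa", "month": "2011-01", "category": "anti-social behaviour"},
]
bus_stops = [
    {"postcode": "ZZ1 1AA", "name": "High Street"},
    {"postcode": "ZZ1 2BB", "name": "Station Road"},
]


def normalise(postcode):
    """Strip whitespace and upper-case, so 'zz1 1aa' and 'ZZ11AA' match."""
    return "".join(postcode.split()).upper()


def link_by_postcode(**datasets):
    """Group rows from every named dataset under their normalised postcode."""
    # Each postcode gets one (possibly empty) list per dataset name.
    linked = defaultdict(lambda: {name: [] for name in datasets})
    for name, rows in datasets.items():
        for row in rows:
            linked[normalise(row["postcode"])][name].append(row)
    return dict(linked)


linked = link_by_postcode(crimes=crimes, bus_stops=bus_stops)
for postcode, sources in sorted(linked.items()):
    # Print how many records each dataset contributes for this postcode.
    print(postcode, {name: len(rows) for name, rows in sources.items()})
```

Real linked data approaches publish shared identifiers for places rather than matching strings, so the string normalisation here only stands in for that idea.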