The Open Government Data Revolution

So I see that what I want to say dovetails very well with the previous talk. Many of the examples that were being given to you about effective modern governance at the city level are going to end up drawing on this foundation of open data that we've been engaged in for a while now around the world. The UK was one of the initial leaders in this work and is still trying to push the envelope, as I'll try to describe.

I'm from a university background, in that I head up a group at Southampton on web and internet science, but I'm also an open data adviser to the government, and I actually helped set up the original data.gov.uk portal with Tim Berners-Lee back in 2009. I'll talk about that a little bit.

Just a few things to say first. The keynote this morning talked about data being the new oil. People think there is a superabundance of data, and indeed there is, but the extraordinary thing about that superabundance is that in itself it's extraordinarily powerful. People think it's an unalloyed problem, but if you get the right data organized at scale, it has remarkable properties.

One of my favorite examples is this one from Google's research. They took the logged queries from a very large number of American users and were looking to predict, from the search terms being used, the outbreak of seasonal flu, an epidemic of flu essentially. In the United States it takes about two weeks using traditional methods to get physicians' data back to the Centers for Disease Control to plot the actual data trend; you can see that CDC data here, this orange plot. They were able to build a model, essentially a knowledge-based model, of what terms were being used, to precisely match that outbreak. And of course they were doing it, in the end, in real time: they could show real-time tracking of the flu outbreak. It works because people collectively are going to be searching, at times of
flu outbreaks, as they're breaking out in the community, for particular sets of key terms and objects of interest to them. That seems to me an extremely powerful indicator of how something as fundamental as public health policy, or well-being in a community, can be driven by this data. Of course the realistic question is: whose data is this, just how easy is it for anybody other than a very large search engine company to do this, and what would the terms and conditions be under which that data could be released back?

I love these examples. In my other favorite example, you can just about make out what appears to be a light pollution map of the UK, of Europe actually. Each of those luminosities is a geocode from a Flickr upload, a picture, so each of those points of brightness is a Flickr upload. If we look at that at higher resolution, what do you see? It's a map of London. There are the major bridges crossing the River Thames, you can see the major thoroughfares, you can see these densities here. Every one of those points is a geocoded Flickr photograph. And of course, obligingly, the people who take those photographs have been busy tagging them as well. So when I get this freely and openly available data from Flickr, which I can do, and download it (one of Jon Kleinberg's students did this), it comes already marked up with the most frequently photographed and labeled tourist destinations in the city. That level of immediate intelligence, rendered off very low-level data, is a world that I think has huge possibilities and opportunities for us, and I haven't touched government data per se. It's interesting to think, though, of the range of datasets that can become available for us to use and exploit.

My other example, which I often use, is well known, although to many people it's still news. This is a map from an open-source product called OpenStreetMap, a map of Port-au-Prince, the Haitian
capital, before the earthquake. There was no map of that capital city: bad news when your capital city has been destroyed and you've got to work out where to put relief. They actually crowdsourced the construction of an incredibly high-resolution map, and here it is, in 12 days. Twelve days, because people were on the ground with GPS receivers and laptops, uploading those data coordinates to an open-source platform with open data formats and open licenses. When you see that happen, you realize that we can truly crowdsource, in the same way we were hearing earlier, remarkable intelligence around the cities and environments we live in. So the power of open, I believe, is very profound indeed, and the exciting thing is that we're applying that now to government data itself.

This is the state of affairs in about November 2009, when Tim Berners-Lee and I were asked by the Prime Minister to start opening up government data in the UK. We produced what we called the Postcode Paper. Here's a postcode; we published it at the Guardian newspaper headquarters. We took all sorts of local data, public data, data generated by national and local government, and made it into a newspaper with respect to that postcode. The problem was that 80% of the content of that newspaper was illegally reproduced. Even the postcodes we weren't allowed to use in that form; we'd have had to pay Ordnance Survey for the privilege of recording and using that piece of information.

So there was a lot to shift, but the dial has been turned. In fact, in three months we had our first portal, data.gov.uk, up and running. It was open-source software, and it did something rather heretical in terms of government IT: it was a beta site in constant development. And in just 24 months we have this site here, data.gov.uk, where you put in your postcode and actually access the data sets that are available for that particular region. A postcode is about 8 to 12 residential addresses. You can
now find out data about the crimes occurring in that area, the educational attainment rates, where the bus stops are, a bunch of stuff. That has been happening because we've had a real sea change in the whole approach to data release and publication.

We had some friendly competition along the way. The US in particular began this work back in 2009: the Obama administration's first executive order, just about, was on openness, and it set up data.gov. We followed suit a little later in 2009. We now have well over 8,000 data sets available on data.gov.uk. Of course the granularity of the data sets is an object of much friendly competition. We count entire maps of the UK as one data set; if you parcel them up, you can get a very good score on the amount of data you're making available. There's much to say about that.

The interesting thing about our support for open data in government is that it's been led from the top, from the middle out by civil servants who are engaged in this, and from activists. The top-level political support we've had has been really important. Neelie Kroes here, the vice president of the European Commission [inaudible], was actually extolling the virtues of European open data just a few months ago. I'll be very interested to see how much we actually, materially, get released, because despite all of this goodness there are challenges around open government data that I want to come on to and address.

The reason for doing this is that it's a powerful idea, whether it's mapping a capital city that has no detailed maps, or finding out what the state of public health is, or looking for snowfalls, or fixing streetlights. There are many examples. This is a photograph of cholera bacteria. Famously, when a particular surgeon in the 19th century mapped death rates on a map of London, they discovered that the people who died from cholera were all clustering around a particular water well. They didn't know that cholera was a waterborne disease at
that point. It changed the whole perception of public health. Similarly, this is a picture of MRSA, the hospital-acquired infection that does for a good number of people up and down the country, and certainly used to do for a lot more. Then we started to publish infection rates and death rates in hospitals as a league table, and of course that data led to a rather dramatic change in behavior at those hospitals. It was one of the major instruments that led to a sharp decline in hospital-acquired infections: that, and deep cleaning and other actual policy actions. The deep cleaning, of course, came about because people were seeing very clearly the effect and impact of this sort of information.

We talk about transparency and accountability, improving public service delivery, improved efficiency; these are all reasons why you would want to release open government data. There are these, and we heard them again in the previous talk, around citizen engagement. But we also get data improvement. Government's data is no better than many corporate datasets. When we finally got the UK to publish its bus stop data, the positions of those 360,000 bus stops, 17,000 of them weren't where the government thought they were, which is tedious if you're trying to build an app or turn up for a bus. Very soon after that was published, a crowdsourced site was developed where people could enter the actual positions. So now a challenge for government is how it does open government 2.0: how do you write back data in a way that it becomes, in a sense, official data, data that has a provenance backed both by the crowd and by government?

We're also seeing it in terms of economic value and societal value, and I'll come on to that in a moment. So open the data, and people's experience is that the applications do flow. Whether they're flowing fast enough, or whether they're making the difference, is of course a question we're now asking
ourselves two years into the experiment. So for all these good, sustainable citizen engagement tools, these tools to help us manage our understanding of how a city functions, or a nation state or a region: how do we drive both demand for the data and utilization of the data, and build the ecosystem around open data?

Maybe we're going to discover that data, like everything else in this new economy, has a long tail: some datasets are highly reused by very, very large numbers of apps and people, and some data is of interest to a very small constituency. But remember, the lesson of the long tail is that an awful lot of utility and use lives under the bottom of the tail of the distribution. Just seeing that your data set is the most used does not mean that substantial amounts of other data don't have utility. In fact, the assumption we have in doing this work is "presume to publish": make publishing the default, and then unanticipated reuse makes much of the magic that we observe on the web a fact for open data.

So we get data at all scales. It's not just nation states; it's regions and cities that are releasing. Here we've got examples: Redbridge, a regional council in London, and London's Datastore itself. All good stuff. And there are increasing numbers of countries: Singapore (I just returned from Singapore this last weekend, looking at their data), Kenya, Chile, the English-speaking democracies, a whole range of open data efforts now growing up.

We've achieved a lot. We can say, I think, that we have seen significant data sets released, and the licenses that are essential to this. Certainly one of the lessons from the UK data release is that you've got to allow your licenses to be unrestrictive, not surrounded by minor terms and conditions. I go and see lots of data sites that claim to be open, and somewhere in the background of a particular chunk of data there's a little restrictive covenant: you shouldn't use it to
do this, or you shouldn't use it to do that; you can use it, but you can't use it in a commercial reuse context. Open is open. We've seen developer communities grow up, and we've seen a degree of international collaboration start to emerge. All good things.

There's something particularly compelling about the city, or the urban conurbation, as a data user. A lot of the cities get open data and have been some of its earliest advocates and exponents. Many of your best apps are urban. Rather irritatingly, the apps are good, but then they kind of run out when you pass the city limit. We've got some great examples in transportation in the UK which work great in London, because the mayor and TfL had the mandate and authority to get the data out there. Get across the city line and your immediate bus finder, your best-route bus finder app, falls to pieces. Urban conurbations have a kind of coherence, though, such that if you're there it's still good news for you, because they have authority over their data.

And there's a network effect. All data sets have a network effect, but as we saw again in the previous talk, around transportation, utilities, education and public service provision, data sets tend to supplement and support one another. If you're trying to work out where you want to live or buy a house, you'd like to know about the crime rates, how effective transportation is, where the schools are and how they are doing. People can make decisions, both at the governance level and in terms of an individual citizen's choice, because of the interconnected nature of much city data.

It always comes down to location, location, location. It turns out that open geospatial data is a linchpin. And whenever we think we've got enough data openly available, there's some other data set that people want. The ongoing ruckus at the moment in the UK is over a comprehensive address file that would give you the actual register: not the people who live at the
addresses, but the addresses of all the businesses and all the people who would be visited or would submit a census form, for example. (a) There has never been a comprehensive list, and (b) currently the proposals are that they will charge for it. But the amount of location-based services that would be empowered by a release of comprehensive addressing data would, I believe, be very large indeed. So although we've achieved a lot in the UK, there's always more to go for, in my opinion.

These are some of the products that the Ordnance Survey now supports for mapping, and good they are. We're in a very much better place than we were just a couple of years ago, and these get routinely used in a range of open data applications.

Looking at London again, here we have a rather good illustration using the OS OpenData mapping product. What we can see here is that thicker lines are more journeys by hired bicycles, and the red blotches, which you can inspect at high resolution, are pollution levels measured by lead emissions. This has been put together by a team at a spatial analytics research unit at UCL. They're doing this on a weekly basis: people looking at new kinds of information mashup that will be of direct interest to you, the rider of a bicycle in London, or you, the public health consultant.

This is a similar example; both of these were made available very recently, just in January this year. This is a map of what is called the Indices of Multiple Deprivation, basically an indication of how affluent, or not so affluent, a region or an area is. Charles Booth in the 19th century actually built a wonderful map of urban deprivation in London, literally visiting every household and doing a survey. Many of the same insights can be derived from data that is now held and is now openly published. These become important policy tools, important planning tools, important tools to
mobilize a community.

This is data on London's daytime population, taken from the London Datastore, in work undertaken by a researcher in Sheffield. The remarkable thing is that the daytime density of the City of London is 350,000 people per square kilometre. Extraordinary. About 11,000 people live in that area, or are registered as living there. You start to see these peaks and flows, these ebbs and flows, this sense of what you can learn from the statistics made available.

And in a way the one that is perhaps most impressive, because this holds politicians' feet to the fire: we've had spending data published in the UK at a very low level of detail. Everything in excess of 500 pounds, published every month by every regional authority, 360 of them. It gives you an exquisite picture of what's being paid for by local authorities, what's being spent, whether one authority is paying more for its fleet hire than another, for example.

But this one is giving you, by crime type, at street level, every month, reported crimes for England and Wales. Those are the constabularies that have signed up to this. You type in a Scottish postcode and you get nothing back, which is an interesting issue, I think, if you're a Scottish citizen. What does this tell you? This is actually an application my group built in Southampton. It's a heat map, essentially. I've taken E16 1AA, which is the postcode for the ExCeL Centre, and I'm visualizing here, from low to high, a heat map of reported antisocial behavior. This is a month's worth of data; this was the first set of data, published in December 2010. You have a bit of a filmic experience as I scroll through: that's December, January, February, March, April. Can you see a certain constancy in the location of antisocial behavior and where it's occurring? Think what that might do to your sense of what you police or pay attention to, or, if you're a resident, who
you complain to, or how you try to get a sense of what's happening here. Why is there antisocial behavior here? If you actually knew exactly when it was reported, you would find, because we know this exists, exquisite temporal periodicity. On Friday nights at particular times you'll see a bunch of antisocial behavior in a bunch of particular places that are associated, of course, with chucking-out times at the local pubs or entry times at local nightclubs. Some of this stuff isn't so surprising.

But as a tool, and I could have visualized burglaries or vehicle crime or shoplifting, this is a tool for empowerment. It's also a tool for others. Suppose you're an insurance company: what are you going to make of this data? And if you're a campaigning group for the digitally disenfranchised, you start to ask yourself: if we start to have postcode insurance premiums, how do those who don't have a voice get their voice heard? But the data tells you the story very clearly.

Other countries are doing this: the States, Singapore. Singapore, as I say, doesn't have everything solved, you'll be glad to know. In the world's intelligent city, an awful lot of their data is not available on unrestricted licenses, and a lot of it is simply national statistics. If I look at their crime data, all I can find are sets of numbers by month across a huge swathe of the city, so I have no real insight into what's happening there. And indeed there's great stuff in the US, but there's no federal equivalent; there's no coverage across the country.

So, complex as this is, it's good. We've been trying to work in the UK to improve the quality of data, not just in terms of what's there but in how it's linked together. Data can be linked, and place and space are good places to do it. Our geography allows us to do that: if you can represent the data in the latest open formats used on the web for linking data, we can begin to link other datasets together much more easily. This is the ambition of so
called linked data approaches on the web. We can talk more about that, perhaps, in the session. But it does allow us to produce visualizations now. This is a particular postcode in Southampton. We're looking here at the postcode itself, then the immediately surrounding postcodes to the north, south, east and west, and then the sets of postcodes around those, concentrically. We can look at crimes and crime types, we can look at transportation access points, we can look at educational attainment and absenteeism from schools. We can begin to do a range of information integration that was only ever available if it was asked for by policy makers within our statistics offices. Now, we can argue about whether we're making the right interpretations around this, and we need a new cadre of data literacy, but it allows us to have the discussion, and it allows us to think powerfully about how we might exploit it.

So this does amount to a great revolution, and in the UK the process is continuing. We have significant datasets in transport and weather. Believe it or not, in the UK you didn't have open access to weather data. The Met Office publishes three-hourly predictive weather, four or five days out, for 5,000 points in the UK. You could look it up on their website as a picture, but you couldn't get the raw data. If I get the raw data, I can build services around rain insurance or events planning. There are a million things I could do that don't require me to go through the one point of access, the Met Office.

We're going to see rather dramatic releases of health data: everything from what GPs are prescribing every month, the drugs they're prescribing, through to the outcomes they're detecting. How does that vary by postcode? How does that vary by multiple deprivation indices? This is powerful stuff.

And most recently, and particularly for Tim Berners-Lee and myself in our role in this work, we've had announced an Open Data Institute, to be funded,
based in Shoreditch, not very far from here at all, to look at the commercial potential and exploitation of open data. How can we take those micro-businesses, those startups, and help build businesses based around these kinds of data releases, and drive more data out of government? How can we use the experience we have to get public services in the public sector to deliver their data more effectively for reuse? And how can we educate and help developers in corporations large and small to live in this open data environment that's evolving so fast?

Again, speaking back to the original keynote this morning: transparency and open data as two possible important components for capitalism 2.0. I think it really is an interesting challenge to ourselves: is that the case? Thank you very much.

Precision Public Health Summit: Leaders Voice Hope for Change

Thank you all for joining us at the Precision Public Health Summit here at UCSF. We hope that these experiences will inspire you to the possibilities of how we can partner to ensure that all children, no matter what their circumstance, have the best opportunity to survive and thrive.

The Precision Medicine Initiative is one of the benchmark efforts of this administration to unlock the power of data to create new scientific discoveries. Being able to actually focus that on population health and prevention is one of our key goals, and it makes sense to start with the first three years of life; that's the most important and vulnerable.

Black babies die at more than twice the rate of the general population in the first year of life. They die because they are born too soon and too small.

At Bloom, what we're aiming to achieve is essentially designing the future of prenatal care with technology, to improve the health of moms and babies. What we do is combine wearable devices with data analytics, both to reassure moms and to provide doctors with better information to improve birth outcomes.

I think we're at this very unique intersection of data and technology, and naturally the question is how we are using that to think about our own individual lots. We should fundamentally believe as a nation that a technology is neither radical nor revolutionary unless it benefits every single American. We're very good at building really unique, creative technologies. We have to make sure they benefit everybody simultaneously, to really provide the value proposition that we have to have going forward into the next great generation.

We actually believe the same types of innovations can really make a difference in public health, in population health. That's what we wanted to do with the summit: bring together thought leaders from various different sectors to really explore how these innovative ideas can be applied to address public health challenges.

What we do know is that we can be exposed to
environmental agents like toxic chemicals, and that they can have profound and important influences on our health.

My family was suffering from different health issues. We were losing our hair, we had these rashes. My one son has a compromised immune system; he wasn't gaining weight. And we weren't putting it all together at first, until our water started coming through our tap brown.

You have to fight these agencies you're paying to protect us, and you have reprisals; it adversely affects your career. But it is either that, or sit by and let bad public health, bad science and bad engineering be used to poison little kids.

Would more data have been helpful to you?

Yes, and to realize that there was a problem there, instead of trying to hide it.

My hope is that this summit creates new ambassadors and leaders throughout the country and the world who can carry that message: not just of precision medicine, which is the ambition to bring all this great technology to improving health, but, much more importantly, a newer and I think more profound equity message that can directly impact human health and well-being. It includes everyone, no matter what their zip code, no matter what their geography: that in fact precision public health can improve health for all.

Making government better, through data and design | Cat Drew | TEDxWhitehall

Who here can remember being a teenager? Oh, all of you. Brilliant.

In 2010 there was a London borough that was thinking quite a lot about teenagers. The teenagers in this London borough were no longer coming to their community centres. The borough used the data on declining numbers to make a case to invest in new equipment: in computer games, in football goals, in table tennis tables. And yet still teenagers weren't coming. A puzzle.

A few decades earlier, BT was also having a puzzle. They had created this amazing new customer service, the first automated telephone directory service. You could have your number speedily delivered. And yet again, no one was using it. Odd.

These things fascinate me, because you've got data creating insight or speeding things up, and yet something's missing. I'll come back to these at the end, and maybe you can think about what the answers are as I go through my talk.

As a civil servant and a designer, I've always been nerdily interested in both analytical stuff and much more creative stuff. When I was really little, I used to go round to my friends' houses with my imaginary friend Jack, not to play but to tidy people's rooms. And now on my shelves at home all my books are very neatly ordered, not by alphabet but by colour. I can see them: yellow, orange, red, purple, green and blue. Ordered, but beautiful. At school I won the statistics prize for that crucial scientific discovery that blue Smarties follow a normal statistical distribution. That's great, but I also discovered that you can eat 20 packs of Smarties in your lunch hour.

I've now been a civil servant for 10 years, working in the big departments of state like the Home Office, the Cabinet Office and Number 10, in very traditional, very important policy-making roles. But all that time I've had this niggle, this hankering to do something a bit more creative. So at school I rebelled, if you can call it that: I did my art GCSE in my
spare time after school on Wednesdays. And at work, after eight years, I thought: I've had enough, I'm going to pack my bags and go off to Berlin and become an artist. Artist in the day, serving cocktails at night. But after two years of a poor artist's life I thought I had to come back. I missed the really amazing uses that government could put words and numbers to, things that can make society better. So I came back, but I didn't want to miss that creativity, and I didn't want to keep flip-flopping between analytical and creative stuff all the time. I wanted to combine both. So I was a policy maker and I also studied graphic design, and then it became apparent to me that you can combine both.

Two women in particular really inspired me. Florence Nightingale presented data on diseases in the Crimean War and for the first time revealed that most of the deaths were actually preventable, and that changed the course of nursing. Phyllis Pearsall walked 23,000 streets here in London and created what we now know and love as the A-Z. Both of these women were designers who used data for social good.

And now I am so lucky to work in Policy Lab, where I get to do this stuff every single day. Policy Lab was set up to support departments in using digital, design and data techniques to make policies better. We combine data science, which applies really powerful computing techniques to huge amounts of data, really complex stuff, with ethnography, which takes human experiences, behaviors and emotions and really tries to understand why people do what they do. We bring these things together, and we share them with a diverse range of people to come up with amazing new ideas to make government better.

Our first project was on policing in the 21st century, supporting victims of crime. There was one woman; let's call her Jane. Jane was a victim of antisocial behavior, and she was told by the police to keep a diary. So she kept her diary. She
showed us where she kept it, her bedside table, and when she filled it out: the last thing at night before she went to sleep. Can you imagine how that must be for Jane, and how much better it would be for Jane if she could have something online, so that she could share this information with the police as soon as it happens and the police could start solving her problems?

There were many, many more rich observations like this, and we shared those with a diverse group of people, from chief constables to police officers to neighborhood watch members, and they used their human creativity to come up with a whole range of other ideas. They came up with ideas for young people to report crime using Minecraft, or for older people to be able to sit on their sofas in their living rooms and give evidence at court. All of these things are a bit out there, but they gave us the creative spark so we could create online crime recording. But to take that from a small pilot in Surrey and Sussex and scale it across England and Wales, which is what's happening now, we needed data analysts to help us make the case that this would save 3.7 million pounds per year and 180,000 officer hours.

Our second project was around health and work. In the UK you've got 2.5 million people on health-related benefits, and that costs us 15 billion pounds per year. But we know that the right work can be really good for people. In this project we combined data science and ethnography throughout. The data science showed us that people are more likely to go on health benefits if they've been in their job a really short amount of time, and the ethnography showed that it was the relationship between the line manager and their employee that was actually critical to whether someone stays or goes. The data science showed us that women with depression are much more likely than men with depression to stay in work, and this played out in the ethnography. Another woman, let's call her Vanessa: Vanessa had been battling with
depression for a long, long time. She had been too scared to go to her boss to do anything about it; she didn't feel she had anything to show him. And then she got breast cancer, and can you imagine what she said to us? That she was relieved that she had breast cancer, because then she could go to her boss and show something physical. She got time off work and she was able to deal with both illnesses successfully. So throughout that project we combined the data science and ethnography. They were always talking to each other, sharing their hypotheses and confirming them, and we built up this really rich picture of exactly what was going on. And again we shared that with people and we came up with lots of ideas for how to support people to manage their health conditions and work, which we're now testing across England and Wales. Data and design therefore require a new type of policymaker. When I first started in the civil service I didn't know what a policy was, and I certainly didn't know how to make one. For those of you in the room who do not know what policy is: it's a government position on something, and it can range from anything very specific, like the amount of benefit that is paid to a 70-year-old lady who's also a carer, all the way through to whether or not we go to war. Now at the time when I started we were called generalists, and I thought we had to be masters of everything. I soon realized that that is not possible at all, and I remember someone saying to me: a good policymaker doesn't have all the information, but they do know where to go and get it. Great, I thought, I can get all of this information and I can come up with all the ideas. In a world of data and design that's not true. Data and design can provide the information, but they can also come up with the ideas. So a better definition is that a policymaker doesn't have all the information, nor the skills, nor the techniques, nor the ideas, but they do know how to bring people with
them together. They need to be able to work with data analysts to spot patterns in data, at the same time as working with ethnographers to really get underneath that data and explain why things are happening. They need to be able to work with data scientists to automate really clunky bureaucratic processes, but also to be able to design them so they actually fit in with people's real lives. And they need to work with graphic designers so they can visualize and make accessible the very complex data that the civil service loves, and share it back out with the public so we can all generate ideas together. Now not all of us are policymakers or designers or data scientists, but we can all use a data and design approach in our lives. You might be someone who loves Sudoku but can't draw a stickman, or you might be someone who spends their life in art galleries but can't add up to save their life. We're all using, all the time, our creative and our logical selves. Take renting or buying a house: you have to make a cost-benefit analysis to make sure you can afford it, but you also need huge amounts of creativity to turn your house into your home. This is important because data is our future. Right now we are generating 2.5 quintillion bytes of data every single day; that's 25 with 17 zeros after it. Every single time you go online and search, you use your store card, you tap in with your Oyster card, you are creating data, and experts think that in 20 years from now we're going to be creating a hundred times as much. Citymapper is an app which uses data, government data, to tell you how to get from A to B. Great, there are lots of other apps that do that. Well, what's brilliant about it is it uses human stories and human needs to present that data in the way that we find useful. So "rain safe" is a service which not only tells you how to get from A to B but tells you the driest way to do so. So if now we're going to have apps that will help us get from A to B in the driest possible way, in the future we're going to
have autonomous vehicles that can drive us there. If now we can use our Fitbits or our smartphones to tell us how many steps we're taking every single day, in the future we're going to have smart fridges that will monitor our health and order in healthy food for us. And if now we're just about starting to get elderly people to remotely share their blood pressure with their GPs from their homes, in the future they'll have robot companions to help them do that. So data is going to completely transform our lives in ways that we can't even imagine, but we have to make sure it is well designed. Data, after all, is human. We all generate it, we're the ones who give the data, mostly, and we're the ones who do something as a result of what the data tells us. So let me take you back to those first two stories we had: community centers and telephones. A London borough was having all sorts of trouble with teenagers not going to the community centers. The data was showing numbers were declining, but no one could understand why. So they got some researchers to go out and actually spend time with these teenagers, to find out what they do like doing and what they don't like doing. And what did they find? Well, not surprisingly, girls for the most part don't like computer games, football and table tennis. And the boys? The boys actually prefer hanging out with the girls. So a very, very simple story, that boys mostly prefer hanging out with girls, explains the data. The community center was able to invest in equipment for girls, and numbers went up. And BT, who had this amazing new speedy automated service for the public: no one was using it because they didn't trust it. They did not trust that a computer could look up a number so quickly. So someone had to have that aha moment of going: we need to build in trust. We need to record the sound of someone flipping through a ginormous phone book, and we'll play that to them while they wait their small amount of time. People believed it; people started
using it. So let me now leave you with one final thought. Is data the new oil? If it is, we have to treat it with so much care. We have to make sure that we're using it in a way that humans would want, and design can help us do that. Like a hybrid car, we need data and design together, in combination, and we need hybrid policymakers to help us do that. Thank you.

Find, Use, & Govern Data with IBM InfoSphere Information Governance Catalog

Today we're going to look at how IBM Information Governance Catalog makes it simpler to find, use, and govern data. Our marketing team needs help identifying smartphone buying trends. For this we'll need to locate data sources for customer purchases and then compare them against other data from the supply chain and product launches. Since we don't have the specific table names, we can use the catalog to find assets by searching on a relevant business term; in this case we need customer data. The results from our search can include related terms, tables, and reports. Since we're looking for customer data, this customer sales table looks promising. We can hover over the asset and get a quick overview of it. If we select the asset we can see more details, like the business definition and structure, to get a better understanding of how it's being used in the organization. This information helps build trust that this data is what we need. If we have more questions about it, we can ask the data steward for additional context. We can also explore the lineage of the asset to see where it's coming from, how it's been used in other places, and which other processes or applications have used it. Here the data lineage shows that the customer sales table was derived after various transformations and filtering. Now that we've identified the data and are confident it's valid, we can add it to a collection. We can repeat this process using other business terms, like sales and discount, to search and collect as many data assets as we need. As we use these assets or add new business terms, the governance catalog updates its records. This is just one way organizations can gain value from the Information Governance Catalog. Visit the link below and download your free trial today.
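
As a rough illustration of the search-then-lineage workflow described in the demo, here is a minimal Python sketch of a toy in-memory catalog. It is not the IBM Information Governance Catalog API: the `Asset` and `Catalog` classes, the asset names, and the business-term tags are all hypothetical and exist only to show the idea of finding assets by business term and then walking upstream through their lineage.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    """A catalog entry. Every field here is illustrative, not a real product schema."""
    name: str
    kind: str                                   # e.g. "table" or "report"
    terms: set[str] = field(default_factory=set)
    derived_from: list[str] = field(default_factory=list)


class Catalog:
    """Hypothetical in-memory catalog keyed by asset name."""

    def __init__(self, assets):
        self.assets = {a.name: a for a in assets}

    def search(self, term):
        """Return names of assets tagged with a business term (case-insensitive)."""
        wanted = term.lower()
        return sorted(
            a.name for a in self.assets.values()
            if wanted in {t.lower() for t in a.terms}
        )

    def lineage(self, name):
        """Walk upstream and list every asset the named one was derived from."""
        seen, stack = [], list(self.assets[name].derived_from)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            stack.extend(self.assets[current].derived_from)
        return seen


# Made-up assets standing in for the tables and reports in the demo.
catalog = Catalog([
    Asset("raw_orders", "table", {"sales"}),
    Asset("filtered_orders", "table", {"sales"}, ["raw_orders"]),
    Asset("customer_sales", "table", {"customer", "sales"}, ["filtered_orders"]),
    Asset("launch_report", "report", {"product launch"}),
])

collection = catalog.search("customer")      # find assets by business term
print(collection)
print(catalog.lineage("customer_sales"))     # see where the table came from
```

Searching on a second term such as "sales" and extending `collection` with the results mirrors the "repeat this process using other business terms" step in the demo.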

AI, Big Data, and Data Governance // Stan Christiaens, Collibra (FirstMark's Data Driven)

As Matt briefly introduced, we're a data governance software company, and we have this niche audience, in a way, of chief data officers and data stewards and data czarinas and the like. So we have to sort of adapt our message a little bit to the variety of this audience, which is on the one hand technical as well as business, if I understood correctly. So I'll try to give you as good a story as I can, or a multitude of stories, and if there are questions at the end I believe you have like five minutes of questions. So first let me talk to you about the frustration that I've seen with companies when it comes to getting value out of data. If, just like this guy here, I forget your name, I'm sorry, you're trying to find machine learning experts and data scientists and all that stuff, you find that there are not a lot of good ones out there. But then, let's say you find one, or you find a team: then you're going to actually hit their frustrations, which are many, but there are two very important ones. One of their first frustrations is: I can't find the data. That's the biggest problem. Where's the data? Give me the data, I'll put the models on it. Then once they have the data, the next problem appears. They make all sorts of classifiers, beautiful visualizations, training data, sample data, what have you, and then they produce beautiful output, whatever they produce, models, classifiers, but then the organization sort of does nothing with it. They don't make a product out of it, they don't make a service out of it, they don't change the business process. So these talented people that you hire become demotivated and will actually go somewhere else, just because their work is not actually adding any value to the business. So I'll try to talk about some of these topics, and I'm going to put that in the context of these seven predictions we did with Collibra about nine months ago, I would say,
and I'll see if I can remember them, and I'll tell you which ones were actually completely wrong. So the first one is about the rise and fall of the CDO, the chief data officer, and I'm going to play on the email story. We think, from our viewpoint as a data governance software vendor, that chief data officers are on the rise, which they are, but the question is how temporary they are actually going to be. I learned from one of our customers, who did a start-up twenty or thirty years ago when email wasn't around, that back then they actually had messaging systems that they bought and sold, and they had a chief email officer. Nobody has a chief email officer right now, right? So the CDO, how long will it last? It will still grow, but how long will it last? Second, data will require a system of record. Just like a chief financial officer has an ERP or a VP of sales has a CRM system, data will have the same kind of need. Three, data education will explode, and it has. I think there's another Belgian startup called DataCamp, which has a million students learning data science, so there's a lot of data learning out there. Ideally this translates all the way through, not into the technicalities of how to make a Python script, but into how data gets into an MBA program, for example. Four, and the prediction was, what was it again: the data citizens, who are the people who use data to do their work (in our company product managers are data citizens, for example, and sales ops are data citizens), will rise up against the data dictator, that's a chief data officer who tries too much to control the data and doesn't democratize it enough for people to actually get value out of it. So that will also happen. Where are we now? Five: the Internet of Things will disrupt business models. Of course that's already happened; we were too late with that one. Data protection will overcome data privacy, especially with the European GDPR protection rule. And
then the last one I think we had was all about the blockchain emerging in '17. That last one, at least from our vantage point, we got completely wrong, because when we were saying the whole blockchain thing at our user event a couple of months back, half of the audience (you're not doing this right now, right?), half of the audience was actually googling what blockchain meant. So we got that one completely wrong. And I use this story as a context for what we actually missed, because the one that we missed, as I understand it, is a very popular topic in this audience, and it's very simple. The one we missed, I would say because of my own silly reasons, is artificial intelligence and machine learning. So I'll tell you why we didn't put that on the map: because I'm a little bit of a skeptic. We had the AI winter in the 70s and the 80s, when the government funding dried up and then the first commercial applications failed. And then, another story from Belgium, my home country (I've been living in New York now for three years): we had that whole natural language processing event, where we had Flanders Language Valley, and that boomed and busted. So there are a lot of cycles that AI already went through, and from that viewpoint I didn't believe that it would hype as much as it has this year. So that one we got wrong. Now why do I believe that it did hype this year, or is exploding this year? Multiple reasons. One, the processor power: if you've seen it, for example, Nvidia has increased its stock price by four times over the last 12 months, because everything is GPU-driven right now, matrix operations. Two, more data: everybody knows that, there's more data out there to actually apply your algorithms to. And three, and this is a belief of mine that could be wrong, is that the big tech firms, the Amazon, Facebook, Google, Microsoft and all the others, are using AI and machine
learning as their next feature war, like who will have the best platform to build on. So that's why we believe we should have put this on. Because we're a little bit skeptical, we would also like to give you three pitfalls to watch out for in this session when it comes to AI. The first one is Harry Potter here: it's not a magic wand. Maybe this is the worst picture we could have taken, because Harry Potter's not AI at all, but this is about how we as engineers tend to look at technology as the thing that will solve all the world's problems. I've got another story on that. When we started 10 years ago, our professor said that back in the day there was a spin-off at the university, and they were all gung-ho about object-oriented programming. So that was going to be their company: object-oriented programming, that was the differentiator. So it's an engineer, sort of a computer scientist, looking at technology as a differentiator, but how does it actually add value to the business? The funniest thing about this story is the name of that company. You know what they called it? Softcore. Come on, right? So with respect to the magic wand aspect of AI at the moment, I would say don't expect that Johnny Depp is going to show up and turn into some superintelligence controlling nanobots all over the place. You're going to find most business applications of AI currently in very specialized applications for a very specialized business problem. Even self-driving cars are very specialized. You can't have that same algorithm or machinery that you produced drive a bicycle; it's going to have to learn all over again. The way you recognize faces is different from the way you recognize all these things, for example. So AI will have business value first in specialized applications. So look for it in your business with that very clear requirement: how does it add value, how does it reduce cost, how does it reduce or mitigate risk? That's where you have to
look. And if you don't believe me, just think about the cost of doing AI. I believe when Google came out with AutoML, I think it's called, they did this experiment where they had one neural network learn what the features or the configuration should be for a child or a slave neural network, and that one experiment took 800 GPUs and about several weeks of calculation time, just to run one experiment. So you don't want to invest that cost if you don't know what the value is going to be; your electricity bill is going to go through the roof. So that's one thing to watch out for. The second thing to watch out for with AI is the salesman's pitch. It's all about doing your due diligence, and I'm going to use two examples here that have had a lot of attention in the media. I don't want to diss IBM the company, but I am going to use them as an example. You know how they did this big thing with Watson winning Jeopardy and so on and so forth. Around the same time they also did this big announcement that they were going to solve cancer together with the MD Anderson Cancer Center. Now, several years later and 60 million dollars down the drain, that initiative failed. It actually failed; they stopped doing it and they went back to market. Why did it fail? Because they were having challenges in connecting Watson to the electronic health record system, which is pretty fundamental if you want to get some data going in there. And second, they had too little good data. It turns out that of all these papers that are out there in the field about oncology and whatnot, there's actually just a very small subset that has very curated and controlled clinical trials, that have the right amount or the right data that actually feeds into the algorithms. And then there's another story that was sort of a failure of machine learning, that maybe you all know, which is Google Trends, the flu prediction. If you noticed, it's a few years old: they predicted based on search keywords in Google
that the flu was going to have an outbreak, and they said they would do this better than the CDC, or faster, instantaneously. So they did that, and then it worked until it didn't. In 2013 they had a mismatch of 140 percent, prediction versus the actual situation in the world. And again, why was that? Because of all sorts of basic checks. They had their model being overfit. They didn't take into account that the data actually changed: they changed Google search suggestions in the meantime, so that changed the data that was produced and then consumed by the algorithm. So again, basic things. So don't fall into the snake oil salesman's trap, and please do your due diligence on the technology. And then the last one is the algorithms. Our belief is that in AI and machine learning you will not win this war by having the better algorithm. The differentiator, the value proposition, is not in the algorithm, it's actually in the data. The algorithms will be open source. I don't know if you've seen data scientists or machine learning people in action, but that's typically sitting in Jupyter or Zeppelin, typing Python commands into these notebooks and immediately seeing a classifier or a visualization. So it's pretty cool, right? But these algorithms themselves, the neural networks, they're open source. Google actually acquired Kaggle, which is all about open sourcing the models on certain data science problems. So you're not going to differentiate yourself with the model; you're going to differentiate yourself with the proprietary training data that you actually feed into the models. Why do you think Google has been buying data acquisition companies for years, for a lot of money? Why do you think Google makes all these weird devices, like this backpack that scans the street, or this car that scans the street, or Nest, the 1984 spy cam that you put in your house, that sort of stuff? They do this so they can get proprietary data that is making all
the difference. So our view on AI is that data will differentiate how you will succeed with AI and machine learning. And then you come full circle to the beginning of my story, which was about that frustration: how can that data scientist or business analyst, or whatever you call that person, actually get the data? Their questions typically go as follows. "Give me the data." By the way, that question just pisses me off. When a data scientist comes to me and says "I don't have data", then I tell them: go find it, or make it. It's not an excuse, just go get it, it's part of your job. If you need to write a Python script or hack into the database of your own internal company, just do it. Get the data, there's no excuse. Anyway, so they can't find the data. When they then can find it, they cannot understand the data. They don't know the business context, they don't know how to interpret it, which can be pretty basic, because bias is lying like a snake in the data grass, if you will. If they then can understand it, they don't know where it's coming from, the lineage, people often say. If they understand where it's coming from, they don't know what's wrong with it. And if they do know what's wrong with it, or they do know that everything is okay, they don't know who to actually call and ask about their data. They don't know the data owner, they don't know the data curator. All these are problems about finding, understanding and trusting data that are so commonplace, in our view, in AI projects, and not just in AI projects, because the way we see it, this is all a people problem. This is all being done in multiple places, multiple data projects all over the map, and it's all people doing it ad hoc with what we call the digital equivalent of WD-40 and duct tape, which is Excel spreadsheets, meetings, etc. (One minute? All right.) And so people do this in AI, they do this in BI, in analytics, all the same things, all the same steps. They do it
in big data, Internet of Things projects, data quality projects, GDPR regulatory compliance projects and so many more. So it's like this firefighting around data controls all over the place, all the time. It's just disorganized. It has all the symptoms of a broken-down business process, and data today, if you treat it as a strategic asset, should have its own business process. So that's my last slide. I put some links up there for you to read all about this, and we have our University at the bottom if you want to do some free learning as well. Did I make the time? Yes, very nicely done, thank you. Actually, can you bring back the last slide? Yes. Okay, people can take pictures. Tell us a bit more about Collibra the company. You started alluding to what you guys do at the end, but tell us about data governance, data cataloging, what the needs are and what the product does. Well, thanks Matt, I'd be happy to do that, although I don't know if the audience will be, because typically if I tell what Collibra does and I say that we're a data governance software company, then everybody's eyes just sort of glaze over: "Ah, governance, that's not interesting." But our job as a company, Matt, is actually to try and make governance sexy, or at least as sexy as you can make it, by focusing on both parts of governance. In our view governance is about the control and enablement of any and all data management activities, so enablement as much as control. And in that context the finding, understanding, trusting, the whole collaboration around data, just knowing who to call about what data domain, all of these are aspects of governance. And that's what we've been doing since, I was talking to Bob earlier, or sorry, to Mantis, or Shawn, sorry, since 2008. It's almost a decade; I've been doing nothing but data governance and cataloging since 2008. Is there a difference between governance and a data catalog, and can you explain, maybe? In our view, yes. So what we've done,
fortunately, we've been quite fortunate in the sense that we've been able to shape the data governance category, which is still in flux, and in our view the data governance category does include cataloging. My view on this, Matt, is very simple. If you have a data catalog, which is really like a listing of all the data sets and attributes and dictionaries that are out there, without governance around it, it's just like a phone book. It doesn't do anything, it's not controlled, it doesn't work, it's going to stop working. The reason why I say that is because I've talked to all the high-tech companies, like Twitter, LinkedIn, etc., and they all have their catalog projects. Many of them have actually open-sourced catalog initiatives, but then you find that nobody's actually using them. There's no enforcement to use them, there's not enough enablement, and the first time you hit the catalog you get all these questions that I mentioned: okay, but then who's the owner, who could I call, what if I have a problem with the data, how do I get access to the data, who approves my request? You get into governance questions right away. Last question from me, and then we'll open it up to people. From a governance standpoint, is the right approach a data lake, where you centralize everything and then you put governance on top, or is it a more distributed approach, where you leave the data in the original repositories, but then I guess you need more agile governance software? I would say that whether you put all your data in a lake, or you put it in a warehouse, or you keep it separate in the applications, or wherever you store it, it's more a function of the requirements and the data engineering that follows from them. For example, if you need a lot of scale immediately, then maybe a lake is the best solution; if you don't, maybe something else is the best solution. So in that sense we are less worried about the data architecture; we're more worried about how you're going to coordinate all these people
that make or put in place the data architecture that satisfies your needs. What we do see a lot right now is indeed the centralized data lake approach, but again, I'm being skeptical; I think it's just the new thing to try. So many companies are sinking so much money into one of the big three distributions, and then two years down the line people are actually asking: what are we doing? So again, that's just me. We have two questions, and one microphone coming your way in just about ten seconds. Thanks. I'm curious about what your thoughts are on the ownership of data, with concerns about individuals' right to data privacy. I think this is a bigger issue in Europe than it is here, where they have much more stringent requirements on the ability of companies to use individuals' personal data. Right, so the question was about data ownership, if I understood correctly, and my views on it. Yeah. So I'll try to break data ownership down into two views. One is what I'm hearing you say, where ownership is really, like in Europe, with the individual. So it's my data, those are my emails, Google, and if I want to move them to Hotmail you've got to allow me to do that. That is a very European view, which Europe is actually, through regulation, trying to impose on any business doing business in Europe, so it also applies to tech companies. But then, I was in Silicon Valley talking to a Stanford guy a number of years ago and putting forth that European stance: no, you can't have the data, it's mine, it's my data. And he was saying: "Ah, Stan, just forget about it, you've already lost. Whether you think so or not, the data is already in the hands of these technology companies." So that's definitely going to continue, and I'm hoping that the regulations will impose enough sanctions so that these big internet giants are actually allowing more control over an individual's data, because that's currently the
complete lock-in, and that's at risk of monopolizing markets. Anyway, if you talk about ownership from the company's view, like: I'm a company, we have all these databases, and who is the owner of that? That's a very interesting topic, because nobody wants to take responsibility: "No, no, I don't want to be the data owner." So there what you have to do is sort of sneak ownership in under the door, and sort of say to the business executive or the process or application owner: "Yeah, it's just your face next to this data domain, don't worry about it," until you're like a year down the line and they actually start adopting it. So I'd be happy to go deeper, but those are like two angles on ownership. Thank you. And we're running a little bit low on time. You're going to be around after the talks? Yeah, I drink. Okay, so people can ask you questions directly. Thank you so much, this was very big. [Applause]

Linking and protecting government data for social research © ESRC

The UK: it's a busy, complicated sort of place, and running the UK is a complicated business. Government departments and agencies couldn't keep the UK ticking over without collecting different sorts of information: information about people getting sick and being made better, information about apprentices getting skilled training then entering the workforce, information about businesses making profits and paying their taxes. The result is a great pile of data about all of us. Now, civil servants are prevented from putting all this information into a massive database, and the public certainly aren't allowed to look at it either. But there is one group of people who can use this information to help us understand the UK better, and all they need is access to it. By comparing different parts of the information the government collects, academic researchers can spot trends which simply wouldn't be obvious otherwise. Linking data can help researchers find out which policies and ways of doing things work well and which have failed to help. That's where the ESRC's Administrative Data Research Network service comes in. It exists to let these experts gain carefully supervised access to the relevant bits of this information without infringing on anyone's privacy. Its most important job is making sure the research idea being put forward is a good one. Is it ethical? Is it feasible? Ultimately, is this proposal going to help us understand the UK better? The bar is high, because getting the data ready for researchers to look at and then link involves a lot of work. That's why four Administrative Data Research Centres, one each for England, Scotland, Wales and Northern Ireland, work hard to make sure the information these researchers are seeing can't be linked back to an individual person. They let the researchers have a look, helping them work out what it all means, and all the while the information directly identifying you, me and your neighbor, things like our names and addresses, is completely removed. Researchers don't mind
because they're interested in the big picture, not the specific individuals involved. It's a lot of work, but it's worth it, because the end result can often be a bright idea: the kind of idea that can help improve the way those apprentices are helped to get a job, the kind of idea that cuts through our busy, complicated lives, provides an insight into society's inner workings without intruding on anyone's privacy, and ultimately makes our busy, complicated place just a little better.
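
The idea of linking records while removing names and addresses can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Administrative Data Research Network's actual procedure: the keyed-hash approach, the `SECRET_KEY` held by a trusted party, the field names, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by a trusted third party, never by the researchers.
SECRET_KEY = b"held-by-a-trusted-third-party"
# Assumed names of the directly identifying fields to strip out.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonym(person_id: str) -> str:
    """Replace an identifier with a keyed hash so records can be matched but not read."""
    return hmac.new(SECRET_KEY, person_id.encode(), hashlib.sha256).hexdigest()


def de_identify(record: dict) -> dict:
    """Drop direct identifiers and swap the raw ID for its pseudonym."""
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != "person_id"}
    cleaned["link_key"] = pseudonym(record["person_id"])
    return cleaned


def link(left: list, right: list) -> list:
    """Join two de-identified datasets on the shared link key."""
    index = {row["link_key"]: row for row in right}
    return [{**row, **index[row["link_key"]]}
            for row in left if row["link_key"] in index]


# Made-up records standing in for two government datasets.
apprenticeships = [
    {"person_id": "A1", "name": "Ann Example", "address": "1 High St", "trade": "plumbing"},
    {"person_id": "B2", "name": "Bob Example", "address": "2 Low Rd", "trade": "joinery"},
]
employment = [
    {"person_id": "A1", "name": "Ann Example", "address": "1 High St", "employed": True},
]

linked = link([de_identify(r) for r in apprenticeships],
              [de_identify(r) for r in employment])
print(linked)
```

The researcher only ever sees `linked`, which carries the trade and employment fields plus an opaque link key, so the big-picture question (do apprentices go on to get jobs?) can be studied without names or addresses.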

The Importance of Data Governance for Organizations

de demo binnentrokken bij governance in flight koffers om van een typisch jou en in fine and tab to companies and cd koffer ons voor wat je doet try to explain this is wel van klei fimo perspectief zo ingeven crime scene crimes committed den de l'eau de lo and forth mijn energie kosten en seal 17 en andrew evident is basically sealt is perfect het en daar ook eind of google spoor serieus politici processes to collect is en to share with and then they go to get steden de crime scene investigation steenkool natuur de les de grotere researchers and day as for consent voor is lse step dat 63 een stoet heet en die although this to moment catch de veel int catch de kroeg en dan your hands want ik heb team two convict in and this is all to protect polder civilians that they feel treffers het analytics trust het in size met onze redder field frosted er teveel systeem moet zo if you look let's say to government and why is koffer en sony moordend voor organisaties ervoor jorine swish en haast hebben focus on the governance in synology high key users' wat libraries for box en wie uw verdeler bitches worden help you organize qd de boerinnen make a simple een extra symbool in die bar junior innovation u kunt of deden is een certificaat stad in de trident ow ja en dat storge wordt in de deden grand strategy in een klein staat er een crash uw aandelen governance per focus 2 pokémon het she sings yder deden governance voor insights oordelen governance for campagnes in shirley riviere's hebben een big impact de global markets naar justin europe in dutch drive in de beek focus op compliance met zien de markies anti-tank saffraan insights zelfservice extra's te delen trekken baden in your own ja die uurwerk zesde alle delen die u niet most people don't keer boekdelen governance waarmee kinderen vld en houders open source in this approach peter de cd the acceptance were built in a day the covered it strategy oh een open source of grote patch alice the cosby can take advantage of 
wanneer heb een in de kimi en bodybuilden een mededeling dat beest open standards charlie en organization quinoa twitter customers water suppliers met de partners dit is hoe de shopping day the governance and the winners never ben de successen bol [Muziek]