Filipa Castro: Data Science for Social Good

Automatic Summary

Data Science for Social Good: A Perspective from Portugal

Good day! I'm Filipa Castro from Portugal. In this blog post, I'd like to share how data science can be used for the benefit of society, drawing on my experience with Rotaract and Data Science for Social Good (DSSG) Portugal.

My Journey with Data Science

My interest in using data science for social good has its roots in Rotaract, which I joined around ten years ago. Rotaract is a worldwide organization of young people who come together to create positive change, with clubs in towns and countries all over the world.

Later, during my integrated master's in biomedical engineering in Porto, Portugal, I became interested in data and in what can be learned from it. After graduating, I took a data-related job, which gave me a daily opportunity to learn more about the field and showed me that professional skills can be used not only in the corporate sphere but also to achieve social impact.

Establishing Data Science for Social Good in Portugal

DSSG Portugal was founded in 2019, joining chapters that already existed in other parts of the world. I was not one of the founders, but some of my friends were, and because I belonged to Rotaract, we decided to meet. DSSG's goal is to work with other associations and help them achieve more in their social projects through data. That first meeting led to a pilot project, described below, and in 2020 I joined the DSSG Portugal lead team.

Journey of a Data Science for Good Project

The first stage of a data for good project, from our experience, involves directly engaging with the beneficiary organization. This open dialogue ensures an understanding of their existing challenges and available data. It also fosters mutual commitment and enthusiasm for the project.

Once the beneficiary is on board, project scoping begins. We focus on a specific problem, given the short duration of our projects, typically between three and six months. The scope is documented, outlining the problem, proposed solution, and main goals, which is then sent to the beneficiary for review. We iterate until both parties are satisfied with the project's scope.

Project Case Study: Fundraising Against Cancer

Our pilot project was with Rotaract, focusing on its fundraising campaign against cancer. This large campaign involves more than a hundred volunteers, yet the club had no insight into the best locations, shifts, or practices for collecting donations. We decided to explore the campaign data and look for insights and recommendations for future campaigns. Once the scope was well defined, we recruited volunteers.
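Finding which locations and shifts raised the most is, at its core, a group-and-sum over the campaign records. The sketch below assumes a hypothetical record layout (parish, location type, shift, amount in euros); the class name, field names, and sample values are illustrative and are not Rotaract's actual data or the project's actual code.

```python
from collections import defaultdict
from typing import NamedTuple


class ShiftTakings(NamedTuple):
    """One volunteer shift's takings (hypothetical schema for illustration)."""
    parish: str
    location: str  # e.g. "supermarket" or "church"
    shift: str     # e.g. "Sat AM"
    amount: float  # euros collected


def total_by(records, field):
    """Sum amounts per value of `field` and return (value, total) pairs, largest first."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, field)] += record.amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    records = [
        ShiftTakings("Centro", "supermarket", "Sat AM", 420.0),
        ShiftTakings("Centro", "church", "Sun AM", 310.0),
        ShiftTakings("Vila Nova", "supermarket", "Sat PM", 150.0),
        ShiftTakings("Vila Nova", "church", "Sun AM", 200.0),
    ]
    print(total_by(records, "location"))  # [('supermarket', 570.0), ('church', 510.0)]
    print(total_by(records, "shift"))     # [('Sun AM', 510.0), ('Sat AM', 420.0), ('Sat PM', 150.0)]
```

The same function works for any field of the record, so ranking parishes, location types, or shifts needs no extra code.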

The Volunteer Recruitment Process

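Although DSSG has a lead team that handles the logistics (meeting beneficiaries, finding volunteers, and communicating projects), it's the volunteer community that actually carries out the social projects. For each new project, we open a call for volunteers through our newsletter and social networks and interview every applicant. A project team has one to five people, always including a project manager and a technical lead; the other members are usually data scientists, though they can also be data journalists, depending on the project's scope. The Rotaract pilot had three volunteers.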

Then we run the social project. The lead team's main role is to manage the relationship between the volunteers and the beneficiary: whenever volunteers have questions about the data or the scope, or the scope needs adapting, the lead team steps in. It also makes sure deadlines and milestones are met, delivers the final results to the beneficiary, and collects feedback to evaluate the project's impact on society or on the organization.

Outcomes and Impacts of the Rotaract Project

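Our collaboration with Rotaract produced an interactive report with insights for optimizing the fundraising campaign, which raises around €20,000 over one weekend across roughly 20 parishes. An interactive map shows at a glance which parishes raised the most. A chart of the amount collected by each volunteer, by age, showed that older volunteers collect more money; we now use this objective evidence to remind older members how important they are, since every year it gets harder to convince them to spend four hours on their feet asking for donations. Finally, normalizing each parish's total by its population revealed that a very small parish could outperform the town center, which made parish presidents competitive about doing better the following year.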
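The talk describes two comparisons from this project: parish totals normalized by population, and the amount collected per volunteer by age. Both can be sketched with the Python standard library. The input shapes (dictionaries keyed by parish name, a list of (age, amount) pairs), the 10-year age brackets, and the sample numbers are assumptions for illustration; the talk does not say which tools the volunteers actually used.

```python
from statistics import median


def per_capita(totals, population):
    """Euros raised per resident for each parish, highest first.

    `totals` and `population` map parish name -> value (hypothetical inputs).
    """
    rates = {parish: totals[parish] / population[parish] for parish in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def median_by_age_bracket(volunteers, width=10):
    """Median amount collected per volunteer within age brackets.

    `volunteers` is a list of (age, amount) pairs. Brackets are labelled by
    their lower bound, so with width=10 the key 30 covers ages 30-39.
    """
    brackets = {}
    for age, amount in volunteers:
        brackets.setdefault(age // width * width, []).append(amount)
    return {bracket: median(amounts) for bracket, amounts in sorted(brackets.items())}


if __name__ == "__main__":
    totals = {"Centro": 6000.0, "Small parish": 1200.0}
    population = {"Centro": 12000, "Small parish": 1500}
    print(per_capita(totals, population))  # [('Small parish', 0.8), ('Centro', 0.5)]
    print(median_by_age_bracket([(22, 80.0), (25, 120.0), (45, 200.0), (61, 260.0)]))
    # {20: 100.0, 40: 200.0, 60: 260.0}
```

Using the median rather than the mean keeps a single exceptional volunteer from dominating a small age bracket; that choice is ours, not something stated in the talk.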

Ongoing Data Science for Good Projects

Our current areas of focus include optimizing the appointment schedule for a public pet hospital, better characterizing the impact of a hurricane in Portugal, visualizing the tangible impact of an organization working with homeless people, and addressing COVID-19 data extraction challenges in Portugal.
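For the pet hospital project, the talk describes grouping patients and examining how those groups relate to waiting times. A minimal sketch, assuming appointment records are available as (group, waiting minutes) pairs and that a numeric feature such as treatment duration is recorded (both hypothetical), compares mean waits per group and computes a Pearson correlation coefficient by hand.

```python
from math import sqrt
from statistics import mean


def mean_wait_by_group(appointments):
    """Average waiting time (minutes) per patient group, longest first.

    `appointments` holds (group, wait_minutes) pairs; the grouping (e.g. species
    or appointment type) is a hypothetical choice for illustration.
    """
    groups = {}
    for group, wait in appointments:
        groups.setdefault(group, []).append(wait)
    return sorted(((g, mean(w)) for g, w in groups.items()), key=lambda gw: gw[1], reverse=True)


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant (correlation undefined).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    visits = [("dog", 35), ("cat", 20), ("dog", 45), ("exotic", 15), ("cat", 30)]
    print(mean_wait_by_group(visits))
    treatment_minutes = [10, 25, 40, 15, 30]  # hypothetical feature, one per visit
    waits = [wait for _, wait in visits]
    print(round(pearson(treatment_minutes, waits), 2))
```

A correlation only flags a relationship worth investigating; turning it into a better schedule would need the hospital's own knowledge of why those groups wait longer.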

Addressing COVID-19 Challenges

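The Portuguese government published its daily COVID-19 figures (new cases, tests, and deaths) as PDF files, so the numbers could not be analyzed without first being copied out by hand. We therefore created a public data repository and, at first, manually transferred the figures into a CSV file every day. With the help of volunteers, we then automated the extraction and added automatic tests that flag inconsistent figures, so we can warn the community when the published data contains mistakes. The repository has attracted many community contributions, including numerous dashboards and analyses, and we also sent the government an open letter listing the gaps we saw in the published information and offering our skills for free.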
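An extraction and validation step of this kind can be sketched as follows. It assumes the PDF has already been converted to plain text by some PDF-to-text tool, and the field labels are hypothetical placeholders rather than the wording of the actual government bulletins. The check treats the counts as cumulative, so a value that drops from one day to the next is flagged as a likely reporting error, in the spirit of the automatic tests mentioned in the talk.

```python
import csv
import io
import re

# Hypothetical field labels; real bulletins use their own (Portuguese) wording and layout.
FIELDS = {"confirmed": "Confirmed cases", "deaths": "Deaths", "recovered": "Recovered"}


def parse_report(text, date):
    """Extract one row of counts from a report's plain text (already extracted from the PDF)."""
    row = {"date": date}
    for field, label in FIELDS.items():
        # The label, an optional colon, then a number that may contain spaces, dots or commas.
        match = re.search(re.escape(label) + r"\s*:?\s*(\d[\d .,]*)", text)
        if match is None:
            raise ValueError(f"{label!r} not found in report for {date}")
        row[field] = int(re.sub(r"\D", "", match.group(1)))  # drop thousands separators
    return row


def check_series(rows):
    """Return warnings for days where a cumulative count decreased."""
    warnings = []
    for prev, cur in zip(rows, rows[1:]):
        for field in FIELDS:
            if cur[field] < prev[field]:
                warnings.append(f"{cur['date']}: {field} fell from {prev[field]} to {cur[field]}")
    return warnings


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["date", *FIELDS])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    day1 = parse_report("Confirmed cases: 1 234\nDeaths: 56\nRecovered: 78", "2020-04-01")
    day2 = parse_report("Confirmed cases: 1 200\nDeaths: 60\nRecovered: 80", "2020-04-02")
    print(to_csv([day1, day2]))
    print(check_series([day1, day2]))  # ['2020-04-02: confirmed fell from 1234 to 1200']
```

Keeping parsing, validation, and serialization as separate functions makes it easy to run the checks on the whole history every time a new day is appended.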

Get in Touch

If you're interested in learning more about our work or establishing a DSSG chapter in your country, please reach out to me on LinkedIn or get in touch with us at the DSSG website. Together, let's harness the power of data for the benefit of society!


In the digital era, data is an invaluable asset that can drive social impact when harnessed appropriately. Through organizations like DSSG, data science can go beyond corporate gain to foster societal good.

Video Transcription

Hello, everyone. Good morning; it's still 9 a.m. in Portugal. My name is Filipa Castro, I'm from Portugal, and today I'd like to share my perspective on how we can use data science for social good. First things first, I will explain how I got interested in data science for social good. Around 10 years ago, I joined Rotaract. This is a global organization of young people who get together to create positive change in the world. There are clubs all over the world, so if you are interested, look for one: you'll certainly find a Rotaract club in your town, or at least in your country. Later, I started my integrated master's in biomedical engineering in Porto, Portugal. This is also relevant to the journey, because this is where I got interested in data and in learning from data, and I started to follow this passion. Once I finished my degree, I started a data-related job. This is important because it's a daily opportunity to learn more about the field, and also because I think it's when you actually start working that you understand your professional skills can help you outside of work too, for instance to achieve social impact.

In 2019, Data Science for Social Good Portugal was founded; there were already other chapters in other parts of the world. Even though I was not one of the founders, some of my friends were, and since I belong to Rotaract, we decided to meet. The goal of DSSG is to meet with other associations and help them achieve more in their social projects through data. So we met and started a pilot project, which I will describe in more detail later. In 2020, I also joined the DSSG lead team, and it's mainly to talk about our work that I'm here today. Now I would like to go more in depth into the journey of a data for good project: how we really do it, step by step. The first step is to meet the beneficiary. The beneficiary here is an association; it might be an NGO, for instance. It's really important that you meet with them and understand the problems they have and the data they have available, or sometimes don't. More important than anything, I would say, is to ensure both parties are equally committed and motivated to do this project.

In this case, it was a meeting between DSSG and Rotaract. Then we start what we call the project scoping. The beneficiary might have a lot of different problems to solve, but our projects are short, normally between three and six months. So we decide to focus on one target problem first, and we write what we call a scoping document: the problems we want to address, the solution we are proposing, and the main goals. We send this document to the beneficiary and iterate with them until both parties are satisfied with the project. In the case of Rotaract, we decided to focus on one of our projects, a fundraising campaign against cancer. This is a big campaign involving more than 100 volunteers, but we didn't have any insight into the best locations to ask for money, the best shifts, or the best practices. We decided this would be a great pilot project: explore the data we had about the campaign and look for new insights and recommendations for future campaigns. Once the scoping is well defined, we start to gather volunteers.

DSSG has a lead team, which I'm part of. The lead team is responsible for the logistics: meeting the beneficiary, finding the volunteers, and communicating our projects. But we are not the ones who actually do the social projects. When we have a new project, we open a call for volunteers, share it in our newsletter and on our social networks, and then interview all the candidates. Normally one project has one to five people. We always have a project manager and a technical lead, and the others will be data scientists, but they can also be data journalists, for example; it really depends on the scope of the project. The lead team also has the responsibility of engaging this community of volunteers, so we organize learning events and networking events to motivate them. In the pilot project with Rotaract, we had three volunteers: Daniel, Miguel, and Thiago.

Once everything is ready, we actually run the social project. The main role of the lead team here is to manage the relationship between the volunteers and the beneficiary. Every time the volunteers have questions about the data or the scoping, or if we need to adapt the scoping, DSSG will be there. We also need to ensure that deadlines and milestones are met throughout the project. Once it is finished, we are the ones who deliver the results to the community or to the organization, and we collect feedback from the beneficiary to evaluate whether the project really had an impact on society or on their organization. In the case of our pilot project, the output was an interactive report that I will show next. First, to give you a bit more context about this fundraising campaign: in my little town, we have about 20 different locations, which are called parishes.

We have more than 100 volunteers who, during the two days of the weekend, Saturday and Sunday, stand in different places like supermarkets, churches, et cetera, asking for money. Over these two days, we normally raise around €20,000 against cancer, just in this small town.

Now I would like to show you in more detail some of the results this data for good project had. Let me try to share this. OK, I think you can see my screen. Yes. The first one is an interactive map. The fact that it's interactive is really nice for presentations: sometimes we have meetings with the presidents of the parishes to organize the fundraising and to present them with the results of their parish.

At a glance, with this interactive report, you can see which parishes were the most successful: the colors show the parishes that raised more money. As expected, the town center, which is the central parish, raised the most, and it's interesting to see how this evolves toward the suburbs. Instead of showing a table of numbers in a presentation, this is much more interactive and impactful. Another visualization I have is the distribution of the amount collected by each volunteer according to their age. For someone who is not involved in this project, it might just say: OK, older people, over 30, collect more money. That's true, and it's normal. It can be explained because older people are usually better known in town, they know more people, or they have more experience with this campaign. But for us, this visualization is really nice, because in our club we are a group of young people, but we also have older volunteers, and every year it's more difficult to convince these older people to help us, because they have to, say, stand for four hours in a row asking for money.

When you are 50 or 60 years old, this is not the healthiest way to spend your time, so it's more difficult to convince them. But we use this kind of data, really objective metrics, to show them: you are really important. We remind them that they are important for this campaign and help us reach a higher amount. Here I have another one, a bar plot of the amount fundraised; let me zoom in so you can see it better. It shows the total amount of money each parish collected. You can see that the gray bar is always the one with the most money. This is not surprising, because it corresponds to the center of the city, so it's not very impactful. What DSSG did was show the same results normalized by population, and then things become interesting, because you see, for instance, that a very small town, this one in pink, actually collects more money than the main city.

So in terms of population, they do better. When you are in a meeting with the politicians and the presidents of each parish, they really become competitive about this, which is nice, because they will try to fundraise more next year and start competing with these small towns. OK.

Going back to the presentation, I have already talked a bit about each of these visualizations. Now I would like to talk about other projects that are ongoing at the moment. One of them is with a public pet hospital, where we are trying to optimize the waiting times for appointments and treatments. Basically, we'll try to understand which kinds of patients or groups of patients there are and how they correlate with waiting times, so that we can optimize the schedule and serve more clients each day. The next organization is also a volunteer organization, and it has a lot of data from the government and from Civil Protection. Here, we decided to focus on one project: characterizing the impact of a hurricane that hit Portugal. Both we and they think their data about the consequences and impact of this hurricane is not totally accurate. So we want to cross the data they have with other sources, like news published at the time of the hurricane, to produce a more complete characterization of its impact.

This will hopefully help us prepare better for future disasters. If we can understand the real impact and consequences, for instance if some buildings were not prepared, we can understand why and prevent it in the future. Then there is CAIS, an association that works mainly with homeless people. They produce a magazine every month and give it to these people, who can sell it; so they are actually working, and they keep the income from the magazine and use it for food or rent. Since this magazine is already 25 years old, what we want to do is gather all the data they have about these people and about the sales, and use data storytelling to tell the story and the impact of CAIS, to engage the community, raise more donations, and increase the impact of this association. Last but not least, some context on what we did regarding COVID-19. In Portugal, the government shares the data about new cases, tests, and deaths every day in a PDF file.

This is not very nice for data scientists or for data exploration, because you cannot extract the numbers automatically; you need to do it manually. So we decided to create a public online data repository for everyone to use. We started by manually extracting this data every day and putting it in a CSV file so that scientists could use it for their analyses. Then we started to gather volunteers to help us, and now we have an automatic data extraction pipeline that collects the data from the PDF every day, so we just need to click a button. We also have automatic tests to make sure the data makes sense, because sometimes there are mistakes, and when there are, we warn the community. For about three months now, we have been receiving a lot of contributions from the community: many dashboards and many analyses. This has been great. We also wrote an open letter to the government based on our experience with this project, listing the gaps we think exist in the information they are sharing.

We also offered our help and our skills for free, in case they want help improving these methods. Maybe I have some time to show you the page I'm talking about, in case you want to visit. Yes, this is the repository I'm talking about. In the last weeks, since we had some ideas on how to improve this repository, we created some mini projects, which you can check here. This one, for instance, is a sentiment analysis project covering the pandemic period, so that we can correlate people's sentiment with the government's restrictive measures. This one is very simple: every time we update the data, we want someone to write an automatic script that publishes this information on our Twitter. We have been receiving a lot of help from volunteers, so please check it out if you are interested. If you have any questions, please ask in the chat and I will try to answer. You can also reach me on LinkedIn; here you have the contacts for me, for DSSG, and for Rotaract. So please get in touch. If you are motivated to create a DSSG chapter in your country, please contact us, because we can help. I think my time is almost over. OK, I have one question I can try to answer.

I see a question about how to extract data from the news. I can share what we did for COVID: in our repository, we have some data about "notícias", which is Portuguese for news. It's possible to go to the website of a newspaper or a magazine and write a script, which we call a scraper, that collects all the news into a CSV file; for instance, if you choose news from the 10th of June, it will collect them. Then it's a bit of data cleaning and data exploration to retrieve some value from this data. That's basically it, and I can give you more details later. I think the time is up. Thank you, everyone. OK, I see a comment here about the people who create the dashboards. Yes, if you are interested, you can also see them in our repo. It's mainly about Portugal, of course, but here we have some applications people built. If you open this one, for instance, you will see the dashboards. Every time someone builds a dashboard, we share it with the community, like this one. OK, I will share the links.

This one is for the story, and in case you want to know more about the association, it's cais.pt, and the website is in English, so I think everyone can understand it. In terms of contacts, if you want to reach me, it's this one, so you also have my contacts here. Thank you very much for being here. I will now attend other sessions. Please reach me on LinkedIn, and I will still check the questions here and answer if I can. OK. Thank you.