Sharing gives data wings and posture

28.3.2024
Giulia Masoero on a mountain top.

Giulia Masoero studies how climate change affects birds. She wants to be as open and transparent as possible by sharing her data and code, and also encourages other researchers to practise Open Science.

Giulia Masoero currently works as a post-doctoral researcher at the Swiss Ornithological Institute, studying the effects of climate change on alpine swifts. She did her PhD in Finland, also on climate change and birds, though at that time on a bird of prey, the pygmy owl. Climate change as a research topic has been one reason for her interest in open science.

As open as possible

“I think I started to know OS [open science] just when I started my PhD, at the very beginning”, Masoero recalls. She had been invited to do her first peer review and was looking for guidelines to help her. While searching, she also found a guide on how to make code reproducible and became interested in the topic. “I started to understand that it was important to make your code in a way that anyone can understand and run it”, she says.

During her PhD, Masoero published a paper in a journal that required the data to be published as well. That made her weigh the reasons for and against publishing data, and she decided that she liked the idea of sharing.

“Especially when working with issues like climate change… There are so many people that are climate change deniers. They will never believe you anyway but if you have the data, code, and everything you have made publicly available, they have at least fewer arguments to attack you”, Masoero says. She thinks that if science were just papers, anyone could create something with AI and random data. “I want to be as open and as transparent as possible to deal with these kinds of things.”

During her postdoc period, Masoero has decided to publish all the data and code used in her analyses and to make them FAIR (findable, accessible, interoperable and reusable).

Masoero’s current funding, a Marie Skłodowska-Curie Global Fellowship, comes from the European Union’s Horizon 2020 programme, which requires her to publish open access (OA) and to write a data management plan (DMP) at the start of the work. Masoero sees it as a good thing that such a big funding agency promotes OS. Writing the DMP was especially useful for the data she started to collect herself, because she had to think about, for example, how best to preserve the data so that other people can also access it. “It’s quite demanding, since they are asking few things, but since I was planning to do it anyway, it’s not a big job.”

Giulia Masoero holding an alpine swift in her arms.

Giulia Masoero with her current study species, the alpine swift. Photo: Héloïse Moullec.

Challenges with long-term data

One issue she has had with data sharing stems from the type of data she works with. Studying climate change requires long-term data collection, so the data Masoero uses for her analyses has been collected since 1999 by many people. “It’s not just my data that has to be stored, so it’s not always just my decision to share it.” The dataset as a whole is huge and includes many different kinds of data, not all of which are used in the current analyses.

Long-term datasets are almost impossible to share in full. “Like the one I’m using now, it’s 25 years of data and for me it means that future post docs, PhDs and master students are going to be working on this data. And if it’s all public, anyone can just use it also for commercial purposes. So sharing data is great for science overall, but in these cases not so great for the individual researchers. It means future jobs for people.”

Masoero has solved the issue by creating small subsets of the data for use and for sharing. “And if someone is interested in the topic, I’m always happy to share the data upon request and to collaborate.”

Large datasets collected by multiple people can also raise questions about data ownership. Masoero has no personal experience of that, but she has heard of situations where a researcher left a study group and the ownership of the data came up for discussion. Masoero thinks it is hard to say to whom data belongs in general, but for her own data the answer is clear. “Since I got the funding from the European Commission and they ask for the data to be open, then it’s simply open and that’s a good solution.”

Working with long-term data is nothing new to her, because her PhD work in Finland also involved long-term data on pygmy owls. Those data raised a further issue: sensitivity. As birds of prey, pygmy owls are a charismatic species that people are keen to see. The pygmy owl is also listed as a vulnerable species in Finland, so its nesting sites have to be protected from disturbance. Masoero shared the data but left the exact nest locations out of the dataset; they are available only upon request.

Giulia Masoero climbing a tree that holds a pygmy owl nest box.

Masoero conducted her PhD work in Finland, studying pygmy owls. Photo: Jorma Nurmi.

Need for recognition

In ecology, as in other fields, journals increasingly ask authors to share their data as well when submitting a manuscript for evaluation. Masoero sees this development as a good thing, also for the researchers themselves. “It forces you to organise your data in a way that is understandable to other people and also for yourself later on.”

However, in her experience some journals do very little to check whether the shared data are actually what they should be. A data sharing requirement is useful only if authors really deposit their data. Evaluating the data and code as well as the manuscript of course means more work for the journal, but it is important work. Who should do it is a completely different question: Masoero would not want to burden reviewers with it, as they are already doing voluntary work with barely any recognition. Many editors also work without pay.

The word “recognition” also comes up in the discussion when talking about sharing data or code. Masoero thinks that lack of recognition and fear of misuse are the most common reasons why people do not share data. “At the moment I have the feeling that I’m only getting recognition from the peers who are doing the same thing”, Masoero sums up. She says that when you publish data, the people using it sometimes do not even cite it, so you get no recognition of any sort. Masoero knows of a case in which someone’s data had been used in a meta-analysis without the original data providers being contacted. The users had misinterpreted the data, which led to wrong conclusions in the paper.

“It would be very important to contact the people who collected the data, if possible, or at least have very good metadata and descriptions”, Masoero says. Ecological data can be complicated in that sense: even with good metadata you might have trouble interpreting the data if you do not know the study species or the ecosystem, and you can jump to the wrong conclusions. Masoero sees the involvement of the original authors as a huge benefit. “I think it would be nice if the shared datasets would be more of a spark for collaboration”.

Importance of communities

Masoero is actively involved in the SORTEE (The Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology) community. “What first interested me, was that it was a very nice community of researchers who are very interested in OS.” SORTEE has a code of conduct and takes inclusivity and accessibility into account as much as possible. For example, its annual unconferences have a programme running for 24 hours, so that people in all time zones can participate. “I like that science is going not just toward open but also towards kind science”, Masoero says.

Having a community interested in the same things and dealing with similar kinds of issues with OS is important, not only for getting recognition but also for practical help, especially because official instructions are sometimes too vague or general. For example, the instructions from journals about sharing data can be confusing. “It’s often left a lot to the authors, like how much they want to comply with the guidelines”, Masoero says. She feels that journals should have better policies. “It’s pushing the change forward.”

In general, she thinks that too much of the responsibility for promoting OS has been left to researchers. “We as researchers, and also the journals, can push the change. But what actually can make the change are the funding agencies and institutions. Like the Horizon2020 money comes from all over Europe, so it’s nice that we don’t just use the money but give back to the community, in terms of data and results.”

Every step counts

For researchers who do not yet practise OS and might not even know where to start but are willing to learn, Masoero has simple advice: “It seems like there are a lot of things to do and of course if you do it all, you will work for weeks. But if you’re just starting, every small step towards open science is great.”

As you do more OS-related things, you gradually build up a lot of new skills, and the next paper or dataset will take much less time. “What took you like three months for your first take, will take just a moment later on.” She also wants to encourage people to join communities, to take a look at what they are doing, and to listen and learn.

And if you are otherwise not convinced of the benefits of OS, you can do it for your own sake. “I have found it useful that the data and code I published for a manuscript will always be there. I’m always going to be able to find it because it’s published. So even from a selfish point of view, it is great. I must organise everything for other people and then it makes my life much easier.”

 

Interview and text: Elina Koivisto

Photos: Giulia Masoero, Héloïse Moullec, Jorma Nurmi (Rights remain with photographers) 
