Travel blogs from International Data Week 2023 organized in Salzburg 23-26 October.
CSC - IT Center for Science granted participation/travel support via EOSC Finnish Forum for four experts to the International Data Week 2023, combining RDA 21st Plenary and SciDataCon 2023 and organized in Salzburg 23-26 October. The experts tell about their experiences at IDW23 in the blogs below.
Julia Niskanen, Tampere University
Juuso Marttila, University of Jyväskylä
Päivi Kanerva, Tampere University
Pak Lun Fung, University of Helsinki
Picture: Awakening Salzburg, Päivi Kanerva
Julia Niskanen, Tampere University
PhD, data manager & researcher
I participated in the Salzburg International Data Week 2023, a conference combining RDA’s (Research Data Alliance) 21st plenary and CODATA’s (Committee on Data for Science and Technology) SciDataCon 2023, as an online registrant with support from EOSC Finnish Forum. With a recent pivot in my research career towards open science and data management, this was my first experience with any of the aforementioned events, and many of the concepts and practices were somewhat novel to me. In other words, IDW23 had a lot to offer, and, to my delight, it delivered - very successfully.
The overarching themes of the conference revolved around FAIR practices, global inclusivity, and machine learning & AI. For an individual researcher immersed in the day-to-day scientific work, it is not self-evident how the structures supporting and shaping our work culture have arisen and evolved; it was therefore very eye-opening to get a glimpse of the people, working groups and organizations committed to improving the foundation of open science. The many topics presented, both discipline-focused and interdisciplinary, were notably interconnected, demonstrating the wide applicability of the underlying principles. Some presentations specifically illustrated the benefits of crossing the boundaries of fields of science, such as the opening plenary, titled “Spatial Data Science: Geographic Context Matters”. The main takeaway of the session was that spatial data is present perhaps more often than one would think, and that using it can confer sometimes surprising advantages to quality control and analytics as well as add an extra layer of meaning to interpretations and conclusions.
Another crucial theme discussed in various talks was what the scientific community at large considers top-tier research outputs. The current, publication-centered funding culture fails to take into account important groundwork that many fields are based on: the collection, curation, management, analysis and preservation of data. With the accelerating rise of machine learning and AI, data is held in increasingly high regard, yet there are few to no established academic career paths where the value of practical data work is recognized. To combat this, the insightful presentation of Zefan Zheng, PhD student at the Max Planck Institute, outlined a different and non-traditional approach to implementing FAIR principles in practice, building a system of transparent and creditable outputs in all stages of a study. This culture change would be especially critical for early career stages, where it is necessary to navigate not only learning how to do science, but also surviving the often fragmented funding landscape and pressure for novel discoveries. Crediting outputs other than published articles might lessen this pressure, level the playing field for young investigators and produce more transparent and replicable science.
The final crucial theme was the role of machine learning and AI and the utilization of interoperable resources. While harnessing the power of machine-operated processes during the entire lifecycle of a study is a particularly attractive prospect, executed non-optimally it might inadvertently increase the human workload required for curation of information. Therefore, adapting data and metadata to commonly accepted, machine-operable formats and interlinking them by using persistent identifiers is of key importance. User interfaces are also not to be forgotten, as various presentations demonstrated. After all, if a piece of software is not understandable and practical to users, it will not gain usage. Ultimately, the most useful solutions for the future of research might lie at the interface of what is humane and meaningful to researchers and other stakeholders, while also feasible for machines.
In conclusion, IDW23, RDA & SciDataCon offered a diverse view of the different facets of data and of how we, as a society, do and would like to interact with it. I warmly recommend these events for anyone involved in or adjacent to research with an interest in data.
Juuso Marttila, University of Jyväskylä
PhD, Informatician
I got a chance to spend a whole week at the International Data Week in Salzburg. To call the event fully packed would be a serious understatement. Four days of presentations from early morning till late afternoon or early evening and added networking events. It kind of leaves one with a “thousand-yard stare”, but also luckily with a plethora of notes to draw upon. So here are homecoming gifts from surprisingly summery Salzburg.
General Impressions
Modern conference tools really make everything easier and smoother by a factor of ten. Flipping through the agenda and making your own is one thing, another is the ease of networking. When you see something interesting, just search for the presenter in the conference app and send a message, arrange a meet-up etc. A perfect solution for Finnish people who shy away from approaching people directly without any preliminary warning! The system was so efficient that there is no excuse for running a conference without this kind of app!
While I was really impressed by the practicalities of the conference, I am not sure I agree with the concept of the Data Week. Cramming together so many SciDataCon and Research Data Alliance sessions meant that RDA Working Groups (WGs) in many cases had to step aside, so that most of the RDA sessions were Birds of a Feather (BoF) sessions and Interest Groups (IGs). And while BoFs and IGs are an important part of RDA, the general consensus among the people I met was that the real heavy lifting is done in the WGs, which were now absent. It meant that the true gems of the conference were found in SciDataCon presentations, while the RDA content often remained at a more abstract level of intentions and considerations for new initiatives. It is quite a lot like Finnish Open Science Coordination: if we have an agenda that only presents some recent developments and wonders what we should do next, it is more for policy makers than experts. RDM experts can be found in those open science coordination workshops where the magic really happens.
Three content-related points carried home
All examples and working workflows, tools etc. are usually highly specific and function in the quite limited surroundings of a specific institution, research case or discipline. Even when FAIR principles and other presented points apply at a general level to all science, it seems quite hard as a generalist support person to take lessons learnt from these cases and to apply them in an institutional setting where we are trying to cater to a wide range of researchers. I hope we manage to find something general enough that still brings added value to our jobs while not being discouraged by comparisons to the level of ambition attainable in more specific projects.
Why aren’t we talking about collections as data? A BoF session posed a good question: why are we not collaborating and coordinating more with GLAM actors (galleries, libraries, archives and museums) to get them and their contents involved with the world of open research data? In Finland libraries are quite well represented, but archives and museums are almost totally absent. Still, they handle big data collections already used in science and also act as repositories for many data collections produced by researchers (e.g. ethnology). And yet we have no visibility into that data in our national services!
There are more and more separate instances of institutions trying to make Machine-actionable Data Management Plans and to make DMPs themselves more FAIR. Still, most are concentrating on gathering information and giving some intrinsic value to the DMP itself instead of using the information contained in the DMP and integrating it into concrete workflows. The diversity among different takes on DMPs is so immense that perhaps we should sit down and think about what a DMP is for, why we need it and what information it should contain. Is this all about one plan and its development? What do we need that plan for? Or should it be more about new, more detailed kinds of research information and how we should react to this change?
Päivi Kanerva, Tampere University
Information Specialist
This year's International Data Week (IDW), a conference combining the RDA plenary and SciDataCon, was organised in Salzburg, Austria on October 23-26. The city of Salzburg, the Salzach River (almost as gorgeous as the river Aura in Turku), and the nearby snow-capped mountains created a wonderful setting for the conference, which brought together data scientists, researchers, policymakers and data stewards from various disciplines across the globe.
Before the conference, on the recommendation of my colleagues, I studied the program, downloaded the Whova app recommended by the organisers and used it to create my own preliminary agenda for the conference days. Looking back on it now, that advance work paid off: faced with IDW's wide offering (the agenda included multiple RDA breakout sessions, RDA plenary sessions and SciDataCon parallel sessions running simultaneously), careful planning and selection were necessary, and Whova helped me stay on schedule. With Whova, you could also network easily: with the help of the community feature, you could call people together to discuss and share information and tips (either face-to-face or online) around your interests, work role, citizenship, etc. Whova also had the option of private messages between participants. This feature was particularly appealing to me and brought a nice, slightly "more limited form of sociality" to my conference participation as a person who does not quite feel like starting a conversation with complete strangers. Four days of shuttling between a huge program offering and hundreds of participants can be quite an overwhelming experience even for a more experienced conference visitor, let alone a conference novice like me.
Prior to the conference, I decided that I would mainly focus on topics related to research data trainings. One of my personal goals was also to find the person who coordinates the Data Stewardship training course at the University of Vienna. It is our intention to develop a similar course in Finland, and I wanted to create a foundation for our possible cooperation well in advance. However, I was not quite able to stick to my original plan because I couldn't help but notice how important themes such as research privacy, transparency, accountability, responsibility and ethics were at this year's conference. In practice, how do we make science more equitable and more transparent to ensure that everyone enjoys the benefits?
A session that was a real eye-opener for me from the point of view of data ethics and equal participation was the plenary session “Inclusivity in Open Science while advancing research assessment and career pathway impact”. For the first time in my professional career, I encountered the Traditional Knowledge Labels, which help to identify and clarify community-specific rules and responsibilities regarding access and future use of traditional knowledge. Interest in indigenous people has grown and the scientific community must make every effort to ensure that the voice of indigenous people can be heard even after the data, which involves their information, have been made open to everyone.
The sessions built around the themes of data stewards, competence centers and research data management training were everything I had hoped for. I got plenty of information about the University of Vienna’s Data Steward Certificate Course and their efforts in evaluating and re-working their RDM training portfolio and services. I also heard about research data management trainings at the National Research Data Infrastructure in Germany, the Competency Framework for Research Data Support Services at Norwegian research institutions, how to engage and involve stakeholders in the FAIR implementation processes at CABI, and experiences in building a community of data stewards in the Knowledge Hub, an RDM support community of the Flemish Research Data Network.
The most interesting discussions we had were about the various skills and competencies, and especially the so-called soft skills that data stewards should have. A data steward should be independent, patient, organized and collaborative, and have the ability to listen and learn. They should be curious, detail-oriented network builders. And most importantly, willing to take on more tasks all the time because otherwise they would have run away ages ago 😊.
It was also interesting to hear about how to encourage the training audience (researchers and data stewards) into more active participation. After Covid-19, getting people on-site is even more challenging. People value their own time highly, and consequently they will only come to the site if they are promised in advance that they will really gain more understanding and learn something valuable by being present. However, after the meetings, no one had really been interested in whether they learned anything or not. They were just very delighted to be able to interact face-to-face and be together. Many of the organisers of the trainings had also introduced the hard stakes: chocolate! With the help of chocolate and small snacks it was easier to attract the participants to come to the venue. So, there is no need for any strange circus tricks: in the end, the most effective concept is that people just come to the organized meeting, get to talk with their peers, and their sugar and fat balance is maintained.
Naturally, the same applies to the conference participants: the most valuable thing was to meet and talk with other data librarians from around the world who were present and to occasionally get a little supplement to the tea and cake quota (the conference food was not terribly commendable). For me it was also important to put a face to the person with whom I hope I can do international cooperation in the future in order to train more curious and patient data stewards (who are in such bad shape that they cannot run away).
Pak Lun Fung, University of Helsinki
Postdoctoral researcher
Thanks to CSC’s sponsorship, I had the privilege to attend the IDW workshop for the first time. As a remote delegate, I familiarized myself with the digital platform Whova. In addition, I studied the workshop program in advance and bookmarked some interesting sessions that also fit well in my working schedule. All set, ready to go!
I joined the opening plenary on ‘Spatial Data Science: Geographic Context Matters’. The key take-home message was that the geospatial aspect of data has the potential to link other, more disparate data types. Multiple disciplines could be connected by geospatial analysis. It was inspiring how this concept sounds very simple but is often overlooked. This indeed is a great tool to tackle systemic problems in the age of complexity. The analysis outcomes, for example spatial maps, are easy to understand for a wider audience group for a potentially stronger social impact.
I also got inspired by some other breakout sessions during the workshop. In addition to FAIR principles when handling data, I learned a new set of principles called CARE. The CARE Principles were drafted at IDW in 2018. The initial focus of the Principles was on Indigenous Data Governance, reflecting the significant role of data in advancing Indigenous innovation and self-determination. They complement the existing FAIR principles by encouraging open and other data movements to consider both people and purpose in their advocacy and pursuits.
Another topic that impressed me was about digitalization of historic publications and written records. The speaker emphasized that the digitalization process is not only for the publication, but also the knowledge from it. Instead of a separate publication, authors are encouraged to update their existing knowledge graph. Normally, when we want to see what the paper refers to, we check the citation one by one and follow the whole knowledge building process. However, this knowledge graph gives a visualized alternative to see the connections between historic publication and current papers. It would be much easier to catch the insights and relationship among the different studies around the same field or multiple disciplines.
These concepts could really be incorporated into my current work: I will take the geospatial and ethical aspects into consideration when dealing with data and storing data in the future. That would contribute to the community if we have something valuable. Instead of buying data from private companies, we aim to prioritise the use of open access data for our analysis. Others can then re-use the data, replicate the study and move towards a better future.
Lastly, I would like to say something about how I felt as a remote delegate. I did not feel alone or suffer from Zoom fatigue. I chose the sessions I was interested in and interacted with them, albeit virtually on an online platform. The platform is easy to use and very informative. There is also a forum for job postings related to data stewardship and data management from all around the globe, which could also be very useful to some people.
I recommend that anyone who is interested in, curious about or already working with data join next year!