Merged
Binary file removed sections/0_causality/figures/social_diagram.png
Binary file not shown.
4 changes: 1 addition & 3 deletions sections/0_causality/social_causality.qmd
@@ -43,14 +43,12 @@ As this deliberately reductive example illustrates and as the scoping review car

This approach is dominant in the scholarship on OS, and it is therefore not surprising that the PathOS scoping review [@cole_societal_2024] found that the large majority (83%) of the literature published on the social impacts of OS focuses on citizen science. While citizen science is unquestionably a very important part of OS, its prominence in the literature is also due to the fact that its effects on the education and awareness of its participants (itself the most studied type of impact – 57%) represent a precisely defined segment of the chain of causality.

This approach is also illustrated by the French case study of the PathOS project. In this case study, we examined the very first step of the causality chain – the consultation of the OS resources – by collaborating with the three main OS platforms that make them available in France: HAL (the national OS repository for academic publications), OpenEdition (a large public publishing service specialized in open science), and [Recherche Data Gouv](https://recherche.data.gouv.fr/) (the open dataset portal created by the French government). Over the past decade in France, the three OS platforms have received significant public funding, and researchers have been increasingly incentivized to publish their work through them. For instance, the total number of deposits in HAL has gone from 500k in 2010 to 3.5M in 2024. So far, however, little is known about the use of these scientific resources. Who are the main private and public organizations using these platforms? When are these platforms used the most and why? Which websites refer to these platforms, and how are they connected? Understanding this first step in the lifecycle and causality circle of OS is crucial, as is the comparison with the non-open resources contained in HAL (where researchers are encouraged and sometimes required to deposit both OS resources and references to their non-OS publications).
This approach is also illustrated by the French case study of the PathOS project. In this case study, we examined the very first step of the causality chain – the consultation of the OS resources – by collaborating with the two main OS platforms that make them available in France: HAL (the national OS repository for academic publications) and OpenEdition (a large public publishing service specialized in open science). Over the past decade in France, the two OS platforms have received significant public funding, and researchers have been increasingly incentivized to publish their work through them. For instance, the total number of deposits in HAL has gone from 500k in 2010 to 3.5M in 2024. So far, however, little is known about the use of these scientific resources. Who are the main private and public organizations using these platforms? When are these platforms used the most and why? Which websites refer to these platforms, and how are they connected? Understanding this first step in the lifecycle and causality circle of OS is crucial, as is the comparison with the non-open resources contained in HAL (where researchers are encouraged and sometimes required to deposit both OS resources and references to their non-OS publications).

To investigate it, we analysed the data contained in the connection logs of the three portals. We focussed, in particular, on the IP addresses from which the different resources were accessed. We enriched this log information with DOI-based data extracted from the *OpenAlex* API – to qualify the resources (type, field, OS status, number of citations, ...) – and with IP-address-based data from [IPinfo.io](https://ipinfo.io/) and a hand-curated database – to qualify the users (geolocation, organisational affiliation, ...). This approach allows us to investigate which resources are accessed the most, when, and by users affiliated (through their IP addresses) with which types of organisation.
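
To make the enrichment step more concrete, here is a minimal sketch of how access-log records could be matched against a hand-curated table of IP ranges and then counted by resource and organisation type. It is an illustration under stated assumptions, not the project's middleware: the record fields, the IP ranges, the organisation names, the DOIs and the bot markers are all hypothetical, and the calls to the OpenAlex API and to IPinfo.io are left out entirely.

```python
import ipaddress
from collections import Counter

# Hypothetical hand-curated table: IP range -> (organisation, organisation type).
# The ranges are reserved documentation networks, not real organisations.
CURATED_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example University", "academic"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example Corp", "industrial"),
]

# Hypothetical, already-parsed log records (field names are assumptions).
ACCESS_LOG = [
    {"ip": "192.0.2.17", "doi": "10.1234/a", "user_agent": "Mozilla/5.0"},
    {"ip": "198.51.100.5", "doi": "10.1234/a", "user_agent": "Mozilla/5.0"},
    {"ip": "203.0.113.9", "doi": "10.1234/b", "user_agent": "Mozilla/5.0"},
    {"ip": "192.0.2.20", "doi": "10.1234/a", "user_agent": "Googlebot/2.1"},
    {"ip": "192.0.2.30", "doi": "10.1234/a", "user_agent": "Mozilla/5.0"},
]

# Naive bot heuristic based on the user agent; real bot detection is harder.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(user_agent):
    """Return True if the user agent contains an obvious bot marker."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def qualify_ip(ip):
    """Return (organisation, type) for a known IP range, else None."""
    address = ipaddress.ip_address(ip)
    for network, organisation, org_type in CURATED_RANGES:
        if address in network:
            return organisation, org_type
    return None


def count_accesses(log):
    """Count non-bot accesses per (DOI, organisation type)."""
    counts = Counter()
    for record in log:
        if is_bot(record["user_agent"]):
            continue
        qualified = qualify_ip(record["ip"])
        org_type = qualified[1] if qualified else "unqualified"
        counts[(record["doi"], org_type)] += 1
    return counts


if __name__ == "__main__":
    for (doi, org_type), n in sorted(count_accesses(ACCESS_LOG).items()):
        print(doi, org_type, n)
```

Keeping an explicit "unqualified" bucket, rather than dropping unmatched addresses, keeps visible the share of traffic that cannot be attributed to any organisation.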

As shown in the diagram below, the idea is to check the access status of the items published in the database – not only whether they are open or not, but also their OS 'colour' (diamond, gold, green) – as well as that of the journals/conferences/books in which they appeared, and to check for interesting associations with the consultations from the IPs of academic or industrial organisations. Of course, at this extremely aggregate level, nothing very interesting can be observed, but our protocol went on to develop an interactive visualisation platform that allows cross-filtering our datasets by the metadata associated with the items (disciplines, topics, year of publication, language, type of publication, etc.), as well as by specific types of organisation (e.g., financial or tech-related).

![Diagram of Open Access](figures/social_diagram.png)

This protocol, of course, is not free of biases and limits. The platforms included in our research do not cover all publications by French scholars (though they do cover a vast majority of them). Access events recorded in the logs are an imperfect proxy of reading and use, because users can click on a page by mistake and close it right away – though care was invested in detecting and excluding bots, which represent a large part of internet traffic. Finally, only large public and private organizations have stable and known IP ranges, which means that only about 20% of the IPs tracked in the logs can be qualified. Still, the approach helped us and, more crucially, our partners in the three portals obtain new insights about the use of their websites, not only in an aggregated way but also by drilling down to the individual publications that may have remarkable visibility, for example by being highly consulted by industrial organisations. Perhaps even more importantly, our work has allowed the creation of an open-source middleware compatible with hundreds of OS platforms and now available to major French and foreign OS actors.

This case study, we believe, illustrates that the difficulty in defining a clear and straightforwardly quantifiable causality path does not prevent us from studying the societal effects of OS with a more exploratory and case-study-based approach. Yes, such an approach will not deliver yes-or-no answers, but such answers are not the only, and arguably not the most important, contribution that social research can offer. Instead, another approach to causality research consists in collaborating with public institutions to develop practical and conceptual tools that they can use to explore the always uncertain yet not unintelligible ways in which social phenomena unfold. In the case of the societal impacts of OS, this means, for example, joining forces with the organizations that promote this type of science and helping them make sense of the information that they collect, in order to acquire a better understanding of the impacts generated by their work.