(2021-08-23) Chan Sustainable Authorship Models For A Discourse-based Scholarly Communication Infrastructure

Joel Chan: Sustainable Authorship Models for a Discourse-Based Scholarly Communication Infrastructure · Series 1.2: Business of Knowing. Today's scholarly communication infrastructure is not designed to support scholarly synthesis. When gathering sources for a literature review, researchers need to answer questions about theories, lines of evidence, and claims, and how they inform, support, or oppose each other. This information cannot be found simply in the titles of research papers, in groupings of papers by area, or even in citation or authorship networks.

II. Discourse graphs: the promise and the authorship bottleneck

For decades, researchers across a range of disciplines have been developing a vision of an alternative infrastructure centered on a more appropriate core information model: knowledge claims, linked to supporting evidence and their context through a network or graph model. For conciseness here, I call this model a "discourse graph", since the graph encodes discourse relations between statements, rather than ontological relationships between entities.

Much crucial conceptual and technical progress has been made.

However, adoption, particularly in terms of authorship, remains a hard open problem.

As one data point, contributions to servers for the nanopublications standard for discourse graphs are almost all within bioinformatics and contributed by tens of authors.

First, contributing to shared discourse graphs is currently disconnected from the intrinsic practices of scholarship.

This disconnect creates significant opportunity costs for authorship.

Consider that by some estimates, full-time faculty self-report reading about 200 articles per year; there were an estimated 700k full-time faculty in 2018. So we can estimate the time spent reading ~100M articles per year as a lower bound on untapped resources.
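The arithmetic behind this estimate, using only the two figures quoted above, shows that the straight product lands above the quoted ~100M. That gap is consistent with treating ~100M as a conservative lower bound:

```latex
200\,\frac{\text{articles}}{\text{faculty}\cdot\text{year}} \times 7 \times 10^{5}\,\text{faculty}
  = 1.4 \times 10^{8}\,\frac{\text{articles}}{\text{year}} \;\geq\; 10^{8}\,\frac{\text{articles}}{\text{year}}
```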

Further, the intended audience/beneficiaries of this authoring work are most often some unknown others.

III. Sustainable authorship of discourse graphs by integrating into scholarly practices

Based on this analysis, I believe a promising but underexplored solution path for this authorship bottleneck is to build tools that integrate authoring contributions to discourse graphs into the intrinsic tasks of effective scholarship practices.

Here I describe one example point of integration: reading and sensemaking for literature reviews.

A user story

Consider Curie, a researcher who is studying the role of analogies in cross-boundary innovation.

Her reading notes contain a mix of informal and formal observations and structure, including general notes about related ideas, key details about methods, and the core results of the paper.

But there is one crucial difference: while writing notes for a paper, Curie has marked out a key piece of evidence (EVD) from the paper that might inform her synthesis for her focal question about how domain distance modulates the effects of analogies on creative output. (EVD - Far analogies were rare and never used in psychology lab group meetings for reasoning (vs mere mentions); far analogies were rare in colloquia as well, but were frequently used for reasoning)

This marking creates a new document (or page) in the software with that evidence note as its title, and allows Curie to reference that specific piece of evidence elsewhere in her notebook.
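A minimal sketch of the "mark as evidence, then reference it anywhere" mechanic, assuming a Roam-like notebook where `[[Page Title]]` links create pages on first mention. The class and method names (`Notebook`, `mark_evidence`, `backlinks`) are hypothetical illustrations, not Roam's API or the author's extension:

```python
import re
from dataclasses import dataclass, field

# Assumed wiki-style link syntax: [[Page Title]] (as in Roam-like notebooks).
LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


@dataclass
class Page:
    title: str
    blocks: list = field(default_factory=list)


class Notebook:
    """Toy notebook: pages keyed by title, linked via [[...]] references."""

    def __init__(self) -> None:
        self.pages: dict = {}

    def page(self, title: str) -> Page:
        # Create the page the first time it is referenced.
        return self.pages.setdefault(title, Page(title))

    def write(self, title: str, text: str) -> None:
        self.page(title).blocks.append(text)
        # Every [[link]] in the text also materializes its target page.
        for target in LINK.findall(text):
            self.page(target)

    def mark_evidence(self, source_title: str, statement: str) -> str:
        # Marking a statement turns it into its own page, titled with a
        # typed "EVD - " prefix, and links it from the source notes.
        evd_title = f"EVD - {statement}"
        self.write(source_title, f"[[{evd_title}]]")
        return evd_title

    def backlinks(self, title: str) -> list:
        # Titles of all pages whose blocks link to `title`, sorted.
        return sorted(
            p.title for p in self.pages.values()
            if any(title in LINK.findall(b) for b in p.blocks)
        )
```

For example, after `evd = nb.mark_evidence("Paper A", "Far analogies were rare")` and `nb.write("Outline", f"Supported by [[{evd}]]")`, calling `nb.backlinks(evd)` lists both the paper notes and the outline, which is what lets the evidence note serve as a shared reference point.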

Let's take a closer look at an outline Curie is drafting for her literature review.

green CLM and pink EVD notes (CLM - Analogical distance of inspirations for an idea are positively related to the idea's creativity)

Curie can reference specific results (evidence notes) while making sense of the case for and against a focal claim.

This puts the contextual details she needs for comparing and contrasting claims and evidence just a hover or click away, without breaking the flow of writing.

Finally, consider what happens when Lovelace, a new student, joins the project. To onboard her, Curie runs a graph query to collect claim and evidence notes that inform the focal question, and exports and emails them to Lovelace.

The graph query works because the notebook Curie is using has an underlying extension that recognizes the argument structure that she is using in the outline, through a mixture of indentation patterns and keywords. Here, for instance, Curie can query for opposing evidence for a claim because the system has formalized an "Opposed By" relation between the CLM and the EVD by recognizing a pattern of writing in her outline.
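A minimal sketch of how indentation plus keywords could be turned into typed relations and then queried. The grammar here is an assumption for illustration: a `CLM - ` block, a child block labeled `Supported By` or `Opposed By`, and grandchild blocks that become related to the claim. The names `parse_outline` and `opposing_evidence` are hypothetical, not the actual extension's code:

```python
# Hypothetical keyword table: a child block with one of these labels under a
# claim turns each of its own children into a typed relation to that claim.
RELATION_KEYWORDS = {"Supported By": "supports", "Opposed By": "opposes"}


def parse_outline(text: str, indent: int = 2) -> list:
    """Return (source, relation, target) triples from an indented outline."""
    triples = []
    stack = []  # (depth, block text) for the ancestors of the current block
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        block = raw.strip().removeprefix("- ")
        # Drop siblings and deeper blocks so the stack holds only ancestors.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        # Pattern: CLM > relation keyword > this block.
        if len(stack) >= 2:
            keyword, claim = stack[-1][1], stack[-2][1]
            if keyword in RELATION_KEYWORDS and claim.startswith("CLM - "):
                triples.append((block, RELATION_KEYWORDS[keyword], claim))
        stack.append((depth, block))
    return triples


def opposing_evidence(triples: list, claim: str) -> list:
    # Query: evidence notes formally linked to `claim` by an "opposes" edge.
    return [s for s, rel, t in triples
            if rel == "opposes" and t == claim and s.startswith("EVD - ")]


# Illustrative outline; the evidence lines are placeholders, not real results.
CLAIM = ("CLM - Analogical distance of inspirations for an idea are "
         "positively related to the idea's creativity")
OUTLINE = "\n".join([
    f"- {CLAIM}",
    "  - Supported By",
    "    - EVD - hypothetical supporting result",
    "  - Opposed By",
    "    - EVD - hypothetical opposing result",
])
```

Here `opposing_evidence(parse_outline(OUTLINE), CLAIM)` returns only the block nested under `Opposed By`. The design point is that Curie never fills in a form: the relation is inferred from how she already writes her outline (`str.removeprefix` needs Python 3.9+).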

This reminds me of old web discussion-forum software that had response types which generated associated icons in the tree summary... was that HyperNews?

Over the next few weeks, Lovelace spends her time modifying, elaborating, and integrating these notes into her own notebook, and writes up some notes on new evidence from recently published work that Curie hasn't yet read.

The resulting updates to the synthesis outline spark a novel hypothesis that the project team decides to test in their next set of experiments. I'd like more details on this leap: was it a mental leap assisted by having so much structured argument accessible?

Some observations

It also demonstrates the technical feasibility of this vision! These screenshots are not mockups: they are snapshots of my own notes, which I have written for my own work (for a literature review) and actually shared with students and collaborators. (in Roam)

This notebook is also but one of a Cambrian explosion of similarly extensible, hypertext-enabled digital notebooks that can technically accomplish the same basic shape of workflow. These tools are quickly growing their userbases, extending well beyond older, more niche or homegrown tools with similar basic capacities, and are also spawning new sets of technical and cultural practices for easily structuring and sharing notes.

IV. Conclusion

I want to broaden the lens of scholars to include nonprofit research institutions compiling nonpartisan literature reviews to inform policymaking, and highly motivated communities of patients and their families who are seeking to understand and contribute to research on diseases that personally affect them.

Can this bottom-up, decentralized, peer-to-peer infrastructure help advance original visions around a single universal shared discourse graph? I believe the answer is not directly, but this may actually be a feature rather than a bug. Distributed knowledge graphs are notoriously hard to achieve consensus on.

local contextualization, ambiguity and contestation may be crucial for scholarly progress.

Therefore, I am excited about institutional structures that can steward local federations of discourse graphs (e.g., at the level of labs, centers, or institutions), enabled by technical mechanisms for dynamic interoperability, such as Project Cambria. If institutions and local collaborations institute methods of consensus, error-checking, and editing for integrating (as an analog to, say, pull requests to open-source projects), there could also be a natural check and balance that is appropriately scaled for bad actors peddling misinformation. (cf FedWiki)

...we can direct existing technical and institutional structures (repositories, collections, and search databases) or emergent distributed infrastructures (such as distributed knowledge graphs) to curate and index subsets of them for sharing beyond lab groups.

Edited: 2022-05-21 20:21:17.520159

Bill Seitz