Friday, September 29, 2023


Biotechnology News Magazine

The Promise of a Scientific Informatics Platform: By Michael Swartz, SVP Enterprise Product Strategy at Dotmatics

Discovering novel therapeutics requires extensive research and development (R&D). Scientific software is a key enabler. Science is collaborative, and the term “platform” is frequently used in discussions of scientific software on an enterprise scale. The promise of a platform is to enable global collaboration, workflow support, consistent regulatory compliance, and optimal decision making. There is an underlying assumption that “one platform” will enable these things. But how, and what is it precisely?

The term is common, but if one investigates, there is little consensus. Almost no two people name the same list of components, and even the same names are defined differently. “What’s an ELN?” “What’s a LIMS?” “If it’s an ELN, is it a ChemELN or a BioELN?” and so on. These are recurring questions, and they have important consequences for scientists and science. Some even question whether the concept of a “platform” is feasible at all.

This article explores what scientists mean when they say “platform,” and whether the hopes bundled therein are realistic. First, let’s discuss three key design principles and then describe core platform componentry.


Figure 1: Platform Scope – the required scope of IT platforms has increased – has IT kept up?

Concept A: Encapsulation

The diagram above shows the potential scope of a platform, but what doesn’t jump out is the intricacy of science itself, which is increasing by leaps and bounds. Black rectangles indicate scientific activities, and many of these require software with deep functionality – functionality so deep, so nuanced, and evolving so fast that it is unfeasible to think of “one platform” supporting everything. These products require years to build, and in many cases significant R&D must precede software development itself. The spontaneous nature of innovation makes it unlikely that the creativity required to create all of these products could come from one company, and thus far open-source initiatives have not mustered the organizational coherence to marry the innovative power of the community with the organizational strength of commercial vendors that platform building requires.

As such, a platform design principle is required that allows commercial organizations with platform-building capabilities to harness the innovative power of scientific entrepreneurs. I call this design principle “encapsulation.” It means that specialized scientific products can be plugged into platform workflows such that data flows seamlessly between the core platform components and the specialized products, while the products themselves remain fundamentally separate. In the diagram, blue software package symbols represent major platform products, arrows represent data flows between the specialized scientific products and the major platform products, and purple software package symbols represent components that have typically been developed in-house but likely should be core platform components going forward.
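The encapsulation principle can be made concrete with a small sketch. Everything here is hypothetical illustration, not an actual Dotmatics API: the specialized product is hidden behind an adapter that exchanges records in a shared format, so the platform routes data between products without ever reaching inside them.

```python
from abc import ABC, abstractmethod


class SpecializedProduct(ABC):
    """Adapter for a specialized scientific product plugged into the platform.

    The product remains a fundamentally separate system; the platform only
    sees this adapter, which translates records to and from a shared
    exchange format (a plain dict in this sketch).
    """

    @abstractmethod
    def export_record(self, record_id: str) -> dict: ...

    @abstractmethod
    def import_record(self, payload: dict) -> str: ...


class Platform:
    """Core platform that routes data flows between encapsulated products."""

    def __init__(self) -> None:
        self._products: dict[str, SpecializedProduct] = {}

    def register(self, name: str, product: SpecializedProduct) -> None:
        # Plug a specialized product into the platform under a logical name.
        self._products[name] = product

    def transfer(self, source: str, dest: str, record_id: str) -> str:
        # Flow one record from one encapsulated product into another;
        # neither product needs to know anything about the other.
        payload = self._products[source].export_record(record_id)
        return self._products[dest].import_record(payload)
```

The point of the design is that adding a new specialized product requires only a new adapter, not changes to the platform core or to the other products.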

Concept B: Workflow Composability

The second key design principle is workflow composability. Going back to the implied promise when scientists say “platform,” it is the promise of the right information synchronized and always available for the next step of a workflow or a decision.

To date, too much effort is required to make this a reality: potentially years of effort, and in many cases tens of millions of dollars of expenditure and planning, are required to customize workflows and data flows between components. Unless this promise can be delivered much faster, and according to quality and user-experience expectations, it might never come true.

Unfortunately, too frequently this is the story of enterprise science informatics. Scientists work in fluid environments where the only constant is change. They work through trial and error. They need to try things they can touch and feel. The solution is truly “composable” end-to-end scientific workflows and data flows where composable means something down-to-earth: (1) easily defined business objects to support different business scenarios; (2) “drag-and-drop” tools for basic tasks; and (3) easy to understand visual interfaces for defining data flows between components.
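A minimal sketch of points (1) and (2) follows; all names here are invented for illustration. The business object is a simple dataclass an administrator might define, and each workflow step is a named transformation. A drag-and-drop interface would compose steps exactly the way the `compose` call does in code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Sample:
    """A hypothetical 'business object' defined for a given scenario."""
    sample_id: str
    properties: dict = field(default_factory=dict)


# A workflow step is a transformation of the business object.
Step = Callable[[Sample], Sample]


def compose(*steps: Step) -> Step:
    """Chain steps into one end-to-end workflow, in order."""
    def run(sample: Sample) -> Sample:
        for step in steps:
            sample = step(sample)
        return sample
    return run


def weigh(sample: Sample) -> Sample:
    sample.properties["weighed"] = True
    return sample


def assay(sample: Sample) -> Sample:
    sample.properties["purity"] = 0.98  # placeholder result
    return sample


# Recomposing the workflow is a one-line change, not a systems project.
workflow = compose(weigh, assay)
```

When change is the only constant, the value lies in how cheaply the composition itself can be rearranged: adding, removing, or reordering steps touches only the `compose` call.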

With the right focus on core product capabilities, the technology exists today for a scientific informatics platform delivered globally as a service to organizations small and large with extremely high quality and performance. Think: social meets science.

Concept C: Data Composability

The drivers behind the principles of encapsulation and workflow composability must be applied to the data layer in enterprise scientific informatics as well. In other words, the final step in the configuration of an end-to-end workflow needs to be the mapping of the output of a workflow to the data layer. Otherwise – where will the data go?

The data layer must be adaptable not just to data created from within the platform but also to data from a myriad of sources outside the platform. If the data layer is not in a state to ingest data from a workflow or an outside source, then it must be agile enough that within hours it can be adapted to do so. This is what data composability means. Science can’t wait. Even if data can’t be ingested, decisions still need to be made, and scientists will revert to spreadsheet mayhem. If the data layer is not this flexible, the platform will fail. This goal is only achievable if the same determined effort applied to workflow composability is applied to data composability at all levels (i.e., storage, reporting, visualization, and analysis).
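The “adapt within hours” requirement can be illustrated with a schema-on-read sketch (a toy, not a production design): the store accepts records whose fields it has never seen before and grows its schema on the fly, rather than rejecting the data and pushing scientists back to spreadsheets.

```python
class ComposableStore:
    """Minimal schema-on-read store.

    Rather than enforcing a fixed schema up front, the store widens its
    column set whenever a record arrives with new fields, so data from
    an unanticipated workflow or outside source is never turned away.
    """

    def __init__(self) -> None:
        self.columns: set[str] = set()
        self.rows: list[dict] = []

    def ingest(self, record: dict) -> None:
        # The schema grows with the data, not ahead of it.
        self.columns |= record.keys()
        self.rows.append(record)

    def as_table(self) -> list[dict]:
        # For reporting, every row is padded to the full current column
        # set; fields a row never had come back as None.
        return [{c: r.get(c) for c in self.columns} for r in self.rows]
```

Real data layers add typing, lineage, and governance on top, but the essential property is the same: ingestion first, schema reconciliation as a fast follow rather than a months-long project.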

The Core Products and Implications of these Principles

Figure 1 suggests four key platform components: Registration, ELN-LES, LIMS and Instrument Integration (the fifth, labeled “Data,” has typically been built in-house but should be a platform component). These design principles have implications for how this schematic should be interpreted. Many functions are common across these different components. Instead of physical boundaries, it is better to think of these components as “composability zones” – more logically than physically distinct. Each zone includes a set of drag-and-drop tools that allow users and administrators to create web pages and templates for personal and group use, configured for particular end-to-end scientific workflows.

An overall orchestration framework is a platform requirement: the collection of tools for a given “composability zone” works in concert inside that zone, and also works across zones with consistent graphical and programmatic interfaces. A given client may use one or all composability zones, based on their preference. If they wish to use another product for one zone, that’s fine – system integration, with all its costs and drawbacks, is up to them. The fundamental composability zones correspond to the illustration above.

To define each zone in detail, it is also important to introduce a temporal dimension implied in the red layer in the diagram, the Innovation Cycle. Materials are created and tested; then information is brought together, and decisions are made about what to do next. Even though it is a cycle, there is a temporal progression and the function of each zone shifts.

During initial phases, the focus is on what is created, but in later phases, as the emphasis shifts from R to D, the focus shifts to how things are created. For each zone, according to the phase of R&D, the platform must support composable tooling such that users can create templates with excellent graphical interfaces to support end-to-end workflows pertinent to that zone. This tooling needs to encapsulate the specialized scientific products required to support these workflows, and the underlying data layer needs to be agile enough to ingest the end products of these workflows as well as any data required from outside sources.

Platform Component
Registration | Role: Classification | R&D Phase: R | Innovation Phase: Make
Assignment of a unique identifier to scientific materials based on knowledge of the material’s scientific composition.
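A toy sketch of composition-based registration follows; the `MAT-` prefix, the hash scheme, and the flat composition dict are all invented for illustration. The key property is that the identifier is derived from a canonicalized description of the material, so the same substance always receives the same ID no matter how its description was ordered.

```python
import hashlib
import json


def register_material(composition: dict, registry: dict) -> str:
    """Assign a unique identifier to a material based on its composition.

    The composition is canonicalized (keys sorted) before hashing, so two
    differently-ordered descriptions of the same material collapse to one
    identifier, and re-registration is idempotent.
    """
    canonical = json.dumps(composition, sort_keys=True)
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    material_id = "MAT-" + digest[:10].upper()
    # First registration wins; duplicates simply return the existing ID.
    registry.setdefault(material_id, composition)
    return material_id
```

Real registration systems use domain-aware canonicalization (e.g. normalized chemical structures rather than raw dicts), but the deduplication logic is the same shape.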


ELN | Role: Narration & IP | R&D Phase: R | Innovation Phase: Make
Electronic laboratory notebooks are an important locus of workflow activities in research. Scientists narrate their laboratory practices in free-form style. Some scientists, particularly those like chemists with more repeatable work practices, have adopted more templated work practices and have gotten more value from ELNs. Other scientists, particularly those in the Test and Decide phases of the Innovation Cycle, have suggested ELNs should be automated “behind-the-scenes” systems for IP compliance only.


LES | Role: GxP | R&D Phase: D | Innovation Phase: Make/Test
Lab execution systems are an important locus of workflow activities in development where lab activities become more regimented, and enforcement of specific operations is more important. Scientists develop methods to create materials and verify these materials are within certain specifications. LES systems provide capabilities for scientists to develop these methods and then transfer them for repeatable execution.


LIMS | Role: Logistics, GxP | R&D Phase: Both | Primary Innovation Phase: Test
LIMS is an important word to define in the context of “scientific platform,” but the meaning is dependent on the phase of R&D. LIMS allows for tracking of materials for the purposes of testing, routing of materials for the purposes of testing and the tracking of scientific test results. In the Research phase, the emphasis is exclusively on laboratory logistics. In Development, beyond logistics it is important that all the right tests get performed, that all the results are within a certain range and that this be provable to regulatory authorities should they ask.


Instrument Integration | Role: Data Transfer & Transformation | R&D Phase: Both | Primary Innovation Phase: Test
Advances in instrumentation drive advances in science. The promise of automated transfer of data from instruments to all platform components has piled onto existing regulatory obligations to tie instrument output to experimental results in Development organizations. However, there is a tendency to conflate instrument integration with data composability. These are very different things that solve different problems.


Data | Role: Decision Support | R&D Phase: Both | Primary Innovation Phase: Decide
R&D inevitably occurs across boundaries, and this is the biggest impediment to the aim of “one platform.” Every organization is going to have at least one workflow system (and probably several). These systems are augmented by specialized scientific products, which can produce huge data volumes, and much research is produced through external collaborations. For these reasons, most organizations have built their own data hubs at great cost, but to date these have not been agile enough, so most researchers must resort to spreadsheets and data wrangling. Platform providers must step up and bring the principle of data composability to reality; encapsulation and workflow composability by themselves are insufficient. Otherwise the factory to make the data will be there, but the data will end up on the floor. A revolution in Big Data technologies – like those that have enabled global-scale platforms such as Facebook – together with more sophisticated information designs has made data composability just as feasible as the other layers of the platform. CIOs and scientific business leaders are encouraged to work with their platform partners to extend their scientific platforms in this direction. In so doing, they will magnify both the power of their platform and their investments, and bring the Lab of the Future much closer to the present.

Concluding Thoughts

An interesting tension has developed between the progression of science and the progression of technology. In many respects, the expectations of scientists regarding scientific informatics platforms have remained constant. The promise of the right information at one’s disposal to make the best decision has stayed constant. The challenge is that, while the rate of both scientific and core information technology advance has been astonishing, the assimilation of new information technology into scientific platforms has not kept pace. This needs to change.

Are we able to deliver platforms that allow scientists to continue to use the best-of-breed specialized products they love and need? Can they execute end-to-end workflows across key domains? Are they able to easily configure data management solutions to answer the questions they generate during this work? These are the promises that drive the implementation of these systems. Those who build them, like this author, hope we can make them come true for the scientists who are counting on them.

Editor’s Note: About the Author: Michael Swartz is the SVP Enterprise Product Strategy at Dotmatics, the leader in R&D scientific software connecting science, data, and decision-making.