A Skeptic Converts: Tom Plasterer’s move from Big Pharma to AI Startup
August 15, 2024
Allison Proffitt
For more than a dozen years, Tom Plasterer held roles at AstraZeneca, both in research IT and in more science-facing positions including translational medicine. Along the way, he beat the drum for knowledge graphs and FAIR data (data that are Findable, Accessible, Interoperable, and Reusable), playing an active role in both the FAIR community and the Pistoia Alliance.
He had heard the buzz about generative AI and the role it could have in the life sciences, but he was skeptical. He has been around the life sciences long enough to see the spate of new technology “hammers” in search of nails.
“That was the big data playbook. It was all of the big vendors saying, ‘Hey! What’s your big data strategy?’ I think that was at the source of my skepticism. I needed to see colleagues who were deep in their understanding of life sciences start to use it and start to show some value.”
There were life sciences conversations about the synergies between generative AI, large language models, and knowledge graphs. The mix of AI and knowledge graphs is not too surprising, Plasterer told Bio-IT World. “There’s a lot of overlap between the artificial intelligence community and the neural net community, which eventually grew into the LLM community. A lot of those guys were old-school knowledge graph/semantics people,” he said. Plasterer was still skeptical.
At the 2023 cross-industry Knowledge Graph Conference, speakers including Jans Aasman (Franz), Deborah McGuinness (Rensselaer Polytechnic Institute), and Helena Deus (Bristol Myers Squibb) shared their personal, scientific use cases. It was a turning point for Plasterer. “These are my peers, these are my colleagues, these are my mentors, and they’re starting to see real value in this,” he said.
“Of course, generative AI has plenty of challenges,” Plasterer said. “You don’t have to play around with it for very long before you do run straight into the hallucination problems; so it’s hard to do really clever things like: ‘Build me an ontology with a vocabulary behind it that covers the space of how you name clinical trials phases.’ It’ll make it up,” he said. “It’s not really intended for a mapping tool.”
Plasterer realized that, coupled with a knowledge graph, gen AI can tackle new challenges. A knowledge graph can correct hallucinations in the language model and track data provenance, he said. For its part, gen AI can help build knowledge graphs or first-pass ontologies.
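One way to picture that pairing is a check that accepts a model's suggested term only when it maps to a concept already in the graph, and carries the concept's provenance along with it. The sketch below is illustrative only and is not XponentL Data's or Plasterer's implementation: the tiny clinical trial phase vocabulary, the `curated-vocab-v1` source tag, and the names `Concept`, `ground`, and `review` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    label: str   # canonical name held in the knowledge graph
    source: str  # provenance tag (hypothetical) recording where it was curated


# Hypothetical mini vocabulary standing in for a real knowledge graph.
CONCEPTS = {
    "phase 1": Concept("Phase 1", "curated-vocab-v1"),
    "phase 2": Concept("Phase 2", "curated-vocab-v1"),
    "phase 3": Concept("Phase 3", "curated-vocab-v1"),
}

# Known spelling variants mapped to the keys above.
SYNONYMS = {"phase i": "phase 1", "phase ii": "phase 2", "phase iii": "phase 3"}


def ground(term: str) -> Optional[Concept]:
    """Return the matching concept, or None if the graph does not know the term."""
    key = " ".join(term.lower().split())  # normalize case and whitespace
    key = SYNONYMS.get(key, key)
    return CONCEPTS.get(key)


def review(llm_terms: List[str]) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Split model-suggested terms into grounded ones and ones to flag for a human."""
    accepted, flagged = [], []
    for term in llm_terms:
        concept = ground(term)
        if concept is None:
            flagged.append(term)  # possibly made up; needs curation
        else:
            accepted.append((term, concept.label, concept.source))
    return accepted, flagged


accepted, flagged = review(["Phase II", "phase  3", "Phase 2.5b"])
print(accepted)
print(flagged)
```

Terms that land in `flagged` are not necessarily wrong; they are simply the ones a curator would need to look at before they enter the graph.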
After the conference, Plasterer went back to AstraZeneca brimming with ideas but soon ran into organizational roadblocks. “There’s a ton of excitement in pharma—and really all big companies—about the potential for gen AI and LLMs and how they can accelerate things,” Plasterer said. But pharma has many stakeholders. “They want to put some structure around it, and that makes it hard to just do things quickly, to just try things out, and that was part of what I wanted to do.”
In the meantime, John Apathy, a keynote speaker at the Knowledge Graph Conference, reached out. At the event, Apathy had shared only that he had left Bristol Myers Squibb and was at a stealth startup. About six months later, he offered Plasterer a role at XponentL Data, Inc.
AI Startup
XponentL Data (pronounced “Exponential Data”) was started a little over a year ago by three founders from Accenture and Knowledgent: Tom Johnstone, Matt Arellano, and Frank Rotonta, who knew the data marketplace. They hired a team of AI data engineers in Kosovo and brought on John Apathy to build the life sciences part of the business. In its earliest days, the company focused on strategy and traditional data engineering, but now the focus is data products, semantics, knowledge graphs, and gen AI, Plasterer said. In the past 13 months, the company has grown to 105 team members and serves several industries, including life sciences, healthcare, energy, and retail.
The life sciences data products that Plasterer envisions from XponentL Data are ones that “democratize the data out of the head of research scientists” and empower the whole scientific community to make better decisions. As an example, he suggests batch genealogy in pharmaceutical manufacturing, which tracks a pharma product from raw materials through to the finished product.
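Batch genealogy lends itself to a graph because each batch points back to the batches consumed to make it. As a minimal sketch under assumed data (the batch IDs, the `CONSUMED` mapping, and the function names are hypothetical, not drawn from any real manufacturing system), tracing a finished batch back to its raw materials could look like this:

```python
from collections import deque
from typing import Dict, List

# Hypothetical records: each batch lists the batches consumed to produce it.
# Batches that never appear as a key are treated as raw materials.
CONSUMED: Dict[str, List[str]] = {
    "TABLET-001": ["BLEND-010"],
    "BLEND-010": ["API-100", "EXCIPIENT-200"],
    "API-100": ["RAW-A", "RAW-B"],
    "EXCIPIENT-200": ["RAW-C"],
}


def trace_upstream(batch: str, consumed: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk from a batch to every batch that fed into it."""
    seen = set()
    order = []
    queue = deque([batch])
    while queue:
        current = queue.popleft()
        for parent in consumed.get(current, []):
            if parent not in seen:  # guard against shared inputs being visited twice
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def raw_materials(batch: str, consumed: Dict[str, List[str]]) -> List[str]:
    """Upstream batches with no recorded inputs of their own."""
    return sorted(b for b in trace_upstream(batch, consumed) if b not in consumed)


print(trace_upstream("TABLET-001", CONSUMED))
print(raw_materials("TABLET-001", CONSUMED))
```

The same walk run in the opposite direction, from a raw material to every finished batch it touched, is the kind of question a recall investigation would ask.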
He envisions not large language models, but “medium-sized language models that are much more focused and trained on the data at hand.”
With those medium-sized, focused language models and an ontology built to guide the exploration of the model, results should be “sort of pre-filtered, pre-cleaned up,” Plasterer said. Tracking data provenance will help identify and fix errors when they occur. “The knowledge graph becomes your source of truth, and it gets better and better over time as it gets filled up with this data,” he said.
He acknowledged the debate between starting with the language model versus starting with the knowledge graph. “It really doesn’t matter which place you start as long as you have that nice, virtuous feed-forward cycle so that each one gets better,” he said.
But his preferences are clear: “My take on this is to keep the language models as small as possible to solve the question at hand and then just keep building and leveraging the knowledge graph.”
After more than a decade in Big Pharma, Plasterer finds the consulting environment of XponentL Data to be invigorating. “I like having my hands in a little bit of everything, so that’s been really fun.”