

Harnessing AI in Commodity Markets: Insights from Argus Media and Zema Global
- 15 May 2025
In this podcast, experts from Argus Media and Zema Global explore the importance of data curation, the advantages and disadvantages of vertical and horizontal AI systems, and the future implications of agentic AI. They also discuss how critical it is for businesses like Argus Media and Zema Global to maintain trust in information in a world of rapidly exploding data volumes and highly variable data quality, both numeric and textual.
Key topics covered in the podcast:
- Why quality, curated data is critical for using AI effectively
- How vertical vs. horizontal AI systems compare in real-world use
- The potential future implications of “agentic AI”
- The role of data and information providers regarding AI
Related links
Explore Argus consulting solutions
Transcript
Neil: Hello, and welcome to this podcast on the subject of AI in the energy data space. I'm Neil Fleming, and I'm the global head of editorial at Argus Media, a specialist news and price information business for the energy and commodities markets. I'm here in conversation with Stewart Wallace, director of data design and governance at Zema Global, a specialist data management and analytics business for the energy industry. And opposite Stewart is Vlasios Voudouris, head of data science at my own company, Argus Media.
Vlasios: Hi, Neil. Thank you for having me.
Stewart: Hi, Neil, nice to see both of you.
Neil: We're here to talk about AI, the influence that it has had on extracting meaning from information in the energy space and specifically in the energy trading space, and the influence of AI on the use of information of all kinds, written news and analysis, price data, large fundamentals datasets. So before we get stuck in, it would be helpful, I think, if we could define what we're talking about. The world has become so enthused by AI, by ChatGPT, vision systems, medical tools, image creation, world beating game players, that it's easy to forget that these are not all one thing. So, Stewart, perhaps you could start us off by explaining what we're dealing with out there, what is the difference, for example, between machine learning, what we might call traditional AI, and the algorithms which run so-called large language models, ChatGPT, Claude, Gemini, and similar tools?
Stewart: Absolutely. I think it's worthwhile remembering that both of these are, you know, targeted. They're based and rooted in statistical probabilities and prediction, and neither will give you an absolute answer. I think it's important to remember that when you think about the use cases for using either of these tools. Traditional AI and machine learning has been around for many, many years; the LLMs really rocketed into the public consciousness probably three, four years ago, when ChatGPT started to become prevalent. I certainly started to notice that across the business world and across my clients, where there was suddenly an explosion of interest from executives who could finally get a grasp on what this might mean for their organizations.
But fundamentally, when I've been implementing both of these, I often refer back to the more traditional AI and machine learning models, primarily because they are narrower. They're solving specific tasks, specific problems, and they can be curated and managed in slightly easier ways. The auditability and risk management of those models can be handled slightly better because they are naturally smaller and task-specific. They have the involvement of the engineering teams, which define the parameters to include and the models that are being used. You can tune the outputs a little bit more clearly.
But that does mean, of course, that you need to do a lot more curation and you need to manage the data and the inputs much more closely, much more manually. If you compare that to the LLMs, the general-purpose, flexible LLMs in particular, you know, ChatGPT as I mentioned, Claude, Gemini, Llama, etc., these are wide-open models. They're taking all this vast unstructured data, often trained on data from the internet or from a curated repository of articles and information, and they're learning those kinds of representations and patterns. It's a little bit more of a black box as to what is going on within the model itself because of the huge scale. It's effectively like a traditional neural network, where you're looking at patterns of behavior and the influence of different inputs, put on steroids: the transformer architectures in those LLMs are how they operate. It's that kind of deep-learning enhancement of what were those neural networks.
So when you then start to look at how those LLMs can be used, do you take those generic approaches, or do you take it to the next level and create things like your own knowledge graphs? Behind that is the triple, you know, the relationship between three different points of information; that's how it works these things out. You can create your own knowledge graphs and your own ways of using those LLMs. And that's where I've certainly started to find a lot more benefit when I apply them to specific tasks.
Neil: Thanks, very clear. Maybe, Vlasios, you could take this story a little further. We have come a long way, as Stewart says, even though the LLM has secretly probably been around for many, many years, more than anybody really noticed. It was the basis of all the translation software that sprang up 20 years ago, really. But today in 2025, there's a lot of talk about things like agentic AI, about horizontal systems versus vertical ones. What are these all about? Where are we going with AI in general?
Vlasios: I think it might be best to start with the first part, which is agentic AI, because I believe that is where the world is currently moving. Effectively, an agentic gen AI system is an autonomous, goal-directed AI agent capable of planning, decision-making, and orchestration of other agents and tools. What these tools really try to do is take a high-level goal, break that high-level goal into subtasks, and then coordinate other tools and agents to basically deliver an outcome for each subtask. An example of that is a trading assistant that can autonomously identify arbitrage opportunities by gathering and synthesizing both numerical and text data, assessing market conditions, and ranking potential trading strategies. It can effectively do all that with minimal human intervention, and of course, it will be able to adapt based on new information.
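As a purely illustrative sketch of the plan-then-orchestrate loop Vlasios describes, the toy Python below breaks a goal into sub-tasks and hands each to a registered tool. Every function name and data value here is a hypothetical placeholder, not an Argus or Zema system.

```python
# Illustrative only: a toy "agentic" loop that decomposes a goal into
# sub-tasks and delegates each one to a registered tool.

def fetch_prices(task):
    # Stand-in for a real data retrieval call.
    return {"task": task, "data": "crude and product prices"}

def summarise_news(task):
    # Stand-in for an LLM summarisation call.
    return {"task": task, "data": "summary of relevant headlines"}

TOOLS = {"gather_prices": fetch_prices, "scan_news": summarise_news}

def plan(goal):
    # A real agent would ask an LLM to decompose the goal; here it is hard-coded.
    return [("gather_prices", f"prices relevant to: {goal}"),
            ("scan_news", f"news relevant to: {goal}")]

def run_agent(goal):
    results = []
    for tool_name, task in plan(goal):
        results.append(TOOLS[tool_name](task))   # orchestrate each sub-task
    return results  # a real agent would synthesise these into a recommendation

if __name__ == "__main__":
    for step in run_agent("identify crude arbitrage opportunities"):
        print(step)
```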
So I think that idea of the agentic is what has got us excited since the beginning of 2025. But then we hear a lot about vertical gen AI agents, which are effectively AI systems optimized for a specific industry and a specific workflow. And that actually fits with what Stewart has said. Effectively, the way we do that is we incorporate domain-specific data, and we also incorporate domain-specific knowledge and processes, so that we can deliver highly relevant information to the users within that vertical. Think about it as a specialized surgical tool.
An example is a crude oil trading arbitrage agent. It's really a vertical gen AI agent. It specializes in the energy trading domain, using specific data, crude prices, refined product prices, freight, to identify arbitrage opportunities and recommend trading strategies. More precisely, what I just described is a vertical agent with agentic capabilities, since it combines domain expertise with autonomous planning and decision-making. Think about it as a surgical tool. On the other hand, horizontal agents are actually general-purpose AI systems that have broad knowledge across multiple domains and support a wide range of capabilities, such as writing emails, coding, and summarizing documents. These agents are not specialized for a specific task or industry, but are designed to be versatile and applicable across many use cases.
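To make the arbitrage example concrete, here is a minimal, invented calculation of the kind such a vertical agent might automate. The prices and freight cost below are made-up numbers, not assessments.

```python
# Purely illustrative: a toy arbitrage check of the kind a vertical
# crude-trading agent might automate. All figures are invented.

def arbitrage_margin(fob_price, delivered_market_price, freight_cost):
    """Margin per barrel from buying FOB, shipping, and selling into the destination market."""
    return delivered_market_price - (fob_price + freight_cost)

margin = arbitrage_margin(fob_price=71.50, delivered_market_price=74.80, freight_cost=2.10)
print(f"Indicative margin: ${margin:.2f}/bl -> {'worth a closer look' if margin > 0 else 'no arbitrage'}")
```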
And just to finish off, I think that the reason why we see these changes is because the horizontal models have become quite capable both in terms of planning and reasoning, and there is also an increased demand from businesses for practical, real-world solutions. So it appears to me that we are on an exponential growth curve, which has been accelerated by the surge in investment in the generative AI space.
Neil: Thank you. Thank you both. So let's dig a little bit deeper now. Both our businesses, Argus Media and Zema Global, share a common goal. We provide information to specialist customers, and we organize it in order to help people in energy and commodities markets to take decisions. That's basically how we earn our money. So we build tools, we build databases and forward curves and forecasting models, we collect information and organize it with these tools, we select what to present, and we transmit it as data or, using the written word, as pieces of analysis or as news stories. And we've always assumed that we knew what to present. That's the value proposition of what we do and why we employ smart people to think about these things.
But with the advent of AI, the possibilities change, because AI systems, it seems to me, make it possible to select from a vast range of information, and not only that, to combine and synthesize it and transform it and extract new patterns, or so we think. But are there hidden problems? And that's really what I'm interested in exploring. Are there challenges to the assumption that AI can only make things better? We see a lot of customers right now, for example, who are interested in pooling data from multiple sources and somehow magically expecting that pooling the data will result in a better answer than addressing information from a single vertical source. How valid is that assumption, do you think, at this point? And is it going to change as time goes by? Let's start maybe with you again, Stewart, on this issue.
Stewart: I think it's a really big challenge actually for the industry, you know, particularly with the explosion of data that is out there. There's always a temptation that more data means better answers. Certainly in my experience, I've found that often not to be the case. When I speak to my teams, I often liken it to mixing paint. You need to know what it is that you're wanting to achieve. If you mix all your colors together, you inevitably get brown. And no matter what proportion of colors you mix, you will always end up with brown if you mix too many of them together. If you want to make green, though, you need to mix blue and yellow. So you need to know what your end in mind is: what is the outcome you're trying to achieve?
And so for that, selecting the right data, curating where you get that data from, the quality of data is incredibly important, and then selecting the data points that are relevant. And that's where I tend to find that a lot of the horizontal LLMs, that Vlasios was explaining earlier, sometimes fall down. They are great for getting an understanding on a more general basis. When you start to apply them in more specific cases and where you want repeatable, standardized, predictable outcomes, it starts to become a little bit more difficult. You run into problems like hallucination, you run into problems like the understandability of those systems. And so when it comes to selecting the data, you first have to understand, what is the problem you're trying to solve, what is the answer you're trying to gain? And making sure that you then select from the data or direct your models to be selecting from the data that you can trust that is reliable, that is curated, that you know is complete, integral, and accurate. The more that you can improve that quality of the underlying data that feeds into the models, ultimately the better the answer you will get.
And if I think back to an example with some of my clients, when I was in my consulting days several years ago, we were building some risk modeling systems. And the temptation was to throw in a vast, vast array of data points. We had something like 150 different data points we were feeding into our risk model. When we actually started to break it down, we curated this down to about 17 different key indicators that would actually give us a meaningful measure of risk within the system. And the rest were just creating noise. You were actually averaging out exceptions and you were averaging out the opportunity for us to make interventions, because those risk signals were just being, you know, averaged. They were becoming brown, and therefore you didn't distinguish between the different colors. Does that reflect your experience, Vlasios?
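Stewart's 150-to-17 curation is, in effect, a feature-selection exercise. A minimal sketch of one common screen, ranking candidate indicators by their correlation with the target on synthetic data, might look like this; the data, the planted "true" indicators and the cut-off are all invented for illustration.

```python
# Illustrative only: screening a wide set of candidate indicators down to the
# few that carry signal, using a simple correlation-with-target filter.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_features = 500, 150
X = rng.normal(size=(n_obs, n_features))

# Plant a target that truly depends on only a handful of columns.
signal_cols = [3, 17, 42, 99]
y = X[:, signal_cols].sum(axis=1) + rng.normal(scale=0.5, size=n_obs)

# Rank indicators by absolute correlation with the target and keep the strongest.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
keep = np.argsort(corr)[::-1][:4]
print("Indicators kept:", sorted(keep.tolist()))  # should recover columns 3, 17, 42 and 99
```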
Vlasios: Actually, yes. I think we have the same experience here at Argus. I think that information pooling is something that we also see a lot from our clients, and our perspective is basically that if someone is interested in getting a response from these models that seeks fine details, then effectively sending these models more relevant, smaller chunks of data is likely to give you better results. And it's also very important, not just for the data to be in smaller chunks, but it's also important to have coherence, right? So if you're pulling together data, let's make sure that the data describe an event in a similar type of manner; otherwise, it's going to be the average between two islands and effectively you're going to be in the sea.
I mean, the only thing I will say is that pooling may add some value when there are gaps. Let's say that you don't get all the data from one provider; you might want to fill the gaps from another provider. So I can see that there is some value there. It has to be done with care, to make sure everybody measures the same thing using the same definitions. But if you do that properly and there is some sort of standardization, then I can see the value in that scenario.
The second area, something that I'm excited about, is when there is uncertainty about the things we try to quantify. There is going to be some sort of value in pooling two different datasets together when the outcome is not expected to be a specific number, but might be expected to be a range. So I can see some very specific scenarios where pooling could be useful, but I feel that most of the value will come from effectively analyzing coherent sets of information one at a time. And that is this world of vertical gen AI agents that I talked about earlier.
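A minimal sketch of the chunking idea Vlasios describes, splitting text into small coherent pieces and passing on only the most relevant ones, could look like the following. The keyword-overlap retrieval is a naive stand-in for embedding-based search, and the example sentences are invented.

```python
# Illustrative only: chunk a document into small pieces, then send a model
# only the chunks most relevant to the question.

def chunk_paragraphs(text, max_chars=60):
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def top_chunks(chunks, question, k=2):
    # Naive relevance score: how many question words appear in the chunk.
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(c.lower().split())), c) for c in chunks]
    return [c for score, c in sorted(scored, reverse=True)[:k] if score > 0]

doc = ("Urals loadings rose in April.\n\n"
       "Freight rates on the TD3C route eased.\n\n"
       "Gasoil cracks in Europe widened.")
print(top_chunks(chunk_paragraphs(doc), "What happened to freight rates?"))
```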
Neil: Certainly, I think we've kind of explored this area of data curation and stressed its importance. A question on my mind, to which I genuinely don't know the answer, is there a world we could envisage in which AI would do its own data curation, in other words that we leave an agentic AI to supervise itself and ensure that the data that is in the model is consistent in some way, or is this still a task for human minds at this point?
Stewart: I think, in my view, there is still in many cases value in having the human in the loop. But if you look at some of the methodology behind data quality and data quality management, and at data lineage, that's where I'm actually seeing the ability to create networks, flows of information, and to create the connections. That's actually how LLMs operate, so it's very useful from a data quality perspective, and particularly if you have a complex chain of systems, it's very good at being able to enquire across that data chain. I've also found, you know, with agentic AI, and maybe I'm a little bit more skeptical than you are, Vlasios, but I see this as effectively just traditional RPA, you know, robotic process automation. And all you're doing at those decision points in the RPA process flow is creating an ML or LLM-type calculation model at that point instead.
So when you're doing your traditional monitoring or you're looking at data quality, layering up these different systems will actually allow you to curate the data in a meaningful way. So, building trust into systems: if you have one system, then if that fails, you have no way of necessarily detecting it. But if you have several layers that are all watching what has happened on the previous layer, or are tackling the problem in smaller bite-sized chunks, you can actually start to monitor what is happening through the journey and the life cycle of that data. I've done that in the past, using AI to look at data, particularly when you've got unstructured data, text data, and you're looking for things like duplicates. Actually, using some of these models is a very good way to get a confidence threshold on where you have duplicates, and you can start to join multiple layers of data together.
But it doesn't necessarily mean that you want to take an action. You define thresholds, confidence intervals, at which you will take an automated action, one where you'll take no action, and one that is in that kind of uncertain bit in the middle. And that's where I often find the kind of human judgment or further information is required. And so, building that into financial systems, into our financial decision-making, into any of our modeling of, you know, how our clients at Zema Global and at Argus are using our data, they are going to be the ones who are taking those decisions in those uncertain areas. And they can, again, leave the models to do the stuff that is obvious, where it's a yes or no answer on either side. In the middle, that's where that human intelligence still has a huge amount of value, I find, and the interpretation of how you use that information is still key.
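A minimal sketch of the three-band routing Stewart describes might look like this; the thresholds and the duplicate-likelihood scores are invented for illustration.

```python
# Illustrative only: route a model's confidence score into three bands -
# act automatically, do nothing, or refer to a human reviewer.

AUTO_ACT_ABOVE = 0.95   # confident enough to merge/flag automatically
IGNORE_BELOW = 0.40     # confident enough that nothing needs doing

def route(confidence):
    if confidence >= AUTO_ACT_ABOVE:
        return "automated action"
    if confidence <= IGNORE_BELOW:
        return "no action"
    return "refer to human reviewer"

for score in (0.98, 0.12, 0.67):
    print(f"duplicate likelihood {score:.2f} -> {route(score)}")
```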
Neil: Vlasios, would you get on a plane flown by an AI?
Vlasios: Absolutely not at this point in time, but I might. Let me say why I say that, right? So I think that AI can enhance a lot of these processes. Clearly, a lot of the manual things we do right now on data curation can actually be replaced by AI-enabled processes. However, you do still need human oversight, especially when the context is important and when complex decisions need to be made. But one of the things that, in my view, will become super important in this new world is metadata. I believe that metadata is going to become the new currency. In the past, we used to define this metadata for human consumption. I think the time has now arrived when metadata has to be written for consumption by other agents.
And I think that's what Stewart was talking about: you can have a process that requires many agents, and another agent can actually monitor the outcome of the previous agent. So metadata is going to become critical here. And my personal opinion is that this metadata needs to be written by an AI tool. It's an AI tool writing the metadata for other AI tools. Things like chunking of data will become very important. It's the exact opposite of what we discussed about pooling of data. If the specific problem you try to solve requires fine details, then you need to find an automated way, as part of your data curation, to chunk the data in a way that will maximize the likelihood of providing that precise response. So I would like to think that yes, I can trust AI 100%, but I don't think we are there yet.
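As an illustration of metadata written for machine rather than human consumption, a hypothetical record might look like the sketch below. The field names and values are invented and do not reflect an Argus or Zema schema.

```python
# Illustrative only: metadata intended to be read by downstream agents, which
# can use it to decide how to query, chunk or combine a dataset.
import json

dataset_metadata = {
    "dataset_id": "example-crude-assessments",   # hypothetical identifier
    "description": "Daily crude price assessments, example only",
    "temporal_coverage": {"start": "2020-01-01", "end": "2025-04-30", "frequency": "daily"},
    "units": "USD/bl",
    "geography": ["Europe", "Asia-Pacific"],
    "quality_checks": ["completeness", "stale-value detection"],
    "preferred_chunking": {"by": "commodity and month", "max_rows_per_chunk": 500},
}

print(json.dumps(dataset_metadata, indent=2))
```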
Neil: Fair enough. I mean, you and I have discovered in the course of our AI journey that LLMs, in particular, have a very poor understanding of the concept of time, and what understanding there is, is possibly, kind of, a falsified understanding. Do you think we're moving to a world where those kinds of holes in the capabilities of the AI are going to get filled? Will we ever get to a point where an AI can decide that something sounds more true than something else?
Vlasios: I think that the foundational models have become so powerful that right now we can incorporate concepts like time. But effectively, what is required there is not to send the system the actual question the user is asking, but to process that question using an LLM, try to understand what the user actually means, including context about time, about geography, about the commodity sector, and then send to the LLM, effectively, a question that has been refined. And that's some of the work that we are doing. There is a huge improvement in these models every six months or so, so I think that some of the current problems will actually be resolved. Stewart, I don't know if you have a different opinion about that.
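A minimal, hypothetical sketch of the question-refinement step Vlasios describes, resolving relative time and adding commodity context before the query reaches the LLM, could look like this. The resolution rules are toy stand-ins for what a real pre-processing model would infer.

```python
# Illustrative only: enrich a vague user question with explicit time and
# commodity context before sending it to an LLM.
from datetime import date

def refine_question(question, today=None):
    today = today or date.today()
    refined = question
    if "last month" in question.lower():
        # Resolve the relative time expression into an explicit month.
        last_day_prev = date.fromordinal(today.replace(day=1).toordinal() - 1)
        refined = refined.replace("last month", "in " + last_day_prev.strftime("%B %Y"))
    if "diesel" in question.lower():
        refined += " (commodity: diesel/gasoil; assume Europe unless stated otherwise)"
    return refined

print(refine_question("How did diesel prices move last month?", today=date(2025, 5, 15)))
```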
Stewart: I mean, I would generally agree with that. I think that some of the challenges with LLMs, and you're kind of going down this route of talking about the LLMs, you think about the likes of the ChatGPTs, the Claudes, the Geminis of this world, which are those horizontal LLMs. When you have the slightly more curated knowledge graphs, you have your own curated data and you have it applied in a more specific use case, I actually find that the holes disappear a lot more, because, to Vlasios' point, you can curate the questions, the inputs, the ways in which the questions are put to the models, a little better, and therefore the holes are not quite so apparent. And if you have those secondary systems that are there to build trust and curate the information and to look for outlier responses, or look for flags that can then be referred to human operators or can have other interventions made, then you can actually start to build a lot of trust into those automated systems.
So I think the question of the holes is less so, but yes, I will also agree that an LLM, a system, doesn't have the same concept of time as maybe a human does, you know? On this earth, we are very aware of our own mortality, so there's always this idea of time and there's never enough of it. But we are also aware of the passing of time in a slightly different way from computers and machines. To them it is just another data point. To us, it's something slightly more real. So immediacy can sometimes be more important to us.
Neil: Just to wrap things up here, because I'm conscious of time, and we could probably go on for several hours exploring this world, these models are very, very dependent on being fed constantly with new data. And I'd just like, in conclusion, to explore what we think happens when the volume of data being fed to either a machine learning system or an LLM changes. And we see this happen in energy markets all the time, where there will suddenly be a collapse in the number of data points, or worse, the data points are being recycled, which is also a threat in the world of AI: you wind up with nothing new, and it's just feeding on itself and making more and more brown, to use Stewart's metaphor. So how does it work and what do we do about sudden changes in volume of data, either over time, speaking of time, or from one market to another? How do we address that?
Vlasios: I think one of the things we need to do better is to adapt more of the model risk management practice. It's something we have been doing in the machine learning framework, but I think that in the gen AI framework, there is more work to be done. We need to be very careful, Neil, about what you mentioned, like concept drift and model drift problems. We know what these mean. We know how to handle them. We just need to adapt our understanding of what they mean in the gen AI space. There are some new concepts, like an LLM judge.
So depending on the vertical that you're trying to address, make sure you create some custom criteria to be able to address the problems in the change of the volume of data, or as you move from one market to another, from one vertical to another. And one of the things that I have seen in practice, and I don't know whether Stewart agrees with that, is to create a really good gallery of examples. So if you have a good gallery of examples from the periods when you actually have good volumes of data, then when the quality of your data falls, these examples can actually help, for a short period of time at least, to maintain a level of quality. So those are the two, if you like, basic building blocks that come to my mind.
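As a purely illustrative sketch of the two ideas Vlasios mentions, a curated gallery of examples and a judge scoring responses against custom criteria, the toy below uses a keyword checklist as a stand-in for an LLM-as-judge call. All examples, criteria and the draft response are invented.

```python
# Illustrative only: a curated example gallery (for few-shot prompting) and a
# simple "judge" that scores a draft response against custom criteria.

EXAMPLE_GALLERY = [
    {"question": "What drove North Sea crude differentials this week?",
     "good_answer": "Cites specific grades, dates and the direction of the move."},
    {"question": "Summarise freight costs on key dirty tanker routes.",
     "good_answer": "Names routes, gives levels and a time reference."},
]

CRITERIA = ["price", "date", "source"]   # custom criteria for this vertical

def judge(response, criteria=CRITERIA):
    hits = [c for c in criteria if c in response.lower()]
    return {"score": len(hits) / len(criteria), "missing": [c for c in criteria if c not in hits]}

draft = "The price rose by $1.20/bl on assessment date 12 May, per the named source."
print(f"{len(EXAMPLE_GALLERY)} few-shot examples available for the prompt")
print(judge(draft))   # score 1.0 here; a low score would trigger review or a retry
```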
Stewart: And I largely agree with that, Vlasios. And, you know, thinking back to your original question, the scenario where you get a collapse in the volume of data very much depends on the nature of the model you're trying to build. If it is trying to assess whether or not a data point falls within a set of pre-calculated parameters, it's relatively easy to do that with small amounts of data. If you're starting to do things like regression analysis and starting to look at patterns of behavior, that's where you'll get these kind of very lumpy outcomes, and you will actually start to find that your models will maybe not achieve the same things that you want them to.
LLMs are probably more resilient in that sense, because they will just continue to feed back what is the expected response based on past behavior and that extensive library of knowledge, learned responses effectively, a bit like, you know, how you've taught a child. Keep putting it in those scenarios, and it will keep doing that until you teach it something else. But I do find, back to that point of the judges, the watchers: if you want to build trusted AI, then you need to have systems in place to curate and monitor the data that feeds into it, and then also how it continues to operate, and to continue to have controls and checks in place that ask, "Where are we getting that model drift? Where has our population of data shifted so that we now need to adjust our model?"
Where has the volume of data maybe fallen below a parameter where we say, "Actually, we need to reevaluate how this model operates"? Where are the outcomes becoming increasingly variable, so that we maybe need to reevaluate the amount of data or the nature of the data that we put in, or the interpolation that we need to put in place? There are lots of these, what I would call, what-could-go-wrongs within these systems. And part of our role as data professionals is to understand what they are and to put in place the mechanisms to curate those systems, to develop quality, to develop trust, and to allow our end users, our consumers, to use these systems, to use that data with trust, to make better decisions.
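A minimal sketch of the kind of "what could go wrong" monitoring Stewart describes, checking for a collapse in data volume and for drift away from the population the model was built on, might look like this. The thresholds and figures are invented.

```python
# Illustrative only: two simple monitoring checks - has incoming data volume
# collapsed, and has the recent data drifted from its historical distribution?
import statistics

def volume_check(daily_counts, min_expected=50):
    latest = daily_counts[-1]
    return latest >= min_expected, f"latest volume {latest} vs floor {min_expected}"

def drift_check(reference, recent, z_limit=3.0):
    mu, sigma = statistics.mean(reference), statistics.stdev(reference)
    z = abs(statistics.mean(recent) - mu) / sigma
    return z <= z_limit, f"drift z-score {z:.1f} vs limit {z_limit}"

reference_prices = [70 + 0.1 * i for i in range(100)]   # stand-in history
recent_prices = [92, 93, 95, 94]                         # suspiciously high
for ok, msg in (volume_check([120, 115, 9]), drift_check(reference_prices, recent_prices)):
    print(("OK   " if ok else "ALERT"), msg)
```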
Neil: And trust is really what it's all about. Gentlemen, thank you both very much for this introductory chat on the infinitely exciting world of AI in the space of commodities and energy markets. For more information on this and subjects relating to data management and data curation and the delivery of data, head on over to our websites, either to Zema Global's website or to the Argus Media website. We will continue to explore this and we will explore further on a longer webinar later in the course of this summer. But for now, Vlasios Voudouris and Stewart Wallace, thank you both very much.
Stewart: Thank you very much, Neil. Thank you very much, Vlasios.
Vlasios: Thank you very much, Neil and Stewart, that was really interesting.