A Case Study in “Agentic” AI: Automating Sector Research for Governments and Consultancies.

The co-founders of glass.ai have been deploying Artificial Intelligence solutions into challenging enterprise environments for over a quarter of a century. A key lesson from all that time is that what matters are the results: consistent, robust, reliable results. The AI might get you into the conversation but, as you would expect, success where it really matters comes down to the results. This goes a long way to explaining the relative lack of traction of generative AI solutions in enterprise settings, despite the interest they have generated.

When we first turned our hand to web-scale machine reading and language understanding, statistical and machine learning approaches were just beginning to have real success in the field. What we felt at the time, though, was that while these approaches were improving all the time, they continued to make unacceptable errors (a situation that continues with hallucinations in generative AI) and were not consistent across the diverse and messy data found online. In this regard, sector and company research turned out to be the perfect testing ground for our methods.

Company databases and AI

Whether understanding the economic impact of emerging sectors, evaluating the success of policy changes targeted at particular sectors, or trying to learn and apply best practices from one sector to another, governments need a granular understanding of the various sectors of the economy. As we have previously suggested, for most purposes standard industrial classifications are not sufficient. Beyond the public sector, sales and marketing teams within corporate entities also need to understand the markets they operate in and the sectors they serve. However, unlike the company and market analysis used to drive sales and marketing, sector research datasets built for government studies may drive or inform policy changes or future investment cases, so the accuracy of the results is vital. The levels of precision you might normally expect from an AI solution do not apply, and the results must also be supported by provenance.

Previously, this has generally only been possible with human-curated datasets, built through web research or, more likely, in combination with one or more high-level company databases. This introduces an immediate problem: any static database quickly goes out of date, especially in the emerging sectors that are often the target of these studies. The following diagram shows a random snapshot of the results from a keyword search against a popular company database; the pattern is the same for the other company databases we have tested.

You can see that only 20% of the returned results meet the criterion of being cybersecurity businesses. A number fail because of out-of-date data, while others depend on exactly how you define the cybersecurity sector: perhaps you are also interested in general IT consultants or trainers that do a little cybersecurity. This in itself raises another question when using company databases.

Across a broad range of sector research exercises, we have noticed that there is no standard set of requirements for a sector. Even the same sector is likely to be viewed through a slightly different lens by different organisations. Different departments in government are often interested in particular nuances of a sector, while in corporate engagements a business takes its own capabilities into consideration to identify the specific set of companies it wants to engage with. Any fixed classification over a company database limits the accuracy of the results that can be obtained.

If you have a candidate set of organisations for a sector within a company database, one popular AI technique for finding results similar to existing ones is to use semantic embeddings. The following charts show a set of potential cybersecurity companies projected into a two-dimensional embedding space and labelled by cluster.
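As an aside, here is a minimal sketch of how such an embedding view can be produced, assuming a sentence-transformers model and scikit-learn; the model name, descriptions and cluster count are all illustrative, not a description of our own pipeline:

```python
# Minimal sketch: embed company descriptions, cluster them, and project to 2D for plotting.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

descriptions = [
    "Provides managed threat detection and SOC services to banks.",
    "General IT support and consultancy for small businesses.",
    "Penetration testing and security audits for healthcare providers.",
    # ... in practice, thousands of candidate company descriptions
]

model = SentenceTransformer("all-MiniLM-L6-v2")           # illustrative model choice
embeddings = model.encode(descriptions, normalize_embeddings=True)

# Cluster in the full embedding space; reduce to 2D only for visualisation.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
coords = PCA(n_components=2).fit_transform(embeddings)

for desc, label, (x, y) in zip(descriptions, labels, coords):
    print(f"cluster={label} ({x:+.2f}, {y:+.2f}) {desc[:60]}")
```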

On the whole the clusters seem to make sense, so this appears to be a good method for defining the sector. However, ranking the data by similarity shows that good records are spread over a wide range of results, and even at the top of the ranking there is quite a mix of good and bad. In particular, the following results are all near the top and “near” each other in the embedding space, which explains the mixed results you can receive. On its own, this is not a sufficient solution for company and sector research.
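For reference, the ranking itself is typically a cosine similarity between a sector query embedding and each candidate embedding; a minimal sketch under the same illustrative assumptions as above:

```python
# Minimal sketch: rank candidate companies by cosine similarity to a sector query.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")           # illustrative model choice

query = "cybersecurity products and services"
candidates = [
    "Endpoint protection and threat intelligence platform.",
    "Recruitment agency placing IT and security staff.",
    "Managed detection and response for industrial control systems.",
]

q = model.encode([query], normalize_embeddings=True)[0]
c = model.encode(candidates, normalize_embeddings=True)

scores = c @ q              # with normalised vectors, the dot product is the cosine similarity
for rank, idx in enumerate(np.argsort(-scores), start=1):
    print(f"{rank}. sim={scores[idx]:.3f} {candidates[idx]}")
```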

More recently, with the advent of generative AI, we are starting to see LLMs used for company classification. Using RAG (Retrieval-Augmented Generation) style prompts, systems ask the LLM which sector a business is in based on its description, or whether or not it is in a specific sector.
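A minimal sketch of such a prompt is shown below, assuming an OpenAI-compatible chat API with credentials already configured; the model name, company and retrieved text are illustrative, and this is not a description of any particular production system:

```python
# Minimal sketch of a RAG-style sector classification prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_company(name: str, retrieved_text: str, sector: str) -> str:
    """Ask the LLM whether a company belongs to the target sector, given retrieved evidence."""
    prompt = (
        f"Company: {name}\n"
        f"Evidence retrieved from the web:\n{retrieved_text}\n\n"
        f"Question: Is this company part of the {sector} sector? "
        "Answer YES or NO, then give a one-sentence justification grounded only in the evidence."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(classify_company(
    "Acme Security Ltd",
    "Acme provides penetration testing and SOC monitoring services to UK retailers.",
    "cybersecurity",
))
```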

The first thing to notice on the left-hand chart, beyond the relatively low performance, is that there is variation across 5 different runs with the same prompts and data. This is to be expected: LLMs are stochastic, so they can give different results across multiple runs even with the same inputs. But, as shown on the right-hand chart, we can use this variation to boost the performance by over 20% by only taking those results that are in agreement across the 5 runs; maybe the others are the tricky ‘edge’ cases. The downside of this approach, in addition to the extra compute cost, is that although the majority of the good results are captured, around 25% of the good results are dropped, so we are introducing a coverage problem. A few examples of good results that would be lost in this way are shown below.
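To make the agreement step concrete, here is a minimal sketch of the consensus filter; the run results are placeholders rather than real pipeline output:

```python
# Minimal sketch: keep a candidate only when all N classification runs agree it is in the sector.
N_RUNS = 5

# run_results[i] maps company -> "YES"/"NO" verdict from run i (placeholder data).
run_results = [
    {"Acme Security": "YES", "Widget IT Support": "NO",  "SOC Monitor Ltd": "YES"},
    {"Acme Security": "YES", "Widget IT Support": "YES", "SOC Monitor Ltd": "YES"},
    {"Acme Security": "YES", "Widget IT Support": "NO",  "SOC Monitor Ltd": "YES"},
    {"Acme Security": "YES", "Widget IT Support": "NO",  "SOC Monitor Ltd": "YES"},
    {"Acme Security": "YES", "Widget IT Support": "NO",  "SOC Monitor Ltd": "YES"},
]

def unanimous_yes(company: str) -> bool:
    return all(run[company] == "YES" for run in run_results)

accepted = [company for company in run_results[0] if unanimous_yes(company)]
print(accepted)   # only companies every run agreed on survive; borderline cases are dropped
```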

Automating sector research from web sources

The alternative to using company databases for building sector datasets is manual research through the web and other sources. This is a very time-consuming task, even when you are just trying to augment or check the results gathered from existing databases. Moreover, the results remain ‘high-level’, as it is very difficult to capture deeper information about the activities and products of companies with a manual approach. As mentioned at the start of this article, our initial approaches were driven by having scalable and very accurate solutions for reading the web: scalable both in being able to run efficiently on modest resources and in being robust across problem domains. This has proven decisive when running a wide variety of sector research studies.

The importance of accuracy cannot be overstated. When you are dealing with data at web scale, the impact of errors soon adds up because they compound. As the following chart demonstrates, typical good accuracy for an AI solution might hover between 70–80% on a given problem, but combined across just 10 data points the overall chance of a fully accurate result drops to under 20%. Even at our own original baseline target accuracy of above 95%, the chance of an accurate result drops to around 60%. How do you go about solving this problem?
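The arithmetic behind the chart is straightforward if we assume the errors are independent across fields:

```python
# Cumulative effect of per-field errors: a record with 10 fields is only fully correct
# if every field is correct, so the record-level accuracy is p**10.
for p in (0.70, 0.80, 0.95, 0.99):
    print(f"per-field accuracy {p:.0%} -> 10-field record accuracy {p**10:.1%}")
# 70% -> 2.8%, 80% -> 10.7%, 95% -> 59.9%, 99% -> 90.4%
```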

The first step in automating the web sector research process is being able to crawl very efficiently and accurately extract potentially relevant information for the research exercise. Ideally, the machinery browses the web the way a team of researchers would if they were carrying out the task manually, rather than crawling and reading everything like a general search engine crawler. By using focused crawling techniques, following only links that are likely to lead to results and retaining only information that is likely to contribute to the research, we can make sure all the information used for the research is up to date, and it can be captured with relatively modest resources.
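To illustrate the idea, here is a minimal sketch of a focused crawler, with a simple keyword heuristic standing in for the real relevance models; it is not our production crawler, and a real one would also respect robots.txt, rate limits and deduplication:

```python
# Minimal sketch: a focused crawler that only follows links scored as likely to be relevant.
import heapq
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

RELEVANT_HINTS = ("about", "products", "services", "solutions", "security")

def link_score(url: str, anchor_text: str) -> float:
    """Illustrative stand-in for a learned link-relevance model."""
    text = f"{url} {anchor_text}".lower()
    return float(sum(hint in text for hint in RELEVANT_HINTS))

def focused_crawl(seed_url: str, max_pages: int = 20):
    frontier = [(-1.0, seed_url)]              # max-heap via negated scores
    seen, pages = {seed_url}, []
    while frontier and len(pages) < max_pages:
        _, url = heapq.heappop(frontier)
        html = requests.get(url, timeout=10).text
        pages.append((url, html))              # retain only pages worth reading later
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"])
            score = link_score(link, a.get_text())
            if link not in seen and score > 0:     # skip links unlikely to lead anywhere useful
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return pages
```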

The second phase of the web research process is to build and apply models that accurately identify the right results and form the complete picture for the target sector. The previous examples of semantic embeddings and RAG-based classification show that a single approach is unlikely to provide an adequate solution on its own, one that is both accurate and comprehensive. We instead apply a wide range of techniques in an ensemble of models to give state-of-the-art results.

These include supervised and unsupervised models, among them models targeted specifically at the sector research domain, and all of them verify results against evidence online, including where generative AI approaches are applied; there is no hallucination. Furthermore, each decision draws not only on a wide range of techniques but also on a broad set of input data about each company, well beyond a single description. Cross-validating across models and sources keeps the accuracy high, and looking at things from different angles keeps the coverage high as well.
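A minimal sketch of the kind of decision logic this implies, with illustrative stand-ins for the real models, might look like this:

```python
# Minimal sketch: combine independent signals about one company and only accept a
# classification when the vote is strong and at least one signal is backed by evidence.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Signal:
    model_name: str
    in_sector: bool                 # this model's verdict
    evidence_url: Optional[str]     # page supporting the verdict, if any

def ensemble_decision(signals: list, min_votes: int = 3) -> dict:
    votes_for = [s for s in signals if s.in_sector]
    evidence = [s.evidence_url for s in votes_for if s.evidence_url]
    return {
        "in_sector": len(votes_for) >= min_votes and bool(evidence),
        "votes": f"{len(votes_for)}/{len(signals)}",
        "evidence": evidence,       # provenance retained for the final dataset
    }

signals = [
    Signal("keyword_model",    True,  "https://example.com/services/penetration-testing"),
    Signal("embedding_model",  True,  None),
    Signal("llm_rag_model",    True,  "https://example.com/about"),
    Signal("supervised_model", False, None),
]
print(ensemble_decision(signals))
```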

The final phase of automating sector research from web sources is enriching the results to collate the set of fields and indicators needed for the particular project. Some of these are straightforward attributes, such as the headquarters location or a link to the company's official government record; others require further analysis, such as whether the company appears to be growing or which countries it is trading with. Either way, each attribute needs to be backed by strong provenance, collated alongside the provenance and evidence associated with earlier decisions, so that final checks and balances can verify the dataset to be delivered.
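One simple way to represent this, sketched below with illustrative field names and values, is to attach provenance to every attribute rather than to the record as a whole:

```python
# Minimal sketch: every enriched attribute carries its own provenance.
from dataclasses import dataclass, field

@dataclass
class Attribute:
    value: str
    source_url: str      # where the value was observed
    retrieved_at: str    # when it was captured

@dataclass
class CompanyRecord:
    name: str
    attributes: dict = field(default_factory=dict)

record = CompanyRecord("Acme Security Ltd")
record.attributes["hq_location"] = Attribute(
    "Edinburgh, UK", "https://example.com/contact", "2024-05-01")
record.attributes["official_record"] = Attribute(
    "Company number SC123456", "https://example-registry.gov/SC123456", "2024-05-01")
```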

While following a similar flow, different projects can introduce subtle differences in the types of data to be collected and the sorts of models to apply as a best-fit solution. With this in mind, a flexible framework is needed to build robust pipelines of diverse agents that can be orchestrated on a project-by-project basis.
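A sketch of what such a framework might look like in code follows; the agent names and the composition are illustrative rather than a description of our actual implementation:

```python
# Minimal sketch: compose a project-specific pipeline from a toolbox of agents.
from typing import Callable, Dict, List

Agent = Callable[[Dict], Dict]      # each agent takes the working record and enriches it

def build_pipeline(steps: List[Agent]) -> Agent:
    def run(record: Dict) -> Dict:
        for step in steps:
            record = step(record)
        return record
    return run

def crawl_agent(record: Dict) -> Dict:
    record["pages"] = ["<crawled page content>"]     # placeholder
    return record

def classify_agent(record: Dict) -> Dict:
    record["in_sector"] = True                       # placeholder verdict
    return record

def enrich_agent(record: Dict) -> Dict:
    record["hq_location"] = "Edinburgh, UK"          # placeholder attribute
    return record

# Different projects compose different steps from the same toolbox.
cyber_pipeline = build_pipeline([crawl_agent, classify_agent, enrich_agent])
print(cyber_pipeline({"name": "Acme Security Ltd"}))
```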

Certain paths through these pipelines can be productised for classes of problems, but our experience across the variety of requirements we have tackled for sector research is that even small differences can lead to different combinations of approaches being needed to get a high-quality set of results. One key area that we have not covered yet is getting human insight and input during the process to ensure that the solution is performing at the expected levels. This is highlighted in the schematic above by the dashed elements: inputs during the configuration of the initial search parameters for the exercise, and external validation of the results being generated as the modelling proceeds. With this human input, the framework can reach the very high levels of accuracy that the demands of sector research require.

Using this approach, with large-scale live capture of data and a variety of different modelling methods, we can build pipelines that create very high-quality results at web scale. The chart above illustrates two pipelines of ensembles: majority voting across models with an average accuracy of 90–95% leads to ensemble models in the 99%+ range, and when tested against our earlier scenario of the cumulative effect of errors, the top end is still able to maintain accuracy at almost 99%.
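To see why majority voting helps, here is the arithmetic under the simplifying assumption that the models err independently (real models are partially correlated, so the actual gain depends on how diverse the ensemble is):

```python
# Majority voting over n independent models, each correct with probability p.
from math import comb

def majority_vote_accuracy(p: float, n_models: int) -> float:
    k_min = n_models // 2 + 1
    return sum(comb(n_models, k) * p**k * (1 - p)**(n_models - k)
               for k in range(k_min, n_models + 1))

for p in (0.90, 0.95):
    ensemble = majority_vote_accuracy(p, 5)
    print(f"single model {p:.0%} -> 5-model ensemble {ensemble:.2%} "
          f"-> 10-field record {ensemble**10:.1%}")
# 90% -> 99.14% ensemble -> 91.8% over 10 fields; 95% -> 99.88% -> 98.8% over 10 fields
```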

What does this tell us about agentic AI solutions?

This article has outlined an approach to delivering sector research by orchestrating a broad set of AI agents, each providing an individual capability that can be used during the research process. None of these capabilities is tied to a specific methodology, so the deployed pipeline benefits from a variety of techniques that together drive up the accuracy and coverage of the final result; the weakness of one model can be offset by the strength of another, and vice versa. The current surge of interest in AI agents, particularly in enterprises and governments, has been driven by the capabilities of generative AI. Yet, despite the noise, most AI solutions successfully deployed today are not generative AI ones. Generative AI is a useful tool, but it is just one of many capabilities that can be combined to build compound AI solutions, each playing to its strengths: capturing current data, cross-checking results and verifying outputs against evidence to provide the provenance for each decision. As we have seen with the complex problem of building sector research from web sources, taking a broad, inclusive approach can lead to exceptional results.
