How our AI research capability works

Frontiers of AI

One of the frontiers in AI is machine language understanding, which aims to give machines the power to comprehend and interpret human language. Unlike basic text processing, which is limited to recognising keywords, machine language understanding aims to grasp the meaning behind sentences, paragraphs, and larger bodies of text, taking into account context, grammar, and semantics. In contrast to “generative” Large Language Models (LLMs), which are trained to predict the next word in a sentence and produce fluent, creative responses that may not be grounded in real-world facts, machine language understanding focuses on accurate interpretation, leading to fact-based, verifiable insights. Following several years of R&D at Glass.AI, we have invented AI technology that can read and interpret text at scale. We decided to apply this innovation to reading the web, the biggest research resource that has ever existed.

 

Multiple data sources
Around 90% of the world's data is unstructured and spread across many platforms. Our AI makes sense of vast quantities of written text, whether from company websites, news, social media, industry reports, government or other sources. Web data is unstructured, fast-moving and hard to query at scale. Our intelligent crawler tracks hundreds of thousands of topics, signals and other indicators of interest across billions of web pages, covering more than 40 million organisations with active websites globally, in different countries and languages. The result is a new research capability that reads the web at scale and can understand and monitor the activities of millions of companies.

Below is the typical process we follow to build bespoke data on companies, sectors and markets for our clients:


Step 1:
Data Exploration

  • Define search requirements

  • AI reads business websites, social

  • Discovery of other sources

  • Initial sample for client review

Step 2:
Intelligent Crawling

  • Refine search based on feedback

  • Deeper read of websites, social, news

  • Other sources to augment results

  • Match to official sources, registers

Step 3:
Analysis & QA

  • Automatic QA process

  • Detect and remove anomalies

Step 4:
Data Delivery

  • Format and delivery method agreed

  • Packaging of data

  • Single or regular delivery of results

  • Summary stats, visualisations

 

Making sense of web content at scale
Glass.AI runs an ongoing discovery process that reads millions of websites globally. It classifies a site as a company website when the content meets certain criteria (e.g. the site is active, in English or other languages) and, where enough content is available on the web, predicts the sector and geography of the business. Currently, our system reads 40 million business websites globally, including those of companies, partnerships, sole traders, academic institutions, government and non-profits. We estimate that this covers 80% of all businesses globally that have useful web content. The crawler also reads sources such as news, social media, and academic and sector-specific websites. Where possible, we match the web data about businesses with data from official business registers.
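
To illustrate the register-matching idea, here is a minimal sketch in Python. It is not Glass.AI's method: it assumes a simple exact match on normalised company names, and the suffix list, function names and sample register entries are all hypothetical.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "inc", "llc", "gmbh"}


def normalise_name(name: str) -> str:
    """Lower-case the name, strip punctuation and drop legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_to_register(web_names, register):
    """Map each web-derived name to a register ID, or None if there is no match.

    `register` maps register IDs to official names (illustrative data only).
    """
    index = {normalise_name(official): reg_id for reg_id, official in register.items()}
    return {name: index.get(normalise_name(name)) for name in web_names}


# Made-up example data.
register = {"R001": "Acme Robotics Limited", "R002": "Blue Harbour Foods PLC"}
matches = match_to_register(["ACME Robotics Ltd.", "Unknown Startup"], register)
```

A production matcher would likely also compare addresses, domains or registration numbers, since names alone are ambiguous; this sketch only shows the normalise-then-look-up pattern.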

“Our proprietary AI technology uses various approaches in NLP, machine learning and computational linguistics. It combines language understanding through semantic analysis with resource crawling at scale and the maintenance of a deep topic ontology”.


Semantic analysis
Glass.AI detects entities and classifies content from text (e.g. companies, people, products, news) with state-of-the-art precision. So when our AI goes to a website or reads a web source, it makes its own decisions on what is being talked about.

 

Resource crawling 
Glass.AI's intelligent crawler uses smart filtering to follow the links most likely to lead to the data that is relevant to the results, simulating how a human would efficiently scan a website. This lets us extract large-scale datasets efficiently.
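
One common way to follow the most promising links first is a best-first ordering of the crawl frontier. The sketch below is an illustration only, not a description of Glass.AI's crawler: the keyword weights in `RELEVANCE_HINTS`, the function names and the example URLs are assumptions standing in for whatever relevance model a real system would use.

```python
import heapq

# Hypothetical hints: words in a link's URL or anchor text and the weight each adds.
RELEVANCE_HINTS = {
    "about": 3,
    "products": 3,
    "services": 2,
    "news": 2,
    "privacy": -2,
    "login": -3,
}


def score_link(url: str, anchor_text: str) -> int:
    """Sum the weights of every hint found in the URL or anchor text."""
    text = f"{url} {anchor_text}".lower()
    return sum(weight for hint, weight in RELEVANCE_HINTS.items() if hint in text)


def prioritise(links, budget):
    """Return up to `budget` URLs, highest score first, skipping non-positive scores."""
    heap = []
    for order, (url, anchor) in enumerate(links):
        score = score_link(url, anchor)
        if score > 0:
            # Negate the score because heapq is a min-heap; `order` breaks ties.
            heapq.heappush(heap, (-score, order, url))
    return [heapq.heappop(heap)[2] for _ in range(min(budget, len(heap)))]


# Made-up links found on a home page, as (url, anchor text) pairs.
links = [
    ("https://example.com/privacy", "Privacy policy"),
    ("https://example.com/about", "About us"),
    ("https://example.com/news", "Latest news"),
    ("https://example.com/products", "Our products and services"),
    ("https://example.com/login", "Log in"),
]
to_visit = prioritise(links, budget=2)
```

With a budget of two pages, the products and about pages are visited first, and the privacy and login pages are never queued, which is roughly how a person skimming a site for business information would behave.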

 

Topics ontology
Glass.AI builds language models and large topic maps that help it understand web content and open up the data for further investigation. Our ontology adapts to emerging themes and trends to ensure comprehensive coverage of the constantly evolving digital landscape.

 

Entity onboarding
Everything in Glass.AI is fully automated. For example, when detecting businesses, Glass.AI automatically recognises the type of site, the name of the business and its sector, and then deep-reads the site to understand the activities of the firm.

 

You can find out more about how Glass.AI differs from other approaches to understanding language in this blog post. You can also learn more about our approach to automating web research, and sector research in particular, and how it compares with other methods (e.g. company databases, manual research). Our AI capability is transforming company and sector research for governments, consultancies and B2B corporates.