GraphRAG Costs Explained: What You Need to Know
GraphRAG represents an innovative approach to powering Retrieval-Augmented Generation (RAG) applications, enabling organizations to extract unprecedented value from their complex datasets. However, unlike the relatively straightforward process of embedding and vectorizing data that most organizations use, constructing a graph requires additional effort and cost. This article provides a practical example to help you estimate the costs associated with building your own graph.
For those seeking a quick overview, here is an example to help build a sense of the cost per document analyzed. You can find an approximate calculation method for estimating the costs of your own datasets later in this article.
It is worth noting that LLM costs continue to be optimized significantly, and fine-tuned models specialized for building graphs are already being developed. When assessing an emerging technology, it is important to consider what business value these new capabilities unlock, and not allow the cost at a single point in time to prevent you from making the most of these opportunities.
Why Use GraphRAG?
Typical RAG systems excel at retrieving specific pieces of information, such as:
What wattage does this product require to operate?
What was the revenue for FY24?
How can I sign up for this service?
However, they struggle when it comes to answering questions that require a comprehensive understanding of an entire document or a set of documents:
What are the key themes of this report?
What are all the products currently being supported by my API gateway?
List all the contractors that have completed work at this manufacturing site and tell me what level of certification they have.
This limitation arises because traditional retrieval systems return only isolated chunks of information. In contrast, a knowledge graph captures the relationships between various entities, objects, and systems, thereby providing a holistic understanding of the documents in a memory-efficient manner.
In this article, we use the novel “The Wizard of Oz” as our reference text to explore the costs associated with GraphRAG. You can see the graph representation of the novel visualized below.
What are the major costs of GraphRAG?
1. Building the Graph
If you have data that contains significant metadata, building a graph can be straightforward. For example, social media websites can create graph networks by identifying the relationships between people based on who they interact with. Developing a graph from unstructured data, such as a novel or other text documents, is much more difficult.
There are two key elements that make up a graph:
Nodes: Key entities in the documents, such as characters, objects, and places (e.g. Dorothy, the main character, or Toto, her dog).
Edges: The relationships between these entities, such as the connection between Toto and Dorothy (friends, pet). The LLM also estimates the strength or relevance of this relationship as a numerical figure.
For both the nodes and edges, the LLM adds a summary and additional information, including references to the original text.
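To make the structure concrete, the node and edge records described above can be pictured roughly as the following Python sketch. The field names are illustrative assumptions, not GraphRAG's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A key entity extracted from the text (e.g. Dorothy or Toto)."""
    name: str
    types: list[str]                  # e.g. ["CHARACTER", "PERSON"]
    description: str                  # LLM-written summary of the entity
    source_refs: list[str] = field(default_factory=list)  # references back to the original text

@dataclass
class Edge:
    """A relationship between two entities, with an LLM-estimated strength."""
    source: str                       # e.g. "DOROTHY"
    target: str                       # e.g. "TOTO"
    description: str                  # why the two entities are related
    strength: int                     # numeric relevance score, e.g. 9
    source_refs: list[str] = field(default_factory=list)

# Illustrative records for the excerpt used later in this article
dorothy = Node("DOROTHY", ["CHARACTER", "PERSON"], "The main character of the novel.")
toto = Node("TOTO", ["CHARACTER", "ANIMAL"], "Dorothy's dog.")
cares_for = Edge("DOROTHY", "TOTO", "Dorothy rescues and cares for Toto.", 9)
```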
Building this list of nodes and edges is time-consuming, particularly for unstructured data where text must be analyzed to find these relationships. This is where the bulk of GraphRAG’s costs originate. Documents are analyzed in chunks, and nodes and edges are created through the clever use of Large Language Models (LLMs). The following prompt illustrates how few-shot prompting is used to progressively build a graph:
-Goal-
Given a text document that is potentially relevant to this activity, first identify all entities needed from the text in order to capture the information and ideas in the text.
Next, report all relationships among the identified entities.
-Steps-
1. Identify all entities. For each identified entity, extract the following information:
– entity_name: Name of the entity, capitalized
– entity_type: Suggest several labels or categories for the entity. The categories should not be specific, but should be as general as possible.
– entity_description: Comprehensive description of the entity’s attributes and activities
Format each entity as (“entity”{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)
2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
– source_entity: name of the source entity, as identified in step 1
– target_entity: name of the target entity, as identified in step 1
– relationship_description: explanation as to why you think the source entity and the target entity are related to each other
– relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
Format each relationship as (“relationship”{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>)
3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.
4. When finished, output {completion_delimiter}
-Examples-
######################
Example 1:
text:
It was very dark, and the wind howled horribly around her, but Dorothy
found she was riding quite easily. After the first few whirls around,
and one other time when the house tipped badly, she felt as if she were
being rocked gently, like a baby in a cradle.
Toto did not like it. He ran about the room, now here, now there,
barking loudly; but Dorothy sat quite still on the floor and waited to
see what would happen.
Once Toto got too near the open trap door, and fell in; and at first
the little girl thought she had lost him. But soon she saw one of his
ears sticking up through the hole, for the strong pressure of the air
was keeping him up so that he could not fall. She crept to the hole,
caught Toto by the ear, and dragged him into the room again, afterward
closing
————————
output:
(“entity”{tuple_delimiter}DOROTHY{tuple_delimiter}CHARACTER, PERSON{tuple_delimiter}Dorothy is a character who experiences a dark and windy environment, feels as if being rocked gently, and actively participates in rescuing Toto)
{record_delimiter}
(“entity”{tuple_delimiter}TOTO{tuple_delimiter}CHARACTER, ANIMAL{tuple_delimiter}Toto is Dorothy’s dog who dislikes the situation, runs around barking, and accidentally falls into a trap door but is saved by Dorothy)
{record_delimiter}
(“entity”{tuple_delimiter}TRAP DOOR{tuple_delimiter}OBJECT{tuple_delimiter}The trap door is an opening through which Toto falls, but the air pressure prevents him from falling completely)
{record_delimiter}
(“relationship”{tuple_delimiter}DOROTHY{tuple_delimiter}TOTO{tuple_delimiter}Dorothy rescues Toto from the trap door, showing a caring relationship{tuple_delimiter}9)
{record_delimiter}
(“relationship”{tuple_delimiter}TOTO{tuple_delimiter}TRAP DOOR{tuple_delimiter}Toto falls into the trap door, which is a pivotal moment for his character in this scene{tuple_delimiter}7)
{record_delimiter}
(“relationship”{tuple_delimiter}DOROTHY{tuple_delimiter}TRAP DOOR{tuple_delimiter}Dorothy interacts with the trap door to rescue Toto, showing her proactive nature{tuple_delimiter}8)
{completion_delimiter}
#############################
-Real Data-
######################
text: {input_text}
######################
output:
This process is repeated for the entire body of documents, and these LLM calls drive the majority of the cost.
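To illustrate where those calls happen, a simplified version of the chunk-and-extract loop might look like the sketch below. It is not the GraphRAG implementation itself; the chunk size, overlap, model name, and the EXTRACTION_PROMPT placeholder (standing in for the full prompt above) are assumptions for illustration.

```python
# Simplified sketch of the chunk-and-extract loop that drives most of the cost.
# Assumes an OpenAI-compatible client; chunk size, overlap, and model are illustrative.
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
encoding = tiktoken.get_encoding("cl100k_base")

# Placeholder for the full entity/relationship extraction prompt shown above,
# with "{input_text}" left as a literal marker to be filled per chunk.
EXTRACTION_PROMPT = "...extraction prompt from above...\ntext: {input_text}\noutput:"

def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 100) -> list[str]:
    """Split a document into overlapping, token-based chunks."""
    tokens = encoding.encode(text)
    step = chunk_size - overlap
    return [encoding.decode(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]

def extract_graph_records(chunk: str, model: str = "gpt-4o-mini") -> str:
    """One LLM call per chunk; these calls are where the bulk of the tokens are spent."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.replace("{input_text}", chunk)}],
    )
    return response.choices[0].message.content

# for chunk in chunk_text(full_document_text):
#     records = extract_graph_records(chunk)
#     ...parse the ("entity"...) and ("relationship"...) tuples into nodes and edges
```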
For comparison, embedding documents for use with typical RAG patterns is relatively inexpensive. Vectorizing the entire text of “The Wizard of Oz” cost only $0.0056 USD, which is negligible compared with the cost of building the graph.
2. Hosting & Inference Costs
Inference costs are not a major focus of this article, as the difference in prompt and inference tokens between GraphRAG and typical RAG applications is highly dependent on the scenario. Either one may be more expensive than the other; however, the profile of token usage is relatively similar.
Hosting costs depend on the technology used. For typical RAG applications, this is the vector database. For GraphRAG, the hosting methods are still evolving, so these costs have not been considered.
Cost breakdown
Step 1: Token consumption for building the graph
In this step, the document was processed using the GraphRAG solution accelerator. The total number of tokens consumed was then measured once the process completed.
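If you are driving the LLM calls yourself rather than relying on the accelerator's reported totals, consumption can be tallied from the usage metadata returned with each response. A minimal sketch, continuing from the indexing loop sketched earlier (client, chunks, and EXTRACTION_PROMPT as defined there):

```python
# Tally token consumption across all extraction calls using the usage metadata
# returned by the API. Assumes client, chunks, and EXTRACTION_PROMPT from the
# earlier sketch; the model name is illustrative.
total_prompt_tokens = 0
total_completion_tokens = 0

for chunk in chunks:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.replace("{input_text}", chunk)}],
    )
    total_prompt_tokens += response.usage.prompt_tokens
    total_completion_tokens += response.usage.completion_tokens

print(f"prompt tokens: {total_prompt_tokens}, completion tokens: {total_completion_tokens}")
```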
Step 2: Cost calculation for building the graph
Next, the cost of these prompt and completion tokens was calculated for three different models. The GraphRAG paper used GPT-4-Turbo; however, new, lower-cost models have since been released (GPT-4o and GPT-4o-mini).
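The calculation itself is simple: prompt tokens and completion tokens multiplied by each model's per-token price. The prices below are illustrative placeholders; check current pricing before sizing anything.

```python
# Illustrative indexing cost: tokens multiplied by per-million-token prices.
# Prices are placeholders (USD per 1M input / output tokens) and change over time.
PRICES_PER_MILLION_TOKENS = {
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o":      (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def indexing_cost(prompt_tokens: int, completion_tokens: int, model: str) -> float:
    """Return the USD cost of building the graph for a given model."""
    input_price, output_price = PRICES_PER_MILLION_TOKENS[model]
    return (prompt_tokens / 1_000_000) * input_price + (completion_tokens / 1_000_000) * output_price

# Compare the same measured token counts across models (counts here are placeholders):
for model in PRICES_PER_MILLION_TOKENS:
    print(model, round(indexing_cost(prompt_tokens=500_000, completion_tokens=150_000, model=model), 2))
```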
Step 3: Token consumption for querying the graph
There are two ways of querying GraphRAG: a local search, which is narrower and best suited to targeted queries, and a global search, which searches across the entire graph. For clarity, these costs include both the cost of the search and the cost of the LLM call that actually answers the user’s question.
Both queries took around 20-24 seconds. The results can be streamed, although the initial retrieval step, which takes roughly 10-15 seconds, must complete before streaming can begin.
Step 4: Cost calculation for querying the graph
The cost for the two types of queries can then be calculated:
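The query-side arithmetic has the same shape; each query includes a retrieval step and an answer-generation step, both of which consume prompt and completion tokens priced exactly as above. A minimal sketch with placeholder inputs:

```python
# Per-query cost = retrieval-step cost + answer-generation cost, priced per token.
# All token counts and prices passed in are placeholders, not measurements.
def query_cost(retrieval_prompt_tokens: int, retrieval_completion_tokens: int,
               answer_prompt_tokens: int, answer_completion_tokens: int,
               input_price_per_million: float, output_price_per_million: float) -> float:
    prompt_total = retrieval_prompt_tokens + answer_prompt_tokens
    completion_total = retrieval_completion_tokens + answer_completion_tokens
    return (prompt_total / 1_000_000) * input_price_per_million \
         + (completion_total / 1_000_000) * output_price_per_million

# Global searches touch far more of the graph than local searches, so their
# prompt-token counts (and therefore cost per query) are typically much higher.
```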
Step 5: Estimating the cost of other sets of documents
A range of quick references is included to help build an intuition for the general cost of building a graph. Use care when applying these figures: they are intended as a reference only and are no substitute for benchmarking the costs on a small sample of your own documents.
It is important to note:
Proper benchmarking to compare the performance of the different models was not conducted in this work.
Pricing changes frequently; these figures only represent costs at the time of writing.
Dedicated models are anticipated to further reduce costs.
Parameters and window settings can significantly affect costs (by many multiples). These results were averaged across two experiments, but a proper analysis across a variety of use cases and parameter settings would be required to form a true calculation methodology. These results are only included as a very rough starting point, to help build an intuition of GraphRAG costs, and should not be used to size business cases.
The word count is convenient for quick estimates, but for proper sizing analyses, you should convert your word counts to tokens using online tools or code.
The per-word or per-token cost is especially useful, as you can use it as a rough guide when estimating the cost of your own dataset.
For example:
Word count for your set of documents: e.g. 30,000 words
Model chosen: GPT-4o-mini
Cost per word: $0.0000113 (from table above)
Cost: $0.0000113 * 30,000 = $0.34 USD to process your documents.
This could similarly be estimated using the token count of the document: 38,371 tokens * $0.0000088 = $0.34 USD.
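Putting the recipe into code, a small helper like the one below reproduces the example above. The per-word and per-token rates are the GPT-4o-mini figures quoted in this article and will drift as pricing changes; benchmark on a sample of your own documents before relying on them.

```python
# Rough dataset cost estimator using the GPT-4o-mini rates quoted above.
import tiktoken

COST_PER_WORD = 0.0000113    # USD per word (from the table above)
COST_PER_TOKEN = 0.0000088   # USD per token (from the worked example above)

def estimate_by_words(word_count: int) -> float:
    return word_count * COST_PER_WORD

def estimate_by_tokens(text: str) -> float:
    # Encoding choice depends on the model; cl100k_base is used here for illustration.
    tokens = len(tiktoken.get_encoding("cl100k_base").encode(text))
    return tokens * COST_PER_TOKEN

print(f"${estimate_by_words(30_000):.2f}")   # ~$0.34, matching the example above
```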
Value is What Matters
While cost is a critical factor, it is essential to evaluate it against the value that the system delivers. Consider the time it takes for a person to read through thousands of pages of enterprise documents and become intimately familiar with your organization’s processes and procedures. This level of deep knowledge can now be made available to your entire organization, in seconds. Traditionally, building a graph has been an expensive endeavor, often rendering it unfeasible for many organizations. GraphRAG offers a significant cost reduction compared to manually designing and building a graph, allowing organizations to realize value from their data that was previously locked away.
Future Trajectory of GraphRAG Costs
Research is ongoing to identify ways of further reducing the cost of implementing RAG over graph data structures. One key area of research combines traditional NLP techniques with new techniques to develop graphs at a significantly reduced cost, rather than relying solely on LLMs to generate every entity and relationship. A second focus area is training more specialized, smaller language models that are fine-tuned for graph generation which will further reduce costs and improve performance. In this article, GPT-4-Turbo was used with the chunking size set to the default of 1,200. In other work, it has been noted that by reducing the chunking size to 600, GPT-4o-mini is able to achieve similar performance to GPT-4-Turbo, for a fraction of the cost.
Summary
GraphRAG offers a transformative approach to powering RAG applications, enabling organizations to unlock new value from their data. While the costs associated with building a graph are higher than traditional embedding methods, the value delivered can far outweigh these expenses. As technology advances and new models emerge, the costs of GraphRAG are expected to decrease, making it an even more viable solution for organizations looking to get the most out of their data. The GraphRAG solution accelerator makes it easy to get started, benchmark the costs of your own dataset, and start implementing this emerging technology in your own applications!
Thank you to Jonathan Larson, Tim Meyers, and Josh Bradley for their invaluable feedback and review of this article.