GPTCache: Supercharge Generative AI

Introduction
With the increasing use of Generative AI models like GPT-4, developers and businesses face challenges related to latency, cost, and efficiency. GPTCache is a powerful caching library designed to optimize the performance of Large Language Model (LLM) applications by storing and reusing previous responses. This not only reduces redundant API calls but also enhances user experience with faster response times.
In this blog, we’ll explore the capabilities of GPTCache, break down the code required to integrate it into AI applications, and discuss best practices for maximizing efficiency. Whether you're working on chatbots, Retrieval-Augmented Generation (RAG) systems, or other AI-driven applications, this guide will help you unlock the full potential of GPTCache.
Setting Up GPTCache
Before integrating GPTCache into your AI workflow, you need to install the required dependencies. The following command installs GPTCache along with the other packages used in this guide (openai is pinned to 0.28 because GPTCache's adapter targets the legacy, pre-1.0 OpenAI SDK interface):
pip install gptcache onnxruntime openai==0.28 tiktoken
To use the OpenAI API, you need to set up an API key in your environment:
import os
from google.colab import userdata

# Retrieve the key stored in Colab's secrets and expose it as an environment variable.
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
Explanation
- gptcache is the main library used for caching AI responses.
- onnxruntime enables fast execution of machine learning models.
- openai is the official library to interact with OpenAI’s API.
- The OpenAI API key is retrieved from Google Colab’s userdata module and set as an environment variable.
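If you are running outside Colab, a minimal standard-library alternative (a sketch; it assumes the key is either exported in your shell or typed at a prompt) is:

import os
from getpass import getpass

# Read the key from the environment, or prompt for it if it is not set.
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY') or getpass('OpenAI API key: ')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY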
Real-World Use Case: If your application repeatedly receives the same or similar queries, caching responses prevents unnecessary API calls, reducing costs and improving user experience.
OpenAI API Without GPTCache
Let’s first observe the standard OpenAI API call without caching:
import os
import time
import openai

def response_text(openai_resp):
    # Pull the assistant's reply out of the raw API response.
    return openai_resp['choices'][0]['message']['content']

question = "what’s chatgpt"
openai.api_key = OPENAI_API_KEY

start_time = time.time()
response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)
print(f'Question: {question}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
print(f'Answer: {response_text(response)}\n')
Expected Output
Question: what’s chatgpt
Time consuming: 0.87s
Answer: ChatGPT is a chatbot developed by OpenAI...
Analysis: Every time the same question is asked, an API call is made, leading to additional cost and increased latency.
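Since tiktoken is already installed, you can roughly quantify what each repeated call costs in tokens. This is a minimal sketch (the cl100k_base encoding and the reuse of question and response from the snippet above are illustrative assumptions):

import tiktoken

# cl100k_base is the encoding used by many recent OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text):
    # Number of tokens this text would be billed for.
    return len(enc.encode(text))

answer = response_text(response)
print("Prompt tokens:", count_tokens(question))
print("Completion tokens:", count_tokens(answer))

Every cache hit saves the full prompt-plus-completion token bill for that query.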
Implementing GPTCache
To speed up responses, let’s initialize GPTCache:
from gptcache import cache
from gptcache.adapter import openai  # drop-in replacement that consults the cache first

cache.init()
cache.set_openai_key()
print("Cache loading...")
Explanation
- cache.init() initializes the caching system.
- cache.set_openai_key() sets up the OpenAI API key for GPTCache.
- Importing openai from gptcache.adapter swaps in a drop-in wrapper that checks the cache before calling OpenAI's API, so the rest of your code stays unchanged.
Benefit: Once caching is enabled, repeated queries will return instantly without making API requests.
Query Timing with GPTCache
question = "what's github" for _ in range(2): start_time = time.time() response = openai.ChatCompletion.create( model='gpt-4o', messages=[{"role": "user", "content": question}], ) print(f'Question: {question}') print("Time consuming: {:.2f}s".format(time.time() - start_time)) print(f'Answer: {response_text(response)}\n')
Expected Output
Question: what's github
Time consuming: 0.84s
Answer: GitHub is a web-based platform...

Question: what's github
Time consuming: 0.76s
Answer: GitHub is a web-based platform...
Observation: The second call is answered from the cache, so GPTCache returns the stored response without making another API request.
Implementing Semantic Search in GPTCache
To enhance caching capabilities, we use similarity-based search with ONNX and FAISS:
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Embed queries with an ONNX model, store responses in SQLite,
# and index the embedding vectors with FAISS.
onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))

cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()
Explanation
- ONNX optimizes embedding computation.
- FAISS accelerates vector search, making similarity-based caching highly efficient.
- get_data_manager integrates a database (sqlite) and a vector search engine (faiss).
Use Case: If users ask slightly different variations of the same question (e.g., "What is GitHub?", "Tell me about GitHub"), GPTCache retrieves a previously stored response instead of generating a new one.
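To see this in action, here is a minimal sketch built on the setup above; the two phrasings are illustrative, and the second call should be answered from the cache rather than the API:

# Two phrasings of the same question; with the ONNX + FAISS setup
# above, the second call should be served from the cache.
questions = [
    "what is github",
    "can you explain what GitHub is",
]
for q in questions:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": q}],
    )
    print(f'Question: {q}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')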
Exact Match Caching
For applications that require strict matching, GPTCache supports exact match evaluation:
from gptcache.similarity_evaluation.exact_match import ExactMatchEvaluation

# Re-initialize the cache so that only identical queries count as hits.
cache.init(similarity_evaluation=ExactMatchEvaluation())
cache.set_openai_key()

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[{'role': 'user', 'content': 'what is chatgpt'}],
)
print(response)
Benefit
- Ensures that responses are only retrieved from cache if the query exactly matches a previous query.
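As a quick illustration (a sketch; the second query string is hypothetical), repeating the identical question hits the cache, while adding even a single character misses it and triggers a fresh API call:

# Only a byte-for-byte identical query counts as a hit under
# ExactMatchEvaluation; the trailing '?' forces a fresh API call.
for q in ['what is chatgpt', 'what is chatgpt?']:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': q}],
    )
    print(f'Question: {q}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')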
Conclusion
GPTCache is a game-changer for optimizing LLM applications, offering significant reductions in API costs and response times. By combining exact-match caching with semantic search backed by ONNX and FAISS, developers can cut redundant API calls and make AI applications in production noticeably more efficient.
Next Steps
- Experiment with different caching strategies based on your use case.
- Integrate GPTCache into chatbot applications for improved performance.
- Explore hybrid caching techniques combining exact match and similarity search.
Resources and Community
Join our community of 12,000+ AI enthusiasts and learn to build powerful AI applications! Whether you're a beginner or an experienced developer, this guide will help you put caching to work in your own AI projects.
- Website: www.buildfastwithai.com
- LinkedIn: linkedin.com/company/build-fast-with-ai/
- Instagram: instagram.com/buildfastwithai/
- Twitter: x.com/satvikps
- Telegram: t.me/BuildFastWithAI