FlagEmbedding: Enhance AI Retrieval with Advanced Embeddings

February 20, 2025

Introduction

In the era of AI-driven search and retrieval, FlagEmbedding emerges as a powerful open-source project aimed at improving information retrieval and large language model (LLM) augmentation through advanced embeddings. This blog post will guide you through the features, implementation, and practical applications of FlagEmbedding, providing a deep dive into its components and functionalities. By the end of this article, you'll gain a comprehensive understanding of how FlagEmbedding enhances retrieval accuracy, improves ranking, and optimizes language model adaptability.

Key Features of FlagEmbedding

FlagEmbedding offers a suite of robust features tailored for diverse retrieval needs:

  • BGE M3-Embedding 🌍: Supports more than 100 languages, inputs ranging from short sentences to long documents, and dense, sparse, and multi-vector retrieval.
  • Visualized-BGE 🖼️: Fuses text and image embeddings for hybrid retrieval tasks.
  • LM-Cocktail 🍹: Blends fine-tuned and base models to improve adaptability in retrieval scenarios.
  • LLM Embedder 🤖: Optimized for knowledge retrieval, memory augmentation, and tool retrieval.
  • BGE Reranker 🔄: Re-scores the top-k retrieved results against the query to improve ranking accuracy.

Installation

Before diving into implementation, install FlagEmbedding via pip:

pip install -U FlagEmbedding

FlagEmbedding Model Initialization

To begin using FlagEmbedding, initialize the model as follows:

from FlagEmbedding import FlagAutoModel

model = FlagAutoModel.from_finetuned('BAAI/bge-base-en-v1.5',
                                      query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
                                      use_fp16=True)

Explanation

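  • FlagAutoModel.from_finetuned loads a pre-trained BGE model by name, here BAAI/bge-base-en-v1.5, which is optimized for retrieval tasks.
  • query_instruction_for_retrieval is an instruction meant to be placed in front of search queries (not passages), so that short queries are embedded in a way that matches relevant passages.
  • use_fp16=True runs the model in half precision (FP16), which speeds up computation at a slight cost in accuracy.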
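
To make the role of the query instruction concrete, here is a minimal pure-Python sketch of what prefixing queries looks like before they are encoded. It assumes the instruction is simply concatenated in front of each query and that passages are left unchanged. The helper name add_query_instruction is hypothetical and is not part of FlagEmbedding.

```python
# Hypothetical helper, not part of FlagEmbedding: it only illustrates the idea
# of prefixing queries (but not passages) with a retrieval instruction.
INSTRUCTION = "Represent this sentence for searching relevant passages:"


def add_query_instruction(queries, instruction=INSTRUCTION):
    """Return each query with the instruction placed in front of it.

    Assumption: plain concatenation with a single space. The library's
    exact formatting may differ.
    """
    return [f"{instruction} {q}" for q in queries]


queries = ["what is text retrieval?"]
passages = ["Text retrieval finds documents relevant to a query."]

print(add_query_instruction(queries))
print(passages)  # passages are encoded as-is, without the instruction
```

In recent FlagEmbedding releases the query-side and passage-side methods appear to be encode_queries and encode_corpus; check the documentation for your installed version before relying on either.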

Use Case

This is ideal for document retrieval systems, search engines, and LLM augmentation, where users need to match queries with relevant passages efficiently.

Encoding Sentences with FlagEmbedding

Now, let's encode some sentences and generate their embeddings:

sentences_1 = ["I love NLP", "I love machine learning"]
sentences_2 = ["I love BGE", "I love text retrieval"]
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)

Explanation

  • Sentence embeddings are numerical representations that capture semantic meaning.
  • model.encode(sentences) converts textual sentences into high-dimensional vector embeddings.

Computing Sentence Similarity

Once embeddings are generated, compute cosine similarity between sentences:

similarity = embeddings_1 @ embeddings_2.T
print(similarity)

Expected Output

[[0.6538745  0.7568528 ]
 [0.6559792  0.72265273]]

Explanation

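  • The matrix product (@) computes the dot product between every embedding in embeddings_1 and every embedding in embeddings_2, giving a 2×2 matrix. BGE embeddings are normalized to unit length by default, so each dot product equals the cosine similarity.
  • Higher values indicate greater similarity. The entry in row i, column j compares sentences_1[i] with sentences_2[j].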
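
The link between the dot product and cosine similarity can be checked with a small pure-Python sketch. It assumes the model returns L2-normalized embeddings (believed to be the default for BGE models in FlagEmbedding; verify for your version). The two vectors are made-up toy values, not real embeddings.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Toy 3-dimensional vectors, made up for illustration (real BGE embeddings
# have hundreds of dimensions).
u = [1.0, 2.0, 2.0]
v = [2.0, 0.0, 1.0]

# After normalization, the plain dot product matches the cosine similarity.
print(cosine(u, v))
print(dot(l2_normalize(u), l2_normalize(v)))
```

If your embeddings are not normalized, the dot product also reflects vector length, so normalize first or divide by the norms as cosine does here.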

Use Case

This technique is beneficial in recommendation systems, duplicate content detection, and contextual search engines.
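
As a small illustration of how such a matrix is used, the sketch below picks the closest sentence in sentences_2 for each sentence in sentences_1. The numbers are copied from the expected output above; the helper best_matches is a hypothetical name introduced for this example.

```python
# Similarity matrix copied from the expected output above:
# rows follow sentences_1, columns follow sentences_2.
sentences_1 = ["I love NLP", "I love machine learning"]
sentences_2 = ["I love BGE", "I love text retrieval"]
similarity = [
    [0.6538745, 0.7568528],
    [0.6559792, 0.72265273],
]


def best_matches(sim, rows, cols):
    """For each row sentence, return (row sentence, best column sentence, score)."""
    results = []
    for i, scores in enumerate(sim):
        # Index of the highest score in this row.
        j = max(range(len(scores)), key=scores.__getitem__)
        results.append((rows[i], cols[j], scores[j]))
    return results


for source, match, score in best_matches(similarity, sentences_1, sentences_2):
    print(f"{source!r} -> {match!r} ({score:.3f})")
```

For duplicate detection, the same loop could keep only pairs whose score exceeds a threshold; the right threshold depends on the model and data, so it has to be tuned.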

AutoReranker: Enhancing Ranking Accuracy

FlagEmbedding provides FlagAutoReranker, which loads a reranker by model name, for improving search result ranking.

from FlagEmbedding import FlagAutoReranker

reranker = FlagAutoReranker.from_finetuned('BAAI/bge-reranker-large',
                                           query_max_length=256,
                                           passage_max_length=512,
                                           use_fp16=True,
                                           devices=['cuda:0'])

score = reranker.compute_score(['query', 'passage'])
print(score)

Explanation

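  • FlagAutoReranker.from_finetuned loads the bge-reranker-large model by name.
  • query_max_length and passage_max_length cap how many tokens of the query and the passage are kept; longer inputs are truncated.
  • use_fp16=True runs inference in half precision, and devices=['cuda:0'] places the model on the first GPU. Both speed up scoring.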

Expected Output

[-1.513671875]

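This value is a raw, unbounded relevance score for the passage given the query. Higher means more relevant, and what matters in practice is the ordering of scores across candidate passages.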
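
Raw scores like this are easier to read once mapped into the range 0 to 1 with a sigmoid. The sketch below shows that mapping in plain Python. The FlagEmbedding README describes a normalize=True option on compute_score that applies the same function; treat that as something to confirm for your installed version.

```python
import math


def sigmoid(x):
    """Map a raw relevance score (a logit) to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


raw_score = -1.513671875  # the raw score shown in the expected output above
print(round(sigmoid(raw_score), 3))  # roughly 0.18
```

The sigmoid is monotonic, so it changes how scores read without changing the order in which passages are ranked.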

Use Case

This is useful for search engines, chatbots, and knowledge bases, where ranking precision is crucial.
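
In a search pipeline, the reranker usually scores every candidate passage against the query, and the candidates are then sorted by score. Here is a minimal sketch of that step, with the scorer passed in as a function. It assumes the scorer takes a list of [query, passage] pairs and returns one score per pair, which is how reranker.compute_score is used in the FlagEmbedding README. The function rerank and the word-overlap scorer are hypothetical stand-ins so the example runs without a model.

```python
def rerank(query, passages, score_fn, top_k=None):
    """Sort passages by descending relevance to the query.

    score_fn takes a list of [query, passage] pairs and returns one score
    per pair. With FlagEmbedding this could be reranker.compute_score
    (an assumption: that it accepts a list of pairs and returns a list).
    """
    pairs = [[query, p] for p in passages]
    scores = score_fn(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked


def toy_overlap_score(pairs):
    """Stand-in scorer for illustration only: counts shared lowercase words."""
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]


passages = [
    "A panda bear eats bamboo",
    "Paris is the capital of France",
    "The giant panda is a bear species",
]
for passage, score in rerank("what is a panda bear", passages, toy_overlap_score, top_k=2):
    print(score, passage)
```

Keeping the scorer as a parameter means the same sorting code works whether the scores come from a toy function, FlagAutoReranker, or another model.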

Normal Reranker: Standard Ranking Mechanism

To load a reranker class directly instead of going through FlagAutoReranker, use FlagReranker:

from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-v2-m3',
                         query_max_length=256,
                         passage_max_length=512,
                         use_fp16=True,
                         devices=['cuda:0'])

score = reranker.compute_score(['query', 'passage'])
print(score)

Explanation

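  • FlagReranker is the class for BGE's encoder-based rerankers such as bge-reranker-v2-m3. It takes the same length, precision, and device options as above but is constructed directly from the model name rather than through from_finetuned.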

Expected Output

[-5.66015625]

Use Case

Suitable for e-commerce searches, FAQ retrieval, and support chatbots.

LLM Reranker: Layer-wise Re-ranking

For advanced layer-wise ranking, use the LLM Reranker:

from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise',
                                     query_max_length=256,
                                     passage_max_length=512,
                                     use_fp16=True,
                                     devices=['cuda:0'])

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])
print(score)

Explanation

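  • cutoff_layers selects which of the model's layers are used to compute the score. Choosing an earlier layer can speed up inference, usually at some cost in accuracy.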

Expected Output

[-1.375]

Use Case

This is ideal for academic search engines, medical literature retrieval, and legal document ranking.

Conclusion

FlagEmbedding offers flexible, open-source tools for embedding generation and reranking in AI-powered retrieval. Key takeaways:

  • BGE embeddings turn sentences into vectors whose dot product measures semantic similarity, and BGE M3 extends this to multi-lingual, dense, and sparse retrieval.
  • FlagAutoReranker and FlagReranker re-score retrieved passages against the query to improve ranking accuracy.
  • LayerWiseFlagLLMReranker lets you choose which layer produces the score through cutoff_layers.

Whether you’re building a search engine, AI chatbot, or recommendation system, FlagEmbedding is well worth adding to your toolkit.

Resources

  • FlagEmbedding GitHub
  • BAAI Models on Hugging Face
  • BERT for Text Retrieval
  • FlagEmbedding Experiment Notebook
