RAGTools

This is a meta-package that exports the PromptingTools RAGTools sub-module and its key dependencies to simplify the workflow. A simple quality-of-life improvement.

For details on how to use RAGTools see the manual.

RAGTools.AbstractCandidateChunks
RAGTools.AbstractChunkIndex
RAGTools.AbstractDocumentTermMatrix
RAGTools.AbstractGenerator
RAGTools.AbstractIndexBuilder
RAGTools.AbstractMultiIndex
RAGTools.AbstractRetriever
RAGTools.AdvancedGenerator
RAGTools.AdvancedRetriever
RAGTools.AllTagFilter
RAGTools.AnnotatedNode
RAGTools.AnyTagFilter
RAGTools.BM25Similarity
RAGTools.BatchEmbedder
RAGTools.BinaryBatchEmbedder
RAGTools.BinaryCosineSimilarity
RAGTools.BitPackedBatchEmbedder
RAGTools.BitPackedCosineSimilarity
RAGTools.CandidateChunks
RAGTools.ChunkEmbeddingsIndex
RAGTools.ChunkKeywordsIndex
RAGTools.ChunkKeywordsIndex
RAGTools.CohereReranker
RAGTools.ContextEnumerator
RAGTools.CosineSimilarity
RAGTools.DocumentTermMatrix
RAGTools.FileChunker
RAGTools.FlashRanker
RAGTools.HTMLStyler
RAGTools.HyDERephraser
RAGTools.JudgeAllScores
RAGTools.JudgeRating
RAGTools.KeywordsIndexer
RAGTools.KeywordsProcessor
RAGTools.MultiCandidateChunks
RAGTools.MultiFinder
RAGTools.MultiIndex
RAGTools.NoEmbedder
RAGTools.NoPostprocessor
RAGTools.NoProcessor
RAGTools.NoRefiner
RAGTools.NoRephraser
RAGTools.NoReranker
RAGTools.NoTagFilter
RAGTools.NoTagger
RAGTools.OpenTagger
RAGTools.PassthroughTagger
RAGTools.RAGConfig
RAGTools.RAGResult
RAGTools.RankGPTReranker
RAGTools.RankGPTResult
RAGTools.ReciprocalRankFusionReranker
RAGTools.SimpleAnswerer
RAGTools.SimpleBM25Retriever
RAGTools.SimpleGenerator
RAGTools.SimpleIndexer
RAGTools.SimpleRefiner
RAGTools.SimpleRephraser
RAGTools.SimpleRetriever
RAGTools.Styler
RAGTools.SubChunkIndex
RAGTools.SubDocumentTermMatrix
RAGTools.TavilySearchRefiner
RAGTools.TextChunker
RAGTools.TrigramAnnotater
StructTypes.StructType
Base.:==
Base.:==
Base.:==
Base.copy
Base.getindex
Base.hcat
Base.parent
Base.show
Base.view
Base.view
JSON3.read
PromptingTools.last_message
PromptingTools.last_output
PromptingTools.pprint
PromptingTools.pprint
RAGTools.add_node_metadata!
RAGTools.airag
RAGTools.align_node_styles!
RAGTools.annotate_support
RAGTools.annotate_support
RAGTools.answer!
RAGTools.bm25
RAGTools.build_context
RAGTools.build_index
RAGTools.build_index
RAGTools.build_qa_evals
RAGTools.build_tags
RAGTools.build_tags
RAGTools.build_tags
RAGTools.chunkdata
RAGTools.chunkdata
RAGTools.cohere_api
RAGTools.create_permutation_instruction
RAGTools.create_websearch
RAGTools.doc_rel_length
RAGTools.document_term_matrix
RAGTools.extract_ranking
RAGTools.find_closest
RAGTools.find_closest
RAGTools.find_closest
RAGTools.find_closest
RAGTools.find_closest
RAGTools.find_tags
RAGTools.find_tags
RAGTools.find_tags
RAGTools.generate!
RAGTools.get_chunks
RAGTools.get_embeddings
RAGTools.get_embeddings
RAGTools.get_embeddings
RAGTools.get_keywords
RAGTools.get_tags
RAGTools.get_tags
RAGTools.get_tags
RAGTools.getpropertynested
RAGTools.hamming_distance
RAGTools.hcat_truncate
RAGTools.idf
RAGTools.load_text
RAGTools.max_bm25_score
RAGTools.merge_kwargs_nested
RAGTools.pack_bits
RAGTools.permutation_step!
RAGTools.preprocess_tokens
RAGTools.print_html
RAGTools.rank_gpt
RAGTools.rank_sliding_window!
RAGTools.receive_permutation!
RAGTools.reciprocal_rank_fusion
RAGTools.reciprocal_rank_fusion
RAGTools.reciprocal_rank_fusion
RAGTools.refine!
RAGTools.refine!
RAGTools.refine!
RAGTools.rephrase
RAGTools.rephrase
RAGTools.rephrase
RAGTools.rerank
RAGTools.rerank
RAGTools.retrieve
RAGTools.run_qa_evals
RAGTools.run_qa_evals
RAGTools.score_retrieval_hit
RAGTools.score_retrieval_rank
RAGTools.score_to_unit_scale
RAGTools.set_node_style!
RAGTools.setpropertynested
RAGTools.split_into_code_and_sentences
RAGTools.tags_extract
RAGTools.tavily_api
RAGTools.tf
RAGTools.token_with_boundaries
RAGTools.tokenize
RAGTools.translate_positions_to_parent
RAGTools.translate_positions_to_parent
RAGTools.trigram_support!
RAGTools.trigrams
RAGTools.trigrams_hashed
RAGTools.vocab
RAGTools.vocab_lookup
StructTypes.constructfrom
StructTypes.constructfrom

RAGTools.AbstractCandidateChunks — Type

AbstractCandidateChunks

Abstract type for storing candidate chunks, ie, references to items in an AbstractChunkIndex.

Return type from find_closest and find_tags functions.

Required Fields

index_id::Symbol: the id of the index from which the candidates are drawn
positions::Vector{Int}: the positions of the candidates in the index
scores::Vector{Float32}: the similarity scores of the candidates from the query (higher is better)

source

RAGTools.AbstractChunkIndex — Type

AbstractChunkIndex <: AbstractDocumentIndex

Main abstract type for storing document chunks and their embeddings. It also stores tags and sources for each chunk.

Required Fields

id::Symbol: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
chunks::Vector{<:AbstractString}: underlying document chunks / snippets
embeddings::Union{Nothing, Matrix{<:Real}}: for semantic search
tags::Union{Nothing, AbstractMatrix{<:Bool}}: for exact search, filtering, etc. This is often a sparse matrix indicating which chunks have the given tag (see tags_vocab for the position lookup)
tags_vocab::Union{Nothing, Vector{<:AbstractString}}: vocabulary for the tags matrix (each column in tags is one item in tags_vocab and rows are the chunks)
sources::Vector{<:AbstractString}: sources of the chunks
extras::Union{Nothing, AbstractVector}: additional data, eg, metadata, source code, etc.

source

RAGTools.AbstractDocumentTermMatrix — Type

AbstractDocumentTermMatrix

Abstract type for a document term matrix.

source

RAGTools.AbstractGenerator — Type

AbstractGenerator <: AbstractGenerationMethod

Abstract type for generating an answer with generate! (use to change the process / return type of generate!).

Required Fields

contexter::AbstractContextBuilder: the context building method, dispatching build_context!
answerer::AbstractAnswerer: the answer generation method, dispatching answer!
refiner::AbstractRefiner: the answer refining method, dispatching refine!
postprocessor::AbstractPostprocessor: the postprocessing method, dispatching postprocess!

source

RAGTools.AbstractIndexBuilder — Type

AbstractIndexBuilder

Abstract type for building an index with build_index (use to change the process / return type of build_index).

Required Fields

chunker::AbstractChunker: the chunking method, dispatching get_chunks
embedder::AbstractEmbedder: the embedding method, dispatching get_embeddings
tagger::AbstractTagger: the tagging method, dispatching get_tags

source

RAGTools.AbstractMultiIndex — Type

AbstractMultiIndex <: AbstractDocumentIndex

Experimental abstract type for storing multiple document indexes. Not yet implemented.

source

RAGTools.AbstractRetriever — Type

AbstractRetriever <: AbstractRetrievalMethod

Abstract type for retrieving chunks from an index with retrieve (use to change the process / return type of retrieve).

Required Fields

rephraser::AbstractRephraser: the rephrasing method, dispatching rephrase
finder::AbstractSimilarityFinder: the similarity search method, dispatching find_closest
filter::AbstractTagFilter: the tag matching method, dispatching find_tags
reranker::AbstractReranker: the reranking method, dispatching rerank

source

RAGTools.AdvancedGenerator — Type

AdvancedGenerator <: AbstractGenerator

Advanced implementation for generate!. It enumerates context snippets, runs aigenerate, and then refines the answer with the configured refiner.

It uses ContextEnumerator, SimpleAnswerer, SimpleRefiner, and NoPostprocessor as default contexter, answerer, refiner, and postprocessor.

source

RAGTools.AdvancedRetriever — Type

AdvancedRetriever <: AbstractRetriever

Dispatch for retrieve with advanced retrieval methods to improve result quality. Compared to SimpleRetriever, it adds rephrasing the query and reranking the results.

Fields

rephraser::AbstractRephraser: the rephrasing method, dispatching rephrase - uses HyDERephraser
embedder::AbstractEmbedder: the embedding method, dispatching get_embeddings (see Preparation Stage for more details) - uses BatchEmbedder
processor::AbstractProcessor: the processor method, dispatching get_keywords (see Preparation Stage for more details) - uses NoProcessor
finder::AbstractSimilarityFinder: the similarity search method, dispatching find_closest - uses CosineSimilarity
tagger::AbstractTagger: the tag generating method, dispatching get_tags (see Preparation Stage for more details) - uses NoTagger
filter::AbstractTagFilter: the tag matching method, dispatching find_tags - uses NoTagFilter
reranker::AbstractReranker: the reranking method, dispatching rerank - uses CohereReranker

source

RAGTools.AllTagFilter — Type

AllTagFilter <: AbstractTagFilter

Finds the chunks that have ALL OF the specified tag(s). A method for find_tags.

source

RAGTools.AnnotatedNode — Type

AnnotatedNode{T}  <: AbstractAnnotatedNode

A node to add annotations to the generated answer in airag.

Annotations can be: sources, scores, whether it's supported or not by the context, etc.

Fields

group_id::Int: Unique identifier for the same group of nodes (eg, different lines of the same code block)
parent::Union{AnnotatedNode, Nothing}: Parent node that current node was built on
children::Vector{AnnotatedNode}: Children nodes
`score::

source

RAGTools.AnyTagFilter — Type

AnyTagFilter <: AbstractTagFilter

Finds the chunks that have ANY OF the specified tag(s). A method for find_tags.
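
The difference between the ANY OF and ALL OF matching can be sketched on a plain Bool matrix. This is a minimal illustration, not the library's find_tags implementation: the helper name find_tags_sketch and the mode keyword are hypothetical, and the layout (rows are chunks, columns follow tags_vocab) is taken from the tags field description.

```julia
# Rows are chunks, columns are tags (same order as `tags_vocab`).
tags = Bool[1 0 1; 0 1 0; 1 1 0]
tags_vocab = ["julia", "python", "threads"]

# Hypothetical helper: return positions of chunks matching the wanted tags.
function find_tags_sketch(tags, tags_vocab, wanted; mode = :any)
    # Columns of the tags we are looking for
    cols = findall(in(wanted), tags_vocab)
    isempty(cols) && return Int[]
    # `any` keeps a chunk with at least one tag, `all` requires every tag
    reducer = mode == :all ? all : any
    return [i for i in axes(tags, 1) if reducer(tags[i, cols])]
end

any_hits = find_tags_sketch(tags, tags_vocab, ["julia", "python"]; mode = :any)
all_hits = find_tags_sketch(tags, tags_vocab, ["julia", "python"]; mode = :all)
```

Here every chunk carries at least one of the two tags, but only the third chunk carries both.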

source

RAGTools.BM25Similarity — Type

BM25Similarity <: AbstractSimilarityFinder

Finds the closest chunks to a query by measuring the BM25 similarity between the query keywords and the chunks' keywords. A method for find_closest.

Reference: Wikipedia: BM25. Implementation follows: The Next Generation of Lucene Relevance.

Fields mimic the arguments of bm25.

Fields

k1: The k1 parameter for BM25. Default is 1.2.
b: The b parameter for BM25. Default is 0.75.
normalize: Whether to normalize the scores. Default is false.
normalize_max_tf: The maximum term frequency to normalize to. Default is 3.
normalize_min_doc_rel_length: The minimum document relative length to normalize to. Default is 1.0.
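
To see how k1 and b enter the score, here is a minimal sketch of textbook BM25 over pre-tokenized documents. It is not the bm25 function from this package: the helper name bm25_sketch is hypothetical, the IDF term is assumed to be the Lucene-style variant, and the normalization options are left out.

```julia
# Textbook BM25 (illustrative only; not the RAGTools implementation).
function bm25_sketch(docs::Vector{Vector{String}}, query::Vector{String};
        k1::Float64 = 1.2, b::Float64 = 0.75)
    n_docs = length(docs)
    avg_len = sum(length, docs) / n_docs
    scores = zeros(Float64, n_docs)
    for term in unique(query)
        # Number of documents that contain the term
        df = count(doc -> term in doc, docs)
        df == 0 && continue
        # Lucene-style IDF (assumption), always positive
        idf = log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for (i, doc) in enumerate(docs)
            tf = count(==(term), doc)
            # Document length relative to the average length
            rel_len = length(doc) / avg_len
            # k1 caps the gain from repeated terms, b controls length normalization
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * rel_len))
        end
    end
    return scores
end

docs = [["julia", "is", "fast"], ["python", "is", "popular"], ["julia", "julia", "threads"]]
scores = bm25_sketch(docs, ["julia", "threads"])
```

The second document shares no terms with the query and scores zero; the third one ranks first because it matches both query terms.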

source

RAGTools.BatchEmbedder — Type

BatchEmbedder <: AbstractEmbedder

Default embedder for get_embeddings functions. It splits the documents into batches and passes each batch to aiembed.

source

RAGTools.BinaryBatchEmbedder — Type

BinaryBatchEmbedder <: AbstractEmbedder

Same as BatchEmbedder but reduces the embeddings matrix to a binary form (eg, BitMatrix). Defines a method for get_embeddings.

Reference: HuggingFace: Embedding Quantization.

source

RAGTools.BinaryCosineSimilarity — Type

BinaryCosineSimilarity <: AbstractSimilarityFinder

Finds the closest chunks to a query embedding by measuring the Hamming distance AND cosine similarity between the query and the chunks' embeddings in binary form. A method for find_closest.

It follows the two-pass approach:

First pass: Hamming distance in binary form to get the top_k * rescore_multiplier (ie, more than top_k) candidates.
Second pass: Rescore the candidates with float embeddings and return the top_k.
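
The two passes above can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's find_closest method: the helper name two_pass_sketch is hypothetical, embeddings are assumed to be stored as columns, and binarization is assumed to be a simple `> 0` threshold.

```julia
using LinearAlgebra: dot, norm

# Illustrative two-pass search; columns of `emb` are assumed to be chunk embeddings.
function two_pass_sketch(emb::Matrix{Float64}, query::Vector{Float64};
        top_k::Int = 1, rescore_multiplier::Int = 2)
    # Binarize: positive values become true (assumed threshold)
    emb_bin = emb .> 0
    query_bin = query .> 0
    # First pass: Hamming distance (number of differing bits) for each chunk
    hamming = [count(emb_bin[:, j] .!= query_bin) for j in axes(emb, 2)]
    n_first = min(top_k * rescore_multiplier, size(emb, 2))
    shortlist = sortperm(hamming)[1:n_first]
    # Second pass: cosine similarity on the float embeddings of the shortlist
    sims = [dot(emb[:, j], query) / (norm(emb[:, j]) * norm(query)) for j in shortlist]
    order = sortperm(sims; rev = true)[1:min(top_k, n_first)]
    return shortlist[order], sims[order]
end

emb = [1.0 -1.0 0.9 -0.2;
       0.5  0.8 0.4 -0.9;
      -0.3  0.6 -0.1 0.7]
query = [1.0, 0.4, -0.2]
positions, sims = two_pass_sketch(emb, query; top_k = 1)
```

Chunks 1 and 3 are tied on Hamming distance in the first pass, so only the float rescoring in the second pass can separate them.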

Reference: HuggingFace: Embedding Quantization.

source

RAGTools.BitPackedBatchEmbedder — Type

BitPackedBatchEmbedder <: AbstractEmbedder

Same as BatchEmbedder but reduces the embeddings matrix to a binary form packed in UInt64 (eg, BitMatrix.chunks). Defines a method for get_embeddings.

See also utilities pack_bits and unpack_bits to move between packed/non-packed binary forms.

Reference: HuggingFace: Embedding Quantization.

source

RAGTools.BitPackedCosineSimilarity — Type

BitPackedCosineSimilarity <: AbstractSimilarityFinder

Finds the closest chunks to a query embedding by measuring the Hamming distance AND cosine similarity between the query and the chunks' embeddings in binary form. A method for find_closest.

The difference to BinaryCosineSimilarity is that the binary values are packed into UInt64, which is more efficient.
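
A rough sketch of why packing helps: once 64 binary values share one UInt64 word, the Hamming distance reduces to an xor and a bit count per word. The helpers below (pack_bits_sketch, hamming_sketch) are hypothetical stand-ins written for illustration and may differ from the package's pack_bits and hamming_distance in bit order and layout.

```julia
# Pack a Bool vector into UInt64 words, 64 bits per word (illustrative layout).
function pack_bits_sketch(bits::AbstractVector{Bool})
    words = zeros(UInt64, cld(length(bits), 64))
    for (i, bit) in enumerate(bits)
        if bit
            # Zero-based word index and bit offset within that word
            word, offset = divrem(i - 1, 64)
            words[word + 1] |= UInt64(1) << offset
        end
    end
    return words
end

# Hamming distance: xor marks differing bits, count_ones counts them.
hamming_sketch(a, b) = sum(count_ones(xor(x, y)) for (x, y) in zip(a, b))

a = pack_bits_sketch(Bool[1, 0, 1, 1])
b = pack_bits_sketch(Bool[1, 1, 0, 1])
dist = hamming_sketch(a, b)
```

The two vectors differ in the second and third positions, so the distance is 2.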

Reference: HuggingFace: Embedding Quantization. Implementation of hamming_distance is based on TinyRAG.

source

RAGTools.CandidateChunks — Type

CandidateChunks

A struct for storing references to chunks in the given index (identified by index_id): positions holds the references and scores holds the strength of similarity (1 is the highest, ie, most similar). It's the result of the retrieval stage of RAG.

Fields

index_id::Symbol: the id of the index from which the candidates are drawn
positions::Vector{Int}: the positions of the candidates in the index (ie, 5 refers to the 5th chunk in the index - chunks(index)[5])

scores::Vector{Float32}: the similarity scores of the candidates from the query (higher is better)

source

RAGTools.ChunkEmbeddingsIndex — Type

ChunkEmbeddingsIndex

Main struct for storing document chunks and their embeddings. It also stores tags and sources for each chunk.

Previously, this struct was called ChunkIndex.

Fields

id::Symbol: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
chunks::Vector{<:AbstractString}: underlying document chunks / snippets
embeddings::Union{Nothing, Matrix{<:Real}}: for semantic search
tags::Union{Nothing, AbstractMatrix{<:Bool}}: for exact search, filtering, etc. This is often a sparse matrix indicating which chunks have the given tag (see tags_vocab for the position lookup)
tags_vocab::Union{Nothing, Vector{<:AbstractString}}: vocabulary for the tags matrix (each column in tags is one item in tags_vocab and rows are the chunks)
sources::Vector{<:AbstractString}: sources of the chunks
extras::Union{Nothing, AbstractVector}: additional data, eg, metadata, source code, etc.

source

RAGTools.ChunkKeywordsIndex — Type

ChunkKeywordsIndex

Struct for storing chunks of text and associated keywords for BM25 similarity search.

Fields

id::Symbol: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
chunks::Vector{<:AbstractString}: underlying document chunks / snippets
chunkdata::Union{Nothing, AbstractMatrix{<:Real}}: for similarity search, assumed to be DocumentTermMatrix
tags::Union{Nothing, AbstractMatrix{<:Bool}}: for exact search, filtering, etc. This is often a sparse matrix indicating which chunks have the given tag (see tags_vocab for the position lookup)
tags_vocab::Union{Nothing, Vector{<:AbstractString}}: vocabulary for the tags matrix (each column in tags is one item in tags_vocab and rows are the chunks)
sources::Vector{<:AbstractString}: sources of the chunks
extras::Union{Nothing, AbstractVector}: additional data, eg, metadata, source code, etc.

Example

We can easily create a keywords-based index from a standard embeddings-based index.


# Let's assume we have a standard embeddings-based index
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; max_length=10))

# Creating an additional index for keyword-based search (BM25), is as simple as
index_keywords = ChunkKeywordsIndex(index)

# We can immediately create a MultiIndex (a hybrid index holding both indices)
multi_index = MultiIndex([index, index_keywords])

You can also build the index via build_index

# given some sentences and sources
index_keywords = build_index(KeywordsIndexer(), sentences; chunker_kwargs=(; sources))

# Retrieve closest chunks with
retriever = SimpleBM25Retriever()
result = retrieve(retriever, index_keywords, "What are the best practices for parallel computing in Julia?")
result.context

If you want to use airag, don't forget to specify the config to make sure keywords are processed (ie, tokenized) and that BM25 is used for searching candidates

cfg = RAGConfig(; retriever = SimpleBM25Retriever());
airag(cfg, index_keywords;
	question = "What are the best practices for parallel computing in Julia?")

source

RAGTools.ChunkKeywordsIndex — Method

ChunkKeywordsIndex(
	[processor::AbstractProcessor=KeywordsProcessor(),] index::ChunkEmbeddingsIndex; verbose::Int = 1,
	index_id = gensym("ChunkKeywordsIndex"), processor_kwargs...)

Convenience method to quickly create a ChunkKeywordsIndex from an existing ChunkEmbeddingsIndex.

Example


# Let's assume we have a standard embeddings-based index
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; max_length=10))

# Creating an additional index for keyword-based search (BM25), is as simple as
index_keywords = ChunkKeywordsIndex(index)

# We can immediately create a MultiIndex (a hybrid index holding both indices)
multi_index = MultiIndex([index, index_keywords])

source

RAGTools.CohereReranker — Type

CohereReranker <: AbstractReranker

Rerank strategy using the Cohere Rerank API. Requires an API key. A method for rerank.

source

RAGTools.ContextEnumerator — Type

ContextEnumerator <: AbstractContextBuilder

Default method for build_context!. It simply enumerates the context snippets around each position in candidates. When possible, it will add surrounding chunks (from the same source).

source

RAGTools.CosineSimilarity — Type

CosineSimilarity <: AbstractSimilarityFinder

Finds the closest chunks to a query embedding by measuring the cosine similarity between the query and the chunks' embeddings. A method for find_closest (see the docstring for more details and usage example).

source

RAGTools.DocumentTermMatrix — Type

DocumentTermMatrix{T<:AbstractString}

A sparse matrix of term frequencies and document lengths to allow calculation of BM25 similarity scores.
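
As a rough picture of what such a structure holds, the sketch below builds a sparse term-frequency matrix and relative document lengths from tokenized documents. It is an illustration only: dtm_sketch is a hypothetical helper, and the returned names (tf, vocab, doc_rel_length) are assumed to mirror the accessors listed on this page rather than the actual struct layout.

```julia
using SparseArrays

# Illustrative document-term matrix: rows are documents, columns are vocabulary terms.
function dtm_sketch(docs::Vector{Vector{String}})
    vocab = sort(unique(reduce(vcat, docs)))
    lookup = Dict(term => j for (j, term) in enumerate(vocab))
    rows = Int[]; cols = Int[]; vals = Float32[]
    for (i, doc) in enumerate(docs), term in doc
        push!(rows, i); push!(cols, lookup[term]); push!(vals, 1.0f0)
    end
    # `sparse` sums duplicate (row, col) entries, which yields term counts
    tf = sparse(rows, cols, vals, length(docs), length(vocab))
    # Document length relative to the average document length
    doc_rel_length = length.(docs) ./ (sum(length, docs) / length(docs))
    return (; tf, vocab, doc_rel_length)
end

dtm = dtm_sketch([["a", "b", "a"], ["b", "c"]])
```

The term counts and relative lengths are exactly the inputs a BM25 score needs, which is why they are stored together.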

source

RAGTools.FileChunker — Type

FileChunker <: AbstractChunker

Chunker when you provide file paths to get_chunks functions.

Ie, the inputs will be validated first (eg, file exists, etc) and then read into memory.

Set as default chunker in get_chunks functions.

source

RAGTools.FlashRanker — Type

FlashRanker <: AbstractReranker

Rerank strategy using the package FlashRank.jl and local models. A method for rerank.

You must first import the FlashRank.jl package. To automatically download any required models, set your ENV["DATADEPS_ALWAYS_ACCEPT"] = true (see DataDeps for more details).

Example

using FlashRank

# Wrap the model to be a valid Ranker recognized by RAGTools
# It will be provided to the airag/rerank function to avoid instantiating it on every call
reranker = FlashRank.RankerModel(:mini) |> FlashRanker
# You can choose :tiny or :mini

## Apply to the pipeline configuration, eg, 
cfg = RAGConfig(; retriever = AdvancedRetriever(; reranker))

# Ask a question (assumes you have some `index`)
question = "What are the best practices for parallel computing in Julia?"
result = airag(cfg, index; question, return_all = true)

source

RAGTools.HTMLStyler — Type

HTMLStyler

Defines styling via classes (attribute class) and styles (attribute style) for HTML formatting of AbstractAnnotatedNode

source

RAGTools.HyDERephraser — Type

HyDERephraser <: AbstractRephraser

Rephraser implemented using the provided AI Template (eg, ...) and standard chat model. A method for rephrase.

It uses a prompt-based rephrasing method called HyDE (Hypothetical Document Embedding), where instead of looking for an embedding of the question, we look for the documents most similar to a synthetic passage that would be a good answer to our question.

Reference: Arxiv paper.

source

RAGTools.JudgeAllScores — Type

final_rating is the average of all scoring criteria. Explain the final_rating in rationale

source

RAGTools.JudgeRating — Type

Provide the final_rating between 1-5. Provide the rationale for it.

source

RAGTools.KeywordsIndexer — Type

KeywordsIndexer <: AbstractIndexBuilder

Keyword-based index (BM25) to be returned by build_index.

It uses TextChunker, KeywordsProcessor, and NoTagger as default chunker, processor, and tagger.

source

RAGTools.KeywordsProcessor — Type

KeywordsProcessor <: AbstractProcessor

Default keywords processor for get_keywords functions. It normalizes the documents, tokenizes them and builds a DocumentTermMatrix.

source

RAGTools.MultiCandidateChunks — Type

MultiCandidateChunks

A struct for storing references to multiple sets of chunks across different indices. Each set of chunks is identified by an index_id in index_ids, with corresponding positions in the index and scores indicating the strength of similarity.

This struct is useful for scenarios where candidates are drawn from multiple indices, and there is a need to keep track of which candidates came from which index.

Fields

index_ids::Vector{Symbol}: the ids of the indices from which the candidates are drawn
positions::Vector{TP}: the positions of the candidates in their respective indices
scores::Vector{TD}: the similarity scores of the candidates from the query

source

RAGTools.MultiFinder — Type

MultiFinder <: AbstractSimilarityFinder

Composite finder for MultiIndex where we want to set a separate finder for each index. A method for find_closest. Positions correspond to indexes(::MultiIndex).

source

RAGTools.MultiIndex — Type

MultiIndex

Composite index that stores multiple ChunkIndex objects and their embeddings.

Fields

id::Symbol: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
indexes::Vector{<:AbstractChunkIndex}: the indexes to be combined

Use the accessor indexes to access the individual indexes.

Examples

We can create a MultiIndex from a vector of AbstractChunkIndex objects.

index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; sources))
index_keywords = ChunkKeywordsIndex(index) # same chunks as above but adds BM25 instead of embeddings

multi_index = MultiIndex([index, index_keywords])

To use airag with different types of indices, we need to specify how to find the closest items for each index

# `RT` below is assumed to be an alias for the RAGTools module (eg, `const RT = RAGTools`)
# Cosine similarity for embeddings and BM25 for keywords, same order as indexes in MultiIndex
finder = RT.MultiFinder([RT.CosineSimilarity(), RT.BM25Similarity()])

# Notice that we add `processor` to make sure keywords are processed (ie, tokenized) as well
cfg = RAGConfig(; retriever = SimpleRetriever(; processor = RT.KeywordsProcessor(), finder))

# Ask questions
msg = airag(cfg, multi_index; question = "What are the best practices for parallel computing in Julia?")
pprint(msg) # prettify the answer

source

RAGTools.NoEmbedder — Type

NoEmbedder <: AbstractEmbedder

No-op embedder for get_embeddings functions. It returns nothing.

source

RAGTools.NoPostprocessor — Type

NoPostprocessor <: AbstractPostprocessor

Default method for postprocess! method. A passthrough option that returns the result without any changes.

Overload this method to add custom postprocessing steps, eg, logging, saving conversations to disk, etc.

source

RAGTools.NoProcessor — Type

NoProcessor <: AbstractProcessor

No-op processor for get_keywords functions. It returns the inputs as is.

source

RAGTools.NoRefiner — Type

NoRefiner <: AbstractRefiner

Default method for refine! method. A passthrough option that returns the result.answer without any changes.

source

RAGTools.NoRephraser — Type

NoRephraser <: AbstractRephraser

No-op implementation for rephrase, which simply passes the question through.

source

RAGTools.NoReranker — Type

NoReranker <: AbstractReranker

No-op implementation for rerank, which simply passes the candidate chunks through.

source

RAGTools.NoTagFilter — Type

NoTagFilter <: AbstractTagFilter

No-op implementation for find_tags, which simply returns all chunks.

source

RAGTools.NoTagger — Type

NoTagger <: AbstractTagger

No-op tagger for get_tags functions. It returns (nothing, nothing).

source

RAGTools.OpenTagger — Type

OpenTagger <: AbstractTagger

Tagger for get_tags functions, which generates possible tags for each chunk via aiextract. You can customize it via prompt template (default: :RAGExtractMetadataShort), but it's quite open-ended (ie, AI decides the possible tags).

source

RAGTools.PassthroughTagger — Type

PassthroughTagger <: AbstractTagger

Tagger for get_tags functions, which passes tags directly as Vector of Vectors of strings (ie, tags[i] is the tags for docs[i]).

source

RAGTools.RAGConfig — Type

RAGConfig <: AbstractRAGConfig

Default configuration for RAG. It uses SimpleIndexer, SimpleRetriever, and SimpleGenerator as default components. Provided as the first argument in airag.

To customize the components, replace corresponding fields for each step of the RAG pipeline (eg, use subtypes(AbstractIndexBuilder) to find the available options).

source

RAGTools.RAGResult — Type

RAGResult

A struct for debugging RAG answers. It contains the question, answer, context, and the candidate chunks at each step of the RAG pipeline.

Think of the flow as question -> rephrased_questions -> answer -> final_answer with the context and candidate chunks helping along the way.

Fields

question::AbstractString: the original question
rephrased_questions::Vector{<:AbstractString}: a vector of rephrased questions (eg, HyDE, Multihop, etc.)
answer::AbstractString: the generated answer
final_answer::AbstractString: the refined final answer (eg, after CorrectiveRAG), also considered the FINAL answer (it must be always available)

context::Vector{<:AbstractString}: the context used for retrieval (ie, the vector of chunks and their surrounding window if applicable)

sources::Vector{<:AbstractString}: the sources of the context (for the original matched chunks)
emb_candidates::CandidateChunks: the candidate chunks from the embedding index (from find_closest)
tag_candidates::Union{Nothing, CandidateChunks}: the candidate chunks from the tag index (from find_tags)
filtered_candidates::CandidateChunks: the filtered candidate chunks (intersection of emb_candidates and tag_candidates)

reranked_candidates::CandidateChunks: the reranked candidate chunks (from rerank)
conversations::Dict{Symbol,Vector{<:AbstractMessage}}: the conversation history for AI steps of the RAG pipeline, use keys that correspond to the function names, eg, :answer or :refine

See also: pprint (pretty printing), annotate_support (for annotating the answer)

source

RAGTools.RankGPTReranker — Type

RankGPTReranker <: AbstractReranker

Rerank strategy using the RankGPT algorithm (calling LLMs). A method for rerank.

Reference

[1] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents by W. Sun et al.
[2] RankGPT GitHub

source

RAGTools.RankGPTResult — Type

RankGPTResult

Results from the RankGPT algorithm.

Fields

question::String: The question that was asked.
chunks::AbstractVector{T}: The chunks that were ranked (=context).
positions::Vector{Int}: The ranking of the chunks (referring to the chunks).
elapsed::Float64: The time it took to rank the chunks.
cost::Float64: The cumulative cost of the ranking.
tokens::Int: The cumulative number of tokens used in the ranking.

source

RAGTools.ReciprocalRankFusionReranker — Type

ReciprocalRankFusionReranker <: AbstractReranker

Rerank strategy using the reciprocal rank fusion algorithm for simple cases with embeddings and keywords indices referring to the same chunks. A dispatch type for rerank.

!! To be used with MultiIndex that contains embeddings and keywords indices referring to the same chunks.
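
For intuition, reciprocal rank fusion can be sketched in a few lines: each ranking contributes 1 / (k + rank) for every chunk it contains, and the contributions are summed per chunk. This is a generic sketch rather than the package's reciprocal_rank_fusion: the helper name rrf_sketch is hypothetical and k = 60 is the constant commonly used for this algorithm, assumed here.

```julia
# Generic reciprocal rank fusion over several rankings of chunk positions.
function rrf_sketch(rankings::Vector{Vector{Int}}; k::Int = 60)
    scores = Dict{Int, Float64}()
    for ranking in rankings
        for (rank, pos) in enumerate(ranking)
            # Chunks ranked highly in any list get a larger contribution
            scores[pos] = get(scores, pos, 0.0) + 1 / (k + rank)
        end
    end
    # Sort chunk positions by fused score, best first
    fused = sort(collect(keys(scores)); by = pos -> scores[pos], rev = true)
    return fused, scores
end

embedding_ranking = [3, 1, 2]   # eg, from an embeddings index
keyword_ranking = [1, 4, 3]     # eg, from a keywords index
fused, scores = rrf_sketch([embedding_ranking, keyword_ranking])
```

Chunk 1 comes out on top because it ranks well in both lists, even though it is not first in the embeddings ranking. The positions must refer to the same chunks in both lists for the sum to be meaningful.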

source

RAGTools.SimpleAnswerer — Type

SimpleAnswerer <: AbstractAnswerer

Default method for answer! method. Generates an answer using the aigenerate function with the provided context and question.

source

RAGTools.SimpleBM25Retriever — Type

SimpleBM25Retriever <: AbstractRetriever

Keyword-based implementation for retrieve. It does a simple similarity search via BM25Similarity and returns the results.

Make sure to use consistent processor and tagger with the Preparation Stage (build_index)!

Fields

rephraser::AbstractRephraser: the rephrasing method, dispatching rephrase - uses NoRephraser
embedder::AbstractEmbedder: the embedding method, dispatching get_embeddings (see Preparation Stage for more details) - uses NoEmbedder
processor::AbstractProcessor: the processor method, dispatching get_keywords (see Preparation Stage for more details) - uses KeywordsProcessor
finder::AbstractSimilarityFinder: the similarity search method, dispatching find_closest - uses BM25Similarity
tagger::AbstractTagger: the tag generating method, dispatching get_tags (see Preparation Stage for more details) - uses NoTagger
filter::AbstractTagFilter: the tag matching method, dispatching find_tags - uses NoTagFilter
reranker::AbstractReranker: the reranking method, dispatching rerank - uses NoReranker

source

RAGTools.SimpleGenerator — Type

SimpleGenerator <: AbstractGenerator

Default implementation for generate!. It simply enumerates context snippets and runs aigenerate (no refinement).

It uses ContextEnumerator, SimpleAnswerer, NoRefiner, and NoPostprocessor as default contexter, answerer, refiner, and postprocessor.

source

RAGTools.SimpleIndexer — Type

SimpleIndexer <: AbstractIndexBuilder

Default implementation for build_index.

It uses TextChunker, BatchEmbedder, and NoTagger as default chunker, embedder, and tagger.

source

RAGTools.SimpleRefiner — Type

SimpleRefiner <: AbstractRefiner

Refines the answer using the same context previously provided via the provided prompt template. A method for refine!.

source

RAGTools.SimpleRephraser — Type

SimpleRephraser <: AbstractRephraser

Rephraser implemented using the provided AI Template (eg, ...) and standard chat model. A method for rephrase.

source

RAGTools.SimpleRetriever — Type

SimpleRetriever <: AbstractRetriever

Default implementation for retrieve function. It does a simple similarity search via CosineSimilarity and returns the results.

Make sure to use consistent embedder and tagger with the Preparation Stage (build_index)!

Fields

rephraser::AbstractRephraser: the rephrasing method, dispatching rephrase - uses NoRephraser
embedder::AbstractEmbedder: the embedding method, dispatching get_embeddings (see Preparation Stage for more details) - uses BatchEmbedder
processor::AbstractProcessor: the processor method, dispatching get_keywords (see Preparation Stage for more details) - uses NoProcessor
finder::AbstractSimilarityFinder: the similarity search method, dispatching find_closest - uses CosineSimilarity
tagger::AbstractTagger: the tag generating method, dispatching get_tags (see Preparation Stage for more details) - uses NoTagger
filter::AbstractTagFilter: the tag matching method, dispatching find_tags - uses NoTagFilter
reranker::AbstractReranker: the reranking method, dispatching rerank - uses NoReranker

source

RAGTools.Styler — Type

Styler

Defines styling keywords for printstyled for each AbstractAnnotatedNode

source

RAGTools.SubChunkIndex — Type

SubChunkIndex

A view of the parent index with respect to the chunks (and chunk-aligned fields). All methods and accessors working for AbstractChunkIndex also work for SubChunkIndex. It does not yet work for MultiIndex.

Fields

parent::AbstractChunkIndex: the parent index from which the chunks are drawn (always the original index, never a view)
positions::Vector{Int}: the positions of the chunks in the parent index (always refers to original PARENT index, even if we create a view of the view)

Example

cc = CandidateChunks(index.id, 1:10)
sub_index = @view(index[cc])

You can use SubChunkIndex to access chunks or sources (and other fields) from a parent index, eg,

RT.chunks(sub_index)
RT.sources(sub_index)
RT.chunkdata(sub_index) # slice of embeddings
RT.embeddings(sub_index) # slice of embeddings
RT.tags(sub_index) # slice of tags
RT.tags_vocab(sub_index) # unchanged, identical to parent version
RT.extras(sub_index) # slice of extras

Access the parent index that the positions correspond to

parent(sub_index)
RT.positions(sub_index)
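
The rule that positions always refer to the original parent, even for a view of a view, can be illustrated with a toy structure. This is a simplified sketch, not the actual SubChunkIndex: the type SubIndexSketch and its helpers are hypothetical, and the parent is reduced to a plain vector of chunks.

```julia
# Toy view: stores the original parent and positions into that parent.
struct SubIndexSketch
    parent::Vector{String}
    positions::Vector{Int}
end

# Chunks visible through the view
chunks_sketch(s::SubIndexSketch) = s.parent[s.positions]

# A view of a view translates local positions back to parent positions
subview_sketch(s::SubIndexSketch, idx) = SubIndexSketch(s.parent, s.positions[idx])

parent_chunks = ["a", "b", "c", "d", "e"]
v1 = SubIndexSketch(parent_chunks, [2, 4, 5])
v2 = subview_sketch(v1, [1, 3])   # first and third item of v1
```

Here v2 stores positions [2, 5], ie, positions in the parent, not [1, 3] relative to v1.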

source

RAGTools.SubDocumentTermMatrix — Type

SubDocumentTermMatrix

A partial view of a DocumentTermMatrix; tf is MATERIALIZED for performance and fewer allocations.

source

RAGTools.TavilySearchRefiner — Type

TavilySearchRefiner <: AbstractRefiner

Refines the answer by executing a web search using the Tavily API. This method aims to enhance the answer's accuracy and relevance by incorporating information retrieved from the web. A method for refine!.

source

RAGTools.TextChunker — Type

TextChunker <: AbstractChunker

Chunker to use when you provide text directly to the get_chunks function. Inputs are chunked directly.

source

RAGTools.TrigramAnnotater — Type

TrigramAnnotater

Annotation method where we score answer versus each context based on word-level trigrams that match.

It's a very simple method (and it can lose some semantic meaning in longer sequences, eg, negation), but it works reasonably well for both text and code.

source

StructTypes.StructType — Method

StructTypes.StructType(::Type{RAGResult})

source

Base.:== — Method

Base.var"=="(i1::MultiIndex, i2::MultiIndex)

Check that each index has a counterpart in the other MultiIndex.

source

Base.:== — Method

Base.var"=="(dtm1::AbstractDocumentTermMatrix, dtm2::AbstractDocumentTermMatrix) = false

Check if two AbstractDocumentTermMatrix objects are equal.

source

Base.:== — Method

Base.var"=="(r1::T, r2::T) where {T <: AbstractRAGResult}

Two RAGResult objects are equal if all their fields are equal.

source

Base.copy — Method

Base.copy(r::T) where {T <: AbstractRAGResult}

Copy a RAGResult object by deep copying all its fields.

source

Base.getindex — Method

Base.getindex

Get the field of a candidate chunk from an index.

source

Base.hcat — Method

Base.hcat(d1::AbstractDocumentTermMatrix, d2::AbstractDocumentTermMatrix)

Concatenate two AbstractDocumentTermMatrix objects horizontally.

source

Base.parent — Method

Base.parent(dtm::AbstractDocumentTermMatrix)

The parent of an AbstractDocumentTermMatrix is itself.

source

Base.show — Method

Base.show(io::IO,
	t::Union{AbstractDocumentIndex, AbstractCandidateChunks, AbstractRAGResult})

Structured show method for easier reading (each kwarg on a new line)

source

Base.view — Method

Base.view

source

Base.view — Method

Base.view(dtm::SubDocumentTermMatrix, doc_idx::AbstractVector{<:Integer}, token_idx::Colon)

Create a view of a SubDocumentTermMatrix for a specific document index and all tokens.

source

JSON3.read — Method

JSON3.read(path::AbstractString, ::Type{RAGResult})

Read a RAGResult object from a JSON file.

source

PromptingTools.last_message — Method

PT.last_message(result::RAGResult)

Extract the last message from the RAGResult for consistency with AICall / Message vectors. It looks for final_answer first, then answer fields in the conversations dictionary. Returns nothing if not found.
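
The lookup order can be sketched as follows. This is only an illustration under assumptions: `last_message_sketch` is a hypothetical helper, and the `conversations` field is modeled as a plain Dict from a step name to a vector of messages (strings here, for simplicity).

```julia
# Hypothetical stand-in for the lookup described above (NOT the RAGTools method)
function last_message_sketch(conversations::Dict{Symbol, <:AbstractVector})
    for key in (:final_answer, :answer)   # preferred key first
        msgs = get(conversations, key, nothing)
        if msgs !== nothing && !isempty(msgs)
            return last(msgs)
        end
    end
    return nothing   # neither key was found
end

only_answer = Dict{Symbol, Vector{String}}(:answer => ["question", "draft answer"])
refined = Dict{Symbol, Vector{String}}(
    :answer => ["question", "draft answer"],
    :final_answer => ["draft answer", "refined answer"])
```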

source

PromptingTools.last_output — Method

PT.last_output(result::RAGResult)

Extracts the last output (generated text answer) from the RAGResult for consistency with AICall / Message vectors.

See also: PT.last_message

source

PromptingTools.pprint — Method

PromptingTools.pprint(
	io::IO, node::AbstractAnnotatedNode;
	text_width::Int = displaysize(io)[2], add_newline::Bool = true)

Pretty print the node to the io stream, including all its children

Supports only node.style::Styler for now.

source

PromptingTools.pprint — Method

PT.pprint(
	io::IO, r::AbstractRAGResult; add_context::Bool = false,
	text_width::Int = displaysize(io)[2], annotater_kwargs...)

Pretty print the RAG result r to the given io stream.

If add_context is true, the context will be printed as well. The text_width parameter can be used to control the width of the output.

You can provide additional keyword arguments to the annotater, eg, add_sources, add_scores, min_score, etc. See annotate_support for more details.

source

RAGTools.add_node_metadata! — Method

add_node_metadata!(annotater::TrigramAnnotater,
	root::AnnotatedNode; add_sources::Bool = true, add_scores::Bool = true,
	sources::Union{Nothing, AbstractVector{<:AbstractString}} = nothing)

Adds metadata to the children of root. Metadata includes sources and scores, if requested.

Optionally, it can add a list of sources at the end of the printed text.

The metadata is added by inserting new nodes in the root children list (with no children of its own to be printed out).

source

RAGTools.airag — Method

airag(cfg::AbstractRAGConfig, index::AbstractDocumentIndex;
	question::AbstractString,
	verbose::Integer = 1, return_all::Bool = false,
	api_kwargs::NamedTuple = NamedTuple(),
	retriever::AbstractRetriever = cfg.retriever,
	retriever_kwargs::NamedTuple = NamedTuple(),
	generator::AbstractGenerator = cfg.generator,
	generator_kwargs::NamedTuple = NamedTuple(),
	cost_tracker = Threads.Atomic{Float64}(0.0))

High-level wrapper for Retrieval-Augmented Generation (RAG). It combines the retrieve and generate! steps, which you can customize if needed.

The simplest version first finds the relevant chunks in index for the question and then sends these chunks to the AI model to help with generating a response to the question.

To customize the components, replace the types (retriever, generator) of the corresponding step of the RAG pipeline - or go into sub-routines within the steps. Eg, use subtypes(AbstractRetriever) to find the available options.

Arguments

cfg::AbstractRAGConfig: The configuration for the RAG pipeline. Defaults to RAGConfig(), where you can swap sub-types to customize the pipeline.
index::AbstractDocumentIndex: The chunk index to search for relevant text.
question::AbstractString: The question to be answered.
return_all::Bool: If true, returns the details used for RAG along with the response.
verbose::Integer: If >0, enables verbose logging. The higher the number, the more nested functions will log.
api_kwargs: API parameters that will be forwarded to ALL of the API calls (aiembed, aigenerate, and aiextract).
retriever::AbstractRetriever: The retriever to use for finding relevant chunks. Defaults to cfg.retriever, eg, SimpleRetriever (with no question rephrasing).
retriever_kwargs::NamedTuple: API parameters that will be forwarded to the retriever call. Examples of important ones:

- `top_k::Int`: Number of top candidates to retrieve based on embedding similarity.
- `top_n::Int`: Number of candidates to return after reranking.
- `tagger::AbstractTagger`: Tagger to use for tagging the chunks. Defaults to `NoTagger()`.
- `tagger_kwargs::NamedTuple`: API parameters that will be forwarded to the `tagger` call. You could provide the explicit tags directly with `PassthroughTagger` and `tagger_kwargs = (; tags = ["tag1", "tag2"])`.

generator::AbstractGenerator: The generator to use for generating the answer. Defaults to cfg.generator, eg, SimpleGenerator.
generator_kwargs::NamedTuple: API parameters that will be forwarded to the generator call. Examples of important ones:

- `answerer_kwargs::NamedTuple`: API parameters that will be forwarded to the `answerer` call. Examples:
	- `model`: The model to use for generating the answer. Defaults to `PT.MODEL_CHAT`.
	- `template`: The template to use for the `aigenerate` function. Defaults to `:RAGAnswerFromContext`.
- `refiner::AbstractRefiner`: The method to use for refining the answer. Defaults to `generator.refiner`, eg, `NoRefiner`.
- `refiner_kwargs::NamedTuple`: API parameters that will be forwarded to the `refiner` call.
	- `model`: The model to use for generating the answer. Defaults to `PT.MODEL_CHAT`.
	- `template`: The template to use for the `aigenerate` function. Defaults to `:RAGAnswerRefiner`.

cost_tracker: An atomic counter to track the total cost of the operations (useful if you want to track the cost of multiple pipeline runs; it is passed around in the pipeline).

Returns

If return_all is false, returns the generated message (msg).
If return_all is true, returns the detail of the full pipeline in RAGResult (see the docs).

See also build_index, retrieve, generate!, RAGResult, getpropertynested, setpropertynested, merge_kwargs_nested, ChunkKeywordsIndex.

Examples

Using airag to get a response for a question:

index = build_index(...)  # create an index
question = "How to make a barplot in Makie.jl?"
msg = airag(index; question)

To understand the details of the RAG process, use return_all=true

result = airag(index; question, return_all = true)
# result is a RAGResult object with all the internal steps of the `airag` function

You can also pretty-print result to highlight generated text vs text that is supported by context. It also includes annotations of which context was used for each part of the response (where available).

PT.pprint(result)

Example with advanced retrieval (with question rephrasing and reranking; requires COHERE_API_KEY). We will obtain the top 100 chunks from embeddings (top_k) and the top 5 chunks from reranking (top_n). In addition, it will be done with a "custom" locally-hosted model.

cfg = RAGConfig(; retriever = AdvancedRetriever())

# kwargs will be big and nested, let's prepare them upfront
# we specify "custom" model for each component that calls LLM
kwargs = (
	retriever_kwargs = (;
		top_k = 100,
		top_n = 5,
		rephraser_kwargs = (;
			model = "custom"),
		embedder_kwargs = (;
			model = "custom"),
		tagger_kwargs = (;
			model = "custom")),
	generator_kwargs = (;
		answerer_kwargs = (;
			model = "custom"),
		refiner_kwargs = (;
			model = "custom")),
	api_kwargs = (;
		url = "http://localhost:8080"))

result = airag(cfg, index; question, kwargs...)

If you want to use hybrid retrieval (embeddings + BM25), you can easily create an additional index based on keywords and pass them both into a MultiIndex.

You need to provide an explicit config, so the pipeline knows how to handle each index in the search similarity phase (finder).

index = # your existing index

# create the multi-index with the keywords index
index_keywords = ChunkKeywordsIndex(index)
multi_index = MultiIndex([index, index_keywords])

# define the similarity measures for the indices that you have (same order)
finder = RT.MultiFinder([RT.CosineSimilarity(), RT.BM25Similarity()])
cfg = RAGConfig(; retriever=AdvancedRetriever(; processor=RT.KeywordsProcessor(), finder))

# Run the pipeline with the new hybrid retrieval (return the `RAGResult` to see the details)
result = airag(cfg, multi_index; question, return_all=true)

# Pretty-print the result
PT.pprint(result)

For easier manipulation of nested kwargs, see utilities getpropertynested, setpropertynested, merge_kwargs_nested.
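
As a rough illustration of why such utilities help, here is a minimal sketch of recursively merging nested NamedTuples. `merge_nested_sketch` is a hypothetical helper written for this example; it is not the implementation of merge_kwargs_nested and its behavior may differ from it.

```julia
# Hypothetical helper: merge `b` into `a`, descending into nested NamedTuples
function merge_nested_sketch(a::NamedTuple, b::NamedTuple)
    out = a
    for (k, v) in pairs(b)
        if haskey(a, k) && a[k] isa NamedTuple && v isa NamedTuple
            # both sides are nested: merge them key by key
            out = merge(out, NamedTuple{(k,)}((merge_nested_sketch(a[k], v),)))
        else
            # otherwise the value from `b` wins
            out = merge(out, NamedTuple{(k,)}((v,)))
        end
    end
    return out
end

defaults = (; retriever_kwargs = (; top_k = 100, top_n = 5))
overrides = (; retriever_kwargs = (; top_n = 10),
    generator_kwargs = (; answerer_kwargs = (; model = "custom")))
kwargs = merge_nested_sketch(defaults, overrides)
```

A plain `merge(defaults, overrides)` would replace the whole `retriever_kwargs` entry and drop `top_k`; the nested merge keeps it.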

source

RAGTools.align_node_styles! — Method

align_node_styles!(annotater::TrigramAnnotater, nodes::AbstractVector{<:AnnotatedNode}; kwargs...)

Aligns the styles of the nodes based on the surrounding nodes ("fill-in-the-middle").

If the node has no score, but the surrounding nodes have the same style, the node will inherit the style of the surrounding nodes.
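
A minimal sketch of the "fill-in-the-middle" idea, assuming each node is reduced to a style label and `nothing` marks a node without a score. `fill_in_the_middle` is a hypothetical helper, not the RAGTools method, and it looks only at the immediate neighbors.

```julia
# Hypothetical simplification: styles are Symbols, `nothing` = node without a score
function fill_in_the_middle(styles::AbstractVector)
    out = collect(Union{Nothing, Symbol}, styles)
    for i in 2:(length(out) - 1)
        prev, nxt = styles[i - 1], styles[i + 1]
        # inherit only when both neighbors are scored and agree on the style
        if styles[i] === nothing && prev !== nothing && prev == nxt
            out[i] = prev
        end
    end
    return out
end

aligned = fill_in_the_middle([:high, nothing, :high, nothing, :low])
```

The second node inherits `:high` because both neighbors agree; the fourth stays unstyled because its neighbors differ.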

source

RAGTools.annotate_support — Method

annotate_support(annotater::TrigramAnnotater, answer::AbstractString,
	context::AbstractVector; min_score::Float64 = 0.5,
	skip_trigrams::Bool = true, hashed::Bool = true,
	sources::Union{Nothing, AbstractVector{<:AbstractString}} = nothing,
	min_source_score::Float64 = 0.25,
	add_sources::Bool = true,
	add_scores::Bool = true, kwargs...)

Annotates the answer with the overlap/what's supported in context and returns the annotated tree of nodes representing the answer.

Returns a "root" node with children nodes representing the sentences/code blocks in the answer. Only the "leaf" nodes are to be printed (to avoid duplication); "leaf" nodes are those with NO children.

Default logic:

Split into sentences/code blocks, then into tokens (~words).
Then match each token (~word) exactly.
If no exact match found, count trigram-based match (include the surrounding tokens for better contextual awareness).
If the match is higher than min_score, it's recorded in the score of the node.

Arguments

annotater::TrigramAnnotater: Annotater to use
answer::AbstractString: Text to annotate
context::AbstractVector: Context to annotate against, ie, look for "support" in the texts in context
min_score::Float64: Minimum score to consider a match. Default: 0.5, which means that half of the trigrams of each word should match
skip_trigrams::Bool: Whether to potentially skip trigram matching if exact full match is found. Default: true
hashed::Bool: Whether to use hashed trigrams. It's harder to debug, but it's much faster for larger texts (hashed text are held in a Set to deduplicate). Default: true
sources::Union{Nothing, AbstractVector{<:AbstractString}}: Sources to add at the end of the context. Default: nothing
min_source_score::Float64: Minimum score to consider/to display a source. Default: 0.25, which means that at least a quarter of the trigrams of each word should match to some context. The threshold is lower than min_score, because it's average across ALL words in a block, so it's much harder to match fully with generated text.
add_sources::Bool: Whether to add sources at the end of each code block/sentence. Sources are added in square brackets like "[1]". Default: true
add_scores::Bool: Whether to add source-matching scores at the end of each code block/sentence. Scores are added in the square brackets like "[0.75]". Default: true
kwargs: Additional keyword arguments to pass to trigram_support! and set_node_style!. See their documentation for more details (eg, customize the colors of the nodes based on the score)

Example

annotater = TrigramAnnotater()
context = [
	"This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test context. Another context sentence."

annotated_root = annotate_support(annotater, answer, context)
pprint(annotated_root) # pretty print the annotated tree
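
To give a feel for the trigram scoring, here is a minimal sketch under stated assumptions: trigrams are taken over the characters of each lowercased token, and the score of a token is the share of its trigrams found anywhere in the context. `trigrams_sketch` and `token_support` are hypothetical helpers, not the RAGTools implementation (which also handles exact matches, surrounding tokens and hashing).

```julia
# Character trigrams of a lowercased token (assumption: character-level trigrams)
function trigrams_sketch(token::AbstractString)
    chars = collect(lowercase(token))
    length(chars) < 3 && return [String(chars)]   # short tokens are kept whole
    return [String(chars[i:(i + 2)]) for i in 1:(length(chars) - 2)]
end

# Share of the token's trigrams that also occur in the context
function token_support(token::AbstractString, context_trigrams::Set{String})
    tris = trigrams_sketch(token)
    return count(in(context_trigrams), tris) / length(tris)
end

context = "This is a test context."
context_trigrams = Set{String}()
for tok in split(context, r"[^\w]+"; keepempty = false)
    union!(context_trigrams, trigrams_sketch(tok))
end

scores = Dict(tok => token_support(tok, context_trigrams)
              for tok in ["context", "contexts", "banana"])
```

With min_score = 0.5, "context" and "contexts" would count as supported, while "banana" would not.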

source

RAGTools.annotate_support — Method

annotate_support(
	annotater::TrigramAnnotater, result::AbstractRAGResult; min_score::Float64 = 0.5,
	skip_trigrams::Bool = true, hashed::Bool = true,
	min_source_score::Float64 = 0.25,
	add_sources::Bool = true,
	add_scores::Bool = true, kwargs...)

Dispatch for annotate_support for AbstractRAGResult type. It extracts the final_answer and context from the result and calls annotate_support with them.

See annotate_support for more details.

Example

annotater = TrigramAnnotater()
res = RAGResult(; question = "", final_answer = "This is a test.",
	context = ["Test context.", "Completely different"])
annotated_root = annotate_support(annotater, res)
PT.pprint(annotated_root)

source

RAGTools.answer! — Method

answer!(
	answerer::SimpleAnswerer, index::AbstractDocumentIndex, result::AbstractRAGResult;
	model::AbstractString = PT.MODEL_CHAT, verbose::Bool = true,
	template::Symbol = :RAGAnswerFromContext,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...)

Generates an answer using the aigenerate function with the provided result.context and result.question.

Returns

Mutated result with result.answer and the full conversation saved in result.conversations[:answer]

Arguments

answerer::SimpleAnswerer: The method to use for generating the answer. Uses aigenerate.
index::AbstractDocumentIndex: The index containing chunks and sources.
result::AbstractRAGResult: The result containing the context and question to generate the answer for.
model::AbstractString: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
verbose::Bool: If true, enables verbose logging.
template::Symbol: The template to use for the aigenerate function. Defaults to :RAGAnswerFromContext.
cost_tracker: An atomic counter to track the cost of the operation.

source

RAGTools.bm25 — Method

bm25(
	dtm::AbstractDocumentTermMatrix, query::AbstractVector{<:AbstractString};
	k1::Float32 = 1.2f0, b::Float32 = 0.75f0, normalize::Bool = false,
	normalize_max_tf::Real = 3, normalize_min_doc_rel_length::Float32 = 1.0f0, kwargs...)

Scores all documents in dtm based on the query.

References: https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

Arguments

dtm: A DocumentTermMatrix object.
query: A vector of query tokens.
k1: The k1 parameter for BM25.
b: The b parameter for BM25.
normalize: Whether to normalize the scores (returns scores between 0 and 1).

Theoretically, if you choose normalize_max_tf and normalize_min_doc_rel_length to be too low, you could get scores greater than 1.

normalize_max_tf: The maximum term frequency to normalize to. 3 is a good default (assumes max 3 hits per document).
normalize_min_doc_rel_length: The minimum document relative length to normalize to. 0.5 is a good default.

Ideally, pick the minimum document relative length of the corpus that is non-zero: min_doc_rel_length = minimum(x for x in doc_rel_length(chunkdata(key_index)) if x > 0) |> Float32

Example

documents = [["this", "is", "a", "test"], ["this", "is", "another", "test"], ["foo", "bar", "baz"]]
dtm = document_term_matrix(documents)
query = ["this"]
scores = bm25(dtm, query)
# Returns array with 3 scores (one for each document)

Normalization is done by dividing the score by the maximum possible score (given some assumptions). It's useful for getting results in the same range as cosine similarity scores and when comparing different queries or documents.

documents = [["this", "is", "a", "test"], ["this", "is", "another", "test"], ["foo", "bar", "baz"]]
dtm = document_term_matrix(documents)
query = ["this"]
scores = bm25(dtm, query)
scores_norm = bm25(dtm, query; normalize = true)

## Make it more accurate for your dataset/index
normalize_max_tf = 3 # assume max term frequency is 3 (what is likely for your dataset? depends on chunk size, preprocessing, etc.)
normalize_min_doc_rel_length = minimum([x for x in doc_rel_length(dtm) if x > 0]) |> Float32
scores_norm = bm25(dtm, query; normalize = true, normalize_max_tf, normalize_min_doc_rel_length)
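
For intuition, the scoring and the normalization can be sketched on raw tokenized documents. This is a toy version under assumptions: `bm25_sketch`, `idf_sketch` and `term_score` are hypothetical helpers, the IDF follows the common Lucene-style formula, and the "maximum possible score" is assumed to be the score of a document with `max_tf` hits per query term and relative length `min_doc_rel_length`. It is not the RAGTools implementation.

```julia
# Lucene-style inverse document frequency (assumption)
idf_sketch(n_docs, df) = log(1 + (n_docs - df + 0.5) / (df + 0.5))

# BM25 contribution of one term with frequency `tf` in a document of relative length `rel_len`
term_score(idf, tf, rel_len; k1, b) = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * rel_len))

function bm25_sketch(documents::Vector{Vector{String}}, query::Vector{String};
        k1 = 1.2, b = 0.75, normalize = false, max_tf = 3, min_doc_rel_length = 1.0)
    n_docs = length(documents)
    avg_len = sum(length, documents) / n_docs
    scores = zeros(n_docs)
    max_score = 0.0
    for term in unique(query)
        df = count(doc -> term in doc, documents)   # documents containing the term
        df == 0 && continue
        idf = idf_sketch(n_docs, df)
        # best case assumed for normalization: many hits in a short document
        max_score += term_score(idf, max_tf, min_doc_rel_length; k1, b)
        for (i, doc) in enumerate(documents)
            tf = count(==(term), doc)
            scores[i] += term_score(idf, tf, length(doc) / avg_len; k1, b)
        end
    end
    return (normalize && max_score > 0) ? scores ./ max_score : scores
end

documents = [["this", "is", "a", "test"], ["this", "is", "another", "test"], ["foo", "bar", "baz"]]
raw = bm25_sketch(documents, ["this"])
normalized = bm25_sketch(documents, ["this"]; normalize = true, min_doc_rel_length = 9 / 11)
```

The value 9/11 is the smallest relative length in this toy corpus (3 tokens against an average of 11/3).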

source

RAGTools.build_context — Method

build_context(contexter::ContextEnumerator,
	index::AbstractDocumentIndex, candidates::AbstractCandidateChunks;
	verbose::Bool = true,
	chunks_window_margin::Tuple{Int, Int} = (1, 1), kwargs...)

	build_context!(contexter::ContextEnumerator,
	index::AbstractDocumentIndex, result::AbstractRAGResult; kwargs...)

Build context strings for each position in candidates considering a window margin around each position. If mutating version is used (build_context!), it will use result.reranked_candidates to update the result.context field.

Arguments

contexter::ContextEnumerator: The method to use for building the context. Enumerates the snippets.
index::AbstractDocumentIndex: The index containing chunks and sources.
candidates::AbstractCandidateChunks: Candidate chunks which contain positions to extract context from.
verbose::Bool: If true, enables verbose logging.
chunks_window_margin::Tuple{Int, Int}: A tuple indicating the margin (before, after) around each position to include in the context. Defaults to (1,1), which means 1 preceding and 1 succeeding chunk will be included. With (0,0), only the matching chunks will be included.

Returns

Vector{String}: A vector of context strings, each corresponding to a position in reranked_candidates.

Examples

index = ChunkIndex(...)  # Assuming a proper index is defined
candidates = CandidateChunks(index.id, [2, 4], [0.1, 0.2])
context = build_context(ContextEnumerator(), index, candidates; chunks_window_margin=(0, 1)) # include only one following chunk for each matching chunk
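
The window logic can be sketched in plain Julia. `context_windows` is a hypothetical helper; it assumes that all chunks come from a single source (so a window never has to stop at a document boundary) and that each context string is prefixed with its running number.

```julia
# Hypothetical helper: join each matched chunk with its neighbors
function context_windows(chunks::Vector{String}, positions::Vector{Int};
        chunks_window_margin::Tuple{Int, Int} = (1, 1))
    before, after = chunks_window_margin
    return map(enumerate(positions)) do (i, pos)
        first_idx = max(1, pos - before)               # clamp at the start
        last_idx = min(length(chunks), pos + after)    # clamp at the end
        string(i, ". ", join(chunks[first_idx:last_idx], "\n"))
    end
end

chunks = ["chunk 1", "chunk 2", "chunk 3", "chunk 4"]
context = context_windows(chunks, [2, 4]; chunks_window_margin = (0, 1))
```

The second window contains only "chunk 4", because there is no following chunk to include.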

source

RAGTools.build_index — Method

build_index(
	indexer::KeywordsIndexer, files_or_docs::Vector{<:AbstractString};
	verbose::Integer = 1,
	extras::Union{Nothing, AbstractVector} = nothing,
	index_id = gensym("ChunkKeywordsIndex"),
	chunker::AbstractChunker = indexer.chunker,
	chunker_kwargs::NamedTuple = NamedTuple(),
	processor::AbstractProcessor = indexer.processor,
	processor_kwargs::NamedTuple = NamedTuple(),
	tagger::AbstractTagger = indexer.tagger,
	tagger_kwargs::NamedTuple = NamedTuple(),
	api_kwargs::NamedTuple = NamedTuple(),
	cost_tracker = Threads.Atomic{Float64}(0.0))

Builds a ChunkKeywordsIndex from the provided files or documents to support keyword-based search (BM25).

source

RAGTools.build_index — Method

build_index(
	indexer::AbstractIndexBuilder, files_or_docs::Vector{<:AbstractString};
	verbose::Integer = 1,
	extras::Union{Nothing, AbstractVector} = nothing,
	index_id = gensym("ChunkEmbeddingsIndex"),
	chunker::AbstractChunker = indexer.chunker,
	chunker_kwargs::NamedTuple = NamedTuple(),
	embedder::AbstractEmbedder = indexer.embedder,
	embedder_kwargs::NamedTuple = NamedTuple(),
	tagger::AbstractTagger = indexer.tagger,
	tagger_kwargs::NamedTuple = NamedTuple(),
	api_kwargs::NamedTuple = NamedTuple(),
	cost_tracker = Threads.Atomic{Float64}(0.0))

Build an INDEX for RAG (Retrieval-Augmented Generation) applications from the provided file paths. INDEX is an object storing the document chunks and their embeddings (and potentially other information).

The function processes each file or document (depending on chunker), splits its content into chunks, embeds these chunks, optionally extracts metadata, and then combines this information into a retrievable index.

Define your own methods via indexer and its subcomponents (chunker, embedder, tagger).

Arguments

indexer::AbstractIndexBuilder: The indexing logic to use. Default is SimpleIndexer().
files_or_docs: A vector of valid file paths OR string documents to be indexed (chunked and embedded). Specify which mode to use via chunker.
verbose: An Integer specifying the verbosity of the logs. Default is 1 (high-level logging). 0 is disabled.
extras: An optional vector of extra information to be stored with each chunk. Default is nothing.
index_id: A unique identifier for the index. Default is a generated symbol.
chunker: The chunker logic to use for splitting the documents. Default is TextChunker().
chunker_kwargs: Parameters to be provided to the get_chunks function. Useful to change the separators or max_length.
- sources: A vector of strings indicating the source of each chunk. Default is equal to files_or_docs.
embedder: The embedder logic to use for embedding the chunks. Default is BatchEmbedder().
embedder_kwargs: Parameters to be provided to the get_embeddings function. Useful to change the target_batch_size_length or reduce asyncmap tasks ntasks.
- model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
tagger: The tagger logic to use for extracting tags from the chunks. Default is NoTagger(), ie, skip tag extraction. There are also PassthroughTagger and OpenTagger.
tagger_kwargs: Parameters to be provided to the get_tags function.
- model: The model to use for tags extraction. Default is PT.MODEL_CHAT.
- template: A template to be used for tags extraction. Default is :RAGExtractMetadataShort.
- tags: A vector of vectors of strings directly providing the tags for each chunk. Applicable for tagger::PassthroughTagger.
api_kwargs: Parameters to be provided to the API endpoint. Shared across all API calls if provided.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.

Returns

ChunkEmbeddingsIndex: An object containing the compiled index of chunks, embeddings, tags, vocabulary, and sources.

See also: ChunkEmbeddingsIndex, get_chunks, get_embeddings, get_tags, CandidateChunks, find_closest, find_tags, rerank, retrieve, generate!, airag

Examples

# Default is loading a vector of strings and chunking them (`TextChunker()`)
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; max_length=10))

# Another example with tags extraction, splitting only sentences and verbose output
# Assuming `test_files` is a vector of file paths
indexer = SimpleIndexer(chunker=FileChunker(), tagger=OpenTagger())
index = build_index(indexer, test_files;
		chunker_kwargs = (; separators=[". "]), verbose=true)

Notes

If you get errors about exceeding embedding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try changing the embedder_kwargs, in particular reducing the target_batch_size_length parameter (eg, 10_000) and the number of tasks (ntasks=1). Some providers cannot handle large batch sizes (eg, Databricks).

source

RAGTools.build_qa_evals — Method

build_qa_evals(doc_chunks::Vector{<:AbstractString}, sources::Vector{<:AbstractString};
			   model=PT.MODEL_CHAT, instructions="None.", qa_template::Symbol=:RAGCreateQAFromContext, 
			   verbose::Bool=true, api_kwargs::NamedTuple = NamedTuple(), kwargs...) -> Vector{QAEvalItem}

Create a collection of question and answer evaluations (QAEvalItem) from document chunks and sources. This function generates Q&A pairs based on the provided document chunks, using a specified AI model and template.

Arguments

doc_chunks::Vector{<:AbstractString}: A vector of document chunks, each representing a segment of text.
sources::Vector{<:AbstractString}: A vector of source identifiers corresponding to each chunk in doc_chunks (eg, filenames or paths).
model: The AI model used for generating Q&A pairs. Default is PT.MODEL_CHAT.
instructions::String: Additional instructions or context to provide to the model generating QA sets. Defaults to "None.".
qa_template::Symbol: A template symbol that dictates the AITemplate that will be used. It must have the placeholder context. Default is :RAGCreateQAFromContext.
api_kwargs::NamedTuple: Parameters that will be forwarded to the API endpoint.
verbose::Bool: If true, additional information like costs will be logged. Defaults to true.

Returns

Vector{QAEvalItem}: A vector of QAEvalItem structs, each containing a source, context, question, and answer. Invalid or empty items are filtered out.

Notes

The function internally uses aiextract to generate Q&A pairs based on the provided qa_template, so you can pass any additional kwargs that aiextract accepts.
Each QAEvalItem includes the context (document chunk), the generated question and answer, and the source.
The function tracks and reports the cost of AI calls if verbose is enabled.
Items where the question, answer, or context is empty are considered invalid and are filtered out.

Examples

Creating Q&A evaluations from a set of document chunks:

doc_chunks = ["Text from document 1", "Text from document 2"]
sources = ["source1", "source2"]
qa_evals = build_qa_evals(doc_chunks, sources)

source

RAGTools.build_tags — Function

Builds a matrix of tags and a vocabulary list. REQUIRES SparseArrays, LinearAlgebra, Unicode packages to be loaded!!

source

RAGTools.build_tags — Method

build_tags(
	tagger::AbstractTagger, chunk_metadata::AbstractVector{
		<:AbstractVector{<:AbstractString},
	})

Builds a sparse matrix of tags and a vocabulary from the given vector of chunk metadata.
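
A minimal sketch of such a structure, assuming one row per chunk and one column per vocabulary entry with `true` where a chunk carries a tag. `tags_matrix_sketch` is a hypothetical helper and the layout is an assumption, not necessarily the one RAGTools uses.

```julia
using SparseArrays

# Hypothetical helper: rows = chunks, columns = sorted tag vocabulary
function tags_matrix_sketch(chunk_tags::Vector{Vector{String}})
    vocab = sort(unique(Iterators.flatten(chunk_tags)))
    col_of = Dict(tag => j for (j, tag) in enumerate(vocab))
    rows, cols = Int[], Int[]
    for (i, tags) in enumerate(chunk_tags), tag in unique(tags)
        push!(rows, i)
        push!(cols, col_of[tag])
    end
    vals = fill(true, length(rows))
    tags_matrix = sparse(rows, cols, vals, length(chunk_tags), length(vocab))
    return tags_matrix, vocab
end

chunk_tags = [["julia", "rag"], ["rag"], String[]]
tags_matrix, vocab = tags_matrix_sketch(chunk_tags)
```

Chunks without any tags (the third one here) simply produce an empty row.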

source

RAGTools.build_tags — Method

build_tags(tagger::AbstractTagger, chunk_tags::Nothing; kwargs...)

No-op that skips any tag building, returning nothing, nothing.

Otherwise, it would build the sparse matrix and the vocabulary (requires SparseArrays and LinearAlgebra packages to be loaded).

source

RAGTools.chunkdata — Method

chunkdata(index::ChunkKeywordsIndex, chunk_idx::AbstractVector{<:Integer})

Access chunkdata for a subset of chunks.

Arguments

index::ChunkKeywordsIndex: the index to access
chunk_idx::AbstractVector{<:Integer}: the indices of the chunks to access

source

RAGTools.chunkdata — Method

Access chunkdata for a subset of chunks, chunk_idx is a vector of chunk indices in the index

source

RAGTools.cohere_api — Method

cohere_api(;
api_key::AbstractString,
endpoint::String,
url::AbstractString="https://api.cohere.ai/v1",
http_kwargs::NamedTuple=NamedTuple(),
kwargs...)

Lightweight wrapper around the Cohere API. See https://cohere.com/docs for more details.

Arguments

api_key: Your Cohere API key. You can get one from https://dashboard.cohere.com/welcome/register (trial access is for free).
endpoint: The Cohere endpoint to call.
url: The base URL for the Cohere API. Default is https://api.cohere.ai/v1.
http_kwargs: Any additional keyword arguments to pass to HTTP.post.
kwargs: Any additional keyword arguments to pass to the Cohere API.

source

RAGTools.create_permutation_instruction — Method

create_permutation_instruction(
	context::AbstractVector{<:AbstractString}; rank_start::Integer = 1,
	rank_end::Integer = 100, max_length::Integer = 512, template::Symbol = :RAGRankGPT)

Creates rendered template with injected context passages.

source

RAGTools.create_websearch — Method

create_websearch(query::AbstractString;
	api_key::AbstractString,
	search_depth::AbstractString = "basic")

Arguments

query::AbstractString: The query to search for.
api_key::AbstractString: The API key to use for the search. Get an API key from Tavily.
search_depth::AbstractString: The depth of the search. Can be either "basic" or "advanced". Default is "basic". An advanced search counts as 2 requests.
include_answer::Bool: Whether to include the answer in the search results. Default is false.
include_raw_content::Bool: Whether to include the raw content in the search results. Default is false.
max_results::Integer: The maximum number of results to return. Default is 5.
include_images::Bool: Whether to include images in the search results. Default is false.
include_domains::AbstractVector{<:AbstractString}: A list of domains to include in the search results. Default is an empty list.
exclude_domains::AbstractVector{<:AbstractString}: A list of domains to exclude from the search results. Default is an empty list.

Example

r = create_websearch("Who is King Charles?")

Even better, you can get not just the results but also the answer:

r = create_websearch("Who is King Charles?"; include_answer = true)

See Rest API documentation for more information.

source

RAGTools.doc_rel_length — Method

doc_rel_length(dtm::AbstractDocumentTermMatrix)

Get the document relative length vector of an AbstractDocumentTermMatrix.

source

RAGTools.document_term_matrix — Method

document_term_matrix(
	documents::AbstractVector{<:AbstractVector{T}};
	min_term_freq::Int = 1, max_terms::Int = typemax(Int)) where {T <: AbstractString}

Builds a sparse matrix of term frequencies and document lengths from the given vector of documents wrapped in type DocumentTermMatrix.

Expects a vector of preprocessed (tokenized) documents, where each document is a vector of strings (clean tokens).

Returns: DocumentTermMatrix

Arguments

documents: A vector of documents, where each document is a vector of terms (clean tokens).
min_term_freq: The minimum frequency a term must have to be included in the vocabulary, eg, min_term_freq = 2 means only terms that appear at least twice will be included.
max_terms: The maximum number of terms to include in the vocabulary, eg, max_terms = 100 means only the 100 most frequent terms will be included.

Example

documents = [["this", "is", "a", "test"], ["this", "is", "another", "test"], ["foo", "bar", "baz"]]
dtm = document_term_matrix(documents)
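
To show what such a matrix contains, here is a toy construction under assumptions: `dtm_sketch` is a hypothetical helper, `min_term_freq` is applied to the total count of a term across the corpus, and the relative length of a document is its length divided by the average document length. The real DocumentTermMatrix may store more (or differently arranged) information.

```julia
using SparseArrays

# Hypothetical helper returning (term frequencies, vocabulary, relative document lengths)
function dtm_sketch(documents::Vector{Vector{String}}; min_term_freq::Int = 1)
    # total occurrences of each term across the corpus
    counts = Dict{String, Int}()
    for doc in documents, term in doc
        counts[term] = get(counts, term, 0) + 1
    end
    vocab = sort([t for (t, c) in counts if c >= min_term_freq])
    col_of = Dict(t => j for (j, t) in enumerate(vocab))

    # rows = documents, columns = vocabulary terms
    tf = spzeros(Float32, length(documents), length(vocab))
    for (i, doc) in enumerate(documents), term in doc
        j = get(col_of, term, 0)
        j > 0 && (tf[i, j] += 1)
    end
    doc_lengths = [length(doc) for doc in documents]
    doc_rel_length = doc_lengths ./ (sum(doc_lengths) / length(documents))
    return tf, vocab, doc_rel_length
end

documents = [["this", "is", "a", "test"], ["this", "is", "another", "test"], ["foo", "bar", "baz"]]
tf, vocab, rel_len = dtm_sketch(documents)
_, vocab_frequent, _ = dtm_sketch(documents; min_term_freq = 2)
```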

source

RAGTools.extract_ranking — Method

extract_ranking(str::AbstractString)

Extracts the ranking from the response into a sorted array of integers.

source

RAGTools.find_closest — Function

find_closest(
	finder::AbstractSimilarityFinder, index::AbstractChunkIndex,
	query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
	top_k::Int = 100, kwargs...)

Finds the indices of chunks (represented by embeddings in index) that are closest to query embedding (query_emb).

Returns only top_k closest indices.

source

RAGTools.find_closest — Function

find_closest(
	finder::BinaryCosineSimilarity, emb::AbstractMatrix{<:Bool},
	query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
	top_k::Int = 100, rescore_multiplier::Int = 4, minimum_similarity::AbstractFloat = -1.0, kwargs...)

Finds the indices of chunks (represented by embeddings in emb) that are closest to query embedding (query_emb) using binary embeddings (in the index).

This is a two-pass approach:

First pass: Hamming distance in binary form to get the top_k * rescore_multiplier (ie, more than top_k) candidates.
Second pass: Rescore the candidates with float embeddings and return the top_k.

Returns only top_k closest indices.

Reference: HuggingFace: Embedding Quantization.

Examples

Convert any Float embeddings to binary like this:

binary_emb = map(>(0), emb)
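
The two-pass idea can be sketched as follows. `two_pass_sketch` is a hypothetical helper, not the RAGTools method; it assumes one embedding per column and, for the second pass, assumes the rescoring is a dot product between the float query and the binary document embeddings.

```julia
# Hypothetical helper: `emb_bin` holds one binary embedding per column
function two_pass_sketch(emb_bin::AbstractMatrix{Bool}, query_emb::AbstractVector{<:Real};
        top_k::Int = 2, rescore_multiplier::Int = 4)
    n = size(emb_bin, 2)
    query_bin = query_emb .> 0
    # First pass: Hamming distance between the binarized query and each document
    hamming = [count(emb_bin[:, j] .!= query_bin) for j in 1:n]
    shortlist = partialsortperm(hamming, 1:min(n, top_k * rescore_multiplier))
    # Second pass: rescore the shortlist with the float query (assumed scoring rule)
    scores = [sum(query_emb .* emb_bin[:, j]) for j in shortlist]
    order = sortperm(scores; rev = true)[1:min(top_k, length(scores))]
    return shortlist[order], scores[order]
end

emb = Float32[1 -1 1; 1 -1 -1; -1 1 1; 1 1 -1]   # 4 dimensions x 3 documents
binary_emb = map(>(0), emb)
query_emb = Float32[0.9, 0.8, -0.7, 0.6]
positions, scores = two_pass_sketch(binary_emb, query_emb; top_k = 2)
```

The first pass is cheap and only narrows the field; the final order comes from the second pass.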

source

RAGTools.find_closest — Function

find_closest(
	finder::BM25Similarity, dtm::AbstractDocumentTermMatrix,
	query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
	top_k::Int = 100, minimum_similarity::AbstractFloat = -1.0, kwargs...)

Finds the indices of chunks (represented by DocumentTermMatrix in dtm) that are closest to query tokens (query_tokens) using BM25.

Reference: Wikipedia: BM25. Implementation follows: The Next Generation of Lucene Relevance.
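As a rough illustration of what BM25 scoring computes, here is a self-contained sketch of the textbook formula with a Lucene-style IDF term. It is not the RAGTools code: `bm25_scores` is a hypothetical name, documents are assumed to be pre-tokenized, and the defaults `k1 = 1.2` and `b = 0.75` are the conventional values.

```julia
# Hypothetical sketch of BM25 scoring over tokenized documents; not the RAGTools code.
function bm25_scores(docs::Vector{Vector{String}}, query_tokens::Vector{String};
        k1::Float64 = 1.2, b::Float64 = 0.75)
    n_docs = length(docs)
    avg_len = sum(length, docs) / n_docs
    scores = zeros(Float64, n_docs)
    for term in unique(query_tokens)
        # number of documents containing the term
        doc_freq = count(doc -> term in doc, docs)
        doc_freq == 0 && continue
        idf = log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        for (i, doc) in enumerate(docs)
            tf = count(==(term), doc)          # term frequency in this document
            rel_len = length(doc) / avg_len    # document length relative to the average
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * rel_len))
        end
    end
    return scores
end
```

Documents that contain none of the query tokens keep a score of zero, so they naturally fall to the bottom of the ranking.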

source

RAGTools.find_closest — Function

find_closest(
	finder::BitPackedCosineSimilarity, emb::AbstractMatrix{<:Bool},
	query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
	top_k::Int = 100, rescore_multiplier::Int = 4, minimum_similarity::AbstractFloat = -1.0, kwargs...)

Finds the indices of chunks (represented by embeddings in emb) that are closest to query embedding (query_emb) using bit-packed binary embeddings (in the index).

This is a two-pass approach:

First pass: Hamming distance in bit-packed binary form to get the top_k * rescore_multiplier (i.e., more than top_k) candidates.
Second pass: Rescore the candidates with float embeddings and return the top_k.

Returns only top_k closest indices.

Reference: HuggingFace: Embedding Quantization.

Examples

Convert any Float embeddings to bit-packed binary like this:

bitpacked_emb = pack_bits(emb.>0)
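To see why the bit-packed form is fast, the sketch below packs Bool vectors into `UInt64` words and computes the Hamming distance with XOR and a population count. The helper names and the bit layout (lowest bit first) are assumptions made for illustration and may differ from what `pack_bits` does internally.

```julia
# Hypothetical helpers; a sketch of bit-packing and packed Hamming distance, not the RAGTools code.
function pack_column(bits::AbstractVector{Bool})
    @assert length(bits) % 64 == 0 "length must be divisible by 64"
    packed = zeros(UInt64, div(length(bits), 64))
    for (i, bit) in enumerate(bits)
        if bit
            word = div(i - 1, 64) + 1    # which UInt64 holds this bit
            offset = (i - 1) % 64        # position of the bit inside that word
            packed[word] |= UInt64(1) << offset
        end
    end
    return packed
end

# XOR marks differing bits; count_ones tallies them per 64-bit word
packed_hamming(a::Vector{UInt64}, b::Vector{UInt64}) = sum(count_ones.(xor.(a, b)))
```

Each 64-bit word is compared in a single XOR, which is where the speed-up over element-wise Bool comparison comes from.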

source

RAGTools.find_closest — Function

find_closest(
	finder::CosineSimilarity, emb::AbstractMatrix{<:Real},
	query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
	top_k::Int = 100, minimum_similarity::AbstractFloat = -1.0, kwargs...)

Finds the indices of chunks (represented by embeddings in emb) that are closest (in cosine similarity for CosineSimilarity()) to query embedding (query_emb).

finder is the logic used for the similarity search. Default is CosineSimilarity.

If minimum_similarity is provided, only indices with similarity greater than or equal to it are returned. Similarity can be between -1 and 1 (-1 = completely opposite, 1 = exactly the same).

Returns only top_k closest indices.
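The interplay of top_k and minimum_similarity can be sketched in a few lines of plain Julia. This is an illustration only: `cosine_top_k` is a hypothetical function, and the order of operations (truncate to top_k first, then apply the floor) is an assumption rather than a statement about the library.

```julia
using LinearAlgebra

# Hypothetical sketch of cosine top-k with a similarity floor; not the RAGTools code.
function cosine_top_k(emb::AbstractMatrix{<:Real}, query::AbstractVector{<:Real};
        top_k::Int = 100, minimum_similarity::Real = -1.0)
    # cosine similarity of every column with the query
    sims = [dot(col, query) / (norm(col) * norm(query)) for col in eachcol(emb)]
    order = sortperm(sims; rev = true)
    order = order[1:min(top_k, length(order))]
    # drop anything below the floor
    keep = filter(i -> sims[i] >= minimum_similarity, order)
    return keep, sims[keep]
end
```

With the default floor of -1.0 nothing is filtered out, because cosine similarity cannot go below -1.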

source

RAGTools.find_tags — Method

find_tags(method::AnyTagFilter, index::AbstractChunkIndex,
	tag::Union{AbstractString, Regex}; kwargs...)

find_tags(method::AnyTagFilter, index::AbstractChunkIndex,
	tags::Vector{T}; kwargs...) where {T <: Union{AbstractString, Regex}}

Finds the indices of chunks (represented by tags in index) that have ANY OF the specified tag or tags.

source

RAGTools.find_tags — Method

find_tags(method::AllTagFilter, index::AbstractChunkIndex,
	tag::Union{AbstractString, Regex}; kwargs...)

find_tags(method::AllTagFilter, index::AbstractChunkIndex,
	tags::Vector{T}; kwargs...) where {T <: Union{AbstractString, Regex}}

Finds the indices of chunks (represented by tags in index) that have ALL OF the specified tag or tags.
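The difference between ANY OF and ALL OF matching is easiest to see on a toy example. The sketch below is plain Julia with hypothetical helper names; it assumes each chunk simply carries a vector of string tags, which is a simplification of how an index stores them.

```julia
# Hypothetical sketch: each chunk carries a vector of string tags.
chunk_tags = [["julia", "performance"], ["julia"], ["python"]]

# A wanted tag can be an exact string or a Regex
matches(tag::AbstractString, candidate::AbstractString) = tag == candidate
matches(tag::Regex, candidate::AbstractString) = occursin(tag, candidate)

# Does this chunk carry the wanted tag?
has_tag(tags, wanted) = any(t -> matches(wanted, t), tags)

# ANY: at least one wanted tag is present; ALL: every wanted tag is present
find_any(chunk_tags, wanted) = findall(tags -> any(w -> has_tag(tags, w), wanted), chunk_tags)
find_all(chunk_tags, wanted) = findall(tags -> all(w -> has_tag(tags, w), wanted), chunk_tags)
```

For a single tag the two filters return the same chunks; they only diverge once several tags are requested.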

source

RAGTools.find_tags — Method

find_tags(method::NoTagFilter, index::AbstractChunkIndex,
	tags::Union{T, AbstractVector{<:T}}; kwargs...) where {T <: Union{AbstractString, Regex, Nothing}}

No filtering is applied, ie, all chunks in the index are kept. The method signals this by returning nothing (easier for dispatch).

source

RAGTools.generate! — Method

generate!(
	generator::AbstractGenerator, index::AbstractDocumentIndex, result::AbstractRAGResult;
	verbose::Integer = 1,
	api_kwargs::NamedTuple = NamedTuple(),
	contexter::AbstractContextBuilder = generator.contexter,
	contexter_kwargs::NamedTuple = NamedTuple(),
	answerer::AbstractAnswerer = generator.answerer,
	answerer_kwargs::NamedTuple = NamedTuple(),
	refiner::AbstractRefiner = generator.refiner,
	refiner_kwargs::NamedTuple = NamedTuple(),
	postprocessor::AbstractPostprocessor = generator.postprocessor,
	postprocessor_kwargs::NamedTuple = NamedTuple(),
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...)

Generates the response using the provided generator, index, and result. It is the second step in the RAG pipeline (after retrieve).

Returns the mutated result with the result.final_answer and the full conversation saved in result.conversations[:final_answer].

Notes

The default flow is build_context! -> answer! -> refine! -> postprocess!.
contexter is the method to use for building the context, eg, simply enumerate the context chunks with ContextEnumerator.
answerer is the standard answer generation step with LLMs.
refiner step allows the LLM to critique itself and refine its own answer.
postprocessor step allows for additional processing of the answer, eg, logging, saving conversations, etc.
All of its sub-routines operate by mutating the result object (and adding their part).
Discover available sub-types for each step with subtypes(AbstractRefiner) and similar for other abstract types.

Arguments

generator::AbstractGenerator: The generator to use for generating the answer. Can be SimpleGenerator or AdvancedGenerator.
index::AbstractDocumentIndex: The index containing chunks and sources.
result::AbstractRAGResult: The result containing the context and question to generate the answer for.
verbose::Integer: If >0, enables verbose logging.
api_kwargs::NamedTuple: API parameters that will be forwarded to ALL of the API calls (aiembed, aigenerate, and aiextract).
contexter::AbstractContextBuilder: The method to use for building the context. Defaults to generator.contexter, eg, ContextEnumerator.
contexter_kwargs::NamedTuple: API parameters that will be forwarded to the contexter call.
answerer::AbstractAnswerer: The method to use for generating the answer. Defaults to generator.answerer, eg, SimpleAnswerer.
answerer_kwargs::NamedTuple: API parameters that will be forwarded to the answerer call. Examples:

- `model`: The model to use for generating the answer. Defaults to `PT.MODEL_CHAT`.
- `template`: The template to use for the `aigenerate` function. Defaults to `:RAGAnswerFromContext`.

refiner::AbstractRefiner: The method to use for refining the answer. Defaults to generator.refiner, eg, NoRefiner.
refiner_kwargs::NamedTuple: API parameters that will be forwarded to the refiner call.

- `model`: The model to use for generating the answer. Defaults to `PT.MODEL_CHAT`.
- `template`: The template to use for the `aigenerate` function. Defaults to `:RAGAnswerRefiner`.

postprocessor::AbstractPostprocessor: The method to use for postprocessing the answer. Defaults to generator.postprocessor, eg, NoPostprocessor.
postprocessor_kwargs::NamedTuple: API parameters that will be forwarded to the postprocessor call.
cost_tracker: An atomic counter to track the total cost of the operations.

See also: retrieve, build_context!, ContextEnumerator, answer!, SimpleAnswerer, refine!, NoRefiner, SimpleRefiner, postprocess!, NoPostprocessor

Examples

Assume we already have `index`

question = "What are the best practices for parallel computing in Julia?"

# Retrieve the relevant chunks - returns RAGResult
result = retrieve(index, question)

# Generate the answer using the default generator, mutates the same result
result = generate!(index, result)

source

RAGTools.get_chunks — Method

get_chunks(chunker::AbstractChunker,
	files_or_docs::Vector{<:AbstractString};
	sources::AbstractVector{<:AbstractString} = files_or_docs,
	verbose::Bool = true,
	separators = ["\n\n", ". ", "\n", " "], max_length::Int = 256)

Chunks the provided files_or_docs into chunks of maximum length max_length (if possible with provided separators).

Supports two modes of operation:

chunker = FileChunker(): The function opens each file in files_or_docs and reads its contents.
chunker = TextChunker(): The function assumes that files_or_docs is a vector of strings to be chunked; you MUST provide the corresponding sources.

Arguments

files_or_docs: A vector of valid file paths OR string documents to be chunked.
separators: A list of strings used as separators for splitting the text in each file into chunks. Default is ["\n\n", ". ", "\n", " "]. See recursive_splitter for more details.
max_length: The maximum length of each chunk (if possible with provided separators). Default is 256.
sources: A vector of strings indicating the source of each chunk. Default is equal to files_or_docs (suitable for FileChunker, where the file paths serve as sources).
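To illustrate how a list of separators is tried in order, here is a deliberately simplified sketch. It is not recursive_splitter: `split_recursive` is a hypothetical function, it drops the separators, and it does not merge small pieces back together, which a real chunker would typically do.

```julia
# Hypothetical, simplified sketch of separator-based recursive splitting.
function split_recursive(text::AbstractString, separators::Vector{String}, max_length::Int)
    # short enough already, or no separators left to try
    length(text) <= max_length && return [String(text)]
    isempty(separators) && return [String(text)]
    sep, rest = separators[1], separators[2:end]
    chunks = String[]
    for part in split(text, sep; keepempty = false)
        # pieces that are still too long are split with the next separator
        append!(chunks, split_recursive(part, rest, max_length))
    end
    return chunks
end
```

Note that a piece can still exceed `max_length` when none of the separators occurs in it, which is why the documentation says "if possible with provided separators".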

source

RAGTools.get_embeddings — Method

get_embeddings(embedder::BatchEmbedder, docs::AbstractVector{<:AbstractString};
	verbose::Bool = true,
	model::AbstractString = PT.MODEL_EMBEDDING,
	truncate_dimension::Union{Int, Nothing} = nothing,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	target_batch_size_length::Int = 80_000,
	ntasks::Int = 4 * Threads.nthreads(),
	kwargs...)

Embeds a vector of docs using the provided model (kwarg model) in a batched manner - BatchEmbedder.

BatchEmbedder groups documents into batches of roughly 80K characters per call. Fewer, larger calls reduce network latency, while the size cap helps avoid exceeding API limits.

Notes

docs are assumed to be already chunked into reasonable sizes that fit within the embedding context limit.
If you get errors about exceeding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try reducing the target_batch_size_length parameter (eg, 10_000) and number of tasks ntasks=1. Some providers cannot handle large batch sizes.

Arguments

docs: A vector of strings to be embedded.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
truncate_dimension: The dimensionality of the embeddings to truncate to. Default is nothing; 0 also skips truncation.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
target_batch_size_length: The target length (in characters) of each batch of document chunks sent for embedding. Default is 80_000 characters. Speeds up embedding process.
ntasks: The number of tasks to use for asyncmap. Default is 4 * Threads.nthreads().
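The batching by character count can be pictured with a small sketch. It is an illustration only, with a hypothetical `batch_by_length` helper and a greedy grouping strategy that is an assumption, not necessarily how the library forms its batches.

```julia
# Hypothetical sketch: group documents into batches by total character count.
function batch_by_length(docs::Vector{String}; target_length::Int = 80_000)
    batches = Vector{Vector{String}}()
    current, current_len = String[], 0
    for doc in docs
        # start a new batch once adding this doc would exceed the target
        if !isempty(current) && current_len + length(doc) > target_length
            push!(batches, current)
            current, current_len = String[], 0
        end
        push!(current, doc)
        current_len += length(doc)
    end
    # flush the last, partially filled batch
    isempty(current) || push!(batches, current)
    return batches
end
```

Lowering `target_length` produces more, smaller batches, which mirrors the advice above to reduce target_batch_size_length when a provider rejects large requests.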

source

RAGTools.get_embeddings — Method

get_embeddings(embedder::BinaryBatchEmbedder, docs::AbstractVector{<:AbstractString};
	verbose::Bool = true,
	model::AbstractString = PT.MODEL_EMBEDDING,
	truncate_dimension::Union{Int, Nothing} = nothing,
	return_type::Type = Matrix{Bool},
	cost_tracker = Threads.Atomic{Float64}(0.0),
	target_batch_size_length::Int = 80_000,
	ntasks::Int = 4 * Threads.nthreads(),
	kwargs...)

Embeds a vector of docs using the provided model (kwarg model) in a batched manner and then returns the binary embeddings matrix - BinaryBatchEmbedder.

BinaryBatchEmbedder groups documents into batches of roughly 80K characters per call. Fewer, larger calls reduce network latency, while the size cap helps avoid exceeding API limits.

Notes

docs are assumed to be already chunked into reasonable sizes that fit within the embedding context limit.
If you get errors about exceeding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try reducing the target_batch_size_length parameter (eg, 10_000) and number of tasks ntasks=1. Some providers cannot handle large batch sizes.

Arguments

docs: A vector of strings to be embedded.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
truncate_dimension: The dimensionality of the embeddings to truncate to. Default is nothing.
return_type: The type of the returned embeddings matrix. Default is Matrix{Bool}. Choose BitMatrix to minimize storage requirements, Matrix{Bool} to maximize performance in elementwise-ops.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
target_batch_size_length: The target length (in characters) of each batch of document chunks sent for embedding. Default is 80_000 characters. Speeds up embedding process.
ntasks: The number of tasks to use for asyncmap. Default is 4 * Threads.nthreads().

source

RAGTools.get_embeddings — Method

get_embeddings(embedder::BitPackedBatchEmbedder, docs::AbstractVector{<:AbstractString};
	verbose::Bool = true,
	model::AbstractString = PT.MODEL_EMBEDDING,
	truncate_dimension::Union{Int, Nothing} = nothing,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	target_batch_size_length::Int = 80_000,
	ntasks::Int = 4 * Threads.nthreads(),
	kwargs...)

Embeds a vector of docs using the provided model (kwarg model) in a batched manner and then returns the binary embeddings matrix represented in UInt64 (bit-packed) - BitPackedBatchEmbedder.

BitPackedBatchEmbedder groups documents into batches of roughly 80K characters per call. Fewer, larger calls reduce network latency, while the size cap helps avoid exceeding API limits.

This is the best option for FAST and MEMORY-EFFICIENT storage of embeddings. For retrieval, use BitPackedCosineSimilarity.

Notes

docs are assumed to be already chunked into reasonable sizes that fit within the embedding context limit.
If you get errors about exceeding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try reducing the target_batch_size_length parameter (eg, 10_000) and number of tasks ntasks=1. Some providers cannot handle large batch sizes.

Arguments

docs: A vector of strings to be embedded.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
truncate_dimension: The dimensionality of the embeddings to truncate to. Default is nothing.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
target_batch_size_length: The target length (in characters) of each batch of document chunks sent for embedding. Default is 80_000 characters. Speeds up embedding process.
ntasks: The number of tasks to use for asyncmap. Default is 4 * Threads.nthreads().

See also: unpack_bits, pack_bits, BitPackedCosineSimilarity.

source

RAGTools.get_keywords — Method

get_keywords(
	processor::KeywordsProcessor, docs::AbstractVector{<:AbstractString};
	verbose::Bool = true,
	stemmer = nothing,
	stopwords::Set{String} = Set(STOPWORDS),
	return_keywords::Bool = false,
	min_length::Integer = 3,
	min_term_freq::Int = 1, max_terms::Int = typemax(Int),
	kwargs...)

Generate a DocumentTermMatrix from a vector of docs using the provided stemmer and stopwords.

Arguments

docs: A vector of strings from which to extract keywords.
verbose: A boolean flag for verbose output. Default is true.
stemmer: A stemmer to use for stemming. Default is nothing.
stopwords: A set of stopwords to remove. Default is Set(STOPWORDS).
return_keywords: A boolean flag for returning the keywords. Default is false. Useful for query processing in search time.
min_length: The minimum length of the keywords. Default is 3.
min_term_freq: The minimum frequency a term must have to be included in the vocabulary, eg, min_term_freq = 2 means only terms that appear at least twice will be included.
max_terms: The maximum number of terms to include in the vocabulary, eg, max_terms = 100 means only the 100 most frequent terms will be included.

source

RAGTools.get_tags — Method

get_tags(tagger::NoTagger, docs::AbstractVector{<:AbstractString};
	kwargs...)

Simple no-op that skips any tagging of the documents.

source

RAGTools.get_tags — Method

get_tags(tagger::OpenTagger, docs::AbstractVector{<:AbstractString};
	verbose::Bool = true,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...)

Extracts "tags" (metadata/keywords) from a vector of docs using the provided model (kwarg model).

Arguments

docs: A vector of strings to extract tags from.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for tags extraction. Default is PT.MODEL_CHAT.
template: A template to be used for tags extraction. Default is :RAGExtractMetadataShort.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.

source

RAGTools.get_tags — Method

get_tags(tagger::PassthroughTagger, docs::AbstractVector{<:AbstractString};
	tags::AbstractVector{<:AbstractVector{<:AbstractString}},
	kwargs...)

Pass tags directly as Vector of Vectors of strings (ie, tags[i] is the tags for docs[i]). It then builds the vocabulary from the tags and returns both the tags in matrix form and the vocabulary.
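A sketch of how per-document tags can be turned into a vocabulary and a matrix follows. It is illustrative only: `tags_to_matrix` is a hypothetical name, and the sorted vocabulary and dense Bool matrix are assumptions (the library may order the vocabulary differently or use a sparse matrix).

```julia
# Hypothetical sketch: turn per-document tags into a vocabulary and a Bool matrix.
function tags_to_matrix(tags::Vector{Vector{String}})
    vocab = sort(unique(reduce(vcat, tags)))
    col_of = Dict(tag => j for (j, tag) in enumerate(vocab))
    # rows are documents, columns are vocabulary entries
    mat = falses(length(tags), length(vocab))
    for (i, doc_tags) in enumerate(tags), tag in doc_tags
        mat[i, col_of[tag]] = true
    end
    return mat, vocab
end
```

Row `i` of the matrix then answers "which vocabulary entries does `docs[i]` carry?", which is the form tag filters can scan quickly.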

source

RAGTools.getpropertynested — Function

getpropertynested(
	nt::NamedTuple, parent_keys::Vector{Symbol}, key::Symbol, default = nothing)

Get a property key from a nested NamedTuple nt, where the property is nested under one of the keys in parent_keys.

Useful for nested kwargs where we want to get some property in parent_keys subset (eg, model in retriever_kwargs).

Examples

kw = (; abc = (; def = "x"))
getpropertynested(kw, [:abc], :def)
# Output: "x"

source

RAGTools.hamming_distance — Method

hamming_distance(
	mat::AbstractMatrix{T}, query::AbstractVector{T})::Vector{Int} where {T <: Integer}

Calculates the column-wise Hamming distance between a matrix of binary vectors mat and a single binary vector query.

This is the first-pass ranking for BinaryCosineSimilarity method.

Implementation from domluna's tinyRAG.

source

RAGTools.hcat_truncate — Method

hcat_truncate(
	matrices::AbstractVector{<:AbstractMatrix{T}},
	truncate_dimension::Union{Nothing, Int} = nothing;
	verbose::Bool = false,
) where {T <: Real}

Horizontal concatenation of matrices, with optional truncation of the rows of each matrix to the specified dimension (reducing embedding dimensionality).

More efficient than simple splatting, as the resulting matrix is pre-allocated in one go.

Returns: a Matrix{Float32}

Arguments

matrices::AbstractVector{<:AbstractMatrix{T}}: Vector of matrices to concatenate
truncate_dimension::Union{Nothing,Int}=nothing: Dimension to truncate to, or nothing or 0 to skip truncation. If truncated, the columns will be normalized.
verbose::Bool=false: Whether to print verbose output.

Examples

a = rand(Float32, 1000, 10)
b = rand(Float32, 1000, 20)

c = hcat_truncate([a, b])
size(c) # (1000, 30)

d = hcat_truncate([a, b], 500)
size(d) # (500, 30)
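What "truncate, then normalize the columns" means can be shown on a tiny matrix. The sketch below is plain Julia with a hypothetical `truncate_normalize` helper; it assumes truncation keeps the first `dim` rows and that normalization means scaling each column to unit Euclidean length.

```julia
using LinearAlgebra

# Hypothetical sketch: keep the first `dim` rows and rescale each column to unit length.
function truncate_normalize(emb::AbstractMatrix{<:Real}, dim::Int)
    truncated = Float32.(emb[1:dim, :])
    for col in eachcol(truncated)
        col ./= norm(col)   # in-place; an all-zero column would produce NaN here
    end
    return truncated
end
```

Renormalizing matters because cosine similarity on truncated embeddings is only a plain dot product when each column has unit length again.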

source

RAGTools.idf — Method

idf(dtm::AbstractDocumentTermMatrix)

Get the inverse document frequency vector of an AbstractDocumentTermMatrix.

source

RAGTools.load_text — Method

load_text(chunker::AbstractChunker, input;
	kwargs...)

Load text from input using the provided chunker. Called by get_chunks.

Available chunkers:

FileChunker: The function opens each file in input and reads its contents.
TextChunker: The function assumes that input is a vector of strings to be chunked; you MUST provide the corresponding sources.

source

RAGTools.max_bm25_score — Method

max_bm25_score(dtm::AbstractDocumentTermMatrix, query_tokens::AbstractVector{<:AbstractString}; k1::Float32 = 1.2f0, b::Float32 = 0.75f0, max_tf::Real = 3, min_doc_rel_length::Float32 = 0.5f0)

Returns the maximum BM25 score that can be achieved for a given query, assuming every query term occurs max_tf times and the document has the smallest relative length, min_doc_rel_length. Good for normalizing BM25 scores.

Example

max_score = max_bm25_score(chunkdata(key_index), query_tokens)

source

RAGTools.merge_kwargs_nested — Method

merge_kwargs_nested(nt1::NamedTuple, nt2::NamedTuple)

Merges two nested NamedTuples nt1 and nt2 recursively. The nt2 values will overwrite the nt1 values when overlapping.

Example

kw = (; abc = (; def = "x"))
kw2 = (; abc = (; def = "x", def2 = 2), new = 1)
merge_kwargs_nested(kw, kw2)
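The recursive merge can be sketched in a few lines. This is a hypothetical re-implementation for illustration (`merge_nested` is not a library function); it assumes that only values that are NamedTuples on both sides are merged recursively and that everything else is overwritten by the second argument.

```julia
# Hypothetical sketch of a recursive NamedTuple merge; not the RAGTools code.
function merge_nested(nt1::NamedTuple, nt2::NamedTuple)
    result = nt1
    for (k, v) in pairs(nt2)
        if haskey(result, k) && result[k] isa NamedTuple && v isa NamedTuple
            # both sides are nested: merge them one level down
            merged = merge_nested(result[k], v)
            result = merge(result, NamedTuple{(k,)}((merged,)))
        else
            # otherwise the value from nt2 wins
            result = merge(result, NamedTuple{(k,)}((v,)))
        end
    end
    return result
end
```

A plain `merge` would replace the whole nested `abc` entry, whereas the recursive version keeps keys that exist only in the first NamedTuple.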

source

RAGTools.pack_bits — Method

pack_bits(arr::AbstractMatrix{<:Bool}) -> Matrix{UInt64}
pack_bits(vect::AbstractVector{<:Bool}) -> Vector{UInt64}

Pack a matrix or vector of boolean values into a more compact representation using UInt64.

Arguments (Input)

arr::AbstractMatrix{<:Bool}: A matrix of boolean values where the number of rows must be divisible by 64.

Returns

For arr::AbstractMatrix{<:Bool}: Returns a matrix of UInt64 where each element represents 64 boolean values from the original matrix.

Examples

For vectors:

bin = rand(Bool, 128)
binint = pack_bits(bin)
binx = unpack_bits(binint)
@assert bin == binx

For matrices:

bin = rand(Bool, 128, 10)
binint = pack_bits(bin)
binx = unpack_bits(binint)
@assert bin == binx

source

RAGTools.permutation_step! — Method

permutation_step!(
	result::RankGPTResult; rank_start::Integer = 1, rank_end::Integer = 100, kwargs...)

One sub-step of the RankGPT algorithm permutation ranking within the window of chunks defined by rank_start and rank_end positions.

source

RAGTools.preprocess_tokens — Function

preprocess_tokens(text::AbstractString, stemmer=nothing; stopwords::Union{Nothing,Set{String}}=nothing, min_length::Int=3)

Preprocess provided text by removing numbers, punctuation, and applying stemming for BM25 search index.

Returns a list of preprocessed tokens.

Example

using Snowball  # provides the Stemmer type
stemmer = Snowball.Stemmer("english")
stopwords = Set(["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "some", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"])
text = "This is a sample paragraph to test the functionality of your text preprocessor. It contains a mix of uppercase and lowercase letters, as well as punctuation marks such as commas, periods, and exclamation points! Let's see how your preprocessor handles quotes, like \"this one\", and also apostrophes, like in don't. Will it preserve the formatting of this paragraph, including the indentation and line breaks?"
preprocess_tokens(text, stemmer; stopwords)

source

RAGTools.print_html — Method

print_html([io::IO,] parent_node::AbstractAnnotatedNode)

print_html([io::IO,] rag::AbstractRAGResult; add_sources::Bool = false,
	add_scores::Bool = false, default_styler = HTMLStyler(),
	low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
	medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
	high_styler = HTMLStyler(styles = "", classes = ""), styler_kwargs...)

Pretty-prints the annotation parent_node (or RAGResult) to the io stream (or returns the string) in HTML format (assumes node is styled with styler HTMLStyler).

It wraps each "token" into a span with requested styling (HTMLStyler's properties classes and styles). It also replaces new lines with <br> for better HTML formatting.

For any non-HTML styler, it prints the content as plain text.

Returns

nothing if io is provided
or the string with HTML-formatted text (if io is not provided, we print the result out)

See also HTMLStyler, annotate_support, and set_node_style! for how the styling is applied and what the arguments mean.

Examples

Note: RT is an alias for RAGTools

Simple start directly with the RAGResult:

# set up the text/RAGResult
context = [
	"This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."
rag = RT.RAGResult(; context, final_answer=answer, question="")

# print the HTML
print_html(rag)

Low-level control by creating our AnnotatedNode:

# prepare your HTML styling
styler_kwargs = (;
	default_styler=RT.HTMLStyler(),
	low_styler=RT.HTMLStyler(styles="color:magenta", classes=""),
	medium_styler=RT.HTMLStyler(styles="color:blue", classes=""),
	high_styler=RT.HTMLStyler(styles="", classes=""))

# annotate the text
context = [
	"This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."

parent_node = RT.annotate_support(
	RT.TrigramAnnotater(), answer, context; add_sources=false, add_scores=false, styler_kwargs...)

# print the HTML
print_html(parent_node)

# or to accumulate more nodes
io = IOBuffer()
print_html(io, parent_node)

source

RAGTools.rank_gpt — Method

rank_gpt(chunks::AbstractVector{<:AbstractString}, question::AbstractString;
	verbose::Int = 1, rank_start::Integer = 1, rank_end::Integer = 100,
	window_size::Integer = 20, step::Integer = 10,
	num_rounds::Integer = 1, model::String = "gpt4o", kwargs...)

Ranks the chunks based on their relevance for question. Returns the ranking permutation of the chunks in the order they are most relevant to the question (the first is the most relevant).

Example

result = rank_gpt(chunks, question; rank_start=1, rank_end=25, window_size=8, step=4, num_rounds=3, model="gpt4o")

Reference

[1] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents by W. Sun et al.
[2] RankGPT Github

source

RAGTools.rank_sliding_window! — Method

rank_sliding_window!(
	result::RankGPTResult; verbose::Int = 1, rank_start = 1, rank_end = 100,
	window_size = 20, step = 10, model::String = "gpt4o", kwargs...)

One single pass of the RankGPT algorithm permutation ranking across all positions between rank_start and rank_end.

source

RAGTools.receive_permutation! — Method

receive_permutation!(
	curr_rank::AbstractVector{<:Integer}, response::AbstractString;
	rank_start::Integer = 1, rank_end::Integer = 100)

Extracts and heals the permutation to contain all ranking positions.
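"Healing" a permutation can be pictured as follows. The sketch is a simplified, hypothetical stand-in (`heal_permutation` is not a library function and ignores rank_start and rank_end); it assumes the response lists positions as integers and that missing positions are appended in their original order.

```julia
# Hypothetical sketch of extracting and healing a ranking from an LLM response.
function heal_permutation(response::AbstractString, n::Int)
    # pull every integer out of the response text, eg, "[2] > [1] > [2]"
    ranks = [parse(Int, m.match) for m in eachmatch(r"\d+", response)]
    # keep the first occurrence of each valid position
    ranks = unique(filter(x -> 1 <= x <= n, ranks))
    # append positions the model forgot, in their original order
    missing_ranks = setdiff(1:n, ranks)
    return vcat(ranks, missing_ranks)
end
```

The result is always a full permutation of `1:n`, even when the model repeats, omits, or invents positions.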

source

RAGTools.reciprocal_rank_fusion — Method

reciprocal_rank_fusion(args...; k::Int=60)

Merges multiple rankings and calculates the reciprocal rank score for each chunk (discounted by the inverse of the rank).

Example

positions1 = [1, 3, 5, 7, 9]
positions2 = [2, 4, 6, 8, 10]
positions3 = [2, 4, 6, 11, 12]

merged_positions, scores = reciprocal_rank_fusion(positions1, positions2, positions3)
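For intuition, reciprocal rank fusion gives each chunk a score of 1 / (k + rank) per ranking and sums these across rankings. The sketch below is a self-contained illustration with a hypothetical `rrf_sketch` function; the return shape (merged positions plus a score dictionary) is chosen for the example and is not a statement about the library's exact output.

```julia
# Hypothetical sketch of reciprocal rank fusion; not the RAGTools code.
function rrf_sketch(rankings::Vector{Vector{Int}}; k::Int = 60)
    scores = Dict{Int, Float64}()
    for ranking in rankings
        for (rank, pos) in enumerate(ranking)
            # each appearance contributes 1 / (k + rank)
            scores[pos] = get(scores, pos, 0.0) + 1 / (k + rank)
        end
    end
    # highest fused score first
    merged = sort(collect(keys(scores)); by = pos -> -scores[pos])
    return merged, scores
end
```

Chunks that appear near the top of several rankings accumulate the largest scores, while a larger `k` flattens the difference between neighboring ranks.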

source

RAGTools.reciprocal_rank_fusion — Method

reciprocal_rank_fusion(mcc::MultiCandidateChunks; k::Int = 60)

Calculates joint ranking via the reciprocal rank fusion algorithm. Utility wrapper for hybrid MultiIndex that wraps embeddings and keywords for the SAME CHUNKS!!

!! It only works for two indices with the exact same chunks and chunk positions

Example

# start with document positions and scores from two indices
positions1 = [1, 3, 5, 7, 9]
scores1 = [0.9, 0.8, 0.7, 0.6, 0.5]
positions2 = [2, 4, 6, 8, 10]
scores2 = [0.5, 0.6, 0.7, 0.8, 0.9]

# mimic the MultiCandidateChunks struct as if it came from :index1a and :index1b
mcc = RT.MultiCandidateChunks(
	vcat(fill(:index1a, length(positions1)), fill(:index1b, length(positions2))),
	vcat(positions1, positions2),
	vcat(scores1, scores2))

mcc_merged = reciprocal_rank_fusion(mcc; k = 60)

source

RAGTools.reciprocal_rank_fusion — Method

reciprocal_rank_fusion(
	positions1::AbstractVector{<:Integer}, scores1::AbstractVector{<:T},
	positions2::AbstractVector{<:Integer},
	scores2::AbstractVector{<:T}; k::Int = 60) where {T <: Real}

Merges two sets of rankings and their joint scores. Calculates the reciprocal rank score for each chunk (discounted by the inverse of the rank).

Example

positions1 = [1, 3, 5, 7, 9]
scores1 = [0.9, 0.8, 0.7, 0.6, 0.5]
positions2 = [2, 4, 6, 8, 10]
scores2 = [0.5, 0.6, 0.7, 0.8, 0.9]

merged_pos, scores_dict = reciprocal_rank_fusion(positions1, scores1, positions2, scores2; k = 60)

# Create a CandidateChunks from the merged positions and scores
cc = CandidateChunks(:my_index, merged_pos, [scores_dict[pos] for pos in merged_pos])

source

RAGTools.refine! — Method

refine!(
	refiner::NoRefiner, index::AbstractChunkIndex, result::AbstractRAGResult;
	kwargs...)

Simple no-op function for refine!. It simply copies result.answer and result.conversations[:answer] into result.final_answer and result.conversations[:final_answer] without any changes.

source

RAGTools.refine! — Method

refine!(
	refiner::SimpleRefiner, index::AbstractDocumentIndex, result::AbstractRAGResult;
	verbose::Bool = true,
	model::AbstractString = PT.MODEL_CHAT,
	template::Symbol = :RAGAnswerRefiner,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...)

Gives the model a chance to refine the answer (using the same or different context than previously provided).

This method uses the same context as the original answer, however, it can be modified to do additional retrieval and use a different context.

Returns

Mutated result with result.final_answer and the full conversation saved in result.conversations[:final_answer]

Arguments

refiner::SimpleRefiner: The method to use for refining the answer. Uses aigenerate.
index::AbstractDocumentIndex: The index containing chunks and sources.
result::AbstractRAGResult: The result containing the context and question to generate the answer for.
model::AbstractString: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
verbose::Bool: If true, enables verbose logging.
template::Symbol: The template to use for the aigenerate function. Defaults to :RAGAnswerRefiner.
cost_tracker: An atomic counter to track the cost of the operation.

source

RAGTools.refine! — Method

refine!(
	refiner::TavilySearchRefiner, index::AbstractDocumentIndex, result::AbstractRAGResult;
	verbose::Bool = true,
	model::AbstractString = PT.MODEL_CHAT,
	include_answer::Bool = true,
	max_results::Integer = 5,
	include_domains::AbstractVector{<:AbstractString} = String[],
	exclude_domains::AbstractVector{<:AbstractString} = String[],
	template::Symbol = :RAGWebSearchRefiner,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...)

Refines the answer by executing a web search using the Tavily API. This method aims to enhance the answer's accuracy and relevance by incorporating information retrieved from the web.

Note: The web results and web answer (if requested) will be added to the context and sources!

Returns

Mutated result with result.final_answer and the full conversation saved in result.conversations[:final_answer].
In addition, the web results and web answer (if requested) are appended to the result.context and result.sources for correct highlighting and verification.

Arguments

refiner::TavilySearchRefiner: The method to use for refining the answer. Uses aigenerate with a web search template.
index::AbstractDocumentIndex: The index containing chunks and sources.
result::AbstractRAGResult: The result containing the context and question to generate the answer for.
model::AbstractString: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
include_answer::Bool: If true, includes the answer from Tavily in the web search.
max_results::Integer: The maximum number of results to return.
include_domains::AbstractVector{<:AbstractString}: A list of domains to include in the search results. Default is an empty list.
exclude_domains::AbstractVector{<:AbstractString}: A list of domains to exclude from the search results. Default is an empty list.
verbose::Bool: If true, enables verbose logging.
template::Symbol: The template to use for the aigenerate function. Defaults to :RAGWebSearchRefiner.
cost_tracker: An atomic counter to track the cost of the operation.

Example

refine!(TavilySearchRefiner(), index, result)
# See result.final_answer or pprint(result)

To enable this refiner in a full RAG pipeline, simply swap the component in the config:

cfg = RT.RAGConfig()
cfg.generator.refiner = RT.TavilySearchRefiner()

result = airag(cfg, index; question, return_all = true)
pprint(result)

source

RAGTools.rephrase — Method

rephrase(rephraser::HyDERephraser, question::AbstractString;
	verbose::Bool = true,
	model::String = PT.MODEL_CHAT, template::Symbol = :RAGQueryHyDE,
	cost_tracker = Threads.Atomic{Float64}(0.0))

Rephrases the question using the provided rephraser template (default is :RAGQueryHyDE).

Special flavor of rephrasing using HyDE (Hypothetical Document Embedding) method, which aims to find the documents most similar to a synthetic passage that would be a good answer to our question.

Returns both the original and the rephrased question.

Arguments

rephraser: Type that dictates the logic of rephrasing step.
question: The question to be rephrased.
model: The model to use for rephrasing. Default is PT.MODEL_CHAT.
template: The rephrasing template to use. Default is :RAGQueryHyDE. Find more with aitemplates("rephrase").
verbose: A boolean flag indicating whether to print verbose logging. Default is true.

source

RAGTools.rephrase — Method

rephrase(rephraser::NoRephraser, question::AbstractString; kwargs...)

No-op, simple passthrough.

source

RAGTools.rephrase — Method

rephrase(rephraser::SimpleRephraser, question::AbstractString;
	verbose::Bool = true,
	model::String = PT.MODEL_CHAT, template::Symbol = :RAGQueryOptimizer,
	cost_tracker = Threads.Atomic{Float64}(0.0), kwargs...)

Rephrases the question using the provided rephraser template.

Returns both the original and the rephrased question.

Arguments

rephraser: Type that dictates the logic of rephrasing step.
question: The question to be rephrased.
model: The model to use for rephrasing. Default is PT.MODEL_CHAT.
template: The rephrasing template to use. Default is :RAGQueryOptimizer. Find more with aitemplates("rephrase").
verbose: A boolean flag indicating whether to print verbose logging. Default is true.

source

RAGTools.rerank — Method

rerank(
	reranker::CohereReranker, 
	index::AbstractDocumentIndex, 
	question::AbstractString,
	candidates::AbstractCandidateChunks;
	verbose::Bool = false,
	api_key::AbstractString = PT.COHERE_API_KEY,
	top_n::Integer = length(candidates.scores),
	model::AbstractString = "rerank-english-v3.0",
	return_documents::Bool = false,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...
)

Re-ranks a list of candidate chunks using the Cohere Rerank API. See https://cohere.com/rerank for more details.

Arguments

reranker: Using Cohere API
index: The index that holds the underlying chunks to be re-ranked.
question: The query to be used for the search.
candidates: The candidate chunks to be re-ranked.
top_n: The number of most relevant documents to return. Default is length(candidates.scores).
model: The model to use for reranking. Default is rerank-english-v3.0.
return_documents: A boolean flag indicating whether to return the reranked documents in the response. Default is false.
verbose: A boolean flag indicating whether to print verbose logging. Default is false.
cost_tracker: An atomic counter to track the cost of the retrieval. Not implemented/tracked (cost unclear). Provided for consistency.

source

RAGTools.rerank — Method

rerank(
	reranker::RankGPTReranker, 
	index::AbstractDocumentIndex, 
	question::AbstractString,
	candidates::AbstractCandidateChunks;
	api_key::AbstractString = PT.OPENAI_API_KEY,
	model::AbstractString = PT.MODEL_CHAT,
	verbose::Bool = false,
	top_n::Integer = length(candidates.scores),
	unique_chunks::Bool = true,
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...
)

Re-ranks a list of candidate chunks using the RankGPT algorithm. See https://github.com/sunnweiwei/RankGPT for more details.

It uses LLM calls to rank the candidate chunks.

Arguments

reranker: Using the RankGPT algorithm (LLM calls).
index: The index that holds the underlying chunks to be re-ranked.
question: The query to be used for the search.
candidates: The candidate chunks to be re-ranked.
top_n: The number of most relevant documents to return. Default is length(candidates.scores).
model: The model to use for reranking. Default is PT.MODEL_CHAT.
verbose: A boolean flag indicating whether to print verbose logging. Default is false.
unique_chunks: A boolean flag indicating whether to remove duplicates from the candidate chunks prior to reranking (saves compute time). Default is true.

Examples

index = <some index>
question = "What are the best practices for parallel computing in Julia?"

cfg = RAGConfig(; retriever = SimpleRetriever(; reranker = RT.RankGPTReranker()))
msg = airag(cfg, index; question, return_all = true)

To get full verbosity of logs, set verbose = 5 (anything higher than 3).

msg = airag(cfg, index; question, return_all = true, verbose = 5)

Reference

[1] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents by W. Sun et al.
[2] RankGPT Github

source

RAGTools.retrieve — Method

retrieve(
	retriever::AbstractRetriever,
	index::AbstractChunkIndex,
	question::AbstractString;
	verbose::Integer = 1,
	top_k::Integer = 100,
	top_n::Integer = 5,
	api_kwargs::NamedTuple = NamedTuple(),
	rephraser::AbstractRephraser = retriever.rephraser,
	rephraser_kwargs::NamedTuple = NamedTuple(),
	embedder::AbstractEmbedder = retriever.embedder,
	embedder_kwargs::NamedTuple = NamedTuple(),
	processor::AbstractProcessor = retriever.processor,
	processor_kwargs::NamedTuple = NamedTuple(),
	finder::AbstractSimilarityFinder = retriever.finder,
	finder_kwargs::NamedTuple = NamedTuple(),
	tagger::AbstractTagger = retriever.tagger,
	tagger_kwargs::NamedTuple = NamedTuple(),
	filter::AbstractTagFilter = retriever.filter,
	filter_kwargs::NamedTuple = NamedTuple(),
	reranker::AbstractReranker = retriever.reranker,
	reranker_kwargs::NamedTuple = NamedTuple(),
	cost_tracker = Threads.Atomic{Float64}(0.0),
	kwargs...,
)

Retrieves the most relevant chunks from the index for the given question and returns them in the RAGResult object.

This is the main entry point for the retrieval stage of the RAG pipeline. It is often followed by the generate! step.

Notes:

The default flow is rephrase -> get_embeddings -> find_closest -> get_tags -> find_tags -> rerank.

The arguments correspond to the steps of the retrieval process (rephrasing, embedding, finding similar docs, tagging, filtering by tags, reranking). You can customize each step by providing a new custom type that dispatches the corresponding function, eg, create your own type struct MyReranker<:AbstractReranker end and define the custom method for it rerank(::MyReranker,...) = ....
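
The dispatch pattern described above can be sketched without loading the package. Everything below (`SketchReranker`, `NoOpReranker`, `TopScoreReranker`, `sketch_rerank`) is a hypothetical stand-in, not the RAGTools API. With the real package you would subtype `AbstractReranker` and add a method to `rerank` instead.

```julia
# Stand-in abstract type so the snippet runs on its own (hypothetical name).
abstract type SketchReranker end

struct NoOpReranker <: SketchReranker end
struct TopScoreReranker <: SketchReranker end

# Passthrough: keep the incoming order, truncated to top_n.
function sketch_rerank(::NoOpReranker, positions, scores; top_n = length(positions))
    return positions[1:min(top_n, length(positions))]
end

# Custom strategy: order candidates by descending score, then truncate.
function sketch_rerank(::TopScoreReranker, positions, scores; top_n = length(positions))
    order = sortperm(scores; rev = true)
    return positions[order[1:min(top_n, length(order))]]
end

positions = [10, 20, 30]
scores = [0.2, 0.9, 0.5]
kept_default = sketch_rerank(NoOpReranker(), positions, scores; top_n = 2)
kept_custom = sketch_rerank(TopScoreReranker(), positions, scores; top_n = 2)
```

The calling code stays the same for both strategies. Only the type of the first argument selects which method runs.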

Note: Discover available retrieval sub-types for each step with subtypes(AbstractRephraser) and similar for other abstract types.

If you're using locally-hosted models, you can pass the api_kwargs with the url field set to the model's URL and make sure to provide corresponding model kwargs to rephraser, embedder, and tagger to use the custom models (they make AI calls).

Arguments

retriever: The retrieval method to use. Default is SimpleRetriever but could be AdvancedRetriever for more advanced retrieval.
index: The index that holds the chunks and sources to be retrieved from.
question: The question to be used for the retrieval.
verbose: If >0, it prints out verbose logging. Default is 1. If you set it to 2, it will print out logs for each sub-function.
top_k: The TOTAL number of closest chunks to return from find_closest. Default is 100. If there are multiple rephrased questions, the number of chunks per question will be top_k ÷ number_of_rephrased_questions.
top_n: The TOTAL number of most relevant chunks to return for the context (from rerank step). Default is 5.
api_kwargs: Additional keyword arguments to be passed to the API calls (shared by all ai* calls).
rephraser: Transform the question into one or more questions. Default is retriever.rephraser.
rephraser_kwargs: Additional keyword arguments to be passed to the rephraser.

- `model`: The model to use for rephrasing. Default is `PT.MODEL_CHAT`.
- `template`: The rephrasing template to use. Default is `:RAGQueryOptimizer` or `:RAGQueryHyDE` (depending on the `rephraser` selected).

embedder: The embedding method to use. Default is retriever.embedder.
embedder_kwargs: Additional keyword arguments to be passed to the embedder.
processor: The processor method to use when using Keyword-based index. Default is retriever.processor.
processor_kwargs: Additional keyword arguments to be passed to the processor.
finder: The similarity search method to use. Default is retriever.finder, often CosineSimilarity.
finder_kwargs: Additional keyword arguments to be passed to the similarity finder.
tagger: The tag generating method to use. Default is retriever.tagger.
tagger_kwargs: Additional keyword arguments to be passed to the tagger. Noteworthy arguments:

- `tags`: Directly provide the tags to use for filtering (can be String, Regex, or Vector{String}). Useful for `tagger = PassthroughTagger`.

filter: The tag matching method to use. Default is retriever.filter.
filter_kwargs: Additional keyword arguments to be passed to the tag filter.
reranker: The reranking method to use. Default is retriever.reranker.
reranker_kwargs: Additional keyword arguments to be passed to the reranker.

- `model`: The model to use for reranking. Default is `rerank-english-v3.0` if you use `reranker = CohereReranker()`.

cost_tracker: An atomic counter to track the cost of the retrieval. Default is Threads.Atomic{Float64}(0.0).
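
To make the `top_k` budget split concrete, here is the arithmetic as a small sketch. The number of rephrased questions is a made-up value for illustration.

```julia
top_k = 100
n_rephrased = 3  # hypothetical number of questions produced by the rephraser

# Integer division: each question gets an equal share, the remainder is dropped.
per_question = top_k ÷ n_rephrased
total_candidates = per_question * n_rephrased
```

With these values the total number of candidates can end up slightly below `top_k`, because the remainder of the division is not redistributed in this sketch.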

See also: SimpleRetriever, AdvancedRetriever, build_index, rephrase, get_embeddings, get_keywords, find_closest, get_tags, find_tags, rerank, RAGResult.

Examples

Find the 5 most relevant chunks from the index for the given question.

# assumes you have an existing index `index`
retriever = SimpleRetriever()

result = retrieve(retriever,
	index,
	"What is the capital of France?",
	top_n = 5)

# or use the default retriever (same as above)
result = retrieve(retriever,
	index,
	"What is the capital of France?",
	top_n = 5)

Apply more advanced retrieval with question rephrasing and reranking (requires COHERE_API_KEY). We will obtain top 100 chunks from embeddings (top_k) and top 5 chunks from reranking (top_n).

retriever = AdvancedRetriever()

result = retrieve(retriever, index, question; top_k=100, top_n=5)

You can use the retriever to customize your retrieval strategy or directly change the strategy types in the retrieve kwargs!

Example of using a locally-hosted model served at localhost:8080:

retriever = SimpleRetriever()
result = retrieve(retriever, index, question;
	rephraser_kwargs = (; model = "custom"),
	embedder_kwargs = (; model = "custom"),
	tagger_kwargs = (; model = "custom"), api_kwargs = (;
		url = "http://localhost:8080"))

source

RAGTools.run_qa_evals — Method

run_qa_evals(index::AbstractChunkIndex, qa_items::AbstractVector{<:QAEvalItem};
	api_kwargs::NamedTuple = NamedTuple(),
	airag_kwargs::NamedTuple = NamedTuple(),
	qa_evals_kwargs::NamedTuple = NamedTuple(),
	verbose::Bool = true, parameters_dict::Dict{Symbol, <:Any} = Dict{Symbol, Any}())

Evaluates a vector of QAEvalItems and returns a vector QAEvalResult. This function assesses the relevance and accuracy of the answers generated in a QA evaluation context.

See ?run_qa_evals for more details.

Arguments

qa_items::AbstractVector{<:QAEvalItem}: The vector of QA evaluation items containing the questions and their answers.
verbose::Bool: If true, enables verbose logging. Defaults to true.
api_kwargs::NamedTuple: Parameters that will be forwarded to the API calls. See ?aiextract for details.
airag_kwargs::NamedTuple: Parameters that will be forwarded to airag calls. See ?airag for details.
qa_evals_kwargs::NamedTuple: Parameters that will be forwarded to run_qa_evals calls. See ?run_qa_evals for details.
parameters_dict::Dict{Symbol, Any}: Track any parameters used for later evaluations. Keys must be Symbols.

Returns

Vector{QAEvalResult}: Vector of evaluation results that includes various scores and metadata related to the QA evaluation.

Example

index = "..." # Assuming a proper index is defined
qa_items = [QAEvalItem(question="What is the capital of France?", answer="Paris", context="France is a country in Europe."),
			QAEvalItem(question="What is the capital of Germany?", answer="Berlin", context="Germany is a country in Europe.")]

# Let's run a test with `top_k=5`
results = run_qa_evals(index, qa_items; airag_kwargs=(;top_k=5), parameters_dict=Dict(:top_k => 5))

# Filter out the "failed" calls
results = filter(x->!isnothing(x.answer_score), results);

# See average judge score (`mean` requires `using Statistics`)
mean(x->x.answer_score, results)

source

RAGTools.run_qa_evals — Method

run_qa_evals(qa_item::QAEvalItem, ctx::RAGResult; verbose::Bool = true,
			 parameters_dict::Dict{Symbol, <:Any}, judge_template::Symbol = :RAGJudgeAnswerFromContext,
			 model_judge::AbstractString, api_kwargs::NamedTuple = NamedTuple()) -> QAEvalResult

Evaluates a single QAEvalItem using RAG details (RAGResult) and returns a QAEvalResult structure. This function assesses the relevance and accuracy of the answers generated in a QA evaluation context.

Arguments

qa_item::QAEvalItem: The QA evaluation item containing the question and its answer.
ctx::RAGResult: The RAG result used for generating the QA pair, including the original context and the answers. Comes from airag(...; return_all=true)
verbose::Bool: If true, enables verbose logging. Defaults to true.
parameters_dict::Dict{Symbol, Any}: Track any parameters used for later evaluations. Keys must be Symbols.
judge_template::Symbol: The template symbol for the AI model used to judge the answer. Defaults to :RAGJudgeAnswerFromContext.
model_judge::AbstractString: The AI model used for judging the answer's quality. Defaults to the standard chat model, but it is advisable to use a more powerful model such as GPT-4.
api_kwargs::NamedTuple: Parameters that will be forwarded to the API endpoint.

Returns

QAEvalResult: An evaluation result that includes various scores and metadata related to the QA evaluation.

Notes

The function computes a retrieval score and rank based on how well the context matches the QA context.
It then uses the judge_template and model_judge to score the answer's accuracy and relevance.
In case of errors during evaluation, the function logs a warning (if verbose is true) and the answer_score will be set to nothing.

Examples

Evaluating a QA pair using a specific context and model:

qa_item = QAEvalItem(question="What is the capital of France?", answer="Paris", context="France is a country in Europe.")
ctx = RAGResult(source="Wikipedia", context="France is a country in Europe.", answer="Paris")
parameters_dict = Dict(:param1 => "value1", :param2 => "value2") # keys must be Symbols

eval_result = run_qa_evals(qa_item, ctx, parameters_dict=parameters_dict, model_judge="MyAIJudgeModel")

source

RAGTools.score_retrieval_hit — Method

Returns 1.0 if context overlaps or is contained within any of the candidate_context.

source

RAGTools.score_retrieval_rank — Method

Returns the Integer rank of the position where context overlaps or is contained within a candidate_context.
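
The two retrieval scores can be sketched as follows. This assumes "overlaps or is contained" means substring containment in either direction, and that a miss yields `0.0` for the hit score and `nothing` for the rank. The function names are hypothetical and the package may implement these checks differently.

```julia
# Assumption: two texts "overlap" if either one contains the other.
overlaps_sketch(a::AbstractString, b::AbstractString) = occursin(a, b) || occursin(b, a)

# 1.0 if any candidate overlaps with the reference context, else 0.0.
function hit_sketch(context, candidates)
    return any(c -> overlaps_sketch(context, c), candidates) ? 1.0 : 0.0
end

# Position of the first overlapping candidate, or `nothing` if there is none.
function rank_sketch(context, candidates)
    return findfirst(c -> overlaps_sketch(context, c), candidates)
end

candidates = ["Germany is a country.", "France is a country in Europe. Paris is its capital."]
hit = hit_sketch("France is a country in Europe.", candidates)
rank = rank_sketch("France is a country in Europe.", candidates)
miss_rank = rank_sketch("Spain", candidates)
```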

source

RAGTools.score_to_unit_scale — Method

score_to_unit_scale(x::AbstractVector{T}) where T<:Real

Shift and scale a vector of scores to the unit scale [0, 1].

Example

x = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled_x = score_to_unit_scale(x)
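
For intuition, shifting and scaling to the unit interval is commonly done with min-max scaling. The sketch below uses a hypothetical `unit_scale_sketch` and is not the package source. In particular, mapping a constant vector to zeros is an assumption made here to avoid dividing by zero.

```julia
function unit_scale_sketch(x::AbstractVector{<:Real})
    lo, hi = extrema(x)
    # Assumption: a constant vector has no spread, so return zeros.
    hi == lo && return zeros(Float64, length(x))
    # Shift by the minimum, then divide by the range.
    return (x .- lo) ./ (hi - lo)
end

scaled = unit_scale_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
```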

source

RAGTools.set_node_style! — Method

set_node_style!(::TrigramAnnotater, node::AnnotatedNode;
	low_threshold::Float64 = 0.0, medium_threshold::Float64 = 0.5, high_threshold::Float64 = 1.0,
	default_styler::AbstractAnnotationStyler = Styler(),
	low_styler::AbstractAnnotationStyler = Styler(color = :magenta, bold = false),
	medium_styler::AbstractAnnotationStyler = Styler(color = :blue, bold = false),
	high_styler::AbstractAnnotationStyler = Styler(color = :nothing, bold = false),
	bold_multihits::Bool = false)

Sets the style of node based on the provided rules.

source

RAGTools.setpropertynested — Method

setpropertynested(nt::NamedTuple, parent_keys::Vector{Symbol},
	key::Symbol,
	value)

Setter for a property key in a nested NamedTuple nt, where the property is nested to a key in parent_keys.

Useful for nested kwargs where we want to change some property in parent_keys subset (eg, model in retriever_kwargs).

Examples

kw = (; abc = (; def = "x"))
setpropertynested(kw, [:abc], :def, "y")
# Output: (abc = (def = "y",),)

Practical example of changing all model keys in CHAT-based steps in the pipeline:

# changes :model to "gpt4t" whenever the parent key is in the below list (chat-based steps)
setpropertynested(kwargs,
	[:rephraser_kwargs, :tagger_kwargs, :answerer_kwargs, :refiner_kwargs],
	:model, "gpt4t")

Or changing an embedding model (across both indexer and retriever steps, because it's same step name):

kwargs = setpropertynested(
		kwargs, [:embedder_kwargs],
		:model, "text-embedding-3-large"
	)
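
Since NamedTuples are immutable, a setter like this has to build a new NamedTuple. The sketch below shows the idea for one level of nesting only, using a hypothetical `set_nested_sketch`. The package function may handle deeper nesting and other edge cases differently.

```julia
function set_nested_sketch(nt::NamedTuple, parent_keys::Vector{Symbol}, key::Symbol, value)
    # Rebuild each top-level value; only matching parents get the new property.
    updated = map(keys(nt)) do k
        v = nt[k]
        if k in parent_keys && v isa NamedTuple
            merge(v, NamedTuple{(key,)}((value,)))
        else
            v
        end
    end
    return NamedTuple{keys(nt)}(updated)
end

kw = (; abc = (; def = "x"), other = (; def = "keep"))
new_kw = set_nested_sketch(kw, [:abc], :def, "y")
```

The original `kw` is left unchanged, which is why the practical examples above reassign the result to `kwargs`.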

source

RAGTools.split_into_code_and_sentences — Method

split_into_code_and_sentences(input::Union{String, SubString{String}})

Splits text block into code or text and sub-splits into units.

If code block, it splits by newline but keeps the group_id the same (to have the same source).
If text block, it splits into sentences, bullets, etc., and provides a different group_id for each (to have a different source).

source

RAGTools.tags_extract — Method

tags_extract(item::Tag)
tags_extract(tags::Vector{Tag})

Extracts the Tag item into a string of the form category:::value (lowercased and spaces replaced with underscores).

Example

msg = aiextract(:RAGExtractMetadataShort; return_type=MaybeTags, text="I like package DataFrames", instructions="None.")
metadata = tags_extract(msg.content.items)
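
The resulting string format can be illustrated without calling any model. The helper `tag_string_sketch` is hypothetical, and it assumes that both the category and the value are lowercased.

```julia
# Join category and value with ":::", lowercase, and replace spaces with underscores.
function tag_string_sketch(category::AbstractString, value::AbstractString)
    return replace(lowercase(string(category, ":::", value)), " " => "_")
end

tag = tag_string_sketch("Package", "Data Frames")
```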

source

RAGTools.tavily_api — Method

tavily_api(;
	api_key::AbstractString,
	endpoint::String = "search",
	url::AbstractString = "https://api.tavily.com",
	http_kwargs::NamedTuple = NamedTuple(),
	kwargs...)

Sends API requests to Tavily and returns the response.

source

RAGTools.tf — Method

tf(dtm::AbstractDocumentTermMatrix)

Get the term frequency matrix of an AbstractDocumentTermMatrix.

source

RAGTools.token_with_boundaries — Method

token_with_boundaries(
	prev_token::Union{Nothing, AbstractString}, curr_token::AbstractString,
	next_token::Union{Nothing, AbstractString})

Joins the three tokens together. Useful to add boundary tokens (like spaces vs brackets) to the curr_token to improve the matched context (ie, separate partial matches from exact match)

source

RAGTools.tokenize — Method

tokenize(input::Union{String, SubString{String}})

Tokenizes provided input by spaces, special characters or Julia symbols (eg, =>).

Unlike other tokenizers, it aims to be lossless, ie, to keep both the separated text and the separators.
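
To show what "lossless" means here, the sketch below uses a hypothetical regex-based `tokenize_sketch`. The pattern is an assumption and will not match the package's tokenization rules exactly. The point is only that joining the tokens reproduces the input.

```julia
function tokenize_sketch(input::AbstractString)
    # Alternatives: the Julia pair operator, word runs, whitespace runs, any other single character.
    pattern = r"=>|\w+|\s+|[^\w\s]"
    return String[m.match for m in eachmatch(pattern, input)]
end

input = "x => f(y)"
tokens = tokenize_sketch(input)
roundtrip = join(tokens)
```

Because every character falls into one of the alternatives, nothing is dropped and `roundtrip` equals `input`.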

source

RAGTools.translate_positions_to_parent — Method

translate_positions_to_parent(index::AbstractChunkIndex, positions::AbstractVector{<:Integer})

Translate positions to the parent index. Useful to convert between positions in a view and the original index.

Used whenever a chunkdata() is used to re-align positions in case index is a view.
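
Conceptually, the translation is an index lookup. In the sketch below, plain vectors stand in for the index and its view, and all variable names are hypothetical.

```julia
parent_chunks = ["a", "b", "c", "d", "e"]

# Assumption: the view keeps these positions of the parent index.
view_positions = [2, 4, 5]

# Positions expressed relative to the view (eg, returned by a search on the view).
positions_in_view = [1, 3]

# Translate back to the parent by indexing into the view's position list.
positions_in_parent = view_positions[positions_in_view]
selected = parent_chunks[positions_in_parent]
```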

source

RAGTools.translate_positions_to_parent — Method

translate_positions_to_parent(
	index::SubChunkIndex, pos::AbstractVector{<:Integer})

Translate positions to the parent index. Useful to convert between positions in a view and the original index.

Used whenever a chunkdata() or tags() are used to re-align positions to the "parent" index.

source

RAGTools.trigram_support! — Method

trigram_support!(parent_node::AnnotatedNode,
	context_trigrams::AbstractVector, trigram_func::F1 = trigrams, token_transform::F2 = identity;
	skip_trigrams::Bool = false, min_score::Float64 = 0.5,
	min_source_score::Float64 = 0.25,
	stop_words::AbstractVector{<:String} = STOPWORDS,
	styler_kwargs...) where {F1 <: Function, F2 <: Function}

Find if the parent_node.content is supported by the provided context_trigrams.

Logic:

Split the parent_node.content into tokens
Create an AnnotatedNode for each token
If skip_trigrams is enabled, it looks for an exact match in the context_trigrams
If no exact match found, it counts trigram-based match (include the surrounding tokens for better contextual awareness) as a score
Then it sets the style of the node based on the score
Lastly, it aligns the styles of neighboring nodes with score==nothing (eg, single character tokens)
Then, it rolls up the scores and sources to the parent node

For diagnostics, you can use AbstractTrees.print_tree(parent_node) to see the tree structure of each token and its score.
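
The trigram-based scoring step in the logic above can be approximated as the share of a token's trigrams that also appear in the context. The helpers `char_trigrams` and `support_score_sketch` are hypothetical. This sketch ignores the surrounding tokens, stop words, and thresholds that the real function handles.

```julia
# All runs of three consecutive characters in a string.
function char_trigrams(s::AbstractString)
    chars = collect(s)
    return String[String(chars[i:(i + 2)]) for i in 1:(length(chars) - 2)]
end

# Share of the token's trigrams found in the context; `nothing` if the token is too short.
function support_score_sketch(token::AbstractString, context_set::Set{String})
    tg = char_trigrams(token)
    isempty(tg) && return nothing
    return count(in(context_set), tg) / length(tg)
end

context_set = Set(char_trigrams("this is a test"))
score_supported = support_score_sketch("test", context_set)
score_unsupported = support_score_sketch("text", context_set)
score_short = support_score_sketch("a", context_set)
```

Tokens shorter than three characters have no trigrams, which mirrors the `score==nothing` case mentioned in the logic.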

Example

context_trigrams = text_to_trigrams.(["This IS a test.", "Another test.", "More content here."])

node = AnnotatedNode(content = "xyz")
trigram_support!(node, context_trigrams) # updates node.children!

source

RAGTools.trigrams — Method

trigrams(input_string::AbstractString; add_word::AbstractString = "")

Splits provided input_string into a vector of trigrams (combination of three consecutive characters found in the input_string).

If add_word is provided, it is added to the resulting array. Useful to add the full word itself to the resulting array for exact match.
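
As a minimal sketch of the idea (hypothetical `trigrams_sketch`, not the package source), character trigrams can be produced with a sliding window of width three:

```julia
function trigrams_sketch(input_string::AbstractString; add_word::AbstractString = "")
    chars = collect(input_string)
    # Sliding window over characters; strings shorter than 3 yield no trigrams.
    out = String[String(chars[i:(i + 2)]) for i in 1:(length(chars) - 2)]
    # Optionally append the full word to allow exact matches.
    isempty(add_word) || push!(out, String(add_word))
    return out
end

plain = trigrams_sketch("hello")
with_word = trigrams_sketch("hello"; add_word = "hello")
```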

source

RAGTools.trigrams_hashed — Method

trigrams_hashed(input_string::AbstractString; add_word::AbstractString = "")

Splits provided input_string into a Set of hashed trigrams (combination of three consecutive characters found in the input_string).

It is more efficient for lookups in large strings (eg, >100K characters).

If add_word is provided, it is added to the resulting array to hash. Useful to add the full word itself to the resulting array for exact match.
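
The benefit of the hashed variant is that membership checks become set lookups on integers. The sketch below uses a hypothetical `trigrams_hashed_sketch` built on Base `hash`. The package may use a different hashing scheme.

```julia
function trigrams_hashed_sketch(input_string::AbstractString; add_word::AbstractString = "")
    chars = collect(input_string)
    # Hash each three-character window and collect the hashes into a Set.
    hashed = Set{UInt}(hash(String(chars[i:(i + 2)])) for i in 1:(length(chars) - 2))
    isempty(add_word) || push!(hashed, hash(String(add_word)))
    return hashed
end

hashed = trigrams_hashed_sketch("hello"; add_word = "hello")
has_trigram = hash("ell") in hashed
has_word = hash("hello") in hashed
has_other = hash("xyz") in hashed
```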

source

RAGTools.vocab — Method

vocab(dtm::AbstractDocumentTermMatrix)

Get the vocabulary vector of an AbstractDocumentTermMatrix, defined in rag_interface.jl.

source

RAGTools.vocab_lookup — Method

vocab_lookup(dtm::AbstractDocumentTermMatrix)

Get the vocabulary lookup dictionary of an AbstractDocumentTermMatrix.

source

StructTypes.constructfrom — Method

StructTypes.constructfrom(RAGResult, JSON3.read(tmp))

Use as: StructTypes.constructfrom(RAGResult, JSON3.read(tmp))

source

StructTypes.constructfrom — Method

StructTypes.constructfrom(
	::Type{T},
	obj::Union{Dict, JSON3.Object}
) where {T <: Union{CandidateChunks, MultiCandidateChunks}}

Constructor for serialization - opinionated for abstract types!

source