pho[to]rum


#1 2025-02-01 16:18:25

DuaneSilve
New member
Location: Norway, Drammen
Registered: 2025-02-01
Messages: 1
Website

Some Sensitive Topics off Limits On Chinese Chatbot DeepSeek

DeepSeek is a Chinese AI company "committed to making AGI a reality" and to open-sourcing all its models. The company was founded in 2023, but has been making waves over the past month or two, and especially this past week, with the release of its two latest reasoning models: DeepSeek-R1-Zero and the more advanced DeepSeek-R1, also known as DeepSeek Reasoner.

They have released not just the models but also the code and evaluation prompts for public use, along with a comprehensive paper describing their approach.

Aside from producing two highly performant models that are on par with OpenAI's o1 model, the paper contains a lot of valuable information on reinforcement learning, chain-of-thought reasoning, prompt engineering with reasoning models, and more.


We'll start by focusing on the training process of DeepSeek-R1-Zero, which uniquely relied solely on reinforcement learning rather than traditional supervised learning. We'll then move on to DeepSeek-R1, how its reasoning works, and some prompt engineering best practices for reasoning models.


Hey everybody, Dan here, co-founder of PromptHub. Today, we're diving into DeepSeek's latest model release and comparing it with OpenAI's reasoning models, particularly the o1 and o1-mini models. We'll explore their training process, reasoning capabilities, and some key insights into prompt engineering for reasoning models.


DeepSeek is a Chinese AI company committed to open-source development. Its recent release, the R1 reasoning model, is groundbreaking due to its open-source nature and innovative training methods, including open access to the models, prompts, and research papers.


Released on January 20th, DeepSeek's R1 achieved outstanding performance on various benchmarks, rivaling OpenAI's o1 models. Notably, they also released a precursor model, R1-Zero, which serves as the foundation for R1.


Training Process: R1-Zero to R1


R1-Zero: This model was trained solely using reinforcement learning, without supervised fine-tuning, making it the first open-source model to achieve high performance through this approach. Training involved:


- Rewarding correct answers on deterministic tasks (e.g., math problems).
- Encouraging structured reasoning outputs using templates with "<think>" and "<answer>" tags.


Through thousands of iterations, R1-Zero developed longer reasoning chains, self-verification, and even reflective behaviors. For example, during training, the model demonstrated "aha" moments and self-correction behaviors, which are rare in standard LLMs.


R1: Building on R1-Zero, R1 added several enhancements:


- Curated datasets with long chain-of-thought examples.
- Incorporation of R1-Zero-generated reasoning chains.
- Human preference alignment for polished responses.
- Distillation into smaller models (Llama 3.1 and 3.3 at various sizes).


Performance Benchmarks


DeepSeek's R1 model performs on par with OpenAI's o1 models across numerous reasoning benchmarks:


Reasoning and math tasks: R1 rivals or exceeds o1 models in accuracy and depth of reasoning.
Coding tasks: o1 models generally perform better on LiveCodeBench and CodeForces tasks.
SimpleQA: R1 often surpasses o1 on structured QA tasks (e.g., 47% accuracy vs. 30%).


One notable finding is that longer reasoning chains generally improve performance. This aligns with insights from Microsoft's MedPrompt framework and OpenAI's observations on test-time compute and reasoning depth.


Challenges and Observations


Despite its strengths, R1 has some limitations:


- Mixing English and Chinese in responses, due to a lack of supervised fine-tuning.
- Less polished responses compared to chat models like OpenAI's GPT.


These issues were addressed during R1's refinement process, which included supervised fine-tuning and human feedback.


Prompt Engineering Insights


A fascinating takeaway from DeepSeek's research is how few-shot prompting degraded R1's performance compared to zero-shot or concise tailored prompts. This lines up with findings from the MedPrompt paper and OpenAI's recommendation to limit context with reasoning models. Overcomplicating the input can overwhelm the model and reduce accuracy.


DeepSeek's R1 is a substantial step forward for open-source reasoning models, demonstrating capabilities that rival OpenAI's o1. It's an exciting time to experiment with these models and their chat interface, which is free to use.


If you have questions or want to learn more, check out the resources linked below. See you next time!


Training DeepSeek-R1-Zero: A reinforcement-learning-only approach


DeepSeek-R1-Zero stands out from most other state-of-the-art models because it was trained using only reinforcement learning (RL), with no supervised fine-tuning (SFT). This challenges the current conventional approach and opens new opportunities to train reasoning models with less human intervention and effort.


DeepSeek-R1-Zero is the first open-source model to validate that sophisticated reasoning abilities can be developed purely through RL.


Without pre-labeled datasets, the model learns through trial and error, refining its behavior, parameters, and weights based solely on feedback from the solutions it generates.


DeepSeek-R1-Zero is the base model for DeepSeek-R1.


The RL process for DeepSeek-R1-Zero


The training process for DeepSeek-R1-Zero involved presenting the model with various reasoning tasks, ranging from math problems to abstract reasoning challenges. The model generated outputs and was evaluated based on its performance.


DeepSeek-R1-Zero received feedback through a reward system that helped guide its learning process:


Accuracy rewards: Evaluate whether the output is correct. Used when there is a deterministic outcome (e.g., math problems).

Format rewards: Encouraged the model to structure its reasoning within <think> and </think> tags.
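
To make those two signals concrete, here is a minimal Python sketch of how a rule-based reward of this kind could be computed. The paper does not publish reward code in this form, so the regular expressions, the 0.5/1.0 weights, and the expected_answer argument are illustrative assumptions, not DeepSeek's implementation:

import re

def compute_reward(output: str, expected_answer: str) -> float:
    """Toy rule-based reward combining a format check and an accuracy check."""
    reward = 0.0

    # Format reward: reasoning must sit inside <think>...</think>,
    # followed by the final result inside <answer>...</answer>.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", output, re.DOTALL):
        reward += 0.5  # assumed weight

    # Accuracy reward: only applicable to deterministic tasks (e.g., math),
    # where the extracted answer can be compared against a known solution.
    match = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    if match and match.group(1).strip() == expected_answer.strip():
        reward += 1.0  # assumed weight

    return reward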


Training prompt template


To train DeepSeek-R1-Zero to produce structured chain-of-thought sequences, the researchers used the following training prompt template, replacing the prompt placeholder with the reasoning question. You can access it in PromptHub here.
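
The template is not reproduced in this post; paraphrasing from the DeepSeek-R1 paper, it reads approximately as follows (treat the exact wording as an approximation, not a verbatim quote):

A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. User: {prompt}. Assistant: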


This template prompted the model to explicitly lay out its thought process within <think> tags before delivering the final answer in <answer> tags.


The power of RL in reasoning


With this training process, DeepSeek-R1-Zero started to produce sophisticated reasoning chains.


Through thousands of training steps, DeepSeek-R1-Zero evolved to solve increasingly complex problems. It learned to:


- Generate long reasoning chains that enabled deeper and more structured problem-solving.

- Perform self-verification to cross-check its own responses (more on this later).

- Correct its own mistakes, showcasing emergent self-reflective behaviors.




DeepSeek-R1-Zero performance


While DeepSeek-R1-Zero is mostly a precursor to DeepSeek-R1, it still attained high performance on numerous benchmarks. Let's dive into a few of the experiments that were run.


Accuracy improvements during training


- Pass@1 accuracy started at 15.6% and, by the end of training, improved to 71.0%, comparable to OpenAI's o1-0912 model.

- The solid red line represents performance with majority voting (similar to ensembling and self-consistency techniques), which increased accuracy further to 86.7%, exceeding o1-0912.
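
Majority voting (reported as cons@64 in the paper) simply samples many completions per question and keeps the most common final answer, in the style of self-consistency. A minimal sketch, assuming each completion's final answer has already been extracted as a string:

from collections import Counter

def majority_vote(sampled_answers: list[str]) -> str:
    """Return the most frequent final answer among sampled completions."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Example with 64 sampled answers to one problem:
answers = ["42"] * 40 + ["41"] * 20 + ["7"] * 4
print(majority_vote(answers))  # -> "42"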


Next, we'll look at a table comparing DeepSeek-R1-Zero's performance across multiple reasoning datasets against OpenAI's reasoning models.


AIME 2024: 71.0% pass@1, slightly below o1-0912 but above o1-mini; 86.7% cons@64, beating both o1-0912 and o1-mini.

MATH-500: Achieved 95.9%, beating both o1-0912 and o1-mini.

GPQA Diamond: Outperformed o1-mini with a score of 73.3%.

Coding: Performed much worse on coding tasks (CodeForces and LiveCodeBench).




Next, we'll take a look at how the response length increased throughout the RL training process.


This graph shows the length of the model's responses as the training process progresses. Each "step" represents one cycle of the model's learning process, where feedback is provided based on the output's performance, evaluated using the prompt template discussed previously.


For each question (corresponding to one step), 16 responses were sampled, and the average accuracy was computed to ensure a stable evaluation.
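
In other words, the plotted accuracy at each step is an average over those 16 samples rather than a single greedy decode. A short sketch of that estimate (the is_correct checker is my own placeholder for whatever answer verifier is used):

def average_accuracy(samples, expected, is_correct):
    """Pass@1 estimated by averaging correctness over k sampled responses."""
    return sum(is_correct(s, expected) for s in samples) / len(samples)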


As training advances, the model generates longer reasoning chains, allowing it to solve increasingly complex reasoning tasks by leveraging more test-time compute.


While longer chains don't always guarantee better outcomes, they generally correlate with improved performance, a pattern also observed in the MedPrompt paper (learn more about it here) and in the original o1 paper from OpenAI.


Aha moment and self-verification


One of the coolest aspects of DeepSeek-R1-Zero's development (which also applies to the flagship R1 model) is just how good the model became at reasoning. Advanced reasoning behaviors emerged that were not explicitly programmed but arose through its reinforcement learning process.


Over thousands of training steps, the model began to self-correct, re-evaluate flawed reasoning, and verify its own solutions, all within its chain of thought.


An example of this noted in the paper, described as the "aha moment," is shown below in red text.


In this instance, the model literally said, "That's an aha moment." Through DeepSeek's chat feature (their version of ChatGPT), this type of reasoning usually surfaces with phrases like "Wait a minute" or "Wait, but ..."


Limitations and challenges of DeepSeek-R1-Zero


While DeepSeek-R1-Zero was able to perform at a high level, there were some downsides to the model.


Language mixing and coherence issues: The model occasionally produced responses that mixed languages (Chinese and English).

Reinforcement learning trade-offs: The lack of supervised fine-tuning (SFT) meant the model lacked the refinement needed for fully polished, human-aligned outputs.

DeepSeek-R1 was developed to address these issues!


What is DeepSeek-R1?


DeepSeek-R1 is an open-source reasoning model from the Chinese AI lab DeepSeek. It builds on DeepSeek-R1-Zero, which was trained entirely with reinforcement learning. Unlike its predecessor, DeepSeek-R1 incorporates supervised fine-tuning, making it more refined. Notably, it outperforms OpenAI's o1 model on several benchmarks; more on that later.


What are the main differences between DeepSeek-R1 and DeepSeek-R1-Zero?


DeepSeek-R1 builds on the foundation of DeepSeek-R1-Zero, which serves as the base model. The two differ in their training methods and overall performance.


1. Training approach


DeepSeek-R1-Zero: Trained entirely with reinforcement learning (RL) and no supervised fine-tuning (SFT).

DeepSeek-R1: Uses a multi-stage training pipeline that starts with supervised fine-tuning (SFT), followed by the same reinforcement learning process that DeepSeek-R1-Zero went through. SFT helps improve coherence and readability.


2. Readability & Coherence


DeepSeek-R1-Zero: Struggled with language mixing (English and Chinese) and readability issues. Its reasoning was strong, but its outputs were less polished.

DeepSeek-R1: Addressed these issues with cold-start fine-tuning, making responses clearer and more structured.


3. Performance


DeepSeek-R1-Zero: Still a very strong reasoning model, in some cases beating OpenAI's o1, but its language mixing problems greatly reduced usability.

DeepSeek-R1: Outperforms R1-Zero and OpenAI's o1 on most reasoning benchmarks, and its responses are far more polished.


Simply put, DeepSeek-R1-Zero was a proof of concept, while DeepSeek-R1 is the fully optimized version.


How DeepSeek-R1 was trained


To tackle the readability and coherence issues of R1-Zero, the researchers incorporated a cold-start fine-tuning phase and a multi-stage training pipeline when developing DeepSeek-R1:


Cold-Start Fine-Tuning:


- Researchers prepared a high-quality dataset of long chain-of-thought examples for initial supervised fine-tuning (SFT). This data was collected using:

- Few-shot prompting with detailed CoT examples.

- Post-processed outputs from DeepSeek-R1-Zero, refined by human annotators.




Reinforcement Learning:


DeepSeek-R1 went through the same RL process as DeepSeek-R1-Zero to further refine its reasoning capabilities.


Human Preference Alignment:


- A secondary RL phase improved the model's helpfulness and harmlessness, ensuring better alignment with user needs.


Distillation to Smaller Models:


- DeepSeek-R1's reasoning capabilities were distilled into smaller, efficient models such as Qwen, Llama-3.1-8B, and Llama-3.3-70B-Instruct.
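
Per the paper, this distillation is plain supervised fine-tuning of the smaller models on reasoning traces generated by DeepSeek-R1, rather than running RL on the students. A rough sketch of assembling such an SFT dataset; the generate_with_r1 callable and the JSONL layout are hypothetical placeholders, not DeepSeek's actual pipeline:

import json

def build_distillation_dataset(questions, generate_with_r1, out_path="distill_sft.jsonl"):
    """Collect teacher (DeepSeek-R1) reasoning traces as SFT examples for a smaller student model."""
    with open(out_path, "w", encoding="utf-8") as f:
        for q in questions:
            trace = generate_with_r1(q)  # teacher's full <think>/<answer> output
            f.write(json.dumps({"prompt": q, "completion": trace}) + "\n")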




DeepSeek-R1 benchmark performance


The researchers evaluated DeepSeek-R1 across a variety of benchmarks and against top models: o1, o1-mini, GPT-4o, and Claude 3.5 Sonnet.


The benchmarks were broken down into several categories, shown below in the table: English, Code, Math, and Chinese.


Setup


The following parameters were applied across all models:


Maximum generation length: 32,768 tokens.

Sampling configuration:

- Temperature: 0.6.

- Top-p value: 0.95.
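
For reference, here is roughly what those settings look like when calling DeepSeek's OpenAI-compatible API from Python. The base URL, the "deepseek-reasoner" model name, and whether the hosted endpoint accepts every one of these parameters are assumptions to verify against the current API docs:

from openai import OpenAI

# Hypothetical client setup; check DeepSeek's documentation for current values.
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model name for DeepSeek-R1
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    temperature=0.6,   # evaluation setting reported above
    top_p=0.95,        # evaluation setting reported above
    max_tokens=32768,  # maximum generation length used in the evals
)
print(response.choices[0].message.content)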




- DeepSeek-R1 outperformed o1, Claude 3.5 Sonnet, and the other models on the majority of reasoning benchmarks.

- o1 was the best-performing model in four of the five coding-related benchmarks.

- DeepSeek-R1 performed well on creative and long-context tasks, like AlpacaEval 2.0 and ArenaHard, outperforming all other models.




Prompt Engineering with reasoning models


My favorite part of the article was the researchers' observation about DeepSeek-R1's sensitivity to prompts:

This is another data point that lines up with insights from our Prompt Engineering with Reasoning Models Guide, which references Microsoft's research on their MedPrompt framework. In their study with OpenAI's o1-preview model, they found that overwhelming reasoning models with few-shot context degraded performance, a sharp contrast to non-reasoning models.


The key takeaway? Zero-shot prompting with clear and concise instructions appears to work best when using reasoning models.
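
As a concrete illustration of that takeaway (my own example, not one from the paper), the first prompt below is the kind of concise zero-shot instruction that suits a reasoning model, while the second, few-shot version adds context that can actually hurt:

Prefer: "Solve the problem and give only the final number as your answer. Problem: A train travels 120 km in 1.5 hours. What is its average speed in km/h?"

Avoid: "Here are three fully worked examples of similar problems: [example 1] ... [example 2] ... [example 3] ... Now, using the same style, solve: A train travels 120 km in 1.5 hours. What is its average speed in km/h?"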



Offline

 
