Saturday, April 15, 2023

Testing ability of GPT4 and open-source LLMs to detect C++ bugs

Short answer: the open-source models cannot. GPT 3.5 catches about half of the bugs, and GPT 4 catches all of them.

LLaMa 65B (4-bit GPTQ) model: 1 false alarm in 15 good examples.  Detects 0 of 13 bugs.
Baize 30B (8-bit) model: 0 false alarms in 15 good examples.  Detects 1 of 13 bugs.
Galpaca 30B (8-bit) model: 0 false alarms in 15 good examples.  Detects 1 of 13 bugs.
Koala 13B (8-bit) model: 0 false alarms in 15 good examples.  Detects 0 of 13 bugs.
Vicuna 13B (8-bit) model: 2 false alarms in 15 good examples.  Detects 1 of 13 bugs.
Vicuna 7B (FP16) model: 1 false alarm in 15 good examples.  Detects 0 of 13 bugs.

GPT 3.5: 0 false alarms in 15 good examples.  Detects 7 of 13 bugs.
GPT 4: 0 false alarms in 15 good examples.  Detects 13 of 13 bugs.
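
To compare the models on a common scale, the counts above can be turned into a false-alarm rate (false alarms out of 15 good examples) and a detection rate (bugs found out of 13). The following is a minimal sketch that uses only the numbers listed in this post. The names `RESULTS`, `rates`, and `summarize` are made up for illustration and are not taken from the linked repository.

```python
# Counts copied from the results listed above: (false alarms, bugs detected).
# All names here are illustrative and not part of the linked repository.
GOOD_EXAMPLES = 15   # bug-free snippets shown to each model
BUGGY_EXAMPLES = 13  # snippets containing a bug

RESULTS = {
    "LLaMa 65B (4-bit GPTQ)": (1, 0),
    "Baize 30B (8-bit)": (0, 1),
    "Galpaca 30B (8-bit)": (0, 1),
    "Koala 13B (8-bit)": (0, 0),
    "Vicuna 13B (8-bit)": (2, 1),
    "Vicuna 7B (FP16)": (1, 0),
    "GPT 3.5": (0, 7),
    "GPT 4": (0, 13),
}


def rates(false_alarms, detected):
    """Return (false-alarm rate, detection rate) as fractions in [0, 1]."""
    return false_alarms / GOOD_EXAMPLES, detected / BUGGY_EXAMPLES


def summarize(results=RESULTS):
    """Return a list of (model, false-alarm rate, detection rate) rows."""
    rows = []
    for model, (false_alarms, detected) in results.items():
        far, dr = rates(false_alarms, detected)
        rows.append((model, far, dr))
    return rows


if __name__ == "__main__":
    for model, far, dr in summarize():
        print(f"{model:<24} false-alarm rate {far:6.1%}  detection rate {dr:6.1%}")
```

With only 15 good and 13 buggy examples, these rates are coarse: a single extra detection shifts the detection rate by almost 8 percentage points.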

Reproduce my results here: https://github.com/catid/supercharger/tree/main/airate



from Hacker News https://ift.tt/DAKomQO
