Spaces:

whyu
/

MM-Vet_Evaluator

Running

whyu commited on Aug 21, 2023

Commit

0b0d73f

•

1 Parent(s): d67424e

Fix sleep bug

Files changed (1) hide show

app.py CHANGED Viewed

@@ -166,6 +166,7 @@ def grade(file_obj, progress=gr.Progress()):
  grade_sample_run_complete = False
  temperature = 0.0
  while not grade_sample_run_complete:
  try:
  response = openai.ChatCompletion.create(
@@ -206,8 +207,15 @@ def grade(file_obj, progress=gr.Progress()):
  grade_sample_run_complete = True
  except:
  # gpt4 may have token rate limit
  print("sleep 30s")
  time.sleep(30)
  if len(sample_grade['model']) >= j + 1:
  sample_grade['model'][j] = response['model']
@@ -298,7 +306,7 @@ markdown = """
 In this demo, we offer MM-Vet LLM-based (GPT-4) evaluator to grade open-ended outputs from your models.
-Plese upload your json file of your model results containing `\{v1_0\: ..., v1_1\: ..., \}`like [this json file](https://raw.githubusercontent.com/yuweihao/MM-Vet/main/results/llava_llama2_13b_chat.json).
 The grading may last 5 minutes. Sine we only support 1 queue, the grading time may be longer when you need to wait for other users' grading to finish.

  grade_sample_run_complete = False
  temperature = 0.0
+ num_sleep = 0
  while not grade_sample_run_complete:
  try:
  response = openai.ChatCompletion.create(
  grade_sample_run_complete = True
  except:
  # gpt4 may have token rate limit
+ num_sleep += 1
+ if num_sleep > 2:
+ score = 0.0
+ grade_sample_run_complete = True
+ num_sleep = 0
+ continue
  print("sleep 30s")
  time.sleep(30)
  if len(sample_grade['model']) >= j + 1:
  sample_grade['model'][j] = response['model']
 In this demo, we offer MM-Vet LLM-based (GPT-4) evaluator to grade open-ended outputs from your models.
+Plese upload your json file of your model results containing `{v1_0: ..., v1_1: ..., }`like [this json file](https://raw.githubusercontent.com/yuweihao/MM-Vet/main/results/llava_llama2_13b_chat.json).
 The grading may last 5 minutes. Sine we only support 1 queue, the grading time may be longer when you need to wait for other users' grading to finish.