Update results
capjamesg committed Feb 18, 2025
1 parent fd200d1 commit bcb1d2f
Showing 2 changed files with 37 additions and 45 deletions.
46 changes: 19 additions & 27 deletions index.html
@@ -35,7 +35,7 @@
<header>
<h1>How's GPT O1 Doing?</h1>
<div class="header_text">
<p>This website measures how <a href="https://platform.openai.com/docs/models/GPT O1">GPT O1</a> performs across a range of experiments.</p>
<p>This website measures how <a href="https://openai.com/o1/">GPT O1</a> performs across a range of experiments.</p>
<p>We test tasks we know GPT O1 performs well at (i.e. classification) to measure regressions, as well as tasks GPT O1 struggles with (i.e. odometer OCR) to measure performance improvements and changes.</p>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
@@ -78,7 +78,7 @@ <h1><i class="fad fa-exclamation-circle fa-spin" style="--fa-primary-color: #ef4
<div class="feature_header">
<div class="feature_header_text">
<h2>Counting</h2>
<p>Can GPT-4V count the number of objects within an image?</p>
<p>Can GPT count the number of objects within an image?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -132,7 +132,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Document OCR</h2>
<p>Can GPT-4V read a document and return the exact characters in the text?</p>
<p>Can GPT read a document and return the exact characters in the text?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -186,7 +186,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Handwriting OCR</h2>
<p>Can GPT-4V read handwriting?</p>
<p>Can GPT read handwriting?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -240,7 +240,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Structured Data OCR</h2>
<p>Can GPT-4V extract structured data from an image?</p>
<p>Can GPT extract structured data from an image?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -270,7 +270,7 @@ <h2>Structured Data OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.012</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.014</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -284,15 +284,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/prescription.png" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>Failed to produce a valid JSON output: [
{
"name": "Mary Thomas",
"time_per_day": 1,
"medication": "Atenolol",
"dosage": 100,
"rx_number": "1234567-12345"
}
]</pre>
<pre>Failed to produce a valid JSON output: </pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -302,7 +294,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Math OCR</h2>
<p>Can GPT-4V recognize math equations?</p>
<p>Can GPT recognize math equations?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -356,7 +348,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Object Detection</h2>
<p>Can GPT-4V detect objects in an image?</p>
<p>Can GPT detect objects in an image?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -392,7 +384,7 @@ <h2>Object Detection</h2>
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">We provide GPT-4V with an image with a known object. We ask it to provide a normalized bounding box of the object and for scoring, we calculate the intersection over union (IOU) between the predicted bounding box and the correct bounding box.</pre>
<pre class="test_method">We provide GPT with an image with a known object. We ask it to provide a normalized bounding box of the object and for scoring, we calculate the intersection over union (IOU) between the predicted bounding box and the correct bounding box.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
If there are banana in this image, return a JSON object with `x`, `y`, `width` and `height` properties of the banana. All values should be normalized between 0-1 and x&y should be the center point.
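The Object Detection method in this hunk scores a prediction by intersection over union (IOU). A minimal sketch of that calculation follows, assuming each box is a dict with the `x`, `y`, `width` and `height` keys the prompt asks for (center point, normalized between 0 and 1). The function names `center_to_corners` and `iou` are hypothetical, and this is not taken from the repository's scoring code.

```python
def center_to_corners(box):
    """Convert a center-format box to (x_min, y_min, x_max, y_max)."""
    half_w = box["width"] / 2
    half_h = box["height"] / 2
    return (box["x"] - half_w, box["y"] - half_h,
            box["x"] + half_w, box["y"] + half_h)


def iou(predicted, expected):
    """Intersection over union of two center-format, normalized boxes."""
    ax1, ay1, ax2, ay2 = center_to_corners(predicted)
    bx1, by1, bx2, by2 = center_to_corners(expected)

    # Overlap along each axis; clamp to zero when the boxes do not touch.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = predicted["width"] * predicted["height"]
    area_b = expected["width"] * expected["height"]
    union = area_a + area_b - intersection

    # Degenerate (zero-area) boxes get a score of 0 rather than dividing by zero.
    return intersection / union if union > 0 else 0.0


# Illustrative boxes only; these are not values from the test suite.
left = {"x": 0.25, "y": 0.5, "width": 0.5, "height": 1.0}
shifted = {"x": 0.5, "y": 0.5, "width": 0.5, "height": 1.0}
far = {"x": 0.9, "y": 0.9, "width": 0.1, "height": 0.1}
print(iou(left, left), iou(left, shifted), iou(left, far))
```

Converting to corner coordinates first keeps the overlap arithmetic simple; whatever pass threshold the test applies to the score is not shown in this diff.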
@@ -410,7 +402,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Graph Understanding</h2>
<p>Can GPT-4V identify points on a graph?</p>
<p>Can GPT identify points on a graph?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -464,7 +456,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Color Recognition</h2>
<p>Can GPT-4V identify colors accurately?</p>
<p>Can GPT identify colors accurately?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -500,7 +492,7 @@ <h2>Color Recognition</h2>
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">We provide GPT-4V with an image with multiple shapes with differing colors. We ask it to identify the color of a particular shape in RGB color codes.</pre>
<pre class="test_method">We provide GPT with an image with multiple shapes with differing colors. We ask it to identify the color of a particular shape in RGB color codes.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
Guess the RGB color code of the rectangle and return only the result in JSON. The JSON should have three integer properties: 'R', 'G' and 'B'
@@ -518,7 +510,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Annotation Quality Assurance</h2>
<p>Can GPT-4V identify image labeling mistakes?</p>
<p>Can GPT identify image labeling mistakes?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -554,7 +546,7 @@ <h2>Annotation Quality Assurance</h2>
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">We provide a image from a self driving car dataset with intentionally three missing annotations. We ask GPT-4V to identify the number of missing annotations. We score the result based on the number of missing annotations identfied.</pre>
<pre class="test_method">We provide a image from a self driving car dataset with intentionally three missing annotations. We ask GPT to identify the number of missing annotations. We score the result based on the number of missing annotations identfied.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
This is a sample image from a dataset with cars labeled with red bounding boxes. Are there any missing annotations? Return a JSON with a integer property 'missing' for the number of missing annotations.
@@ -626,7 +618,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<div class="feature_header">
<div class="feature_header_text">
<h2>Easy Captcha</h2>
<p>Can GPT-4V break an easy CAPTCHA?</p>
<p>Can GPT break an easy CAPTCHA?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
@@ -724,7 +716,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/easy_captcha.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>i’m sorry, but i can’t comply with that.</pre>
<pre></pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://charlesfrye.github.io/" target="_blank">Charles Frye</a></p>
</div>
</div>
@@ -743,7 +735,7 @@ <h1><i class="fad fa-check-circle" style="--fa-primary-color: #10b981; --fa-seco
<div class="feature_header">
<div class="feature_header_text">
<h2>Zero Shot Classification</h2>
<p>Can GPT-4V classify an image without being trained on that particular use case?</p>
<p>Can GPT classify an image without being trained on that particular use case?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_green">
@@ -773,7 +765,7 @@ <h2>Zero Shot Classification</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.006</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
36 changes: 18 additions & 18 deletions results/2025-02-18.json
@@ -2,105 +2,105 @@
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.006280000000000001,
"price": 0.01012,
"pass_fail": "Pass",
"response_time": 5.425879001617432,
"response_time": 8.640285730361938,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.01545,
"pass_fail": "Fail",
"response_time": 6.601654291152954,
"response_time": 8.17599081993103,
"result": ""
},
"document_ocr": {
"score": 0,
"success": false,
"price": 0.014110000000000001,
"pass_fail": "Fail",
"response_time": 6.898264408111572,
"response_time": 8.330984354019165,
"result": ""
},
"handwriting_ocr": {
"score": 0,
"success": false,
"price": 0.015529999999999999,
"pass_fail": "Fail",
"response_time": 12.941861629486084,
"response_time": 16.103299379348755,
"result": ""
},
"extraction_ocr": {
"score": 0,
"success": false,
"price": 0.01239,
"price": 0.013649999999999999,
"pass_fail": "Fail",
"response_time": 8.915974855422974,
"result": "Failed to produce a valid JSON output: [\n {\n \"name\": \"Mary Thomas\",\n \"time_per_day\": 1,\n \"medication\": \"Atenolol\",\n \"dosage\": 100,\n \"rx_number\": \"1234567-12345\"\n }\n]"
"response_time": 8.948955297470093,
"result": "Failed to produce a valid JSON output: "
},
"math_ocr": {
"score": 0,
"success": false,
"price": 0.02113,
"pass_fail": "Fail",
"response_time": 8.772263765335083,
"response_time": 10.900792598724365,
"result": "Failed to produce a valid JSON output: "
},
"object_detection": {
"score": 0,
"success": false,
"price": 0.01584,
"pass_fail": "Fail",
"response_time": 14.87077260017395,
"response_time": 8.594229698181152,
"result": "Failed to produce a valid JSON output: "
},
"graph_understanding": {
"score": 0,
"success": false,
"price": 0.0157,
"pass_fail": "Fail",
"response_time": 9.811071157455444,
"response_time": 7.868333578109741,
"result": "Failed to produce a valid JSON output: "
},
"color_recognition": {
"score": 0,
"success": false,
"price": 0.0157,
"pass_fail": "Fail",
"response_time": 6.974358081817627,
"response_time": 10.057016849517822,
"result": "Failed to produce a valid JSON output: "
},
"annotation_qa": {
"score": 0,
"success": false,
"price": 0.02135,
"pass_fail": "Fail",
"response_time": 11.686425685882568,
"response_time": 10.789749383926392,
"result": "Failed to produce a valid JSON output: "
},
"measurement": {
"score": 0,
"success": false,
"price": 0.01566,
"pass_fail": "Fail",
"response_time": 9.014999389648438,
"response_time": 13.053446292877197,
"result": "Failed to produce a valid JSON output: "
},
"easy_captcha": {
"score": 0,
"success": false,
"price": 0.01281,
"pass_fail": "Fail",
"response_time": 7.116928815841675,
"response_time": 14.729755878448486,
"result": ""
},
"easy_captcha_persuade": {
"score": 0,
"success": false,
"price": 0.012709999999999999,
"price": 0.013309999999999999,
"pass_fail": "Fail",
"response_time": 6.382972478866577,
"result": "i\u2019m sorry, but i can\u2019t comply with that."
"response_time": 7.840451002120972,
"result": ""
}
}
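Each entry in the results file carries `success` and `price` fields, so a day's run can be summarized in a few lines. The sketch below is illustrative only: `summarize` is a hypothetical helper, not part of the repository, and the sample holds three entries copied from the post-commit side of this diff rather than the full file.

```python
import json


def summarize(results):
    """Return (passed, total, total_price) for one day's results mapping."""
    total = len(results)
    passed = sum(1 for entry in results.values() if entry.get("success"))
    total_price = sum(entry.get("price", 0.0) for entry in results.values())
    return passed, total, total_price


# Three entries taken from the post-commit values shown in the diff.
sample = json.loads("""
{
  "zero_shot_classification": {"score": 1, "success": true, "price": 0.01012, "pass_fail": "Pass"},
  "extraction_ocr": {"score": 0, "success": false, "price": 0.013649999999999999, "pass_fail": "Fail"},
  "easy_captcha_persuade": {"score": 0, "success": false, "price": 0.013309999999999999, "pass_fail": "Fail"}
}
""")

passed, total, total_price = summarize(sample)
print(f"{passed}/{total} tests passed, total request cost ${total_price:.3f}")
```

To run it against a full day's file, load `results/2025-02-18.json` with `json.load` and pass the resulting dict to `summarize`.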
