Feature/inference api #117
Conversation
Can you fix the mypy checks?
This PR is incomplete. Users are expected to use the Studio class to interact with the API, not the API methods directly. Methods need to be added to that class that call the API methods (as all of the other functionalities do).
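For reference, a minimal sketch of the wrapper pattern being requested — `api.get_prediction_status` appears in this PR, but the Studio wrapper shown is an assumption, not the merged code:

```python
from cleanlab_studio.internal.api import api  # internal module touched in this PR

class Studio:
    """Public entry point; users call these wrappers rather than the API module."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key

    def get_prediction_status(self, query_id: str) -> dict:
        # Thin wrapper delegating to the internal API method, mirroring how
        # the other functionalities are exposed on this class.
        return api.get_prediction_status(self._api_key, query_id)
```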
… of text input files with actual text data columns used for prediction
a469401 to 9ab9d28
cleanlab_studio/internal/api/api.py
Outdated
if modality == "text":
    header_io = io.StringIO(header)
    input_batch = io.StringIO("\n".join(chain(header_io, batch)))
see comment in backend
Got it. I'll change the lambda function to take care of this. I wanted to avoid touching the lambda as much as possible, since it's not in a state that's easy to fix and deploy right now.
cleanlab_studio/internal/api/api.py
Outdated
def download_prediction_results(result_url: str) -> io.StringIO:
    """Downloads prediction results from presigned URL."""
    res = requests.get(result_url)
    return io.StringIO(res.text)
no need to do this buffering, just pass res.raw
I think we do need this buffering since we want to return a numpy array for the results to the user.
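For illustration, a hedged sketch of both options in this thread; `result_url` is a placeholder, and note that `res.raw` is only usable as a file-like stream when the request sets `stream=True`:

```python
import io

import pandas as pd
import requests

result_url = "https://example.com/results.csv"  # placeholder presigned URL

# Approach in the PR: buffer the entire body into a string, then into StringIO.
res = requests.get(result_url)
predictions = pd.read_csv(io.StringIO(res.text)).to_numpy()

# Reviewer's suggestion: stream the raw response into pandas instead.
res = requests.get(result_url, stream=True)
res.raw.decode_content = True  # transparently decode gzip/deflate bodies
predictions = pd.read_csv(res.raw).to_numpy()  # still yields a numpy array
```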
cleanlab_studio/studio/inference.py
Outdated
# Set timeout to 10 minutes as inference won't take longer than 10 minutes typically and
# to prevent users from getting stuck in this loop indefinitely when there is a failure
failures should be reported in the status, correct?
Yeah, that would be ideal, but there have been cases in the past where the status updates did not happen properly for lambda failures and queries were stuck in the running status even after they had failed. Just wanted to make sure this case doesn't confuse users.
cleanlab_studio/studio/inference.py
Outdated
status = resp["status"]

if status == "error":
    return resp["error_msg"]
we need to raise an error here. returning an error message is not a sensible API for users
Changed to raising an APIError
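A minimal sketch of that change; the `APIError` import path and the response dict are assumptions used for illustration:

```python
from cleanlab_studio.errors import APIError  # assumed import path for the SDK's error type

resp = {"status": "error", "error_msg": "inference failed"}  # example response shape

if resp["status"] == "error":
    # Raise instead of returning the message so failures surface as exceptions.
    raise APIError(resp["error_msg"])
```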
cleanlab_studio/studio/inference.py
Outdated
results_converted: Predictions = pd.read_csv(results).to_numpy()
return results_converted

@functools.singledispatchmethod
should remove (relic from my dev work)
removed
cleanlab_studio/studio/inference.py
Outdated
status: Optional[str] = resp["status"]
# Set timeout to 10 minutes as inference won't take longer than 10 minutes typically and
# to prevent users from getting stuck in this loop indefinitely when there is a failure
timeout = time.time() + 60 * 10
timeout should be user configurable
I'll add timeout as an optional parameter users can pass when calling predict.
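A hedged sketch of what that could look like; the class name, signature, and default value are assumptions, not the merged code:

```python
import time

class Model:
    """Illustrative stub; names and the default value are assumptions."""

    DEFAULT_TIMEOUT = 600  # seconds, matching the 10-minute default in the thread

    def predict(self, batch, timeout: int = DEFAULT_TIMEOUT):
        # timeout becomes an optional keyword users can pass when calling predict
        timeout_limit = time.time() + timeout
        ...  # queue the query, then poll until done or timeout_limit is reached
```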
cleanlab_studio/internal/api/api.py
Outdated
def download_prediction_results(result_url: str) -> io.StringIO:
    """Downloads prediction results from presigned URL."""
    res = requests.get(result_url)
    return io.StringIO(res.text)
It is unnecessary and wasteful to rebuffer the entire result here. Just use res.raw
Removed this function entirely and used the URL directly in pandas to read the results.
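A minimal sketch of that resolution; `result_url` is a placeholder:

```python
import pandas as pd

result_url = "https://example.com/results.csv"  # placeholder presigned URL

# pandas can read a CSV straight from a URL, so the download helper and its
# StringIO buffering can be dropped; one call yields the numpy results.
results_converted = pd.read_csv(result_url).to_numpy()
```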
Failing in dev currently.
Can you let me know which model and test input you tested with?
Looked at the input file here and it seems like there is a formatting error in the header.
cleanlab_studio/studio/inference.py
Outdated
resp = api.get_prediction_status(self._api_key, query_id)
status: Optional[str] = resp["status"]
# Set timeout to prevent users from getting stuck indefinitely when there is a failure
timeout_limit = time.time() + timeout

while status == "running" and time.time() < timeout_limit:
    resp = api.get_prediction_status(self._api_key, query_id)
    status = resp["status"]
    # Sleep so that the while loop doesn't flood the backend with API calls
    time.sleep(3)
Make the timeout configurable and clean up the logic.
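One possible shape for that cleanup, as a sketch; `get_status` stands in for a zero-argument wrapper around `api.get_prediction_status` from this PR, and the structure is illustrative rather than the merged code:

```python
import time

POLL_INTERVAL = 3  # seconds between status checks, as in the loop above

def wait_for_status(get_status, timeout: float) -> str:
    """Polls until the status leaves "running" or `timeout` seconds elapse."""
    deadline = time.time() + timeout
    while True:
        status = get_status()["status"]
        if status != "running" or time.time() >= deadline:
            return status
        time.sleep(POLL_INTERVAL)  # avoid flooding the backend with API calls
```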
lgtm! just a couple nits
From one of our client calls, we found demand for using model deployment in automated data pipelines via the Studio Python API client. In this PR, we build the client-side code for model deployment.
To test, you need to pull the branch for this PR from the backend code.