GPT-2
Predict the next word(s) from a given sequence of words.
Add this to your Python code:
```python
import booste
out_list = booste.gpt2(in_string, length)
```
Arguments:
Arg | Description | Required | Type | Example |
---|---|---|---|---|
in_string | The given sequence of words | True | string | "I went on a walk and suddenly, I" |
length | The number of words to predict after in_string | False (default=5) | int | 10 |
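A minimal sketch of a call that mirrors the table above. The wrapper name `predict_next_words` and its `gpt2_fn` parameter are hypothetical and not part of booste; they exist only so the snippet can be exercised offline with a stand-in. By default the wrapper forwards to `booste.gpt2` using the signature shown earlier.

```python
def predict_next_words(in_string, length=5, gpt2_fn=None):
    """Forward to booste.gpt2, mirroring the documented default length=5.

    gpt2_fn is a hypothetical hook (not part of booste) that lets a
    stand-in function be supplied when no network access is available.
    """
    if gpt2_fn is None:
        import booste  # imported lazily so the wrapper loads without it
        gpt2_fn = booste.gpt2
    return gpt2_fn(in_string, length)


# Example call with the values from the table (needs booste installed):
# out_list = predict_next_words("I went on a walk and suddenly, I", 10)
```

Omitting `length` requests the default of 5 words.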
Return Type:
List - the predicted text, returned as a list of strings.
To convert the list to a string, add:
```python
out_string = " ".join(out_list)
```
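As a small illustration of the conversion, the sketch below joins a made-up `out_list` and appends the result to the prompt. The list contents are invented for the example, not real model output, and it assumes the list holds only the predicted words rather than the prompt itself.

```python
in_string = "I went on a walk and suddenly, I"

# Hypothetical return value; real predictions will differ.
out_list = ["saw", "a", "deer", "by", "the"]

# Join the predicted words with single spaces.
out_string = " ".join(out_list)

# Append the prediction to the original prompt.
full_text = in_string + " " + out_string

print(out_string)
print(full_text)
```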
Response Time:
GPT-2 is a large model. API calls can take about 1 second per requested word (the length argument).
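The rule of thumb above can be turned into a rough estimate, for example to choose a client-side timeout. The helper name `estimate_seconds` is hypothetical, and the one-second-per-word figure is only the approximate rate quoted here, not a guarantee.

```python
def estimate_seconds(length, seconds_per_word=1.0):
    """Rough wall-clock estimate: about one second per predicted word."""
    return length * seconds_per_word


print(estimate_seconds(5))   # default length -> 5.0
print(estimate_seconds(10))  # -> 10.0
```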
Fine-Tuning:
Adjusting these parameters from the default is not recommended.
Arg | Description | Required | Type | Example |
---|---|---|---|---|
temperature | A value between 0.1 and 1, to adjust randomness. Smaller values create seemingly random output. Larger values create repeating phrases in the output. | False (default=0.8) | float | 0.8 |
batch_length | A value 1-50, to manage inference workload. Booste splits the API call into batches to avoid server timeout. Adjusting this does not affect prediction quality. Smaller values reduce total inference time, but require more API calls, so it is only suggested if you have high bandwidth. Larger values can cause server timeout. | False (default=20) | int | 50 |
window_max | A value 1-200, to manage inference workload. GPT-2 can only accept an input string less than 200 words long. Booste uses a "sliding window" approach to handle longer length requests, where the input string is trimmed to the n=window_max most recent words. Smaller values reduce inference time dramatically, but output will drift into unrelated subjects due to lost context. Larger values increase output quality, but can 5x inference time per word and overload the model input limit. | False (default=50) | int | 100 |
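To make the batch_length and window_max descriptions concrete, here is a minimal sketch of a batched sliding-window loop. It is an illustration under stated assumptions, not Booste's actual implementation: the function names `trim_to_window`, `plan_batches`, and `generate`, and the `predict(text, n)` callable standing in for one API call, are all hypothetical.

```python
def trim_to_window(text, window_max=50):
    """Keep only the window_max most recent whitespace-separated words.

    Assumes window_max >= 1, matching the documented 1-200 range.
    """
    words = text.split()
    return " ".join(words[-window_max:])


def plan_batches(length, batch_length=20):
    """Split a request for `length` words into per-call word counts."""
    full, rest = divmod(length, batch_length)
    return [batch_length] * full + ([rest] if rest else [])


def generate(in_string, predict, length=5, batch_length=20, window_max=50):
    """Request `length` words in batches, trimming the context each time.

    `predict(text, n)` is a hypothetical stand-in for one API call that
    returns a list of n predicted words.
    """
    text = in_string
    out_list = []
    for n in plan_batches(length, batch_length):
        context = trim_to_window(text, window_max)  # drop the oldest words
        new_words = predict(context, n)
        out_list.extend(new_words)
        text = text + " " + " ".join(new_words)     # grow the running text
    return out_list


# Offline stand-in that returns placeholder words instead of calling an API.
def fake_predict(context, n):
    return ["word"] * n


print(plan_batches(45, 20))
print(len(generate("I went on a walk", fake_predict, length=45)))
```

In this sketch a smaller batch_length means more calls to `predict`, and a smaller window_max means each call sees less of the earlier text, which is the trade-off the table describes.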