Hi! Is there a way to have Thunk wait and retry later when it encounters potentially temporary errors while browsing a webpage? We frequently run into errors like the ones below, which usually go away when we restart the task. Thank you for your help!
Encountered a timeout error while trying to navigate to the website. Unable to proceed with verifying publisher information and collecting additional data points.
Encountered a WebSocket error while trying to navigate to the website. Unable to proceed with accessing the website for further verification.
Encountered an error while trying to access the website. The URL seems to be invalid or the connection was reset. Unable to proceed with gathering additional information from this source.
This has something to do with the number of concurrent web browser requests being made.
Adding @Scott-Thunk.AI to this thread as he’s actively working on scaling this.
There should be no situation where we end up with a timeout error because our browser infrastructure doesn’t scale.
On the other hand, many websites throttle external load coming from “bots” and block/rate-limit them. The general solution to this has to be to do less at a time and do it more slowly, some kind of retry logic on a longer timeframe – which is I think what you’re asking for.
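To make "do less at a time and do it more slowly" concrete, here is a minimal sketch in Python. It is only an illustration, not how Thunk works internally: the function name `fetch_all_throttled`, the `fetch` callable, and the default limits are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all_throttled(urls, fetch, max_concurrent=2, delay_between=1.0,
                        sleep=time.sleep):
    """Fetch every URL with at most `max_concurrent` requests in flight.

    `fetch` is a caller-supplied function (hypothetical here) that loads
    one URL. Each worker pauses `delay_between` seconds after a request
    before it picks up the next one, which spreads the load out over time.
    """
    def task(url):
        result = fetch(url)
        sleep(delay_between)  # slow down before this worker takes another URL
        return result

    # The pool size caps how many requests can run at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(task, urls))
```

Lowering `max_concurrent` and raising `delay_between` trades speed for a lower chance of being rate-limited by the target site.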
Let’s see what Scott thinks. We’d love to get your workload and experiment with it.
We do retry the page load up to three times with exponential backoff. It is currently a bit aggressive in that it only waits a couple of seconds before the first retry. If all attempts fail, what you see is the last error message. We’ve recently changed some of the settings to have it keep trying for longer. We also have the ability on the back end to change the browser resources we give to a particular thunk.
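For readers unfamiliar with the pattern, retrying with exponential backoff can be sketched roughly as follows. This is not the actual Thunk code: the names `load_with_retry`, `load_page`, and `TransientPageError` are made up for illustration, and the sketch assumes "three times" means three attempts in total, a 2-second first wait, and a doubling factor.

```python
import time


class TransientPageError(Exception):
    """Hypothetical error type for timeouts, WebSocket errors, resets, etc."""


def load_with_retry(load_page, url, max_attempts=3, base_delay=2.0,
                    factor=2.0, sleep=time.sleep):
    """Call `load_page(url)`, retrying on transient errors.

    The wait grows exponentially: base_delay, base_delay * factor, and so on.
    If every attempt fails, the last error is re-raised, so the caller sees
    only the final error message.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return load_page(url)
        except TransientPageError as exc:
            last_error = exc
            # Only wait if another attempt is still coming.
            if attempt < max_attempts - 1:
                sleep(base_delay * factor ** attempt)
    raise last_error
```

Making the retries less aggressive amounts to raising `base_delay` or `max_attempts`, so that a site that is briefly throttling or unreachable has more time to recover.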
If you DM me the URL when you are in the Thunk I can look in the logs and see what we can tune to prevent the errors.
Thanks! I DM’d you an example.
We also ran into a couple other errors that don’t get resolved by trying again later. I will also DM URLs for these:
- The search results did not return any relevant information
- Encountered an error while trying to access the website
Both of these errors occur again when I restart the task later, but I confirmed that a manual Google search does return working, relevant webpages.
The cause of not finding relevant information is that the search was restricted to the US. According to Meena, we can fix this by adding something like the following to the prompt.
‘Do not restrict search to any specific language. If you have the information, you can restrict search to country from the gym’s address. If not leave “language”: “”, “country”:“” in your search parameters.’
For the second issue, I took the URLs from a couple of the failures and tried them in my local browser. I get a DNS resolution error, so the browser can’t connect to the site. This happened both locally on my personal network and, separately, on our service.
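If you want to check this yourself before starting a task, a small script along these lines can tell you whether a URL's hostname resolves at all. It is a rough sketch using Python's standard library, and the helper name `hostname_resolves` is just an example. A hostname that fails here will fail no matter how many times the page load is retried.

```python
import socket
from urllib.parse import urlparse


def hostname_resolves(url, resolver=socket.getaddrinfo):
    """Return True if the hostname in `url` resolves via DNS.

    `resolver` defaults to socket.getaddrinfo and is a parameter only so
    the lookup can be swapped out, for example in tests.
    """
    host = urlparse(url).hostname
    if host is None:
        # No hostname could be parsed, e.g. the URL is missing "https://".
        return False
    try:
        resolver(host, None)
        return True
    except socket.gaierror:
        # DNS lookup failed: the name does not resolve from this network.
        return False
```

For example, `hostname_resolves("https://example.com/about")` looks up `example.com` from whatever network the script runs on.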