When the workflow calls the Thunk web-scraping tool to fetch and extract data from multiple URLs, several pages sometimes return identical JSON output, even though the pages themselves differ. In other words, distinct websites end up producing the same extracted record.
I don’t know anything about this, and I hope this isn’t annoying – here are ChatGPT’s ideas of things to check:
Whether the tool is caching or reusing responses across URLs
Whether the extractor resets completely between pages so data from one run can’t persist into another
Whether a fetch error or timeout causes the tool to reuse the last successful JSON instead of returning an error
Whether the tool’s output can include the final URL and a content hash to confirm each record’s source
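ChatGPT also sketched what that last check might look like, so I'm pasting it here in case it helps. It says all the names (`tag_record`, `source_url`, `content_sha256`) are hypothetical, and it assumes the tool can expose the final URL after redirects and the raw response body:

```python
import hashlib

def tag_record(record: dict, final_url: str, page_bytes: bytes) -> dict:
    """Attach provenance fields to an extracted record.

    `record` is whatever JSON the extractor produced, `final_url` is the
    URL after redirects, and `page_bytes` is the raw response body.
    All field names here are hypothetical, not Thunk's actual schema.
    """
    return {
        **record,
        "source_url": final_url,
        "content_sha256": hashlib.sha256(page_bytes).hexdigest(),
    }

# Two pages that yield identical extracted fields will still differ in
# content_sha256 if their underlying bodies differ, which would expose
# a caching or data-reuse bug in the scraper.
a = tag_record({"title": "x"}, "https://example.com/a", b"page A body")
b = tag_record({"title": "x"}, "https://example.com/b", b"page B body")
```

If two records came back with the same `content_sha256` for different URLs, that would point at response caching; if the hashes differ but the extracted fields are identical, the problem would be downstream in the extractor.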
ChatGPT thinks it’s a data-reuse or caching issue within the scraping component. Maybe some of the above suggestions are ridiculous – I don’t know anything about this stuff… I just know that it makes our work very challenging if we can’t trust the JSON.
(I wish I knew more about this stuff so that I didn’t have to say “ChatGPT said this…!”)
Hi Tony - I’m so sorry - I can’t find an instance of it! I guess I was basically just stating a general concern: different tools and the AI at different points in the workflow seem to be able to bleed into what is being written, so it’s very hard to trust what the Thunk is saying. We know AI can hallucinate and lie, but we would really like some hard-coded guardrails so that we can at least understand what we are looking at and whether it might have been fabricated by the AI. It seems that parts of a workflow step leak into each other, which scares us a lot.
Hi @Rachel, thanks for looking, and I understand your concern. We’re actively working on making it so more of the agent’s work can be done deterministically to reduce these kinds of errors. If you encounter anything, please send it my way.