I was wondering about the search built into the Thunk agent’s Google Drive connection. Through some testing, I got Thunk to search Google Drive by giving it a Shared Drive URL, and it seemed to do a full content search, not just a “keywords found in the title” search. However, I only got the Thunk agent to search Google Drive once out of many attempts, and I haven’t been able to replicate it. The one time it worked, the agent used the “browser_navigate” tool with the correct Google Drive URL; every time it failed, it used “search_documents” instead. Maybe a custom “search_google_drive” tool could be designed to make the search more reliable. A way to force the “browser_navigate” tool might also help.
@tony , thoughts?
From what I remember, we found the built-in Google Drive search (via the API) to be really, really bad, so we didn’t want to expose it as a tool and get back poor results.
@Travis_Frick what kind of search do you want to do? It would be easy to add a tool. The keyword search isn’t great but I can imagine if you have a more specific use case it could work.
@tony I’m looking for a full text search of PDF files within a single Google Drive folder. From a quick look at the Google Drive API, its full text search appears to be mostly keyword based. I feel a keyword search would be fine for the time being, since the AI agent gives lots of flexibility. Initially a full sentence could be entered as the search query; if that finds nothing, the AI agent could reduce the question to keywords and search again with those until it finds some matching files.
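To make the retry idea concrete, here is a minimal sketch of that loop. Everything in it is illustrative: `search` is a hypothetical stand-in for whatever Drive search tool the agent ends up calling, and the stopword list and the "drop the last keyword" narrowing strategy are assumptions, not how Thunk does it.

```python
import re
from typing import Callable, List

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {
    "a", "an", "and", "are", "at", "do", "does", "for", "how", "in", "is",
    "of", "on", "or", "the", "to", "what", "when", "where", "which", "who",
    "why", "with",
}


def extract_keywords(question: str) -> List[str]:
    """Lowercase the question and keep unique non-stopword terms, in order."""
    words = re.findall(r"[a-z0-9']+", question.lower())
    keywords: List[str] = []
    for word in words:
        if word not in STOPWORDS and len(word) > 2 and word not in keywords:
            keywords.append(word)
    return keywords


def search_with_fallback(
    question: str, search: Callable[[str], List[str]]
) -> List[str]:
    """Try the full sentence first, then progressively shorter keyword queries.

    `search` is a hypothetical callable that takes a query string and
    returns a list of matching file names (empty if nothing matched).
    """
    results = search(question)
    if results:
        return results

    keywords = extract_keywords(question)
    # Start with all keywords, then drop the last one each round.
    for count in range(len(keywords), 0, -1):
        results = search(" ".join(keywords[:count]))
        if results:
            return results
    return []
```

In practice the agent itself would pick the keywords, so this only shows the shape of the loop: one broad attempt, then narrower ones until something comes back.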
I’ll add a tool for drive search and will ping you here when it’s available.
You might also be interested in Thunk’s document collections (left menu in your thunk). These allow you to create a custom collection of docs that can serve as background info for your process. With this approach, Thunk creates a search engine over your data, which allows you to create custom fields to extract from document content.
There are situations where you’d want to use a simple google drive search vs a custom content collection, though.
@tony Awesome, thanks! We are working with some others on the Thunk team for a custom content collection, but we have such a large Google Drive folder that it is unknown if the content collection will be efficient at the scale we are thinking of. The Google Drive search can be a backup option for us, and help some others who may be looking for a quick scan through a Google Drive folder!
Again, a note of caution. When we looked at the Google Drive keyword search some months ago, from what I remember it was only on the title or something — basically it was very limited and not at all what the doctor would have ordered :]
There’s a drive search tool now. Here’s an example of it in action.
In this case, I told the agent to go to this folder of biology papers on Drive.
It produced this output for me (as a sheet):
It used the search tool to do this:
It then proceeded to visit each paper and summarize it (a kind of simple RAG).
@praveen-Thunk.AI mentioned some caveats:
- It can search by file name, full text content, parent folder, and mime type
- Its relevance isn’t great - you’ll get an ordered list of search results back, but it is low precision
- I’d recommend telling your AI to use it with exact phrases to get the precision higher, e.g. tell it to put the phrase in quotes, like:
search for the exact phrase "sea turtles"
Details of the backing API can be found here
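For anyone curious what those filters look like at the API level, here is a rough sketch. It assumes the backing API is the Google Drive API v3 `files.list` endpoint and builds a string for its `q` parameter; I haven't confirmed that the Thunk tool builds its queries this way. The helper name `build_drive_query` and the `FOLDER_ID` placeholder are made up for illustration.

```python
from typing import Optional


def _escape(value: str) -> str:
    """Escape backslashes and single quotes for a Drive query string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_drive_query(
    folder_id: Optional[str] = None,
    phrase: Optional[str] = None,
    name: Optional[str] = None,
    mime_type: Optional[str] = None,
    exact: bool = True,
) -> str:
    """Combine the supported filters into one `q` string (hypothetical helper)."""
    clauses = []
    if folder_id:
        clauses.append(f"'{_escape(folder_id)}' in parents")
    if mime_type:
        clauses.append(f"mimeType = '{_escape(mime_type)}'")
    if name:
        clauses.append(f"name contains '{_escape(name)}'")
    if phrase:
        cleaned = phrase.replace('"', "")
        # Wrapping the phrase in double quotes asks for an exact phrase match.
        text = f'"{cleaned}"' if exact else cleaned
        clauses.append(f"fullText contains '{_escape(text)}'")
    clauses.append("trashed = false")
    return " and ".join(clauses)


if __name__ == "__main__":
    print(build_drive_query(
        folder_id="FOLDER_ID",  # placeholder, not a real folder ID
        phrase="sea turtles",
        mime_type="application/pdf",
    ))
```

The double quotes inside the `fullText contains` clause are what turn the keyword match into an exact phrase match, which is the same trick as telling the agent to put the phrase in quotes.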
Let me know how it goes!