Experiments with Custom Bots… and Quirkiness?


I’ve got an upcoming talk as part of a panel here in Egypt that I need to contribute to in Arabic. I’m fluent in Arabic but using it to discuss academic/professional stuff is a little awkward because I work 99% in an English-speaking context.

So, since I’m on a roll experimenting with custom bots, and I’m also working with Anna Mills on assessing genAI’s capacity to provide relevant, correct and real references, I decided to create a custom bot (temperature 0, so it would not hallucinate much) that would do two things: respond to my question with references, and automatically translate the response to Arabic. I used Gemini Pro 1.5 2M via poe.com, and I chose this particular LLM because I assumed Gemini had web search capabilities.

Custom Bot Trial

My first ask of the model was to explain SAMR and PICRAT and give examples (and it automatically provides references and translation, without prompting, because those are my bot’s features, right?).

I noticed three interesting things in this model:

  1. It starts responding, but very frequently flags non-problematic content as violating something or other: “My response to your message was blocked for potentially violating safety policies. I apologize for any inconvenience.” But when I reassured it that there was nothing offensive here and asked it to please proceed, it would proceed. This happened at least three times in a completely benign conversation!
  2. It gets SAMR right (a commonly used model for many years) but gets PICRAT, a lesser-known model, wrong, and it frequently makes up references that don’t exist.
  3. When I kept pushing it to correct itself on PICRAT, it apologized and asked for help. But when I asked it to do a Google search, it anthropomorphized big time, saying “Types furiously on keyboard, squinting at the screen”. Typing? Squinting? Really, was that necessary? It also asked me to give it time and said it would get back to me, but I’m unsure how that “wait and I’ll get back to you” is supposed to work. Then it said it couldn’t find it. Then I gave it a direct reference title and author name, and it pretended to know the author, and proceeded to provide fake references by that author.

Remember, this is a zero temperature custom bot. It should not be making things up so much.
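
For anyone curious what the temperature setting usually controls, here is a minimal sketch, assuming the standard "softmax with temperature" way of picking the next piece of text. I don’t know how Gemini or Poe implement this internally, and the function name, scores and candidate labels below are all made up for illustration.

```python
import math

def token_probabilities(logits, temperature):
    """Turn raw candidate scores into probabilities at a given temperature."""
    if temperature == 0:
        # Temperature 0 is treated here as greedy decoding: all of the
        # probability goes to the highest-scoring candidate.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate continuations (invented numbers,
# not taken from any real model).
candidates = ["plausible but invented citation", "real citation", "I don't know"]
logits = [2.0, 1.5, 0.5]

for temperature in (1.0, 0.5, 0.0):
    probs = token_probabilities(logits, temperature)
    print(temperature, dict(zip(candidates, [round(p, 3) for p in probs])))
```

If this picture holds, lowering the temperature only concentrates the choice on whatever the model already scores highest. At 0 the bot becomes repeatable, but nothing in this step checks the chosen text against a real source.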

What do you think? I’m going to create similar bots with other underlying LLMs and see what I get. I might also try the regular Gemini and Bing, which definitely do search the internet.

Free Gemini Trial

I did try regular Gemini just now, and it finds the PICRAT model directly and offers correct references for both models (the links don’t work, but the titles and sources pan out). So I am unsure why the free version of Gemini is doing better than my custom bot. Is the zero temp getting in the way?

Of note, I had to prompt additionally to get references from free Gemini, and it was able to attribute SAMR to its real author, Puentedura, but was not able to attribute PICRAT to its creator.

Free Bing Copilot Trial

Bing Copilot on Edge responded well to the question, got the two models right and provided references (not always the best ones, but sometimes authoritative ones). We don’t need to prompt Bing Copilot to provide references; it tends to do that anyway. I used it in “precise mode”. When prompted further on the authors who created these models, it answered correctly, with references.

GPT4o Trial via Poe

GPT4o via poe.com did well: it responded correctly, with one of the more complex responses to the question, but had to be prompted to provide references (and then provided correct ones).

Conclusion from this Experiment?

Perhaps I don’t know how to use custom bots well yet, because my custom bot performed the worst, and free Gemini did better than my Gemini Pro 1.5 2M!!!

Also: Bing is better at automatically giving good-ish references, while GPT4o gives more complex responses and is also good at finding refs if prompted.

Featured image of a robot typing and squinting at a screen, generated with DALL-E3 via poe.com
