Boba is an experimental AI co-pilot for product technique & generative ideation,
designed to enhance the inventive ideation course of. It’s an LLM-powered
utility that we’re constructing to find out about:

An AI co-pilot refers to a synthetic intelligence-powered assistant designed
to assist customers with varied duties, typically offering steering, help, and automation
in numerous contexts. Examples of its utility embody navigation methods,
digital assistants, and software program improvement environments. We like to think about a co-pilot
as an efficient companion {that a} consumer can collaborate with to carry out a selected area
of duties.

Boba as an AI co-pilot is designed to enhance the early levels of technique ideation and
idea era, which rely closely on fast cycles of divergent
pondering (also referred to as generative ideation). We usually implement generative ideation
by intently collaborating with our friends, clients and subject material specialists, in order that we will
formulate and take a look at progressive concepts that tackle our clients’ jobs, pains and positive factors.
This begs the query, what if AI might additionally take part in the identical course of? What if we
might generate and consider extra and higher concepts, quicker in partnership with AI? Boba begins to
allow this by utilizing OpenAI’s LLM to generate concepts and reply questions
that may assist scale and speed up the inventive pondering course of. For the primary prototype of
Boba, we determined to concentrate on rudimentary variations of the next capabilities:

1. Analysis indicators and tendencies: Search the net for
articles and information that will help you reply qualitative analysis questions,

2. Artistic Matrix: The inventive matrix is a concepting technique for
sparking new concepts on the intersections of distinct classes or
dimensions. This entails stating a strategic immediate, typically as a “How would possibly
we” query, after which answering that query for every
mixture/permutation of concepts on the intersection of every dimension. For

3. State of affairs constructing: State of affairs constructing is a means of
producing future-oriented tales by researching indicators of change in
enterprise, tradition, and expertise. Situations are used to socialize learnings
in a contextualized narrative, encourage divergent product pondering, conduct
resilience/desirability testing, and/or inform strategic planning. For
instance, you’ll be able to immediate Boba with the next and get a set of future
eventualities based mostly on completely different time horizons and ranges of optimism and

4. Technique ideation: Utilizing the Taking part in to Win technique
framework, brainstorm “the place to play” and “how one can win” selections
based mostly on a strategic immediate and doable future eventualities. For instance you
can immediate it with:

5. Idea era: Primarily based on a strategic immediate, similar to a “how would possibly we” query, generate
a number of product or function ideas, which embody worth proposition pitches and hypotheses to check.

6. Storyboarding: Generate visible storyboards based mostly on a easy
immediate or detailed narrative based mostly on present or future state eventualities. The
key options are:

Utilizing Boba

Boba is an internet utility that mediates an interplay between a human
consumer and a Giant-Language Mannequin, presently GPT 3.5. A easy net
front-end to an LLM simply gives the flexibility for the consumer to converse with
the LLM. That is useful, however means the consumer must learn to
successfully work together the LLM. Even within the quick time that LLMs have seized
the general public curiosity, we have realized that there’s appreciable ability to
setting up the prompts to the LLM to get a helpful reply, leading to
the notion of a “Immediate Engineer”. A co-pilot utility like Boba provides
a spread of UI parts that construction the dialog. This enables a consumer
to make naive prompts which the applying can manipulate, enriching
easy requests with parts that may yield a greater response from the

Boba will help with quite a lot of product technique duties. We cannot
describe all of them right here, simply sufficient to present a way of what Boba does and
to supply context for the patterns later within the article.

When a consumer navigates to the Boba utility, they see an preliminary
display just like this

The left panel lists the varied product technique duties that Boba
helps. Clicking on considered one of these adjustments the primary panel to the UI for
that process. For the remainder of the screenshots, we’ll ignore that process panel
on the left.

The above screenshot seems to be on the state of affairs design process. This invitations
the consumer to enter a immediate, similar to “Present me the way forward for retail”.

The UI gives quite a lot of drop-downs along with the immediate, permitting
the consumer to recommend time-horizons and the character of the prediction. Boba
will then ask the LLM to generate eventualities, utilizing Templated Prompt to complement the consumer’s immediate
with extra parts each from common data of the state of affairs
constructing process and from the consumer’s alternatives within the UI.

Boba receives a Structured Response from the LLM and shows the
consequence as set of UI parts for every state of affairs.

The consumer can then take considered one of these eventualities and hit the discover
button, mentioning a brand new panel with an extra immediate to have a Contextual Conversation with Boba.

Boba takes this immediate and enriches it to concentrate on the context of the
chosen state of affairs earlier than sending it to the LLM.

Boba makes use of Select and Carry Context
to carry onto the varied elements of the consumer’s interplay
with the LLM, permitting the consumer to discover in a number of instructions with out
having to fret about supplying the appropriate context for every interplay.

One of many difficulties with utilizing an
LLM is that it is skilled solely on information as much as some level previously, making
them ineffective for working with up-to-date info. Boba has a
function known as analysis indicators that makes use of Embedded External Knowledge
to mix the LLM with common search
amenities. It takes the prompted analysis question, similar to “How is the
lodge trade utilizing generative AI at the moment?”, sends an enriched model of
that question to a search engine, retrieves the urged articles, sends
every article to the LLM to summarize.

That is an instance of how a co-pilot utility can deal with
interactions that contain actions that an LLM alone is not appropriate for. Not
simply does this present up-to-date info, we will additionally guarantee we
present supply hyperlinks to the consumer, and people hyperlinks will not be hallucinations
(so long as the search engine is not partaking of the improper mushrooms).

Some patterns for constructing generative co-pilot functions

In constructing Boba, we learnt rather a lot about completely different patterns and approaches
to mediating a dialog between a consumer and an LLM, particularly Open AI’s
GPT3.5/4. This checklist of patterns is just not exhaustive and is restricted to the teachings
we have learnt up to now whereas constructing Boba.

Templated Immediate

Use a textual content template to complement a immediate with context and construction

The primary and easiest sample is utilizing a string templates for the prompts, additionally
generally known as chaining. We use Langchain, a library that gives an ordinary
interface for chains and end-to-end chains for frequent functions out of
the field. In the event you’ve used a Javascript templating engine, similar to Nunjucks,
EJS or Handlebars earlier than, Langchain gives simply that, however is designed particularly for
frequent immediate engineering workflows, together with options for operate enter variables,
few-shot immediate templates, immediate validation, and extra subtle composable chains of prompts.

For instance, to brainstorm potential future eventualities in Boba, you’ll be able to
enter a strategic immediate, similar to “Present me the way forward for funds” or perhaps a
easy immediate just like the title of an organization. The consumer interface seems to be like

The immediate template that powers this era seems to be one thing like

You're a visionary futurist. Given a strategic immediate, you'll create
num_scenarios futuristic, hypothetical eventualities that occur
time_horizon from now. Every state of affairs should be a optimism model of the
future. Every state of affairs should be realism.

Strategic immediate: strategic_prompt

As you’ll be able to think about, the LLM’s response will solely be pretty much as good because the immediate
itself, so that is the place the necessity for good immediate engineering is available in.
Whereas this text is just not supposed to be an introduction to immediate
engineering, you’ll discover some methods at play right here, similar to beginning
by telling the LLM to Adopt a
particularly that of a visionary futurist. This was a way we relied on
extensively in varied elements of the applying to provide extra related and
helpful completions.

As a part of our test-and-learn immediate engineering workflow, we discovered that
iterating on the immediate instantly in ChatGPT gives the shortest path from
thought to experimentation and helps construct confidence in our prompts shortly.
Having stated that, we additionally discovered that we spent far more time on the consumer
interface (about 80%) than the AI itself (about 20%), particularly in
engineering the prompts.

We additionally saved our immediate templates so simple as doable, devoid of
conditional statements. Once we wanted to drastically adapt the immediate based mostly
on the consumer enter, similar to when the consumer clicks “Add particulars (indicators,
threats, alternatives)”, we determined to run a unique immediate template
altogether, within the curiosity of conserving our immediate templates from turning into
too advanced and laborious to take care of.

Structured Response

Inform the LLM to reply in a structured information format

Virtually any utility you construct with LLMs will almost definitely must parse
the output of the LLM to create some structured or semi-structured information to
additional function on on behalf of the consumer. For Boba, we wished to work with
JSON as a lot as doable, so we tried many alternative variations of getting
GPT to return well-formed JSON. We have been fairly shocked by how properly and
constantly GPT returns well-formed JSON based mostly on the directions in our
prompts. For instance, right here’s what the state of affairs era response
directions would possibly appear to be:

You'll reply with solely a legitimate JSON array of state of affairs objects.
Every state of affairs object can have the next schema:
    "title": <string>,       //Should be an entire sentence written previously tense
    "abstract": <string>,   //State of affairs description
    "plausibility": <string>,  //Plausibility of state of affairs
    "horizon": <string>

We have been equally shocked by the truth that it might help pretty advanced
nested JSON schemas, even once we described the response schemas in pseudo-code.
Right here’s an instance of how we would describe a nested response for technique

You'll reply in JSON format containing two keys, "questions" and "methods", with the respective schemas under:
    "questions": [<list of question objects, with each containing the following keys:>]
      "query": <string>,           
      "reply": <string>             
    "methods": [<list of strategy objects, with each containing the following keys:>]
      "title": <string>,               
      "abstract": <string>,             
      "problem_diagnosis": <string>, 
      "winning_aspiration": <string>,   
      "where_to_play": <string>,        
      "how_to_win": <string>,           
      "assumptions": <string>          

An fascinating aspect impact of describing the JSON response schema was that we
might additionally nudge the LLM to supply extra related responses within the output. For
instance, for the Artistic Matrix, we would like the LLM to consider many alternative
dimensions (the immediate, the row, the columns, and every concept that responds to the
immediate on the intersection of every row and column):

By offering a few-shot immediate that features a particular instance of the output
schema, we have been capable of get the LLM to “assume” in the appropriate context for every
thought (the context being the immediate, row and column):

You'll reply with a legitimate JSON array, by row by column by thought. For instance:

If Rows = "row 0, row 1" and Columns = "column 0, column 1" then you'll reply
with the next:

    "row": "row 0",
    "columns": [
        "column": "column 0",
        "ideas": [
            "title": "Idea 0 title for prompt and row 0 and column 0",
            "description": "idea 0 for prompt and row 0 and column 0"
        "column": "column 1",
        "concepts": [
            "title": "Idea 0 title for prompt and row 0 and column 1",
            "description": "idea 0 for prompt and row 0 and column 1"
    "row": "row 1",
    "columns": [
        "column": "column 0",
        "ideas": [
            "title": "Idea 0 title for prompt and row 1 and column 0",
            "description": "idea 0 for prompt and row 1 and column 0"
        "column": "column 1",
        "concepts": [
            "title": "Idea 0 title for prompt and row 1 and column 1",
            "description": "idea 0 for prompt and row 1 and column 1"

We might have alternatively described the schema extra succinctly and
usually, however by being extra elaborate and particular in our instance, we
efficiently nudged the standard of the LLM’s response within the route we
wished. We imagine it’s because LLMs “assume” in tokens, and outputting (ie
repeating) the row and column values earlier than outputting the concepts gives extra
correct context for the concepts being generated.

On the time of this writing, OpenAI has launched a brand new function known as
, which
gives a unique method to obtain the purpose of formatting responses. On this
strategy, a developer can describe callable operate signatures and their
respective schemas as JSON, and have the LLM return a operate name with the
respective parameters offered in JSON that conforms to that schema. That is
notably helpful in eventualities while you need to invoke exterior instruments, similar to
performing an internet search or calling an API in response to a immediate. Langchain
additionally gives comparable performance, however I think about they’ll quickly present native
integration between their exterior instruments API and the OpenAI operate calling

Actual-Time Progress

Stream the response to the UI so customers can monitor progress

One of many first few belongings you’ll notice when implementing a graphical
consumer interface on prime of an LLM is that ready for the complete response to
full takes too lengthy. We don’t discover this as a lot with ChatGPT as a result of
it streams the response character by character. This is a vital consumer
interplay sample to remember as a result of, in our expertise, a consumer can
solely wait on a spinner for thus lengthy earlier than dropping persistence. In our case, we
didn’t need the consumer to attend various seconds earlier than they began
seeing a response, even when it was a partial one.

Therefore, when implementing a co-pilot expertise, we extremely suggest
exhibiting real-time progress through the execution of prompts that take extra
than just a few seconds to finish. In our case, this meant streaming the
generations throughout the complete stack, from the LLM again to the UI in real-time.
Luckily, the Langchain and OpenAI APIs present the flexibility to just do

const chat = new ChatOpenAI(
  temperature: 1,
  modelName: 'gpt-3.5-turbo',
  streaming: true,
  callbackManager: onTokenStream ?
      async handleLLMNewToken(token) 
    ) : undefined

This allowed us to supply the real-time progress wanted to create a smoother
expertise for the consumer, together with the flexibility to cease a era
mid-completion if the concepts being generated didn’t match the consumer’s

Nevertheless, doing so provides lots of extra complexity to your utility
logic, particularly on the view and controller. Within the case of Boba, we additionally had
to carry out best-effort parsing of JSON and keep temporal state through the
execution of an LLM name. On the time of penning this, some new and promising
libraries are popping out that make this simpler for net builders. For instance,
the Vercel AI SDK is a library for constructing
edge-ready AI-powered streaming textual content and chat UIs.

Choose and Carry Context

Seize and add related context info to subsequent motion

One of many largest limitations of a chat interface is {that a} consumer is
restricted to a single-threaded context: the dialog chat window. When
designing a co-pilot expertise, we suggest pondering deeply about how one can
design UX affordances for performing actions inside the context of a
choice, just like our pure inclination to level at one thing in actual
life within the context of an motion or description.

Select and Carry Context permits the consumer to slender or broaden the scope of
interplay to carry out subsequent duties – also referred to as the duty context. That is usually
performed by deciding on a number of parts within the consumer interface after which performing an motion on them.
Within the case of Boba, for instance, we use this sample to permit the consumer to have
a narrower, centered dialog about an thought by deciding on it (eg a state of affairs, technique or
prototype idea), in addition to to pick and generate variations of a
idea. First, the consumer selects an thought (both explicitly with a checkbox or implicitly by clicking a hyperlink):

Then, when the consumer performs an motion on the choice, the chosen merchandise(s) are carried over as context into the brand new process,
for instance as state of affairs subprompts for technique era when the consumer clicks “Brainstorm methods and questions for this state of affairs”,
or as context for a pure language dialog when the consumer clicks Discover:

Relying on the character and size of the context
you want to set up for a section of dialog/interplay, implementing
Select and Carry Context might be wherever from very simple to very tough. When
the context is transient and may match right into a single LLM context window (the utmost
dimension of a immediate that the LLM helps), we will implement it by means of immediate
engineering alone. For instance, in Boba, as proven above, you’ll be able to click on “Discover”
on an thought and have a dialog with Boba about that concept. The best way we
implement this within the backend is to create a multi-message chat

const chatPrompt = ChatPromptTemplate.fromPromptMessages([
const formattedPrompt = await chatPrompt.formatPromptValue(
  enter: enter

One other strategy of implementing Select and Carry Context is to take action inside
the immediate by offering the context inside tag delimiters, as proven under. In
this case, the consumer has chosen a number of eventualities and desires to generate
methods for these eventualities (a way typically utilized in state of affairs constructing and
stress testing of concepts). The context we need to carry into the technique
era is assortment of chosen eventualities:

Your questions and methods should be particular to realizing the next
potential future eventualities (if any)

Nevertheless, when your context outgrows an LLM’s context window, or when you want
to supply a extra subtle chain of previous interactions, you could have to
resort to utilizing exterior short-term reminiscence, which generally entails utilizing a
vector retailer (in-memory or exterior). We’ll give an instance of how one can do
one thing comparable in Embedded External Knowledge.

If you wish to be taught extra concerning the efficient use of choice and
context in generative functions, we extremely suggest a chat given by
Linus Lee, of Notion, on the LLMs in Manufacturing convention: “Generative Experiences Beyond Chat”.

Contextual Dialog

Enable direct dialog with the LLM inside a context.

This can be a particular case of Select and Carry Context.
Whereas we wished Boba to interrupt out of the chat window interplay mannequin
as a lot as doable, we discovered that it’s nonetheless very helpful to supply the
consumer a “fallback” channel to converse instantly with the LLM. This enables us
to supply a conversational expertise for interactions we don’t help in
the UI, and help circumstances when having a textual pure language
dialog does take advantage of sense for the consumer.

Within the instance under, the consumer is chatting with Boba a few idea for
customized spotlight reels offered by Rogers Sportsnet. The entire
context is talked about as a chat message (“On this idea, Uncover a world of
sports activities you like…”), and the consumer has requested Boba to create a consumer journey for
the idea. The response from the LLM is formatted and rendered as Markdown:

When designing generative co-pilot experiences, we extremely suggest
supporting contextual conversations together with your utility. Be certain that to
provide examples of helpful messages the consumer can ship to your utility so
they know what sort of conversations they will have interaction in. Within the case of
Boba, as proven within the screenshot above, these examples are supplied as
message templates below the enter field, similar to “Are you able to be extra

Out-Loud Pondering

Inform LLM to generate intermediate outcomes whereas answering

Whereas LLMs don’t truly “assume”, it’s price pondering metaphorically
a few phrase by Andrei Karpathy of OpenAI: “LLMs ‘think’ in
What he means by this
is that GPTs are inclined to make extra reasoning errors when making an attempt to reply a
query straight away, versus while you give them extra time (i.e. extra tokens)
to “assume”. In constructing Boba, we discovered that utilizing Chain of Thought (CoT)
prompting, or extra particularly, asking for a series of reasoning earlier than an
reply, helped the LLM to purpose its approach towards higher-quality and extra
related responses.

In some elements of Boba, like technique and idea era, we ask the
LLM to generate a set of questions that increase on the consumer’s enter immediate
earlier than producing the concepts (methods and ideas on this case).

Whereas we show the questions generated by the LLM, an equally efficient
variant of this sample is to implement an inside monologue that the consumer is
not uncovered to. On this case, we might ask the LLM to assume by means of their
response and put that interior monologue right into a separate a part of the response, that
we will parse out and ignore within the outcomes we present to the consumer. A extra elaborate
description of this sample might be present in OpenAI’s GPT Best Practices
, within the
part Give GPTs time to

As a consumer expertise sample for generative functions, we discovered it useful
to share the reasoning course of with the consumer, wherever applicable, in order that the
consumer has extra context to iterate on the following motion or immediate. For
instance, in Boba, figuring out the sorts of questions that Boba considered provides the
consumer extra concepts about divergent areas to discover, or to not discover. It additionally
permits the consumer to ask Boba to exclude sure lessons of concepts within the subsequent
iteration. In the event you do go down this path, we suggest making a UI affordance
for hiding a monologue or chain of thought, similar to Boba’s function to toggle
examples proven above.

Iterative Response

Present affordances for the consumer to have a back-and-forth
interplay with the co-pilot

LLMs are certain to both misunderstand the consumer’s intent or just
generate responses that don’t meet the consumer’s expectations. Therefore, so is
your generative utility. Probably the most highly effective capabilities that
distinguishes ChatGPT from conventional chatbots is the flexibility to flexibly
iterate on and refine the route of the dialog, and therefore enhance
the standard and relevance of the responses generated.

Equally, we imagine that the standard of a generative co-pilot
expertise depends upon the flexibility of a consumer to have a fluid back-and-forth
interplay with the co-pilot. That is what we name the Iterate on Response
sample. This could contain a number of approaches:

  • Correcting the unique enter offered to the applying/LLM
  • Refining part of the co-pilot’s response to the consumer
  • Offering suggestions to nudge the applying in a unique route

One instance of the place we’ve carried out Iterative Response
Boba is in Storyboarding. Given a immediate (both transient or elaborate), Boba
can generate a visible storyboard, which incorporates a number of scenes, with every
scene having a story script and a picture generated with Secure
Diffusion. For instance, under is a partial storyboard describing the expertise of a
“Resort of the Future”:

Since Boba makes use of the LLM to generate the Secure Diffusion immediate, we don’t
understand how good the pictures will end up–so it’s a little bit of a hit and miss with
this function. To compensate for this, we determined to supply the consumer the
capability to iterate on the picture immediate in order that they will refine the picture for
a given scene. The consumer would do that by merely clicking on the picture,
updating the Secure Diffusion immediate, and urgent Performed, upon which Boba
would generate a brand new picture with the up to date immediate, whereas preserving the
remainder of the storyboard:

One other instance Iterative Response that we
are presently engaged on is a function for the consumer to supply suggestions
to Boba on the standard of concepts generated, which might be a mix
of Select and Carry Context and Iterative Response. One
strategy could be to present a thumbs up or thumbs down on an thought, and
letting Boba incorporate that suggestions into a brand new or subsequent set of
suggestions. One other strategy could be to supply conversational
suggestions within the type of pure language. Both approach, we want to
do that in a method that helps reinforcement studying (the concepts get
higher as you present extra suggestions). A superb instance of this may be
Github Copilot, which demotes code ideas which were ignored by
the consumer in its rating of subsequent greatest code ideas.

We imagine that this is among the most essential, albeit
generically-framed, patterns to implementing efficient generative
experiences. The difficult half is incorporating the context of the
suggestions into subsequent responses, which is able to typically require implementing
short-term or long-term reminiscence in your utility due to the restricted
dimension of context home windows.

Embedded Exterior Information

Mix LLM with different info sources to entry information past
the LLM’s coaching set

As alluded to earlier on this article, oftentimes your generative
functions will want the LLM to include exterior instruments (similar to an API
name) or exterior reminiscence (short-term or long-term). We bumped into this
state of affairs once we have been implementing the Analysis function in Boba, which
permits customers to reply qualitative analysis questions based mostly on publicly
accessible info on the net, for instance “How is the lodge trade
utilizing generative AI at the moment?”:

To implement this, we needed to “equip” the LLM with Google as an exterior
net search instrument and provides the LLM the flexibility to learn doubtlessly lengthy
articles that will not match into the context window of a immediate. We additionally
wished Boba to have the ability to chat with the consumer about any related articles the
consumer finds, which required implementing a type of short-term reminiscence. Lastly,
we wished to supply the consumer with correct hyperlinks and references that have been
used to reply the consumer’s analysis query.

The best way we carried out this in Boba is as follows:

  1. Use a Google SERP API to carry out the net search based mostly on the consumer’s question
    and get the highest 10 articles (search outcomes)
  2. Learn the complete content material of every article utilizing the Extract API
  3. Save the content material of every article in short-term reminiscence, particularly an
    in-memory vector retailer. The embeddings for the vector retailer are generated utilizing
    the OpenAI API, and based mostly on chunks of every article (versus embedding the complete
    article itself).
  4. Generate an embedding of the consumer’s search question
  5. Question the vector retailer utilizing the embedding of the search question
  6. Immediate the LLM to reply the consumer’s unique question in pure language,
    whereas prefixing the outcomes of the vector retailer question as context into the LLM

This will likely sound like lots of steps, however that is the place utilizing a instrument like
Langchain can pace up your course of. Particularly, Langchain has an
end-to-end chain known as VectorDBQAChain, and utilizing that to carry out the
question-answering took just a few traces of code in Boba:

const researchArticle = async (article, immediate) => 
  const mannequin = new OpenAI();
  const textual content = article.textual content;
  const textSplitter = new RecursiveCharacterTextSplitter( chunkSize: 1000 );
  const docs = await textSplitter.createDocuments([text]);
  const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());
  const chain = VectorDBQAChain.fromLLM(mannequin, vectorStore);
  const res = await
    input_documents: docs,
    question: immediate + ". Be detailed in your response.",
  return  research_answer: res.textual content ;

The article textual content accommodates the complete content material of the article, which can not
match inside a single immediate. So we carry out the steps described above. As you’ll be able to
see, we used an in-memory vector retailer known as HNSWLib (Hierarchical Navigable
Small World). HNSW graphs are among the many top-performing indexes for vector
similarity search. Nevertheless, for bigger scale use circumstances and/or long-term reminiscence,
we suggest utilizing an exterior vector DB like Pinecone or Weaviate.

We additionally might have additional streamlined our workflow by utilizing Langchain’s
exterior instruments API to carry out the Google search, however we determined in opposition to it
as a result of it offloaded an excessive amount of determination making to Langchain, and we have been getting
combined, gradual and harder-to-parse outcomes. One other strategy to implementing
exterior instruments is to make use of Open AI’s lately launched Function Calling
, which we
talked about earlier on this article.

To summarize, we mixed two distinct methods to implement Embedded External Knowledge:

  1. Use Exterior Device: Search and browse articles utilizing Google SERP and Extract
  2. Use Exterior Reminiscence: Brief-term reminiscence utilizing an in-memory vector retailer