Onboarding is Dead

Evan Boyle

LLMs have killed onboarding as we know it. Luckily, LLMs have reinvented it as well. We know that every additional step and manual input in an onboarding flow leads to falloff. No more.

Product onboarding often feels like dealing with the IRS. Despite having all my data and a deep understanding of their own system, the IRS still demands that I do all the legwork and provide numbers. Sometimes they punish my errors as well! Traditional product onboarding is no different.

Why should I have to fill out a tedious form describing my company and pick from tired drop-downs to tell you my role, company size, and objectives for using your software? Most of this data is published on my company’s website or other public sources, and the rest can be inferred from it.

There is no excuse for manually collecting this kind of information from your users in 2024. We have LLMs, plus LLM-optimized web scraping services that can return clean markdown, ready for chunking and indexing into a vector database, and can also perform structured extraction into a predefined schema. Let's learn how to put them to use!

Effortless Onboarding with LLM-Optimized Web Scraping

At Cortex Click, we run the entire onboarding flow, soup to nuts, from a single user input: their website URL. From that URL we configure your catalogs and cortexes (our term for an AI agent specialized in content writing), and we scrape your entire docs and marketing website into our platform to improve the quality of content generation.


Walking through the onboarding flow configures everything necessary to get started with Cortex Click and starts indexing your website in the background.

When you only require a single input, you minimize dropoff. Most people are much better at editing than they are at synthesizing.

Learn more about the Cortex Click onboarding experience.

How to Utilize LLMs and Web Scraping for Onboarding

Let's look at a practical example: use Firecrawl to scrape a page, then feed the output into OpenAI for structured extraction against a JSON schema. With this output, we can prepopulate values for the rest of our onboarding flow.

Scraping Web Pages with Firecrawl

Firecrawl offers an LLM-optimized web scraper that converts pages into LLM-friendly markdown. It can remove common site elements like headers, footers, and navigation bars, which increases the content density of each document and reduces the duplicated elements that harm the quality of Retrieval-Augmented Generation (RAG). For more information, visit the Firecrawl website.
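To make the idea of stripping duplicated site elements concrete, here is a minimal sketch of one naive way to do it across several scraped pages: drop any line that appears on more than a given fraction of pages. The function name and the 0.5 default threshold are hypothetical, and this is a simple frequency heuristic for illustration, not how Firecrawl's onlyMainContent option actually works.

```typescript
// Hypothetical helper: remove lines that recur across many scraped pages
// (nav links, banners, footers). A naive frequency heuristic for illustration,
// not Firecrawl's actual onlyMainContent implementation.
function stripRepeatedLines(pages: string[], threshold = 0.5): string[] {
  // With fewer than two pages there is nothing to compare against.
  if (pages.length < 2) return pages;

  // Count how many pages each non-empty (trimmed) line appears on.
  const pageCounts = new Map<string, number>();
  for (const page of pages) {
    const uniqueLines = new Set(
      page.split("\n").map((line) => line.trim()).filter((line) => line !== ""),
    );
    uniqueLines.forEach((line) => {
      pageCounts.set(line, (pageCounts.get(line) ?? 0) + 1);
    });
  }

  // A line is boilerplate if it shows up on more than `threshold` of all pages.
  const isBoilerplate = (line: string): boolean =>
    (pageCounts.get(line.trim()) ?? 0) / pages.length > threshold;

  // Keep blank lines (paragraph breaks) and any line that is not boilerplate.
  return pages.map((page) =>
    page
      .split("\n")
      .filter((line) => line.trim() === "" || !isBoilerplate(line))
      .join("\n"),
  );
}
```

This is deliberately blunt: it can also drop legitimately repeated lines, such as identical table separators, which is one reason a dedicated main-content extractor is usually the better choice.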

import FirecrawlApp from "@mendable/firecrawl-js";

const fc = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// Scrape a single page into LLM-friendly markdown
const result = await fc.scrapeUrl("https://example.com", {
  pageOptions: {
    onlyMainContent: true, // strip headers, footers, and navigation
    replaceAllPathsWithAbsolutePaths: true, // rewrite relative links as absolute URLs
  },
  timeout: 75000, // in ms; some slower websites might need longer timeouts
});
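Before handing the scraped markdown to a model, it can help to cap its length so a very long page does not blow past the context window or inflate cost. A minimal sketch, assuming a hypothetical truncateMarkdown helper and an arbitrary 20,000-character default budget; characters are only a crude stand-in for tokens.

```typescript
// Hypothetical helper: cap scraped markdown at a character budget before
// sending it to the model. Characters are only a rough proxy for tokens.
function truncateMarkdown(markdown: string, maxChars = 20_000): string {
  // Short enough already: pass through untouched.
  if (markdown.length <= maxChars) return markdown;

  const slice = markdown.slice(0, maxChars);
  // Prefer cutting at the last paragraph break so we don't split mid-sentence;
  // fall back to a hard cut if there is no break inside the budget.
  const lastBreak = slice.lastIndexOf("\n\n");
  return lastBreak > 0 ? slice.slice(0, lastBreak) : slice;
}
```

Usage might look like truncateMarkdown(result.data?.content ?? ""). Company name, industry, and headcount tend to appear early on a homepage, so keeping the beginning of the page is a reasonable default for this use case.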

Using OpenAI GPT-4o for Structured Data Extraction

We'll use GPT-4o to extract structured data from the web scraping results, turning unstructured web content into actionable JSON from a single input. We set the response_format: { type: "json_object" } parameter and describe the desired JSON schema in the prompt. JSON mode ensures the response parses as JSON, but it does not enforce that schema. We also provide few-shot examples to improve output quality.

Automating this step removes manual input from your onboarding flow, so users can get started with minimal effort.

import { OpenAI } from "openai";
 
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
 
const systemMessage = `
Extract the following information from the provided web page content:
- company_name: the name of the company
- domain: the primary industry or sector that the company operates in
- size: the approximate number of employees in the company
- primary_objectives: an array of key goals or focus areas that reflect what the company likely wants to get out of Cortex Click's GTM platform. Options are "grow top-of-funnel", "create high quality documentation", or "reduce customer support resolution times".
 
Return the extracted data in the following JSON format:
{
  "company_name": "<company_name>",
  "domain": "<domain>",
  "size": <size>,
  "primary_objectives": ["<primary_objective_1>", "<primary_objective_2>", ...]
}
 
Examples:
 
Input:
"Acme Corp is a leading provider of industrial solutions. With a workforce of over 5000 employees, Acme Corp aims to innovate and streamline manufacturing processes. Visit us at acme-corp.com."
 
Output:
{
  "company_name": "Acme Corp",
  "domain": "manufacturing",
  "size": 5000,
  "primary_objectives": ["create high quality documentation", "reduce customer support resolution times"]
}
 
Input:
"Tech Innovators Inc. specializes in cutting-edge software development. Our team of 200 experts is dedicated to revolutionizing the tech industry. Learn more at techinnovators.io."
 
Output:
{
  "company_name": "Tech Innovators Inc.",
  "domain": "software",
  "size": 200,
  "primary_objectives": ["grow top-of-funnel"]
}
`;
 
const extractStructuredData = async (content: string) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    temperature: 0.2,
    max_tokens: 800,
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content: systemMessage,
      },
      {
        role: "user",
        content,
      },
    ],
  });
 
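  // message.content is typed string | null, so guard before parsing
  const raw = response.choices[0].message.content;
  if (!raw) throw new Error("Model returned no content");
  return JSON.parse(raw);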
};
 
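// The scrape result's data is optional, so check for markdown before extracting
const markdown = result.data?.content;
if (!markdown) throw new Error("Scrape returned no content");
const data = await extractStructuredData(markdown);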
console.log("Structured Data: ", data);
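
Since the schema lives only in the prompt, the parsed object may be missing fields, use the wrong types, or include objectives outside the allowed list. A minimal sketch of validating it before it reaches the UI, assuming a hypothetical toCompanyProfile helper that falls back to empty values rather than throwing.

```typescript
// Shape of the JSON requested in the system prompt; size is null when unknown.
interface CompanyProfile {
  company_name: string;
  domain: string;
  size: number | null;
  primary_objectives: string[];
}

// The three objectives the system prompt allows.
const ALLOWED_OBJECTIVES: string[] = [
  "grow top-of-funnel",
  "create high quality documentation",
  "reduce customer support resolution times",
];

// Hypothetical helper: coerce the model's parsed JSON into a CompanyProfile,
// falling back to empty values so the onboarding form can still render.
function toCompanyProfile(raw: unknown): CompanyProfile {
  const obj: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};

  const asString = (value: unknown): string =>
    typeof value === "string" ? value.trim() : "";

  // Accept only positive, finite numbers for headcount.
  const rawSize = obj.size;
  const size =
    typeof rawSize === "number" && Number.isFinite(rawSize) && rawSize > 0
      ? Math.round(rawSize)
      : null;

  // Keep only known objectives.
  const rawObjectives = obj.primary_objectives;
  const objectives: string[] = Array.isArray(rawObjectives)
    ? rawObjectives.filter(
        (o): o is string => typeof o === "string" && ALLOWED_OBJECTIVES.indexOf(o) !== -1,
      )
    : [];

  return {
    company_name: asString(obj.company_name),
    domain: asString(obj.domain),
    size,
    // Drop duplicate objectives while preserving their order.
    primary_objectives: objectives.filter((o, i) => objectives.indexOf(o) === i),
  };
}
```

Falling back to empty values instead of failing fits the onboarding goal: a partially filled form is still less work for the user than a blank one.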

From here, you have a JSON object that can initialize the state of your onboarding flow. Users can still edit any value, but getting started with your product takes just one click.
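As an illustration of prepopulating a form, here is a minimal sketch that maps the extracted fields onto editable defaults, including a company-size drop-down like the one mentioned earlier. The OnboardingDefaults shape, the function names, and the size buckets are all hypothetical; substitute whatever your own form uses.

```typescript
// Profile shape mirroring the JSON requested in the system prompt.
interface CompanyProfile {
  company_name: string;
  domain: string;
  size: number | null;
  primary_objectives: string[];
}

// Hypothetical form state: every field is a suggestion the user can edit.
interface OnboardingDefaults {
  companyName: string;
  industry: string;
  companySizeBucket: string; // value for a company-size drop-down
  objectives: string[];
}

// Hypothetical drop-down buckets; upper bounds are inclusive.
const SIZE_BUCKETS: Array<[number, string]> = [
  [10, "1-10"],
  [50, "11-50"],
  [200, "51-200"],
  [1000, "201-1000"],
];

// Map a headcount to a bucket label; "" leaves the drop-down unselected.
function sizeBucket(size: number | null): string {
  if (size === null) return "";
  for (const [max, label] of SIZE_BUCKETS) {
    if (size <= max) return label;
  }
  return "1001+";
}

function toOnboardingDefaults(profile: CompanyProfile): OnboardingDefaults {
  return {
    companyName: profile.company_name,
    industry: profile.domain,
    companySizeBucket: sizeBucket(profile.size),
    objectives: profile.primary_objectives.slice(), // copy so form edits don't mutate the source
  };
}
```

Leaving a field blank when the model found nothing, rather than guessing a bucket, keeps the user editing real suggestions instead of correcting invented ones.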

Putting AI to Use for Go-to-Market

Cortex Click helps technical products build a high-quality content strategy. I wrote this blog post with the help of Cortex Click: a cortex generated the first draft, followed by 15 AI refinements and a few human edits to get the code examples right. The entire process took only one hour.

Cortex Click specializes in helping marketers, engineers, and sales teams generate high-quality content swiftly. Here's how it works:

Cortex Click helps your team create higher-quality content in 10% of the time through a combination of adversarial, multi-agent LLM workflows, grounding in your company's existing data, and human review.

Get Started with Cortex Click

The future of onboarding is here, and it's effortless, accurate, and zero-configuration. Embrace the power of LLMs and web scraping, and experience the new era of automated onboarding that saves time and minimizes falloff.