DirectoryIndexer

The DirectoryIndexer indexes the files in a directory tree and adds them to a catalog. It supports the following file extensions:

  • .docx
  • .md
  • .mdx
  • .txt

Example

A directory indexer that ingests the Markdown files from a GitHub repo checked out on disk:

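import * as path from "node:path";

// DirectoryIndexer and client are assumed to be imported and initialized elsewhere.

// getUrl specifies how to map documents on disk to public URLs.
// It is declared before the indexer because a const cannot be read before its declaration.
const getUrl = (docsPathList: string[], sitePathList: string[]) => {
  // copy the list so the caller's array is not mutated
  const segments = [...sitePathList];
  const fileName = segments.pop() ?? "";
  if (fileName === "_index.md") {
    return segments.join("/");
  }

  // strip the ".md" extension
  return [...segments, fileName].join("/").slice(0, -3);
};

const docsRoot = process.env.GITHUB_DOCS_ROOT_DIR;
if (!docsRoot) {
  throw new Error("GITHUB_DOCS_ROOT_DIR is not set");
}

const catalog = await client.getCatalog("github-docs");
const rootDir = path.join(docsRoot, "content");

const gitHubDocsIndexer = new DirectoryIndexer(catalog, {
  rootDir,
  urlBase: "https://www.acme.com",
  // an optional function that maps directory structure to URLs on a website
  getUrl,
  // set document ID to URL
  getId: getUrl,
  // only include markdown
  includeFile(filePath) {
    return filePath.endsWith(".md");
  },
});

await gitHubDocsIndexer.index();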
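
To make the mapping concrete, here is a small self-contained sketch of the same idea with sample inputs. The name toSitePath and the sample paths are hypothetical, and the sketch assumes that sitePathList holds the path segments of a file relative to the root directory, which the example above suggests but does not state.

```typescript
// Hypothetical helper mirroring the getUrl logic above; not part of the library.
// Assumes sitePathList holds a file's path segments relative to rootDir.
function toSitePath(sitePathList: string[]): string {
  const segments = [...sitePathList]; // copy so the caller's array is not mutated
  const fileName = segments.pop() ?? "";
  if (fileName === "_index.md") {
    // an index page maps to the URL of its directory
    return segments.join("/");
  }
  // strip the ".md" extension if present
  const stem = fileName.endsWith(".md") ? fileName.slice(0, -3) : fileName;
  return [...segments, stem].join("/");
}

console.log(toSitePath(["actions", "_index.md"])); // "actions"
console.log(toSitePath(["actions", "quickstart.md"])); // "actions/quickstart"
```

Unlike the slice(0, -3) call in the example, this variant only removes the suffix when the file name actually ends with ".md", which matters if the includeFile filter admits other extensions.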

DirectoryIndexerOpts

  • rootDir: The root directory to start indexing from.
  • urlBase?: Optional base URL for generated URLs.
  • batchSize?: Optional batch size for document insertion (default: 25).
  • getUrl?: Optional function to generate URLs for documents by mapping directory structure to a public URL relative to urlBase.
  • getId?: Optional function to generate IDs for documents.
  • getImageUrl?: Optional function to generate image URLs for documents.
  • includeFile?: Optional function to determine if a file should be included.
  • includeDirectory?: Optional function to determine if a directory should be included.
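
As an illustration of the two filter options, the sketch below defines one predicate for files and one for directories. The example above shows that includeFile receives a file path; that includeDirectory receives a directory path in the same way is an assumption. The function names, the lastSegment helper, and the skipped directory names are hypothetical.

```typescript
// Extensions listed as supported at the top of this page.
const SUPPORTED_EXTENSIONS = [".docx", ".md", ".mdx", ".txt"];

// Hypothetical helper: last segment of a path that uses "/" or "\" separators.
function lastSegment(p: string): string {
  const parts = p.split(/[\\/]/).filter((s) => s.length > 0);
  return parts.pop() ?? "";
}

// Include only files whose name ends with a supported extension (case-insensitive).
function includeSupportedFile(filePath: string): boolean {
  const name = lastSegment(filePath).toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => name.endsWith(ext));
}

// Assumed signature: skip hidden directories and node_modules.
function includeVisibleDirectory(dirPath: string): boolean {
  const name = lastSegment(dirPath);
  return name !== "node_modules" && !name.startsWith(".");
}
```

Under these assumptions, the two functions could be passed as includeFile and includeDirectory in the constructor options.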

Public methods

index()

public async index(): Promise<void>

Indexes the directory tree starting from the root directory specified in the constructor options. The method walks the directories and files, creates a FileDocument object for each included file, and adds the documents to the catalog in batches (see batchSize).

  • Returns: A Promise that resolves when the indexing process is complete.
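
The batching step can be pictured with the sketch below. It is not the library's implementation: toBatches, addInBatches, and the insert callback are hypothetical stand-ins for whatever the catalog uses to store documents, and only the default batch size of 25 comes from the options above.

```typescript
// Hypothetical sketch of batched insertion; not DirectoryIndexer's actual code.

// Split a list into consecutive batches of at most batchSize items.
function toBatches<T>(items: T[], batchSize = 25): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Call insert once per batch, in order. The insert callback is an assumed
// stand-in for the catalog's real insert call.
async function addInBatches<T>(
  docs: T[],
  insert: (batch: T[]) => Promise<void>,
  batchSize = 25,
): Promise<void> {
  for (const batch of toBatches(docs, batchSize)) {
    await insert(batch);
  }
}
```

With 60 documents and the default batch size, toBatches yields batches of 25, 25, and 10 documents, so the insert callback would run three times.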