GitHub repos

Entire GitHub repos can be indexed with the DirectoryIndexer. Clone the repo, then configure a DirectoryIndexer instance for the local checkout. If the content is also published on a website by a docs generator, the indexer can optionally map file names in the repo to their public URLs.
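The cloning step could be scripted along these lines. This is only a sketch: the helper names buildCloneArgs and cloneRepo are hypothetical, the repo URL is a placeholder, and the shallow clone (--depth 1) is an assumption based on indexing needing only the latest files rather than history.

```typescript
import { execFileSync } from "node:child_process";

// Build the argument list for a shallow `git clone`.
// --depth 1 is an assumption: only the current files are needed for indexing.
function buildCloneArgs(repoUrl: string, destDir: string): string[] {
  return ["clone", "--depth", "1", repoUrl, destDir];
}

// Clone the repo into destDir; throws if git exits with a non-zero status.
function cloneRepo(repoUrl: string, destDir: string): void {
  execFileSync("git", buildCloneArgs(repoUrl, destDir), { stdio: "inherit" });
}

// Hypothetical usage (placeholder URL, left commented out so nothing is cloned on import):
// cloneRepo("https://github.com/acme/docs.git", process.env.GITHUB_DOCS_ROOT_DIR!);
```

Cloning into the directory named by GITHUB_DOCS_ROOT_DIR keeps the checkout in the location the indexer configuration reads its root directory from.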

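import path from "node:path";

// `client` and `DirectoryIndexer` are assumed to be set up and in scope already
const catalog = await client.getCatalog("github-docs");
// GITHUB_DOCS_ROOT_DIR is the local clone of the repo; only its content/ folder is indexed
const rootDir = path.join(process.env.GITHUB_DOCS_ROOT_DIR, "content");
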
const gitHubDocsIndexer = new DirectoryIndexer(catalog, {
  rootDir,
  urlBase: "https://www.acme.com",
  // an optional function that maps directory structure to URLs on a website
  getUrl,
  // set document ID to URL
  getId: getUrl,
  // only include markdown
  includeFile(filePath) {
    return filePath.endsWith(".md");
  },
});
 
await gitHubDocsIndexer.index();
 
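// getUrl specifies how to map documents on disk to public URLs.
// It is a function declaration, so it is hoisted and can be passed to DirectoryIndexer above.
function getUrl(docsPathList: string[], sitePathList: string[]): string {
  // read the path without mutating the caller's array
  const dirs = sitePathList.slice(0, -1);
  const fileName = sitePathList[sitePathList.length - 1];
  // _index.md maps to the URL of its directory
  if (fileName === "_index.md") {
    return dirs.join("/");
  }

  // any other file: drop the trailing ".md"
  return [...dirs, fileName].join("/").slice(0, -3);
}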
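To make the mapping concrete, here is a standalone sketch of the same rule applied to two sample paths. The name toSitePath is hypothetical, and the example assumes that sitePathList holds the path segments of a file relative to the root directory; how the indexer combines the result with urlBase is not shown here.

```typescript
// Hypothetical stand-in for getUrl that takes only the site path segments.
// It copies what it needs instead of mutating its input.
function toSitePath(sitePathList: string[]): string {
  const dirs = sitePathList.slice(0, -1);
  const fileName = sitePathList[sitePathList.length - 1] ?? "";
  // _index.md maps to the URL of its directory.
  if (fileName === "_index.md") {
    return dirs.join("/");
  }
  // Any other file keeps its name, minus the trailing ".md".
  return [...dirs, fileName].join("/").slice(0, -3);
}

console.log(toSitePath(["actions", "_index.md"]));     // "actions"
console.log(toSitePath(["actions", "quickstart.md"])); // "actions/quickstart"
```

Because includeFile only admits files ending in ".md", removing the last three characters is enough to strip the extension.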

Roadmap

A GitHub app that automates indexing repos from your organization is under active development.