DirectoryIndexer
The DirectoryIndexer indexes files in a directory structure and adds them to a catalog. It supports the following extensions:
.docx
.md
.mdx
.txt
Example
A directory indexer that ingests a GitHub repo:
import * as path from "node:path";
// DirectoryIndexer and client come from the SDK; their imports are omitted here.

// getUrl specifies how to map documents on disk to public URLs.
// It is declared before the indexer because a const cannot be referenced before its declaration.
const getUrl = (docsPathList: string[], sitePathList: string[]) => {
  const fileName = sitePathList.pop();
  // an _index.md file maps to the URL of its parent directory
  if (fileName === "_index.md") {
    return sitePathList.join("/");
  }
  // otherwise drop the trailing ".md" from the file name
  return [...sitePathList, fileName].join("/").slice(0, -3);
};

const catalog = await client.getCatalog("github-docs");
const rootDir = path.join(process.env.GITHUB_DOCS_ROOT_DIR, "content");
const gitHubDocsIndexer = new DirectoryIndexer(catalog, {
  rootDir,
  urlBase: "https://www.acme.com",
  // an optional function that maps directory structure to URLs on a website
  getUrl,
  // set document ID to URL
  getId: getUrl,
  // only include markdown
  includeFile(filePath) {
    return filePath.endsWith(".md");
  },
});
await gitHubDocsIndexer.index();
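To make the mapping concrete, here is a standalone sketch of the same getUrl logic applied to two sample paths. The name getUrlSketch and the sample path segments are hypothetical, and the sketch assumes that sitePathList holds the path segments of a file relative to rootDir. Unlike the example above, it copies the array before calling pop() so the caller's list is left untouched.

```typescript
// Illustrative copy of the getUrl mapping; not part of the library.
// Assumption: sitePathList is the file's path, split into segments.
const getUrlSketch = (_docsPathList: string[], sitePathList: string[]): string => {
  const segments = [...sitePathList]; // copy so the input array is not mutated
  const fileName = segments.pop();
  if (fileName === "_index.md") {
    // an index file maps to its parent directory
    return segments.join("/");
  }
  // strip the 3-character ".md" suffix
  return [...segments, fileName].join("/").slice(0, -3);
};

const pageUrl = getUrlSketch([], ["actions", "quickstart.md"]);
const indexUrl = getUrlSketch([], ["actions", "_index.md"]);
console.log(pageUrl); // actions/quickstart
console.log(indexUrl); // actions
```

Note that slice(0, -3) only fits files ending in ".md"; a ".mdx" or ".txt" file would need a different suffix length.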
DirectoryIndexerOpts
rootDir: The root directory to start indexing from.
urlBase?: Optional base URL for generated URLs.
batchSize?: Optional batch size for document insertion (default: 25).
getUrl?: Optional function to generate URLs for documents by mapping directory structure to a public URL relative to urlBase.
getId?: Optional function to generate IDs for documents.
getImageUrl?: Optional function to generate image URLs for documents.
includeFile?: Optional function to determine if a file should be included.
includeDirectory?: Optional function to determine if a directory should be included.
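The two filter options can be sketched as plain predicates. The following is a minimal illustration, not the library's own defaults: it assumes that includeDirectory receives a directory path as a string, in the same way that includeFile receives a file path in the example above, and the skipped directory names are hypothetical choices.

```typescript
import * as path from "node:path";

// Extensions listed as supported by DirectoryIndexer.
const SUPPORTED_EXTENSIONS = new Set([".docx", ".md", ".mdx", ".txt"]);

// Include a file only if its extension is one of the supported ones.
const includeFile = (filePath: string): boolean =>
  SUPPORTED_EXTENSIONS.has(path.extname(filePath).toLowerCase());

// Assumed signature: skip dependency folders and hidden directories such as .git.
const includeDirectory = (dirPath: string): boolean => {
  const name = path.basename(dirPath);
  return name !== "node_modules" && !name.startsWith(".");
};

console.log(includeFile("content/guide.mdx"));
console.log(includeDirectory("content/.git"));
```

Predicates like these could then be passed in the options object alongside rootDir.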
Public methods
index()
public async index(): Promise<void>
Indexes the directory structure starting from the root directory specified in the constructor options. This method processes all files and directories, creating FileDocument objects for each included file, and adds them to the catalog in batches.
- Returns: A Promise that resolves when the indexing process is complete.
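As an illustration of what adding documents "in batches" means, the sketch below splits a list into groups of at most batchSize items, using the documented default of 25. The helper toBatches is hypothetical and is not the library's internal implementation; it only shows the grouping that such a setting implies.

```typescript
// Hypothetical helper: split items into consecutive groups of at most batchSize.
function toBatches<T>(items: T[], batchSize = 25): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 60 placeholder documents produce two full batches and one partial batch.
const docs = Array.from({ length: 60 }, (_, i) => `doc-${i}`);
const batchSizes = toBatches(docs).map((batch) => batch.length);
console.log(batchSizes); // [ 25, 25, 10 ]
```

A smaller batchSize means more, smaller insert calls; a larger one means fewer, bigger calls.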