A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
Updated Jan 18, 2026 - Python
Convert and validate your Markdown, then choose the best chunking strategy for your RAG pipeline.
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.
A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.
📝 Parse, chunk, and evaluate Markdown for RAG pipelines with token-accurate support and flexible strategies for optimal context management.
Astra Vector DB is a Python package that stores documents in the DataStax Astra DB vector database and performs semantic search.
This repository provides a fully modular implementation of a Retrieval-Augmented Generation (RAG) pipeline tailored for Italian legal-domain documents.
Building a CPU-only "PDF Q&A System" using Hugging Face, ChromaDB vector search, and Python.
A Controlled Natural Language (CNL) for AI designed to "minify" language and make AI context denser.