document-chunking

Here are 9 public repositories matching this topic...

messkan / rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

python nlp ia chunking rag vector-search embedding-vectors llm langchain retrieval-augmented-generation text-splitting rag-pipeline document-chunking

Updated Jan 18, 2026
Python
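
The messkan/rag-chunk entry above is about comparing chunking strategies. The sketch below is a generic illustration only. It is not rag-chunk's API or metric, and every function name in it is hypothetical. It compares fixed-size word windows with overlap against splitting on blank lines. It scores each strategy with a made-up "intact rate": the fraction of reference passages that end up whole inside a single chunk.

```python
import re


def fixed_size_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Hypothetical helper: windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


def paragraph_chunks(text: str) -> list[str]:
    """Hypothetical helper: split Markdown on blank lines, collapsing whitespace in each block."""
    return [" ".join(block.split()) for block in re.split(r"\n\s*\n", text) if block.strip()]


def intact_rate(chunks: list[str], passages: list[str]) -> float:
    """Hypothetical metric: fraction of reference passages contained whole in some chunk."""
    hits = sum(any(passage in chunk for chunk in chunks) for passage in passages)
    return hits / len(passages)


doc = """# Install

Run pip install to add the package.

# Usage

Call the CLI with a Markdown file."""
passages = ["Run pip install to add the package.", "Call the CLI with a Markdown file."]

for name, chunks in {
    "fixed(6, 2)": fixed_size_chunks(doc, 6, 2),
    "paragraph": paragraph_chunks(doc),
}.items():
    print(name, len(chunks), intact_rate(chunks, passages))
# fixed(6, 2) 4 0.0
# paragraph 4 1.0
```

Each passage in this toy input is exactly one paragraph, so paragraph splitting keeps both passages intact. The 6-word windows cut through both of them. With longer paragraphs, the strategy would also need a size cap. This is the kind of trade-off such a benchmark is meant to surface.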

GiovanniPasq / chunky

Convert and validate your Markdown, then choose the best chunking strategy for your RAG pipeline.

Updated Apr 25, 2026
Python

speedyk-005 / chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

visualization python nlp natural-language-processing chunking code-structure code-chunking rag chunks-processing chunks-algorithm text-splitting document-chunking

Updated May 4, 2026
Python

davidmoserai / AzureDocumentIntelligenceChunker

A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.

react python agent azure chunking agents unstructured-data rag production-grade react-pdf-viewer layout-parser llm langchain retrieval-augmented-generation azure-ai-search azure-ai-document-intelligence layout-parsing document-chunking

Updated Jan 11, 2025
Python
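
The sketch below illustrates what "metadata-rich" chunking means in general. It does not call Azure AI Document Intelligence and does not reflect the AzureDocumentIntelligenceChunker API. It assumes a hypothetical input: document text that has already been converted to Markdown, supplied as `(page_number, markdown_text)` pairs. A new chunk starts at each ATX heading. Each chunk records its heading path and the pages its text came from. Bounding boxes are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# ATX-style Markdown heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


@dataclass
class Chunk:
    text: str
    headings: list[str]  # heading path from the top level down, e.g. ["Guide", "Setup"]
    pages: list[int] = field(default_factory=list)  # pages the chunk's body text came from


def chunk_by_headings(pages: list[tuple[int, str]]) -> list[Chunk]:
    """Hypothetical helper: start a chunk at each heading, keeping heading path and page numbers.

    `pages` is an assumed input format: (page_number, markdown_text) pairs in reading order.
    """
    chunks: list[Chunk] = []
    path: list[str] = []
    current: Optional[Chunk] = None
    for page_no, markdown in pages:
        for line in markdown.splitlines():
            stripped = line.strip()
            match = HEADING.match(stripped)
            if match:
                level = len(match.group(1))
                # Keep only the ancestors of this heading, then append it.
                path = path[:level - 1] + [match.group(2).strip()]
                current = Chunk(text="", headings=list(path))
                chunks.append(current)
            elif stripped:
                if current is None:  # body text before the first heading
                    current = Chunk(text="", headings=[])
                    chunks.append(current)
                current.text = f"{current.text} {stripped}".strip()
                if page_no not in current.pages:
                    current.pages.append(page_no)
    # Headings with no body text do not produce a chunk of their own.
    return [chunk for chunk in chunks if chunk.text]


pages = [
    (1, "# Guide\n\nIntro text.\n\n## Setup\n\nInstall the tool."),
    (2, "Configure credentials.\n\n## Usage\n\nRun the command."),
]
for chunk in chunk_by_headings(pages):
    print(chunk.headings, chunk.pages, chunk.text)
# ['Guide'] [1] Intro text.
# ['Guide', 'Setup'] [1, 2] Install the tool. Configure credentials.
# ['Guide', 'Usage'] [2] Run the command.
```

The "Setup" section continues across the page break, so its chunk records both page 1 and page 2. That page information is what lets a PDF viewer jump back to the source of a retrieved chunk. A layout-aware pipeline would also attach bounding regions, which this sketch omits.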

ItzikAquaMotek / rag-chunk

📝 Parse, chunk, and evaluate Markdown for RAG pipelines with token-accurate support and flexible strategies for optimal context management.

tree-sitter library ai csharp dotnet chroma ia code-structure embedding-vectors streamlit hybrid-search aisearch semantickernel text-chunking rag-pipeline llama3 document-chunking propositional-models

Updated May 6, 2026
Python

FoxRav / RL-astradb-

Astra Vector DB is a Python package that stores documents in the DataStax Astra DB vector database and performs semantic search.

python nlp law finland embeddings openai semantic-search finlex document-search rag vector-search legal-tech vector-database astradb rag-pipeline document-chunking

Updated Jan 10, 2026
Python

kooroshsajadi / retrieval-augmented-generation

This repository provides a fully modular implementation of a Retrieval-Augmented Generation (RAG) pipeline tailored for Italian legal-domain documents.

vectorization reranking rag hybrid-retrieval retrieval-augmented-generation document-chunking