Question 1

What is the difference between the Token Splitter and the Recursive Character Text Splitter?

Accepted Answer

The Token Splitter measures chunk size in tokens, the units a language model actually processes. The Recursive Character Text Splitter measures chunk size in characters and prefers to break at natural boundaries such as paragraph breaks, line breaks, and spaces, falling back to finer separators only when a piece is still too long. Token splitting is more precise for model input limits, while character splitting is faster and simpler because no tokeniser has to run.
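To make the "logical boundaries" idea concrete, here is a simplified sketch of recursive character splitting. It is not the node's actual implementation: the function name `recursive_split`, the separator order, and the merging rule are assumptions chosen for illustration.

```python
def recursive_split(text, max_chars, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: tries the coarsest separator first and only falls
    back to finer separators for pieces that are still too long.
    """
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No separator left to try: cut at fixed character positions.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two is a little bit longer than the first."
for chunk in recursive_split(sample, max_chars=30):
    print(len(chunk), repr(chunk))
```

The short first paragraph survives intact, while the longer second paragraph is broken at spaces rather than mid-word. A token splitter would apply the same size limit to a token count instead of `len(text)`.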

Question 2

Why does token-based splitting matter for AI workflows?

Accepted Answer

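Language models have token limits, not character limits. A single token can represent a whole word, part of a word, or punctuation, so a character count is only a rough proxy for how much of the context window a chunk consumes. Splitting by tokens lets you size each chunk against the model's context window directly, avoiding truncation or wasted capacity, provided the splitter's tokenisation matches the model's.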
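The gap between character counts and token counts can be seen with a small sketch. The tokeniser below is a toy stand-in (one token per word or punctuation mark), not the tokenisation any real model uses; `toy_tokenise` and `compare_lengths` are hypothetical names.

```python
import re

# Toy tokeniser, for illustration only: each word and each punctuation mark
# counts as one token. Real model tokenisers split text differently.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def toy_tokenise(text: str) -> list[str]:
    """Return the toy tokens found in text."""
    return TOKEN_PATTERN.findall(text)


def compare_lengths(text: str) -> tuple[int, int]:
    """Return (character count, toy token count) for text."""
    return len(text), len(toy_tokenise(text))


for sample in ["Hello, world!", "internationalisation", "a b c d e f"]:
    chars, tokens = compare_lengths(sample)
    print(f"{sample!r}: {chars} characters, {tokens} toy tokens")
```

Two strings of similar length can differ several-fold in token count, which is why a character budget cannot guarantee a token budget.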

Question 3

What is chunk overlap and why should I use it?

Accepted Answer

Chunk overlap means adjacent chunks share some text at their boundaries. This ensures that if an important piece of information spans two chunks, it appears in both. Without overlap, you risk losing context at the split points, which degrades retrieval and summarisation quality.
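As a minimal sketch of overlap, assuming a simple sliding window over an already tokenised list (the node's internal behaviour may differ, and `split_with_overlap` is a hypothetical name):

```python
def split_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into chunks that share `overlap` tokens at each boundary."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the input
    return chunks


# Whitespace-separated words stand in for real tokens here.
words = "the quick brown fox jumps over the lazy dog".split()
for chunk in split_with_overlap(words, chunk_size=4, overlap=2):
    print(chunk)
```

With a chunk size of 4 and an overlap of 2, the phrase "jumps over" appears in two consecutive chunks, so a query about it can match either one.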

Question 4

How do I choose the right chunk size?

Accepted Answer

It depends on your model and use case. For embeddings, chunks of 256 to 512 tokens often work well because they are small enough to be specific but large enough to carry meaningful context. For summarisation, larger chunks reduce the number of API calls needed.
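The trade-off between chunk size and the number of downstream calls can be estimated with a little arithmetic. The sketch below assumes a sliding-window split where each new chunk advances by `chunk_size - overlap` tokens; `estimate_chunk_count` is a hypothetical helper, and the document length and overlap values are made-up inputs.

```python
def estimate_chunk_count(total_tokens, chunk_size, overlap=0):
    """Estimate how many chunks a sliding-window split would produce."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1

    step = chunk_size - overlap
    remaining = total_tokens - chunk_size  # tokens left after the first chunk
    return 1 + -(-remaining // step)       # integer ceiling division


# Hypothetical 10,000-token document at two candidate settings.
for size, overlap in [(256, 25), (512, 50)]:
    count = estimate_chunk_count(10_000, size, overlap)
    print(f"chunk_size={size}, overlap={overlap}: {count} chunks")
```

If each chunk triggers one embedding or summarisation request, the chunk count is also a rough estimate of the number of API calls.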

Question 5

Which tokeniser does the Token Splitter use?

Accepted Answer

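The node typically uses the tiktoken library, which implements the tokenisation used by OpenAI models. tiktoken ships several encodings, and different OpenAI model families use different ones, while other providers use their own tokenisers. If you are using a different model or provider, treat the token counts as approximate and verify that the tokenisation aligns with your model to ensure accurate chunk sizing.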

Question 6

Can Osher help us optimise our document chunking strategy?

Accepted Answer

Yes. Chunking strategy significantly impacts the quality of RAG systems, summarisation pipelines, and document classification workflows. We help clients test different chunk sizes and overlap settings against their actual data to find the configuration that produces the best results.

We work hand-in-hand with you to implement Token Splitter

Identify the text to split

Add the Token Splitter to your workflow

Configure the chunk size in tokens

Set the overlap parameter

Test with representative content

Connect chunks to downstream AI processing
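For the testing step, a small check run over the splitter's output can catch misconfigured sizes before the chunks reach downstream AI processing. This is a sketch under assumptions: chunks are lists of tokens produced by a sliding-window split, and `check_chunks` is a hypothetical helper, not part of the node.

```python
def check_chunks(tokens, chunks, chunk_size, overlap):
    """Return a list of problems found in the chunks (empty if none)."""
    problems = []

    # 1. No chunk may exceed the configured token limit.
    for i, chunk in enumerate(chunks):
        if len(chunk) > chunk_size:
            problems.append(f"chunk {i} has {len(chunk)} tokens, limit is {chunk_size}")

    # 2. Adjacent chunks should share exactly `overlap` tokens at the boundary.
    if overlap:
        for i in range(len(chunks) - 1):
            if chunks[i][-overlap:] != chunks[i + 1][:overlap]:
                problems.append(f"chunks {i} and {i + 1} do not share {overlap} tokens")

    # 3. Dropping the overlapping prefixes should rebuild the original input.
    rebuilt = list(chunks[0]) if chunks else []
    for chunk in chunks[1:]:
        rebuilt.extend(chunk[overlap:])
    if rebuilt != list(tokens):
        problems.append("chunks do not reconstruct the original token sequence")

    return problems


tokens = list(range(10))
good = [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
print(check_chunks(tokens, good, chunk_size=4, overlap=1))
```

Running the same check on a handful of representative documents gives a quick signal that the chunk size and overlap settings behave as intended.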
