---
title: README
emoji: 🐨
colorFrom: purple
colorTo: indigo
sdk: static
pinned: false
---
**💪 Feel free to join the organization if you want to add a dataset with a similar purpose :) Please [tell me](https://tillwenke.github.io/about/) about your dataset before asking to join the org.**
To test your **RAG** and other **semantic information retrieval solutions**, it helps to have a dataset that contains a text corpus,
correct responses to queries (e.g. question-answer pairs) to test the solution end-to-end, and ideally a set of relevant passages
from the text corpus for each query to test the retrieval component separately.
We call this a question-answer-passages dataset.
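The two kinds of evaluation described above can be sketched as follows. The relevant passage ids let you score retrieval on its own (here with recall@k), while the short answer lets you score the full pipeline (here with a crude normalized exact match). The function names and the choice of metrics are illustrative assumptions, not a prescribed evaluation protocol for these datasets.

```python
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Sequence[str], k: int) -> float:
    """Retrieval-only check: fraction of relevant passages found in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(relevant & top_k) / len(relevant)


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace before comparing answers.
    return " ".join(text.lower().split())


def exact_match(predicted: str, reference: str) -> bool:
    """End-to-end check: does the generated answer match the short answer?"""
    return _normalize(predicted) == _normalize(reference)


# Hypothetical record and system outputs, for illustration only.
relevant = ["p3", "p7"]
retrieved = ["p7", "p1", "p3", "p9"]
print(recall_at_k(retrieved, relevant, k=2))  # 0.5
print(recall_at_k(retrieved, relevant, k=3))  # 1.0
print(exact_match("  Paris ", "paris"))       # True
```

Exact match is strict; for free-form answers you may prefer a softer comparison, but the split between retrieval and end-to-end scoring stays the same.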
There are plenty of large-scale datasets of this kind such as [Google's Natural Questions](https://ai.google.com/research/NaturalQuestions/).
Still, we lack datasets of this kind that are **small-scale** and **narrow-domain**, which would let us test a RAG solution quickly or see how it performs
in a specific domain.
We created this space to build a collection of such datasets to boost the development of RAG solutions. We welcome any feedback on what your ideal RAG dataset would look like. :)
Datasets consist of:
* A **text corpus** already split into passages, with each passage referenced by an id.
* A test set consisting of:
  * A **question**, and one or ideally both of the following:
    * A correct **short answer**.
    * A **list of the passage ids** that are relevant to answering the question.
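As a rough illustration of this structure, the sketch below models the corpus and the test set with Python dataclasses and checks that every passage id referenced by a test item exists in the corpus. The class and field names (`Passage`, `TestItem`, `relevant_passage_ids`, `check_references`) are hypothetical and do not reflect the actual column names of any dataset in this organization.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Passage:
    # One chunk of the pre-split text corpus, addressed by its id.
    passage_id: str
    text: str


@dataclass
class TestItem:
    # A question plus a short answer, relevant passage ids, or ideally both.
    question: str
    answer: Optional[str] = None
    relevant_passage_ids: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.answer is None and not self.relevant_passage_ids:
            raise ValueError("need a short answer, relevant passage ids, or both")


def check_references(corpus: list[Passage], tests: list[TestItem]) -> list[str]:
    """Return passage ids used by test items that are missing from the corpus."""
    known = {p.passage_id for p in corpus}
    return [pid for item in tests for pid in item.relevant_passage_ids if pid not in known]


# Hypothetical toy data, for illustration only.
corpus = [
    Passage("p1", "Koalas are marsupials native to Australia."),
    Passage("p2", "Koalas feed mostly on eucalyptus leaves."),
]
tests = [TestItem("What do koalas eat?", answer="Eucalyptus leaves", relevant_passage_ids=["p2", "p5"])]
print(check_references(corpus, tests))  # ['p5']
```

Keeping passages in a separate corpus and referencing them by id avoids duplicating text across test items and makes this kind of consistency check straightforward.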