phosseini committed on
Commit
fb946e8
1 Parent(s): 66cdad4

adding notebook for ChatDoctor data

  1. notebooks/ChatDoctor-data.ipynb +205 -0
notebooks/ChatDoctor-data.ipynb ADDED
@@ -0,0 +1,205 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Access token is needed for reading this data (contact Lavita's admin: [email protected])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasets import load_dataset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Reading data from Lavita's Hugging Face\n",
+ "data_hf_name = \"lavita/ChatDoctor-HealthCareMagic-100k\"\n",
+ "dataset = load_dataset(data_hf_name)\n",
+ "\n",
+ "# Converting data to pandas DataFrame\n",
+ "df = dataset['train'].to_pandas()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>instruction</th>\n",
+ " <th>input</th>\n",
+ " <th>output</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>I woke up this morning feeling the whole room ...</td>\n",
+ " <td>Hi, Thank you for posting your query. The most...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>My baby has been pooing 5-6 times a day for a ...</td>\n",
+ " <td>Hi... Thank you for consulting in Chat Doctor....</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>Hello, My husband is taking Oxycodone due to a...</td>\n",
+ " <td>Hello, and I hope I can help you today.First, ...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>lump under left nipple and stomach pain (male)...</td>\n",
+ " <td>HI. You have two different problems. The lump ...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>I have a 5 month old baby who is very congeste...</td>\n",
+ " <td>Thank you for using Chat Doctor. I would sugge...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>I am F 38 in good shape work out (do triathlon...</td>\n",
+ " <td>Hi, From history it seems that you might be ha...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>sir, MY uncle has ILD-Interstitial Lung diseas...</td>\n",
+ " <td>Thanks for your question on Chat Doctor. I can...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>my husband was working on a project in the hou...</td>\n",
+ " <td>Hello. It could be a blood collection due to m...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>hi my nine year old son had a cough and flu sy...</td>\n",
+ " <td>Hi, If the symptoms persist that long this sug...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>If you are a doctor, please answer the medical...</td>\n",
+ " <td>gyno problemsfor the past few months, I have b...</td>\n",
+ " <td>Dear Friend. Welcome to Chat Doctor. I am Chat...</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " instruction \\\n",
+ "0 If you are a doctor, please answer the medical... \n",
+ "1 If you are a doctor, please answer the medical... \n",
+ "2 If you are a doctor, please answer the medical... \n",
+ "3 If you are a doctor, please answer the medical... \n",
+ "4 If you are a doctor, please answer the medical... \n",
+ "5 If you are a doctor, please answer the medical... \n",
+ "6 If you are a doctor, please answer the medical... \n",
+ "7 If you are a doctor, please answer the medical... \n",
+ "8 If you are a doctor, please answer the medical... \n",
+ "9 If you are a doctor, please answer the medical... \n",
+ "\n",
+ " input \\\n",
+ "0 I woke up this morning feeling the whole room ... \n",
+ "1 My baby has been pooing 5-6 times a day for a ... \n",
+ "2 Hello, My husband is taking Oxycodone due to a... \n",
+ "3 lump under left nipple and stomach pain (male)... \n",
+ "4 I have a 5 month old baby who is very congeste... \n",
+ "5 I am F 38 in good shape work out (do triathlon... \n",
+ "6 sir, MY uncle has ILD-Interstitial Lung diseas... \n",
+ "7 my husband was working on a project in the hou... \n",
+ "8 hi my nine year old son had a cough and flu sy... \n",
+ "9 gyno problemsfor the past few months, I have b... \n",
+ "\n",
+ " output \n",
+ "0 Hi, Thank you for posting your query. The most... \n",
+ "1 Hi... Thank you for consulting in Chat Doctor.... \n",
+ "2 Hello, and I hope I can help you today.First, ... \n",
+ "3 HI. You have two different problems. The lump ... \n",
+ "4 Thank you for using Chat Doctor. I would sugge... \n",
+ "5 Hi, From history it seems that you might be ha... \n",
+ "6 Thanks for your question on Chat Doctor. I can... \n",
+ "7 Hello. It could be a blood collection due to m... \n",
+ "8 Hi, If the symptoms persist that long this sug... \n",
+ "9 Dear Friend. Welcome to Chat Doctor. I am Chat... "
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.head(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "qa-env",
+ "language": "python",
+ "name": "qa-env"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.18"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+ }
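
Note on the access token: the notebook's first cell says a token is needed, but the `load_dataset` call does not pass one, so it presumably relies on a login already stored on the machine. A minimal sketch of supplying the token explicitly might look like the following. The helper names `resolve_hf_token` and `load_chatdoctor_frame` and the choice of the `HF_TOKEN` environment variable are assumptions for illustration, not part of this commit, and the `token=` keyword assumes a recent `datasets` release (older releases used `use_auth_token=`).

```python
import os
from typing import Optional


def resolve_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the access token from the environment, or None if unset or blank.

    Hypothetical helper; the environment variable name is an assumption.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


def load_chatdoctor_frame(token: Optional[str] = None):
    """Load the 'train' split of the dataset as a pandas DataFrame.

    Hypothetical wrapper around the two steps shown in the notebook.
    """
    # Imported lazily so resolve_hf_token works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: a recent `datasets` release that accepts `token=`.
    dataset = load_dataset(
        "lavita/ChatDoctor-HealthCareMagic-100k",
        token=token or resolve_hf_token(),
    )
    return dataset["train"].to_pandas()
```

Calling `load_chatdoctor_frame()` with the environment variable set would then replace the notebook's second code cell. Reading the token from the environment keeps it out of the committed notebook.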