-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathdata_description.txt
142 lines (132 loc) · 9.68 KB
/
data_description.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
In our project we used different 2 different main datasets, which later on originated two others.
For the predictive model:
- `no_show` dataset: a dataset which was originally on hotel bookings cancellation and it was retrieved from kaggle (https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand).
The names of the columns were changed to be more compatible with the topic of our project.
METADATA OF `no_show`:
- `AppointmentID`: Unique identifier of each appointment
- `AppointmentYear`: Year of the appointment
- `AppointmentMonth`: Month of the appointment (range from 1-12)
- `AppointmentWeekNumber`: Week Number of the appointment (range from 0 to 52)
- `AppointmentDayOfMonth`: Day of the month of the appointment (range from 1-31)
- `AppointmentHour`: Hour of the appointment (range from 9-20h)
- `WeekendConsults`: Number of Appointments that the patient has scheduled for the weekend days
- `WeekdayConsults`: Number of Appointments that the patient has scheduled for the weekdays
- `Adults`: Number of adults registered for the appointment
- `Children`: Number of children registered for the appointment
- `Babies`: Number of babies registered for the appointment
- `FirstTimePatient`: Boolean feature worth 1 in case of it being the patient's first appointment in the clinic
- `AffiliatedPatient`: Boolean feature worth 1 in case the patient is affiliate to the clinic
- `PreviousAppointments`: Number of previous appointments that customer has scheduled
- `PreviousNoShows`: Number of previous appointments that the patient has booked and then missed
- `LastMinutesLate`: Number of minutes that the patient arrived late for the appointment
- `OnlineBooking`: Boolean feature worth 1 in the case of the appointment was booked online
- `AppointmentChanges`: Boolean feature worth 1 in the case of that the appointment has suffered changes from its initial state/booking
- `BookingToConsultDays`: Number of days from the day in which the appointment was booked to the actual day of the appointment
- `ParkingSpaceBooked`: Boolean feature worth 1 in the case the patient required for a parking space for the duration of the appointment
- `SpecialRequests`: Boolean feature worth 1 if the patient made special requests
- `NoInsurance`: Boolean feature worth 1 in case the patient has insurance`
- `CompanyReservation`: Boolean feature worth 1 if the appointment was made under the name of a company
- `ExtraExamsPerConsult`: Number of extra exams that the appointment required
- `DoctorRequested`: ID of the doctor the patient requested
- `DoctorAssigned`: ID of the doctor the patient was assigned
- `ConsultPriceEuros`: Price to pay for the appointment in euros
- `ConsultPriceUSD`: Price to pay for appointment in USD
- `%PaidinAdvance`: Percentage of the total cost of the appointment that was paid in advance
- `CountryofOriginAvgIncomeEuros (Year-2)`: Average income of the country of origin of the patient (in the second year of the data)
- `CountryofOriginAvgIncomeEuros (Year-1)`: Average income of the country of origin of the patient (in the first year of the data)
- `CountryofOriginHDI`: Human-Development-Index (HDI) for the country of origin of the patient
- `NoShow`: Target variable which indicates whether the patient showed up to the appointment or missed it
- `appointments`: This dataset is a small portion of the `no_show` dataset. Only 150 rows were selected, in order to match the number of wors in the `patients`dataset.
METADATA OF `appointments`:
- `AppointmentWeekNumber`: Week Number of the appointment (range from 0 to 52)
- `AppointmentDayOfMonth`: Day of the month of the appointment (range from 1-31)
- `AppointmentHour`: Hour of the appointment (range from 9-20h)
- `WeekendConsults`: Number of Appointments that the patient has scheduled for the weekend days
- `WeekdayConsults`: Number of Appointments that the patient has scheduled for the weekdays
- `Adults`: Number of adults registered for the appointment
- `Children`: Number of children registered for the appointment
- `Babies`: Number of babies registered for the appointment
- `AffiliatedPatient`: Boolean feature worth 1 in case the patient is affiliate to the clinic
- `PreviousAppointments`: Number of previous appointments that customer has scheduled
- `PreviousNoShows`: Number of previous appointments that the patient has booked and then missed
- `LastMinutesLate`: Number of minutes that the patient arrived late for the appointment
- `OnlineBooking`: Boolean feature worth 1 in the case of the appointment was booked online
- `AppointmentChanges`: Boolean feature worth 1 in the case of that the appointment has suffered changes from its initial state/booking
- `BookingToConsultDays`: Number of days from the day in which the appointment was booked to the actual day of the appointment
- `ParkingSpaceBooked`: Boolean feature worth 1 in the case the patient required for a parking space for the duration of the appointment
- `SpecialRequests`: Boolean feature worth 1 if the patient made special requests
- `NoInsurance`: Boolean feature worth 1 in case the patient has insurance`
- `ExtraExamsPerConsult`: Number of extra exams that the appointment required
- `DoctorAssigned`: ID of the doctor the patient was assigned
- `ConsultPriceEuros`: Price to pay for the appointment in euros
- `%PaidinAdvance`: Percentage of the total cost of the appointment that was paid in advance
- `CountryofOriginHDI`: Human-Development-Index (HDI) for the country of origin of the patient
- `Username`: Username chosen by the patient
For the chabot:
- `customer_data`: Obtained from a conversation with a LLM model (https://chat.openai.com/share/5a74238c-95fe-4ef2-ae65-f3747ac4c505), which provided data regarding customers.
METADATA OF `customer_data`:
- `PatientID`: Unique Identifier of the Patient
- `FirstName`:First Name of the Patient
- `LastName`: Last Name of the Patient
- `Email`: Email of the Patient
- `Age`: Age of the Patient
- `Username`: Username chosen by the Patient
- `patients`: Results from a combination of the data obtained from the conversation with the LLM model (´customer_data`) and the data from the dataset for the predictive model (only the variables related with the patient/person were retrieved into this new dataset)
METADATA OF `patients`:
- `FirstName`: First Name of the patient
- `LastName`: Last Name of the patient
- `Email`: Email of the patient
- `Age`: Age of the patient
- `Username`: Username chosen by the patient
- `WeekendConsults`: Number of Appointments that the patient has scheduled for the weekend days
- `WeekdayConsults`: Number of Appointments that the patient has scheduled for the weekdays
- `Adults`: Number of adults registered for the appointment
- `Children`: Number of children registered for the appointment
- `Babies`: Number of babies registered for the appointment
- `AffiliatedPatient`: Boolean feature worth 1 in case the patient is affiliate to the clinic (has already made appointments)
- `PreviousAppointments`: Number of previous appointments that customer has scheduled
- `PreviousNoShows`: Number of previous appointments that the patient has booked and then missed
- `LastMinutesLate`: Number of minutes that the patient arrived late for the appointment
- `NoInsurance`: Boolean feature worth 1 in case the patient has insurance
- `ExtraExamsPerConsult`: Number of extra exams that the appointment required
- `CountryofOriginHDI`: Human-Development-Index (HDI) for the country of origin of the patient
- HDI is an additional dataset which contains the values of HDI for each country in the world and whose values were then mapped to the `no_show`dataset.
Additionally, for the chatbot to be able to answer patient's questions about medication, our team collected 36 different patient information leaflets from Infomed (https://extranet.infarmed.pt/INFOMED-fo/) and then joined them in a single PDF.
This PDF was then fed into the model of the chatbot so that it could access the information in this file and collect the needed information to provide an answer to the patient.
The following medicine were included in the pdf:
- facetilcisteina 600.pdf
- acidovir_ 200.pdf
- alfa-amilase 3000.pdf
- amox acclv_875_125.pdf
- amoxicilina_ 1000.pdf
- atorvastatina.pdf
- bilastina_20.pdf
- butilescopolamina_10.pdf
- carbocisteina 5.pdf
- cetirizina_10.pdf
- caritromicina_500.pdf
- clindamicina_150.pdf
- clopidrogel.pdf
- deflazacorte_6_30.pdf
- Diazepam 5_10.pdf
- diclofenac 50.pdf
- escitalopram_10_20.pdf
- esomeprazol_20_40.pdf
- etodolac 300.pdf
- etonogestrel.pdf
- fosfomicina_3000.pdf
- furosemida_40.pdf
- ibuprofeno_600.pdf
- levotiroxina_25.pdf
- metamizol 575.pdf
- metformina_varios.pdf
- metronidazol_250.pdf
- miconazol_20.pdf
- midazolam_15.pdf
- nimesulida_100.pdf
- omeprazol_10.pdf
- paracetamol_1000.pdf
- paracetamol_codeina_500_30.pdf
- paracetamol_tiocolquicosido_500_2.pdf
- sinvastatina_varios.pd
- varfarina_5.pdf