The Tele-Health Data Analytics project dives deep into the analysis of telehealth services using state-of-the-art tools like Python ๐, PostgreSQL ๐, Power BI ๐, and ERD Builders ๐ ๏ธ. This project is designed to extract actionable insights from telehealth service data, helping healthcare providers optimize service delivery, enhance patient satisfaction, and make informed, data-driven decisions. Through this analysis, we emphasize the critical role of data in transforming healthcare delivery and improving patient outcomes.
- ๐ Service Utilization Patterns: The analysis identifies peak usage times and the most frequently used telehealth services, offering insights into demand patterns.
- ๐ Patient Satisfaction: We discovered key factors that influence patient satisfaction, providing opportunities for targeted improvements.
- ๐ธ Cost Analysis: The project offers a detailed breakdown of healthcare costs by service type and insurance coverage, allowing for a deeper understanding of financial trends.
- ๐ Follow-Up Trends: We explored how follow-up requirements impact healthcare costs and patient satisfaction, offering insights into care continuity.
- ๐ง Frequent Visits: Mental health issues were the most common diagnoses leading to frequent telehealth visits, highlighting the importance of accessible mental health services.
- ๐ฅ Source: The dataset is sourced from well-known telehealth providers, including Teladoc Health and Amwell, with additional data web-scraped from platforms like SourceTelehealth and RemoteHealth.
- ๐ Content: The dataset includes fields such as
ServiceType
,PatientID
,SatisfactionScore
,HealthcareCost
, andFollowUpRequired
. - ๐ Modification: To ensure privacy, fictional IDs were created while maintaining the authenticity of the other data fields.
- ๐งน Data Cleaning: Python was utilized for data cleaning, including the removal of duplicates and ensuring data integrity, to provide reliable insights.
- ๐จโโ๏ธ How many telehealth visits were conducted by gender?
SELECT Gender, COUNT(*) AS VisitCount FROM TelehealthServicesUsage GROUP BY Gender;
Output:
- ๐ข What is the average satisfaction score for each service type?
SELECT ServiceType, ROUND(AVG(SatisfactionScore), 2) AS AverageSatisfaction FROM TelehealthServicesUsage GROUP BY ServiceType;
Output:
- ๐ธ What is the total healthcare cost for each type of insurance?
SELECT InsuranceType, TO_CHAR(SUM(HealthcareCost), 'FM$999,999,999.00') AS TotalCost FROM TelehealthServicesUsage GROUP BY InsuranceType;
Output:
- โณ What is the average duration of visits for each gender and socioeconomic status (for satisfaction scores > 3)?
SELECT Gender, SocioeconomicStatus, ROUND(AVG(DurationOfVisit), 2) AS AverageDuration FROM TelehealthServicesUsage WHERE SatisfactionScore > 3 GROUP BY Gender, SocioeconomicStatus ORDER BY Gender, SocioeconomicStatus;
Output:
- ๐ฉบ How many patients had technical issues during their telehealth visits, grouped by primary diagnosis and insurance type?
SELECT PrimaryDiagnosis, InsuranceType, COUNT(*) AS NumPatientsWithTechnicalIssues FROM TelehealthServicesUsage WHERE TechnicalIssues = 'Yes' GROUP BY PrimaryDiagnosis, InsuranceType ORDER BY PrimaryDiagnosis, InsuranceType;
Output:
- ๐ฐ What is the total healthcare cost and the number of follow-up visits for each ethnicity, specifically for patients with 'Medicaid' insurance?
SELECT Ethnicity, TO_CHAR(SUM(HealthcareCost), 'FM$999,999,999.00') AS TotalHealthcareCost, COUNT(*) AS FollowUpVisits FROM TelehealthServicesUsage WHERE InsuranceType = 'Medicaid' AND FollowUpRequired = 'Yes' GROUP BY Ethnicity ORDER BY TotalHealthcareCost DESC;
Output:
- ๐ What is the average satisfaction score and total number of visits for each combination of primary diagnosis and service type?
SELECT PrimaryDiagnosis, ServiceType, ROUND(AVG(SatisfactionScore), 2) AS AverageSatisfactionScore, COUNT(*) AS TotalNumberOfVisits FROM TelehealthServicesUsage GROUP BY PrimaryDiagnosis, ServiceType ORDER BY PrimaryDiagnosis, ServiceType;
Output:
- ๐ For each primary diagnosis, determine the average healthcare cost and satisfaction score for visits that required follow-up compared to those that did not?
SELECT PrimaryDiagnosis, TO_CHAR(AVG(CASE WHEN FollowUpRequired = 'Yes' THEN CAST(HealthcareCost AS numeric) ELSE NULL END), 'FM$999,999,999.00') AS AvgCostWithFollowUp, TO_CHAR(AVG(CASE WHEN FollowUpRequired = 'No' THEN CAST(HealthcareCost AS numeric) ELSE NULL END), 'FM$999,999,999.00') AS AvgCostWithoutFollowUp, ROUND(AVG(CASE WHEN FollowUpRequired = 'Yes' THEN SatisfactionScore ELSE NULL END), 2) AS AvgSatisfactionWithFollowUp, ROUND(AVG(CASE WHEN FollowUpRequired = 'No' THEN SatisfactionScore ELSE NULL END), 2) AS AvgSatisfactionWithoutFollowUp FROM TelehealthServicesUsage GROUP BY PrimaryDiagnosis ORDER BY PrimaryDiagnosis;
Output:
- ๐ฅ Determine the top 3 primary diagnoses based on total healthcare costs, and for each, provide the average satisfaction score and the number of visits?
WITH DiagnosisStats AS ( SELECT PrimaryDiagnosis, SUM(CAST(HealthcareCost AS numeric)) AS TotalHealthcareCost, AVG(SatisfactionScore) AS AvgSatisfactionScore, COUNT(*) AS VisitCount FROM TelehealthServicesUsage GROUP BY PrimaryDiagnosis ) SELECT PrimaryDiagnosis, TO_CHAR(TotalHealthcareCost, 'FM$999,999,999.00') AS TotalHealthcareCost, ROUND(AvgSatisfactionScore, 2) AS AvgSatisfactionScore, VisitCount FROM DiagnosisStats ORDER BY TotalHealthcareCost DESC LIMIT 3;
Output:
- ๐ป Calculate the average healthcare cost and average satisfaction score for each telehealth platform, and identify the telehealth platform with the highest average satisfaction score?
WITH PlatformStats AS ( SELECT TelehealthPlatform, AVG(CAST(HealthcareCost AS numeric)) AS AvgHealthcareCost, AVG(SatisfactionScore) AS AvgSatisfactionScore FROM TelehealthServicesUsage GROUP BY TelehealthPlatform ) SELECT TelehealthPlatform, TO_CHAR(AvgHealthcareCost, 'FM$999,999,999.00') AS AvgHealthcareCost, ROUND(AvgSatisfactionScore, 2) AS AvgSatisfactionScore FROM PlatformStats ORDER BY AvgSatisfactionScore DESC LIMIT 1;
Output:
- ๐ Determine the top 3 primary diagnoses with the highest average healthcare costs for patients aged 60 and above, and calculate the average satisfaction score and the percentage of
visits that required follow-up for each diagnosis?
WITH AgeFilteredData AS ( SELECT PrimaryDiagnosis, SatisfactionScore, CAST(HealthcareCost AS numeric) AS HealthcareCost, FollowUpRequired, Age, PatientID FROM TelehealthServicesUsage WHERE Age >= 60 ), DiagnosisStats AS ( SELECT PrimaryDiagnosis, AVG(HealthcareCost) AS AvgHealthcareCost, AVG(SatisfactionScore) AS AvgSatisfactionScore, SUM(CASE WHEN FollowUpRequired = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS FollowUpVisitPercentage FROM AgeFilteredData GROUP BY PrimaryDiagnosis ), RankedDiagnoses AS ( SELECT PrimaryDiagnosis, AvgHealthcareCost, AvgSatisfactionScore, FollowUpVisitPercentage, RANK() OVER (ORDER BY AvgHealthcareCost DESC) AS Rank FROM DiagnosisStats ) SELECT PrimaryDiagnosis, TO_CHAR(AvgHealthcareCost, 'FM$999,999,999.00') AS AvgHealthcareCost, ROUND(AvgSatisfactionScore, 2) AS AvgSatisfactionScore, ROUND(FollowUpVisitPercentage, 2) AS FollowUpVisitPercentage FROM RankedDiagnoses WHERE Rank <= 3 ORDER BY AvgHealthcareCost DESC;
Output:
The Entity-Relationship Diagram (ERD) for the Tele-Health Data Analytics project is a visual representation of the key entities involved in telehealth services and their relationships. This ERD was created using dbdiagram.io ๐ ๏ธ and serves as the foundation for understanding the structure of the telehealth data.
-
๐งโโ๏ธ Patients:
- Contains patient information such as ID, age, gender, ethnicity, and socioeconomic status.
-
๐ฅ Visits:
- Records each telehealth visit, including details like patient ID, visit date, service type, visit duration, satisfaction score, technical issues, follow-up needs, healthcare cost, insurance details, and diagnosis.
-
๐ฉบ Diagnoses:
- Logs the primary diagnosis associated with each visit.
-
๐ป Telehealth Platforms:
- Tracks the platforms (e.g., Teladoc, Amwell) used to deliver telehealth services.
-
๐ก๏ธ Insurance:
- Captures information about the insurance coverage for telehealth services.
-
โ๏ธ Technical Issues:
- Logs any technical difficulties encountered during the telehealth visit, including descriptions of the issues.
-
๐ฅ Patients โ Visits:
- Each patient can have multiple visits, with each visit linked to a specific patient.
-
๐ Visits โ Diagnoses:
- Every visit is associated with a primary diagnosis.
-
๐ป Visits โ Telehealth Platforms:
- Each visit takes place on a particular telehealth platform.
-
๐ก๏ธ Visits โ Insurance:
- Each visit is covered by a specific type of insurance.
-
โ๏ธ Visits โ Technical Issues:
- Any technical issues during the visit are recorded and linked to the visit.
The ERD is designed to:
- ๐ Visualize Relationships: Clearly illustrate the connections between patients, visits, diagnoses, platforms, insurance, and technical issues.
- ๐ Facilitate Data Analysis: Enable efficient querying and analysis of telehealth data, such as service utilization patterns, patient satisfaction, and platform performance.
- ๐ Optimize Data Management: Assist healthcare providers in managing telehealth data, allowing for improved service delivery and data-driven decision-making.
This diagram is crucial for the project's data management strategy, making it easier for healthcare providers to analyze trends, improve patient outcomes, and optimize telehealth services. The ERD was constructed using dbdiagram.io ๐ ๏ธ, a powerful tool for creating and visualizing database structures.
In this project, several DAX functions were utilized to create dynamic reports and dashboards in Power BI. Below is a summary of the key DAX functions used:
-
๐งฎ Total Visits This DAX function calculates the total number of telehealth visits.
Total Visits = COUNTROWS('telehealth_services_usage_extended_cleaned')
-
๐ฐ Total Healthcare Costs This DAX function calculates the total healthcare costs across all visits.
Total Healthcare Cost = SUM('telehealth_services_usage_extended_cleaned'[Healthcare_Cost])
-
โญ Average Satisfaction Score This DAX function calculates the average satisfaction score for all telehealth visits.
Average Satisfaction Score = AVERAGE('telehealth_services_usage_extended_cleaned'[Satisfaction_Score])
-
๐ฅ Unique Patients This DAX function counts the number of unique patients in the dataset.
Unique Patients = DISTINCTCOUNT('telehealth_services_usage_extended_cleaned'[Patient_ID])
-
๐ High Satisfaction Flag This DAX function creates a flag to indicate whether a visit had a high satisfaction score (greater than or equal to 4).
High Satisfaction Flag = IF('telehealth_services_usage_extended_cleaned'[Satisfaction_Score] >= 4, "Yes", "No")
- ๐ Service Utilization Patterns: Analysis revealed peak usage times and the most common service types, helping to optimize resource allocation and improve service delivery.
- ๐ Patient Satisfaction Insights: Identified key factors that influence patient satisfaction, allowing for targeted improvements in telehealth services.
- ๐ต Cost Analysis: A detailed breakdown of healthcare costs by service type and insurance coverage provided insights into cost-saving opportunities.
- ๐ Follow-Up Trends: The impact of follow-up requirements on healthcare costs and satisfaction was examined, highlighting areas for cost reduction and patient retention.
- ๐ง Mental Health Focus: Mental health diagnoses were the most frequent reasons for telehealth visits, indicating a critical area for service improvement and resource allocation.
- โณ Implement Dynamic Scheduling: Introduce a dynamic scheduling system that adjusts appointment lengths based on the nature of the service. For example, quicker consultations could free up more slots, allowing for increased availability.
- ๐ค AI-Powered Appointment Matching: Use AI to match patients with providers based on availability, preferences, and historical satisfaction scores. This ensures that patients are paired with providers who best meet their needs, enhancing overall satisfaction.
- ๐ Expand Hours of Operation: Consider offering extended hours or weekend slots to accommodate patients who may struggle with standard operating hours. This could lead to higher satisfaction and more accessible appointment options.
- ๐ฒ Introduce Virtual Queueing: Implement a virtual queueing system where patients can check in online and be notified of their upcoming slot. This reduces waiting times and allows patients to manage their time better, leading to a more positive experience.
- ๐ Provider Training Programs: Develop specialized training programs focused on enhancing provider-patient communication and empathy. Better interaction during appointments can significantly boost patient satisfaction and reduce the need for follow-ups, freeing up more slots for new appointments.
The full dataset for this project is available for download on Google Drive. Click the link below to access the dataset: