A Python ETL script for validating clinical trial patient data. Reads raw CSV data, applies validation rules, and outputs clean JSON and a detailed error report.
- Python 3.14
- Pandas — CSV parsing and data processing
- JSON — structured output format
- All required fields must be present (id, full_name, age, diagnosis, enrolled_at, trial_id)
- Age must be between 0 and 120
- Enrollment date must follow YYYY-MM-DD format
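
The three rules above could be sketched as a single per-record check. This is a minimal illustration, not the script's actual code: the function name `validate_row`, the error messages, and the inclusive reading of the age bounds are all assumptions. It works on plain dicts (for example, the output of pandas `DataFrame.to_dict("records")`) and assumes missing values arrive as `None` or empty strings rather than NaN.

```python
import re
from datetime import datetime

# Field names taken from the rules above.
REQUIRED_FIELDS = ("id", "full_name", "age", "diagnosis", "enrolled_at", "trial_id")


def _is_blank(value) -> bool:
    """True when a value is missing (None or an empty/whitespace string)."""
    return value is None or str(value).strip() == ""


def validate_row(row: dict) -> list[str]:
    """Hypothetical helper: return error messages for one record (empty if valid)."""
    errors = []

    # Rule 1: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if _is_blank(row.get(field)):
            errors.append(f"missing required field: {field}")

    # Rule 2: age must be a number between 0 and 120 (assumed inclusive).
    age = row.get("age")
    if not _is_blank(age):
        try:
            age_value = float(age)
        except (TypeError, ValueError):
            errors.append("age is not a number")
        else:
            if not 0 <= age_value <= 120:
                errors.append("age out of range 0-120")

    # Rule 3: enrollment date must be a real calendar date in YYYY-MM-DD form.
    # The regex enforces zero-padding, which strptime alone would not.
    enrolled = row.get("enrolled_at")
    if not _is_blank(enrolled):
        text = str(enrolled).strip()
        try:
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
                raise ValueError("wrong shape")
            datetime.strptime(text, "%Y-%m-%d")
        except ValueError:
            errors.append("enrolled_at is not in YYYY-MM-DD format")

    return errors
```

Returning a list of messages instead of a boolean lets one record carry several errors at once, which suits a per-row error report.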
pip install pandas colorama
python validator.py
- valid_patients.json: clean, validated records only
- validation_report.json: full report with errors per invalid row
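
As a rough sketch of how the two output files might be produced: the report layout below (`total_rows`, `valid_rows`, `invalid_rows`) and the helper names `build_outputs` and `write_outputs` are assumptions for illustration, not the script's documented schema. The `validate` argument is any callable that returns a list of error messages for a record.

```python
import json
from pathlib import Path


def build_outputs(rows, validate):
    """Split records into valid ones and a report of per-row errors.

    `rows` is a list of dicts; `validate` returns a list of error strings.
    The report structure here is an assumed layout, not a fixed schema.
    """
    valid = []
    invalid = []
    for index, row in enumerate(rows):
        errors = validate(row)
        if errors:
            # Keep the row position so errors can be traced back to the CSV.
            invalid.append({"row": index, "errors": errors})
        else:
            valid.append(row)
    report = {
        "total_rows": len(rows),
        "valid_rows": len(valid),
        "invalid_rows": invalid,
    }
    return valid, report


def write_outputs(rows, validate, out_dir="."):
    """Write valid_patients.json and validation_report.json into out_dir."""
    valid, report = build_outputs(rows, validate)
    out = Path(out_dir)
    (out / "valid_patients.json").write_text(
        json.dumps(valid, indent=2), encoding="utf-8"
    )
    (out / "validation_report.json").write_text(
        json.dumps(report, indent=2), encoding="utf-8"
    )
    return report
```

Keeping the split logic in `build_outputs` separate from the file writing makes it possible to check the report contents without touching the disk.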