## Technologies and Tools

- Programming Languages: Python, R
- Frameworks/Libraries: Flask, gspread, pandas, Google Cloud Platform (GCP) services (Storage, Compute Engine), REDCap API, Mailjet, OpenAI API
- Tools: Docker, Git, gcloud CLI

## Functionality

1. Data Export and Caching: Exports data from REDCap to GCP Cloud Storage and creates cached data files in multiple formats (.csv, .rds, .feather) for efficient access.
2. Analytic Dataset Generation: Processes the exported REDCap data to create analytic datasets based on specified criteria (randomization, anonymization, treatment-arm removal) and sends email notifications with secure download links.
3. Query System: Executes SQL queries on the REDCap data, generates HTML reports with visualizations, and sends email notifications with result links. Supports AI-assisted query writing using OpenAI.
4. Payment Processing: Automates payment calculations and generates invoices for study sites.
5. Administrative Tasks: Includes scripts for syncing study sites, fixing database issues, updating default configurations, and verifying data types in function sheets.
6. Email Notifications: Sends various email notifications (error alerts, data availability, query results) using the Mailjet API.

## Relevant Skills

1. Integration of multiple technologies (REDCap, GCP, Mailjet, OpenAI) into a cohesive data processing and reporting system.
2. Efficient data handling, using caching and batch processing to manage large datasets from REDCap.
3. Implementation of data anonymization and randomization techniques to protect sensitive information.
4. Generation of secure download links with expiration times for controlled access to data and reports.
5. Creation of interactive HTML reports with embedded visualizations for effective data presentation.
6. Error handling and a notification system that alerts administrators to issues during data processing.
7. Use of Docker for containerization, ensuring consistent execution environments across different systems.

## Example Code

1. Data export and caching (task_data_export.py):

```python
import requests

# Export records from REDCap in batches, one POST request per batch of IDs.
for count, id_batch in enumerate(id_batches):
    data = {
        "token": redcap_key,
        "content": "record",
        "format": "csv",
        "type": "flat",
        "rawOrLabel": "raw",
        "rawOrLabelHeaders": "raw",
        "exportCheckboxLabel": "false",
        "exportSurveyFields": "false",
        "exportDataAccessGroups": "false",
        "returnFormat": "json",
    }
    # REDCap expects record IDs as indexed form fields: records[0], records[1], ...
    for n, record_id in enumerate(id_batch):
        data[f"records[{n}]"] = record_id
    r = requests.post(url, data=data)
    ...
```

2. AI-assisted query writing (task_ai_helper.py):

```python
# Create a study-specific OpenAI assistant with file retrieval enabled
# (Assistants API, beta interface).
assistant = openai_client.beta.assistants.create(
    name=study.upper() + " " + assistant_type,
    instructions=assistant_types[assistant_type].replace("STUDY", study.upper()),
    model="gpt-4-turbo-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[construct_groups.id, redcap_fields.id],
)
```

## Notable Achievements

1. Development of a comprehensive data processing and reporting system for the METRC research consortium, streamlining its data analytics workflow.
2. Integration of AI-assisted query writing to help researchers create complex SQL queries more easily.
3. Implementation of a robust error-handling and notification system to keep the data pipeline running smoothly.
4. Creation of a flexible, modular codebase that can be easily extended to support new features and studies.

The code demonstrates strong skills in data engineering, system integration, and building practical applications with a variety of technologies. The developer has shown the ability to handle complex data processing tasks, implement security measures, and create user-friendly interfaces for researchers to access and analyze data.
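The batching step shown in the data export example can be sketched end to end. This is a minimal, self-contained illustration: the `chunk_ids` helper and the batch size of 100 are assumptions for the sketch, not values taken from task_data_export.py.

```python
# Minimal sketch of batching record IDs for REDCap export requests.
# BATCH_SIZE and chunk_ids are illustrative; the real script may differ.

BATCH_SIZE = 100  # hypothetical batch size

def chunk_ids(record_ids, size=BATCH_SIZE):
    """Split record IDs into fixed-size batches, one API call per batch."""
    return [record_ids[i:i + size] for i in range(0, len(record_ids), size)]

def build_export_payload(redcap_key, id_batch):
    """Build the form payload for a single REDCap 'record' export request."""
    data = {
        "token": redcap_key,
        "content": "record",
        "format": "csv",
        "type": "flat",
        "returnFormat": "json",
    }
    # REDCap expects record IDs as indexed form fields: records[0], records[1], ...
    for n, record_id in enumerate(id_batch):
        data[f"records[{n}]"] = record_id
    return data
```

Each payload would then be POSTed to the REDCap API URL, as in the original loop.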

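The email notifications described above go through the Mailjet API; a payload for its Send API v3.1 can be sketched as below. The sender address, subject, and helper name are hypothetical examples, and the actual send call through the `mailjet_rest` client is shown only in a comment since it requires credentials.

```python
# Sketch of a Mailjet Send API v3.1 message payload for a notification email.
# Sender address and helper name are hypothetical.

def build_notification(to_email, subject, body_html):
    """Construct a Mailjet v3.1 send payload for one recipient."""
    return {
        "Messages": [
            {
                "From": {"Email": "noreply@example.org", "Name": "Data Pipeline"},
                "To": [{"Email": to_email}],
                "Subject": subject,
                "HTMLPart": body_html,
            }
        ]
    }

# Sending would then use the official client (credentials required):
#   from mailjet_rest import Client
#   mailjet = Client(auth=(api_key, api_secret), version="v3.1")
#   result = mailjet.send.create(data=build_notification(...))
```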

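The secure, expiring download links mentioned under Relevant Skills can be produced with Cloud Storage V4 signed URLs. A minimal sketch, assuming the `google-cloud-storage` client library: the bucket and object names are hypothetical, the 24-hour TTL is an illustrative choice, and the GCP import is deferred into the function so the sketch can be shown without credentials.

```python
from datetime import timedelta

# Sketch of issuing a time-limited download link via google-cloud-storage.
# Bucket/object names are hypothetical; LINK_TTL is an illustrative window.

LINK_TTL = timedelta(hours=24)

def signed_download_url(bucket_name, blob_name, ttl=LINK_TTL):
    """Return a V4 signed URL granting temporary GET access to one object."""
    from google.cloud import storage  # requires google-cloud-storage + credentials
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return blob.generate_signed_url(version="v4", expiration=ttl, method="GET")
```

After the TTL elapses the URL stops working, which gives the controlled access described above without making the bucket public.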
© 2024 Elias Weston-Farber. All rights reserved.