To filter the projects by technology, click the buttons below. You can also filter by your own keywords using the filter bar.

Filtering works best one keyword at a time: as you add each keyword, projects that do not contain it are hidden.
To start a new search, click the "Clear Filters" button.



VM Batch Jobs

## Technologies and Tools

- Programming Languages: Python, R
- Frameworks/Libraries: Flask, gspread, pandas, Google Cloud Platform (GCP) services (Storage, Compute Engine), REDCap API, Mailjet, OpenAI API
- Tools: Docker, Git, gcloud CLI

## Functionality

1. Data Export and Caching: Exports data from REDCap to GCP Cloud Storage and creates cached data files in several formats (.csv, .rds, .feather) for efficient access.
2. Analytic Dataset Generation: Processes the exported REDCap data to create analytic datasets based on specified criteria (randomization, anonymization, treatment-arm removal) and sends email notifications with secure download links.
3. Query System: Executes SQL queries on the REDCap data, generates HTML reports with visualizations, and sends email notifications with result links. Supports AI-assisted query writing using OpenAI.
4. Payment Processing: Automates payment calculations and generates invoices for study sites.
5. Administrative Tasks: Includes scripts for syncing study sites, fixing database issues, updating default configurations, and verifying data types in function sheets.
6. Email Notifications: Sends various email notifications (error alerts, data availability, query results) using the Mailjet API.

## Relevant Skills

1. Integration of multiple technologies (REDCap, GCP, Mailjet, OpenAI) into a cohesive data processing and reporting system.
2. Efficient data handling using caching and batch processing techniques to manage large datasets from REDCap.
3. Implementation of data anonymization and randomization techniques to protect sensitive information.
4. Generation of secure download links with expiration times for controlled access to data and reports (a sketch follows at the end of this section).
5. Creation of interactive HTML reports with embedded visualizations for effective data presentation.
6. Error handling and notification system to alert administrators of any issues during data processing.
7. Use of Docker for containerization, ensuring consistent execution environments across different systems.

## Example Code

1. Data export and caching (task_data_export.py):

```python
for count, id_batch in enumerate(id_batches):
    data = {
        "token": redcap_key,
        "content": "record",
        "format": "csv",
        "type": "flat",
        "rawOrLabel": "raw",
        "rawOrLabelHeaders": "raw",
        "exportCheckboxLabel": "false",
        "exportSurveyFields": "false",
        "exportDataAccessGroups": "false",
        "returnFormat": "json",
    }
    for n, ID in enumerate(id_batch):
        data['records[' + str(n) + ']'] = ID
    r = requests.post(url, data=data)
    ...
```

2. AI-assisted query writing (task_ai_helper.py):

```python
assistant = openai_client.beta.assistants.create(
    name=study.upper() + " " + assistant_type,
    instructions=assistant_types[assistant_type].replace("STUDY", study.upper()),
    model="gpt-4-turbo-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[construct_groups.id, redcap_fields.id],
)
```

## Notable Achievements

1. Development of a comprehensive data processing and reporting system for the METRC research consortium, streamlining its data analytics workflow.
2. Integration of AI-assisted query writing to help researchers create complex SQL queries more easily.
3. Implementation of a robust error handling and notification system to keep the data pipeline running smoothly.
4. Creation of a flexible, modular codebase that can be easily extended to support new features and studies.

The code demonstrates strong skills in data engineering, system integration, and building practical applications with a variety of technologies. The developer has shown the ability to handle complex data processing tasks, implement security measures, and create user-friendly interfaces for researchers to access and analyze data.
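The repository's link-generation code isn't excerpted above, but a minimal sketch of the expiring-download-link idea using the google-cloud-storage client could look like the following (the bucket and object names are hypothetical):

```python
import datetime

from google.cloud import storage


def make_download_link(bucket_name: str, blob_name: str, hours: int = 24) -> str:
    """Return a V4 signed URL that stops working after `hours` hours."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(hours=hours),
        method="GET",
    )


# Hypothetical usage: a link a researcher can open for one day only.
url = make_download_link("metrc-exports", "study_a/analytic_dataset.csv")
```

Signed URLs keep the bucket private while still letting recipients download a single object over plain HTTPS, which fits the email-notification workflow described above.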

Packaged Python App

## Technologies and Tools

- Programming Languages: Python
- Frameworks and Libraries:
  - PySide6 (Qt framework for GUI development)
  - PyAudio (audio processing)
  - OpenAI API (language model integration)
  - Google Cloud Text-to-Speech API
  - Porcupine (wake word detection)
  - Whisper (speech recognition)
  - spaCy (natural language processing)
  - chromadb (vector database)
- Tools:
  - PyInstaller (packaging Python applications)
  - dmgbuild (creating macOS disk images)

## Functionality

The main project in this repository is the Jarvis Voice Assistant app, a voice-controlled AI assistant that listens for user commands and performs various tasks. Key functionalities include:

- Wake word detection using Porcupine
- Speech recognition using Whisper
- Natural language processing using OpenAI's language models (GPT-3.5 and GPT-4)
- Text-to-speech using the Google Cloud Text-to-Speech API or free alternatives
- Storing conversation history and summaries using chromadb (see the sketch at the end of this section)
- Emailing responses and reminders to the user
- Performing internet searches and synthesizing information
- Real-time chat interface with typing animations
- Customizable settings menu for API keys and configurations

## Relevant Skills

- Advanced GUI development using PySide6 (Qt) with custom widgets and animations
- Audio processing and streaming using PyAudio
- Integration of various AI technologies (speech recognition, language models, text-to-speech)
- Utilization of cloud APIs (OpenAI, Google Cloud)
- Implementation of wake word detection using Porcupine
- Natural language processing techniques using spaCy
- Storing and retrieving conversation history using the chromadb vector database
- Packaging Python applications into executable formats using PyInstaller
- Creating macOS disk images using dmgbuild
- Multithreading and multiprocessing for handling background tasks and audio playback
- Error handling and graceful recovery mechanisms
- Modular, organized codebase with separate files for different functionalities

## Example Code

- Wake word detection and audio processing:

```python
def jarvis_process(jarvis_stop_event, jarvis_skip_event, queue, text_queue):
    handle = pvporcupine.create(access_key=get_pico_key(), keywords=['Jarvis'],
                                keyword_paths=[get_pico_wake_path()])
    prep_mic()
    start_audio_stream(handle.sample_rate, handle.frame_length)
    while not jarvis_stop_event.is_set():
        pcm = get_next_audio_frame(handle)
        keyword_index = handle.process(pcm)
        if keyword_index >= 0:
            query_audio = listen_to_user()
            query = convert_to_text(query_audio)
            response = processor(query, skip=jarvis_skip_event, text_queue=text_queue)
            audio_path = text_to_speech(response, model=get_model()['name'])
            play_audio_file(audio_path, added_stop_event=jarvis_skip_event)
```

- Real-time chat interface with typing animations:

```javascript
function process_typing_queue() {
    if (typing_queue.length > 0) {
        var message = typing_queue[0];
        var new_div = document.createElement('div');
        new_div.innerHTML = message.formatted_text;
        var body = document.getElementsByTagName('body')[0];
        if (message.appear_as_typed) {
            var span = new_div.getElementsByClassName('chat-bubble')[0].getElementsByTagName('span')[0];
            span.innerHTML = "";
            var typedText = "";
            typing_speed = message.typing_delay;
            var newText = message.formatted_text.match(/<span[^>]*>([^<]+)<\/span>/)[1];
            var index = 0;
            function type() {
                if (index < newText.length) {
                    span.innerHTML += newText.charAt(index);
                    index++;
                    window.scrollTo(0, document.body.scrollHeight);
                    setTimeout(type, typing_speed);
                } else {
                    typing_queue.shift();
                    process_typing_queue();
                }
            }
            type();
        } else {
            typing_queue.shift();
            process_typing_queue();
        }
    }
}
```

## Notable Achievements

- Development of a fully functional voice-controlled AI assistant with advanced capabilities
- Integration of multiple AI technologies and APIs to create a seamless user experience
- Implementation of a real-time chat interface with typing animations for enhanced user engagement
- Efficient handling of conversation history and summaries using a vector database
- Customizable settings menu for easy configuration of API keys and preferences
- Packaging the application into an executable format for easy distribution and installation

The Jarvis Voice Assistant project demonstrates strong skills in Python development, AI integration, audio processing, GUI development, and overall software engineering practices. The use of advanced technologies and the implementation of user-friendly features showcase the developer's ability to create robust and engaging applications.
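The conversation-history code itself isn't shown above; as a rough sketch of the chromadb pattern described (the collection name, metadata fields, and helper names are hypothetical):

```python
import uuid

import chromadb

# Persist embeddings on disk so history survives app restarts.
client = chromadb.PersistentClient(path="jarvis_memory")
history = client.get_or_create_collection("conversation_history")


def remember(text: str, role: str) -> None:
    """Store one utterance, tagging who said it."""
    history.add(documents=[text], metadatas=[{"role": role}], ids=[str(uuid.uuid4())])


def recall(query: str, n: int = 3) -> list[str]:
    """Return the n stored utterances most similar to the query."""
    return history.query(query_texts=[query], n_results=n)["documents"][0]
```

Retrieval by semantic similarity, rather than by recency alone, is what lets an assistant pull relevant context back into the prompt long after the original exchange.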

Automated Invoices

## Technologies and Tools

This repository primarily utilizes the following technologies and tools:

- R: The main programming language used for development.
- tidyverse: A collection of R packages designed for data science.
- knitr and kableExtra: Packages for dynamic report generation in R.
- lubridate: For date-time manipulation.
- pagedown and staplr: Used for creating and manipulating PDF files.
- pdftools: For handling PDF files.
- GitHub Actions: Used for CI/CD processes.
- Docker: Containers are used in GitHub Actions workflows.
- devtools: An R package for development tasks.
- act: Local execution of GitHub Actions.
- actionlint: Static analysis tool for GitHub Actions workflows.

## Functionality

The repository hosts the AutoPayments project, an R package developed to automate the calculation and documentation of payments for METRC studies. The package facilitates:

- Generating detailed payment reports and creating corresponding PDF invoices for each study site.
- Handling data via stateless or stored methods to ensure flexibility in managing payment data.
- Customizing invoice templates to accommodate different study requirements.

## Relevant Skills

The codebase demonstrates advanced skills in several areas:

- Functional Programming in R: The use of `tidyverse` for data manipulation and `lubridate` for date-time calculations is prominent, reflecting a strong grasp of functional programming paradigms.
- PDF Manipulation: The use of `pagedown`, `staplr`, and `pdftools` to generate and manipulate PDF documents showcases proficiency in handling file formats programmatically.
- CI/CD Implementation: The implementation of GitHub Actions for continuous integration and deployment illustrates skills in automation and workflow optimization.
- Containerization: Using Docker within GitHub Actions indicates knowledge of software containerization, ensuring consistency across development environments.

Example of functional programming in R:

```r
all_payments <- all_payments %>%
  filter(DatePayment == format(Sys.Date(), "%m/%d/%Y"))
```

This snippet uses `dplyr` from `tidyverse` to filter payment data, showcasing the application of chained operations.

## Example Code

The use of Docker in GitHub Actions for running R scripts:

```yaml
container:
  image: eliaswf/jammy-chromedriver-python-r:latest
```

This demonstrates integrating R within a Docker container setup, ensuring that the CI environment is consistent and controlled.

Example of PDF invoice generation using `staplr`:

```r
set_fields(path, front_page, fields, flatten = TRUE)
```

This function call modifies a PDF template to include dynamic content, illustrating the application of programming skills to document automation.

## Notable Achievements

- Optimization: The caching mechanism in the GitHub Actions workflows speeds up the CI process by reusing previously downloaded or built resources.
- Innovative Problem-Solving: The project includes a method for local testing of GitHub Actions (`act`), allowing developers to debug workflows locally without pushing numerous commits, demonstrating a practical solution to a common problem in CI/CD development.
- Customizable Outputs: The ability to customize invoice templates based on study-specific metadata provides a tailored user experience, which is crucial for client-facing applications.

These components reflect not only technical proficiency but also a thorough understanding of the project's domain-specific requirements, making them notable achievements in a software development context.

Cloud Website

## Technologies and Tools

- Python: The primary programming language used throughout the repository
- Flask: Web application framework used to build the CAS interface
- Google Cloud Platform (GCP): Cloud provider used for hosting, storage, and computing services
- REDCap: Research Electronic Data Capture system integrated for data management
- Firebase: Realtime database used for managing permissions, tasks, and links
- OAuth2: Authentication mechanism using Google and Microsoft OAuth
- HTML/CSS: Used for building the web interface templates
- JavaScript: Used for client-side interactivity, with HTMX for dynamic page updates

## Functionality

- METRC Reports Gateway: A comprehensive Flask-based web application for managing and generating reports and data exports related to the METRC system
- Key features include secure user authentication, REDCap data export and caching, generation of analytic datasets and raw data exports, SQL query execution, invoice and payment tracking, study management, asynchronous task execution, Google Sheets integration, and secure file sharing
- The application utilizes GCP services such as Compute Engine, Cloud Storage, and the Sheets API for efficient data processing and storage

## Relevant Skills

- Integration of multiple technologies and services (Flask, GCP, REDCap, Firebase)
- Implementation of secure user authentication and authorization using OAuth2 and role-based access control (RBAC); a sketch of this pattern follows at the end of this section
- Efficient data retrieval and caching mechanisms for improved performance
- Asynchronous task execution using Compute Engine instances for long-running tasks
- Seamless integration with the Google Sheets API for data storage and retrieval
- Generation of secure, expiring download links for file sharing
- Utilization of websockets for real-time updates in the user interface

## Example Code

- Secure user authentication using Google OAuth:

```python
def start_login(session, provider='google'):
    if provider == 'microsoft':
        microsoft = OAuth2Session(ms_client_id, scope=ms_scope, redirect_uri=ms_redirect_uri)
        authorization_url, state = microsoft.authorization_url(
            ms_authorization_base_url, access_type="offline", prompt="select_account")
    else:
        google = OAuth2Session(client_id, scope=scope, redirect_uri=redirect_uri)
        authorization_url, state = google.authorization_url(
            authorization_base_url, access_type="offline", prompt="consent")
    session['oauth_state'] = state
    return authorization_url
```

- Asynchronous task execution using Compute Engine instances:

```python
def add_task(data, empty_arg_str_to_none=False, page=None, study=None):
    # ...
    data['estimated_seconds_taken'] = get_task_estimate(data)
    if "status" not in data.keys():
        data['status'] = "Waiting for Compute Engine to come online..."
    utc = pytz.utc
    eastern = pytz.timezone('America/New_York')
    now_utc = datetime.datetime.now(utc)
    now_eastern = now_utc.astimezone(eastern)
    now_str = now_eastern.strftime('%B %d, %Y %I:%M:%S %p %Z')
    data["request_time"] = now_str
    # ...
    res = db.reference('tasks').push(data)
    return res.key
```

## Notable Achievements

- Developed a comprehensive and scalable web application for managing METRC reports and data exports
- Implemented secure user authentication and authorization using industry-standard OAuth2 and RBAC
- Integrated multiple external services (GCP, REDCap, Google Sheets) seamlessly into the application
- Utilized asynchronous task execution and caching mechanisms for efficient data processing
- Designed an intuitive user interface with real-time updates using websockets and HTMX
- Contributed to the METRC project by providing a robust and secure solution for report generation and data management

Overall, the METRC Reports Gateway repository demonstrates strong skills in web application development, integration of multiple technologies, secure user authentication and authorization, efficient data processing, and user-friendly interface design. The project showcases the ability to build a comprehensive and scalable solution for managing research data and generating reports in a secure and efficient manner.
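The gateway's actual authorization code isn't reproduced here; a minimal sketch of the RBAC pattern in Flask (the role names, session layout, and route are hypothetical) might look like:

```python
from functools import wraps

from flask import Flask, abort, session

app = Flask(__name__)
app.secret_key = "change-me"  # required for Flask sessions


def requires_role(*allowed_roles):
    """Reject the request with 403 unless the logged-in user holds an allowed role."""
    def decorator(view):
        @wraps(view)
        def wrapped(*args, **kwargs):
            role = session.get("role")  # assumed to be set during OAuth login
            if role not in allowed_roles:
                abort(403)
            return view(*args, **kwargs)
        return wrapped
    return decorator


@app.route("/invoices")
@requires_role("admin", "finance")
def invoices():
    return "Only admins and finance staff reach this page."
```

The decorator keeps the role check in one place, so adding a new protected page is a one-line annotation rather than repeated permission logic.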

Custom Docker Containers

## Technologies and Tools

The repository utilizes a combination of programming languages, libraries, and tools focused on building a Docker environment that integrates R and Python with web automation capabilities. Key technologies:

- Programming Languages: Python, R
- Operating System: Ubuntu 22.04 LTS
- Web Browsers and Drivers: Google Chrome, ChromeDriver
- Python Libraries: Flask, Gunicorn, pandas, Scrapy, Selenium, Plotly, Google Cloud SDK, OpenAI, Flask-Sock
- R Packages: tidyverse, knitr, feather, htmlTable, ggwordcloud, DiagrammeR, dplyr, reshape2, RSQLite
- Containerization: Docker
- Virtual Environment Tools: Python venv
- Data Handling and Visualization: matplotlib, plotly, ggplot2 (part of tidyverse)
- Web Scraping and Automation: Selenium, Scrapy
- APIs and Cloud Services: Google Cloud Storage, Firebase, Google API Python Client

## Functionality

The primary purpose of this repository is to create a robust Docker container that combines R and Python for data analysis and automation tasks. The projects included are set up to:

- Run R and Python side by side in an isolated Docker environment.
- Provide capabilities for web scraping, data manipulation, and visualization using both Python and R.
- Support automation through web drivers, enabling interaction with web elements for tasks like data collection or testing (see the Selenium sketch at the end of this section).
- Integrate with various APIs and cloud services for enhanced data operations and storage solutions.

## Relevant Skills

The repository showcases several advanced coding techniques and architectural designs:

- Containerization and Virtualization: Uses Docker to create a reproducible, consistent development environment that integrates multiple technologies.
- Automation with Selenium: Demonstrates automation of web browsers to perform tasks like data scraping or UI testing, which is crucial for both development and testing phases.
- Complex Dependency Management: Manages a complex set of dependencies in both R and Python, ensuring all necessary libraries and tools are installed correctly within the Docker container.
- Cross-Language Integration: Seamlessly integrates R and Python, allowing the strengths of both languages to be used in data analysis and automation.

## Example Code

```R
lapply(pkg_list, install_if_not_present)
```

This R snippet from `install2.R` demonstrates the use of functional programming to manage package installations dynamically.

Dockerfile example:

```Dockerfile
RUN apt install python3-launchpadlib python3.11-venv -y && \
    python3.11 -m venv env && \
    . env/bin/activate && \
    python3.11 -m pip install -r requirements.txt
```

This segment of the Dockerfile highlights the setup of a Python virtual environment and the installation of dependencies from a requirements file, essential for isolating and managing project-specific dependencies.

## Notable Achievements

- Innovative Integration: The integration of Python and R within a single Docker container is notable for its potential to streamline workflows in data science and automation.
- Automation Excellence: The setup for browser-based automation using Selenium and ChromeDriver in a Dockerized environment represents a significant infrastructure achievement for testing and data scraping.
- Dependency and Environment Management: Effective management of a large number of dependencies across two programming languages within a Docker environment showcases high proficiency in environment setup and maintenance.

These achievements reflect a deep understanding of both system architecture and the practical application of programming skills in real-world projects.
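As an illustration of how Selenium is typically driven inside a container like this one (the flags are the usual ones for Chrome in Docker; the URL is a placeholder):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")           # no display server in the container
options.add_argument("--no-sandbox")             # commonly needed when running as root
options.add_argument("--disable-dev-shm-usage")  # avoid the small /dev/shm default

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")  # placeholder URL
    print(driver.title)
finally:
    driver.quit()
```

Baking Chrome and ChromeDriver into the image is what makes a snippet like this reproducible: every CI run and every developer machine sees the same browser version.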

Analysis Platform

## Technologies and Tools

- Programming Languages: R
- Frameworks/Libraries: tidyverse, httr, feather, googlesheets4, igraph, writexl, readr, pagedown
- Tools: RStudio, GitHub Actions (for CI/CD)

## Functionality

- The AnalyticSystem is an R package that automates the generation of analytic datasets from raw REDCap data.
- It provides a standardized, data-matrix-driven workflow for creating consistent analytic variables (called constructs) across studies.
- Key features include automated REDCap data import, testing of construct functions, dataset anonymization, generation of data dictionaries and long-format data, and utilities for data and function exploration.
- The package integrates with Google Sheets for storing study and construct metadata.

## Relevant Skills

- Advanced R programming skills, including package development, function creation, and data manipulation using tidyverse.
- Knowledge of REDCap and experience working with REDCap data.
- Integration with external services like Google Sheets using APIs (googlesheets4).
- Implementation of data anonymization techniques to protect sensitive information.
- Creation of comprehensive data dictionaries and long-format datasets.
- Development of utilities for data and function exploration.
- Usage of GitHub Actions for continuous integration and continuous deployment (CI/CD).

## Example Code

- Package development:

```r
# DESCRIPTION file
Package: AnalyticSystem
Type: Package
Title: METRC Redcap Analytic Codebase System
Version: 0.3.1
```

- Data manipulation with tidyverse:

```r
analytic_data <- build_analytic_dataset(names = display_order, error_path = error_path)
```

- Integration with Google Sheets:

```r
fdf <- googlesheets4::read_sheet(pkg.globals$function_sheet_id, col_types = "c")
set_function_sheet(fdf, study)
```

- Data anonymization:

```r
anonymize_redcap_data()
```

- GitHub Actions CI/CD:

```yaml
# .github/workflows/check-r-package.yml
on:
  pull_request:
    types:
      - opened

jobs:
  R-CMD-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check
      - uses: r-lib/actions/check-r-package@v2
```

## Notable Achievements

- Development of a comprehensive R package for automating analytic dataset generation from REDCap data.
- Creation of a data-matrix-driven workflow for standardizing analytic variables across studies.
- Integration with Google Sheets for centralized storage and management of study and construct metadata.
- Implementation of data anonymization techniques to protect sensitive information.
- Generation of detailed data dictionaries and long-format datasets for enhanced data understanding and usability.

The AnalyticSystem package demonstrates strong R programming skills, experience working with REDCap data, integration with external services, and the development of a comprehensive toolkit for automating analytic dataset generation. The implementation of data anonymization, generation of data dictionaries, and usage of GitHub Actions for CI/CD further showcase the developer's skills and commitment to best practices in software development.

Packaged Electron Shiny App

## Technologies and Tools

The repository primarily utilizes the following technologies and tools:

- R
- Shiny
- JavaScript
- Node.js
- Electron
- HTML
- CSS
- Shell scripting
- Git

The main technologies are R and the Shiny web framework for the core application functionality. JavaScript, Node.js, and Electron are used to package the Shiny app as a standalone desktop application. HTML and CSS are used for the UI layout and styling. Shell scripting (Bash) is used for setup and build automation. Git is used for version control.

## Functionality

The Farber Screen Machine is a GUI application for analyzing and visualizing chemical screening data. Its main features include:

- Creating or uploading a main database file from chemical screening results
- Linking and managing matched .sdf files for chemical structure information
- Sampling, filtering, and coloring screening data
- Exploring screening results with interactive visualizations:
  - Dual flashlight plots
  - Chemical similarity vs. screen similarity plots
  - Specific CID chemical similarity vs. screen results plots
- Comparing a list of CIDs to an .sdf library
- Standalone tools for SDF file manipulation and ID mapping

The repository contains the core Shiny application code, JavaScript code for the Electron app, shell scripts for setup and building, and configuration files.

## Relevant Skills

- Integrating R and Shiny with JavaScript and Node.js using Electron to build cross-platform desktop apps
- Modularizing Shiny app code into UI and server components and separate R files for maintainability
- Using reactive programming concepts in Shiny (reactive values, expressions, observers) for dynamic UIs
- Creating interactive plots and visualizations with packages like plotly
- Performing CRUD operations and data transformations on chemical screening data
- Comparing chemical structures using various similarity metrics
- Parallel processing with the parallel package for improved performance
- Packaging R code and dependencies for portability using renv
- Cross-platform scripting using Bash
- Familiarity with version control using Git

## Example Code

- Shiny reactive programming:

```r
# Create a reactive value to store the data
values <- reactiveValues(data = NULL)

# Update the data when an input changes
observeEvent(input$file, {
  values$data <- read.csv(input$file$datapath)
})

# Render a plot using the reactive data
output$plot <- renderPlot({
  plot(values$data)
})
```

- Comparing chemical structures:

```r
# Calculate Tanimoto similarity between two molecules
fp1 <- rcdk::get.fingerprint(mol1, type = "maccs")
fp2 <- rcdk::get.fingerprint(mol2, type = "maccs")
tanimoto <- rcdk::tanimoto.coeff(fp1, fp2)
```

- Parallel processing:

```r
# Perform a computation in parallel
result <- mclapply(data, function(x) {
  # Computationally intensive task
}, mc.cores = detectCores())
```

## Notable Achievements

- Developed a comprehensive, user-friendly application for chemical screening data analysis and visualization
- Integrated various technologies (R, Shiny, JavaScript, Electron) to create a seamless desktop experience
- Implemented advanced cheminformatics techniques like chemical structure similarity
- Optimized performance through parallel processing and efficient data handling
- Utilized software engineering best practices such as modularization and reactive programming
- Achieved cross-platform compatibility through build scripting and dependency packaging

In summary, the Farber Screen Machine showcases strong skills in full-stack application development, bridging the gap between scientific computing with R and desktop GUI development using web technologies. The project highlights the ability to create domain-specific tools that are both powerful and accessible to end-users.

rSmartsheet

## Technologies and Tools

From the repository, the following technologies and tools were utilized:

- Programming Language: R
- Libraries:
  - `httr`: handling HTTP requests.
  - `jsonlite`: parsing and generating JSON data.
  - `readr`: reading and writing data.
  - `dplyr`: data manipulation.
  - `purrr`: functional programming tools.
  - `tidyr`: data tidying.
  - `stringr`: string operations.
  - `magrittr`: the forward-pipe operator.
- API: Smartsheet API
- Other Tools:
  - GitHub: version control and hosting.
  - Roxygen2: documentation.
  - RStudio project configuration.

## Functionality

The repository houses an R package named `rsmartsheet`, which serves as an SDK for interacting with the Smartsheet API. The primary functionalities of this package include:

- Managing Smartsheet sessions, such as setting API keys and working folders.
- Creating, modifying, and deleting sheets and their contents.
- Managing attachments, including uploading new ones and downloading existing ones.
- Retrieving various types of data from Smartsheet, such as sheets, workspaces, and reports.
- Specialized functions for handling data types within Smartsheet, like converting all columns to text numbers or colorizing rows based on specified HEX codes.

## Relevant Skills

- API Integration: Using the `httr` library to perform API calls to Smartsheet, handling authentication, and managing API responses.
- Error Handling: Robust error checking and validation, particularly in checking API keys and handling potential errors from API responses.
- Functional Programming: Utilizing `purrr` for operations on lists and applying functions over data structures seamlessly.
- Data Manipulation: Effective use of `dplyr` and `tidyr` for transforming and managing data sets.
- Documentation: Using Roxygen2 comments for documentation, a best practice in R package development.

## Example Code

```R
# Using httr to handle the API request and response
set_smartsheet_api_key <- function(key) {
  r <- httr::GET("https://api.smartsheet.com/2.0/sheets?&includeAll=false",
                 httr::add_headers('Authorization' = paste('Bearer', key, sep = ' ')))
  if (grepl("errorCode", httr::content(r, "text"))) {
    stop("rsmartsheet Error: Your API key was invalid.")
  }
  pkg.globals$api_key <- key
}
```

## Notable Achievements

- Package Development: Creation of a comprehensive R package interfacing with a complex API, simplifying many tasks into user-friendly functions.
- Innovative Solutions: Implementation of a colorizing feature for Smartsheet rows based on HEX codes, showing a creative approach to enhancing user experience and data visualization.
- Community Contribution: By making this package open source and available on GitHub, the developer contributes to the community, enabling others to interact with Smartsheet more effectively from R.

The repository showcases a well-rounded skill set in R programming, API interaction, and package development, making it an excellent portfolio piece for roles involving data manipulation, backend development, or API integrations.

Visualizations

## Technologies and Tools

Based on the provided code snippets, the repository uses the following technologies and tools:

- Programming Language: R
- Frameworks/Libraries: tidyverse (dplyr, tidyr, ggplot2), janitor, kableExtra, DiagrammeR, DiagrammeRsvg, rsvg, base64enc
- Tools: RStudio, Git, GitHub, GitHub Actions, Docker, act, actionlint, REDCap (implied)

## Functionality

The VisualizationLibrary-main repository is an R package designed to create standardized data visualizations for METRC studies. It focuses on generating tables and figures commonly used in study reports, ensuring consistency across different REDCap projects.

Significant projects/components:

- Standard Tables: Functions to generate tables for enrollment status, baseline characteristics, injury characteristics, follow-up visit status, adverse events, protocol deviations, and more.
- Standard Figures: Functions to create consort diagrams, cumulative enrollment plots, and other visualizations relevant to study progress and outcomes.
- Data-Matrix-Driven Approach: Utilizes standardized variable names and interfaces ("constructs") defined from a data matrix, enabling consistency and cross-study analysis.

## Relevant Skills

The code demonstrates several advanced skills relevant to a software developer's resume:

- R Programming Expertise: The developer exhibits a strong understanding of R syntax, data manipulation techniques (using tidyverse), and visualization libraries (ggplot2, kableExtra).
- Data Visualization: The code effectively utilizes various visualization libraries to create informative and visually appealing tables and figures.
- Package Development: The developer demonstrates the ability to create and structure an R package, including documentation, dependencies, and proper use of namespaces.
- Version Control and CI/CD: The repository utilizes Git and GitHub for version control and employs GitHub Actions for continuous integration and deployment, showcasing familiarity with modern development workflows.
- Problem-Solving Skills: The code reveals the ability to analyze complex data, extract relevant information, and present it in a clear and concise manner.

Examples:

- Data Manipulation: The `closed_baseline_characteristics_percent` function uses dplyr verbs like `filter`, `select`, `group_by`, and `summarize` to efficiently manipulate and aggregate data for creating percentage tables.
- Data Visualization: The `dsmb_consort_diagram` function uses DiagrammeR's `grViz` function to generate a consort diagram, showcasing the ability to create complex visualizations programmatically.
- Package Development: The use of roxygen2 comments to generate documentation and the structured organization of the package demonstrate understanding of R package development best practices.

## Example Code

Here are some code snippets illustrating the use of these technologies and skills:

```R
# Example of data manipulation with dplyr
df_final <- df %>%
  filter(enrolled) %>%
  group_by(injury_type) %>%
  summarize(Total = n())

# Example of visualization with ggplot2
g <- ggplot(df, aes(x = facilitycode, y = EnrolledPatients)) +
  geom_bar(stat = "identity", fill = 'blue3', color = 'black', size = 0.5, width = 0.8) +
  labs(title = "Number of patients enrolled by site", x = "Site", y = "Number enrolled") +
  theme_minimal()

# Example of roxygen2 documentation
#' @title Number of Subjects Screened, Eligible, Enrolled and Not Enrolled
#' @description This function visualizes the enrollment totals for each site
#' @param analytic This is the analytic data set
...
enrollment_status_by_site <- function(analytic) { ... }
```

## Notable Achievements

- Development of a reusable R package for METRC studies, promoting standardization and efficiency in data visualization.
- Implementation of a data-matrix-driven approach, ensuring consistency and facilitating cross-study analysis.
- Contribution to the open-source community by making the VisualizationLibrary package publicly available on GitHub.
- Demonstrated expertise in R programming, data visualization, and package development.

Chemical Structure Analysis

## Technologies and Tools

- Programming Languages: R
- Libraries: tidyverse, readxl, xml2, httr, BiocManager, webchem, scattermore, plotly, data.table, feather, digest, ChemmineR, ChemmineOB, fmcsR

## Functionality

The ChemicalScreenR package provides tools for processing, analyzing, and visualizing large-scale screening data for a large screening research project. Key functionalities include:

- Unpacking screening results from XML files and converting them to CSV format
- Processing multi-sheet Excel files containing screening layout information
- Building master database files by combining layout, results, and chemistry data
- Calculating and visualizing chemical and screen comparisons
- Generating dual flashlight plots for hit selection
- Utility functions for filtering, coloring, and sampling hits
- Gathering and analyzing PubChem data for compounds
- Merging and managing SDF (structure-data file) data

## Relevant Skills

- Handling complex file types like XML and SDF
- Data processing and transformation using tidyverse functions
- Integrating external data sources like PubChem
- Implementing advanced algorithms for chemical similarity calculations (e.g., cmp.similarity, fpSim, fmcs)
- Creating interactive visualizations using plotly
- Parallel processing using the parallel library for improved performance
- Crash recovery and progress tracking mechanisms

## Example Code

- Unpacking screening results from XML:

```R
unpack_screening_results_xml_5dg2do(input_path = "path/to/xml/results",
                                    output_path = "path/to/output.csv")
```

- Calculating chemical comparisons between CIDs and an SDF file:

```R
cids_chemical_comparisons_to_sdf(sdf_path = "path/to/sdf",
                                 cids = c("CID1", "CID2"),
                                 output_path = "path/to/output.csv")
```

- Generating a dual flashlight plot:

```R
flashlight_plot(input_path = "path/to/summary_results.csv",
                output_path = "path/to/plot.png")
```

## Notable Achievements

- Developed a comprehensive R package for processing and analyzing large-scale screening data
- Implemented efficient data processing pipelines for handling XML, Excel, and SDF files
- Integrated PubChem data to enrich compound information
- Created interactive visualizations for exploring chemical and screen comparisons
- Optimized performance using parallel processing and crash recovery mechanisms

The ChemicalScreenR package demonstrates strong R programming skills, proficiency in handling complex file formats, data integration capabilities, and expertise in creating visualizations. The package streamlines the analysis of large-scale screening data and provides a suite of tools tailored to the Farber Lab's research project.

PIP Installable CLI Tool

## Technologies and Tools

- Python
- OpenAI (openai)
- Whisper (openai-whisper)
- Chromadb
- SoundFile
- SoundDevice
- Pydub
- Pyannote.audio
- Faiss-cpu
- gTTS
- spaCy
- BeautifulSoup
- googlesearch-python
- tiktoken
- geocoder
- Scrapy

## Functionality

- Conversational voice assistant named "Jarvis" powered by OpenAI's GPT models
- Real-time speech-to-text and text-to-speech capabilities
- Manages conversation history using Chromadb for storing and retrieving context
- Speaker recognition and diarization using Pyannote.audio
- Integrates with various APIs for weather information, geocoding, and web search
- Customizable configuration and settings
- Command-line interface for user interaction (see the packaging sketch at the end of this section)
- Implements background processing and multiprocessing for performance

## Relevant Skills

- Advanced usage of OpenAI's chat completion API and GPT models for conversational AI
- Speech recognition and synthesis integration (Whisper, gTTS, etc.)
- Vector database management with Chromadb and Faiss for efficient storage and retrieval
- Web scraping and information extraction using BeautifulSoup and Scrapy
- API integration (OpenWeatherMap, GeoNames)
- Multiprocessing and multithreading for concurrent execution
- Logging and configuration management
- Unit testing with Python's unittest module

## Example Code

- Conversational flow management in `conversationalist.py`:

```python
def converse(memory, interrupt_event, start_event, stop_event):
    audio_queue = multiprocessing.Queue()
    text_queue = multiprocessing.Queue()
    ...
    while not stop_event.is_set():
        try:
            text, ts = text_queue.get(timeout=1)
            ...
            if wake_word in text.lower():
                ...
                new_history = process_assistant_response(...)
                ...
```

- Speaker recognition using Pyannote.audio in `audio_identifier.py`:

```python
class SpeakerIdentifier:
    def __init__(self, ...):
        ...
        self.pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

    def get_speakers(self, audio_data_io):
        diarization = self.pipeline(audio_data_io)
        ...
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            ...
            speaker_id = self.get_add_unknown_speaker(speakers[speaker])
            ...
```

## Notable Achievements

- Developed a comprehensive conversational AI system with voice interaction
- Implemented advanced techniques like speaker diarization and vector database indexing
- Integrated multiple APIs and libraries to enhance the assistant's capabilities
- Optimized performance through multiprocessing and background task handling
- Designed a modular and extensible architecture for easy customization and improvement

The code demonstrates strong proficiency in Python programming, particularly in the areas of natural language processing, speech technologies, and system design. The developer exhibits the ability to integrate various libraries and APIs to create a cohesive and functional application. The use of multiprocessing, configuration management, and testing also highlights good software engineering practices.
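The packaging metadata isn't excerpted above; a minimal sketch of how a pip-installable console command is usually wired up (the module path, command name, and flags are hypothetical):

```python
# jarvis_cli/main.py: a minimal argparse entry point.
# Exposed as a console command via packaging metadata, e.g. in pyproject.toml:
#   [project.scripts]
#   jarvis = "jarvis_cli.main:main"
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(prog="jarvis",
                                     description="Conversational voice assistant")
    parser.add_argument("--config", default="~/.jarvis.ini", help="path to settings file")
    parser.add_argument("--no-voice", action="store_true", help="run in text-only mode")
    args = parser.parse_args()
    # Hypothetical startup: load settings, then enter the conversation loop.
    print(f"Starting Jarvis (config={args.config}, voice={not args.no_voice})")


if __name__ == "__main__":
    main()
```

After `pip install`, the `jarvis` command lands on the user's PATH, which is what makes the tool feel like a native CLI rather than a script the user has to locate and run.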

Query Data

## Technologies and Tools

- Programming Languages: R
- Frameworks/Libraries: sqldf, tidyverse, knitr, kableExtra, htmlTable, htmlwidgets, plotly, base64enc, AnalyticCodebase, QueryFunctions
- Tools: RStudio, roxygen2 (for documentation)

## Functionality

- The main functionality is to generate query reports, update the query database, and interact with study data for METRC REDCap studies.
- Key functions include:
  - `query_report()`: Core function to build reports by running functions from the ReportingFunctions library.
  - `update_query_database()`: Updates and maintains the query database.
  - `get_query_data()`: Retrieves specified columns of REDCap data.
  - Various visualization and table generation functions like `timeline_graph()`, `summary_graph()`, `followups_table()`, etc.
- Legacy versions of some functions are provided for backward compatibility.

## Relevant Skills

- Advanced R programming techniques:
  - Package development using devtools and roxygen2 for documentation
  - Utilization of multiple packages for data manipulation (dplyr, tidyr), SQL queries (sqldf), report generation (knitr, kableExtra), and visualizations (plotly)
  - Custom function development for modular code organization
  - Global environment management using `new.env()`
  - S3 method dispatch for `print()` and subsetting
- Data manipulation and analysis:
  - Querying and filtering data using dplyr and sqldf
  - Reshaping data with tidyr functions like `pivot_longer()` and `separate_rows()`
  - Joining datasets using `left_join()` and `inner_join()`
- Report generation:
  - Dynamic report building by running functions based on templates
  - Formatting and styling HTML reports using CSS
  - Creating tables with kableExtra
  - Interactive visualizations with plotly
- Database management:
  - Updating and maintaining a query database
  - Compressing and decompressing data for storage efficiency

## Example Code

- `query_report()` demonstrates building a comprehensive report by running a series of functions based on a template.
- `update_query_database()` shows updating the query database by calculating new queries and handling data compression.
- Table and graph generation functions like `summary_graph()` and `confirmations_table()` showcase data manipulation and visualization skills.

## Notable Achievements

- Development of a full-featured package for generating query reports and managing a query system for METRC REDCap studies.
- Implementation of efficient data compression techniques for the query database.
- Dynamic report generation system that allows customization via templates.
- Integration of multiple packages for a wide range of functionality, from querying data to generating interactive visualizations.

The QuerySystem package demonstrates strong R programming skills, experience with package development best practices, utilization of various libraries for data analysis and visualization, and the ability to architect a cohesive system for generating query reports and managing study data. The dynamic report generation and query database management are notable features that showcase software engineering capabilities in an R environment.



© 2024 Elias Weston-Farber. All rights reserved.