## Technologies and Tools

- Programming languages: R
- Libraries: tidyverse, readxl, xml2, httr, BiocManager, webchem, scattermore, plotly, data.table, feather, digest, ChemmineR, ChemmineOB, fmcsR

## Functionality

The ChemicalScreenR package provides tools for processing, analyzing, and visualizing large-scale screening data for a Large Screening Research Project. Key functionalities include:

- Unpacking screening results from XML files and converting them to CSV format
- Processing multi-sheet Excel files containing screening layout information
- Building master database files by combining layout, results, and chemistry data
- Calculating and visualizing chemical and screen comparisons
- Generating dual flashlight plots for hit selection
- Utility functions for filtering, coloring, and sampling hits
- Gathering and analyzing PubChem data for compounds
- Merging and managing SDF (structure-data file) data

## Relevant Skills

- Handling complex file types like XML and SDF
- Data processing and transformation using tidyverse functions
- Integrating external data sources like PubChem
- Implementing advanced algorithms for chemical similarity calculations (e.g., cmp.similarity, fpSim, fmcs)
- Creating interactive visualizations using plotly
- Parallel processing using the parallel library for improved performance
- Crash recovery and progress tracking mechanisms

## Example Code

- Unpacking screening results from XML:

  ```R
  unpack_screening_results_xml_5dg2do(input_path = "path/to/xml/results", output_path = "path/to/output.csv")
  ```

- Calculating chemical comparisons between CIDs and an SDF file:

  ```R
  cids_chemical_comparisons_to_sdf(sdf_path = "path/to/sdf", cids = c("CID1", "CID2"), output_path = "path/to/output.csv")
  ```

- Generating a dual flashlight plot:

  ```R
  flashlight_plot(input_path = "path/to/summary_results.csv", output_path = "path/to/plot.png")
  ```

## Notable Achievements

- Developed a comprehensive R package for processing and analyzing large-scale screening data
- Implemented efficient data processing pipelines for handling XML, Excel, and SDF files
- Integrated PubChem data to enrich compound information
- Created interactive visualizations for exploring chemical and screen comparisons
- Optimized performance using parallel processing and crash recovery mechanisms

The ChemicalScreenR package demonstrates strong R programming skills, proficiency in handling complex file formats, data integration capabilities, and expertise in creating visualizations. The package streamlines the analysis of large-scale screening data and provides a suite of tools tailored for the Farber Lab's research project.
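
For readers unfamiliar with dual flashlight plots, the idea behind the hit-selection step can be sketched in base R. This is a minimal illustration and not the package's `flashlight_plot` implementation: the toy data, the column names, the `ssmd` helper, and the cutoffs (|SSMD| >= 3 and |log2 fold change| >= 1) are all assumptions chosen for the example.

```R
# Hypothetical toy data: replicate log2 signals for a negative control
# and three compounds (values invented for illustration).
screen <- data.frame(
  compound = rep(c("neg_ctrl", "cmpd_A", "cmpd_B", "cmpd_C"), each = 4),
  log2_signal = c(10.0, 10.2,  9.9, 10.1,
                  12.1, 12.3, 11.9, 12.2,
                  10.1,  9.8, 10.3, 10.0,
                   8.0,  8.4,  7.9,  8.1)
)

neg <- screen$log2_signal[screen$compound == "neg_ctrl"]

# Strictly standardized mean difference for two independent groups.
ssmd <- function(x, ref) (mean(x) - mean(ref)) / sqrt(var(x) + var(ref))

compounds <- setdiff(unique(screen$compound), "neg_ctrl")
signal_for <- function(cmpd) screen$log2_signal[screen$compound == cmpd]

summary_tbl <- data.frame(
  compound = compounds,
  log2_fc  = vapply(compounds, function(cmpd) mean(signal_for(cmpd)) - mean(neg),
                    numeric(1), USE.NAMES = FALSE),
  ssmd     = vapply(compounds, function(cmpd) ssmd(signal_for(cmpd), neg),
                    numeric(1), USE.NAMES = FALSE)
)

# Assumed cutoffs: a hit needs both a large effect size and a large fold change.
summary_tbl$hit <- abs(summary_tbl$ssmd) >= 3 & abs(summary_tbl$log2_fc) >= 1

# Dual flashlight plot: fold change on x, SSMD on y (drawn only interactively).
if (interactive()) {
  plot(summary_tbl$log2_fc, summary_tbl$ssmd,
       col = ifelse(summary_tbl$hit, "red", "grey40"), pch = 19,
       xlab = "Average log2 fold change", ylab = "SSMD")
  abline(h = c(-3, 3), v = c(-1, 1), lty = 2)
}
```

Requiring both criteria is what distinguishes this view from a plain fold-change ranking: a compound with a large but noisy shift gets a low SSMD and is not flagged.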

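
The fingerprint-based chemical comparisons mentioned under Relevant Skills rest on a similarity coefficient between bit vectors. The sketch below computes the Tanimoto coefficient, a common choice for this purpose, in base R. It is a simplified stand-in and does not call ChemmineR or reproduce `cids_chemical_comparisons_to_sdf`; the `tanimoto` helper, the 8-bit fingerprints, and the `CID_*` names are hypothetical.

```R
# Tanimoto coefficient between two equal-length binary fingerprints:
# shared on-bits divided by on-bits present in either fingerprint.
tanimoto <- function(fp_a, fp_b) {
  stopifnot(length(fp_a) == length(fp_b))
  shared <- sum(fp_a & fp_b)
  either <- sum(fp_a | fp_b)
  if (either == 0) return(NA_real_)  # undefined when both fingerprints are empty
  shared / either
}

# Hypothetical 8-bit fingerprints (real fingerprints are much longer).
fps <- list(
  CID_query = c(1, 1, 0, 1, 0, 0, 1, 0),
  CID_ref1  = c(1, 1, 0, 1, 0, 0, 0, 0),
  CID_ref2  = c(0, 0, 1, 0, 1, 1, 0, 0)
)

# Compare the query against every reference compound.
sims <- vapply(fps[-1], tanimoto, numeric(1), fp_a = fps$CID_query)
print(sims)
```

In a full pipeline the fingerprints would be derived from the SDF structures rather than typed by hand, and the resulting similarity table would be written to the CSV output.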

© 2024 Elias Weston-Farber. All rights reserved.