## Technologies and Tools
- Programming language: R
- R packages: tidyverse, readxl, xml2, httr, BiocManager, webchem, scattermore, plotly, data.table, feather, digest, parallel, ChemmineR, ChemmineOB, fmcsR
## Functionality
The ChemicalScreenR package provides tools for processing, analyzing, and visualizing data from a large-scale chemical screening research project. Key functionalities include:
- Unpacking screening results from XML files and converting them to CSV format
- Processing multi-sheet Excel files containing screening layout information
- Building master database files by combining layout, results, and chemistry data
- Calculating and visualizing chemical and screen comparisons
- Generating dual flashlight plots for hit selection
- Utility functions for filtering, coloring, and sampling hits
- Gathering and analyzing PubChem data for compounds
- Merging and managing SDF (structure-data file) data
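Building the master database amounts to joining the layout, results, and chemistry tables on shared keys. The following is a minimal base R sketch, not the package's implementation. It assumes hypothetical column names (`plate`, `well`, `compound_id`, `signal`, `smiles`) and toy values, because the actual file schemas are not described here.

```R
# Hypothetical master-table builder: column names and toy data are
# illustrative assumptions, not ChemicalScreenR's actual schema.
build_master_table <- function(layout, results, chemistry) {
  # Attach screen readouts to the plate layout by plate and well position
  screened <- merge(layout, results, by = c("plate", "well"), all.x = TRUE)
  # Attach chemistry annotations (here, SMILES) by compound identifier
  master <- merge(screened, chemistry, by = "compound_id", all.x = TRUE)
  # Return rows in a stable plate/well order with clean row names
  master <- master[order(master$plate, master$well), , drop = FALSE]
  rownames(master) <- NULL
  master
}

layout <- data.frame(
  plate = c(1, 1, 2),
  well = c("A01", "A02", "A01"),
  compound_id = c("C1", "C2", "C3")
)
results <- data.frame(
  plate = c(1, 1, 2),
  well = c("A01", "A02", "A01"),
  signal = c(0.82, 1.10, 0.35)
)
chemistry <- data.frame(
  compound_id = c("C1", "C2"),
  smiles = c("CCO", "c1ccccc1")
)

master <- build_master_table(layout, results, chemistry)
print(master)
```

Using `all.x = TRUE` keeps wells that have no matching result or chemistry record. Gaps then appear as `NA` instead of silently dropping rows, which makes missing annotations easy to audit.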
## Relevant Skills
- Handling complex file types such as XML and SDF
- Data processing and transformation using tidyverse functions
- Integrating external data sources such as PubChem
- Applying chemical similarity methods from ChemmineR and fmcsR (e.g., cmp.similarity, fpSim, fmcs)
- Creating interactive visualizations using plotly
- Parallel processing with the parallel package for improved performance
- Crash recovery and progress tracking mechanisms
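Parallel processing and crash recovery can be combined by splitting the work into chunks, running them with `parallel::mclapply()`, and saving each finished chunk with `saveRDS()`. A restarted run then skips any chunk that is already on disk. This is a sketch of one common pattern, assumed rather than taken from the package. The function name `process_with_checkpoints()`, the checkpoint file layout, and the per-item work are all hypothetical.

```R
library(parallel)

# Hypothetical checkpointed runner: each chunk's result is cached as an RDS
# file, so rerunning after a crash recomputes only the missing chunks.
process_with_checkpoints <- function(items, fun, checkpoint_dir,
                                     chunk_size = 100L, cores = 1L) {
  dir.create(checkpoint_dir, showWarnings = FALSE, recursive = TRUE)
  # Assign items to consecutive chunks of at most chunk_size elements
  chunk_ids <- ceiling(seq_along(items) / chunk_size)
  chunks <- split(items, chunk_ids)

  run_chunk <- function(i) {
    path <- file.path(checkpoint_dir, sprintf("chunk_%05d.rds", i))
    if (file.exists(path)) {
      return(readRDS(path))  # finished in an earlier run: reuse it
    }
    result <- lapply(chunks[[i]], fun)
    # Write to a temporary file first, then rename, so an interrupted
    # write does not leave a partial chunk_*.rds file behind
    tmp <- paste0(path, ".tmp")
    saveRDS(result, tmp)
    file.rename(tmp, path)
    message(sprintf("Finished chunk %d of %d", i, length(chunks)))  # progress
    result
  }

  # mclapply forks worker processes; on Windows only cores = 1 is supported
  results <- mclapply(seq_along(chunks), run_chunk, mc.cores = cores)
  unlist(results, recursive = FALSE)  # flatten chunk lists into one list
}

checkpoint_dir <- file.path(tempdir(), "screen_checkpoints")
squares <- process_with_checkpoints(1:250, function(x) x^2, checkpoint_dir,
                                    chunk_size = 100L)
length(list.files(checkpoint_dir, pattern = "\\.rds$"))
```

Rerunning the same call after an interruption reads completed chunks from disk and recomputes only the rest. Chunk size trades checkpoint overhead against how much work can be lost in a crash.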
## Example Code
- Unpacking screening results from XML:
```R
library(ChemicalScreenR)
unpack_screening_results_xml_5dg2do(input_path = "path/to/xml/results", output_path = "path/to/output.csv")
```
- Calculating chemical comparisons between CIDs and an SDF file:
```R
cids_chemical_comparisons_to_sdf(sdf_path = "path/to/sdf", cids = c("CID1", "CID2"), output_path = "path/to/output.csv")
```
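Fingerprint comparisons such as ChemmineR's `fpSim()` use the Tanimoto coefficient by default. This coefficient is the number of bits set in both fingerprints divided by the number set in either. The minimal base R sketch below illustrates only that coefficient, not the package's comparison code. The compound names and 8-bit fingerprints are made up; real fingerprints such as PubChem or atom-pair descriptors are much longer.

```R
# Tanimoto similarity for binary fingerprints (rows = compounds, columns = bits):
# bits set in both fingerprints / bits set in at least one of them.
tanimoto_matrix <- function(queries, references) {
  q <- queries * 1      # coerce logical or 0/1 input to numeric
  r <- references * 1
  both <- q %*% t(r)    # shared on-bits for every query/reference pair
  either <- outer(rowSums(q), rowSums(r), "+") - both  # union of on-bits
  # Two all-zero fingerprints would give 0/0; this sketch maps that case to 0
  sim <- ifelse(either == 0, 0, both / either)
  dimnames(sim) <- list(rownames(queries), rownames(references))
  sim
}

# Toy fingerprints (hypothetical)
queries <- rbind(
  query_1 = c(1, 1, 0, 1, 0, 0, 1, 0),
  query_2 = c(0, 0, 1, 1, 1, 0, 0, 0)
)
references <- rbind(
  ref_a = c(1, 1, 0, 1, 0, 0, 0, 0),
  ref_b = c(0, 0, 1, 1, 0, 1, 0, 0)
)

sim <- tanimoto_matrix(queries, references)
round(sim, 3)
```

The matrix form scores every query against every reference in one pass. To find each query's closest reference, `apply(sim, 1, which.max)` returns one column index per row.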
- Generating a dual flashlight plot:
```R
flashlight_plot(input_path = "path/to/summary_results.csv", output_path = "path/to/plot.png")
```
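A dual-flashlight plot places each compound's mean log fold change on the x-axis and its strictly standardized mean difference (SSMD) on the y-axis. A hit must therefore show an effect that is both large and consistent across replicates. The base R sketch below is hypothetical and is not `flashlight_plot()` itself. It assumes:

- replicate readouts per compound and a vector of negative-control wells, all on a positive signal scale;
- the two-group SSMD estimate, (mean difference) / sqrt(var_compound + var_control), computed on log2 values;
- illustrative cutoffs of |SSMD| >= 3 and |log2 FC| >= 1.

```R
# Hypothetical dual-flashlight sketch: replicate readouts per compound are
# compared against negative-control wells on the log2 scale.
flashlight_stats <- function(readouts, controls) {
  log_ctrl <- log2(controls)
  stats <- lapply(split(readouts$signal, readouts$compound_id), function(x) {
    log_x <- log2(x)
    delta <- mean(log_x) - mean(log_ctrl)              # mean log2 fold change
    ssmd <- delta / sqrt(var(log_x) + var(log_ctrl))   # SSMD, independent groups
    c(log2_fc = delta, ssmd = ssmd)
  })
  out <- as.data.frame(do.call(rbind, stats))
  out$compound_id <- rownames(out)
  rownames(out) <- NULL
  out
}

# Draw SSMD against fold change; cutoffs are illustrative assumptions
plot_dual_flashlight <- function(stats, ssmd_cutoff = 3, fc_cutoff = 1) {
  hit <- abs(stats$ssmd) >= ssmd_cutoff & abs(stats$log2_fc) >= fc_cutoff
  plot(stats$log2_fc, stats$ssmd,
       col = ifelse(hit, "red", "grey40"), pch = 19,
       xlab = "Mean log2 fold change", ylab = "SSMD")
  abline(h = c(-ssmd_cutoff, ssmd_cutoff), v = c(-fc_cutoff, fc_cutoff), lty = 2)
  invisible(hit)
}

readouts <- data.frame(
  compound_id = rep(c("cmpd_A", "cmpd_B", "cmpd_C"), each = 3),
  signal = c(25, 27, 24, 101, 96, 104, 20, 90, 60)
)
controls <- c(100, 105, 95, 102, 98, 100)

stats <- flashlight_stats(readouts, controls)
pdf(file.path(tempdir(), "dual_flashlight.pdf"))  # pdf() needs no display
hits <- plot_dual_flashlight(stats)
invisible(dev.off())
stats[hits, ]
```

In the toy data, `cmpd_C` passes the fold-change cutoff but fails the SSMD cutoff because its replicates disagree. The SSMD axis is designed to catch exactly this kind of case.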
## Notable Achievements
- Developed a comprehensive R package for processing and analyzing large-scale screening data
- Implemented efficient data processing pipelines for handling XML, Excel, and SDF files
- Integrated PubChem data to enrich compound information
- Created interactive visualizations for exploring chemical and screen comparisons
- Improved performance with parallel processing and robustness with crash recovery mechanisms

The ChemicalScreenR package demonstrates strong R programming skills, proficiency with complex file formats, data integration, and visualization. It streamlines the analysis of large-scale screening data and provides a suite of tools tailored to the Farber Lab's research project.