## Technologies and Tools
The repository builds a Docker environment that combines R and Python with web automation tooling. The key technologies are:
- Programming Languages: Python, R
- Operating System: Ubuntu 22.04 LTS
- Web Browsers and Drivers: Google Chrome, ChromeDriver
- Python Libraries: Flask, Gunicorn, pandas, Scrapy, Selenium, Plotly, Google Cloud SDK, OpenAI, Flask-Sock
- R Packages: tidyverse, knitr, feather, htmlTable, ggwordcloud, DiagrammeR, dplyr, reshape2, RSQLite
- Containerization: Docker
- Virtual Environment Tools: Python venv
- Data Handling and Visualization: matplotlib, plotly, ggplot2 (part of tidyverse)
- Web Scraping and Automation: Selenium, Scrapy
- APIs and Cloud Services: Google Cloud Storage, Firebase, Google API Python Client
## Functionality
The repository builds a Docker container that combines R and Python for data analysis and automation tasks. The included projects are set up to:
- Run R and Python side by side in an isolated Docker environment.
- Provide capabilities for web scraping, data manipulation, and visualization using both Python and R.
- Support automation through web drivers, enabling interaction with web elements for tasks like data collection or testing.
- Integrate with APIs and cloud services such as Google Cloud Storage and Firebase for data storage and retrieval.
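
The browser automation described above can be sketched as follows. This is a minimal illustration that assumes Selenium 4 and a Chrome/ChromeDriver pair are installed in the container. The helper names `build_chrome_args` and `fetch_title` are hypothetical and do not come from the repository; the flags are common choices for running Chrome in Docker, not settings confirmed by the source.

```python
from typing import List


def build_chrome_args(headless: bool = True) -> List[str]:
    """Return Chrome command-line flags commonly needed inside a container.

    Hypothetical helper: the repository's actual flags are not shown in the source.
    """
    args = [
        "--no-sandbox",             # Chrome's sandbox usually fails when running as root in Docker
        "--disable-dev-shm-usage",  # avoid crashes caused by a small /dev/shm
        "--window-size=1280,1024",
    ]
    if headless:
        args.insert(0, "--headless=new")  # containers normally have no display
    return args


def fetch_title(url: str) -> str:
    """Open `url` in Chrome and return the page title."""
    # Imported lazily so build_chrome_args() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always release the browser process
```

Inside the container, `fetch_title("https://example.com")` would start a headless Chrome session, load the page, and return its title. Keeping the flag list in its own function makes it easy to inspect or adjust without starting a browser.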
## Relevant Skills
The repository demonstrates the following techniques and design choices:
- Containerization and Virtualization: Uses Docker to create a reproducible and consistent development environment that integrates multiple technologies.
- Automation with Selenium: Automates web browsers for tasks such as data scraping and UI testing.
- Complex Dependency Management: Manages a large set of R and Python dependencies, ensuring the required libraries and tools are installed correctly within the Docker container.
- Cross-Language Integration: Combines R and Python in one environment so that each language can be used where it is strongest in data analysis and automation.
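
One simple way to combine the two languages in a single container is to have Python launch R scripts as subprocesses. The source does not state how the repository connects R and Python, so the sketch below is an assumption for illustration only. The function names and the script name `analysis.R` are hypothetical; `Rscript` is assumed to be on the `PATH`.

```python
import subprocess
from typing import List, Sequence


def build_rscript_command(script: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for running an R script non-interactively."""
    # --vanilla skips saved workspaces and user profiles for reproducible runs.
    return ["Rscript", "--vanilla", script, *args]


def run_r_script(script: str, args: Sequence[str] = ()) -> str:
    """Run an R script and return its standard output as text."""
    result = subprocess.run(
        build_rscript_command(script, args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if R exits with a non-zero status
    )
    return result.stdout
```

For example, `run_r_script("analysis.R", ["input.feather"])` would run the hypothetical script with one argument and return whatever it prints. Passing the command as a list avoids shell quoting problems with file names.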
## Example Code
```R
# Call install_if_not_present once for every package name in pkg_list
lapply(pkg_list, install_if_not_present)
```
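This R snippet from `install2.R` uses `lapply` to apply `install_if_not_present` to each entry in `pkg_list`. As the function name suggests, a package is installed only when it is not already available, so the package list can change without editing the installation logic.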
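
The same install-only-if-missing idea can be sketched on the Python side. This is an illustrative analogue, not code from the repository: `missing_modules` and the Python `install_if_not_present` are hypothetical names. It also assumes each import name matches its pip package name, which does not hold for every library.

```python
import importlib.util
import subprocess
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install_if_not_present(names: Iterable[str]) -> List[str]:
    """Install missing modules with pip and return the names that were installed.

    Assumption: each import name is also a valid pip package name.
    """
    to_install = missing_modules(names)
    if to_install:
        # Use the running interpreter's pip so packages land in the active environment.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", *to_install],
            check=True,
        )
    return to_install
```

Calling `install_if_not_present(["pandas", "plotly"])` would invoke pip only for the modules that are not yet importable.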
Dockerfile Example:
```Dockerfile
# Install venv support, create a virtual environment, and install dependencies
RUN apt-get install -y python3-launchpadlib python3.11-venv && \
    python3.11 -m venv env && \
    . env/bin/activate && \
    python3.11 -m pip install -r requirements.txt
```
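This segment of the Dockerfile creates a Python virtual environment and installs the dependencies listed in `requirements.txt`, which isolates project-specific packages from the system Python. Because each `RUN` instruction starts a fresh shell, the activation applies only within this instruction; later instructions need to call `env/bin/python` directly or add `env/bin` to `PATH`.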
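
To confirm that a script actually runs inside the virtual environment, the interpreter can be inspected at runtime. The check below is a small sketch using only the standard library; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter was started from a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"virtual environment active: {in_virtualenv()}")
```

Running this with `env/bin/python` should report an active environment, while the system `python3.11` should not.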
## Notable Achievements
- Integration: Python and R run within a single Docker container, which can streamline data science and automation workflows that need both languages.
- Automation: Browser-based automation with Selenium and ChromeDriver is set up inside the Dockerized environment, providing the infrastructure for testing and data scraping.
- Dependency and Environment Management: A large number of dependencies across two programming languages are managed within one Docker environment.
Together, these points reflect a working understanding of system architecture and the practical application of programming skills in real-world projects.