In progress

Exploratory Data Analysis and Predictive Modeling with Synthetic Splicing

Published on October 25, 2024 in IT & Programming

About this project

Open

RNA splicing is a critical biological process in eukaryotic cells that generates functional messenger RNA (mRNA) from precursor RNA (pre-mRNA). Its primary purpose is to remove the non-coding regions (introns) and join the coding regions (exons) into a mature mRNA molecule that can be translated into a protein.
In this project, you will work with a synthetic dataset containing the expression levels of three splicing factors and a related splicing event across 100 subjects. Your objective is to perform exploratory data analysis (EDA), data visualization, and predictive modeling to gain insight into the relationships between the splicing factors and the splicing event.

Project overview

Tasks:

1. Data Exploration:
- Load the provided dataset, "splicing_data.csv", into R.
- Compute summary statistics for the dataset to get an overview.
- Check for missing values and outliers in the dataset.
- Generate a correlation matrix to assess the relationships between the splicing factors and the splicing event.

2. Data Visualization:
- Create visualizations to explore the distribution of each splicing factor's expression and of the splicing event.
- Create graphs to visualize the relationship between each individual splicing factor and the splicing event.

3. Predictive Modeling:
- Split the dataset into a training set (70%) and a testing set (30%).
- Build a predictive model to estimate the splicing event values from the expression levels of the three splicing factors. You can use linear regression or any other appropriate modeling technique.
- Evaluate the model's performance on the testing set using appropriate metrics (e.g., R-squared, Mean Squared Error).
- Interpret the model's coefficients to understand the relationships between the splicing factors and the splicing event.

4. Conclusions and Recommendations:
- Summarize the key findings from your exploratory data analysis and modeling.
- Provide insights into which splicing factors are most strongly associated with the splicing event.
- Offer recommendations or suggestions for further analysis or experiments based on your results.
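To make the expected deliverable concrete, tasks 1 to 3 could be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not a required implementation: the brief does not give the column names of "splicing_data.csv", so SF1, SF2, SF3 and splicing_event are hypothetical, and a simulated stand-in data frame is used so the sketch runs without the real file. The simulated values and coefficients are arbitrary and say nothing about the actual data.

```r
# Simulated stand-in for "splicing_data.csv". The column names SF1, SF2, SF3
# and splicing_event are assumptions; the brief does not specify them.
set.seed(42)
n <- 100
splicing <- data.frame(
  SF1 = rnorm(n, mean = 10, sd = 2),
  SF2 = rnorm(n, mean = 5, sd = 1),
  SF3 = rnorm(n, mean = 8, sd = 1.5)
)
splicing$splicing_event <- 0.5 * splicing$SF1 - 0.3 * splicing$SF2 +
  rnorm(n, sd = 0.5)
# With the real file, replace the block above with:
# splicing <- read.csv("splicing_data.csv")

# 1. Data exploration
print(summary(splicing))
missing_counts <- colSums(is.na(splicing))
# Count outliers per column using the 1.5 * IQR rule applied by boxplot()
outlier_counts <- sapply(splicing, function(x) length(boxplot.stats(x)$out))
cor_matrix <- cor(splicing, use = "complete.obs")
print(round(cor_matrix, 2))

# 2. Visualization: one histogram per variable, then a scatterplot matrix
old_par <- par(mfrow = c(2, 2))
for (col in names(splicing)) {
  hist(splicing[[col]], main = paste("Distribution of", col), xlab = col)
}
par(old_par)
pairs(splicing, main = "Splicing factors vs. splicing event")

# 3. Predictive modeling with a 70/30 train/test split
train_idx <- sample(seq_len(nrow(splicing)), size = round(0.7 * nrow(splicing)))
train <- splicing[train_idx, ]
test <- splicing[-train_idx, ]

fit <- lm(splicing_event ~ SF1 + SF2 + SF3, data = train)
pred <- predict(fit, newdata = test)

# Test-set metrics: mean squared error and R-squared
residual_ss <- sum((test$splicing_event - pred)^2)
total_ss <- sum((test$splicing_event - mean(test$splicing_event))^2)
mse <- residual_ss / nrow(test)
r2 <- 1 - residual_ss / total_ss

# Coefficient table (estimates, standard errors, t and p values)
print(summary(fit)$coefficients)
cat("Test MSE:", mse, " Test R-squared:", r2, "\n")
```

Linear regression is used here only because the brief names it as an acceptable option; any other appropriate model could take the place of `lm()` while the split and the test-set metrics stay the same.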

Category IT & Programming
Subcategory Other
Project size Small
Is this a project or a position? Project
I currently have Not applicable
Required availability As needed

Delivery term: Not specified

Skills needed