GMU’s Center for Applied Proteomics and Molecular Medicine (CAPMM)

What Am I Doing?

Plotting a 12 x 1000 dataset … IN A MEANINGFUL WAY

MORE DETAILS
The goal was to see if I could bring a different perspective to the interpretation of RPPA data (level 4). I’ll be clear: I don’t have much insight into this subject. I was given a bunch of data to work with. There were constraints on what could and could not be compared (e.g. antibody counts could not be compared side by side due to affinity differences, and possible population differences between proteins invalidated some percentage comparisons). It quickly became obvious that this data was difficult to work with.

The data was RPPA (level 4) data collected in an effort to uncover any significant changes in the EGFR molecular signal pathway of lung cancer cells. The data structure was a typical “12 x [number of different antibodies used]” matrix (3 samples at each of 4 time points). The PI and supervisor (Mariaelena and Elisa, respectively) were looking for ways to make sense of the data.


What was I thinking?

I like data analytics and proteins.

I wanted to step into the field of proteomics.

I know it will come in handy one day.

MORE DETAILS (context)
I like sussing out clues from data. The ability to extract interpretations from pure data (e.g. numbers straight from the experiments) is critical in any pioneering task. That was one reason I wanted to give it a go.

Also, I’ve always been curious about proteins. Back in my undergraduate studies, I mainly stayed within the mechanical and robotic areas of biomedical engineering. There was always a small inclination to move into proteomics, but I never made a real effort to act on it.

A couple of years later (with years of practical experience under my belt), I thought: “proteins? … cannot be that hard to analyze”. I reached out to Elisa, who was part of GMU’s CAPMM, and was able to work in her lab. Brilliant: I had an opportunity.

Lastly, I know working in proteomics will benefit me in the future. I can’t say exactly why, but I know at some point I will be looking into bio-robotics. This opportunity seemed like a great first step into understanding biological systems.




What did I make?

A website for BIG-Data graphing.

Graphs made to comprehend HUGE datasets.



MORE DETAILS (jargon)
The end product was a website for uploading CSV versions of the RPPA data and creating an assortment of interactive graphs. I’ll expand on how I got to this point.

The biggest issue was the sheer number of data points. I was stuck on simply plotting the data in a comprehensible manner. When you have a 12 x 1000 dataset, it’s hard to even look at what you have. The graphs could not be static images; they had to be interactive, with zoom in/out, so that the spatial arrangement of the data points could be grasped (in both the 2D and 3D versions). I used Plotly (Python, through the Jupyter interface) to create the interactive 2D and 3D plots. The 3D versions were great for noticing the progressive decline in antibody counts as you go down the EGFR molecular signal pathway: proteins deeper in the pathway had much lower values than the more superficial ones.

The next issue was filtering the data. I tried different manipulations, including averages, peak values only, displaying just one sample, displaying all samples, using color to emphasize categories, and unconventional methods such as superimposing the Willison amplitude (a feature used in EMG analysis) onto the dataset. The Willison amplitude was a first attempt to see whether filters not typically associated with molecular signal data processing could reasonably be used.
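Several of these manipulations boil down to small functions applied to each antibody column. The sketch below is illustrative only: it assumes a 12 × N matrix whose rows are grouped by time point (3 replicates each), and it computes the Willison amplitude (WAMP), the EMG feature that counts how many times the absolute difference between consecutive samples reaches a threshold. Which axis and threshold the original analysis used is not stated, so both, along with all function names, are assumptions.

```python
from statistics import mean

# Assumed row order: 4 time points x 3 replicates, time point first.
TIME_POINTS = 4
REPLICATES = 3


def columns(matrix):
    """Transpose a 12 x N matrix into N antibody columns of 12 values."""
    return [list(col) for col in zip(*matrix)]


def replicate_means(column):
    """Average the 3 replicates at each time point (12 values -> 4)."""
    return [
        mean(column[t * REPLICATES:(t + 1) * REPLICATES])
        for t in range(TIME_POINTS)
    ]


def peak_value(column):
    """Keep only the largest value in the column."""
    return max(column)


def willison_amplitude(signal, threshold):
    """Count consecutive pairs whose absolute difference is >= threshold.

    This is the WAMP feature from EMG analysis; some references use a
    strict '>' instead of '>='.
    """
    return sum(1 for a, b in zip(signal, signal[1:]) if abs(a - b) >= threshold)


def apply_per_antibody(matrix, fn, *args):
    """Apply one filter function to every antibody column."""
    return [fn(col, *args) for col in columns(matrix)]
```

Applying WAMP to the replicate means, e.g. `[willison_amplitude(replicate_means(c), 0.5) for c in columns(matrix)]`, yields a 0 to 3 count per antibody of large jumps between time points; the 0.5 threshold is arbitrary here.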

As one might infer, this produced many graph iterations. Rather than create each graph individually, I built a website that runs the graph-creating Python scripts and conveniently houses all the resulting graphs. The biggest benefit of the website was the ability to quickly view a new RPPA dataset simply by uploading its file. The visuals helped with identifying outliers, errors, and duplications, and with getting a feel for how the raw data turned out.
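The upload step comes down to turning a CSV into the matrix the plotting scripts expect while flagging the kinds of problems mentioned above. Here is a minimal standard-library sketch; the file layout (a header row of antibody names after a first sample-label column, then one row per sample) and the function name `parse_rppa_csv` are assumptions, not the site’s actual format.

```python
import csv
import io


def parse_rppa_csv(text, expected_rows=12):
    """Parse uploaded CSV text into (labels, antibodies, matrix, problems).

    Assumed layout: the first row is 'sample,<antibody 1>,<antibody 2>,...',
    and each following row is '<sample label>,<value>,<value>,...'.
    Problems are collected as messages instead of raising, so a page can
    still show the graphs and list what looks wrong.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return [], [], [], ["empty file"]
    antibodies = header[1:]
    problems = []

    # Duplicated antibody columns are a common copy/paste error.
    seen = set()
    for name in antibodies:
        if name in seen:
            problems.append(f"duplicate antibody column: {name}")
        seen.add(name)

    labels, matrix = [], []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        labels.append(row[0])
        cells = row[1:]
        if len(cells) != len(antibodies):
            problems.append(
                f"line {line_no}: expected {len(antibodies)} values, found {len(cells)}"
            )
        values = []
        for name, cell in zip(antibodies, cells):
            try:
                values.append(float(cell))
            except ValueError:
                problems.append(f"line {line_no}, {name}: not a number ({cell!r})")
                values.append(float("nan"))
        # Pad short rows with NaN so the matrix stays rectangular.
        values += [float("nan")] * (len(antibodies) - len(values))
        matrix.append(values)

    if len(matrix) != expected_rows:
        problems.append(f"expected {expected_rows} sample rows, found {len(matrix)}")
    return labels, antibodies, matrix, problems
```

Returning problems as a list, rather than rejecting the file, matches the use described here: the graphs still render, and the messages point at what to check.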

The website was only recently put online. For most of the project, I provided a local website: not the best option, but the best one available at the time. I’ll talk more about the impact of using an offline website later on.


Did I help?

Maybe (not enough to be in papers).

The team needed someone who could bring a different perspective.

I was the pioneer.

MORE DETAILS (explanations)
I brought different perspectives.

As is typical of unconventional views, much of what I offered was not directly useful.

No matter; that was expected given my lack of expertise in molecular signal pathways. I found satisfaction in presenting the unusual perspectives. The most satisfying part was seeing, in real time, how the team’s wording (their manner of verbally describing the molecular signal pathway) slowly changed. I was having an effect on their outlooks: slow but steady progress, as far as I could tell.

Moving on to the website. The fact that it was a local website complicated bringing it up on any other computer. There were no free servers available to me at the time (I completely forgot about AWS), so I had to create a bash file that ran terminal commands on their computer to bring up the web page. That is not an intuitive thing to do if you’re not familiar with web pages and servers. Providing a local website (for most of the project) probably made the PI and supervisor less likely to use it. Understandably so.
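The bash file’s job (start a local server, then open the page) can be sketched in Python with only the standard library. This is a hypothetical sketch, not the original script: it only serves a folder of already-exported graph pages (for example HTML files written with Plotly’s `write_html`), so it does not reproduce the site’s upload-and-run behaviour, and the name `start_local_site` is made up for illustration.

```python
import functools
import http.server
import threading
import webbrowser


def start_local_site(directory, host="127.0.0.1", port=0, open_browser=True):
    """Serve `directory` over HTTP on localhost and optionally open a browser.

    port=0 asks the OS for any free port. The server runs in a background
    daemon thread; call server.shutdown() to stop it.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    actual_port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = f"http://{host}:{actual_port}/"
    if open_browser:
        webbrowser.open(url)
    return server, url
```

Because the server thread is a daemon, a launcher script would call `start_local_site("graphs")` (a hypothetical folder name) and then keep the process alive, for instance with `input("Press Enter to stop")`, before calling `server.shutdown()`.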

Also, there was a subtler issue: if I don’t use this tool frequently in my own analyses, I shouldn’t expect others to. I personally need to use the website more and see what the tool’s practical benefits are.


Big thanks to Elisa Baldelli and Mariaelena Pierobon for providing the opportunity.