>> Bill Reinhold: Good afternoon.
My name's Bill Reinhold.
I work at the National Cancer Institute, and
today I will be describing to you
a new web application.
This is an introduction to CellMiner Cross-Database,
a web application for the exploration of cancerous
cell lines for pharmacogenomics.
Our URL is here.
CellMiner CDB makes accessible multiple cancer
cell line databases.
The institutes that have developed these are shown
here on the left.
The cell line sets that they've developed are
shown on the right.
The URLs for each institute are shown here
on the bottom.
It allows the user to access both drug activity
and molecular information from these databases.
So, this is the information that's
currently present.
You see there's lots of molecular information for
the NCI-60 and lots of drug information.
There are many more cell lines in the other databases.
The partial overlap of cell lines between
databases provides the opportunity for
cross-comparisons and study augmentation.
So, here you see the amount of overlap between
each pairing of databases.
CellMiner CDB provides multiple functions for
integrative genomics and pharmacogenomics analyses
of cancer cell lines -- those are shown here
on the bottom, and we'll be going over those
individually as we switch now to the web application.
So, that's shown here.
Okay, so we'll start with the univariate analysis page.
There's the tab for it, then plot data.
So, our plot data default input is the gene SLFN11.
We're looking at its transcript expression, and
it's being compared to the drug topotecan --
its drug activity, both from the NCI-60 cell lines.
The resultant scatter plot is shown here.
The SLFN11 expression is on the X axis and
the topotecan activity is on the Y.
Each dot is a cell line, and this relationship
between the two is causal.
I'll switch to a second example so you can
see how things work.
Now I'm going to the CCLE database.
I'm going to change the gene.
I'm going to go, for the Y axis, to the GDSC data
set, going to DNA methylation.
Go to the same gene.
This is E-cadherin.
And here's the resultant scatter plot; you can see
here the expression of CDH1 on the X axis
and on the Y axis, the methylation levels.
And you get this nice L-shaped distribution
which shows you that as methylation levels
go up, expression goes to background.
This is epigenetic regulation of a gene that
you can demonstrate in this fashion.
Something else you can do here is choose which
tissues of origin you would like to include.
So, I'm going to choose brain
and then add on lung.
I'm holding "control" on a PC.
On a Mac, you would use "command."
And so, you can see you can get just the
tissues of origin of interest to you.
So in this example, you can see that
E-cadherin is mostly off in brain and
on in lung, with methylation more
influential in brain.
For view data, it basically displays the
data you've chosen in this input on the left and
allows its download.
So, here's the download tab here.
Compare patterns.
We'll switch back to the NCI-60; SLFN11.
And so, now what's brought up is the activity levels
of multiple drugs.
You'll see here there are abbreviations.
T1 stands for topoisomerase-1 inhibitors;
A7 is an alkylating agent; and if
you scroll down, you find many of these kinds of drugs.
Further down, you'll find some T2s,
which are topoisomerase-2 inhibitors.
These again are related causally to this gene,
which affects DNA-damage drugs and their responses.
And these are all DNA-damaging drugs.
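As a rough illustration of what a pattern comparison like this involves, here is a minimal Python sketch. It assumes the ranking is by Pearson correlation between the chosen gene and each drug; the drug names and numbers are made up for illustration and are not NCI-60 data, and this is not the application's own code.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical gene expression, one value per cell line.
gene = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical drug activities measured on the same cell lines.
drugs = {
    "drug_a": [1.1, 1.9, 3.2, 3.8, 5.1],
    "drug_b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "drug_c": [2.0, 1.0, 4.0, 3.0, 5.0],
}

# Rank drugs from most positively to most negatively correlated.
ranked = sorted(
    ((name, pearson(gene, values)) for name, values in drugs.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

The drugs at the top of such a list are the ones whose activity pattern across cell lines most closely follows the gene's expression pattern.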
Now we'll switch to the regression models tab.
Biologically, regression models are used if one
wishes to explore the potential effects of
multiple factors on a response.
So, in the example we'll choose,
we'll stick with the NCI-60.
We'll use the drug topotecan, and then our
predictor identifiers will be SLFN11, which you
just looked at, and BPTF, which is a transcription
factor which binds acetylated chromatin.
Note that multiple predictors can be chosen,
which would have to be the case for a regression model.
Either linear regression or Lasso machine learning
approaches may be selected.
In the absence of known predictors, Lasso is used.
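To make the linear regression option concrete, here is a minimal Python sketch of an ordinary least squares fit with two predictors. The function names, variable names, and numbers are hypothetical, chosen only for illustration; it is not the application's implementation, and it does not cover Lasso.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(predictors, response):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, coef_2, ...], one coefficient per predictor.
    """
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(response))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, predictors):
    """Predicted response for each cell line from fitted coefficients."""
    n = len(predictors[0])
    return [
        coefs[0] + sum(c * p[i] for c, p in zip(coefs[1:], predictors))
        for i in range(n)
    ]

# Hypothetical predictor values, one per cell line (not real data).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
# Synthetic response built from a known relationship so the fit can be checked.
response = [0.5 + 2.0 * a - 1.0 * b for a, b in zip(gene_a, gene_b)]

coefs = fit_linear([gene_a, gene_b], response)
fitted = predict(coefs, [gene_a, gene_b])
```

Because the synthetic response here is an exact linear combination of the two predictors, the fit recovers the intercept and both coefficients; with real data the fitted values would only approximate the response.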
Under the heat map tab here, you're provided with
a visualization of both the response and predictor data.
So, this is the response data here in the first
row; the second two rows are the predictor data.
The number of visualized cell lines may be varied --
that's chosen here -- and the names of the cell
lines are on the bottom.
Data provides the ability to copy or download
the response or predictor data.
Here's the copy function; here's the download.
Plot provides the actual versus predicted response
plot, the correlation, and the P value.
So, here's the predicted values on the X axis and
the actual on the Y.
There's the correlation; there's the P value.
Cross-validation provides the actual versus 10-fold
cross-validated predicted response, an approach used
commonly in statistics to assess prediction confidence.
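For anyone unfamiliar with the idea, here is a minimal Python sketch of 10-fold cross-validation. It assumes a simple one-predictor line fit and assigns cell lines to folds by index; real tools usually shuffle first. All names and numbers are hypothetical, and this is not the application's code.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cross_val_predict(xs, ys, k=10):
    """Predict each point from a model trained without that point's fold."""
    n = len(xs)
    preds = [None] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        for i in test_idx:
            preds[i] = slope * xs[i] + intercept
    return preds

# Hypothetical data: 20 cell lines with an exactly linear relationship.
xs = [float(i) for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
cv_preds = cross_val_predict(xs, ys, k=10)
```

The point of the procedure is that every prediction comes from a model that never saw that cell line, so agreement between the actual and cross-validated values is a fairer measure than agreement on the training data.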
Technical details provides the mathematical details
for the actual versus predicted response model.
So, that would be this.
Partial correlation is run by hitting
this run button.
I'll hit it, so it'll be calculating.
You'll see it's calculating down here on
the bottom-right.
What this does is calculate the highest
correlated predictors once the effects of the previously
selected predictors are removed.
So, that is to say, in this case,
we have SLFN11 and BPTF.
Once the effects of these two are taken out, the
question is what gene has the next strongest effect
on the topotecan activity?
And you'll see in this case it's brought up a
gene called UBB, which is a DNA damage response gene.
Well, that makes a lot of sense, because topotecan
is a DNA damage drug.
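The idea of removing the effect of already-selected predictors can be sketched in Python using the standard recursive formula for partial correlation. This is an illustration under that assumption, with made-up numbers and hypothetical names, and is not necessarily how the application computes it.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_corr(xs, ys, controls):
    """Correlation of xs and ys with the linear effect of each control removed.

    Uses the recursion
    r_xy.Z = (r_xy.R - r_xz.R * r_yz.R) / sqrt((1 - r_xz.R^2) * (1 - r_yz.R^2)),
    where z is the last control and R is the list of remaining controls.
    """
    if not controls:
        return pearson(xs, ys)
    *rest, z = controls
    r_xy = partial_corr(xs, ys, rest)
    r_xz = partial_corr(xs, z, rest)
    r_yz = partial_corr(ys, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values, one per cell line (not real data).
selected = [-3.0, -1.0, 1.0, 3.0]    # an already-selected predictor
candidate = [-2.0, -2.0, 0.0, 4.0]   # a candidate gene that tracks `selected`
response = [-4.0, 2.0, -2.0, 4.0]    # drug activity that also tracks `selected`

raw = pearson(candidate, response)
adjusted = partial_corr(candidate, response, [selected])
print(f"raw r = {raw:.3f}, partial r = {adjusted:.3f}")
```

In this constructed example the candidate looks correlated with the response only because both follow the selected predictor, so the correlation essentially vanishes once that predictor is accounted for. A gene that still ranks highly after this adjustment is contributing something the selected predictors do not.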
Metadata allows one to see the databases available
for each cell line set and to download any of them.
So, in this case, you see I have the NCI-60 up here first.
By hitting this arrow here, you can see all
these different databases are available.
If I switch to a different set -- say, CCLE -- you
see a different set of databases is available.
Each one may be downloaded here at download data type.
The search IDs tab allows one to search for the
available identifiers for any data set.
So, in this example, where I'm looking at expression
data, in the gene name I can put in the famous
TP53, and you see all the genes with TP53
in their names are brought up.
In the help section we provide the motivation,
background information, descriptions,
and illustrations for how to use CellMiner CDB.
So, for instance, if you wanted more information on
how to do regression models, you click there,
and there's a bunch of text here describing what
you do, as well as illustrations of how to do it.
And so, here are the people who have contributed to
this project.
Let me note specifically Vinodh Rajapakse and
Augustin Luna, who are the ones that did
the initial work on this.
Currently, Fathi Elloumi is the one handling the project.
If you need help, it would probably be best to contact either
myself or Fathi Elloumi.
Our emails and phones and whatnot are shown here.
Thanks very much for your attention.
Please contact us if you have any thoughts or
questions, or potential collaborations.
We do expect to try to make some additional
videos to give more detailed explanations of
some of the more difficult areas of the web application.