Tuesday, February 12, 2019

Watching daily Feb 12 2019

>> Bill Reinhold: Good afternoon.

My name's Bill Reinhold.

I work at the National Cancer Institute, and

today I will be describing to you

a new web application.

This is an introduction to CellMiner Cross-Database (CellMiner CDB),

a web application for the pharmacogenomic exploration

of cancer cell lines.

Our URL is here.

CellMiner CDB makes multiple cancer

cell line databases accessible.

The institutes that have developed these are shown

here on the left.

The cell line sets that they've developed are

shown on the right.

The URLs for each institute are shown here

on the bottom.

It allows the user to access both drug activity

and molecular information from these databases.

So, this is the information that's

currently present.

You can see there's a lot of molecular and drug

information for the NCI-60, and many more

cell lines in the other cell line databases.

The partial overlap of cell lines between

databases provides the opportunity for

cross-comparisons and study augmentation.

Here you can see the amount of overlap between

each pair of databases.

CellMiner CDB provides multiple functions for

integrative genomics and pharmacogenomics analyses

of cancer cell lines. Those are shown here

on the bottom, and we'll go over them

individually as we switch now to the web application.

So, that's shown here.

Okay, so we'll start with the univariate analysis page.

There's the tab for it, and then Plot Data.

So, our Plot Data default input is the gene SLFN11.

We're looking at its transcript expression, and

it's being compared to the activity of the drug

topotecan, both from the NCI-60 cell lines.

The resultant scatter plot is shown here.

SLFN11 expression is on the X axis

and topotecan activity is on the Y axis.

Each dot is a cell line, and this relationship

between the two is known to be causal.
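The scatter plot pairs two measurements per cell line. A minimal Python sketch of the correlation such a plot summarizes, assuming each feature is stored as a dict keyed by cell line name (a hypothetical layout, not CellMiner CDB's internal format). Only cell lines present in both inputs are used, which is also what makes comparisons across databases possible on their overlapping lines:

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def correlate_by_cell_line(
    feature_x: dict[str, float], feature_y: dict[str, float]
) -> tuple[float, int]:
    """Correlate two features over the cell lines they share.

    Each dict maps a cell line name to a value (e.g. SLFN11 expression or
    topotecan activity). Returns (r, number_of_shared_cell_lines).
    """
    shared = sorted(set(feature_x) & set(feature_y))
    xs = [feature_x[c] for c in shared]
    ys = [feature_y[c] for c in shared]
    return pearson(xs, ys), len(shared)
```

Pearson correlation is used here for illustration; the talk does not say which coefficient the application computes.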

I'll switch to a second example so you can

see how things work.

Now I'm going to the CCLE database.

I'm going to change the gene.

For the Y axis, I'm going to the GDSC data

set and choosing DNA methylation.

Then I go to the same gene.

This is E-cadherin (CDH1).

And here's the resultant scatter plot; you can see

the expression of CDH1 on the X axis

and the methylation levels on the Y axis.

And you get this nice L-shaped distribution

which shows you that as methylation levels

go up, expression goes to background.

This is epigenetic regulation of a gene that

you can demonstrate in this fashion.

Something else you can do here is choose which

tissues of origin you would like to include.

So, I'm going to choose brain

and then add on lung.

I'm holding "control" on a PC.

On a Mac, you would use "command."

So you can restrict the plot to just the

tissues of origin of interest to you.

So in this example, you can see that

E-cadherin is mostly off in brain and

on in lung, with methylation more

influential in brain.
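Restricting the analysis to chosen tissues amounts to filtering the shared cell lines before correlating. A minimal sketch, assuming a hypothetical `tissue_of` dict that maps each cell line name to its tissue of origin (the application's actual annotation format is not described in the talk):

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def correlate_in_tissues(
    feature_x: dict[str, float],
    feature_y: dict[str, float],
    tissue_of: dict[str, str],
    keep: set[str],
) -> tuple[float, int]:
    """Correlate two features using only cell lines from the chosen tissues.

    Cell lines with no tissue annotation are left out.
    Returns (r, number_of_cell_lines_used).
    """
    lines = sorted(
        c for c in set(feature_x) & set(feature_y) if tissue_of.get(c) in keep
    )
    xs = [feature_x[c] for c in lines]
    ys = [feature_y[c] for c in lines]
    return pearson(xs, ys), len(lines)
```

Calling it once with `keep={"brain"}` and once with `keep={"lung"}` would give the kind of per-tissue comparison described above for E-cadherin.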

The View Data tab displays the

data you've chosen in the input panel on the left and

allows you to download it.

Here's the download tab.

Next is Compare Patterns.

We'll switch back to the NCI-60 and SLFN11.

Now what's brought up is a list of multiple drugs,

ranked by how their activity levels correlate with SLFN11.

You'll see abbreviations here.

T1 stands for topoisomerase-1 inhibitors;

A7 is an alkylating agent; and if

you scroll down, you'll find many of these kinds of drugs.

Further down, you'll find some T2s,

which are topoisomerase-2 inhibitors.

These again are causally related to this gene,

which affects the response to DNA-damaging drugs,

and these are all DNA-damaging drugs.
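Compare Patterns can be pictured as correlating one selected pattern against many others and sorting the results. A minimal sketch under stated assumptions: patterns are dicts keyed by cell line name, sorting is by correlation (most positive first), and the `min_shared` cutoff is a hypothetical parameter; none of this is taken from CellMiner CDB's source:

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_patterns(
    query: dict[str, float],
    candidates: dict[str, dict[str, float]],
    min_shared: int = 3,
) -> list[tuple[str, float, int]]:
    """Rank candidate patterns (e.g. drug activities) by correlation with a query.

    query maps cell line -> value for the selected feature (e.g. SLFN11).
    candidates maps a pattern name -> {cell line -> value}.
    Returns (name, r, n_shared) tuples, most positive correlation first.
    """
    results = []
    for name, values in candidates.items():
        shared = sorted(set(query) & set(values))
        if len(shared) < min_shared:
            continue  # too few overlapping cell lines to be meaningful
        try:
            r = pearson([query[c] for c in shared], [values[c] for c in shared])
        except ValueError:
            continue  # constant pattern: correlation undefined
        results.append((name, r, len(shared)))
    results.sort(key=lambda t: t[1], reverse=True)
    return results
```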

Now we'll switch to the regression models tab.

Biologically, regression models are used if one

wishes to explore the potential effects of

multiple predictors on a response, such as drug activity.

So, in the example we'll choose,

we'll stick with the NCI-60.

We'll use the drug topotecan as the response, and our

predictors will be SLFN11, which you

just looked at, and BPTF, a transcription

factor that binds acetylated chromatin.

Note that multiple predictors can be chosen,

as a multiple regression model requires.

Either linear regression or Lasso machine learning

approaches may be selected.

In the absence of known predictors, Lasso is used,

since it can select predictors from a larger candidate set.
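Lasso's ability to pick predictors comes from its L1 penalty, which drives some coefficients to exactly zero. A minimal sketch of textbook cyclic coordinate-descent Lasso, offered as an illustration of the technique and not as CellMiner CDB's implementation, which the talk does not describe. The penalty `lam` is a user-chosen assumption here; in practice it is often tuned by cross-validation:

```python
import math


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(col: list[float]) -> list[float]:
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
    if sd == 0:
        raise ValueError("constant predictor cannot be standardized")
    return [(v - m) / sd for v in col]


def lasso(columns: list[list[float]], y: list[float], lam: float,
          n_iter: int = 200) -> list[float]:
    """Lasso by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y_c - X b||^2 + lam * ||b||_1, where X holds the
    standardized predictor columns and y_c is the centered response.
    Returns one coefficient per predictor (standardized scale); a
    coefficient of exactly 0.0 means the predictor was not selected.
    """
    n = len(y)
    X = [standardize(c) for c in columns]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual when all coefficients are 0
    b = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Correlation of predictor j with the residual after adding back
            # its own current contribution (columns have mean square 1).
            rho = sum(xj[i] * (resid[i] + xj[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                b[j] = new_bj
    return b
```

Raising `lam` sets more coefficients to exactly zero, which is why Lasso can narrow a long list of candidate genes down to a few predictors.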

Under the heat map tab here, you're provided with

a visualization of both the response and predictor data.

So, this is the response data here in the first

row; the next two rows are the predictor data.

The number of visualized cell lines may be varied --

that's chosen here -- and the names of the cell

lines are on the bottom.

The Data tab provides the ability to copy or download

the response or predictor data.

Here's the copy function; here's the download.

The Plot tab provides the actual versus predicted response

plot, along with the correlation and its P value.

So, here are the predicted values on the X axis and

the actual values on the Y.

There's the correlation; there's the P value.
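The actual versus predicted plot can be reproduced in outline by fitting ordinary least squares with an intercept and correlating the fitted values with the observed ones. A minimal sketch assuming plain linear regression on the normal equations (fine for a handful of predictors, less numerically robust than QR); the helper names are hypothetical. The P value is omitted because it needs a t-distribution; a library routine such as `scipy.stats.pearsonr` returns both the correlation and a two-sided P value:

```python
import math


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (are some predictors collinear?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(columns: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef: list[float], columns: list[list[float]]) -> list[float]:
    """Fitted values b0 + sum_j bj * x_ij for each cell line i."""
    n = len(columns[0])
    return [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(n)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length value lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)
```

Because these predictions come from the same cell lines used to fit the model, their correlation with the actual values tends to be optimistic.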

The Cross-Validation tab provides the actual versus 10-fold

cross-validated predicted response, an approach used

commonly in statistics to assess how well predictions hold up on data the model has not seen.
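In k-fold cross-validation, every cell line gets a prediction from a model fitted without it. A minimal sketch of the generic technique, not CellMiner CDB's code: fold assignment here is deterministic (`i % k`), whereas real tools usually shuffle first, and the one-predictor `fit_simple`/`predict_simple` pair is a hypothetical stand-in for whatever model is being validated:

```python
from typing import Any, Callable

Columns = list[list[float]]


def fit_simple(columns: Columns, y: list[float]) -> tuple[float, float]:
    """One-predictor least squares; returns (intercept, slope)."""
    x = columns[0]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def predict_simple(model: tuple[float, float], columns: Columns) -> list[float]:
    b0, b1 = model
    return [b0 + b1 * v for v in columns[0]]


def kfold_predictions(
    columns: Columns,
    y: list[float],
    fit: Callable[[Columns, list[float]], Any],
    predict: Callable[[Any, Columns], list[float]],
    k: int = 10,
) -> list[float]:
    """Predict each cell line's response from a model that never saw it.

    Cell line i goes to fold i % k. For each fold, fit on the other k-1
    folds and predict the held-out cell lines.
    """
    n = len(y)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of cell lines")
    preds = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit([[col[i] for i in train] for col in columns],
                    [y[i] for i in train])
        held_out = predict(model, [[col[i] for i in test] for col in columns])
        for i, p in zip(test, held_out):
            preds[i] = p
    return preds
```

Correlating `preds` with the actual responses gives the cross-validated counterpart of the Plot tab's correlation, and it is usually lower than the in-sample value.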

The Technical Details tab provides the mathematical details

of the actual versus predicted response model.

So, that would be this.

Partial Correlation runs when you hit

this Run button.

I'll hit it, so it'll be calculating.

You'll see it's calculating down here on

the bottom-right.

What this does is calculate the most highly

correlated predictors once the effects of the previously

selected predictors are removed.

So, that is to say, in this case,

we have SLFN11 and BPTF.

Once the effect of these two is taken out, the

question is which gene has the next strongest effect

on topotecan activity?

And you'll see in this case it's brought up a

gene called UBB, which is a DNA damage response gene.

Well, that makes a lot of sense, because topotecan

is a DNA-damaging drug.
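The partial correlation of a response and a candidate, given the already-selected predictors, is the correlation of their residuals after each is regressed on those predictors. A minimal sketch of that standard definition; ranking by absolute partial correlation is an assumption about what "next strongest effect" means, and the function names are hypothetical, not CellMiner CDB's:

```python
import math


def _norm(v: list[float]) -> float:
    return math.sqrt(sum(a * a for a in v))


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length value lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def residualize(v: list[float], controls: list[list[float]]) -> list[float]:
    """Residuals of v after a least-squares fit on an intercept plus controls.

    Builds an orthonormal basis for span{1, controls} with modified
    Gram-Schmidt, then subtracts v's projection onto that span.
    """
    n = len(v)
    basis: list[list[float]] = []
    for c in [[1.0] * n] + controls:
        w = list(c)
        for q in basis:
            d = sum(a * b for a, b in zip(w, q))
            w = [a - d * b for a, b in zip(w, q)]
        if _norm(w) > 1e-10 * max(_norm(c), 1.0):  # skip redundant controls
            basis.append([a / _norm(w) for a in w])
    r = list(v)
    for q in basis:
        d = sum(a * b for a, b in zip(r, q))
        r = [a - d * b for a, b in zip(r, q)]
    return r


def rank_by_partial_correlation(
    response: list[float],
    controls: list[list[float]],
    candidates: dict[str, list[float]],
) -> list[tuple[str, float]]:
    """Rank candidates by |partial correlation| with response given controls."""
    ry = residualize(response, controls)
    if _norm(ry) <= 1e-9 * max(_norm(response), 1.0):
        raise ValueError("response is fully explained by the controls")
    out = []
    for name, x in candidates.items():
        rx = residualize(x, controls)
        if _norm(rx) <= 1e-9 * max(_norm(x), 1.0):
            continue  # candidate is (nearly) a combination of the controls
        out.append((name, pearson(ry, rx)))
    out.sort(key=lambda t: abs(t[1]), reverse=True)
    return out
```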

The Metadata tab allows one to see the databases available

for each cell line set and to download any of them.

So, in this case, you see I have the NCI-60 up here first.

By hitting this arrow here, you can see all

the different databases that are available.

If I switch to a different set, say CCLE, you

see a different set of databases is available.

Each one may be downloaded here, under Download Data Type.

The Search IDs tab allows one to search for the

available identifiers for any data set.

So, in this example, where I'm looking at expression

data, I can put the famous TP53 in the gene name field,

and you see all the genes with TP53

in their names are brought up.

In the help section we provide the motivation,

background information, descriptions,

and illustrations for how to use CellMiner CDB.

So, for instance, if you wanted more information on

how to do regression models, you click there,

and there's a bunch of text here describing what

you do, as well as illustrations of how to do it.

And so, here are the people who have contributed to

this project.

Let me note specifically Vinodh Rajapakse and

Augustin Luna, who are the ones that did

the initial work on this.

Currently, Fathi Elloumi is the one handling the project.

If you need help, it would probably be best to contact either

me or Fathi Elloumi.

Our email addresses and phone numbers are shown here.

Thanks very much for your attention.

Please contact us if you have any thoughts,

questions, or ideas for potential collaborations.

We do expect to make some additional

videos to give more detailed explanations of

some of the more difficult areas of the web application.
