WEBVTT

170
00:00:00.934 --> 00:00:04.934
turn things over to one of our
first speakers. Felicia Barnett,
who is

171
00:00:04.934 --> 00:00:08.934
joining us from U.S. EPA and the
Office of Research

172
00:00:08.934 --> 00:00:09.934
and Development to provide
opening remarks.

173
00:00:09.934 --> 00:00:11.934
Felicia, whenever you are ready,
please begin.

174
00:00:11.934 --> 00:00:14.934
>> Thank you Jean. I just want
to welcome everybody. This is
the second

175
00:00:14.934 --> 00:00:17.934
in our series of three seminars
on the utilization of ProUCL
software

176
00:00:17.934 --> 00:00:20.934
tools. Those of you who were in
the first seminar may

177
00:00:20.934 --> 00:00:21.934
remember my name is Felicia
Barnett.

178
00:00:21.934 --> 00:00:24.934
I am the director of the EPA
Office of Research and
Development Site

179
00:00:24.934 --> 00:00:30.934
Characterization and Monitoring
Technical Support Center. I am
going to refer to it

180
00:00:30.934 --> 00:00:34.934
as SCMTSC. Its function is to

181
00:00:34.934 --> 00:00:38.934
provide support on hazardous
waste issues to

182
00:00:38.934 --> 00:00:43.000
the EPA program and regional
offices, using ProUCL as part
of

183
00:00:43.000 --> 00:00:45.934
that function.

184
00:00:45.934 --> 00:00:48.934
ProUCL is a statistical software
package available for free on the
EPA

185
00:00:48.934 --> 00:00:56.934
website for the analysis of
environmental data sets with and
without non-detect

186
00:00:56.934 --> 00:00:57.934
observations. It provides
statistical methods and
graphical

187
00:00:57.934 --> 00:01:05.934
tools to address environmental
sampling and statistical issues.

188
00:01:05.934 --> 00:01:08.934
It is not intended to be able to
address every situation, nor

189
00:01:08.934 --> 00:01:11.934
to be the only option for
environmental statistical
analysis. It was developed

190
00:01:11.934 --> 00:01:13.934
to be an easy to use option for
many standard statistical issues

191
00:01:13.934 --> 00:01:16.934
encountered when evaluating
data.

192
00:01:16.934 --> 00:01:19.934
The second seminar in the series
will focus on using ProUCL

193
00:01:19.934 --> 00:01:23.934
for trend analysis. As Jean
mentioned, the first seminar in the series

194
00:01:23.934 --> 00:01:32.934
is already available for
viewing online.

195
00:01:32.934 --> 00:01:35.934
If you were unable to attend or
want to view specific parts at
your own

196
00:01:35.934 --> 00:01:38.934
pace. We hope you find ProUCL to
be a useful tool and that these
seminars

197
00:01:38.934 --> 00:01:40.934
assist you with understanding its
capabilities and navigating the
program.

198
00:01:40.934 --> 00:01:45.934
We will be responding to all
questions either during the
session or after

199
00:01:45.934 --> 00:01:59.934
and documenting those sessions.

200
00:01:59.934 --> 00:02:01.934
Additionally, SCMTSC provides
internal and some limited
external support

201
00:02:01.934 --> 00:02:04.934
to those who use it. If
anyone has questions or issues
with using

202
00:02:04.934 --> 00:02:07.934
ProUCL, we do try to answer
those as much as possible. On
the ProUCL

203
00:02:07.934 --> 00:02:10.934
website, under the contacts you
can find my name and my contact
information.

204
00:02:10.934 --> 00:02:16.934
I recommend contacting me via
email with as much detail as
possible,

205
00:02:16.934 --> 00:02:19.934
including screenshots. We will
try to respond to your questions
quickly.

206
00:02:19.934 --> 00:02:24.934
I am now going to turn the
ProUCL discussion over to Travis

207
00:02:24.934 --> 00:02:29.934
Hatfield to start the
instruction on the data.

208
00:02:29.934 --> 00:02:32.934
Thank you, Travis.

209
00:02:32.934 --> 00:02:37.934
>> Yes, hello.

210
00:02:37.934 --> 00:02:43.067
So, thank you guys and thanks
everyone for joining us for our
second part

211
00:02:43.067 --> 00:02:47.000
of the ProUCL series.

212
00:02:47.000 --> 00:02:50.000
As Jean and Felicia mentioned,
we will be focusing on the trend

213
00:02:50.000 --> 00:02:53.000
analysis features in ProUCL
today.

214
00:02:53.000 --> 00:02:56.000
However, I am not going to be
the one that is giving the
majority

215
00:02:56.000 --> 00:02:59.000
of the presentation. Alona
Carson will be doing that but I
figured

216
00:02:59.000 --> 00:03:04.000
real quick I would give a nice
recap of what we

217
00:03:04.000 --> 00:03:07.000
talked about in our first one,
since some of those aspects

218
00:03:07.000 --> 00:03:08.000
we are going to be using today.

219
00:03:08.000 --> 00:03:14.000
Really what we talked about for
most of our last presentation

220
00:03:14.000 --> 00:03:23.000
was really getting to look at
exploratory data analysis inside
ProUCL, looking

221
00:03:23.000 --> 00:03:25.000
at summary statistics and
different graphical methods,
including box

222
00:03:25.000 --> 00:03:28.000
plots and histograms and
whatnot.

223
00:03:28.000 --> 00:03:34.000
And then we also explored
goodness-of-fit tests, as well as
Q-Q plots to

224
00:03:34.000 --> 00:03:38.000
assess the normality of data,
which is something that will
definitely

225
00:03:38.000 --> 00:03:42.067
be coming up today. We also
looked at

226
00:03:42.067 --> 00:03:46.000
outliers and options for
hypothesis testing in ProUCL. So
while we

227
00:03:46.000 --> 00:03:52.000
are going to be definitely using
the goodness of fit test today
again,

228
00:03:52.000 --> 00:03:56.000
I hope everybody either

229
00:03:56.000 --> 00:03:59.000
got to maybe watch it last time
or watch it in between since it

230
00:03:59.000 --> 00:04:03.000
was recorded but if not,
hopefully we don't totally lose
you. Other

231
00:04:03.000 --> 00:04:06.000
than that, as Jean and Felicia
mentioned,

232
00:04:06.000 --> 00:04:09.000
it is definitely available
online.

233
00:04:09.000 --> 00:04:13.000
And so will this one be as soon
as we are done.

234
00:04:13.000 --> 00:04:18.000
But, with that in mind, I think
I will turn it over to Polona

235
00:04:18.000 --> 00:04:20.000
and we can get started talking
about trend analysis.

236
00:04:20.000 --> 00:04:25.000
>> Thank you Travis. Today, we
will look at tools and features
for

237
00:04:25.000 --> 00:04:31.000
trend analysis. First, we will
look at some exploratory tools

238
00:04:31.000 --> 00:04:37.000
and the time series plot is a
nice tool for that, and then

239
00:04:37.000 --> 00:04:42.067
we will continue with tests for
trend analysis.

240
00:04:42.067 --> 00:04:48.000
The first are Mann-Kendall and

241
00:04:48.000 --> 00:04:49.000
Theil-Sen Line Test.

242
00:04:49.000 --> 00:04:53.000
The second half of today's
webinar will look at ordinary
least squares

243
00:04:53.000 --> 00:04:59.000
regression, which some of you
may know as linear regression.

244
00:04:59.000 --> 00:05:10.000
So, now I just want to mention

245
00:05:10.000 --> 00:05:12.000
like in previous sessions
presented by Travis, I will try
to keep the

246
00:05:12.000 --> 00:05:16.000
statistical jargon as low as
possible and do my best to
explain so the

247
00:05:16.000 --> 00:05:20.000
practitioners in the audience
can follow without hopefully too
much

248
00:05:20.000 --> 00:05:26.000
trouble. So, Felicia already
mentioned

249
00:05:26.000 --> 00:05:31.000
ProUCL software is intended for
basic statistical calculations
used in the

250
00:05:31.000 --> 00:05:34.000
evaluation of contaminated
sites.

251
00:05:34.000 --> 00:05:40.000
So, today we will be touching on
some slightly more advanced

252
00:05:40.000 --> 00:05:49.000
techniques. If you have
questions, please ask them
during

253
00:05:49.000 --> 00:05:53.000
the session and Jean and Travis
will monitor them and try to
answer

254
00:05:53.000 --> 00:05:56.000
as many as possible.

255
00:05:56.000 --> 00:06:00.000
But, if we have more, the
answers will be available at a
later time.

256
00:06:00.000 --> 00:06:05.000
So, at this point, I just want
to

257
00:06:05.000 --> 00:06:10.000
refresh our memory from previous
time that ProUCL software is
accompanied

258
00:06:10.000 --> 00:06:13.000
by two valuable documents: the
user guide

259
00:06:13.000 --> 00:06:19.000
and the technical guide. You can
find them both in the documents

260
00:06:19.000 --> 00:06:28.000
folder of the ProUCL
installation folder. So consult
those when

261
00:06:28.000 --> 00:06:32.000
you are working on your project
if you don't know where to go
exactly.

262
00:06:32.000 --> 00:06:38.000
These are really valuable
resources, but if you are still
unsure where

263
00:06:38.000 --> 00:06:44.000
to go, it is probably time to
consult a statistician for help

264
00:06:44.000 --> 00:06:47.000
with your analysis.

265
00:06:47.000 --> 00:06:52.000
So, at this point, if we can
please share

266
00:06:52.000 --> 00:06:54.000
my screen. So that I can start
the demonstration

267
00:06:54.000 --> 00:06:57.000
of ProUCL.

268
00:06:57.000 --> 00:07:02.000
>> The other screen, not my
screen Jean.

269
00:07:02.000 --> 00:07:08.000
>> I have it here.

270
00:07:08.000 --> 00:07:17.000
It is working.

271
00:07:17.000 --> 00:07:26.000
So we should be good now I hope.

272
00:07:26.000 --> 00:07:32.000
So we are good now. Just a few
navigational remarks.

273
00:07:32.000 --> 00:07:38.000
Here in this middle center is
where we do our analysis

274
00:07:38.000 --> 00:07:47.000
and where we see the data. On the
left-hand side, we have

275
00:07:47.000 --> 00:07:50.000
a navigation panel where ProUCL
displays a list of

276
00:07:50.000 --> 00:07:56.000
open data sets and generated
outputs. ProUCL does

277
00:07:56.000 --> 00:08:01.000
assign default names to the
output, but if you want to make
them more

278
00:08:01.000 --> 00:08:10.000
meaningful for your application,
then you just go to

279
00:08:10.000 --> 00:08:13.000
file and save. We don't have
anything loaded so I can't show
you that

280
00:08:13.000 --> 00:08:19.000
but it would be file, save file,
and rename it however

281
00:08:19.000 --> 00:08:21.000
you want.

282
00:08:21.000 --> 00:08:25.000
The log panel displays in green
color basically the log of the
actions

283
00:08:25.000 --> 00:08:31.000
you are performing. In orange
color, it displays warning
messages

284
00:08:31.000 --> 00:08:36.000
and in red color, it displays
the error messages.

285
00:08:36.000 --> 00:08:41.000
So, those two orange and red are
something you

286
00:08:41.000 --> 00:08:44.000
may want to pay attention to if
you think you are not
accomplishing

287
00:08:44.000 --> 00:08:49.000
what you want to do with the
software.

288
00:08:49.000 --> 00:08:55.000
Let's hope we won't have those
today for the presentation. So,

289
00:08:55.000 --> 00:09:00.000
we will be using two data sets
today. They both

290
00:09:00.000 --> 00:09:04.000
are available within the data
folder in ProUCL. So they are
some of

291
00:09:04.000 --> 00:09:06.000
the training data sets
available.

292
00:09:06.000 --> 00:09:10.000
You can also download them from
the

293
00:09:10.000 --> 00:09:13.000
link site for today's webinar.

294
00:09:13.000 --> 00:09:17.000
To make it easier, I put them in
a folder so

295
00:09:17.000 --> 00:09:23.000
I can reach them quickly. These
two files are

296
00:09:23.000 --> 00:09:25.000
and W189.

297
00:09:25.000 --> 00:09:29.000
We will use a little bit at the
beginning of the demonstration
and

298
00:09:29.000 --> 00:09:36.000
then for most of the parts, we
will use trend and W and

299
00:09:36.000 --> 00:09:42.000
D to. Before we go further, I
just wanted

300
00:09:42.000 --> 00:09:45.934
to make a few remarks.

301
00:09:45.934 --> 00:09:48.934
The two data sets that we will
be using today don't include
non-detects.

302
00:09:48.934 --> 00:09:52.934
So we won't be dealing with them
in today's demonstration. But,
as

303
00:09:52.934 --> 00:09:57.934
many of you are facing that, I
want to help

304
00:09:57.934 --> 00:10:02.934
a little bit with how to deal
with them in trend analysis.

305
00:10:02.934 --> 00:10:06.934
So, the first thing that I want
to highlight is that high

306
00:10:06.934 --> 00:10:11.934
non-detects create problems and
they should be removed before
the analysis.

307
00:10:11.934 --> 00:10:15.934
The reason for these really high
non-detects, and I mean

308
00:10:15.934 --> 00:10:21.934
non-detects close to the really
high detected values,

309
00:10:21.934 --> 00:10:27.934
is the limitation of the
laboratory's ability to

310
00:10:27.934 --> 00:10:30.934
perform the analysis. They need
to dilute the samples many
times

311
00:10:30.934 --> 00:10:35.934
and especially when they analyze
simultaneously, they need to
make

312
00:10:35.934 --> 00:10:40.934
a compromise between the
dilution and the possible

313
00:10:40.934 --> 00:10:49.934
contamination of the analytical
system. They can't always go
back

314
00:10:49.934 --> 00:10:50.934
if it's a high non-detect.

315
00:10:50.934 --> 00:10:56.934
They can't always go back and
rerun the analysis at the last

316
00:10:56.934 --> 00:11:01.934
dilution because they may
contaminate the system. Or it
can also be because of

317
00:11:01.934 --> 00:11:07.934
the interference with some other
contaminants present in the

318
00:11:07.934 --> 00:11:08.934
sample.

319
00:11:08.934 --> 00:11:13.934
So, in this case, if you see the
really high non-detect, a good
practice

320
00:11:13.934 --> 00:11:17.934
is to reach out

321
00:11:17.934 --> 00:11:22.934
to the laboratory and get a
little bit of insight how the
samples were

322
00:11:22.934 --> 00:11:32.934
analyzed. So, two rules of thumb
on how to treat the non-detects:

323
00:11:32.934 --> 00:11:36.934
If they are greater than the
highest detected value, you are quite safe to
reject

324
00:11:36.934 --> 00:11:43.000
them. If you have a non-detect

325
00:11:43.000 --> 00:11:46.934
about 10 times higher than the
lowest detected

326
00:11:46.934 --> 00:11:51.934
value, use judgment; you may
reach out to the laboratory or just

327
00:11:51.934 --> 00:11:57.934
do some investigation to get a
feel for how those

328
00:11:57.934 --> 00:11:59.934
non-detects were obtained.

329
00:11:59.934 --> 00:12:05.934
Then make a decision based on
that if you want to reject. When
you

330
00:12:05.934 --> 00:12:11.934
have a data set with
non-detects, you can use one of
the

331
00:12:11.934 --> 00:12:13.934
substitution methods.

332
00:12:13.934 --> 00:12:20.934
With the two common substitution
methods, you can substitute the
value with

333
00:12:20.934 --> 00:12:26.934
half the reporting limit or half
of the detection limit

334
00:12:26.934 --> 00:12:34.934
but this makes sense to do only
when you have

335
00:12:34.934 --> 00:12:35.934
less than 10 or 15% of the
non-detects.

336
00:12:35.934 --> 00:12:38.934
When you are dealing with a lot
of non-detects in your data set,
I

337
00:12:38.934 --> 00:12:45.000
strongly suggest that you work
with a statistician and consult
on

338
00:12:45.000 --> 00:12:50.000
how you want to go about that
kind of data set.
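The substitution rule of thumb described above can be sketched in a few lines of Python (a hypothetical helper, not part of ProUCL; the 15% threshold and the names are illustrative):

```python
# Hypothetical sketch of the substitution rule of thumb from the talk:
# replace each non-detect with half its detection limit, but only when
# non-detects make up less than roughly 15% of the data set.
def substitute_nondetects(records, max_nd_fraction=0.15):
    """records: list of (value, is_nondetect) pairs, where a non-detect's
    value is its detection limit. Returns substituted values, or None
    when there are too many non-detects to substitute safely."""
    nd_count = sum(1 for _, is_nd in records if is_nd)
    if nd_count / len(records) >= max_nd_fraction:
        return None  # too many non-detects -- consult a statistician
    return [v / 2 if is_nd else v for v, is_nd in records]

data = [(5.1, False), (4.7, False), (3.9, False), (2.8, False),
        (2.2, False), (1.8, False), (1.1, False), (0.5, True)]
print(substitute_nondetects(data))
```

When more than the threshold fraction of results are non-detects, the helper declines to substitute, matching the advice above to involve a statistician instead.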

339
00:12:50.000 --> 00:12:56.000
So, this is about non-detects.
Let's go to the time series

340
00:12:56.000 --> 00:13:02.000
plot for our first data set. I
am going to open

341
00:13:02.000 --> 00:13:08.000
my data set

342
00:13:08.000 --> 00:13:18.000
and I will

343
00:13:18.000 --> 00:13:19.000
use the first two columns.

344
00:13:19.000 --> 00:13:24.000
The indices of the columns are
zero and one. You can see we have
data
345
00:13:24.000 --> 00:13:28.000
set for a few wells, and here is
the manganese

346
00:13:28.000 --> 00:13:32.000
concentration in groundwater.
These data sets are really

347
00:13:32.000 --> 00:13:37.000
nice to show you the capability
of ProUCL to group

348
00:13:37.000 --> 00:13:43.067
parameters. Many times you want
to display like

349
00:13:43.067 --> 00:13:47.000
several, you want to compare
several wells. I am going

350
00:13:47.000 --> 00:13:52.000
to show you how. The time series
plot is found under Statistical

351
00:13:52.000 --> 00:13:56.000
Tests, Trend Analysis, and here on
the bottom

352
00:13:56.000 --> 00:14:05.000
we have time series plot and I
will select this time just data

353
00:14:05.000 --> 00:14:10.000
so in measured data, we report
our concentration of

354
00:14:10.000 --> 00:14:16.000
the contaminant and here we have
this grouping,

355
00:14:16.000 --> 00:14:22.000
select group column. I will
click on this little arrow here
and

356
00:14:22.000 --> 00:14:25.000
select well I.D.

357
00:14:25.000 --> 00:14:28.000
Let's look under options.

358
00:14:28.000 --> 00:14:34.000
We have event label here, those
labels can be changed however

359
00:14:34.000 --> 00:14:37.000
you want.

360
00:14:37.000 --> 00:14:45.000
We can change that but one thing
I

361
00:14:45.000 --> 00:14:49.000
want to do at this point is to
click this little box here,
group

362
00:14:49.000 --> 00:14:52.000
and okay. Okay.

363
00:14:52.000 --> 00:14:58.000
Here we have this nice plot
comparing three wells on the

364
00:14:58.000 --> 00:15:03.000
same graph. Really nice feature,
really easy to do

365
00:15:03.000 --> 00:15:06.000
in ProUCL.

366
00:15:06.000 --> 00:15:11.000
You may want to use this feature
to compare contaminants

367
00:15:11.000 --> 00:15:16.000
and if you want to do that, then
you need to be aware of the

368
00:15:16.000 --> 00:15:23.000
scaling may be an issue if you
have contaminants in

369
00:15:23.000 --> 00:15:28.000
different concentration ranges.
You just need to play a little
bit with

370
00:15:28.000 --> 00:15:33.000
that option.

371
00:15:33.000 --> 00:15:37.000
So now I want to open our main
data set, the other one, that we

372
00:15:37.000 --> 00:15:46.000
will use from now on for most of
the presentation. So this is

373
00:15:46.000 --> 00:15:54.000
trend MW real data. I have

374
00:15:54.000 --> 00:15:58.000
a remark, because I cheated a
little bit for the
demonstration. I'm

375
00:15:58.000 --> 00:16:04.000
going to do the time series plot
on time and

376
00:16:04.000 --> 00:16:10.000
MW 28 variable. Just a quick

377
00:16:10.000 --> 00:16:13.000
look at this time and date.

378
00:16:13.000 --> 00:16:17.000
We have a few instances here. We
have repeated measurement

379
00:16:17.000 --> 00:16:20.000
of the same sampling event.

380
00:16:20.000 --> 00:16:24.000
So, if you have those, make sure
you put

381
00:16:24.000 --> 00:16:33.000
in the time column the same annotation.

382
00:16:33.000 --> 00:16:39.000
Now Statistical Tests, Trend
Analysis;

383
00:16:39.000 --> 00:16:48.000
this time we will use the event,
so under events, it will

384
00:16:48.000 --> 00:16:54.000
come with time and date and
under measured data,

385
00:16:54.000 --> 00:16:57.000
MW 28.

386
00:16:57.000 --> 00:17:05.000
We don't have a grouping
parameter now.

387
00:17:05.000 --> 00:17:08.000
The options can stay the same;
we don't need to group the
graphs

388
00:17:08.000 --> 00:17:11.000
now because we don't have a
group. Okay.

389
00:17:11.000 --> 00:17:20.000
So this is our time series plot
now.

390
00:17:20.000 --> 00:17:25.000
So, before we jump into the
statistical analysis, I just

391
00:17:25.000 --> 00:17:31.000
want to refresh some statistical

392
00:17:31.000 --> 00:17:32.000
terms.

393
00:17:32.000 --> 00:17:35.000
This is where many
non-statistician specialists
struggle. In trend analysis,

394
00:17:35.000 --> 00:17:43.067
we always deal with two
variables and our data are
collected in

395
00:17:43.067 --> 00:17:49.000
time intervals. So, our variable
of interest, this is the
contaminant

396
00:17:49.000 --> 00:17:58.000
in most cases, is called a
dependent variable.

397
00:17:58.000 --> 00:18:02.000
We plot it on the y-axis.

398
00:18:02.000 --> 00:18:08.000
Here is the contaminant, the
dependent variable. The other
variable

399
00:18:08.000 --> 00:18:12.000
is normally set by the
design.

400
00:18:12.000 --> 00:18:18.000
So basically, beforehand, we
determined time intervals

401
00:18:18.000 --> 00:18:22.000
let's say when we would sample
groundwater. This variable

402
00:18:22.000 --> 00:18:28.000
is displayed on the x-axis and
it is called the independent or
explanatory

403
00:18:28.000 --> 00:18:31.000
variable.

404
00:18:31.000 --> 00:18:36.000
Just so we have those
terms refreshed.

405
00:18:36.000 --> 00:18:41.000
So and now let's go into our
statistical test for trend
analysis. The first

406
00:18:41.000 --> 00:18:47.000
test that we will look at is a
non-parametric

407
00:18:47.000 --> 00:18:56.000
test.

408
00:18:56.000 --> 00:19:02.000
So I will use this trend MW data
statistical test

409
00:19:02.000 --> 00:19:08.000
trend analysis, this is where

410
00:19:08.000 --> 00:19:17.000
we will find it. The benefit of
basically both

411
00:19:17.000 --> 00:19:21.000
nonparametric tests, Mann-Kendall
and Theil-Sen line test, is that they

412
00:19:21.000 --> 00:19:26.000
don't have any underlying
assumptions

413
00:19:26.000 --> 00:19:27.000
about the distribution of the
data.

414
00:19:27.000 --> 00:19:30.000
We don't need to think about
distribution, we don't need to
worry about transforming

415
00:19:30.000 --> 00:19:35.000
the data, but it is the first
step in trend analysis to do
those

416
00:19:35.000 --> 00:19:43.934
tests. So, let's look now at the
Mann-Kendall test.

417
00:19:43.934 --> 00:19:49.934
Measured value MW 28.

418
00:19:49.934 --> 00:19:55.934
One thing I want to highlight
here is that the data must be
ordered

419
00:19:55.934 --> 00:20:00.934
in time, and you do need to pay
attention that you

420
00:20:00.934 --> 00:20:04.934
organize them that way when you
put them in the table, so
that

421
00:20:04.934 --> 00:20:07.934
they are ordered by time.

422
00:20:07.934 --> 00:20:13.934
We don't have a grouping
parameter, so you select

423
00:20:13.934 --> 00:20:18.934
your confidence level; the
standard confidence level is
95%.

424
00:20:18.934 --> 00:20:24.934
We will display graphics; the

425
00:20:24.934 --> 00:20:29.934
OLS regression line is
something that

426
00:20:29.934 --> 00:20:35.934
helps visualize what is going on
in the data. Click

427
00:20:35.934 --> 00:20:41.934
okay. Okay. Here is our table
with the results. We

428
00:20:41.934 --> 00:20:46.934
have some general

429
00:20:46.934 --> 00:20:50.934
information about the data and
the test here in the general
statistics

430
00:20:50.934 --> 00:20:56.934
in the next section. And the
bottom, we see the result of the
Mann-Kendall

431
00:20:56.934 --> 00:21:01.934
analysis. So here are the
statistics; the

432
00:21:01.934 --> 00:21:07.934
p-value was really small and we
have this message

433
00:21:07.934 --> 00:21:13.934
helping us to plot the results.
So now let's

434
00:21:13.934 --> 00:21:16.934
look at this trend test
graphics.

435
00:21:16.934 --> 00:21:19.934
Here is the plot.

436
00:21:19.934 --> 00:21:24.934
It is again basically the time

437
00:21:24.934 --> 00:21:31.934
series plot, and then I added
this regression line, which
helps to

438
00:21:31.934 --> 00:21:34.934
visualize what is going on with
the data and on the right-hand
side,

439
00:21:34.934 --> 00:21:40.934
we have the Mann-Kendall test
table and

440
00:21:40.934 --> 00:21:44.934
the interpretation: statistically
significant evidence

441
00:21:44.934 --> 00:21:49.934
of decreasing trend in this
data.

442
00:21:49.934 --> 00:21:54.934
So the Mann-Kendall test only
gives us an idea about the

443
00:21:54.934 --> 00:21:59.934
monotonicity of the trend.
Basically, what that means is
whether the trend

444
00:21:59.934 --> 00:22:03.934
consistently decreases or
increases.
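The monotonicity idea can be made concrete with the Mann-Kendall S statistic, which simply counts pairwise increases and decreases over time (a bare sketch of the statistic only; ProUCL also reports the variance of S and a p-value, which this sketch omits):

```python
# Bare sketch of the Mann-Kendall S statistic: for every pair of
# observations, count +1 if the later value is higher and -1 if it is
# lower. A strongly negative S suggests a decreasing trend.
def mann_kendall_s(values):
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    return s

conc = [9.8, 8.9, 8.1, 7.4, 6.6, 5.9, 5.2]  # steadily decreasing series
print(mann_kendall_s(conc))  # prints -21: every one of the 21 pairs decreases
```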

445
00:22:03.934 --> 00:22:08.934
It doesn't give us any
information about slope of the
trend. So, the

446
00:22:08.934 --> 00:22:13.934
rate at which the
contaminant concentration is

447
00:22:13.934 --> 00:22:15.934
increasing or decreasing.

448
00:22:15.934 --> 00:22:21.934
So, one benefit of this test is
that we can perform it with

449
00:22:21.934 --> 00:22:24.934
as little as four data points.
You need to know

450
00:22:24.934 --> 00:22:29.934
that if you have a really small
data set, the power of this test

451
00:22:29.934 --> 00:22:34.934
is very low.

452
00:22:34.934 --> 00:22:39.934
So the test may not detect the
trend, even if the trend is
present.

453
00:22:39.934 --> 00:22:44.000
But let's say in groundwater
analysis we keep adding

454
00:22:44.000 --> 00:22:48.000
data points. Over time, we
are gathering more and more
evidence

455
00:22:48.000 --> 00:22:53.000
and basically increasing the
power of these tests whilst we
are adding

456
00:22:53.000 --> 00:23:05.000
the data. So, eventually the
test will detect it.
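The point about test power can be illustrated with SciPy's Kendall tau, which is the normalized Mann-Kendall S (an illustration with made-up values, not ProUCL's own implementation): a perfectly decreasing series of four points is not significant at the 95% level, while the same pattern over twelve points is.

```python
from scipy.stats import kendalltau

# Power illustration: Kendall's tau of concentration against sampling
# order is the normalized Mann-Kendall S. With only 4 points, even a
# perfectly decreasing series stays above the usual 0.05 cutoff; with
# 12 points the same pattern is highly significant.
short = [4.0, 3.0, 2.0, 1.0]
longer = [float(12 - k) for k in range(12)]

for conc in (short, longer):
    tau, p = kendalltau(range(len(conc)), conc)
    print(f"n={len(conc)}: tau={tau:.2f}, p={p:.4f}")
```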

457
00:23:05.000 --> 00:23:11.000
So this is about the
Mann-Kendall

458
00:23:11.000 --> 00:23:16.000
test. Now we can go to the next
step

459
00:23:16.000 --> 00:23:20.000
So we see here the
decreasing trend, now we can go
to the next

460
00:23:20.000 --> 00:23:28.000
step and perform the Theil-Sen
line test. This gives us a
little bit more

461
00:23:28.000 --> 00:23:32.000
information. It does give us
some idea about the

462
00:23:32.000 --> 00:23:38.000
rate of the decay of the
contaminant and the slope

463
00:23:38.000 --> 00:23:42.067
of it. So, I am selecting again
my data set,

464
00:23:42.067 --> 00:23:46.000
going to statistical test, trend
analysis,

465
00:23:46.000 --> 00:23:52.000
Theil-Sen Line Test, we need to
again

466
00:23:52.000 --> 00:23:58.000
select the variables; then under
options

467
00:23:58.000 --> 00:24:06.000
let's display Theil-Sen Line

468
00:24:06.000 --> 00:24:11.000
Test. This is a nice feature.
Previously I pointed out

469
00:24:11.000 --> 00:24:17.000
that the data set contains a few
events when several

470
00:24:17.000 --> 00:24:22.000
samples were obtained at the
same interval.

471
00:24:22.000 --> 00:24:25.000
Here we have 705 twice, 906
twice,

472
00:24:25.000 --> 00:24:31.000
1535 twice, where also two
samples

473
00:24:31.000 --> 00:24:35.000
were taken.

474
00:24:35.000 --> 00:24:40.000
When you click this box, the
results will be averaged. And
this is really

475
00:24:40.000 --> 00:24:46.000
important: the Theil-Sen test
can only handle one

476
00:24:46.000 --> 00:24:52.000
data point per sampling event.
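What that averaging checkbox accomplishes can be sketched as follows (illustrative times and values, not the actual data set): duplicate results from the same sampling time are averaged so each event contributes a single observation.

```python
from collections import defaultdict

# Sketch of averaging duplicate measurements taken at the same sampling
# time, so the Theil-Sen test sees one observation per event.
times = [705, 705, 906, 906, 1535, 1535, 2000]
values = [4.2, 4.6, 3.1, 3.3, 2.0, 2.4, 1.5]

groups = defaultdict(list)
for t, v in zip(times, values):
    groups[t].append(v)

# One (time, mean value) pair per sampling event, ordered by time.
averaged = sorted((t, sum(vs) / len(vs)) for t, vs in groups.items())
for t, mean in averaged:
    print(t, mean)
```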

477
00:24:52.000 --> 00:25:01.000
So let's see now. Analysis

478
00:25:01.000 --> 00:25:07.000
The immediate advantage of this
test is that

479
00:25:07.000 --> 00:25:12.000
we can estimate slope as opposed
to just

480
00:25:12.000 --> 00:25:18.000
the identification of increasing
or decreasing trend with
Mann-Kendall

481
00:25:18.000 --> 00:25:23.000
test. Since slope is negative,

482
00:25:23.000 --> 00:25:32.000
let me just pull down the
results, so here we have

483
00:25:32.000 --> 00:25:37.000
the values. The slope is
negative, the p-value is really

484
00:25:37.000 --> 00:25:43.067
small. That confirms the
Mann-Kendall test.

485
00:25:43.067 --> 00:25:49.000
One thing about this method is
that it is robust against extreme
values.

486
00:25:49.000 --> 00:25:54.000
What the Theil-Sen test does is
find the median,

487
00:25:54.000 --> 00:26:00.000
median slope, and divides the
points we have available

488
00:26:00.000 --> 00:26:04.000
into two sections so half of
them

489
00:26:04.000 --> 00:26:10.000
will be above the line, half of
them will be below the line.

490
00:26:10.000 --> 00:26:15.000
Because of that, it is robust
against a couple of those
values.

491
00:26:15.000 --> 00:26:21.000
Basically, it is the alternative
to ordinary least squares

492
00:26:21.000 --> 00:26:25.000
regression. The limitation of
this test is that it

493
00:26:25.000 --> 00:26:33.000
can only handle one observation
per sampling point.

494
00:26:33.000 --> 00:26:38.000
So here is now the plot. This is
the regression line with the median

495
00:26:38.000 --> 00:26:42.067
slope. This divides the points
into sections.

496
00:26:42.067 --> 00:26:49.000
Half of them below, half of them
above.

497
00:26:49.000 --> 00:26:53.000
Another disadvantage of this
test is that it does require a
fair amount

498
00:26:53.000 --> 00:26:59.000
of data to provide a reliable
conclusion.

499
00:26:59.000 --> 00:27:05.000
So when we start with the

500
00:27:05.000 --> 00:27:10.000
Mann-Kendall test, we can use it
to confirm the sign of the
slopes

501
00:27:10.000 --> 00:27:12.000
that is, whether we have a decreasing or
increasing trend

502
00:27:12.000 --> 00:27:16.000
and gives us additional
information about the rate of
the change in

503
00:27:16.000 --> 00:27:27.000
contaminant because it does give
us that estimate of the slope.
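SciPy's `theilslopes` implements the same estimator, and made-up data with one outlier shows the robustness just described (an illustration, not the webinar's data set or ProUCL's output):

```python
import numpy as np
from scipy.stats import theilslopes

# Theil-Sen sketch on made-up monitoring data with one extreme value.
# The estimate is the median of all pairwise slopes, so a single
# outlier barely moves it; ordinary least squares would be pulled up.
time = np.arange(8.0)
conc = np.array([9.8, 8.9, 8.1, 7.4, 30.0, 5.9, 5.2, 4.4])  # 30.0 is an outlier

slope, intercept, lo, hi = theilslopes(conc, time, 0.95)
print(f"median slope = {slope:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```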

504
00:27:27.000 --> 00:27:32.000
So here is now time for a short
break before we go to the
ordinary

505
00:27:32.000 --> 00:27:35.000
least squares regression.
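As a preview of that second half, an ordinary least squares fit is a one-liner with NumPy (made-up numbers, not the webinar's data set; ProUCL's OLS module reports the same kind of slope and intercept):

```python
import numpy as np

# Minimal ordinary least squares (linear regression) sketch on made-up
# monitoring data: fit the line that minimizes squared vertical errors.
time = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # sampling-event index
conc = np.array([10.2, 8.9, 8.1, 6.8, 6.1, 4.9])  # concentrations

slope, intercept = np.polyfit(time, conc, 1)  # degree-1 polynomial = a line
print(f"fitted line: conc = {slope:.3f} * time + {intercept:.3f}")
```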

506
00:27:35.000 --> 00:27:38.000
So, if we can have a couple of

507
00:27:38.000 --> 00:27:40.000
quiz questions.

508
00:27:40.000 --> 00:27:46.000
>> We will also be taking a
quick question and answer

509
00:27:46.000 --> 00:28:01.000
session as well.

510
00:28:01.000 --> 00:28:03.000
>> I will remind the audience
there is still an opportunity to
send

511
00:28:03.000 --> 00:28:06.000
in questions using the Q&A
window in the lower corner. We
are going

512
00:28:06.000 --> 00:28:09.000
to turn on a quiz in a portion
of the screen and test your
knowledge

513
00:28:09.000 --> 00:28:12.000
but there will be questions that
will appear and you can click on

514
00:28:12.000 --> 00:28:15.000
the answer that you think is
correct then we will keep track
of who

515
00:28:15.000 --> 00:28:18.000
is doing well. Our first
question is appearing now.
Remember, just

516
00:28:18.000 --> 00:28:23.000
click on the box you think is
the correct answer.

517
00:28:23.000 --> 00:28:27.000
The quiz score will keep track
and let you know if you got that

518
00:28:27.000 --> 00:28:29.000
right or wrong. The sooner you
pick the right answer, the
higher

519
00:28:29.000 --> 00:28:33.000
your score will be. It looks
like we've got some people who
know their

520
00:28:33.000 --> 00:28:49.067
stuff. Moving on to our next
question.

521
00:28:49.067 --> 00:28:55.000
This one seems to be tripping a
few people up. A handful of you
got

522
00:28:55.000 --> 00:29:04.000
that one correct. Just reminding
everybody,

523
00:29:04.000 --> 00:29:07.000
with that, we will go ahead and
go back to our split screen view

524
00:29:07.000 --> 00:29:09.000
and take a quick break for
questions.

525
00:29:09.000 --> 00:29:12.000
I am going to start earlier.
This first question comes in
asking if

526
00:29:12.000 --> 00:29:17.000
ProUCL can use J-coded
values for trend analysis.

527
00:29:17.000 --> 00:29:30.000
>> Yes, by J-coded, I assume you
mean non-detects.

528
00:29:30.000 --> 00:29:37.000
I totally understand but in
ProUCL right now, there is not

529
00:29:37.000 --> 00:29:40.000
a great way to directly
handle your non-detect
problems. Which

530
00:29:40.000 --> 00:29:46.934
is why we kind of decided we
would give some general rules of
thumb

531
00:29:46.934 --> 00:29:49.934
if you want to take a look at it
basically really, if you are
trying

532
00:29:49.934 --> 00:29:54.934
to do trend analysis and you
have interesting sets of
non-detects in

533
00:29:54.934 --> 00:30:00.934
your data, you should probably
consult

534
00:30:00.934 --> 00:30:03.934
a statistician. Depending on how your
data is set up, how many detects
you

535
00:30:03.934 --> 00:30:05.934
have, what percent of
your data are non-detects, all
those different

536
00:30:05.934 --> 00:30:09.934
confounding factors, you are
going to get different answers
each time

537
00:30:09.934 --> 00:30:14.934
and I think Jean, this also
falls into the

538
00:30:14.934 --> 00:30:19.934
second question: wouldn't the
substitution method

539
00:30:19.934 --> 00:30:23.934
of doing half the detection
limit or whatnot artificially

540
00:30:23.934 --> 00:30:28.934
alter the distribution of the
data and

541
00:30:28.934 --> 00:30:33.934
decrease the variance. The
answer is yes, that totally
could do that.

542
00:30:33.934 --> 00:30:39.934
With non-detects, you are making

543
00:30:39.934 --> 00:30:41.934
the best of a bad situation.

544
00:30:41.934 --> 00:30:50.934
How to do that again is going to
come down to a case-by-case
basis.

545
00:30:50.934 --> 00:30:52.934
Not to beat a dead horse here if
you're going to want to consult

546
00:30:52.934 --> 00:30:57.934
a statistician if it isn't
something that is immediately
solvable to

547
00:30:57.934 --> 00:30:59.934
you. And, in the case of lots of
non-detects and multiple detection
limits,

548
00:30:59.934 --> 00:31:04.934
that is not an easily solvable
situation I would say.

549
00:31:04.934 --> 00:31:07.934
So, sorry to kind of breeze over
all of those non-detect
questions

550
00:31:07.934 --> 00:31:13.934
where basically the answer is:
consult a statistician.

551
00:31:13.934 --> 00:31:23.934
>> Okay.

552
00:31:23.934 --> 00:31:26.934
Going to bounce back to some
questions about dates and times
in the data

553
00:31:26.934 --> 00:31:27.934
set you have been working with.

554
00:31:27.934 --> 00:31:30.934
One of the attendees noted that
the data set didn't appear to
have

555
00:31:30.934 --> 00:31:32.934
a column for the date of the
sampling event and we were
wondering if ProUCL

556
00:31:32.934 --> 00:31:35.934
automatically converts the date
to a corresponding event number
or

557
00:31:35.934 --> 00:31:37.934
is it calculating the number of
days in between these events?

558
00:31:37.934 --> 00:31:43.000
>> I believe Polona mentioned
earlier but just in case it got
missed,

559
00:31:43.000 --> 00:31:47.934
ProUCL needs to have numbers in
the date

560
00:31:47.934 --> 00:31:56.934
column. Because if you want
to put in March 3, 2020,

561
00:31:56.934 --> 00:31:59.934
like that, it is not going to work;
you are going to need to convert

562
00:31:59.934 --> 00:32:04.934
it to a numeric representation
of date or time or sampling
event.

563
00:32:04.934 --> 00:32:14.934
You can either do that by just
saying a set number of weeks
after an

564
00:32:14.934 --> 00:32:16.934
initial sampling event or days
or months or if you are coming
back

565
00:32:16.934 --> 00:32:22.934
and forth from maybe different
software, you can convert things
to a

566
00:32:22.934 --> 00:32:25.934
set up where it is counting the
number of seconds from

567
00:32:25.934 --> 00:32:30.934
a given starting point. That is
going to give you a

568
00:32:30.934 --> 00:32:34.934
somewhat weird x-axis, but it
might play a little better if
you are

569
00:32:34.934 --> 00:32:40.934
bobbing back and forth between
R or Python or what have you.

570
00:32:40.934 --> 00:32:47.000
But in ProUCL, your date or time
variable does

571
00:32:47.000 --> 00:32:51.000
need to be numeric not a
character or date string.

572
00:32:51.000 --> 00:32:58.000
>> If I can just add here that it is
really easy in Excel

573
00:32:58.000 --> 00:33:02.000
to convert the date into a
number, the number of days
between

574
00:33:02.000 --> 00:33:06.000
the events.
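[Editor's note: as a sketch of that date-to-number conversion outside of Excel, with made-up sampling dates, the days-between-events calculation ProUCL expects looks like this in Python:]

```python
from datetime import date

# Hypothetical sampling dates; ProUCL needs a numeric time column,
# so express each date as days elapsed since the first event.
events = [date(2020, 3, 3), date(2020, 6, 1), date(2020, 9, 14)]
days_since_start = [(d - events[0]).days for d in events]
print(days_since_start)  # [0, 90, 195]
```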

575
00:33:06.000 --> 00:33:09.000
>> Okay. I did just want to
comment.

576
00:33:09.000 --> 00:33:12.000
I see a lot of the dialogue and
comments that are coming in from

577
00:33:12.000 --> 00:33:20.000
what I believe are statisticians
about how to handle

578
00:33:20.000 --> 00:33:22.000
J coded values and whether they
are estimates or document that

579
00:33:22.000 --> 00:33:26.000
something is there, we just
can't be certain of what the
value is

580
00:33:26.000 --> 00:33:30.000
as opposed to zeroing it out, so I
want to thank everyone that has
been

581
00:33:30.000 --> 00:33:41.000
coming in with comments and
input on how to

582
00:33:41.000 --> 00:33:44.000
handle J-coded values. I will
certainly work to take that
input back. This

583
00:33:44.000 --> 00:33:46.000
could be a topic for a future
webinar in itself. Thanks to
everyone weighing in

584
00:33:46.000 --> 00:33:49.000
on the methods but, in the
interest of time, we have about

585
00:33:49.000 --> 00:33:53.000
50 minutes left, so we will
transition back to the
presentation. We will

586
00:33:53.000 --> 00:34:07.000
pause again for Q&A and we will
encourage

587
00:34:07.000 --> 00:34:10.000
those participants to get their
questions coming in. If you happened

588
00:34:10.000 --> 00:34:13.000
to join us late, if we don't get
your question live, we will be
looking

589
00:34:13.000 --> 00:34:15.000
to get a written response to
the questions that were
submitted

590
00:34:15.000 --> 00:34:18.000
that we may not have addressed
live during the webinar. With
that, I

591
00:34:18.000 --> 00:34:20.000
will turn it back to Travis and
Polona.

592
00:34:20.000 --> 00:34:23.000
>> So, yes, I will speak up from
here again. So, the third option

593
00:34:23.000 --> 00:34:25.000
for trend analysis in ProUCL is
ordinary least squares regression.

594
00:34:25.000 --> 00:34:29.000
This is the most advanced method
of the three but gives the most
information.

595
00:34:29.000 --> 00:34:37.000
So, without getting in-depth on
the math of ordinary least squares

596
00:34:37.000 --> 00:34:41.000
regression, what it is really
trying to do is fit

597
00:34:41.000 --> 00:34:46.000
a line through the data points
that minimizes the squared
distance

598
00:34:46.000 --> 00:34:51.000
between the points and the line.
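[Editor's note: the least-squares criterion just described can be sketched in a few lines of plain Python with made-up x and y values; this is the closed-form simple-regression solution, not ProUCL's own code:]

```python
# Simple OLS: slope = Sxy / Sxx, intercept = ybar - slope * xbar.
# This is the line minimizing the sum of squared residuals.
def ols_fit(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

intercept, slope = ols_fit([0, 1, 2, 3], [10, 8, 6, 4])
print(intercept, slope)  # 10.0 -2.0
```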

599
00:34:51.000 --> 00:34:55.000
So, since this is a little bit
more of an

600
00:34:55.000 --> 00:35:01.000
advanced analysis, I am first
providing you the steps to go
about it.

601
00:35:01.000 --> 00:35:06.000
So, first you select dependent
and independent variables.

602
00:35:06.000 --> 00:35:11.000
Then, fit the regression. Then
you need to evaluate the

603
00:35:11.000 --> 00:35:14.000
regression model that you get.

604
00:35:14.000 --> 00:35:19.000
After you go through those steps,
you need to

605
00:35:19.000 --> 00:35:24.000
check that the assumptions
were met. There are three
assumptions,

606
00:35:24.000 --> 00:35:28.000
we will explain them a little
bit later but you need to check

607
00:35:28.000 --> 00:35:34.000
them, and then based on the
evaluation of the model and
assumptions, you

608
00:35:34.000 --> 00:35:42.067
interpret your results. Okay,
let's go ahead and

609
00:35:42.067 --> 00:35:46.000
analyze our data set with
ordinary least squares regression
feature.

610
00:35:46.000 --> 00:35:52.000
I am selecting this trend MW
data set, going

611
00:35:52.000 --> 00:35:56.000
to do a statistical test and
ordinary least squares regression

612
00:35:56.000 --> 00:36:03.000
is under the trend analysis
option. Now

613
00:36:03.000 --> 00:36:09.000
here we have this dependent
variable, remember, dependent
variable is

614
00:36:09.000 --> 00:36:17.000
our variable of concern. The
contaminant MW

615
00:36:17.000 --> 00:36:26.000
28. Then we have independent
variable, which is our time and
date.

616
00:36:26.000 --> 00:36:32.000
Under options, you

617
00:36:32.000 --> 00:36:36.000
select the confidence level. You
want to

618
00:36:36.000 --> 00:36:42.067
display the regression table with
regression results. We will
display

619
00:36:42.067 --> 00:36:49.000
diagnostics as well, just to take
a look at it.

620
00:36:49.000 --> 00:36:53.000
You want to display a regression
plot. On

621
00:36:53.000 --> 00:36:57.000
regression plot, we also want
confidence and prediction intervals
for this demonstration.

622
00:36:57.000 --> 00:37:02.000
So, okay. Okay.

623
00:37:02.000 --> 00:37:08.000
Here we display the results. Let
me first pop up the regression plot.

624
00:37:08.000 --> 00:37:14.000
This is now the regression line.

625
00:37:14.000 --> 00:37:23.000
The gray dotted line in the
middle is our fitted
regression line.

626
00:37:23.000 --> 00:37:28.000
The two green lines closest to
it are the confidence intervals
of

627
00:37:28.000 --> 00:37:35.000
the mean, and the two red lines
are prediction

628
00:37:35.000 --> 00:37:39.000
intervals. Let me explain the
meaning of the confidence and
prediction

629
00:37:39.000 --> 00:37:44.000
interval in as plain language as
I can. So, a 95% confidence

630
00:37:44.000 --> 00:37:50.000
interval means if we take a bunch of
samples

631
00:37:50.000 --> 00:37:54.000
at a time, let's say here we have
a

632
00:37:54.000 --> 00:38:02.000
point at 160. If we would take
several samples of that time and
then

633
00:38:02.000 --> 00:38:07.000
average them, we expect the
mean of those samples to be
within the

634
00:38:07.000 --> 00:38:12.000
green lines 95% of the time.

635
00:38:12.000 --> 00:38:16.000
So, but then on the other hand,
if we only take one random
sample

636
00:38:16.000 --> 00:38:22.000
at the same time, we expect

637
00:38:22.000 --> 00:38:27.000
the value of that sample to be
95% of the time

638
00:38:27.000 --> 00:38:32.000
between the red lines. We are 95%
confident that its value will be

639
00:38:32.000 --> 00:38:38.000
within the red lines.

640
00:38:38.000 --> 00:38:43.067
So, here on this plot, we have
also the results of the
regression

641
00:38:43.067 --> 00:38:47.000
analysis. At this point, I will
move back to the table because
we

642
00:38:47.000 --> 00:38:59.000
have a little bit more
information there. So, let's
look.

643
00:38:59.000 --> 00:39:02.000
Here in the middle, we have
regression

644
00:39:02.000 --> 00:39:05.000
estimates and inference
table.

645
00:39:05.000 --> 00:39:11.000
So, in regression, we are
estimating two parameters.
Intercept

646
00:39:11.000 --> 00:39:14.000
and slope of the regression
line.

647
00:39:14.000 --> 00:39:20.000
These are the values for the
estimates

648
00:39:20.000 --> 00:39:24.000
and on the slide, here you see
below the table

649
00:39:24.000 --> 00:39:29.000
the regression model, the
regression equation. Our
contaminant is equal

650
00:39:29.000 --> 00:39:35.000
to the intercept, 2164, minus

651
00:39:35.000 --> 00:39:45.934
the slope; a decreasing trend
means a negative slope of minus 1.637.
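[Editor's note: to make the equation concrete, here is a sketch of evaluating that fitted line in plain Python; the intercept 2164 and slope -1.637 are the values read off the output above, and the time units follow whatever numeric time column was used:]

```python
# Fitted model from the webinar example: concentration = 2164 - 1.637 * t
intercept, slope = 2164.0, -1.637

def predict(t):
    # Predicted concentration at time t (same units as the time column)
    return intercept + slope * t

print(round(predict(0), 1), round(predict(100), 1))  # 2164.0 2000.3
```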

652
00:39:45.934 --> 00:39:50.934
Ordinary least squares regression
gives us more information than
the

653
00:39:50.934 --> 00:39:57.934
Theil-Sen. It gives us the
estimate of the

654
00:39:57.934 --> 00:39:59.934
error of those parameters, the
parameter estimates. Here is the
estimate

655
00:39:59.934 --> 00:40:04.934
of the error and then it
performs the t-test for
significance of those

656
00:40:04.934 --> 00:40:06.934
parameters.

657
00:40:06.934 --> 00:40:12.934
Basically, the P-Value is how
strong the relationship is
between

658
00:40:12.934 --> 00:40:18.934
the observed variables. What we
really care about

659
00:40:18.934 --> 00:40:27.934
is the P-Value

660
00:40:27.934 --> 00:40:30.934
of the slope. This is a measure
of

661
00:40:30.934 --> 00:40:34.934
how good is the model. How
strong is the relationship
between two

662
00:40:34.934 --> 00:40:38.934
parameters. P-Value is really
small in this case. We have a
strong relationship

663
00:40:38.934 --> 00:40:44.934
between the two variables. If
P-Value is large, what does

664
00:40:44.934 --> 00:40:47.934
this mean?

665
00:40:47.934 --> 00:40:49.934
It means the slope is zero, so
basically the trend

666
00:40:49.934 --> 00:40:55.934
is constant, there is no
relationship between the
variables.

667
00:40:55.934 --> 00:41:02.934
So slope is negative.

668
00:41:02.934 --> 00:41:06.934
That is nice, and in agreement
with the previous two tests,
which we

669
00:41:06.934 --> 00:41:11.934
hoped would be the outcome.
And we can move on from this
point down

670
00:41:11.934 --> 00:41:13.934
to the next table.

671
00:41:13.934 --> 00:41:19.934
So, in the second table, we
have the

672
00:41:19.934 --> 00:41:25.934
sources of variation broken
down in the regression,

673
00:41:25.934 --> 00:41:31.934
which means how much of the
variability

674
00:41:31.934 --> 00:41:35.934
is explained with the model that
we just built. So, with the
equation

675
00:41:35.934 --> 00:41:39.934
we just formed; the error
term is the unexplained variation.

676
00:41:39.934 --> 00:41:44.934
And then this total is basically
the sum of the two. It is the total

677
00:41:44.934 --> 00:41:49.934
variation. Here on the
right-hand side we have a test
for regression

678
00:41:49.934 --> 00:41:55.934
and here I want to mention this
is

679
00:41:55.934 --> 00:42:01.934
basically the only part really
in regression analysis

680
00:42:01.934 --> 00:42:07.934
that is sensitive to
non-normality of the residuals.
I

681
00:42:07.934 --> 00:42:13.934
will explain what residuals are
in a moment. I

682
00:42:13.934 --> 00:42:16.934
want to highlight there is
really no requirement for
normality of

683
00:42:16.934 --> 00:42:22.934
our data. But all assumptions
are about the residuals.

684
00:42:22.934 --> 00:42:26.934
So, the P-Value is small, which
again confirms that

685
00:42:26.934 --> 00:42:31.934
our model shows a strong
relationship between the

686
00:42:31.934 --> 00:42:37.934
two variables. Here on the
bottom, we have the

687
00:42:37.934 --> 00:42:46.000
R-squared and probably most of
you are very familiar

688
00:42:46.000 --> 00:42:51.000
with that. Statisticians are
making decisions based on
the R-squared.

689
00:42:51.000 --> 00:42:54.000
Adjusted R-squared is a little
bit more conservative.

690
00:42:54.000 --> 00:42:59.000
That is why I am suggesting you
basically use

691
00:42:59.000 --> 00:43:05.000
adjusted R-squared as a
measure, another measure

692
00:43:05.000 --> 00:43:08.000
to evaluate the model.

693
00:43:08.000 --> 00:43:17.000
What it means, this number is
basically 83.3%

694
00:43:17.000 --> 00:43:22.000
of the variation in this case is
explained by this regression
model. Basically,

695
00:43:22.000 --> 00:43:28.000
the meaning is the percent of
variability explained by the
regression

696
00:43:28.000 --> 00:43:31.000
model.
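[Editor's note: the adjustment being described has a standard formula; a sketch with hypothetical n and R-squared values, not numbers from this data set:]

```python
# Adjusted R-squared penalizes R-squared for the number of
# predictors p relative to the sample size n.
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# e.g. R-squared of 0.85 from a simple regression (p = 1) on 12 samples
print(round(adjusted_r2(0.85, 12, 1), 3))  # 0.835
```

Because the penalty grows with p, adjusted R-squared is the slightly more conservative measure mentioned above.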

697
00:43:31.000 --> 00:43:35.000
It is not the best measure. We
still need to look into the
assumptions

698
00:43:35.000 --> 00:43:41.000
but it gives us some ideas about
how good is the model.

699
00:43:41.000 --> 00:43:46.000
So, the main advantages of the
ordinary least

700
00:43:46.000 --> 00:43:51.000
squares regression are that it is a
pretty standard approach

701
00:43:51.000 --> 00:43:57.000
so most of the users are
familiar with it.

702
00:43:57.000 --> 00:44:01.000
It provides more information
than either

703
00:44:01.000 --> 00:44:06.000
of the nonparametric tests.
Besides the estimate for slope
and intercept,

704
00:44:06.000 --> 00:44:10.000
we also get information
about the uncertainty

705
00:44:10.000 --> 00:44:20.000
of both parameters. Regression
analysis also provides us
confidence

706
00:44:20.000 --> 00:44:23.000
bands. These confidence bands
can be used to determine
compliance

707
00:44:23.000 --> 00:44:28.000
with the site cleanup
levels, even when a trend is
apparent.

708
00:44:28.000 --> 00:44:32.000
Even when the concentration of the
pollutant is changing, we

709
00:44:32.000 --> 00:44:38.000
can still use those confidence
bands to compare

710
00:44:38.000 --> 00:44:44.000
with standards and make
conclusions.

711
00:44:44.000 --> 00:44:49.000
So, the downside of the
regression analysis is that it
is really sensitive

712
00:44:49.000 --> 00:44:57.000
to extreme outliers.

713
00:44:57.000 --> 00:45:00.000
And underlying assumptions.
There are three assumptions for
residuals

714
00:45:00.000 --> 00:45:04.000
that we need to check to make
the final conclusion about the
result

715
00:45:04.000 --> 00:45:10.000
of the regression model. So, in
the next step, I will

716
00:45:10.000 --> 00:45:16.000
show you how to go about the
residual analysis.

717
00:45:16.000 --> 00:45:21.000
First, let's explain what
residuals are. If we

718
00:45:21.000 --> 00:45:26.000
take our regression equation
that we talked about a little
bit ago

719
00:45:26.000 --> 00:45:32.000
and plug in time and date,

720
00:45:32.000 --> 00:45:37.000
the results that we get are called
fitted values or model
outcomes.

721
00:45:37.000 --> 00:45:43.067
Because our model doesn't
perfectly fit the data, we

722
00:45:43.067 --> 00:45:46.000
said before this model for our
training data

723
00:45:46.000 --> 00:45:49.000
explained 83% of the variation.

724
00:45:49.000 --> 00:45:54.000
So the fitted values are
different from

725
00:45:54.000 --> 00:45:57.000
the observations.

726
00:45:57.000 --> 00:46:01.000
So, the residuals are actually
the difference between the two.

727
00:46:01.000 --> 00:46:07.000
Observed values minus fitted
values gives us residuals.
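[Editor's note: the observed-minus-fitted calculation just described is one line per event; a sketch with made-up observed and fitted values:]

```python
# Residuals = observed minus fitted, one per sampling event.
observed = [12.0, 10.5, 9.8, 8.1]
fitted = [11.6, 10.9, 9.7, 8.4]
residuals = [round(o - f, 2) for o, f in zip(observed, fitted)]
print(residuals)  # [0.4, -0.4, 0.1, -0.3]
```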

728
00:46:07.000 --> 00:46:15.000
Regression analysis gives us this
bottom table, regression

729
00:46:15.000 --> 00:46:21.000
table, where we have the

730
00:46:21.000 --> 00:46:24.000
fitted values and residuals.

731
00:46:24.000 --> 00:46:27.000
It calculates that for us.

732
00:46:27.000 --> 00:46:32.000
There are three assumptions that
we need to

733
00:46:32.000 --> 00:46:34.000
check.

734
00:46:34.000 --> 00:46:39.000
They are all related to
residuals. The first one is
constant

735
00:46:39.000 --> 00:46:43.067
variance, or homoscedasticity,

736
00:46:43.067 --> 00:46:46.000
of residuals. We need to confirm
the residuals are independent
and

737
00:46:46.000 --> 00:46:51.000
that the residuals are normally
distributed.

738
00:46:51.000 --> 00:46:56.000
So, remember there is no
requirement for data itself to
be normally distributed

739
00:46:56.000 --> 00:47:04.000
but this requirement needs to
hold for the residuals.

740
00:47:04.000 --> 00:47:10.000
Many times, checking these
assumptions is overlooked.

741
00:47:10.000 --> 00:47:15.000
We do need to check them. So, I
will

742
00:47:15.000 --> 00:47:24.000
show you how to do that. We
need to do a little bit of extra
work.

743
00:47:24.000 --> 00:47:30.000
We need to copy this part of the

744
00:47:30.000 --> 00:47:36.000
regression table. The easiest
way is to copy and paste it into
Excel and

745
00:47:36.000 --> 00:47:40.000
open it again in Excel. When you
do that, sometimes

746
00:47:40.000 --> 00:47:46.000
when you paste the data into
Excel, you have

747
00:47:46.000 --> 00:47:51.000
leading blank spaces that ProUCL
doesn't like

748
00:47:51.000 --> 00:47:57.000
for the analysis, because they
make the values non-numeric.

749
00:47:57.000 --> 00:48:03.000
So we provided you on the slide
help with how to remove

750
00:48:03.000 --> 00:48:07.000
those leading spaces. They are
listed there but I am not going

751
00:48:07.000 --> 00:48:12.000
to go through them; I am not
going to demonstrate it.

752
00:48:12.000 --> 00:48:18.000
I did that before so we can just
move on with ProUCL

753
00:48:18.000 --> 00:48:24.000
features. So I am opening now
the file, which

754
00:48:24.000 --> 00:48:36.000
is residuals.

755
00:48:36.000 --> 00:48:42.067
>> You will need to switch it to
Excel.

756
00:48:42.067 --> 00:48:48.000
>> Thank you Travis. It is the

757
00:48:48.000 --> 00:48:51.000
XLS file.

758
00:48:51.000 --> 00:48:59.000
Residual analysis, this one,
right?

759
00:48:59.000 --> 00:49:03.000
Here we have the same table that
we put into

760
00:49:03.000 --> 00:49:09.000
the ProUCL. What we do now, we
will basically use of the time

761
00:49:09.000 --> 00:49:12.000
series plot to make some plots
of those residuals.

762
00:49:12.000 --> 00:49:18.000
The first plot that I will
create is

763
00:49:18.000 --> 00:49:24.000
the residuals versus fitted values.
I am selecting

764
00:49:24.000 --> 00:49:34.000
this data set, statistical test,
trend analysis, time series
plot.

765
00:49:34.000 --> 00:49:40.000
We will not be plotting

766
00:49:40.000 --> 00:49:45.934
on the event, so under the X axis, I
am plotting fitted values.

767
00:49:45.934 --> 00:49:49.934
Then for the Y axis, I am plotting our
residuals.

768
00:49:49.934 --> 00:49:55.934
And okay.

769
00:49:55.934 --> 00:50:00.934
So now I will make this plot a
little bit nicer so that

770
00:50:00.934 --> 00:50:04.934
we are not too distracted. I am
going into properties,

771
00:50:04.934 --> 00:50:10.934
series, I want this to be
scatterplot

772
00:50:10.934 --> 00:50:15.934
and I am going to make bullets a
little bit bigger so it will be

773
00:50:15.934 --> 00:50:21.934
easier to see. Okay, this is now
our plot.

774
00:50:21.934 --> 00:50:26.934
I can just show you real quickly
how to change

775
00:50:26.934 --> 00:50:29.934
the title.

776
00:50:29.934 --> 00:50:34.934
You right-click and this little
menu pops up and

777
00:50:34.934 --> 00:50:40.934
you can rename it to

778
00:50:40.934 --> 00:50:43.934
residuals, fitted values.

779
00:50:43.934 --> 00:50:46.934
Okay.

780
00:50:46.934 --> 00:50:52.934
So we want this picture to be,
the scatter of the plot, to

781
00:50:52.934 --> 00:50:53.934
be as random as possible.

782
00:50:53.934 --> 00:50:58.934
This is not completely random
but it is not the worst I have
seen.

783
00:50:58.934 --> 00:51:02.934
In this case, I am going to say
the

784
00:51:02.934 --> 00:51:08.934
variance is roughly constant.
This plot tells us about the

785
00:51:08.934 --> 00:51:14.934
first assumption, that the variance
of the residuals is constant.

786
00:51:14.934 --> 00:51:20.934
We will say okay, assumption is

787
00:51:20.934 --> 00:51:23.934
satisfied.

788
00:51:23.934 --> 00:51:31.934
The second plot is to check the
second assumption

789
00:51:31.934 --> 00:51:34.934
for independence. I will
again create a timeseries plot
of residuals

790
00:51:34.934 --> 00:51:40.934
and this time I will plot them
against the consecutive event
number.

791
00:51:40.934 --> 00:51:49.934
Basically, let me pull the data.

792
00:51:49.934 --> 00:51:54.934
Timeseries plot again.

793
00:51:54.934 --> 00:51:57.934
This time, event is our
observation.

794
00:51:57.934 --> 00:52:02.934
This column that tells us the
number of the observation.

795
00:52:02.934 --> 00:52:05.934
Observation and residuals.

796
00:52:05.934 --> 00:52:17.934
Again, we will create,
sorry for that.

797
00:52:17.934 --> 00:52:23.934
Observations, residuals. Okay.

798
00:52:23.934 --> 00:52:35.934
Where is the plot?

799
00:52:35.934 --> 00:52:52.000
Try again.

800
00:53:11.000 --> 00:53:16.000
Let me just explain to you. We
will display the same plot. It is on
the

801
00:53:16.000 --> 00:53:22.000
slide. As before, we want the
scatterplot to be random and we

802
00:53:22.000 --> 00:53:26.000
don't want to see any unusual
patterns. If you

803
00:53:26.000 --> 00:53:32.000
see on the slide that we
presented, you see in the

804
00:53:32.000 --> 00:53:38.000
second half of this slide, if
maybe Travis you can

805
00:53:38.000 --> 00:53:43.067
put the green arrow on it, there is
some kind of curvature. We

806
00:53:43.067 --> 00:53:46.000
don't want those curvatures in
the plot. That is

807
00:53:46.000 --> 00:53:51.000
an indication that we have what
is called lack of fit. In

808
00:53:51.000 --> 00:53:56.000
this case, the second assumption
is not

809
00:53:56.000 --> 00:54:02.000
completely satisfied. So we can
move on to the third

810
00:54:02.000 --> 00:54:06.000
assumption. The third assumption
is about the normality

811
00:54:06.000 --> 00:54:11.000
of the residuals. So, to test
for the normality of the

812
00:54:11.000 --> 00:54:19.000
residuals, we will do the
goodness of fit test. So,

813
00:54:19.000 --> 00:54:23.000
I will again pull up the table, go
to statistical test, goodness

814
00:54:23.000 --> 00:54:28.000
of fit, we want the one for
normality. And we want

815
00:54:28.000 --> 00:54:33.000
to perform this test on
residuals. Okay. Let's

816
00:54:33.000 --> 00:54:39.000
cross our fingers that this will
work.

817
00:54:39.000 --> 00:54:45.000
So, here we see these points. They
are very close to

818
00:54:45.000 --> 00:54:50.000
the line. This is not perfect
but it is a reasonable line.

819
00:54:50.000 --> 00:54:53.000
If we look at the results of the
normality

820
00:54:53.000 --> 00:54:59.000
test here on the right-hand
side, we see here

821
00:54:59.000 --> 00:55:06.000
in the middle part where I am
circling now it says

822
00:55:06.000 --> 00:55:12.000
data appears normal. So this
test confirms that our residuals
are

823
00:55:12.000 --> 00:55:17.000
normally distributed, so the
third assumption is also

824
00:55:17.000 --> 00:55:20.000
confirmed.
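[Editor's note: the idea behind the normal Q-Q comparison just shown can be sketched with Python's standard library, using hypothetical residuals: sorted residuals are paired with theoretical standard normal quantiles, and the points falling near a straight line is what "data appears normal" reflects.]

```python
from statistics import NormalDist

# Hypothetical residuals; pair the sorted values with standard
# normal quantiles (the comparison a normal Q-Q plot draws).
residuals = [0.5, -1.2, 0.1, 1.0, -0.4]
n = len(residuals)
quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
pairs = list(zip(sorted(residuals), (round(q, 2) for q in quantiles)))
print(pairs)
```

ProUCL's goodness-of-fit tests (such as Shapiro-Wilk) formalize this visual check with a test statistic and p-value.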

825
00:55:20.000 --> 00:55:25.000
So, to summarize, if any of the
assumptions are

826
00:55:25.000 --> 00:55:31.000
substantially violated, the
least squares

827
00:55:31.000 --> 00:55:36.000
regression results may not be
reliable. If an assumption

828
00:55:36.000 --> 00:55:42.067
is violated, that might indicate
that the trend is either

829
00:55:42.067 --> 00:55:48.000
nonlinear or its magnitude
depends on the mean
concentration

830
00:55:48.000 --> 00:55:54.000
level. We have what we call a
lack of fit.

831
00:55:54.000 --> 00:55:58.000
What it means is that the
model we have built is not
perfect. We

832
00:55:58.000 --> 00:56:03.000
can improve it; there is an
opportunity to improve

833
00:56:03.000 --> 00:56:09.000
it. But improving the
model

834
00:56:09.000 --> 00:56:12.000
really matters when we
are interested in predictions.

835
00:56:12.000 --> 00:56:16.000
If you just want to evaluate
whether you have a trend
present, a

836
00:56:16.000 --> 00:56:22.000
downward or upward trend on your
data then

837
00:56:22.000 --> 00:56:27.000
even an imperfect model will show
you that. If you want to predict
or

838
00:56:27.000 --> 00:56:32.000
if you want to make conclusions
about the rate of decay,

839
00:56:32.000 --> 00:56:38.000
in that case, you want to
improve the model.

840
00:56:38.000 --> 00:56:42.067
ProUCL is very limited in what
kind of improvements

841
00:56:42.067 --> 00:56:51.000
it can make. Basically the only
improvement in the model

842
00:56:51.000 --> 00:56:55.000
we can handle in ProUCL is
transforming the data. Even

843
00:56:55.000 --> 00:57:01.000
that is not really encouraged.
We need to make some

844
00:57:01.000 --> 00:57:07.000
extra steps with the data. The
other options are like adding

845
00:57:07.000 --> 00:57:12.000
more variables or adding
polynomial terms, but
these are

846
00:57:12.000 --> 00:57:17.000
more advanced techniques where I
strongly recommend you

847
00:57:17.000 --> 00:57:25.000
go to a statistician and work with
a statistician if you want to
predict.

848
00:57:25.000 --> 00:57:30.000
So, before we jump into
transforming the data,

849
00:57:30.000 --> 00:57:35.000
I will demonstrate how you can
do it. I want to give you some
additional

850
00:57:35.000 --> 00:57:40.000
guidelines on when you can kind of

851
00:57:40.000 --> 00:57:45.000
get information from the site on
whether transformation of the data is

852
00:57:45.000 --> 00:57:47.000
the right way to go.

853
00:57:47.000 --> 00:57:53.000
So, if you have a visible
exponential decay, expressed as an
exponential

854
00:57:53.000 --> 00:57:59.000
trend on the timeseries plot, it
may be a consequence of

855
00:57:59.000 --> 00:58:04.000
a first-order reaction. In
this case, the

856
00:58:04.000 --> 00:58:06.000
rate of decay of the contaminant
is proportional to

857
00:58:06.000 --> 00:58:08.000
the amount of contaminant
present.

858
00:58:08.000 --> 00:58:14.000
The more contaminant present,
the faster the decay you are
observing.
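[Editor's note: the first-order relationship being described, C(t) = C0 * exp(-k * t), is exactly the case a log transform straightens out; a small sketch with made-up C0 and k values:]

```python
import math

# First-order decay: C(t) = C0 * exp(-k * t).  Taking logs gives
# ln C(t) = ln C0 - k * t, a straight line in t, which is why
# an exponential-looking trend can be linearized by a log transform.
C0, k = 100.0, 0.05
times = [0, 10, 20, 30]
conc = [C0 * math.exp(-k * t) for t in times]
log_conc = [math.log(c) for c in conc]
diffs = [round(a - b, 6) for a, b in zip(log_conc, log_conc[1:])]
print(diffs)  # equal steps of k * 10: [0.5, 0.5, 0.5]
```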

859
00:58:14.000 --> 00:58:19.000
A practical example of this is the

860
00:58:19.000 --> 00:58:25.000
dilution of contaminated
groundwater, with no absorption
of it from

861
00:58:25.000 --> 00:58:28.000
the soil.

862
00:58:28.000 --> 00:58:36.000
In this case, when we know these
kinds of reactions

863
00:58:36.000 --> 00:58:39.000
can be present on the site and
we see the exponential pattern

864
00:58:39.000 --> 00:58:45.000
in the data, transforming the
data makes sense.

865
00:58:45.000 --> 00:58:52.000
Another case may be that we have
a zero-order reaction, which is
expressed

866
00:58:52.000 --> 00:58:55.000
as a linear decay rate.

867
00:58:55.000 --> 00:59:03.000
Basically, the decay rate
is limited by a limiting factor.

868
00:59:03.000 --> 00:59:06.000
It is constant instead of
exponential.

869
00:59:06.000 --> 00:59:12.000
So some examples of zero-order
reactions that may be

870
00:59:12.000 --> 00:59:20.000
in the media are
biodegradation

871
00:59:20.000 --> 00:59:22.000
limited by the toxicity
of the contaminant, or nutrient-limited

872
00:59:22.000 --> 00:59:29.000
biodegradation, or maybe we have
adsorbed contaminant that is
desorbing into

873
00:59:29.000 --> 00:59:34.000
groundwater pollutes contaminant
concentration. So, in this case,

874
00:59:34.000 --> 00:59:40.000
transformation of data won't
help.

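A quick sketch of why a log transform helps in the first-order case but not the zero-order case; all numbers here are hypothetical:

```python
import math

# First-order decay is exponential; zero-order decay is linear.
C0, k = 100.0, 0.05
times = [0, 10, 20, 30]

first_order = [C0 * math.exp(-k * t) for t in times]  # dC/dt = -k*C
zero_order = [C0 - 2.0 * t for t in times]            # dC/dt = -2 (constant)

# The natural log of first-order data is linear in t: ln C = ln C0 - k*t,
# so successive differences of ln C are all equal (here k * 10 = 0.5).
log_first = [math.log(c) for c in first_order]
log_diffs = [log_first[i] - log_first[i + 1] for i in range(3)]

# Zero-order data is already linear on the original scale,
# so a log transform only distorts it.
```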
875
00:59:40.000 --> 00:59:43.934
But if we see that the model is
not perfect, we need

876
00:59:43.934 --> 00:59:48.934
to take some other approach. So,
options are non-linear fit so

877
00:59:48.934 --> 00:59:53.934
polynomial regression or adding
more explanatory variables

878
00:59:53.934 --> 00:59:59.934
but these are advanced
techniques that need assistance
of a

879
00:59:59.934 --> 01:00:06.934
statistician and some better
software with more capabilities.

880
01:00:06.934 --> 01:00:12.934
So, okay. Now let's show how you
can

881
01:00:12.934 --> 01:00:15.934
do transformations.

882
01:00:15.934 --> 01:00:21.934
For transformation, you will
need to go to Excel and add

883
01:00:21.934 --> 01:00:28.934
an additional column to your
data whether you are

884
01:00:28.934 --> 01:00:36.934
in Excel or some other spreadsheet
software.

885
01:00:36.934 --> 01:00:43.000
So, here in my spreadsheet, I
have already done that. Just

886
01:00:43.000 --> 01:00:47.934
so you take note, I
used the natural logarithm. You
can do

887
01:00:47.934 --> 01:00:51.934
the logarithm as well but I used
the natural logarithm so that
you

888
01:00:51.934 --> 01:00:56.934
can replicate what I am working
on here. So,

889
01:00:56.934 --> 01:01:02.934
now we can repeat the regression
analysis

890
01:01:02.934 --> 01:01:07.934
with this log transformed
variable. There

891
01:01:07.934 --> 01:01:15.934
are some downsides of the
regression with log transformed
data

892
01:01:15.934 --> 01:01:20.934
so this is really probably
the time to go to a
statistician.

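Outside ProUCL, the same log-transformed fit can be sketched with the plain least squares formulas. The data below are synthetic, generated from an exact first-order decay so the recovered slope is known in advance:

```python
import math

# Synthetic, noise-free first-order data: C(t) = 80 * exp(-0.1 * t).
t = [0.0, 5.0, 10.0, 15.0, 20.0]
conc = [80.0 * math.exp(-0.1 * ti) for ti in t]
y = [math.log(c) for c in conc]  # natural-log transform, as in the demo

# Ordinary least squares for the slope and intercept of y on t.
n = len(t)
mt, my = sum(t) / n, sum(y) / n
slope = (sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
         / sum((ti - mt) ** 2 for ti in t))
intercept = my - slope * mt

# On the log scale, slope estimates -k and exp(intercept) estimates C0.
```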
893
01:01:20.934 --> 01:01:23.934
But for some of you who may be a
little bit more

894
01:01:23.934 --> 01:01:29.934
advanced, I will demonstrate how you
can do that in ProUCL. So, I

895
01:01:29.934 --> 01:01:36.934
will, instead of the variable

896
01:01:36.934 --> 01:01:39.934
in the original scale, I will
now use the log transformed one
as the dependent

897
01:01:39.934 --> 01:01:48.934
variable, and then the time and
date as the independent variable.
Okay?

898
01:01:48.934 --> 01:01:50.934
As before, we will first look at
the regression plot.

899
01:01:50.934 --> 01:01:55.934
So regression plot here with the
confidence and prediction
intervals,

900
01:01:55.934 --> 01:02:01.934
we see a little bit of scatter
in those

901
01:02:01.934 --> 01:02:07.934
points around the regression
line, but it

902
01:02:07.934 --> 01:02:09.934
is not too bad.

903
01:02:09.934 --> 01:02:13.934
Now let's look at the results of
the regression. Again, I will

904
01:02:13.934 --> 01:02:16.934
switch to the spreadsheet
because it has more information.
So, in

905
01:02:16.934 --> 01:02:22.934
the regression estimate table,
we have intercept

906
01:02:22.934 --> 01:02:25.934
and slope.

907
01:02:25.934 --> 01:02:30.934
Remember now that these are log
transformed data

908
01:02:30.934 --> 01:02:35.934
so the values may not be
straightforward to interpret.

909
01:02:35.934 --> 01:02:41.934
In terms of the rate of decay,
let's say. P-values appear

910
01:02:41.934 --> 01:02:47.000
here, and here are the errors. Let's look
at the P-Value

911
01:02:47.000 --> 01:02:53.000
for slope.

912
01:02:53.000 --> 01:02:55.000
The p-value is small, which
indicates that there

913
01:02:55.000 --> 01:02:57.000
is a statistically significant
relationship between

914
01:02:57.000 --> 01:02:58.000
the dependent and independent
variable.

915
01:02:58.000 --> 01:03:01.000
Between the contaminant and time.

916
01:03:01.000 --> 01:03:09.000
In another table, we have the error

917
01:03:09.000 --> 01:03:12.000
terms, and here is the
statistical test for the regression.
Statistical

918
01:03:12.000 --> 01:03:17.000
test indicates that regression
is statistically significant and
if

919
01:03:17.000 --> 01:03:33.000
we look at adjusted R Square as

920
01:03:33.000 --> 01:03:36.000
one of the indicators for
regression. Adjusted R square, we
said, is more

921
01:03:36.000 --> 01:03:39.000
conservative. It is a little bit
higher now, 0.874, which many

922
01:03:39.000 --> 01:03:44.000
people would quickly say, okay,
we have a better model. But
remember,

923
01:03:44.000 --> 01:03:48.000
we really can't make final
conclusions before we look

924
01:03:48.000 --> 01:03:51.000
at the residuals.

925
01:03:51.000 --> 01:03:57.000
Here on the bottom, we have
again the residual table.

926
01:03:57.000 --> 01:04:01.000
So, you will take the same steps
as before, take

927
01:04:01.000 --> 01:04:07.000
this table into Excel and bring it into

928
01:04:07.000 --> 01:04:10.000
ProUCL.

929
01:04:10.000 --> 01:04:15.000
I prepared a data table in Excel
already

930
01:04:15.000 --> 01:04:18.000
so I will open it now.

931
01:04:18.000 --> 01:04:21.000
It should be XLS again.

932
01:04:21.000 --> 01:04:26.000
I think I have it named
log

933
01:04:26.000 --> 01:04:32.000
regression. These are my
residuals for

934
01:04:32.000 --> 01:04:38.000
log regression. I am opening now
this table. You see here

935
01:04:38.000 --> 01:04:42.067
log transformed data. These are
fitted values and residuals.

936
01:04:42.067 --> 01:04:48.000
We will again repeat the same
plot as we did before for
diagnosing

937
01:04:48.000 --> 01:04:51.000
the regression model.

938
01:04:51.000 --> 01:04:56.000
So, first I am going to plot
residuals

939
01:04:56.000 --> 01:04:59.000
versus fitted values.

940
01:04:59.000 --> 01:05:05.000
I am going to the data

941
01:05:05.000 --> 01:05:11.000
table. Statistical test, trend
analysis. We will use the time

942
01:05:11.000 --> 01:05:17.000
series plot and tweak it a
little bit, just the same as
before.

943
01:05:17.000 --> 01:05:23.000
Residuals as measured data,
and event on the

944
01:05:23.000 --> 01:05:29.000
X axis, okay, here is our plot
now.

945
01:05:29.000 --> 01:05:35.000
I am tweaking it, going to
properties, series

946
01:05:35.000 --> 01:05:39.000
scatterplot, making bullets a

947
01:05:39.000 --> 01:05:48.000
little bit bigger so it is
easier to see. Okay.

948
01:05:48.000 --> 01:05:52.000
So with

949
01:05:52.000 --> 01:05:57.000
that, as before, when checking the
first assumption, we want this scatter

950
01:05:57.000 --> 01:06:01.000
to be as random as possible.
The assumption is that the
residuals

951
01:06:01.000 --> 01:06:06.000
have a constant
variance.

952
01:06:06.000 --> 01:06:16.000
This picture is not too bad. I
would say first assumption is
confirmed:

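The residual check itself is simple arithmetic: residual = observed minus fitted. A toy illustration with made-up values:

```python
# Hypothetical observed and fitted values (on the log scale).
observed = [4.38, 3.92, 3.35, 2.91, 2.38]
fitted = [4.40, 3.90, 3.40, 2.90, 2.40]

# Residuals should scatter randomly around zero, with no trend or
# funnel shape, if the constant-variance assumption is to hold.
residuals = [o - f for o, f in zip(observed, fitted)]
mean_residual = sum(residuals) / len(residuals)
```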
953
01:06:16.000 --> 01:06:19.000
That the variance is reasonably
constant. Now, let's create the

954
01:06:19.000 --> 01:06:31.000
second plot, to check the second
assumption, that the residuals
are independent.

955
01:06:31.000 --> 01:06:37.000
So, I am taking residuals as
the measured value

956
01:06:37.000 --> 01:06:43.067
and the observation event on the X
axis. Okay.

957
01:06:43.067 --> 01:06:49.000
Oh, this time it worked. Nice. I
am tweaking it again.

958
01:06:49.000 --> 01:06:59.000
Series scatterplot.

959
01:06:59.000 --> 01:07:04.000
It is always better to look at
a scatterplot so we are not
distracted

960
01:07:04.000 --> 01:07:07.000
with the line.

961
01:07:07.000 --> 01:07:12.000
So, this time we see this
curvature, maybe even

962
01:07:12.000 --> 01:07:16.000
more pronounced. It is going the
other direction, upwards,

963
01:07:16.000 --> 01:07:20.000
but it is maybe even more
pronounced, so

964
01:07:20.000 --> 01:07:26.000
this definitely indicates there
is a lack of fit.

965
01:07:26.000 --> 01:07:32.000
So, the residuals are not
completely independent.

966
01:07:32.000 --> 01:07:36.000
Now, for the third assumption, we
need to do another

967
01:07:36.000 --> 01:07:42.067
goodness of fit test for log
transformed data.

968
01:07:42.067 --> 01:07:45.000
So, let me do that.

969
01:07:45.000 --> 01:07:50.000
I am going to log regressions,
statistical test, goodness

970
01:07:50.000 --> 01:08:03.000
of fit. We want to check for
normality. Taking residuals,
okay.

971
01:08:03.000 --> 01:08:07.000
So, this plot doesn't look
that nice now. Let me just make

972
01:08:07.000 --> 01:08:13.000
those bullets a little bit
bigger so that we see better.

973
01:08:13.000 --> 01:08:19.000
We see this scatter around

974
01:08:19.000 --> 01:08:25.000
the line, and if we look at the
outcome

975
01:08:25.000 --> 01:08:30.000
of the goodness of fit test,
here in the middle

976
01:08:30.000 --> 01:08:35.000
it says that they are not
normal.

977
01:08:35.000 --> 01:08:40.000
So the third assumption
definitely is not confirmed.

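ProUCL runs formal goodness-of-fit tests for this step. As a rough pure-Python screen, sample skewness of the residuals far from zero is one simple warning sign of non-normality; the residuals below are toy values:

```python
import math

# Hypothetical residuals; a normal sample should have skewness near 0.
residuals = [-0.21, 0.05, 0.33, -0.10, 0.02, -0.09]
n = len(residuals)
mean = sum(residuals) / n
sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
skew = sum(((r - mean) / sd) ** 3 for r in residuals) / n
# |skew| well away from 0 hints at non-normal residuals; a formal
# test (e.g. Shapiro-Wilk, as in ProUCL) should make the final call.
```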
978
01:08:40.000 --> 01:08:46.000
So, now, what did we learn from
all this?

979
01:08:46.000 --> 01:08:52.000
So, you really can't rely on R
square.

980
01:08:52.000 --> 01:09:05.000
R square was a little bit better
but it really didn't

981
01:09:05.000 --> 01:09:07.000
point us to the problem with
residuals. You need to look at
residuals,

982
01:09:07.000 --> 01:09:10.000
and check the assumptions. They
give us a lot of information
about

983
01:09:10.000 --> 01:09:14.000
the regression model. So,
plotting residuals will actually
reveal the

984
01:09:14.000 --> 01:09:18.000
lack of fit of the model. What
we haven't seen in our example
but

985
01:09:18.000 --> 01:09:24.000
will also show on residual plots are
extreme value outliers.

986
01:09:24.000 --> 01:09:30.000
That is another reason to
take time and analyze them.

987
01:09:30.000 --> 01:09:35.000
The point to remember is that
when you can't confirm the
assumptions

988
01:09:35.000 --> 01:09:41.000
for the residuals, and you

989
01:09:41.000 --> 01:09:47.934
do have a need to improve the
model to do some

990
01:09:47.934 --> 01:09:51.934
predictions, then that is
definitely the time to work with
statisticians

991
01:09:51.934 --> 01:09:56.934
and find a way to best analyze
your data. This brings us almost
to the

992
01:09:56.934 --> 01:10:00.934
end of the first session. Next
slide we will see

993
01:10:00.934 --> 01:10:09.934
on the screen is the comparison
of three trend

994
01:10:09.934 --> 01:10:12.934
tests we explored today. So, at
the beginning we looked at the
two

995
01:10:12.934 --> 01:10:14.934
nonparametric tests that are
both nice starting points when we
are

996
01:10:14.934 --> 01:10:17.934
looking for trends in the data.

997
01:10:17.934 --> 01:10:23.934
So Mann-Kendall will tell us if
we have a consistent or monotone

998
01:10:23.934 --> 01:10:29.934
trend, steadily increasing or
steadily decreasing.

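The Mann-Kendall statistic itself is easy to compute by hand: count the time-ordered pairs of observations that increase minus those that decrease. A sketch with made-up data:

```python
# Mann-Kendall S statistic: sum of signs over all time-ordered pairs.
# Positive S suggests an increasing monotone trend. Hypothetical data.
y = [3.0, 3.4, 3.1, 3.8, 4.2]

def sign(x):
    return (x > 0) - (x < 0)

S = sum(sign(y[j] - y[i])
        for i in range(len(y))
        for j in range(i + 1, len(y)))
# ProUCL converts S into a p-value; with very few points the test
# has little power, as noted above.
```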
999
01:10:29.934 --> 01:10:33.934
The nice thing about this test
is that it will give us a
reasonable

1000
01:10:33.934 --> 01:10:37.934
outcomes with as little as four
points. Remember, the power of
the

1001
01:10:37.934 --> 01:10:43.934
test is low when we have a small
data set.

1002
01:10:43.934 --> 01:10:46.934
Even if the trend is present, it
may not

1003
01:10:46.934 --> 01:10:51.934
show because of the low power of
the test. So, the next task
nonparametric

1004
01:10:51.934 --> 01:10:57.934
test is Theil-Sen. It identifies
the trend and

1005
01:10:57.934 --> 01:11:05.934
confirms the trend observed with
Mann-Kendall. It also gives a
slope

1006
01:11:05.934 --> 01:11:10.934
estimate.

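The Theil-Sen slope estimate mentioned here is the median of the slopes over all pairs of points, which is what makes it robust. A sketch with hypothetical data:

```python
import statistics

# Hypothetical declining time series.
t = [0, 1, 2, 3, 4]
y = [10.0, 8.9, 8.1, 6.8, 6.2]

# Slope of the line through every pair of points, then the median.
pair_slopes = [(y[j] - y[i]) / (t[j] - t[i])
               for i in range(len(t))
               for j in range(i + 1, len(t))]
theil_sen_slope = statistics.median(pair_slopes)
```

Because a single outlier can shift only a minority of the pairwise slopes, the median barely moves, unlike an OLS slope.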
1007
01:11:10.934 --> 01:11:15.934
But, in contrast to the
Mann-Kendall test, it requires a
fair amount of

1008
01:11:15.934 --> 01:11:20.934
data and it only works when we
have one observation per time
interval.

1009
01:11:20.934 --> 01:11:25.934
However, it averages repeated
measurements of the same
sampling

1010
01:11:25.934 --> 01:11:30.934
event. So, if you have repeated
measurements, remember

1011
01:11:30.934 --> 01:11:33.934
to check that box.

1012
01:11:33.934 --> 01:11:41.934
And the third method we looked
at in ProUCL is

1013
01:11:41.934 --> 01:11:45.934
OLS Regression. This is the most
advanced of the three methods.

1014
01:11:45.934 --> 01:11:50.934
Because of that, it also gives
us the most information. It
evaluates

1015
01:11:50.934 --> 01:11:56.934
slope and intercept. It gives us
estimate for both parameters

1016
01:11:56.934 --> 01:12:00.934
and there are three assumptions
for

1017
01:12:00.934 --> 01:12:05.934
residuals that need to be
confirmed.

1018
01:12:05.934 --> 01:12:11.934
So, it does require a little bit
more work, but it really gives

1019
01:12:11.934 --> 01:12:16.934
us a lot; the residual analysis
can reveal a lot about

1020
01:12:16.934 --> 01:12:24.934
the data. Another downside of
the ordinary least squares regression

1021
01:12:24.934 --> 01:12:27.934
is it also requires a fair
amount of

1022
01:12:27.934 --> 01:12:38.934
data to obtain reliable
parameter estimates.

1023
01:12:38.934 --> 01:12:41.934
So, this is more or less
everything we wanted to cover
today. It is

1024
01:12:41.934 --> 01:12:47.000
time for a second quiz. If we
can show that on the screen now.

1025
01:12:47.000 --> 01:12:53.000
And then Travis will make a
short recap and if he has
something

1026
01:12:53.000 --> 01:12:57.000
to add.

1027
01:12:57.000 --> 01:13:05.000
>> Okay. We are going to start
that second quiz question. Travis or
Travis or

1028
01:13:05.000 --> 01:13:07.000
Polona, we have had people ask
if you could read questions out

1029
01:13:07.000 --> 01:13:10.000
loud because some are following
along on the phones. Going to
start

1030
01:13:10.000 --> 01:13:14.000
that quiz now. Feel free to read
them out to them.

1031
01:13:14.000 --> 01:13:19.000
>> I don't see them; if you can
read them, Travis.

1032
01:13:19.000 --> 01:13:23.000
>> The R square and adjusted R
square

1033
01:13:23.000 --> 01:13:26.000
values tell us what?

1034
01:13:26.000 --> 01:13:31.000
A, the proportion of the variation
that is explained by the model; B,

1035
01:13:31.000 --> 01:13:35.000
the goodness of fit of the model;
and C, the estimate of
our

1036
01:13:35.000 --> 01:13:46.000
OLS slope.

1037
01:13:46.000 --> 01:13:52.000
All right. Non-normality of our
OLS residuals indicates: A, a lack of

1038
01:13:52.000 --> 01:13:56.000
fit; B, our model is

1039
01:13:56.000 --> 01:14:00.000
not optimal; C, the confidence
in our predictions is not
reliable;

1040
01:14:00.000 --> 01:14:04.000
or D, all of the above.

1041
01:14:04.000 --> 01:14:06.000
>> Okay.

1042
01:14:06.000 --> 01:14:19.000
It looks like the right answer
for that one was D. Good job,
everybody.

1043
01:14:19.000 --> 01:14:22.000
>> Just real quick. I think we
will just make a couple final
remarks

1044
01:14:22.000 --> 01:14:26.000
and answer a few more questions
at the end. Just to recap
everything

1045
01:14:26.000 --> 01:14:32.000
that Polona said there.

1046
01:14:32.000 --> 01:14:35.000
When you are looking to do trend
analysis, really make

1047
01:14:35.000 --> 01:14:37.000
sure you spend time exploring
your data before you start doing
your

1048
01:14:37.000 --> 01:14:43.067
tests. And, one thing to point
out, especially with OLS,
don't

1049
01:14:43.067 --> 01:14:49.000
totally rely on your R square
and adjusted R square

1050
01:14:49.000 --> 01:14:54.000
values because they might not be
revealing the problems with your

1051
01:14:54.000 --> 01:15:02.000
underlying assumptions that need
to be validated.

1052
01:15:02.000 --> 01:15:06.000
Also, if you are thinking of
transforming your data, take
time to think about

1053
01:15:06.000 --> 01:15:11.000
the processes that might be
underlying that need,

1054
01:15:11.000 --> 01:15:15.000
such as the first order or zero
order reactions that we
discussed

1055
01:15:15.000 --> 01:15:21.000
for a moment there.

1056
01:15:21.000 --> 01:15:24.000
And, really as always, when in
doubt consult

1057
01:15:24.000 --> 01:15:27.000
a statistician. I know that was
brought up a bunch with the
non-detect

1058
01:15:27.000 --> 01:15:32.000
stuff and I can't say it enough
there that

1059
01:15:32.000 --> 01:15:37.000
that one is a tricky subject,
and if you are not totally sure
what to do,

1060
01:15:37.000 --> 01:15:40.000
you need to talk to people who
might know better. That doesn't

1061
01:15:40.000 --> 01:15:42.067
even mean just one. Maybe a
few.

1062
01:15:42.067 --> 01:15:46.000
People will have different
opinions on how to handle
non-detect in different

1063
01:15:46.000 --> 01:15:52.000
situations.

1064
01:15:52.000 --> 01:15:55.000
As always, similar to the
outliers from last time,

1065
01:15:55.000 --> 01:15:57.000
make sure you are really
documenting all the steps in
your analysis and

1066
01:15:57.000 --> 01:16:01.000
the decisions that you
make. Yeah, real quick, if we
want

1067
01:16:01.000 --> 01:16:06.000
to jump into a few questions
that we had, Jean, did you want
to read

1068
01:16:06.000 --> 01:16:10.000
the one I had highlighted there?

1069
01:16:10.000 --> 01:16:13.000
>> One of the participants has
asked, what should you do if your
data

1070
01:16:13.000 --> 01:16:17.000
appear to exhibit a seasonal
trend?

1071
01:16:17.000 --> 01:16:23.000
>> So, I picked that one because
it can apply to all of them.

1072
01:16:23.000 --> 01:16:28.000
If you are doing ordinary least
squares regression

1073
01:16:28.000 --> 01:16:38.000
and when you view your data you
don't see a linear trend whether

1074
01:16:38.000 --> 01:16:40.000
that be a seasonal trend or when
we look at it initially there
looked

1075
01:16:40.000 --> 01:16:44.000
like there was a quadratic
trend to it, something nonlinear
that

1076
01:16:44.000 --> 01:16:47.000
you want to figure out if there
are any transformations

1077
01:16:47.000 --> 01:16:52.000
that work best for your
situation. So far, we
happened to look

1078
01:16:52.000 --> 01:16:55.000
at the lognormal transformation
because it is something

1079
01:16:55.000 --> 01:17:01.000
that is effective for fixing a
quadratic curve. For seasonal
things, you

1080
01:17:01.000 --> 01:17:05.000
might want to look at
aggregating your data in some
way to get rid

1081
01:17:05.000 --> 01:17:15.000
of that seasonality. But again,
for any of those, I am going to

1082
01:17:15.000 --> 01:17:20.000
beat this dead horse: you
should talk to a statistician.
If the answer

1083
01:17:20.000 --> 01:17:24.000
isn't clear to you as to how you
want to

1084
01:17:24.000 --> 01:17:27.000
handle it, you need to make sure
to talk to someone to whom it is
clear.

1085
01:17:27.000 --> 01:17:33.000
And then, there were
some other questions in there

1086
01:17:33.000 --> 01:17:36.000
but individuals had asked and I
got a little ahead of myself and

1087
01:17:36.000 --> 01:17:40.000
answered them directly. I
realize some of them might be
helpful for

1088
01:17:40.000 --> 01:17:43.067
you all.

1089
01:17:43.067 --> 01:17:50.000
Someone had asked the question,
in a previous

1090
01:17:50.000 --> 01:17:55.000
webinar of how do you save all
of the work that you have done
inside

1091
01:17:55.000 --> 01:17:56.000
of ProUCL.

1092
01:17:56.000 --> 01:17:57.000
>> I can show.

1093
01:17:57.000 --> 01:18:05.000
>> You are just going to
highlight.

1094
01:18:05.000 --> 01:18:09.000
>> I highlighted here on the
navigation

1095
01:18:09.000 --> 01:18:15.000
panel the name of the item that
you want to save. Go to file,
save

1096
01:18:15.000 --> 01:18:18.000
as, and save it.

1097
01:18:18.000 --> 01:18:26.000
>> Yes. Cool.

1098
01:18:26.000 --> 01:18:30.000
Someone else had asked as far as
ProUCL's need to have a numeric

1099
01:18:30.000 --> 01:18:35.000
value for the time estimates,
could you also have a text
column in there?

1100
01:18:35.000 --> 01:18:46.000
The answer is you can, but

1101
01:18:46.000 --> 01:18:49.000
if you need it for some sort of
presentation, feel free

1102
01:18:49.000 --> 01:18:52.000
to throw it in there. You are
not going to be able to use it
for any

1103
01:18:52.000 --> 01:18:54.000
of the data analysis. When you
are selecting your independent
and dependent

1104
01:18:54.000 --> 01:18:57.000
variable, you are just going to want
to make sure you don't select
that

1105
01:18:57.000 --> 01:19:02.000
character column and instead select
the numeric one.

1106
01:19:02.000 --> 01:19:10.000
>> And another remark here is
that it is especially important

1107
01:19:10.000 --> 01:19:15.000
because if you don't capture
that your events are unevenly

1108
01:19:15.000 --> 01:19:20.000
placed, let's say five days
followed by

1109
01:19:20.000 --> 01:19:25.000
10 days in between the events and
something like that, you may
miss

1110
01:19:25.000 --> 01:19:29.000
the shape of your trend.

1111
01:19:29.000 --> 01:19:36.000
>> I think that was, also I am
aware there were a lot

1112
01:19:36.000 --> 01:19:41.000
of questions asking for a ProUCL
session just on dealing with
non-detect

1113
01:19:41.000 --> 01:19:42.000
and trend analysis. Agreed.

1114
01:19:42.000 --> 01:19:47.934
I think that would be a great
one to do. The main focus of this

1115
01:19:47.934 --> 01:19:49.934
three-part webinar was just
getting people

1116
01:19:49.934 --> 01:19:54.934
comfortable with all of the
available tools that we have in
ProUCL at

1117
01:19:54.934 --> 01:19:58.934
the moment. Since that is such a
large

1118
01:19:58.934 --> 01:20:04.934
topic, it really deserves its
own webinar and not be shoved
into five

1119
01:20:04.934 --> 01:20:08.934
minutes of trying to explain it.
That is why we pushed

1120
01:20:08.934 --> 01:20:14.934
it off a bit on this one. It is
a tricky topic that we couldn't

1121
01:20:14.934 --> 01:20:19.934
cover fully and still cover everything
else that we

1122
01:20:19.934 --> 01:20:23.934
did today as well. Just wanted
to let everyone know that their
concerns

1123
01:20:23.934 --> 01:20:27.934
are noted. Yes. Jean, anything
else?

1124
01:20:27.934 --> 01:20:33.934
>> I think we have a couple of
other reminders in terms of the
sessions.

1125
01:20:33.934 --> 01:20:37.934
We are coming up to our scheduled
time and have a few

1126
01:20:37.934 --> 01:20:43.934
more reminders to walk through.
Travis or Polona, can you take
the audience

1127
01:20:43.934 --> 01:20:51.934
through a few final reminders?

1128
01:20:51.934 --> 01:20:54.934
>> Next session, which I believe
is on March 9th, correct Jean?
Is

1129
01:20:54.934 --> 01:21:06.934
going to be discussing
background site comparisons as
well as

1130
01:21:06.934 --> 01:21:12.934
UCLs. We finally get

1131
01:21:12.934 --> 01:21:15.934
to the UCL part of ProUCL.

1132
01:21:15.934 --> 01:21:18.934
As always, our contact
information is going to be
available at the

1133
01:21:18.934 --> 01:21:20.934
end if you want to get hold of
us.

1134
01:21:20.934 --> 01:21:36.934
Yes, I think that is most of the
housekeeping stuff.

1135
01:21:36.934 --> 01:21:41.934
>> Okay. All right. With that,
let me walk through a few final
reminders

1136
01:21:41.934 --> 01:21:43.934
before we officially close out
today's session. I want to thank
the nearly

1137
01:21:43.934 --> 01:21:46.934
500 people who joined us for
today's live broadcast. We will
walk through

1138
01:21:46.934 --> 01:21:48.934
just a couple of things. We
often get questions about how
you can

1139
01:21:48.934 --> 01:21:51.934
stay informed when webinars like
this get announced or are
available.

1140
01:21:51.934 --> 01:21:56.934
I encourage you to visit us at
clu-in.org and sign up for a
free monthly

1141
01:21:56.934 --> 01:22:03.934
newsletter. We will announce
topics like this and

1142
01:22:03.934 --> 01:22:05.934
other topics, such as cleaning
up hazardous waste sites, in
those

1143
01:22:05.934 --> 01:22:07.934
newsletters. Copies of the
presentation materials are
available to download

1144
01:22:07.934 --> 01:22:10.934
from the seminar links and
resources page, as well as
contact information

1145
01:22:10.934 --> 01:22:12.934
for the organizers. You will
see those links already
available

1146
01:22:12.934 --> 01:22:18.934
under related links. Scroll to
the one of interest and click it

1147
01:22:18.934 --> 01:22:20.934
for those of you replaying the
recorded version, those links

1148
01:22:20.934 --> 01:22:25.934
are still active. I have also
pasted the URLs into the Q&A
window.

1149
01:22:25.934 --> 01:22:30.934
The links are still active even
if you were replaying the
recorded

1150
01:22:30.934 --> 01:22:37.934
version. I would ask you to fill
out our online feedback

1151
01:22:37.934 --> 01:22:39.934
form to let us know what you
thought of today's delivery. One
of the

1152
01:22:39.934 --> 01:22:56.000
most common questions I get is
if we offer certificates,

1153
01:22:56.000 --> 01:23:00.000
and we will generate a
participation certificate for
you when you

1154
01:23:00.000 --> 01:23:02.000
fill out and submit your
feedback.

1155
01:23:02.000 --> 01:23:05.000
Be sure to check the box at the
very bottom of the form
certifying

1156
01:23:05.000 --> 01:23:07.000
you were either here for the whole
live event or replayed the whole
recorded

1157
01:23:07.000 --> 01:23:10.000
version, and be sure your email
address is entered correctly on
that one.

1158
01:23:10.000 --> 01:23:13.000
As long as those two things are
done, as soon as you submit your

1159
01:23:13.000 --> 01:23:15.000
feedback, you will be able to
print out or save a copy of the
parts

1160
01:23:15.000 --> 01:23:18.000
patient certificate, something
like this one, for your own
records.

1161
01:23:18.000 --> 01:23:21.000
If you join together as a group,
each person in that room can
fill

1162
01:23:21.000 --> 01:23:23.000
out the form on their own
whether you registered as an
individual

1163
01:23:23.000 --> 01:23:26.000
or not. If you happened to get a
bunch of coworkers together to
join

1164
01:23:26.000 --> 01:23:29.000
you, please be sure to share the
link with that feedback form so

1165
01:23:29.000 --> 01:23:39.000
they can fill out the form and
get their own certificates

1166
01:23:39.000 --> 01:23:42.067
for their own records. As noted,
today's session was recorded,
just

1167
01:23:42.067 --> 01:23:44.000
like the others in the series.
In just a few days, you will get an
automatic

1168
01:23:44.000 --> 01:23:47.000
email from me with instructions
of how you can replay today's
training.

1169
01:23:47.000 --> 01:23:49.000
If you weren't able to follow
along in real time, you will be

1170
01:23:49.000 --> 01:23:52.000
able to follow those
instructions to download the
sample data sets

1171
01:23:52.000 --> 01:23:55.000
and ProUCL, and watch the webinar,
pausing and restarting it as
needed

1172
01:23:55.000 --> 01:23:57.000
to follow along with the
exercises.

1173
01:23:57.000 --> 01:23:59.000
Hopefully, you will be on for our
third and final webinar. In the
meantime, please

1174
01:23:59.000 --> 01:24:02.000
be sure to take time to turn in
your feedback form for our
broadcast. We do
We do

1175
01:24:02.000 --> 01:24:09.000
read each and every one of those
submissions and use them to
improve delivery.

1176
01:24:09.000 --> 01:24:11.000
With that, I will formally
conclude today's broadcast.
Thank you so

1177
01:24:11.000 --> 01:24:12.000
much for joining us.

1178
01:24:12.000 --> 01:24:15.000
>> Thank you everybody. Thank
you Jean.

1179
01:24:15.000 --> 01:24:20.000
>> Thanks everyone.