WEBVTT
1
00:00:12.529 --> 00:00:23.270
Mihai Anitescu: Jonathan has spent a good part of his career, probably all of it, working on rare events in stochastic systems.
2
00:00:24.020 --> 00:00:30.980
Mihai Anitescu: Statistics of rare events, how to compute them fast, and all of that. We were even coauthors on a paper about how to...
3
00:00:31.790 --> 00:00:45.260
Mihai Anitescu: How to actually use large deviation theory to deal with the power grid, for example. Well, I'm looking forward to what sounds like a very interesting talk. So Jonathan, go ahead; you know, it's 45 minutes, and it's a...
4
00:00:46.670 --> 00:00:52.310
Mihai Anitescu: workshop, I think we're pretty informal, but it's about 45 minutes. Thank you.
5
00:00:53.060 --> 00:01:06.620
Jonathan Weare: Thank you, Mihai, and thank you, everybody — thank you, organizers, for inviting me. I wish I could have been at UChicago; as Mihai knows, even after moving I was a pretty regular...
6
00:01:07.910 --> 00:01:10.880
Jonathan Weare: I made pretty regular returns, but the pandemic really...
7
00:01:12.290 --> 00:01:14.960
Jonathan Weare: really changed things in that way, and many others, but.
8
00:01:16.190 --> 00:01:20.000
Jonathan Weare: So I want to point out that there are quite a few.
9
00:01:21.110 --> 00:01:24.230
Jonathan Weare: coauthors and people involved in this work.
10
00:01:25.730 --> 00:01:45.500
Jonathan Weare: I'm going to talk about a couple of different things. You see Dimitris is on there as well; he was very central to some of the stuff I'll talk about towards the end. Other people are at UChicago — for example, Dorian Abbot at UChicago, Aaron Dinner at UChicago — and Justin Finkel was a student in Mihai's department.
11
00:01:47.450 --> 00:01:47.840
and
12
00:01:48.920 --> 00:01:51.350
Jonathan Weare: John Strahan, Chatipat Lorpaiboon,
13
00:01:52.460 --> 00:02:01.550
Jonathan Weare: and Erik Thiede were all students in chemistry at UChicago. So lots of UChicago connections.
14
00:02:05.000 --> 00:02:15.770
Jonathan Weare: Okay, so I'm mostly focused on rare events in this talk, and this is basically a numerical analysis talk. If some of the numerical analysis...
15
00:02:16.520 --> 00:02:24.230
Jonathan Weare: is not the flavor of things that you're interested in, then maybe focus on the kinds of methods I'm talking about.
16
00:02:25.250 --> 00:02:34.850
Jonathan Weare: Maybe that part will be more interesting. So again, we're focused on rare events; by that I mean events that happen very infrequently.
17
00:02:36.620 --> 00:02:38.030
Jonathan Weare: So examples.
18
00:02:39.080 --> 00:02:54.860
Jonathan Weare: would be, for example, very intense weather events — or they could be climate events as well — and major blackouts, something that Mihai actually introduced me to; everything I know about that comes from interacting with him.
19
00:02:56.120 --> 00:02:57.200
Jonathan Weare: Then there's.
20
00:02:59.330 --> 00:03:03.440
Jonathan Weare: more along the lines of what I'll mention as examples in this talk:
21
00:03:04.760 --> 00:03:12.560
Jonathan Weare: large-scale rearrangements of biomolecules. This picture here is of the dissociation of
22
00:03:14.060 --> 00:03:18.530
Jonathan Weare: an insulin dimer, and that has implications for
23
00:03:20.360 --> 00:03:33.320
Jonathan Weare: diabetes treatments. I should mention that what unites all of these events is that they occur on a time scale much longer than the simulation time scale. So you have some model on your computer.
24
00:03:33.830 --> 00:03:41.510
Jonathan Weare: It has some natural time scale — if you're thinking of integrating an ODE or a time-dependent PDE, that would be the time step.
25
00:03:42.590 --> 00:03:46.790
Jonathan Weare: You have to go through many, many time steps to
26
00:03:47.840 --> 00:03:50.750
Jonathan Weare: to observe one of these events in your model.
27
00:03:54.350 --> 00:03:55.310
Jonathan Weare: oops wrong button.
28
00:03:56.930 --> 00:04:00.590
Jonathan Weare: So there are different ways that are commonly used to
29
00:04:01.700 --> 00:04:11.150
Jonathan Weare: address this issue. How do I reach those long-time-scale events, given that I'm constrained to simulate on a very fast time scale?
30
00:04:13.160 --> 00:04:22.520
Jonathan Weare: One way is just to build a bigger computer; that's been done. This picture here is of Anton, which is hardware built specifically for
31
00:04:24.080 --> 00:04:26.990
Jonathan Weare: those atomistic simulations of biomolecules.
32
00:04:28.490 --> 00:04:34.520
Jonathan Weare: Or you can try and build a cheaper model. You lose some accuracy, but maybe
33
00:04:35.270 --> 00:04:43.460
Jonathan Weare: some accuracy will be retained in terms of representing the event that you're interested in. Of course, coarse-graining is a very
34
00:04:44.120 --> 00:04:50.690
Jonathan Weare: common thing people try to do. And then another thing that I've worked on quite a bit is rare event simulation, so you
35
00:04:51.500 --> 00:05:01.700
Jonathan Weare: keep the original, high-fidelity model, but you try to somehow trick it so that you can see the event that you're interested in
36
00:05:02.270 --> 00:05:13.610
Jonathan Weare: on much shorter time scales, but still get the correct statistics. This talk is sort of related to options two and three, but it's somewhere in between the two.
37
00:05:19.130 --> 00:05:27.530
Jonathan Weare: Okay, so whatever method you choose, either direct simulation or some kind of rare event simulation scheme, you need to —
38
00:05:28.190 --> 00:05:40.400
Jonathan Weare: we're talking about a high-dimensional dynamical system, so you have time series data; it's not something you can just look at and pull interesting observations from. You need some kind of
39
00:05:40.400 --> 00:05:40.880
Mihai Anitescu: method.
40
00:05:41.360 --> 00:05:43.730
Jonathan Weare: To process that data and produce.
41
00:05:45.020 --> 00:05:55.640
Jonathan Weare: produce something useful. Mihai, during Dimitris's talk, was talking about mechanism — that's the kind of thing. What are precursors, for example?
42
00:05:56.300 --> 00:06:05.690
Jonathan Weare: Physical observables, variables you might be used to thinking about — but you want to know, sort of, in what order should I look at them to
43
00:06:06.140 --> 00:06:13.850
Jonathan Weare: see that one of these events is about to happen. That kind of information is what you want to mine this massive data set for.
44
00:06:15.380 --> 00:06:26.750
Jonathan Weare: And I should say that in a lot of these problems the data set is massive really because the state of the system is pretty massive; in terms of the actual amount of data, well, we could do with a lot more.
45
00:06:29.840 --> 00:06:40.610
Jonathan Weare: It's expensive to generate data for a lot of these systems. Okay, so I'm mostly going to talk about one already existing — sorry, I'm in my son's room, so we have company.
46
00:06:41.900 --> 00:06:47.570
Jonathan Weare: I'm mostly going to talk about an existing algorithm for the first half of the talk,
47
00:06:48.650 --> 00:06:51.740
Jonathan Weare: an algorithm we had nothing to do with introducing, but
48
00:06:53.000 --> 00:07:00.020
Jonathan Weare: we did analyze it. So I'm going to talk about that analysis — error analysis — and that's the connection to
49
00:07:01.430 --> 00:07:10.430
Jonathan Weare: the topic of the workshop. In the second part I'm going to discuss a closely related set of methods,
50
00:07:11.450 --> 00:07:14.300
Jonathan Weare: but for a different task; it's really a forecasting task.
51
00:07:15.650 --> 00:07:30.020
Jonathan Weare: And then in the third part I'll try to suggest some interesting observations from simulations that we don't have a theoretical handle on, but which seem to be pretty interesting — I'll try to convince you of that.
52
00:07:32.060 --> 00:07:42.350
Jonathan Weare: Let me give you a model to keep in mind. This is the folding of a protein; for those who don't know anything about proteins, that's fine, just think of a
53
00:07:43.190 --> 00:07:56.210
Jonathan Weare: string of beads. The string of beads evolves according to a Langevin SDE — this dX equals minus the gradient of a potential plus noise equation I've got here — which you can think of
54
00:08:01.250 --> 00:08:16.250
Jonathan Weare: as describing the evolution of the position of every atom in this protein and in the solvent that surrounds it — there are lots of water molecules surrounding the protein.
55
00:08:17.480 --> 00:08:23.180
Jonathan Weare: This part of the equation — even if you're not familiar with SDEs — this part of the equation
56
00:08:23.660 --> 00:08:29.750
Jonathan Weare: should be familiar: this is just a steepest-descent ODE. There's some potential that describes
57
00:08:30.530 --> 00:08:43.160
Jonathan Weare: the interaction between atoms — they might attract each other, and if they get too close they might repel each other — so this just says, in general, I want to try to decrease that potential.
58
00:08:44.660 --> 00:08:51.500
Jonathan Weare: But I have interaction with a heat bath that I'm not modeling — interaction with other atoms,
59
00:08:52.460 --> 00:09:11.030
Jonathan Weare: with the rest of the universe that I'm not modeling — and I model those with this, what I would call a thermostat. Okay, so this is a typical simple model that you might use. It's not quite what people would actually use for something like studying Trp-cage folding,
60
00:09:11.540 --> 00:09:33.800
Jonathan Weare: but it's not far off; morally, this is the kind of model people think about, so I'd like you to keep this model in mind. One important feature of this model is that it has a stationary distribution, and we know exactly what it is: it's e^(−βV), where V was that potential.
61
00:09:35.540 --> 00:09:43.670
Jonathan Weare: And β is the term appearing in the noise, so when β is really big the noise is really small,
62
00:09:44.720 --> 00:09:50.810
Jonathan Weare: and the X just wants to sit near the minimum of the potential.
63
00:09:52.040 --> 00:09:58.190
Jonathan Weare: Okay, and that's what this is saying: this distribution concentrates around the minima of V when β gets large.
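A minimal sketch of this model problem, with a hypothetical 1D double-well potential standing in for the molecular force field (the function names and parameter values are my own, not from the talk):

```python
import numpy as np

# Overdamped Langevin dynamics dX = -grad V(X) dt + sqrt(2/beta) dW,
# with the double well V(x) = (x^2 - 1)^2 as a toy potential.
def grad_V(x):
    return 4.0 * x * (x**2 - 1.0)

def euler_maruyama(x0, beta, dt, n_steps, rng):
    """Simulate the SDE by the Euler-Maruyama scheme."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    noise = rng.normal(size=n_steps)
    for k in range(n_steps):
        x[k + 1] = x[k] - grad_V(x[k]) * dt + np.sqrt(2.0 * dt / beta) * noise[k]
    return x

rng = np.random.default_rng(0)
traj = euler_maruyama(x0=1.0, beta=4.0, dt=1e-3, n_steps=200_000, rng=rng)
# A histogram of traj approximates the stationary density mu ~ exp(-beta*V):
# for large beta, samples concentrate near the minima x = -1 and x = +1.
```

The long trajectory produced this way is exactly the kind of data the algorithms later in the talk consume.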
64
00:09:59.510 --> 00:10:11.660
Jonathan Weare: Okay, so keep that in mind as a model problem. It's the kind of dynamics — the equation is relatively simple, but when you consider that there are many atoms in this system,
65
00:10:12.680 --> 00:10:15.620
Jonathan Weare: it's it's quite a complicated evolution.
66
00:10:18.980 --> 00:10:29.870
Jonathan Weare: Oh, and I will come back to this Trp-cage example. Let me just point out, when I say that it's folding, what I mean is that that string of beads can be stretched out, but then it
67
00:10:30.890 --> 00:10:37.430
Jonathan Weare: If I started to simulate it from a stretched out position, it would tend to form a glob.
68
00:10:39.020 --> 00:10:44.420
Jonathan Weare: If I ran it long enough okay so that's what I mean by the folding and unfolding if I say that.
69
00:10:45.380 --> 00:11:03.080
Jonathan Weare: Okay, so I'm going to start with this variational approach to conformational dynamics (VAC). Again, this is not invented by us; the particular method I'm going to describe was introduced by Frank Noé in 2013, but actually it has a long history
70
00:11:04.490 --> 00:11:09.050
Jonathan Weare: under the name Markov state modeling. So there's a particular instance of this VAC —
71
00:11:10.310 --> 00:11:17.930
Jonathan Weare: this VAC framework that I'm going to show you — Markov state modeling — that goes back to the late 90s and is a pretty popular tool in
72
00:11:18.860 --> 00:11:35.120
Jonathan Weare: computational statistical mechanics. The method I'm going to describe has a lot of similarities with the stuff that Dimitris talked about this morning. One difference is that I'm going to be talking about a stochastic evolution. I'm also going to assume that
73
00:11:36.590 --> 00:11:44.810
Jonathan Weare: this transition operator, which is like the stochastic version of the Koopman operator, is a self-adjoint operator.
74
00:11:46.100 --> 00:11:48.920
Jonathan Weare: So, in terms of the dynamics that means that it's reversible.
75
00:11:51.530 --> 00:11:51.920
Okay.
76
00:11:55.580 --> 00:12:05.810
Jonathan Weare: It also means that it has a real spectrum, so that's going to be useful too. But in fact I'm going to assume more than that; I'm going to assume — oops, sorry —
77
00:12:06.860 --> 00:12:12.800
Jonathan Weare: that it is quasi-compact. So my goal in this first part of the talk is actually to estimate
78
00:12:14.630 --> 00:12:27.680
Jonathan Weare: eigenvectors and eigenvalues. The eigenvalues of this operator all lie in the unit disk, and since the operator is self-adjoint, they're in fact all real.
79
00:12:28.910 --> 00:12:38.150
Jonathan Weare: I'm going to assume that this is quasi-compact, which means essentially that I get actual, true eigenvalues near one,
80
00:12:39.260 --> 00:12:49.190
Jonathan Weare: and then below some point, inside a smaller disk, I might have a continuous spectrum, but that's separated away from one.
81
00:12:49.670 --> 00:13:07.100
Jonathan Weare: My goal is going to be to compute these eigenvalues and eigenvectors near one. If you think of a matrix, this is just like the diagonalization of a symmetric matrix; this is the projection onto the i-th eigenvector.
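As a finite-dimensional illustration of that diagonalization (a toy 3x3 matrix of my own, not from the talk), a symmetric "transition" matrix decomposes into real eigenvalues and rank-one projections:

```python
import numpy as np

# Toy symmetric stochastic matrix standing in for the self-adjoint
# transition operator; rows sum to one and the spectrum is real.
T = np.array([[0.80, 0.15, 0.05],
              [0.15, 0.80, 0.05],
              [0.05, 0.05, 0.90]])

lam, eta = np.linalg.eigh(T)                            # real eigenvalues
P = [np.outer(eta[:, i], eta[:, i]) for i in range(3)]  # rank-one projections

# T = sum_i lam_i P_i, the diagonalization of a symmetric matrix
T_rebuilt = sum(l * Pi for l, Pi in zip(lam, P))
```

Here `P[i]` plays the role of the projection onto the i-th eigenvector, and the largest eigenvalue is the one at 1.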
82
00:13:08.540 --> 00:13:08.960
Okay.
83
00:13:10.280 --> 00:13:20.750
Jonathan Weare: The only real consequence of this assumption is to make the whole task make sense: I want the part of the spectrum that is
84
00:13:21.140 --> 00:13:29.120
Jonathan Weare: closest to one to be actual eigenvalues, with eigenvectors that I can try to approximate.
85
00:13:31.370 --> 00:13:38.750
Jonathan Weare: Okay. Just for notation, I'm going to assume that these eigenvectors are ordered as follows.
86
00:13:39.950 --> 00:13:46.160
Jonathan Weare: So e^(−σ₁τ) — τ is what's called the time lag parameter.
87
00:13:47.180 --> 00:13:52.550
Jonathan Weare: So it's the amount of time I evolve forward before I take the expectation.
88
00:13:53.420 --> 00:14:08.090
Jonathan Weare: If I take τ large, then all of the eigenvalues scale in a very predictable way, like e to the minus some constant times τ. So if I take τ very large, all the eigenvalues except for the one at one —
89
00:14:09.020 --> 00:14:18.380
Jonathan Weare: all the eigenvalues inside the disk — are going to shrink to zero. That's just saying, look: if I imagine taking τ to infinity, if this is an ergodic
90
00:14:20.150 --> 00:14:31.490
Jonathan Weare: stochastic process, then the distribution of X_τ is going to relax all the way to the invariant distribution μ. So all the other eigenvectors will
91
00:14:32.720 --> 00:14:33.740
Jonathan Weare: Will relax away.
92
00:14:36.050 --> 00:14:36.440
Jonathan Weare: OK.
93
00:14:39.650 --> 00:14:52.340
Jonathan Weare: So my goal, again, is to build a scheme that will numerically estimate these eigenvalues — the ones near one — and the accompanying eigenvectors.
94
00:14:54.530 --> 00:14:59.540
Jonathan Weare: And why do I want to do that? Why do the people who introduced this method want to do that?
95
00:15:00.800 --> 00:15:19.640
Jonathan Weare: Well, it's because there's a direct connection to the relaxation in time of these functions. What do I mean by that? I mean the correlations — how fast these functions forget their initial values. So if η is one of these eigenvectors, or let's say it's in the span of
96
00:15:20.660 --> 00:15:41.360
Jonathan Weare: the first K — except for eigenvector one, η₁, which is just the constant — so let's say it's in the span of η₂ through η_K. Then these eigenvalues tell you how fast that vector would decorrelate. So here I have an upper bound —
97
00:15:42.410 --> 00:15:56.450
Jonathan Weare: sorry, a lower bound if I'm in the span of the first K, and an upper bound if I'm orthogonal to the span of the first K. So if I'm in the span of K+1 through
98
00:15:57.590 --> 00:15:59.720
Jonathan Weare: infinity, then I'm going to
99
00:16:00.920 --> 00:16:10.700
Jonathan Weare: decorrelate at least this fast — by decorrelate I mean how fast this correlation goes to zero. So these functions describe —
100
00:16:11.420 --> 00:16:21.230
Jonathan Weare: they're often called relaxation modes — they describe how fast different features of the system will relax to equilibrium over time.
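To make the decorrelation claim concrete, here is a small sketch with a toy reversible 3-state chain (my own example, not from the talk): the autocorrelation of an exact eigenvector decays geometrically at the rate set by its eigenvalue.

```python
import numpy as np

# Symmetric (hence reversible with respect to the uniform distribution)
# transition matrix; its eigenvalues are real and lie in [-1, 1].
T = np.array([[0.90, 0.08, 0.02],
              [0.08, 0.90, 0.02],
              [0.02, 0.02, 0.96]])

evals, evecs = np.linalg.eigh(T)
order = np.argsort(evals)[::-1]          # the eigenvalue at 1 comes first
evals, evecs = evals[order], evecs[:, order]

eta2 = evecs[:, 1]                       # slowest nontrivial relaxation mode
# Dot-product autocorrelation of eta2 after t steps of the chain:
corr = [float(eta2 @ np.linalg.matrix_power(T, t) @ eta2) for t in (1, 5, 10)]
# Since eta2 is an exact eigenvector, corr equals evals[1]**t: functions in
# the span of the slow eigenvectors forget their initial values the slowest.
```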
101
00:16:21.680 --> 00:16:29.930
Jonathan Weare: And usually, in a lot of applications, you're interested in the ones that relax the slowest. So I've given you an example
102
00:16:30.290 --> 00:16:41.210
Jonathan Weare: of what you might expect η₂, for example — the second of these, the first nontrivial eigenvector — to look like. This is for a model that I'm not going to talk about, but
103
00:16:41.870 --> 00:16:47.390
Jonathan Weare: this is the kind of thing people would be looking for if they had approximated η₂ correctly.
104
00:16:48.500 --> 00:16:57.680
Jonathan Weare: They might expect to see that it relaxes slowly, in the sense that this time series looks like it relaxes slowly — it's got some kind of bimodality.
105
00:16:59.600 --> 00:17:00.020
Jonathan Weare: Okay.
106
00:17:01.490 --> 00:17:08.690
Jonathan Weare: It looks like it takes a while to reach a steady state. That's all that picture is meant to convey; don't worry about anything more quantitative than that.
107
00:17:12.770 --> 00:17:22.490
Jonathan Weare: Okay, so that's the goal of VAC: to estimate these eigenvectors, because they tell you something about the slowest modes in the system.
108
00:17:25.130 --> 00:17:42.740
Jonathan Weare: Let me tell you how it works. It's actually a very simple algorithm; if you're familiar with finite elements, or in particular Rayleigh-Ritz approximation, then this is totally standard. The first part of the algorithm is that —
109
00:17:44.090 --> 00:17:51.470
Jonathan Weare: first of all, I'm going to need to select some initial conditions from μ — remember, μ was the invariant distribution.
110
00:17:51.740 --> 00:17:57.710
Jonathan Weare: The way that's typically done is I imagine that I've run the system on my computer for a very, very long time.
111
00:17:58.190 --> 00:18:11.060
Jonathan Weare: That's not the only way to do it, but it would be the simplest: imagine that I've run all the way until I've reached the ergodic distribution, and then I pull samples from that trajectory; those samples give me samples from μ.
112
00:18:13.160 --> 00:18:20.360
Jonathan Weare: But they also — if from each of those points I look ahead τ units of time — give me samples of X_τ.
113
00:18:21.710 --> 00:18:22.160
Okay.
114
00:18:24.170 --> 00:18:32.600
Jonathan Weare: Now I need to choose a set of basis functions. That's of course an important choice, but let me delay on that for a second: I choose a set of basis functions,
115
00:18:33.860 --> 00:18:35.810
Jonathan Weare: And then I take.
116
00:18:37.580 --> 00:18:39.800
Jonathan Weare: I take the eigenproblem —
117
00:18:40.850 --> 00:18:56.090
Jonathan Weare: I apply it to each basis function. We're looking for eigenvalues of T_τ; I just expand the solution to the problem in the basis, and then I write out: okay, what does that imply for the coefficients
118
00:18:57.380 --> 00:19:04.790
Jonathan Weare: of the basis expansion? That's a well-established approximation technique; it's called Rayleigh-Ritz approximation,
119
00:19:07.250 --> 00:19:14.000
Jonathan Weare: because you build a Rayleigh quotient, and you minimize the Rayleigh quotient in just the way that the true eigenfunctions do.
120
00:19:15.440 --> 00:19:25.850
Jonathan Weare: So anyway, by doing a basis expansion in a standard Galerkin way, I've turned my infinite-dimensional eigenproblem into a finite-dimensional eigenproblem,
121
00:19:26.180 --> 00:19:40.190
Jonathan Weare: which I then solve — it's actually a generalized eigenproblem — by my favorite method. That gives me eigenvalue estimates, denoted by the hats, and accompanying
122
00:19:41.390 --> 00:19:51.230
Jonathan Weare: eigenvector estimates, which are just the linear combinations of the basis functions corresponding to the finite-dimensional vectors that you found.
123
00:19:52.010 --> 00:20:01.190
Jonathan Weare: Okay, so that's the algorithm: take your eigenproblem and plug in a basis expansion for the solution.
124
00:20:01.580 --> 00:20:17.300
Jonathan Weare: That gives you a finite-dimensional eigenproblem, which you then just solve by your favorite matrix eigenvalue solver; from that you get the eigenvalue approximations and you get
125
00:20:18.710 --> 00:20:30.410
Jonathan Weare: eigenvector approximations. Now, the issue — all of that is, as I said, totally standard — the issue is that these inner products...
126
00:20:33.800 --> 00:20:41.870
Jonathan Weare: these are in a very high-dimensional space, so there's no way I can actually compute these inner products, these integrals.
127
00:20:42.710 --> 00:21:01.100
Jonathan Weare: That's where I use the fact that I've sampled my X₀'s and my X_τ's from a long trajectory. That allows me to write the inner products, which are integrals over μ, as expectations. So I'm going to approximate both of these matrices,
128
00:21:02.570 --> 00:21:12.320
Jonathan Weare: C₀ and C_τ, by a sample-average approximation. I take all my samples of X₀ and I compute φᵢ times φⱼ on each of them —
129
00:21:12.710 --> 00:21:24.890
Jonathan Weare: φᵢ times φⱼ on each of them — I sum them up and I divide by the number of samples. And the same thing here, except that I match φᵢ at X₀ with φⱼ at X_τ, at that
130
00:21:25.430 --> 00:21:35.900
Jonathan Weare: X₀ point evolved for τ units of time. Okay, so that gives me a straightforward Monte Carlo approximation of both of these matrices —
131
00:21:37.010 --> 00:21:39.380
Jonathan Weare: sorry, both of these inner products.
132
00:21:40.670 --> 00:21:57.680
Jonathan Weare: And then I do everything I said before concerning the Rayleigh-Ritz approximation. So this really is a fairly simple algorithm: it's just Rayleigh-Ritz approximation, or basis expansion, plus Monte Carlo for the high-dimensional inner products.
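Putting the pieces together, here is one possible minimal sketch of the VAC recipe just described (function and variable names are my own; in practice the basis columns would come from an MSM or TICA construction):

```python
import numpy as np

def vac(traj_basis, lag):
    """Sketch of VAC: Rayleigh-Ritz in a finite basis, with the two
    inner-product matrices estimated by Monte Carlo averages over one
    long trajectory.

    traj_basis : (N, m) array, the m basis functions evaluated at each
                 of the N stored trajectory frames
    lag        : the time lag tau, in trajectory steps
    """
    X0 = traj_basis[:-lag]          # samples of phi(X_0)
    Xt = traj_basis[lag:]           # samples of phi(X_tau)
    M = X0.shape[0]
    C0 = X0.T @ X0 / M              # estimates <phi_i, phi_j>_mu
    Ct = X0.T @ Xt / M              # estimates <phi_i, T_tau phi_j>_mu
    Ct = 0.5 * (Ct + Ct.T)          # symmetrize, using reversibility
    # Solve the generalized eigenproblem Ct v = lambda C0 v by whitening
    # with a Cholesky factor of C0, so plain numpy suffices.
    L = np.linalg.cholesky(C0)
    A = np.linalg.solve(L, np.linalg.solve(L, Ct).T).T
    lam, W = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1]   # eigenvalue near 1 first
    V = np.linalg.solve(L.T, W[:, order])
    return lam[order], V            # columns of V: basis coefficients
```

A quick sanity check: feeding in the two indicator functions of a two-state Markov chain recovers the chain's eigenvalues up to Monte Carlo error.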
133
00:22:06.380 --> 00:22:11.870
Jonathan Weare: So what i've left out is you know, this is a very high dimensional space.
134
00:22:14.570 --> 00:22:18.710
Jonathan Weare: In that Trp-cage example, in the simulations I'll show you,
135
00:22:19.760 --> 00:22:33.680
Jonathan Weare: it has about 11,000 atoms, so that means about 33,000 dimensions — and that's not counting momenta. So it's —
136
00:22:34.730 --> 00:22:42.290
Jonathan Weare: so it's a very high-dimensional space, and I'm doing a basis expansion, so clearly I'm not doing FEM, finite element methods.
137
00:22:44.810 --> 00:23:03.980
Jonathan Weare: What has worked for people over the years is some kind of data-informed basis. The most popular is what I called Markov state modeling earlier: I take the data — I have already generated this long trajectory — and I do some clustering of it.
138
00:23:05.810 --> 00:23:10.670
Jonathan Weare: And then I define basis functions using those clusters, to be —
139
00:23:12.170 --> 00:23:21.170
Jonathan Weare: I can define basis functions just by partitioning space, meaning each basis function is one in one region of space and zero everywhere else.
140
00:23:21.740 --> 00:23:38.120
Jonathan Weare: And those regions don't overlap, so that every point in space is contained in a single set where a single basis function is one and the rest are all zero. Okay, so I do that by
141
00:23:39.980 --> 00:23:42.830
Jonathan Weare: clustering — usually some kind of clustering operation.
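The indicator-function (MSM) basis just described can be sketched like this, assuming cluster centers have already been produced by some clustering run; the names here are my own:

```python
import numpy as np

def indicator_basis(traj, centers):
    """Markov state model basis: assign each trajectory frame to its
    nearest cluster center (e.g. centers from a k-means run) and use the
    indicator function of each cluster as a basis function -- 1 inside
    its region, 0 elsewhere, with the regions partitioning space."""
    d = np.linalg.norm(traj[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(d, axis=1)                   # frame -> one cluster
    basis = np.zeros((traj.shape[0], centers.shape[0]))
    basis[np.arange(traj.shape[0]), labels] = 1.0   # one-hot indicators
    return basis
```

Each row of the result has exactly one nonzero entry, reflecting the non-overlapping partition of space.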
142
00:23:46.280 --> 00:23:59.600
Jonathan Weare: That's one approach. Another approach is something called time-lagged independent component analysis; that's essentially just using the coordinate axes as a basis. So you can see that these are —
143
00:24:00.530 --> 00:24:13.550
Jonathan Weare: especially the second one — not super flexible representations of the solution. In fact, you can use more flexible representations; this scheme has been
144
00:24:14.720 --> 00:24:20.180
Jonathan Weare: extended so that it can be used with neural network representations of the solution.
145
00:24:22.160 --> 00:24:23.720
Jonathan Weare: And we've done some of that stuff.
146
00:24:25.130 --> 00:24:28.130
Jonathan Weare: But for purposes of analysis I'm going to stick with
147
00:24:30.290 --> 00:24:39.830
Jonathan Weare: the basis expansion. I should say that it's very far from clear that the neural network representations are better for this task,
148
00:24:41.090 --> 00:24:50.240
Jonathan Weare: or for any of the tasks I'll talk about here; they just require so much data, and the data here — simulation data for a complicated model — is quite expensive.
149
00:24:52.910 --> 00:25:03.950
Jonathan Weare: So let me give you an example of somebody else's results using this variational — this VAC — algorithm for Trp-cage,
150
00:25:04.820 --> 00:25:18.500
Jonathan Weare: just to show you the kind of pictures that people look at. They call these tICs because they use the TICA method, which means they use coordinate functions as basis functions.
151
00:25:20.120 --> 00:25:35.120
Jonathan Weare: So, using the algorithm as I described it to you, they would compute the eigenvector approximations; then they might plot the first eigenvector versus the second eigenvector at each data point.
152
00:25:36.170 --> 00:25:39.620
Jonathan Weare: And so, so the color is the value of the.
153
00:25:40.940 --> 00:25:42.260
Jonathan Weare: value of the.
154
00:25:43.610 --> 00:25:44.360
Jonathan Weare: I don't know why.
155
00:25:47.990 --> 00:26:04.070
Jonathan Weare: I guess in this case the color is actually a free energy, which is a marginal probability. But anyway, they'll look at these plots and try to get some idea. So in this case, if you look at this tIC 2,
156
00:26:05.120 --> 00:26:14.060
Jonathan Weare: it's separating out two regions here; they're going to use that to describe the transition between those two regions in space.
157
00:26:15.890 --> 00:26:19.700
Jonathan Weare: Okay, but our interest is more in trying to understand: when is this —
158
00:26:21.110 --> 00:26:27.320
Jonathan Weare: when is this a good approach? When should we trust the results from this method, and when should we be —
159
00:26:29.150 --> 00:26:45.440
Jonathan Weare: perhaps, when should we be leery? So again, we didn't introduce the method; we are trying to analyze it. How do you analyze it? Well, there are two major contributions to the error. One is the approximation error; that just has to do with the basis expansion.
160
00:26:46.820 --> 00:26:47.810
Jonathan Weare: So suppose that.
161
00:26:49.040 --> 00:26:55.250
Jonathan Weare: Forget about Monte Carlo; just think about Rayleigh-Ritz approximation. Really, this is just a question about Rayleigh-Ritz approximation:
162
00:26:56.360 --> 00:27:03.920
Jonathan Weare: when can I expect that Rayleigh-Ritz approximation to give me a good answer? And by good answer, what do I mean? Well,
163
00:27:04.400 --> 00:27:12.860
Jonathan Weare: specifically, I mean: when can I expect the span of my approximate eigenfunctions to be close to the span of the true eigenfunctions?
164
00:27:13.850 --> 00:27:34.940
Jonathan Weare: Approximate again meaning Rayleigh-Ritz approximation with no randomness, so no Monte Carlo error from the inner products. And then the other kind of error that you might be interested in is estimation error. That's saying: well, what if Rayleigh-Ritz is the truth,
165
00:27:36.380 --> 00:27:50.870
Jonathan Weare: but we have to approximate the inner products by Monte Carlo? Then how close is the span of the first K approximate eigenvectors to the span of the first K Rayleigh-Ritz approximate eigenvectors?
166
00:27:51.830 --> 00:28:00.980
Jonathan Weare: OK, so those are the two forms of error that occur, and we want to understand how big they are.
167
00:28:03.080 --> 00:28:21.020
Jonathan Weare: So I've tried to emphasize what a standard technique the deterministic version of this scheme is — the version of the scheme without any sampling — so, not surprisingly, there are existing bounds. Whoops.
168
00:28:22.970 --> 00:28:32.990
Jonathan Weare: Bounds for Rayleigh-Ritz approximation. Unfortunately, those bounds depend on the inverse spectral gap — the gap between two consecutive eigenvalues. If I want to
169
00:28:33.500 --> 00:28:46.880
Jonathan Weare: ask what's the error in the span of the first K eigenfunctions from Rayleigh-Ritz approximation, what I'm going to find is an error bound that depends on the gap between the K-th eigenvalue and the (K+1)-st eigenvalue, roughly.
170
00:28:47.660 --> 00:28:57.620
Jonathan Weare: And that's not good because, as I said earlier, as τ grows both of these eigenvalues go to zero, and they therefore get closer and closer together.
171
00:28:58.730 --> 00:29:03.020
Jonathan Weare: So my error estimate is going to blow up.
172
00:29:06.200 --> 00:29:09.710
Jonathan Weare: So that's just what you're seeing here; this is the error bound —
173
00:29:11.180 --> 00:29:20.120
Jonathan Weare: on the left, the Rayleigh-Ritz one. This is what we would get if we just went to the finite element literature and asked what it predicts about
174
00:29:20.900 --> 00:29:31.580
Jonathan Weare: the error in eigenvalues — sorry, in subspaces — from Rayleigh-Ritz approximation. You'd see that as the lag time, that τ parameter, increases, the bound blows up.
175
00:29:32.240 --> 00:29:43.100
Jonathan Weare: But you can work out — this is for a particular example — you can find that in fact the error doesn't blow up as the lag time increases; in fact, it looks like it stabilizes.
176
00:29:43.820 --> 00:29:57.710
Jonathan Weare: So OK, we worked a bit harder — specifically, my student Rob Webber worked a bit harder — and, I'm not going to go through this carefully, but he was able to come up with better bounds.
177
00:29:59.300 --> 00:30:07.280
Jonathan Weare: He used the same machinery, but was a bit more careful in deriving the bounds.
178
00:30:09.050 --> 00:30:23.090
Jonathan Weare: And he was able to show that you actually get a bound that depends on the relative size of the spectral gap. So here we still have the gap between the Kth and (K+1)st eigenvalues,
179
00:30:24.200 --> 00:30:31.760
Jonathan Weare: but it's multiplied on top by the (K+1)st eigenvalue. So as tau goes to infinity —
180
00:30:33.140 --> 00:30:34.280
Jonathan Weare: This stabilizes.
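To put the point in symbols (my notation, not the talk's slides: write the transfer-operator eigenvalues at lag tau as lambda_k^tau = e^{-sigma_k tau} with decay rates sigma_1 <= sigma_2 <= ...), the two bounds behave schematically like this — this is the qualitative tau-dependence described in the talk, not the exact constants of the actual theorem:

```latex
% Standard Rayleigh--Ritz bound: inverse spectral gap, blows up with lag time
\mathrm{err}_{\mathrm{RR}}(\tau) \;\lesssim\; \frac{1}{\lambda_K^{\tau}-\lambda_{K+1}^{\tau}}
  \;=\; \frac{1}{e^{-\sigma_K\tau}-e^{-\sigma_{K+1}\tau}}
  \;\xrightarrow{\ \tau\to\infty\ }\; \infty,

% Improved bound: relative spectral gap, stabilizes with lag time
\mathrm{err}_{\mathrm{rel}}(\tau) \;\lesssim\; \frac{\lambda_{K+1}^{\tau}}{\lambda_K^{\tau}-\lambda_{K+1}^{\tau}}
  \;=\; \frac{1}{e^{(\sigma_{K+1}-\sigma_K)\tau}-1}
  \;\xrightarrow{\ \tau\to\infty\ }\; 0.
```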
181
00:30:37.130 --> 00:30:53.240
Jonathan Weare: OK, so on the left was the standard Rayleigh-Ritz bound, which you see blowing up as the lag time increases. On the right again is the truth, the true approximation error — we're just talking about approximation error right now.
182
00:30:54.590 --> 00:31:02.150
Jonathan Weare: And in the middle, you see the new error bounds behaving about the right way for longer lag times.
183
00:31:03.290 --> 00:31:09.530
Jonathan Weare: For shorter lag times we use different estimates, so you can also do better for shorter lag times.
184
00:31:10.580 --> 00:31:14.330
Jonathan Weare: OK, so that was an analysis of an existing method.
185
00:31:17.600 --> 00:31:21.590
Jonathan Weare: You know, there was a lot of uncertainty in the
186
00:31:22.880 --> 00:31:27.650
Jonathan Weare: computational statistical mechanics literature about how to choose tau.
187
00:31:29.030 --> 00:31:45.320
Jonathan Weare: This at least tells you that for the approximation error, you want to choose tau large — which is quite a different conclusion than you would draw from the Rayleigh-Ritz bounds. Unfortunately, a pretty straightforward perturbation argument for the estimation error
188
00:31:46.370 --> 00:31:54.260
Jonathan Weare: shows that for the estimation error you do get this dependence on the spectral gap. OK, so I'm not going to go
189
00:31:54.740 --> 00:32:09.050
Jonathan Weare: through this expression in detail, but you get what you would maybe naively expect, which is that as the gap between the Kth and (K+1)st eigenvalues shrinks, you expect your estimation error to grow.
190
00:32:10.940 --> 00:32:20.690
Jonathan Weare: So OK, that's true, and that means if you use the scheme as described then you're stuck trying to choose this lag time, this key parameter,
191
00:32:21.470 --> 00:32:29.720
Jonathan Weare: not too short, because the approximation error will be larger if it's short, and not too long, because the estimation error will be larger if it's long.
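This trade-off can be illustrated numerically with a toy spectral model. Everything below is hypothetical — the decay rates and the noise level are made up for illustration and do not come from the talk; the two formulas are just the relative-gap bound and a noise-over-gap estimation proxy:

```python
import math

# Hypothetical decay rates for the K-th and (K+1)-st modes (made up for illustration)
sig_k, sig_k1 = 1.0, 1.5                      # sigma_K < sigma_{K+1}

def lam(sigma, tau):
    """Transfer-operator eigenvalue at lag tau: lambda^tau = exp(-sigma * tau)."""
    return math.exp(-sigma * tau)

def approx_err(tau):
    """Relative-spectral-gap bound: shrinks as the lag time grows."""
    return lam(sig_k1, tau) / (lam(sig_k, tau) - lam(sig_k1, tau))

def estim_err(tau, noise=1e-3):
    """Perturbation-style estimation proxy: noise divided by the spectral gap."""
    return noise / (lam(sig_k, tau) - lam(sig_k1, tau))

taus = [0.1 * k for k in range(1, 60)]
total = [approx_err(t) + estim_err(t) for t in taus]
best_tau = taus[total.index(min(total))]      # interior optimum: not too short, not too long
```

With these made-up numbers the total error is minimized at an intermediate lag, which is exactly the "not too short, not too long" dilemma described here.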
192
00:32:30.440 --> 00:32:39.710
Jonathan Weare: It turns out — I'll show you later — that if you sample the data in a smarter way, rather than just from a long trajectory, then you can
193
00:32:41.600 --> 00:32:45.050
Jonathan Weare: kind of magically bypass this, which hopefully I'll get to
194
00:32:45.740 --> 00:32:47.780
Mihai Anitescu: in a second. So, Jonathan, one question.
195
00:32:47.810 --> 00:32:55.700
Mihai Anitescu: Can you remind us — what is the asymptotic regime there? This is the limit as what goes to what?
196
00:32:56.450 --> 00:33:03.200
Jonathan Weare: Well, this is not — oh, the o(1)? This is as the sampling error goes away.
197
00:33:06.320 --> 00:33:16.790
Jonathan Weare: So this is specifically — with the hat, these are the Rayleigh-Ritz approximated eigenvectors with Monte Carlo sampling,
198
00:33:18.200 --> 00:33:31.700
Jonathan Weare: and over here the Rayleigh-Ritz approximate eigenvectors without Monte Carlo sampling. So in this expression we've separated out the comparison of Rayleigh-Ritz, on the one hand, to the truth,
199
00:33:32.960 --> 00:33:40.460
Jonathan Weare: and then, separately in this expression, Rayleigh-Ritz to Rayleigh-Ritz approximated with sampling.
200
00:33:41.960 --> 00:33:42.950
Jonathan Weare: Does that make sense, Mihai?
201
00:33:43.640 --> 00:33:48.800
Mihai Anitescu: OK, got it. So that means as your data — more and longer trajectories, or what?
202
00:33:49.160 --> 00:33:49.550
Mihai Anitescu: Is that it?
203
00:33:49.730 --> 00:33:59.180
Jonathan Weare: Yeah, so in the way I described the algorithm, if I take a longer and longer trajectory and select more and more (X_0, X_tau) pairs,
204
00:33:59.870 --> 00:34:09.590
Jonathan Weare: this will go to zero. OK, but how large the error is before I get to that limit — this is just the leading-order term that I've pulled out.
205
00:34:09.650 --> 00:34:14.300
Jonathan Weare: You can see that that leading-order term is going to depend on how close those eigenvalues are.
206
00:34:15.860 --> 00:34:18.080
Jonathan Weare: So this is, you know,
207
00:34:19.250 --> 00:34:29.810
Jonathan Weare: a version of the standard expression that you would get doing a perturbation analysis of the eigenproblem — yeah, a standard perturbation analysis of a symmetric eigenproblem.
208
00:34:34.820 --> 00:34:46.010
Jonathan Weare: The other one was a bit different because we had to work a little bit harder, but I'm going to show you that, if you are willing to change the algorithm, this is actually pessimistic.
209
00:34:49.160 --> 00:34:57.980
Jonathan Weare: OK, so that was eigenvectors. Now I want to very quickly talk about a different problem — this is with Dimitris.
210
00:34:58.490 --> 00:35:15.530
Jonathan Weare: And this is a method that we introduced — the machinery of this algorithm is very much like VAC, but the task is different. So rather than computing eigenvectors
211
00:35:17.000 --> 00:35:18.380
Jonathan Weare: of the transition operator,
212
00:35:19.610 --> 00:35:20.720
Jonathan Weare: I want to.
213
00:35:22.100 --> 00:35:32.420
Jonathan Weare: compute conditional expectations given my current initial condition. You might think of these as forecasts: given that I'm currently sitting at position X,
214
00:35:32.870 --> 00:35:46.460
Jonathan Weare: what is the expected cost at time capital T? For cost I've included two possible ways of measuring it: one just in terms of some function at the final time,
215
00:35:47.300 --> 00:35:52.730
Jonathan Weare: and another is what, in control, you'd call a running cost.
216
00:35:53.420 --> 00:36:02.960
Jonathan Weare: So this is an accumulated cost up until the final time. You can choose these however you like, whatever is appropriate for your problem. T here is a random time:
217
00:36:03.680 --> 00:36:15.050
Jonathan Weare: it's the time that I exit some domain. So basically I'm interested in an event that could be described like this: I'm sitting at position X, I evolve the system,
218
00:36:16.190 --> 00:36:26.990
Jonathan Weare: maybe for a long time; it finally exits the domain — this domain is D — it finally exits the domain here, and I want to associate some cost with that.
219
00:36:27.530 --> 00:36:36.110
Jonathan Weare: OK, so this could be, you know, that I'm looking at the current state of the weather and I'm going to call time T the first time that —
220
00:36:36.830 --> 00:36:46.160
Jonathan Weare: you know, maybe I'm looking at the current state of the weather and there's a storm there, but I'm not sure whether it's going to develop into something serious,
221
00:36:47.540 --> 00:36:54.110
Jonathan Weare: and I'm going to call time T the first time that I reach, let's say, a category five hurricane or something.
222
00:36:56.690 --> 00:37:03.170
Jonathan Weare: OK, so an example of this kind of quantity would be — whoops, wrong one —
223
00:37:05.000 --> 00:37:13.490
Jonathan Weare: this probability. So suppose, starting from X, I'm interested in the probability that I reach B before returning to A.
224
00:37:14.390 --> 00:37:40.250
Jonathan Weare: This is actually a version of the conditional expectation above. Here D is everything that's not A or B — so I start from X and I run, and I want to know, when I hit A or B, do I hit B first? So this is, specifically, A union B
225
00:37:41.330 --> 00:37:53.510
Jonathan Weare: complement. Let's see — to fit it into this form, I would take g to be one if X is in B and zero
226
00:37:55.550 --> 00:38:04.340
Jonathan Weare: otherwise, and I would take h to be zero. OK, so if I made those choices, then this probability — called the committor — which —
227
00:38:05.450 --> 00:38:12.410
Jonathan Weare: it's very important. I mean, it's a natural quantity, right? Even if I were interested in forecasting some weather event, it's a natural quantity.
228
00:38:13.850 --> 00:38:19.190
Jonathan Weare: But it's also a very common and important quantity in computational statistical mechanics, again —
229
00:38:20.570 --> 00:38:22.130
Jonathan Weare: it's called the committor by people there.
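In symbols (standard notation, assembled from the verbal description rather than copied from the slides), the forecast quantity and the committor special case are:

```latex
% Conditional expectation of a terminal cost g plus a running cost h,
% where T is the (random) first exit time from the domain D:
u(x) \;=\; \mathbb{E}\!\left[\, g(X_T) + \int_0^{T} h(X_s)\,ds \;\middle|\; X_0 = x \right],
\qquad T \;=\; \inf\{\, t \ge 0 : X_t \notin D \,\}.

% Committor: take D = (A \cup B)^{c},\; g = \mathbf{1}_B,\; h = 0, giving
q(x) \;=\; \mathbb{P}\!\left(\, X_T \in B \;\middle|\; X_0 = x \,\right),
```

i.e. the probability of reaching B before returning to A, starting from x.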
230
00:38:23.690 --> 00:38:32.060
Jonathan Weare: OK, so that's the kind of quantity we want to compute now, rather than eigenvectors. We're going to do something very similar. So step one: first, I need to
231
00:38:32.840 --> 00:38:43.760
Jonathan Weare: express that conditional expectation as the solution to an infinite-dimensional linear system. Equations like this are called Feynman-Kac formulas.
232
00:38:45.470 --> 00:38:59.720
Jonathan Weare: The key here is that I want to avoid long runs. I mean, of course, I could try to estimate u just by starting from X and running a zillion trajectories up until time T — that would be kind of the standard forecasting approach, some version of that.
233
00:39:01.010 --> 00:39:02.060
Jonathan Weare: I don't want to do that.
234
00:39:03.410 --> 00:39:06.770
Jonathan Weare: Because T may be a very long time away, for example.
235
00:39:08.000 --> 00:39:10.280
Jonathan Weare: or it may just be an inaccurate way to do things.
236
00:39:11.900 --> 00:39:14.750
Jonathan Weare: So instead I want to write an expression for u
237
00:39:16.100 --> 00:39:18.260
Jonathan Weare: in terms of only short trajectories.
238
00:39:19.400 --> 00:39:29.390
Jonathan Weare: OK, so that's this expression. The only part that's really important to understand is this transition operator — it's a transition operator just like the T we saw earlier,
239
00:39:30.260 --> 00:39:50.660
Jonathan Weare: but it's for a stopped process. So this tau appears again: this says evolve the process forward, but only until time tau or until exit from D, whichever comes first. So if tau is short,
240
00:39:52.040 --> 00:39:59.900
Jonathan Weare: most of the time you're going to stop this integration at time tau. So this does express u —
241
00:40:01.070 --> 00:40:07.550
Jonathan Weare: it expresses u as a solution to an equation that only involves trajectories no longer than time tau.
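The short-trajectory identity being described is, again in notation assembled from the talk rather than taken from the slides:

```latex
% Feynman--Kac fixed-point equation over the lag time tau
% (tau \wedge T = the minimum of tau and the exit time T):
u(x) \;=\; \mathbb{E}\!\left[\, u\!\left(X_{\tau \wedge T}\right)
      + \int_0^{\tau \wedge T} h(X_s)\,ds \;\middle|\; X_0 = x \right],
\quad x \in D, \qquad u = g \ \ \text{on } D^{c}.
```

Solving this linear equation for u only ever requires trajectory segments of length at most tau.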
242
00:40:08.330 --> 00:40:16.850
Jonathan Weare: It's fully exact: if I manage to solve this equation, I do indeed find the conditional expectation I'm interested in, but I've never
243
00:40:17.210 --> 00:40:22.940
Jonathan Weare: done anything with a long trajectory — I've never integrated all the way up until time capital T, all the way up until I exit the domain.
244
00:40:23.930 --> 00:40:43.490
Jonathan Weare: OK, what's the catch? The catch is that, rather than just having an expression u equals the expectation of something, I have a linear system — I have to somehow invert this operator. OK, but I can use the VAC kind of idea and do, for example, a basis expansion.
245
00:40:44.930 --> 00:40:55.580
Jonathan Weare: Or I can do different things, like use kernel representations or neural networks. Since I've described VAC, I'll describe the basis expansion approach, because it's almost exactly the same as
246
00:40:57.230 --> 00:41:06.590
Jonathan Weare: the pseudocode I gave you for VAC. So the first thing to do is to get rid of the boundary conditions. This equation, if I go back —
247
00:41:07.640 --> 00:41:22.220
Jonathan Weare: it solves this infinite-dimensional linear equation, but it has boundary conditions: on the boundary of the set D, this domain D, u is equal to g. So the first thing I'm going to do is get rid of the boundary conditions — by "get rid of" I mean
248
00:41:23.870 --> 00:41:33.230
Jonathan Weare: replace them by a problem for a function that is zero on the boundary. That means I choose some guess function — this is a
249
00:41:33.770 --> 00:41:46.190
Jonathan Weare: very standard kind of trick. I choose a guess function, and I try to solve for the difference between the true u and my guess function; that difference function v is zero on the boundary.
250
00:41:47.390 --> 00:41:47.750
Okay.
251
00:41:49.220 --> 00:41:50.900
Jonathan Weare: I'm going to
252
00:41:52.250 --> 00:42:10.640
Jonathan Weare: start a bunch of simulations inside the domain — not just that one initial condition, but many initial conditions — and I'm going to evolve them for time tau. For some of them, time tau might actually be enough to reach the boundary; for others it won't be.
253
00:42:11.660 --> 00:42:14.480
Jonathan Weare: OK, I generate all these pairs — these are pairs.
254
00:42:17.030 --> 00:42:24.650
Jonathan Weare: This would be like X_0, and this would be like X_tau. OK, I generate many, many pairs like that.
255
00:42:25.970 --> 00:42:42.230
Jonathan Weare: X_0 is drawn from some distribution nu. Nu in the first part of the talk was the invariant distribution; in this part of the talk it's not the invariant distribution, it's just some distribution from which you're able to seed these initial conditions. OK, I'm going to choose a basis,
256
00:42:44.270 --> 00:42:53.060
Jonathan Weare: a set of basis functions — very similar to what we did in VAC, the difference being that now I need to make sure that they are zero on the boundary.
257
00:42:54.170 --> 00:43:09.830
Jonathan Weare: OK, and then the rest is pretty much the same as VAC. Once I've done that basis expansion, I've reduced my infinite-dimensional linear solve to a finite-dimensional linear solve, meaning a matrix inversion. The matrix elements
258
00:43:12.440 --> 00:43:15.830
Jonathan Weare: are these inner products that are not tractable,
259
00:43:16.910 --> 00:43:21.260
Jonathan Weare: but I can approximate them using Monte Carlo, using all these samples that I've generated.
260
00:43:22.220 --> 00:43:41.780
Jonathan Weare: OK, and then I'll solve the finite-dimensional linear system, and that gives me the coefficients in the basis expansion for my approximation to u. OK, so that's very much like the VAC scheme that I described. As I said, you don't need to use the basis expansion, but
261
00:43:43.430 --> 00:43:50.030
Jonathan Weare: you can. OK, so I see that I'm pretty much out of time; let me just show you the pictures.
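The recipe just described — sample (X_0, X_{tau AND T}) pairs from some distribution nu, pick basis functions that vanish on the boundary, estimate the operator's matrix elements by Monte Carlo, and solve a small linear system — can be sketched on a toy problem where the answer is known. The following is a hypothetical minimal illustration, not the speaker's code: Brownian motion on D = (0, 1) with A = {x <= 0} and B = {x >= 1}, whose exact committor is q(x) = x; the basis is an indicator (MSM-style) basis on bins of D, the guess function is psi = 1_B, and a fixed-point iteration stands in for a direct matrix inversion:

```python
import math
import random

random.seed(0)
N, dt, nsteps, K = 20000, 1e-3, 50, 10       # samples, time step, lag tau = 0.05, bins

# Monte Carlo estimates of the stopped-operator matrix elements in an indicator
# basis on K bins of D = (0, 1); indicator functions vanish on the boundary
counts = [[0.0] * K for _ in range(K)]        # start bin i -> end bin j (still in D)
hit_b = [0.0] * K                             # start bin i -> stopped in B = [1, inf)
n0 = [0] * K                                  # number of samples started in bin i

for _ in range(N):
    x0 = random.random()                      # X_0 ~ nu = uniform on D
    i = min(int(x0 * K), K - 1)
    n0[i] += 1
    x = x0
    for _ in range(nsteps):                   # evolve to lag tau or exit, whichever first
        x += math.sqrt(dt) * random.gauss(0.0, 1.0)
        if x <= 0.0 or x >= 1.0:
            break
    if x >= 1.0:                              # stopped in B: guess psi = 1_B contributes
        hit_b[i] += 1.0
    elif x > 0.0:                             # still inside D: record the end bin
        j = min(int(x * K), K - 1)
        counts[i][j] += 1.0

# Row-normalize: sub-stochastic lag-tau transition matrix and B-hit probabilities
P = [[counts[i][j] / n0[i] for j in range(K)] for i in range(K)]
p_b = [hit_b[i] / n0[i] for i in range(K)]

# Solve (I - P) a = p_b by fixed-point iteration; P is a strict contraction here
# because trajectories leak out of D, so this converges (a direct solve would too)
a = [0.0] * K
for _ in range(500):
    a = [sum(P[i][j] * a[j] for j in range(K)) + p_b[i] for i in range(K)]

# a[i] approximates the committor at the center of bin i; the exact answer is q(x) = x
centers = [(i + 0.5) / K for i in range(K)]
```

With this indicator basis, the Galerkin solve coincides with a small Markov-state-model computation; with smooth basis functions one would instead assemble the same inner products as a K-by-K matrix and solve it directly.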
262
00:43:51.410 --> 00:43:57.170
Jonathan Weare: Let me show you some pictures of the results. This is for Trp-cage; this is for the committor function that I mentioned.
263
00:43:59.120 --> 00:44:12.500
Jonathan Weare: So this is in two variables, what are called collective variables, because I have to show you a plot — I need to show you a plot in two variables — although q is a function of all the variables, right?
264
00:44:13.220 --> 00:44:23.810
Jonathan Weare: So there are many pictures of q I can show you, but I'm going to show you one that is pretty informative. These are two of what are called collective variables, so in this two-dimensional
265
00:44:24.620 --> 00:44:40.400
Jonathan Weare: plot, the unfolded state corresponds to being over here, where A is, and the folded state corresponds to being over here, where B is. OK, and I can tell you more about the basis expansion — as I mentioned, the choice of basis is important.
266
00:44:42.590 --> 00:44:44.390
Jonathan Weare: You know, you
267
00:44:45.590 --> 00:44:54.620
Jonathan Weare: couldn't possibly just grid and discretize in any kind of standard way, so the basis is always going to have some information from the data set built into it.
268
00:44:57.320 --> 00:45:02.060
Jonathan Weare: OK, so this is my committor. It's telling me what you'd expect: if I'm near A,
269
00:45:03.230 --> 00:45:12.680
Jonathan Weare: it's going to be zero — I'm more likely to return to A than to go to B — and if I'm near B it's going to be one: I'm more likely to go to B than to return to A.
270
00:45:12.950 --> 00:45:25.310
Jonathan Weare: What's more interesting is this region at one half, which is called the isocommittor surface, which tells you where I'm not quite sure whether I'm going to A or going to B — those would be associated with what you might call transition states.
271
00:45:26.930 --> 00:45:36.770
Jonathan Weare: Just to show you — I told you there are other ways I could solve that equation. One that I find very appealing is a Gaussian process regression
272
00:45:38.240 --> 00:45:44.510
Jonathan Weare: algorithm, which I won't describe, but I'll show you that at least this particular plot of the committor is almost identical.
273
00:45:45.440 --> 00:45:55.040
Jonathan Weare: And it works quite well with less data. Again, we've done this with neural networks; it can work, but it requires a huge amount of data, and the data all require
274
00:45:56.150 --> 00:45:59.780
Jonathan Weare: simulation of an expensive enough system that that's not such a good
275
00:46:00.980 --> 00:46:09.800
Jonathan Weare: approach. OK — the rest of the talk was really about how you place the data in this algorithm. So, you know, I drew this picture.
276
00:46:10.460 --> 00:46:20.750
Jonathan Weare: I don't want to — I will stop soon. So how you place these initial conditions for your short simulations ends up being really important.
277
00:46:23.060 --> 00:46:33.800
Jonathan Weare: The way we've done it here, for this Trp-cage simulation: using expert knowledge, we basically chose two variables that we felt were important,
278
00:46:35.000 --> 00:46:37.730
Jonathan Weare: And we tried to place our initial conditions.
279
00:46:38.840 --> 00:46:48.290
Jonathan Weare: sort of uniformly in those two variables — so very far from the invariant distribution, intentionally not distributing them according to the invariant distribution.
280
00:46:50.360 --> 00:46:57.140
Jonathan Weare: And, you know, we managed to get pretty good accuracy from this algorithm with a
281
00:46:57.620 --> 00:47:04.280
Jonathan Weare: relatively small amount of actual simulation time. So compared to, for example — I showed you the Anton machine in the beginning —
282
00:47:04.670 --> 00:47:16.640
Jonathan Weare: compared to simulations of Trp-cage folding and unfolding done on Anton with a single long trajectory, we get very accurate results in much, much less simulation time. So the rest of the talk —
283
00:47:17.810 --> 00:47:22.400
Jonathan Weare: which I'll skip, and if there are questions about it I can talk about it more — points out
284
00:47:23.240 --> 00:47:28.430
Jonathan Weare: this issue of how you place the points. So, I described VAC using only a long trajectory,
285
00:47:28.790 --> 00:47:35.360
Jonathan Weare: and I described the second scheme sampling points from a long trajectory, which means that your initial points are all sampled from the invariant distribution.
286
00:47:36.020 --> 00:47:43.490
Jonathan Weare: That turns out to be a very bad idea — almost the worst thing you could do. Even though it's very natural, it's really a terrible thing to do.
287
00:47:44.930 --> 00:47:53.360
Jonathan Weare: It's essential that you don't do that, and in fact, if we look at toy examples, we can show, in very dramatic
288
00:47:54.680 --> 00:48:00.980
Jonathan Weare: ways, that sampling the right points — the right positions for the short trajectories —
289
00:48:02.660 --> 00:48:08.090
Jonathan Weare: can make a huge difference. So let me stop there — I'm sorry I went a bit over.
290
00:48:10.970 --> 00:48:15.530
Mihai Anitescu: Thank you very much, Jonathan. So let me see — I'm unmuting myself.
291
00:48:17.930 --> 00:48:20.840
Mihai Anitescu: So, right — any questions?
292
00:48:23.930 --> 00:48:27.860
Mihai Anitescu: There's a hand up from Maike. Maike, please ask the question.
293
00:48:30.800 --> 00:48:40.250
Maike Sonnewald: Hi, yeah, great talk. I was wondering if you could maybe elaborate, with an example, on your data-informed basis.
294
00:48:41.360 --> 00:48:42.800
Jonathan Weare: So yeah sure, so I think.
295
00:48:42.830 --> 00:48:43.970
Jonathan Weare: I think that's I.
296
00:48:44.360 --> 00:48:45.290
Jonathan Weare: Can you hear me right.
297
00:48:46.340 --> 00:49:02.900
Maike Sonnewald: Yeah, yes. I was also just going to say, I was wondering what well-sampled means in that case. You know, you said you used domain knowledge, but of course that can be biased as well, so I was just wondering if you could elaborate a little bit — it sounds really interesting.
298
00:49:03.440 --> 00:49:05.300
Jonathan Weare: Yeah, so I mean, there
299
00:49:06.620 --> 00:49:08.390
Jonathan Weare: are a lot of places where —
300
00:49:09.680 --> 00:49:18.800
Jonathan Weare: you know, so Mihai again this morning asked this question about discovering new mechanisms — that's really hard to do.
301
00:49:19.310 --> 00:49:29.900
Jonathan Weare: Because you can write down the right kinds of quantities to compute, but when you get down to making choices like what basis should I use, or how should I do this sampling —
302
00:49:30.500 --> 00:49:37.100
Jonathan Weare: that's really hard to do without biasing yourself towards one mechanism or another, and that's really where
303
00:49:37.100 --> 00:49:39.770
Jonathan Weare: it sneaks in, and it's very hard to
304
00:49:40.790 --> 00:49:49.070
Jonathan Weare: get around. Yeah, so in the Trp-cage case in particular, the basis — the first step:
305
00:49:49.880 --> 00:49:57.500
Jonathan Weare: it's a chain of beads, right, a string of beads — imagine each amino acid is a bead; there's a central atom in each bead.
306
00:49:58.400 --> 00:50:10.940
Jonathan Weare: We assemble distances between central atoms in pairs of beads — not every pair, but many pairs of beads, pairs of beads with a certain separation
307
00:50:11.150 --> 00:50:12.200
Maike Sonnewald: all the way down the.
308
00:50:12.530 --> 00:50:16.340
Jonathan Weare: chain, so that leads to 153 distances.
309
00:50:17.630 --> 00:50:20.810
Jonathan Weare: We then take those 153 distances.
310
00:50:21.950 --> 00:50:22.730
Jonathan Weare: And we.
311
00:50:24.230 --> 00:50:35.360
Jonathan Weare: Let's see — we think of them as a new set of coordinates, and we multiply them by a distance to the boundary so that they go to zero at the boundary,
312
00:50:35.810 --> 00:50:45.080
Jonathan Weare: because the basis has to be zero at the boundary of the set — in this case I'm computing a committor, so the boundary of the set is the boundary of this A set and B set,
313
00:50:45.290 --> 00:50:46.580
Jonathan Weare: where A represents
314
00:50:46.670 --> 00:50:49.430
Jonathan Weare: unfolded and B represents folded.
315
00:50:50.660 --> 00:50:51.950
Jonathan Weare: And then we do some.
316
00:50:53.270 --> 00:51:00.710
Jonathan Weare: processing by orthogonalizing on the data points, but you can think of something like the coordinate functions
317
00:51:01.880 --> 00:51:05.240
Jonathan Weare: in those internal coordinates.
318
00:51:05.960 --> 00:51:12.350
Maike Sonnewald: I was actually also interested in your use of clustering — you said clustering. What exactly do you mean there?
319
00:51:13.100 --> 00:51:31.730
Jonathan Weare: So people will do, you know, one long simulation as I described, or often they'll do many simulations starting from different points. That means the resulting data is not sampled from the invariant distribution, right? If I
320
00:51:32.210 --> 00:51:43.100
Jonathan Weare: just put down some points and then run for a while — unless I run forward, you know, for infinity — the mixture of points that I get from the different trajectories are not going to
321
00:51:43.100 --> 00:51:44.750
Maike Sonnewald: Be from.
322
00:51:45.110 --> 00:51:54.560
Jonathan Weare: the invariant distribution. So they'll take all that data — they'll just ignore that for a moment — take that data and cluster it using, like, k-means or something.
323
00:51:55.610 --> 00:52:02.360
Jonathan Weare: In fact, there are hierarchical procedures where you might do that TICA algorithm first —
324
00:52:02.900 --> 00:52:14.960
Jonathan Weare: that was the method where you just use the coordinates as the basis — and then you might use those TICA coordinates to be the space in which you cluster.
325
00:52:15.560 --> 00:52:16.190
Jonathan Weare: From second.
326
00:52:16.670 --> 00:52:20.000
Jonathan Weare: Once you've clustered, then you use an MSM basis. So there are all kinds of
327
00:52:21.590 --> 00:52:25.100
Jonathan Weare: sequences that people have worked out over time that work
328
00:52:26.930 --> 00:52:29.750
Jonathan Weare: relatively well.
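As a sketch of that pipeline — a hypothetical minimal version, not the speaker's code — one can cluster pooled trajectory data in some reduced coordinates and then use one indicator function per cluster as the MSM basis. The 2-D "data" below is a made-up stand-in for, say, TICA coordinates:

```python
import random

random.seed(1)

# Toy stand-in for pooled trajectory data in 2-D reduced coordinates:
# two well-separated clouds (hypothetical metastable states)
data = ([(random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)) for _ in range(200)]
        + [(random.gauss(3.0, 0.3), random.gauss(3.0, 0.3)) for _ in range(200)])

def nearest(p, centers):
    """Index of the cluster center closest to point p."""
    return min(range(len(centers)),
               key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)

def kmeans(pts, k, iters=25):
    """Plain Lloyd's algorithm with a deterministic spread-out initialization."""
    centers = [pts[(j * len(pts)) // k] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            groups[nearest(p, centers)].append(p)
        centers = [
            (sum(q[0] for q in g) / len(g), sum(q[1] for q in g) / len(g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

centers = kmeans(data, 2)

def msm_basis(p):
    """MSM basis: one-hot indicator of the cluster containing p."""
    j = nearest(p, centers)
    return [1.0 if i == j else 0.0 for i in range(len(centers))]
```

These indicator features would then play the role of the basis functions in the Galerkin/VAC machinery described earlier.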
329
00:52:30.980 --> 00:52:31.970
Maike Sonnewald: yeah that's really interesting.
330
00:52:32.510 --> 00:52:43.220
Jonathan Weare: So we've actually done this — the forecasting part — on a model of SSW (sudden stratospheric warming), but it's only 75 dimensions. So in that problem we use —
331
00:52:44.330 --> 00:52:54.170
Jonathan Weare: yeah, I mean, it's enough to not be something you could grid up, but obviously it's a far cry from a climate model or something. But
332
00:52:55.880 --> 00:52:59.240
Jonathan Weare: you know, we use every coordinate in
333
00:53:00.740 --> 00:53:06.410
Jonathan Weare: that method. In the clustering in that method, we just clustered in the 75-dimensional space, which isn't impossible, right?
334
00:53:07.880 --> 00:53:22.310
Jonathan Weare: Thinking about scaling up to a GCM, we would need to think about coarse representations of the state — maybe, I don't know whether that involves local averaging; that's something we're just starting to think about now.
335
00:53:22.370 --> 00:53:31.430
Jonathan Weare: Yeah, but I think there is a step where you do need to think about the problem and reduce the dimension somewhat
336
00:53:31.940 --> 00:53:34.910
Jonathan Weare: Before you can start thinking about basis expansion.
337
00:53:35.780 --> 00:53:43.310
Maike Sonnewald: Yeah, no, definitely. I mean, you mentioned the hurricane example, and there again you have to reduce the problem somehow to —
338
00:53:43.790 --> 00:53:44.390
Jonathan Weare: To get I mean.
339
00:53:45.680 --> 00:54:01.250
Jonathan Weare: The really important thing is that there's a huge difference — and this is kind of the key deviation, let's say. So if you're trying to build a coarse-grained model —
340
00:54:02.630 --> 00:54:09.260
Jonathan Weare: a closed dynamical system for some subset of the variables that represents those variables accurately —
341
00:54:10.790 --> 00:54:13.010
Maike Sonnewald: that's not interesting moments yeah.
342
00:54:13.520 --> 00:54:16.610
Jonathan Weare: I mean, if that's what you want, that's the goal — it
343
00:54:16.730 --> 00:54:18.260
Jonathan Weare: may or may not work, but that's the goal.
344
00:54:20.840 --> 00:54:21.140
Jonathan Weare: If.
345
00:54:22.730 --> 00:54:30.860
Jonathan Weare: that's a much more stringent requirement than what I'm asking for here, right? Because I'm asking that my specific — sorry —
346
00:54:31.940 --> 00:54:37.130
Jonathan Weare: that this specific statistic that I'm trying to compute — maybe I have it here — for example,
347
00:54:38.840 --> 00:54:39.770
Jonathan Weare: this committor problem —
348
00:54:40.760 --> 00:54:41.090
Maike Sonnewald: mm hmm.
349
00:54:41.390 --> 00:54:47.960
Jonathan Weare: I'm asking that it can be represented relatively accurately as a function of those reduced coordinates.
350
00:54:49.220 --> 00:54:57.890
Jonathan Weare: I'm not trying to build a whole closed dynamical system for those reduced coordinates; I'm just asking that this quantity
351
00:54:59.150 --> 00:55:03.350
Jonathan Weare: be, mostly, to a good approximation, a function of those coordinates.
352
00:55:03.920 --> 00:55:04.340
Maike Sonnewald: mm hmm.
353
00:55:04.550 --> 00:55:07.250
Jonathan Weare: I think that's actually a much less stringent requirement.
354
00:55:09.680 --> 00:55:13.490
Maike Sonnewald: Yeah, no, for sure. I mean, the problem — you brought up GCMs —
355
00:55:14.690 --> 00:55:26.930
Maike Sonnewald: the system is also just terribly nonlinear, so assuming that you can get somewhere with k-means, for example, is a very strong assumption, yeah.
356
00:55:27.440 --> 00:55:33.170
Jonathan Weare: But that's also true of the biomolecular systems, so, you know —
357
00:55:33.260 --> 00:55:34.520
Maike Sonnewald: I know much less about those.
358
00:55:34.910 --> 00:55:38.720
Jonathan Weare: Yeah, no, they're very unfriendly too.
359
00:55:40.370 --> 00:55:43.760
Jonathan Weare: You know, I think it's probably in some ways better, in other ways worse. But —
360
00:55:47.840 --> 00:55:58.070
Jonathan Weare: so again, you're not trying to represent the full dynamics of the system, or even the dynamics of a subset of the variables — you're just trying to represent this function,
361
00:55:58.220 --> 00:56:00.620
Jonathan Weare: which might be a much more slowly varying function:
362
00:56:01.100 --> 00:56:02.360
Jonathan Weare: even though the dynamics is
363
00:56:02.360 --> 00:56:03.110
Jonathan Weare: so nonlinear,
364
00:56:03.440 --> 00:56:07.820
Jonathan Weare: this particular function might not be such a rapidly varying function.
365
00:56:08.390 --> 00:56:14.540
Maike Sonnewald: Yeah, and I guess that would be the hope. But I also recognize I'm taking up a lot of time here, so I guess I should
366
00:56:16.010 --> 00:56:19.610
Maike Sonnewald: let other people ask questions. But thank you — very good talk.
367
00:56:27.260 --> 00:56:34.790
Mihai Anitescu: Sure, thank you, Maike. So we now have a question from, I believe, Stephen. Stephen, please go ahead.
368
00:56:35.480 --> 00:56:43.910
Stephen Eubank: Yeah, I'm sorry if this is totally off the wall, because I'm not very familiar with either the GCM or the protein folding domain, but
369
00:56:44.480 --> 00:56:54.590
Stephen Eubank: I think it follows up on what you were just talking about: this problem reminds me a lot of the problem of determining basin boundaries in
370
00:56:57.080 --> 00:57:07.010
Stephen Eubank: a dynamical system that's drawn to different attractors. So the problem there is to figure out which starting points lead to which attractors.
371
00:57:07.760 --> 00:57:23.510
Stephen Eubank: And that problem is not easy: even in very small dimensions, in nonlinear systems, you can get fractal basin boundaries. So I'm wondering if that's an issue here, or how the assumptions of smoothness might come into play.
372
00:57:24.620 --> 00:57:26.660
Jonathan Weare: Yeah, I don't think that.
373
00:57:31.430 --> 00:57:49.790
Jonathan Weare: So that strikes me as a harder problem than computing this probability. I mean, you're saying, for example, if you really wanted to nail down the committor one-half surface... let's see, let me go back.
374
00:57:52.010 --> 00:57:54.020
Jonathan Weare: You're saying, for example, in this picture,
375
00:57:56.090 --> 00:58:00.320
Jonathan Weare: the problem of actually identifying where this,
376
00:58:01.850 --> 00:58:08.240
Jonathan Weare: where the value is one half, that this surface could be fractal, is that it?
377
00:58:08.300 --> 00:58:09.770
Stephen Eubank: Yeah, I think that's it.
378
00:58:10.130 --> 00:58:23.090
Jonathan Weare: Yeah, I mean, so there's some amount of regularizing effect that, you know, having a dynamical system with noise imparts, which might actually kill a fractal like that real quick. I don't know.
379
00:58:24.680 --> 00:58:34.910
Jonathan Weare: It's also a question of, you know, what are you hoping for, how precisely do you want to characterize this committor one-half surface. I mean, I think,
380
00:58:36.080 --> 00:58:39.680
Jonathan Weare: in this molecular dynamics domain,
381
00:58:41.450 --> 00:58:50.390
Jonathan Weare: you know, being able to look at a picture like this is thought of as quite good progress. It's not, maybe,
382
00:58:52.310 --> 00:58:56.780
Jonathan Weare: not as quantitative as precisely identifying the committor one-half surface, but.
383
00:58:59.090 --> 00:59:00.680
Jonathan Weare: Yeah, so I guess,
384
00:59:04.730 --> 00:59:05.540
Jonathan Weare: yeah, I don't know.
385
00:59:06.830 --> 00:59:08.960
Jonathan Weare: To me they're not quite the same problem, but
386
00:59:10.850 --> 00:59:15.920
Jonathan Weare: it may also depend on the precise goal.
387
00:59:17.000 --> 00:59:28.430
Jonathan Weare: I mean, on this particular Trp-cage example, we validated it as much as we can, but here we're in 33,000 dimensions, so we're certainly not talking about a toy,
388
00:59:29.450 --> 00:59:31.820
Jonathan Weare: you know, three-dimensional system or something.
389
00:59:33.290 --> 00:59:53.690
Jonathan Weare: So you can try and validate it with long simulations, but that's probably not as accurate; if I had to choose one, I would choose DGA. But it's difficult to really be 100% certain in a nasty, chaotic 33,000-dimensional system, obviously.
390
01:00:10.520 --> 01:00:11.150
Mihai Anitescu: All right.
391
01:00:13.400 --> 01:00:14.420
Mihai Anitescu: Other questions.
392
01:00:17.300 --> 01:00:18.620
Mihai Anitescu: I don't see anybody else.
393
01:00:26.180 --> 01:00:32.300
Mihai Anitescu: So I don't see any other hands raised, and I'll give a last call if people want to ask a question or something.
394
01:00:34.640 --> 01:00:40.130
Mihai Anitescu: All right, well, I thank Jonathan for a very stimulating talk. There's a personal request to you to please
395
01:00:42.650 --> 01:00:46.100
Mihai Anitescu: take that request. This was actually very informative, I thought, and