WEBVTT

00:00.000 --> 00:18.640
All right. Sounds good. Okay. Good morning, everyone. On this lovely Sunday. My name is Sam

00:18.640 --> 00:23.040
Cheng. I'm the global lead scientist for Impact and Evidence at the World Wildlife Fund.

00:23.040 --> 00:28.080
I'm joined by Larry Kilroy, my colleague at DataKind. And we're really excited to share

00:28.160 --> 00:35.840
some of our reflections on developing a free, open, evidence synthesis tool for the past decade or so.

00:37.520 --> 00:41.920
So what we're going to do today is just talk a little bit about where did this come from? What

00:41.920 --> 00:47.040
was the challenge this was trying to respond to? Our experiences developing it, maintaining it.

00:47.680 --> 00:52.080
Larry will run us through the technical development of this tool and then talk a bit about what's

00:52.080 --> 01:03.600
coming next. Okay. All right. So we all know the world is changing pretty fast. And at one point,

01:03.600 --> 01:10.800
a reporter in the Washington Post said, what if the solution to all of our world's greatest challenges

01:10.800 --> 01:17.440
has already been solved? It's just hiding in a PDF that nobody is ever going to read. And they

01:17.520 --> 01:22.560
published this article because there was a report from the World Bank over ten years ago

01:22.560 --> 01:26.480
showing that out of the tens of thousands of reports they published every year about their projects,

01:26.480 --> 01:31.840
about policy analyses, only a third had ever been downloaded, which does not mean they've been read.

01:33.600 --> 01:39.600
And so if you take that alongside the fact that research output is rising exponentially. We're

01:39.600 --> 01:44.960
getting new papers published every single year. It's on track to double every nine to ten years.

01:45.760 --> 01:52.480
So on one hand, right, this is really, really exciting. There's a lot of information and

01:52.480 --> 01:58.880
evidence that's out there that we can use as practitioners and policy makers to inform our work.

01:58.880 --> 02:04.320
But we need that information in a timely way, in a digestible way, so that it's usable,

02:04.320 --> 02:08.160
so we can actually feed it into the decision-making processes at the right time.

02:09.040 --> 02:16.400
And this is where approaches called evidence synthesis come in. So this is a really broad

02:16.400 --> 02:22.160
set of approaches that exist out there. They help us capture, review, and synthesize existing knowledge

02:22.160 --> 02:27.520
for specific topics and questions. And so this really came from the medical field. So anytime

02:27.520 --> 02:32.960
you go to the doctor, they might make a recommendation. That's informed by medical guidelines.

02:33.040 --> 02:38.000
Those medical guidelines are updated based on published literature on a regular basis.

02:38.720 --> 02:43.040
And what's really valuable about these approaches is that they're transparent.

02:43.040 --> 02:47.600
They aim to minimize any potential bias that might be in the data and affect

02:47.600 --> 02:52.320
the types of insights you get out of them. And they're also reproducible, meaning that you can

02:52.320 --> 02:57.200
repeat that analysis at any time so you can update the evidence base to keep up with all that

02:57.200 --> 03:04.640
stuff that's getting published. So at WWF, we are a science-based organization. We are trying

03:04.640 --> 03:12.160
to develop solutions to answer key conservation challenges so that we can halt the degradation

03:12.160 --> 03:19.200
of the environment, conserve biodiversity, address all those drivers that impact nature,

03:19.200 --> 03:22.960
including climate change and human development, and ensure the well-being of people.

03:23.600 --> 03:29.840
But to do that, we need timely access to reliable and relevant information at the time

03:29.840 --> 03:34.240
scales at which we are trying to make decisions that are tailored for the places where we're working.

03:37.040 --> 03:43.920
So in response to this, I feel like I'm aging myself here. Like 16 years ago or more,

03:44.800 --> 03:49.440
my colleagues and I across a number of different conservation organizations and development

03:49.440 --> 03:54.480
organizations said, okay, let's take advantage of the decades of research on conservation.

03:54.480 --> 03:59.040
So we can understand what do we know about the impacts of nature conservation on people?

03:59.760 --> 04:04.800
We thought this was going to be a weekend project. So there's that.

04:06.160 --> 04:12.720
And we ended up looking at 35,000 published citations. Taking all that,

04:12.720 --> 04:20.000
winnowing it down to about 3,000 relevant articles and getting to about 1,000 included in that map.

04:20.000 --> 04:24.400
And there were a lot of really exciting insights there, which helped us understand where we had

04:24.400 --> 04:31.360
knowns and known unknowns and gaps we needed to address. But it took us three years. So by the time

04:31.360 --> 04:36.480
we did that, our knowledge was three years out of date. And to do it again, it would take us another

04:36.480 --> 04:41.520
three years. And so it very quickly became a very daunting task of figuring out how we could

04:41.520 --> 04:44.960
actually use this on a regular basis to inform conservation work.

04:46.720 --> 04:51.440
So that leads me to the second part of the talk. To talk about the challenges of synthesis,

04:51.440 --> 04:58.560
to really make them scalable for decision making on the whole. The first piece is obviously the

04:58.560 --> 05:05.120
efficiency. We are taking a lot of time. We have to sort through a lot of stuff to get to that

05:05.120 --> 05:13.600
proverbial evidence needle in the information haystack. And so to do this process manually

05:13.600 --> 05:22.480
is quite literally impossible. Which leads me to this point that not only are they inefficient,

05:22.480 --> 05:29.520
they're also sometimes insufficient to look for the right information. We looked across the environmental

05:29.520 --> 05:36.160
sector. We found that for similar types of studies, authors are looking at around 14,000 citations

05:36.160 --> 05:41.760
to get to really only 2.5% of those that were actually relevant to what they're looking for.

05:41.760 --> 05:46.640
So obviously there are a lot of reasons why you might not include things. But ultimately what the

05:46.640 --> 05:52.320
verdict is here is people are looking through a lot of potentially irrelevant stuff to get to what they

05:52.400 --> 05:59.680
need. Which leads to the second challenge. It's really costly. So all that time it takes to search

05:59.680 --> 06:05.920
through all those citations costs money. This is a fairly old study, but I think still quite

06:05.920 --> 06:13.840
relevant, not accounting for inflation. But on average 30,000 to 300,000 US dollars to do one

06:13.840 --> 06:21.440
synthesis, not updated. And it can take anywhere from six months to a few years. And so that's

06:21.520 --> 06:25.280
not really sufficient if we want to generate evidence at the timeline that we actually need to

06:25.280 --> 06:30.320
make these decisions. You can imagine somebody in a government agency is like, we've got

06:30.320 --> 06:34.960
to pass this policy, we've got to form it in the next couple of months. And we're like, ask us in three

06:34.960 --> 06:42.960
years and we'll help you. And so any sort of approach we take to making this process faster,

06:43.600 --> 06:47.520
we also need to take into account the third challenge, which is that the value of these evidence

06:47.520 --> 06:53.840
synthesis approaches is that they're trustworthy, they're rigorous. And so whatever the outputs

06:53.840 --> 06:59.920
of any sped-up process is, we need to make sure that they still maintain this component,

06:59.920 --> 07:04.880
because the value of these approaches is that you know the evidence hasn't been cherrypicked

07:04.880 --> 07:13.520
to match a particular decision or finding or position. So it's within this context that

07:13.520 --> 07:21.360
Colandr was founded and launched in 2017. So what is Colandr? So with the Science for Nature

07:21.360 --> 07:27.600
and People Partnership that was led at the time by Conservation International, we decided that

07:27.600 --> 07:33.520
evidence synthesis was something important that we wanted to pursue and support across the conservation

07:33.520 --> 07:39.600
sector. But we needed to find a way to do it faster, more efficiently and dynamically. And so

07:39.600 --> 07:45.680
we partnered with DataKind to develop an open access, open source, computer-assisted software

07:45.680 --> 07:50.000
platform for evidence synthesis called Colandr, because you're sifting stuff through

07:50.000 --> 07:58.240
to find what you need. And so the way Colandr worked was really to have an easy to use

07:58.240 --> 08:03.200
front-end platform where users could input what they were looking at to help them sort through

08:03.200 --> 08:10.640
faster and intelligently label that information. So when we developed Colandr,

08:10.640 --> 08:15.440
we had some key features in mind. The first is that we had to maintain a human in the loop.

08:15.440 --> 08:20.480
It couldn't be fully automated. Again, the value of these approaches is that they're transparent

08:20.480 --> 08:24.320
and we wanted to maintain that trust. So one of the features that's built into Colandr

08:24.320 --> 08:28.640
is an active learning approach where users are involved through the whole process,

08:28.800 --> 08:35.840
maintaining that oversight. And the second is we wanted to make sure that it was open.

08:35.840 --> 08:38.720
So at the time that we developed Colandr, over 10 years ago,

08:39.840 --> 08:44.560
there were lots of different approaches trying to apply machine learning to evidence synthesis

08:44.560 --> 08:48.320
in different softwares. But we didn't know what people were doing. They were opaque,

08:48.320 --> 08:52.640
no one was really talking to each other. And so every time someone tried to start something new,

08:52.640 --> 08:56.320
they were basically reinventing the wheel. And this took a lot of time and resources for

08:56.480 --> 09:01.200
developers to jump in, dive in, and figure out what they were going to do. But then the second part

09:01.200 --> 09:05.920
of it was seeing that there's a lot of opportunity to apply these various different AI approaches

09:05.920 --> 09:11.760
to synthesis, but we needed to foster some kind of community of practice to learn from each other.

09:11.760 --> 09:16.000
And so that was the rationale behind pursuing an open source approach.

09:17.120 --> 09:23.120
And the third is Colandr's open access. It's free to use. And this is really,

09:24.000 --> 09:29.600
I think predicated on the challenge that if we want to be able to use evidence, we should be

09:29.600 --> 09:34.640
able to use evidence everywhere. Everyone should be able to use it. Some of the biggest challenges in

09:34.640 --> 09:39.280
the world, particularly for conservation, are in some of the countries with the fewest resources.

09:39.280 --> 09:44.880
And so researchers in low- and middle-income countries don't always have access to really expensive

09:44.880 --> 09:50.720
research platforms and tools. Moreover, if we want to update them, they again cost money. And so this

09:50.720 --> 09:57.680
is something that was intended to help accelerate the field forward. Okay, so in the past 10 years,

09:59.040 --> 10:04.240
we've seen the user community for Colandr really grow organically. We have over 6,000 active

10:04.240 --> 10:10.720
users now, nearly 4,000 past and ongoing reviews that are being conducted. And it's really been used

10:10.720 --> 10:15.840
across a lot of different fields. We've built this for conservation, but turns out it was applicable

10:15.840 --> 10:21.360
and answered a need across health, you know, education and social development, environmental management,

10:21.920 --> 10:27.520
you name it. And so some of those users are folks like the ISEAL

10:27.520 --> 10:31.840
Alliance, who's been using it to extract information about certification impacts, right? Like forest

10:31.840 --> 10:37.600
certification, product certification. The American College of Physicians has been looking at the

10:37.600 --> 10:41.840
machine learning algorithms that are behind Colandr to help them improve their synthesis approaches

10:41.920 --> 10:48.160
in medicine. And then for us at WWF, we are working to harness some of the work in Colandr

10:48.160 --> 10:51.440
to help us figure out how we maintain living evidence in conservation.

10:55.600 --> 10:59.280
But the other thing that's been really exciting is that this community of users has

10:59.840 --> 11:05.040
helped grow some of the features, right? We built Colandr as sort of an initial tool.

11:05.840 --> 11:09.680
People then came and said, this is great. We want to build some add-ons. We want to build tools

11:09.840 --> 11:15.680
that can complement and enhance functionality, leveraging things like R so that they could create

11:15.680 --> 11:20.720
trackers and create things that would pipe the outputs to other tools. They've been helping

11:20.720 --> 11:24.640
track the research performance, conducting studies, comparing Colandr's performance to other

11:24.640 --> 11:30.800
tools that are out there and providing feedback on how we could continue to develop it to respond to

11:30.800 --> 11:39.440
changes. But this also raises some of the challenges. In 2016, Colandr was built by a team of volunteers,

11:39.760 --> 11:45.520
working on their nights and weekends. And then over the past nine years, it's continued to be

11:45.520 --> 11:51.040
maintained by these volunteers with sporadic support from various institutions, really following

11:51.040 --> 11:56.480
me around to multiple institutions over time. But we've been lucky that we've had a really

11:56.480 --> 12:01.600
supportive and committed user community that has allowed us to continue to see the value in

12:01.600 --> 12:07.360
maintaining this over time. But it's on that note that I'm going to hand this over

12:08.560 --> 12:15.120
to Larry to talk about the technical development and what's next for Colandr.

12:19.760 --> 12:24.000
Okay, hopefully that helps. Excellent. All right, so the technical development of

12:24.080 --> 12:32.560
Colandr. DataKind is a nonprofit organization, and we're trying to close the gap

12:32.560 --> 12:38.640
between the tools that non-social-impact organizations have and use every day and the ones social

12:38.640 --> 12:47.360
impact organizations use every day. So this was an early application of machine learning and a lot

12:47.360 --> 12:52.640
of the base machine learning we're using was key in the development of what we now think of

12:52.640 --> 12:59.280
as Gen AI and large language models. And so Colandr was built as a two-system approach.

13:00.320 --> 13:08.400
One, looking at inputs of search results from existing systems and the other uploading articles

13:08.400 --> 13:16.880
for full text assessment, and we'll dig into each of these. So on the first, we're using a

13:16.880 --> 13:25.840
technique called word2vec, developed around 2013 by Google researchers.

13:26.720 --> 13:32.800
And it's a technique that transforms text into numerical vectors. We have a little example here

13:35.040 --> 13:40.240
where it takes "I like NLP" and "I love cats" and, based on the association of the words, starts to

13:40.240 --> 13:46.720
create a scoring system. And so the way Colandr uses this is we use it to identify

13:46.720 --> 13:53.120
co-occurrences of words around user-defined key terms and we actually read the abstracts and

13:53.120 --> 13:59.840
provide relevance to that. And this is what the system one diagram looks like.
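
NOTE
A minimal sketch of the co-occurrence idea described in this part of the talk, not Colandr's actual code. It swaps in simple count-based vectors for word2vec's trained embeddings; the function names `cooccurrence_vectors` and `cosine`, the window size, and the use of the two slide sentences as a corpus are assumptions made for illustration.
```python
from collections import Counter, defaultdict
from math import sqrt
# Toy corpus: the two example sentences mentioned in the talk.
SENTENCES = ["I like NLP", "I love cats"]
def cooccurrence_vectors(sentences, window=1):
    """Count how often each word appears within `window` words of another."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors
def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
vectors = cooccurrence_vectors(SENTENCES)
# "like" and "love" share the neighbour "i", so they score above zero.
print(cosine(vectors["like"], vectors["love"]))
# "like" and "i" have no neighbours in common.
print(cosine(vectors["like"], vectors["i"]))
```
Words that appear in similar contexts end up with similar vectors, which is the property a relevance ranking around key terms can build on.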

14:03.120 --> 14:08.560
The second technique we're using is referred to as GloVe, which is Global Vectors. So we're capturing

14:08.560 --> 14:14.160
global vectors. Most of this work is in the English language. So capturing in the English language

14:14.880 --> 14:22.000
and we are generating word vectors based on occurrence and relevance statistics. And then it

14:22.000 --> 14:29.520
creates a probability of what given words actually mean based on how they're used in the

14:29.520 --> 14:36.560
presence of other words in the system. So this is a slightly different flow. So we would read in

14:36.560 --> 14:42.000
actually like a PDF and then apply this technique. So we aren't pulling any other information from

14:42.000 --> 14:49.360
another abstract system. So all of this, and I'm going to jump, I hate to do this to everyone.

14:50.160 --> 14:52.720
But thank you for your inspiration. We're going to do some live stuff.

14:56.080 --> 15:05.600
That whole back end goes out to a REST API. So right now we have built a user interface.

15:05.600 --> 15:12.080
It's based in English. But our hope is that with this published API, others can build their own

15:12.080 --> 15:20.960
applications to it. So basically all the documentation is here. For example, it's easy, before

15:20.960 --> 15:26.320
you commit to building, to go ahead, and we would provide any organization with tokens and authentication.

15:27.280 --> 15:30.320
You can just try it out in this interface before you commit to writing any code.

15:30.480 --> 15:39.520
So that's the basis of that, what we call the back end. And then we have built a front end.

15:39.520 --> 15:46.480
And this is meant to make it easy to use. Here we go. Oh, there we go. It's working.

15:47.280 --> 15:50.000
Knowing we're limited on time, I'll just show a couple of the key features.

15:52.160 --> 15:55.680
Actually, I'm going to back out. Well, I'll do this first. And then we're kind of going to reverse.

15:56.400 --> 16:01.440
So here, you know, there's some basic tools. You could have quite a bit of these uploaded

16:01.440 --> 16:06.240
citations, especially coming from a system. It might be hundreds at a time. And so this is meant

16:06.240 --> 16:12.080
for teams to use. So teams of researchers can go through and use this. And you can sort by relevance

16:12.080 --> 16:18.800
or you can sort by recently added. If we go ahead and pick one, then you have the information

16:18.800 --> 16:24.000
you need to go ahead and start cataloging. You know, if you wanted to enter a tag,

16:25.600 --> 16:32.960
I'm just going to say, I'm going to say water. Keep it interesting. This is the test database,

16:32.960 --> 16:40.000
don't worry. And then that allows you to eventually filter by all the different tags that

16:40.000 --> 16:46.800
have to do with water, etc. And then, of course, you can go ahead and include it or exclude it.

16:46.880 --> 16:52.240
So similar to the system we just saw. And then once it's included or excluded, it'll be, you know,

16:52.240 --> 16:58.480
in the completed phase. If it's in conflict with another reviewer's, it shows up in that conflict,

16:58.480 --> 17:04.240
and then they can be resolved going forward. Again, knowing we are limited on time,
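
NOTE
A minimal sketch of the include/exclude workflow just described, not Colandr's actual data model. The reviewer names, the tuple layout, the "pending" state, and the `reviewers_required` parameter are all hypothetical and chosen only to illustrate how completed and conflicting citations could be told apart.
```python
from collections import defaultdict
# Hypothetical screening decisions: (citation_id, reviewer, decision).
DECISIONS = [
    (1, "reviewer_a", "include"),
    (1, "reviewer_b", "include"),
    (2, "reviewer_a", "include"),
    (2, "reviewer_b", "exclude"),
    (3, "reviewer_a", "exclude"),
]
def screening_status(decisions, reviewers_required=2):
    """Classify each citation as pending, completed, or in conflict."""
    votes = defaultdict(dict)
    for citation_id, reviewer, decision in decisions:
        # A reviewer's latest decision replaces any earlier one.
        votes[citation_id][reviewer] = decision
    status = {}
    for citation_id, by_reviewer in votes.items():
        if len(set(by_reviewer.values())) > 1:
            status[citation_id] = "conflict"
        elif len(by_reviewer) < reviewers_required:
            status[citation_id] = "pending"
        else:
            status[citation_id] = "completed"
    return status
print(screening_status(DECISIONS))
```
Every decision here still comes from a person; the function only reports where reviewers agree and where a conflict is waiting to be resolved.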

17:06.000 --> 17:12.240
let's just go quickly. We want to take a quick look at, we want to make it easy for

17:12.960 --> 17:18.400
importing information into the system. So, you know, we just have a basic import here,

17:19.600 --> 17:24.960
get some data information, make sure you can identify the data source, etc. And then it's

17:24.960 --> 17:29.600
pretty much drag and drop. So you can drag collections in here, things like that, keep that very simple.

17:31.840 --> 17:37.520
And with that, I'm going to skip back because we're wrapping up here to, and I'll go back to full screen.

17:37.920 --> 17:46.880
That was the human in the loop part. So that is how we maintain that human in the loop.

17:48.080 --> 17:53.920
So what's next? So as Sam mentioned, it's been maintained by a very small group.

17:54.560 --> 17:59.680
We're hoping to expand that. As we get more users, particularly more university users,

17:59.680 --> 18:04.480
we've had requests, can we contribute to the code? So we wanted to make some changes. The early

18:04.480 --> 18:10.320
versions were written in Scala, which is a JVM language, so not as well used these days.

18:10.960 --> 18:16.880
By researchers especially, it's not used at all. So we've refactored that entire back end to Python.

18:17.520 --> 18:22.720
Something that can be used. We've also, as you saw, that's the brand new API. We've refactored the

18:22.720 --> 18:29.360
API to FastAPI, which used to be called Swagger. And we've rebuilt the front end in ReactJS,

18:29.360 --> 18:34.400
which is one of the most popular JavaScript libraries. So it should be easy to

18:34.400 --> 18:40.560
contribute. The big thing, anytime you're committing: we've also had to build out our test suite.

18:40.560 --> 18:44.480
It didn't have a lot of coverage. So that way, it's easier for our team to test any incoming

18:45.440 --> 18:52.800
pull requests and merge. It's up on GitHub under an MIT license. So people can take this and improve it.

18:52.800 --> 19:02.400
The next thing we've done is we have through email primarily, which Sam has been the full customer

19:02.400 --> 19:08.240
support. We have brought in all of those and started creating GitHub issues. And we're

19:08.240 --> 19:12.320
hoping that, especially for people who are early in open source and want to get their

19:12.320 --> 19:18.080
public commits up. This is a great place to start, find an issue, solve the issue. And then finally,

19:18.080 --> 19:23.440
we're establishing maintainer policies. Hopefully not the same four people who will be doing it

19:23.440 --> 19:29.040
all the time. So how people can also help maintain and approve what's coming in. And the last step

19:29.040 --> 19:35.360
is very new, which a lot of the ML we're using has been used in creating LLMs. But now we want to

19:35.360 --> 19:40.000
explore how LLMs can also supplement this work. And that'll be a big focus in the next year or two.

19:40.560 --> 19:51.040
With that, thank you, and questions. And I'll make this mobile and so depending on who I say answer.

19:54.640 --> 19:55.680
We'll start left to right.

19:55.680 --> 20:12.560
So interesting enough, a lot of the feedback we got. Oh, the question was, thank you. That's

20:12.560 --> 20:17.760
why I just put my glasses on. The question was, are we considering redesigning the user interface

20:17.760 --> 20:22.720
based on feedback that we've gotten from Colandr 1 to Colandr 2.0? The answer

20:22.720 --> 20:29.040
is, there are a number of changes we've made because Colandr 1 wasn't up to date with all the various

20:29.040 --> 20:35.200
standards like GDPR and others. So when we hacked those in, they became pretty ugly. So it's a

20:35.200 --> 20:39.360
much better looking interface. But for the major features, we've actually got a lot of feedback

20:39.360 --> 20:44.240
that the simplicity of the user interface is one of the, one of the selling points of this version

20:44.240 --> 20:49.120
of Colandr. So we don't have any major changes planned, except for improving around like the

20:49.120 --> 20:53.520
drag and drop technology, things like that, and making sure that that's maintained and maintainable.

20:53.520 --> 20:59.200
We do hope that there will be contributions of new features though, and that if it requires

20:59.200 --> 21:01.520
some adjustment, we would do.

21:08.800 --> 21:12.640
I'm going to hand this to Sam, but the question was, where do the research articles come from?

21:12.640 --> 21:15.200
Are we limited to open access?

21:16.160 --> 21:22.000
That is a great question. So if you're a Colandr user, you have to use your own access licenses,

21:22.000 --> 21:27.600
so whether you're at a university or wherever else, you are the ones searching for those publications

21:27.600 --> 21:30.560
and uploading them into Colandr, so we won't search the databases for you.

21:38.160 --> 21:43.280
The question is, I mentioned Colandr 2 is coming soon. How soon? Hopefully within the next six weeks,

21:43.840 --> 21:48.320
we're in final testing right now, and then I saw a hand over here.

21:48.320 --> 21:53.120
Yes, I'm someone who does conservation research, I've done it for many years, thank you for

21:53.120 --> 21:59.280
what you do. My question is, I saw the words include and exclude on the demo just now,

21:59.280 --> 22:05.840
so for any particular paper, would Colandr tell me, you know, why

22:06.560 --> 22:08.960
it suggests inclusion or exclusion?

22:10.000 --> 22:15.200
The question is, there is an include and exclude function during that review process,

22:15.200 --> 22:20.080
and would we have the reasoning that someone else could access why it was included or excluded?

22:21.280 --> 22:27.440
That is a great question, so because all decisions in Colandr have to be made by a person,

22:28.000 --> 22:32.800
when you say include, that means, you know, it meets all the criteria. Your team should have met

22:32.800 --> 22:37.600
and decided these are all the things it must fulfill. In the exclude, when you click on it,

22:37.600 --> 22:41.200
it gives you a whole set of options. You can define all those options yourself, right?

22:41.200 --> 22:48.080
So say you're interested in studying the impact of restoring mangroves on storm surge, right?

22:48.080 --> 22:53.600
You get a lot of storms, you can reduce flooding, and you might say something like, oh,

22:53.600 --> 22:58.000
this paper is about mangrove restoration, but it's not looking at the impact of storm surge

22:58.000 --> 23:02.400
reduction, so you'd say no, like irrelevant outcome or whatever, so you can really come up with

23:02.400 --> 23:07.040
any reasons you want, you can add whatever tags you want, and that's to keep that whole process

23:07.120 --> 23:12.080
transparent, but it's not going to tell you you should include this or you should not include this.

23:12.080 --> 23:15.600
It's just going to say this seems more relevant to what you've been saying is relevant.

23:16.880 --> 23:20.800
And I'll just add that's an area we're thinking about where LLMs could be helpful. It could,

23:20.800 --> 23:25.200
you know, synthesize what all different teams have said about that, and maybe then we can maybe

23:25.200 --> 23:29.200
have some sort of suggestion feature. Oh, yes.

23:29.280 --> 23:45.200
Did you say OpenAlex? Well, one, I have not heard of OpenAlex, so I have not considered it.

23:46.240 --> 23:51.200
So the question was, have we considered using OpenAlex as a source for reviews?

23:51.200 --> 23:55.840
So I think that's, I think that's a great suggestion for the evidence synthesis community, right?

23:55.840 --> 24:00.480
So the choice of where you look for papers and where you're looking for them is really important.

24:00.480 --> 24:05.360
If you look in, you know, a really small database, you're going to have a small set of things come back.

24:06.720 --> 24:11.120
But I think that's something to go back to: okay, where are the open source resources

24:11.120 --> 24:15.840
we can be relying on as an evidence synthesis community that are actually going to allow us

24:15.840 --> 24:20.160
more visibility to all the things that are out there. But Colandr at this time doesn't pull from it.

24:25.920 --> 24:40.560
Thank you. Thank you. So next up, we are going to stream a video from a speaker,

24:40.560 --> 24:46.000
who couldn't make it because of visa considerations.

