
Data Science Colloquium FY21 - Shared screen with speaker view
Mark Turner
15:44
https://colab.research.google.com/drive/1jLIyzFHqwOITjaNJ6fX7qob6-SnXX7NW?usp=sharing
Mark Turner
26:51
Link to the Data Science Colloquium schedule: https://cognitivescience.case.edu/data-science-colloquium/. That page contains the link to the Colab notebook mentioned in this talk.
Cathie Kelsey
47:34
I’m wondering how stream-of-consciousness text (e.g., Ulysses by Joyce) or poetry works as input text in this series of steps?
Michael Hemenway (he/him)
49:30
Great question, Cathie! It would be very interesting to try training GPT-2 further on Joyce and see what could be generated.
Raghav Sharma
52:44
can you set encomium (the corpus variable) to a .txt file the way we did for KJVbot?
Raghav Sharma
52:53
and if so, does it need to be formatted in a specific way?
Timothy Beal
53:10
Great question, Cathie! I can say I had no luck with Markov chains and Ulysses. Would be really interesting to experiment with GPT-2.
Cathie Kelsey
53:58
It would also tell us more about what Joyce was actually doing as he wrote, whether consciously or not.
Mark Turner
55:22
For those interested in NLP and its subfield NLU, as opposed to ASR, see http://redhenlab.org and https://sites.google.com/case.edu/techne-public-site/ for beginning tutorials.
Michael Hemenway (he/him)
56:17
Raghav, yes, we can change the input to a text file. The format of the text can be whatever you want.
Michael Hemenway (he/him)
56:30
but the format of the text will have an impact on results
Michael Hemenway (he/him)
56:49
many times, we spend more time pre-processing text than we do running the models.
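A minimal sketch of what that swap might look like, assuming the notebook keeps the corpus in a variable named encomium as mentioned above; the filename and the cleanup steps here are illustrative, not from the talk:

    # Illustrative only: replace the hard-coded corpus with the
    # contents of a .txt file, plus a little light pre-processing.
    with open("my_corpus.txt", encoding="utf-8") as f:
        encomium = f.read()

    # Pre-processing often takes more effort than the model run itself:
    # here we normalize whitespace and drop empty lines.
    encomium = "\n".join(
        line.strip() for line in encomium.splitlines() if line.strip()
    )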
Raghav Sharma
57:50
Thank you! Makes sense.
Timothy Beal
59:10
Justin was a student of Peter Knox at CU Boulder btw.
Michael Hemenway (he/him)
59:18
With the transformers library, you might have to load your text file into a text variable in order to use it with the simple functions like classifier().
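For example, a sketch using the Hugging Face transformers pipeline API; the filename and the truncation length are assumptions:

    from transformers import pipeline

    # The simple pipeline functions expect a string (or a list of
    # strings), not a file path, so read the file into a variable first.
    with open("my_corpus.txt", encoding="utf-8") as f:
        text = f.read()

    classifier = pipeline("sentiment-analysis")
    # Models have a maximum input length, so truncate a long corpus.
    print(classifier(text[:512]))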
Mark Turner
01:03:26
Well, in the 70s and 80s, this question was asked and answered everywhere, and the answers proved highly contested. A central reference would be John Searle’s Chinese Room article.
Cathie Kelsey
01:03:36
If the machine were a student, I would not call this understanding. It is the basis for understanding (which I think requires being able to do something constructive with the material).
Anne Helmreich (she/her)
01:04:35
Humanities answer: how are you defining thinking and understanding? It seems like the machine recognizes, and assembles parts into a whole based on recognition. Is that the same as thinking and understanding?
Mark Turner
01:04:49
https://plato.stanford.edu/entries/chinese-room/
Mark Turner
01:05:11
The argument and thought-experiment now generally known as the Chinese Room Argument was first published in a 1980 article by American philosopher John Searle (1932– ). It has become one of the best-known arguments in recent philosophy. Searle imagines himself alone in a room following a computer program for responding to Chinese characters slipped under the door. Searle understands nothing of Chinese, and yet, by following the program for manipulating symbols and numerals just as a computer does, he sends appropriate strings of Chinese characters back out under the door, and this leads those outside to mistakenly suppose there is a Chinese speaker in the room.
Mark Turner
01:06:13
The narrow conclusion of the argument is that programming a digital computer may make it appear to understand language but could not produce real understanding. Hence the “Turing Test” is inadequate. Searle argues that the thought experiment underscores the fact that computers merely use syntactic rules to manipulate symbol strings, but have no understanding of meaning or semantics. The broader conclusion of the argument is that the theory that human minds are computer-like computational or information processing systems is refuted. Instead minds must result from biological processes; computers can at best simulate these biological processes. Thus the argument has large implications for semantics, philosophy of language and mind, theories of consciousness, computer science and cognitive science generally. As a result, there have been many critical replies to the argument.
Timothy Beal
01:10:23
Anne’s comment raises interesting questions about frictions between the industry’s market-driven interests in these techs on the one hand (e.g., immediacy and invisibility) and our interests as humanists on the other (e.g., complexity, ambiguity, slowing down and making visible) …
Michael Hemenway (he/him)
01:12:02
great points, Anne and Tim. perhaps this is a place where humanists can help shape a more useful approach to understanding in natural language spaces
Timothy Beal
01:12:32
What possibilities are there for using these techs in ways that are subversive of such industry interests?
Anne Helmreich (she/her)
01:15:05
Sorry to leave.... thanks for sharing! And to Tim’s point, we have been thinking about that with respect to the machine learning/image analysis we have been doing, given some of the issues (putting it mildly) around image recognition… such a good conversation... gotta go supervise an intern!
Peter Yang
01:20:24
Does the transformer GPT-2 process datasets in foreign languages and their grammars? Can it be trained to identify grammar mistakes? That is, could it be used to correct foreign language tests?
Michael Hemenway (he/him)
01:21:54
Hi Peter. I believe GPT-2 focused on the English language. There are other transformers that are better at other languages.
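As a sketch of that point, a multilingual model can be loaded through the same pipeline interface; bert-base-multilingual-cased is one publicly available multilingual checkpoint, and the German prompt is just an illustration:

    from transformers import pipeline

    # A masked-language-model pipeline with a multilingual checkpoint.
    fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

    # German prompt: "Paris is the capital of [MASK]."
    for guess in fill_mask("Paris ist die Hauptstadt von [MASK]."):
        print(guess["token_str"], guess["score"])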
Mark Turner
01:22:05
https://openai.com/blog/better-language-models/
Timothy Beal
01:23:33
What is training a model?
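One way to unpack that question: training means iteratively adjusting a model’s parameters to reduce its error on example data. A toy sketch of the core idea (gradient descent on a one-parameter model, not GPT-2):

    # Toy illustration of "training": fit y = w * x by gradient
    # descent on squared error.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # pairs where y = 2x
    w, lr = 0.0, 0.05

    for epoch in range(100):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x  # d(error)/dw for squared error
            w -= lr * grad

    print(w)  # converges toward 2.0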
Timothy Beal
01:35:33
I have a quick thought ...
Mark Turner
01:41:43
Thanks to Justin and Michael. Awesome!
Raghav Sharma
01:42:08
thanks so much Justin and Michael!
Cathie Kelsey
01:42:32
Thank you!