Neuroscience and Computational Linguistics

By Kate Kulinski posted October 11, 2018


Markov chains alone are not effective when it comes to producing convincing English, so I wanted to implement something more intelligent alongside Lucian's markgen.py. Neural networks (an area I have zero experience in by the way!) seem to fit the bill. Let me explain why:

The human brain contains roughly 86 billion cells called neurons, connected together by synapses. If enough of a neuron's synaptic inputs fire, that neuron will also fire. Loosely speaking, we call this process "thinking".
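
The firing rule described above can be sketched as a simple threshold unit. This is only an illustration: the function name, the weights, and the threshold value are made up for the example, and real neurons (and real neural network libraries) are considerably more involved.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the summed weight of the active inputs reaches the threshold."""
    total = sum(weight for active, weight in zip(inputs, weights) if active)
    return total >= threshold


# Three synaptic inputs with hypothetical weights; the first two are firing.
inputs = [True, True, False]
weights = [0.6, 0.5, 0.9]

print(neuron_fires(inputs, weights, threshold=1.0))
```

With only the first input active, the summed weight (0.6) stays below the threshold and the unit stays silent.
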
Neural networks are just one example of technology borrowing from nature. What I'm trying to accomplish with Lucian, in fact, is prebuilt into our own nature: language acquisition. There are two competing theories explaining why this phenomenon occurs, how toddlers can grasp abstract concepts like tense and personal pronouns before learning to tie their shoes.

In a nutshell, B.F. Skinner's theory revolves around behaviorism. A boy from Guatemala, for example, acquires language through trial and error. Teachers, parents, or peers correct his attempts at Spanish until they are grammatically and phonetically correct. The reinforcement these humans provide parallels the "teacher" in supervised neural network training, which provides the "student" model with desired outputs (the outputs in this case being appropriate language use). User upvotes and downvotes will act as Lucian's reinforcers: an upvote when Lucian comments in comprehensible English, a downvote otherwise.
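
As a rough sketch of how votes could become a feedback signal, the snippet below tallies net votes per generated comment. The class name, method names, and comment IDs are hypothetical; this is not Lucian's actual code, just one simple way to turn upvotes and downvotes into a reward.

```python
from collections import defaultdict


class VoteFeedback:
    """Accumulate net votes per generated comment as a crude reward signal."""

    def __init__(self):
        self.rewards = defaultdict(int)

    def record(self, comment_id, upvotes, downvotes):
        # Each upvote counts +1 and each downvote -1 toward the reward.
        self.rewards[comment_id] += upvotes - downvotes

    def reinforced(self):
        # Comments with a positive net reward are the ones worth imitating.
        return sorted(cid for cid, reward in self.rewards.items() if reward > 0)


feedback = VoteFeedback()
feedback.record("comment_a", upvotes=14, downvotes=3)
feedback.record("comment_b", upvotes=1, downvotes=6)
print(feedback.reinforced())  # ['comment_a']
```
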

Noam Chomsky coined the term Language Acquisition Device (LAD) in the 1960s. LAD does not refer to a single structure in the brain, but to the brain's innate capacity to "acquire and produce" language. According to Chomsky, infants absorb white noise, hand gestures, and writing systems, then independently organize this novel information into different languages. Markov chains behave similarly. markgen.py, for example, sorts through a corpus of text in a language it does not understand, then over time discovers patterns within that text and produces new sentences based on those patterns:

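import markovify

# Raw string so the backslashes in the Windows path are not read as escape sequences.
with open(r"\Users\jetco\Documents\lucian-master\tops.txt", encoding='utf8', errors='ignore') as f:
    text = f.read()

# Build the Markov model from the scraped comments.
text_model = markovify.Text(text)

# Print five sentences of any length.
for i in range(5):
    print(text_model.make_sentence())

# Print 10,000 sentences of no more than 140 characters each.
for i in range(10000):
    print(text_model.make_short_sentence(140))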


        Rosie had been totalled.
        I was desperately hoping that whatever it is highly dependent on the water 100 yards ahead coming from a yacht.
        With a breath and answered.
        However, their excitement and eagerness wasn't enough to stop the car and let out a massive sigh.
        For years I thought the time we get there?
        Ends up letting us go with a sub-standard partner, but I absolutely don't think so?
        He had always been a seminary graduate but then asked to be written.
        He'd refused to let the crowd boos.
        The entire time through, that voice cut through to his nieces birthday and *had* to speed.
        But there aren't even real, and if this was after they won a championship.
        If even one person is involved in the first station to the Goblin in front of Ashlynn.
        There was a lot of potential for a moment, quietly considering the matter.
        I sat in his heart and tried to move out of Billy shadow.
        Charles sat alone again, scratching at his followers, all red-eyed, and stained.
        Ian, this is very real.
        > What about the length of contests?
        He says nothing, but gestures for her as she crosses her arms.
        Inside of its way.
        It was unbearable, it was well-written and well-executed, that person dies, themselves.
        Thats not possible, is this youre new policy?
        *What if this is very real.
        Somehow he knows the exact file to pick, and start digging your own time.
        And there was even real.
        This morning, after Id gotten my cup of coffee down on his wall.
        They happen to all people, everywhere, of every age group and vote again overall, but I feel give the best solution.
        Everyone else joined in on the ground in front of him.
        Gamblon shared a knowing look with his small collection of rocks.
        Then again, to this a dozen different multiverses... similarish plot, varying details... if that was stained yellow from the hilltops.
        One by one, the Master clicked the last image, the screen is a spot near the door that was magical.
        All of the contest?
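
To make the pattern-finding idea concrete, here is a minimal word-level Markov chain written from scratch. It is a simplified sketch, not how markovify is implemented (markovify tracks multi-word states and sentence boundaries); the function names and the toy corpus are invented for the example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=12, rng=random):
    """Walk the chain from `start`, picking each next word at random."""
    word, output = start, [start]
    # Stop at the word limit or when the current word has no recorded successor.
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran"
chain = build_chain(corpus)
print(generate(chain, "the", rng=random.Random(0)))
```

Because "cat" follows "the" twice in the toy corpus and "mat" only once, the chain is twice as likely to continue "the" with "cat". That frequency weighting is the whole trick: no grammar, only counts of what followed what.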
    
Through programming, I've realized I can draw on both theories to help Lucian acquire the composite writing style of Redditors. I can also use these principles to model the average dialect of Redditors themselves. Could the right combination of Markov chains and machine learning give rise to the internet's most convincing BroBot? We'll see!

What is art?

By Kate Kulinski posted October 5, 2018

The answer to this question is an opinion, held by critics and the general public, that shifts in response to individuals' education and experience. Each definition emerges from that individual's class, gender, race, and economic status, not from an objective measurement of technical skill. In short, none of them can be rationally proven right or wrong because the argument will eventually boil down to semantics. However, it is possible to distinguish popular, commercially viable art (think of the Marvel Cinematic Universe) from art that will not succeed in a particular marketplace.

Enter Reddit:

https://www.reddit.com/r/WritingPrompts

Adolf did so, his brush strokes trembling across the pallette. "Easy there tiger, try to keep yourself calm, now. Painting is all about being steady, confident." Adolf nodded again, and went this time, albeit a bit slower, and mixed another selection. After he had done this the stranger patted his shoulder. "Good, now let's see you paint a nice, open sky." "But how? I can barely paint the ground, let allow what lies above it!" Sighing, the man grabbed a firm hold of his arm and lifted it up.

Reddit's primary demographic is white, single, heterosexual American males aged 18-25. Nearly 18 million of these Redditors subscribe to the community linked above, a default subreddit where amateur writers respond to user-submitted prompts with an original short story. Readers then upvote their favorite pieces to the top of each thread. New question: by quantifying the average writing style of top commenters in https://www.reddit.com/r/WritingPrompts, could a machine produce prose that outvotes human-generated short stories? Here is my first step towards accomplishing that goal.  

import praw
import feedparser

reddit = praw.Reddit(user_agent='Top Comment Extraction (by /u/caturian)',
                     client_id='XXXXXXXX', client_secret="XXXXXXXX",
                     username='XXXXXXXX', password='XXXXXXXX')

# Parse the feed; each entry links to one Reddit submission.
feed = feedparser.parse(input('Enter an RSS feed:'))

for entry in feed['entries']:
    submission = reddit.submission(url=entry['link'])
    # Drop the "load more comments" placeholders, which have no body.
    submission.comments.replace_more(limit=0)
    with open("tops.txt", "a", encoding='utf8') as tops:
        for top_level_comment in submission.comments:
            # End each comment with a newline so stories do not run together.
            tops.write(top_level_comment.body + "\n")
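
The loop above saves every top-level comment. Since the goal is the style of top commenters, one possible refinement is to keep only the highest-scoring replies in each thread. The sketch below assumes comment objects that expose `body` and `score` (as praw's comments do) and uses a stand-in tuple so it runs without Reddit credentials; the helper name `top_comments` is hypothetical.

```python
from collections import namedtuple

# Stand-in for praw's comment objects, which expose `body` and `score`.
FakeComment = namedtuple("FakeComment", ["body", "score"])


def top_comments(comments, n=1):
    """Return the n highest-scoring comments that have a body."""
    scored = [c for c in comments if hasattr(c, "body")]
    return sorted(scored, key=lambda c: c.score, reverse=True)[:n]


comments = [
    FakeComment("story A", 12),
    FakeComment("story B", 340),
    FakeComment("story C", 57),
]
print([c.body for c in top_comments(comments, n=2)])  # ['story B', 'story C']
```
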

import markovify

# Get raw text as string.
with open("/Users/Kate/Documents/markovify-master/test/tops.txt", encoding='utf8', errors='ignore') as f:
    text = f.read()

# Build the model.
text_model = markovify.Text(text)

# Print five randomly-generated sentences
for i in range(5):
    print(text_model.make_sentence())

# Print 10,000 randomly-generated sentences of no more than 140 characters
for i in range(10000):
    print(text_model.make_short_sentence(140))

Racter was one of the earliest programs that could randomly generate English prose. This is a screenshot from the interactive fiction game developed in the early 80s. It basically produces word vomit, but it is still pretty intriguing. https://classicreload.com/racter.html