Indico.io API + text analysis of definitions: privacy, surveillance, interrogation

This was an interesting introduction to thinking a bit more deeply about how algorithms are engineered to learn sentiment and meaning analysis. The small CSV data set was a set of definitions of 'privacy', 'surveillance', and 'interrogation'.

I analyzed each definition as a line using the Indico API for "political analysis" and also "sentiment." The results seem pretty arbitrary, though I suppose the models gather their information from tags and things like that. A point to explore further: the internet is quite a two-dimensional way to gather understanding of complex human sentiment, issues, and tropes that have a spatio-temporal history. If AIs are the children of humanity, is that any way to teach a child? Come now.

It is fun, though, to play with the browser/GUI version and run words like "God" or "visage" or "face" or "help" (100% positive sentiment) or "hurt" (1% positive sentiment). https://indico.io

I reused some n-gram examples I had worked with a little in Python in Spring 2015's Python-based course Reading and Writing Electronic Text. I dabbled a bit in using the API to analyze each definition (one row of the CSV) as a line. Then I also analyzed each individual word which showed up more than twice in the text.
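The "more than twice" filter can be sketched in a few lines of Python. This is a minimal sketch and not the original script: the function name `frequent_words`, the regex tokenizer, and the three sample definitions are all placeholders invented for illustration. Each surviving word would then be handed to whatever analysis call is being used.

```python
import re
from collections import Counter

def frequent_words(lines, min_count=3):
    """Return {word: count} for words seen at least min_count times."""
    counts = Counter()
    for line in lines:
        # Assumption: lowercase and split on anything that is not a letter or apostrophe.
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return {word: n for word, n in counts.items() if n >= min_count}

# Placeholder definitions, invented for this sketch (not the real CSV rows).
definitions = [
    "Privacy is the state of being free from observation.",
    "Surveillance is close observation of a person or group.",
    "Interrogation is the formal questioning of a person under observation.",
]

for word, n in sorted(frequent_words(definitions).items()):
    print(word, n)
```

With a data set this small, mostly function words like "is" and "of" survive the filter, which hints at why word-level sentiment scores can feel arbitrary.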

  

TWO SACRAMENTS FOR THE EXORCISM OF LANGUAGE BY A RIGHTEOUS CYBORG

INSPIRATION

Script combining my Midterm and Final projects for the May 6, 2015 generative poetry reading in the Reis Lounge at Tisch, NYU. 


My final project was born of my fascination with the agricultural and economic legacy of the English language. Why is society so seemingly fucked?  Throughout the Spring semester I studied and explored a veritable buffet of concepts in energy, the Anthropocene, engineering ethics, and the future of "new media" and infrastructure, and of course electronic writing and reading as a sort of bas relief around human creativity in finding meaning in interpreting words expressed via the predictable/unpredictable results of programming logic, set in motion by human hand. 

This panoply of dense topics inspired me to explore the ideas of materialism and physicalism, which in turn inspired me to return to my old craft of screenwriting. I left film for ITP because I was sick of how manipulative and economically driven the medium and industry are. To quote Michael Corleone: "Just when I thought I was out, they pull me back in." So I decided to write an outline screenplay about a female-type cyborg who is seeded and raised in a host family as a control in a long-term study of human logic and illogic in navigating decisions toward the goal of total mutual benefit (in social and biodiversity spheres). The things that make this screenplay tenable for me are:

  • a) an interactive, screenplay-formatted website which I would like to design to dynamically tag action and dialogue blocks, lines, or words with an associative or literal visual reference.
  • b) a protagonist cyborg who realizes the only way to preserve and achieve her directive for mutual benefit is to destroy human language. This is what happens in the end: humanity is left to sort out its emotionally and materially determined future with humanity-wide aphasia.
  • c) The story follows Semi, the cyborg character, as she uses her creator-limited processes to help her human peers at home and at a research university navigate decisions with logic rather than emotion. Her talent is being able to project all possible paths of rigid bodies in vector space, as well as being able to use her understanding of human language interactions, behavioral psychology, and situational factors to do so. Much of the story structure is about how the human characters she encounters make poor social, ethical, material, and temporal decisions based on a misunderstanding of the nature of reality. For example, they constantly disregard her personhood when her advice for how they should act to ameliorate a poor decision stimulates a strong cognitive dissonance response. In many ways, I seek to use typical film drama as an instructive rage against emotions and the lies of language.

WHAT THE ABOVE HAS TO DO WITH MY GENERATED TEXT

In each of my classes, which were all extremely burdensome, existentially speaking, I chose to use the second half of the semester to work on an aspect of world building. For my RWET project I wanted to explore ways to use logic to generate my cyborg's non-cooperative, post-revelation dialogue with her creator, and her frustration with their inability to discuss the real issue at hand: the nature of reality.

VERBS VERSUS NOUNS

Learning about TextBlob and NLTK inspired me to first approach my crackpot brand of physicalism from a very simple place. I decided to design and generate a "prototype" of the kind of conversation my cyborg would have once she twigs to how shitty human language is at handling reality and representing the truth.

My central idea is: agricultural and colonial-enabled language has caused us to lose our innate animal understanding that everything is energy. Language, by weighting the concepts of property and labor as most important, has divided our understanding of the nature of reality, which is bundles of energy flowing through points in space at varying levels of stability or excitation, into two concepts which are harder to immediately synthesize retrospectively: TIME and MATTER. Time is the economic measure of the energy we output for those who control the narrative. Matter is the justification of property-holding.

For the dialogue between Semi and her creator I made the simplistic organizational rule that: 

  • any time Semi's creator asks a noun-heavy question, rather than parse his intended meaning she uncooperatively chooses to "assume" he wants to discuss "the illusion of matter, human economics and/or emotional content."
  • any time Semi's creator asks a verb-heavy question, she relates it to the concept of energy waves moving. She prefers any discussion which downplays the literal idea of physical particles or quanta.
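The rule could be sketched roughly like this. It is a minimal sketch under stated assumptions, not the original program: it takes `(word, tag)` pairs with Penn Treebank tags (the shape TextBlob's `.tags` property returns), the example questions are hand-tagged and invented for illustration, and sending ties to the noun side is my own arbitrary choice.

```python
def classify_question(tagged):
    """Label a question 'noun' or 'verb' heavy from (word, tag) pairs."""
    # Penn Treebank noun tags start with NN, verb tags with VB.
    nouns = sum(1 for _, tag in tagged if tag.startswith("NN"))
    verbs = sum(1 for _, tag in tagged if tag.startswith("VB"))
    # Assumption: a tie counts as noun-heavy.
    return "noun" if nouns >= verbs else "verb"

def choose_topic(tagged):
    """Pick which of Semi's two preoccupations the question gets routed to."""
    if classify_question(tagged) == "noun":
        return "the illusion of matter, human economics and/or emotional content"
    return "energy waves moving"

# Hypothetical, hand-tagged questions (not from the real question set).
q1 = [("Is", "VBZ"), ("your", "PRP$"), ("home", "NN"),
      ("a", "DT"), ("safe", "JJ"), ("place", "NN")]
q2 = [("Do", "VBP"), ("you", "PRP"), ("run", "VB"),
      ("or", "CC"), ("hide", "VB")]

print(choose_topic(q1))
print(choose_topic(q2))
```

Keeping the tagging outside the function means the same rule would work with tags from TextBlob, NLTK, or a hand-annotated list.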

CORPUSES

Nabokov's Lolita was the text I used to generate Semi's answers to Creator A's noun-heavy questions.  I used n-gram analysis to try and distill the essence of what is "the matter" in such a linguistically stellar handling of dysfunctional, self-involved, destructive decision-making and culture. 

I used Louis de Broglie's text on wave-particle duality for verb-heavy questions, again using n-gram analysis to get the most common phrases in the corpus. De Broglie was a contemporary of Einstein whose 1924 PhD thesis postulated the wave nature of electrons and suggested all matter has wave properties.

Creator A's questions are randomly selected from a psychometrics battery for child trauma. As Wikipedia says: "Psychometrics is a field of study concerned with the theory and technique of psychological measurement concerned with objective measurement of skills, knowledge, abilities, attitudes, personality traits, and educational achievement." I wanted to create the impression that Creator A is debriefing Semi, and I wanted to move the questioning beyond the Turing test references into something more deeply suggestive of the scenario.

Using Allison Parrish's RWET example code for n-gram analysis on the de Broglie thesis text:

n-gram analysis of de Broglie
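For readers without the course materials, here is a rough idea of what counting the most common n-grams involves. This is my own minimal sketch, not Allison Parrish's example code; the function name `most_common_ngrams` and the sample sentence are invented for illustration, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter

def most_common_ngrams(text, n=2, top=5):
    """Return the `top` most frequent n-word sequences in `text`."""
    words = text.lower().split()
    # zip over n staggered copies of the word list to form the n-grams
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams).most_common(top)

# Invented sample text standing in for the thesis corpus.
sample = "the wave and the particle and the wave"
for gram, count in most_common_ngrams(sample, n=2, top=2):
    print(" ".join(gram), count)
```

On real prose the top of this list is dominated by function-word pairs, which matches the preposition and conjunction phrases that ended up in the generated text.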


TEXTBLOB

I used Python's TextBlob module to find noun phrases and verbs in the creator's questions. Some pseudocode stylings:

some pseudo code to organize my program's structure. 


PERFORMANCE

My n-gram analysis text mash-up lacked Markov-chain sophistication, so it was a string of nonsensical preposition and conjunction phrases. In my reading performance I applied the notes I got after my final presentation: I went with a bombastic oratory style which fit my idea that Semi as a character is disdainful of human language. After my performance, peers and audience members volunteered that they could feel Semi's attitude from my reading of the text. I think it had an interesting effect alongside the context-heavy framing of my text cut-up/generation choices.

[reaching]


RWET MIDTERM: REJECTING MEANING IN TEXT & EXPLORING A PHYSICAL REPRESENTATION

I wanted to explore the meaninglessness of text by deconstructing or reducing it to an electronic data level: where in RAM are these strings, and exactly how much memory do they occupy in bytes and bits?

The form is a bit flat but I designed it to merely comply with the midterm assignment verbiage.

  • Devise a new poetic form.
  • Create a computer program that generates texts that conform to the new poetic form you devised.
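A minimal sketch of a generator for this form, assuming the numbers come from Python's built-in `id()` and `sys.getsizeof()`. That is my guess from the shape of the output, not something the original script confirms, and the function name `describe` is invented. In CPython, `id()` happens to be the object's memory address, and `sys.getsizeof()` includes per-object overhead, which would explain why every byte count in this post is exactly 37 more than the character count. The overhead differs between Python versions, so the figures printed here will not match.

```python
import sys

def describe(line):
    """Render one 'poem' line: address, length, bytes and bits of a string."""
    size = sys.getsizeof(line)  # bytes, including Python's object overhead
    return (
        f"Located at RAM address {id(line)}, '{line}' is a string of length "
        f"{len(line)} characters, {size} bytes and {size * 8} bits."
    )

print(describe("I am an invisible man."))
```

The address changes on every run, so no two readings of the same poem are identical.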

Questions of rigour: Could a human do it better? It would be unfortunate to have to calculate the bytes by hand, and a human could not possibly find the place in space in which electrons as binary digits/bits have settled.

How does your choice of source text (your "raw material") affect the character and quality of the poems that your program generates? I think the fact that these lines are celebrated grand openers for celebrated works of pure representation/fiction is a nice contrast to the excruciating factuality and lack of imagination of this "poetic form."

OUTPUT: 

Located at RAM address 4373218288, 'In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.' is a string of length 120 characters, 157 bytes and 1256 bits

Located at RAM address 4396564032, 'Take my camel, dear, said my Aunt Dot, as she climbed down from this animal on her return from High Mass.'  is a string of length 105 characters, 142 bytes and 1136 bits. 

Located at RAM address 4396885808, 'I am an invisible man.'  is a string of length 22 characters, 59 bytes and 472 bits.

Located at RAM address 4457509712, 'It was a queer, sultry summer, the summer they electrocuted the Rosenbergs, and I didn't know what I was doing in New York.' is a string of length 123 characters, 160 bytes and 1280 bits.

Located at RAM address 4456529592,  'The Miss Lonelyhearts of the New York Post-Dispatch (Are you in trouble?—Do-you-need-advice?—Write-to-Miss-Lonelyhearts-and-she-will-help-you) sat at his desk and stared at a piece of white cardboard.' is a string of length 204 characters, 241 bytes and 1928 bits.

Located at RAM address 4457509712,  'It was a queer, sultry summer, the summer they electrocuted the Rosenbergs, and I didn't know what I was doing in New York.' is a string of length 123 characters, 160 bytes and 1280 bits.

INPUT - 100 best first lines from modern novels

http://americanbookreview.org/100bestlines.asp

Example of ALTERNATE OUTPUT: 

Located at RAM address 4459266096, 'The human race, to which so many of my readers belong, has been playing at children's games from the beginning, and will probably do it till the end, which is a nuisance for the few people who grow up.' is a string of length 201 characters, 238 bytes and 1904 bits.

The binary representation of this sentence is:

1010100 1101000 1100101 100000 1101000 1110101 1101101 1100001 1101110 100000 1110010 1100001 1100011 1100101 101100 100000 1110100 1101111 100000 1110111 1101000 1101001 1100011 1101000 100000 1110011 1101111 100000 1101101 1100001 1101110 1111001 100000 1101111 1100110 100000 1101101 1111001 100000 1110010 1100101 1100001 1100100 1100101 1110010 1110011 100000 1100010 1100101 1101100 1101111 1101110 1100111 101100 100000 1101000 1100001 1110011 100000 1100010 1100101 1100101 1101110 100000 1110000 1101100 1100001 1111001 1101001 1101110 1100111 100000 1100001 1110100 100000 1100011 1101000 1101001 1101100 1100100 1110010 1100101 1101110 100111 1110011 100000 1100111 1100001 1101101 1100101 1110011 100000 1100110 1110010 1101111 1101101 100000 1110100 1101000 1100101 100000 1100010 1100101 1100111 1101001 1101110 1101110 1101001 1101110 1100111 101100 100000 1100001 1101110 1100100 100000 1110111 1101001 1101100 1101100 100000 1110000 1110010 1101111 1100010 1100001 1100010 1101100 1111001 100000 1100100 1101111 100000 1101001 1110100 100000 1110100 1101001 1101100 1101100 100000 1110100 1101000 1100101 100000 1100101 1101110 1100100 101100 100000 1110111 1101000 1101001 1100011 1101000 100000 1101001 1110011 100000 1100001 100000 1101110 1110101 1101001 1110011 1100001 1101110 1100011 1100101 100000 1100110 1101111 1110010 100000 1110100 1101000 1100101 100000 1100110 1100101 1110111 100000 1110000 1100101 1101111 1110000 1101100 1100101 100000 1110111 1101000 1101111 100000 1100111 1110010 1101111 1110111 100000 1110101 1110000 101110
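A binary rendering like the one above can be produced by converting each character's code point to base 2 and joining the results with spaces. This is a minimal sketch rather than the original script, and `to_binary` is an invented name. Like the output above, it leaves the numbers unpadded, so a space comes out as six digits (100000) and not a full 8-bit byte.

```python
def to_binary(text):
    """Return each character's code point in binary, space-separated."""
    return " ".join(format(ord(ch), "b") for ch in text)

print(to_binary("The human"))
# 1010100 1101000 1100101 100000 1101000 1110101 1101101 1100001 1101110
```

Swapping the format spec `"b"` for `"08b"` would pad every character to eight digits, if a byte-aligned look is preferred.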