Inside-Out Markov Chain

Idea for hybrid of ELIZA-bot and Markov Chain for CoachBot.

  • instead of merging the code, just be redundant. Reply with 2 lines
    • statement relating last-user-statement to corpus
      • but doesn't make sense to trigger this off the first word of the user-statement. So find the keyword. Or more precisely, the keyword that ELIZA picks.
      • then build sentence by working backward and forward (inside-out) from that keyword with a Markov Chain.
    • then ELIZA mirror-question

Nov18'2017: code for Markov

  • built markov_rev at same time as markov
    • only working with chain_length=1
  • new generate_sentence1(word) builds sentence from inside out
  • Nov21: finish function, tweak to handle multi-word seed.
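
A minimal sketch of the inside-out idea, not the actual code. Only the names markov, markov_rev, generate_sentence1 and chain_length=1 come from the notes above; build_chains, max_words and rng are made up for illustration. It assumes each corpus line is split on whitespace, and grows the sentence backward from the seed's first word, then forward from its last word.

```python
import random
from collections import defaultdict


def build_chains(lines):
    """Build forward and reverse first-order chains (chain_length=1) in one pass."""
    markov = defaultdict(list)      # word -> words seen immediately after it
    markov_rev = defaultdict(list)  # word -> words seen immediately before it
    for line in lines:
        words = line.split()
        for prev, nxt in zip(words, words[1:]):
            markov[prev].append(nxt)
            markov_rev[nxt].append(prev)
    return markov, markov_rev


def generate_sentence1(seed, markov, markov_rev, max_words=20, rng=random):
    """Grow a sentence outward from the seed: backward first, then forward.

    max_words and rng are hypothetical parameters; the caps only exist to
    stop the walk when the chain loops.
    """
    words = seed.split()  # a multi-word seed stays intact in the middle
    if not words:
        return ""
    # Walk backward from the first seed word using the reverse chain.
    while len(words) < max_words and markov_rev.get(words[0]):
        words.insert(0, rng.choice(markov_rev[words[0]]))
    # Walk forward from the last seed word using the forward chain.
    while len(words) < 2 * max_words and markov.get(words[-1]):
        words.append(rng.choice(markov[words[-1]]))
    return " ".join(words)
```

With a multi-word seed, only the first and last seed words are used for lookups, which is one simple way to handle it while staying at chain_length=1.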

Nov20: start to build corpus

  • take entire PrivateWiki contents, cat together into 1 file
  • remove empty lines, leading spaces and *, and URLs
  • sort lines, showing lots of other crazy cases to clean up with regex...
  • Nov21: mostly cleaned up.
    • Definitely some issues, but good enough to generate some interesting stuff.
    • Strangely, have a fair number of multi-sentence paragraphs, which means I'd expect to see some weird period-ending-words in the middle of generated phrases, but it's happened only a couple of times.
    • Of course it's still just a Markov Chain so it's 80% nonsense. But it's meant almost like a Free-Association engine, which makes sense if the corpus is your own thoughts, but probably not super effective if it's someone else's.
    • Also realizing that because my PrivateWiki has lots of lists and cross-references, it generates some useless short phrases. Probably need to clean out the corpus. On the other hand, as I browse it, I see some interesting pithy little phrases, so I'm not sure... Hmm, maybe I'll output 3 sentences for each ELIZA interaction...
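
A rough sketch of the first cleaning pass described above (drop empty lines, strip leading spaces and *, remove URLs, then sort so the odd cases stand out). The function name and both regexes are assumptions, not the patterns actually used, and the URL pattern only catches http/https links.

```python
import re

# Hypothetical patterns; the real cleanup needed more cases than these.
URL_RE = re.compile(r"https?://\S+")
LEADING_RE = re.compile(r"^[\s*]+")


def clean_corpus(lines):
    """Return cleaned, sorted corpus lines with empty lines dropped."""
    cleaned = []
    for line in lines:
        line = URL_RE.sub("", line)       # remove URLs anywhere in the line
        line = LEADING_RE.sub("", line)   # strip leading spaces and bullets
        line = " ".join(line.split())     # collapse leftover whitespace
        if line:
            cleaned.append(line)
    # Sorting puts similar junk next to each other, which makes it easier
    # to spot the remaining cases that need another regex.
    return sorted(cleaned)
```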

Next: find place in ELIZA to branch out to this.

  • Also have to decide how to save markov and markov_rev dictionaries so they can be used repeatedly. Just pickle, or treat like a server? Or maybe just initiate from inside ELIZA daemon itself, so tied to that session length. Yeah, just start with that.
  • using the old Joe Strout eliza.py code, not the pyAIML stuff.
  • Nov27: have ELIZA calling the Markov Chain bits.
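
For reference, the pickle option that was set aside could be as small as the sketch below. save_chains, load_chains and the file layout are hypothetical; it assumes the two chains are plain dicts (or defaultdicts) of word lists.

```python
import pickle


def save_chains(path, markov, markov_rev):
    """Write both chains to one pickle file as plain dicts."""
    with open(path, "wb") as f:
        pickle.dump({"markov": dict(markov), "markov_rev": dict(markov_rev)}, f)


def load_chains(path):
    """Read both chains back; returns (markov, markov_rev)."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["markov"], data["markov_rev"]
```

Building the chains at ELIZA startup avoids the extra file, at the cost of re-reading the corpus every session.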

Next: deal with details

  • no-match case: when input word doesn't exist in corpus at all
  • case-sensitivity: make everything lower-case before adding to markov?
  • Nov27: did both, but that lower-case is kinda bad because it flattens any Smashed Together Words.
    • probably don't do lower() if word is CamelCase....
  • different corpus (or additional?) - at bottom of IrcBot it says I already did some conversion of The Obstacle Is The Way so I should go find that! (There are some alternatives noted there, too.)
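
One possible way to handle the two details above, as a sketch only: lower-case every word unless it looks like a Smashed Together Word, and return None when no candidate word exists in either chain so the caller can skip the Markov line. The names normalize and pick_seed and the CamelCase regex are assumptions, and punctuation handling is left out.

```python
import re

# Hypothetical CamelCase test: two or more capitalized chunks, e.g. PrivateWiki.
CAMEL_RE = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+$")


def normalize(word):
    """Lower-case a word unless it is CamelCase."""
    return word if CAMEL_RE.match(word) else word.lower()


def pick_seed(candidates, markov, markov_rev):
    """Return the first candidate known to the corpus, or None (no-match case)."""
    for word in candidates:
        key = normalize(word)
        if key in markov or key in markov_rev:
            return key
    return None
```

The same normalize would have to be applied to each word when building markov and markov_rev, otherwise lookups and stored keys won't agree.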
