Fractal Dimension Of A Hypertext Space

The crux of my preference for Smashed Together Words over the Free Link approach to Automatic Linking in a Wiki is that I believe it increases the "bushiness" of the space.

In 1997 Leo Egghe (L Egghe) wrote an article on the "Fractal And Informetric Aspects Of Hypertext Systems".

In the paper he defines the Fractal dimension of a HyperText based on the number of (internal) links it includes:

  • let n denote the total number of pages, and
  • m the average number of "HLs" (hypertext links) per page
    • I'm pretty sure that count should include only:
      • explicit Wiki Name references, not automatic BackLinks
      • unique links on a given page (if Page A links to Page B 3 separate times, which happens in a wiki when you use the same Wiki Name multiple times in a page, that counts as only 1 for that page)
      • links to pages that already exist (vs the Wiki case of a link to create-a-page-with-that-name)
  • the fractal dimension D = ln(n) / [ln(n) + ln((1+m) / m)]
    • Python (with "from math import log", so log is the natural log): fractal_dimension = log(num_pages)/(log(num_pages) + log((1 + avg_frontlinks) / avg_frontlinks))
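
The counting rules and the formula above can be sketched together. This is only a minimal sketch, not my actual WikiGraph code: the links_by_page dict and both function names are made up for illustration, and it assumes BackLinks have already been left out of the input.

```python
from math import log


def avg_frontlinks(links_by_page):
    """Average number of countable frontlinks per page.

    links_by_page is a hypothetical dict: page name -> list of link targets
    written explicitly in that page's text (BackLinks assumed excluded).
    """
    existing = set(links_by_page)
    total = 0
    for targets in links_by_page.values():
        # set() collapses repeated links to the same target on one page;
        # the intersection drops links to pages that do not exist yet.
        total += len(set(targets) & existing)
    return total / len(links_by_page)


def fractal_dimension(num_pages, avg_links):
    """D = ln(n) / [ln(n) + ln((1 + m) / m)], using natural logs."""
    return log(num_pages) / (log(num_pages) + log((1 + avg_links) / avg_links))


# Toy space: FrontPage repeats one link and points at one missing page.
toy = {
    "FrontPage": ["WikiLog", "WikiLog", "FractalDimension", "NotYetWritten"],
    "WikiLog": ["FrontPage"],
    "FractalDimension": ["WikiLog", "FrontPage"],
}
m = avg_frontlinks(toy)
print(m, fractal_dimension(len(toy), m))
```

In the toy space FrontPage contributes 2 links (the duplicate and the not-yet-written page are dropped), so the 3 pages hold 5 countable links in total.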

I should really work on writing some code to calculate this for various spaces... (I'm kinda surprised this hasn't been done for Wikipedia already)

Aug08'2014: realized my WikiGraph code got me 90% of the way... ended up with:

number of pages:  16506
total number of frontlinks:  85189
avg num frontlinks/page:  5.1610929359
fractal dimension:  0.982089872391

Let's compare to a fake WebLog (I've made a spreadsheet to calculate these):

number of pages:  1000
total number of frontlinks:  500
avg num frontlinks/page:  0.5 (because most blogs don't do much non-navigation in-linking)
fractal dimension: 0.862782681486289

Let's check 2 variations on that fake WebLog:

  • avg_frontlinks = 1.0 -> fractal dimension: 0.9088072522638707
  • avg_frontlinks = 0.1 -> fractal dimension: 0.7423183624341485

And let's compare to a smaller version of my WikiLog (note that if I had fewer pages, then I'd have fewer links/page because many WikiWords wouldn't hit matches - but we'll ignore that for now):

  • n=1000, avg=5.16 -> fractal dimension: 0.975002217

Sept18'2014: was going to write an HTML-scraper to handle my Private Wiki, but decided to just grab the raw-text, since that's much easier.

  • n=2687, avg = 3.74 -> fractal dimension: 0.970874964094

Hmm, what's an upper-bound? How much is too bushy? Fake scenario:

  • n=1000, avg=100 -> fractal dimension: 0.99856
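
On the upper-bound question: since (1+m)/m = 1 + 1/m is always greater than 1 for m > 0, the second log term is always positive, so D stays below 1 and only creeps toward it as the average grows. A quick sweep for n=1000 (the avg values are arbitrary picks for illustration) shows the curve flattening out:

```python
from math import log


def fractal_dimension(num_pages, avg_frontlinks):
    # Same formula as above, with (1 + m) / m written as 1 + 1/m.
    return log(num_pages) / (log(num_pages) + log(1 + 1 / avg_frontlinks))


n = 1000
# Arbitrary sample of averages, from a sparse blog to an absurdly dense space.
sweep = {m: fractal_dimension(n, m) for m in (0.1, 1, 3, 10, 100, 1000, 10000)}
for m, d in sweep.items():
    print(f"avg={m:>7}: D={d:.5f}")
```

So the formula itself never says "too bushy": it just saturates, and past a handful of links per page the extra links barely move the number.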

Oct29'2014: how about that TINAC-test of 3 links/page?

  • n=1000, avg=3 -> fractal dimension: 0.960

Sept'2015: realized my discomfort with this metric is that it's just about the average number of links per page, and not the shape of the distribution. Specifically, isn't a "rich" HyperText going to have a Power Law distribution of links?

  • note that the links out of a page won't follow a Power Law, it's the links in that do.
  • the average number of links in per page equals the average number of links out (every counted link leaves one page and arrives at another) - showing that you need a better summary of the distribution than the average.
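
A tiny made-up graph shows the problem with the average. The page names and links here are invented, and 5 pages can't demonstrate a Power Law, only that equal averages can hide very different shapes:

```python
from collections import Counter
from statistics import mean

# Hypothetical toy space: page -> set of existing pages it links to.
# Every page links out to exactly 2 pages, but "Hub" collects most links in.
links = {
    "Hub": {"A", "B"},
    "A": {"Hub", "B"},
    "B": {"Hub", "C"},
    "C": {"Hub", "D"},
    "D": {"Hub", "A"},
}

out_degree = {page: len(targets) for page, targets in links.items()}
in_counts = Counter(t for targets in links.values() for t in targets)
in_degree = {page: in_counts.get(page, 0) for page in links}

# The two averages are identical by construction...
print(mean(out_degree.values()), mean(in_degree.values()))
# ...but the sorted degree lists are not.
print(sorted(out_degree.values()), sorted(in_degree.values()))
```

Both averages come out the same, so the fractal dimension would be identical whichever direction you counted, even though the in-link counts are lopsided and the out-link counts are flat.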
