Fractal Dimension Of A Hypertext Space

I prefer Smashed Together Words over the Free Link approach to Automatic Linking in a Wiki mainly because I believe they increase the "bushiness" of the space.

In 1997 Leo Egghe (L Egghe) wrote an article titled "Fractal and Informetric Aspects of Hypertext Systems".

In the paper he defines the Fractal dimension of a HyperText based on the number of (internal) links it includes:

  • let n denote the total number of pages, and
  • m the average number of "HLs" (hypertext links) per page
    • I'm pretty sure that count should include only:
      • explicit Wiki Name references, not automatic BackLinks
      • unique links on a given page: if Page A links to Page B 3 times (which happens in a wiki when you use the same Wiki Name multiple times in a page), that counts as only 1 for that page
      • links to pages that actually already exist (vs the Wiki case of a link to create-a-page-with-that-name)
  • the fractal dimension D = ln(n) / [ln(n) + ln((1+m) / m)]
    • Python: fractal_dimension = log(num_pages)/(log(num_pages) + log((1 + avg_frontlinks) / avg_frontlinks))
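
The one-liner above can be wrapped into a small function so the later numbers are easy to reproduce. This is just a sketch of the formula as quoted here; the function name and the input checks are my own additions, not anything from the paper.

```python
from math import log


def fractal_dimension(num_pages, avg_frontlinks):
    """D = ln(n) / (ln(n) + ln((1 + m) / m)), as quoted above."""
    # Assumption: reject inputs where the logs are undefined or degenerate.
    if num_pages <= 1 or avg_frontlinks <= 0:
        raise ValueError("need num_pages > 1 and avg_frontlinks > 0")
    ln_n = log(num_pages)
    return ln_n / (ln_n + log((1 + avg_frontlinks) / avg_frontlinks))


if __name__ == "__main__":
    # Fake WebLog case: 1000 pages, 0.5 frontlinks per page.
    print(fractal_dimension(1000, 0.5))
```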

I should really work on writing some code to calculate this for various spaces... (I'm kinda surprised this hasn't been done for Wikipedia already)

Aug08'2014: realized my WikiGraph code got me 90% of the way... ended up with:

number of pages:  16506
total number of frontlinks:  85189
avg num frontlinks/page:  5.1610929359
fractal dimension:  0.982089872391
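
For reference, here is roughly how the page and frontlink counts could be derived under the counting rules listed earlier (unique targets per page, existing pages only). It is not the actual WikiGraph code: the `page_links` dict and the `frontlink_stats` name are hypothetical stand-ins for whatever structure holds each page's Wiki Name references.

```python
def frontlink_stats(page_links):
    """Return (num_pages, total_frontlinks, avg_frontlinks).

    page_links: hypothetical dict mapping each existing page name to the
    list of Wiki Names that appear in its text (repeats allowed).
    """
    existing = set(page_links)
    if not existing:
        raise ValueError("no pages")
    total = 0
    for targets in page_links.values():
        # The set collapses repeated links to the same target on one page;
        # the membership test drops links to not-yet-created pages.
        total += len({t for t in targets if t in existing})
    num_pages = len(existing)
    return num_pages, total, total / num_pages


if __name__ == "__main__":
    demo = {
        "PageA": ["PageB", "PageB", "PageC", "NotYetWritten"],
        "PageB": ["PageA"],
        "PageC": [],
    }
    print(frontlink_stats(demo))
```

Self-links are counted if a page names itself; whether they should be is left open here.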

Let's compare to a fake WebLog (I've made a spreadsheet to calculate these):

number of pages:  1000
total number of frontlinks:  500
avg num frontlinks/page:  0.5 (because most blogs don't do much non-navigation in-linking)
fractal dimension: 0.862782681486289

Let's check 2 variations on that fake WebLog:

  • avg_frontlinks = 1.0 -> fractal dimension: 0.9088072522638707
  • avg_frontlinks = 0.1 -> fractal dimension: 0.7423183624341485

And let's compare to a smaller version of my WikiLog (note that if I had fewer pages, then I'd have fewer links/page because many WikiWords wouldn't hit matches - but we'll ignore that for now):

  • n=1000, avg=5.16 -> fractal dimension: 0.975002217

Sept18'2014: was going to write an HTML-scraper to handle my Private Wiki, but decided to just grab the raw-text, since that's much easier.

  • n=2687, avg = 3.74 -> fractal dimension: 0.970874964094

Hmm, what's an upper-bound? How much is too bushy? Fake scenario:

  • n=1000, avg=100 -> fractal dimension: 0.99856
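
One way to poke at the upper-bound question: since ln((1+m)/m) is always positive, D stays below 1 for any finite m and only creeps toward 1 as m grows. Inverting the formula gives the links/page needed to hit a target D. This is a sketch of that algebra; `links_needed` is a name I made up.

```python
from math import log, expm1


def fractal_dimension(num_pages, avg_frontlinks):
    ln_n = log(num_pages)
    return ln_n / (ln_n + log((1 + avg_frontlinks) / avg_frontlinks))


def links_needed(num_pages, target_d):
    """Solve D = ln(n) / (ln(n) + ln((1 + m) / m)) for m, with 0 < D < 1."""
    if not 0 < target_d < 1:
        raise ValueError("target_d must be strictly between 0 and 1")
    # ln((1 + m) / m) = ln(n) * (1 - D) / D, and (1 + m) / m = 1 + 1/m,
    # so 1/m = exp(x) - 1.
    x = log(num_pages) * (1 - target_d) / target_d
    return 1 / expm1(x)


if __name__ == "__main__":
    for m in (0.1, 1, 3, 10, 100, 1000):
        print(m, fractal_dimension(1000, m))
    print(links_needed(1000, 0.99))
```

The sweep makes the diminishing returns visible: most of the movement in D happens below a handful of links per page.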

Oct29'2014: how about that TINAC-test of 3 links/page?

  • n=1000, avg=3 -> fractal dimension: 0.960


Sept'2015: realized my discomfort with this metric is that it depends only on the average number of links, not on how those links are distributed across pages. Specifically, isn't a "rich" HyperText going to have a Power Law distribution?

  • note that the links out of a page won't follow a Power Law; it's the links in that do.
  • the average number of links in equals the average number of links out (each link leaves one page and enters one page), showing that you need a better distribution-summary than the average.
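
A toy illustration of that last point, assuming the same hypothetical `page_links` dict shape (page name -> list of Wiki Names it references): the mean in-degree and mean out-degree always match, but the maxima can differ a lot when a few pages attract most of the links.

```python
def degree_summary(page_links):
    """Compare in-link and out-link counts over existing pages only."""
    existing = set(page_links)
    out_deg = {p: 0 for p in existing}
    in_deg = {p: 0 for p in existing}
    for page, targets in page_links.items():
        # Unique, existing targets only, matching the counting rules above.
        for target in {t for t in targets if t in existing}:
            out_deg[page] += 1
            in_deg[target] += 1
    n = len(existing)
    return {
        "mean_out": sum(out_deg.values()) / n,
        "mean_in": sum(in_deg.values()) / n,
        "max_out": max(out_deg.values()),
        "max_in": max(in_deg.values()),
    }


if __name__ == "__main__":
    # Four pages all point at one hub page; the hub points back at one.
    demo = {
        "Hub": ["PageA"],
        "PageA": ["Hub"],
        "PageB": ["Hub"],
        "PageC": ["Hub"],
        "PageD": ["Hub"],
    }
    print(degree_summary(demo))
```

A fuller version would look at the whole in-degree histogram rather than just the max, but even this shows the average hiding the shape.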
