Fractal Dimension Of A Hypertext Space
The crux of my preference for Smashed Together Words over the Free Link approach to Automatic Linking in a Wiki is that I believe it increases the "bushiness" of the space.
- cf Intertwingularity, Associative
- Mark Bernstein: The TINAC Manifesto said, "Three links per node or it's not a HyperText."
In 1997 Leo Egghe (L Egghe) wrote an article, "Fractal And Informetric Aspects Of Hypertext Systems".
In the paper he defines the Fractal dimension of a HyperText based on the number of (internal) links it includes:
- let n denote the total number of pages, and m the average number of "HLs" (hypertext links) per page
- I'm pretty sure that count should include only
  - explicit Wiki Name references, not automatic BackLinks
  - unique cases of a link on a given page (so if Page A has 3 separate links to Page B in it (which happens in a wiki when you use the same Wiki Name multiple times in a page), then it should only count as 1 for that page)
  - links to pages that actually already exist (vs the Wiki case of a link to create-a-page-with-that-name)
- the fractal dimension
D = ln(n) / [ln(n) + ln((1+m) / m)]
- Python:
from math import log
fractal_dimension = log(num_pages) / (log(num_pages) + log((1 + avg_frontlinks) / avg_frontlinks))
I should really work on writing some code to calculate this for various spaces... (I'm kinda surprised this hasn't been done for Wikipedia already)
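For what it's worth, a minimal sketch of the full calculation, assuming the link graph is already in hand as a dict mapping each page name to the Wiki Names it references (the dict shape and function names here are illustrative, not from any real WikiGraph code):

from math import log

def fractal_dimension(num_pages, avg_frontlinks):
    # Egghe: D = ln(n) / [ln(n) + ln((1+m)/m)]
    return log(num_pages) / (log(num_pages) + log((1 + avg_frontlinks) / avg_frontlinks))

def dimension_of_space(frontlinks):
    # frontlinks: dict of page name -> list of Wiki Names that page references
    pages = set(frontlinks)
    # per the counting rules above: unique links only, and only to pages that exist
    total = sum(len(set(targets) & pages) for targets in frontlinks.values())
    return fractal_dimension(len(pages), total / len(pages))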
Aug08'2014: realized my WikiGraph code got me 90% of the way... ended up with:
number of pages: 16506
total number of frontlinks: 85189
avg num frontlinks/page: 5.1610929359
fractal dimension: 0.982089872391
Let's compare to a fake WebLog (I've made a spreadsheet to calculate these):
number of pages: 1000
total number of frontlinks: 500
avg num frontlinks/page: 0.5 (because most blogs don't do much non-navigation in-linking)
fractal dimension: 0.862782681486289
Let's check 2 variations on that fake WebLog:
avg_frontlinks = 1.0 -> fractal dimension: 0.9088072522638707
avg_frontlinks = 0.1 -> fractal dimension: 0.7423183624341485
And let's compare to a smaller version of my WikiLog (note that if I had fewer pages, then I'd have fewer links/page because many WikiWords wouldn't hit matches - but we'll ignore that for now):
n=1000, avg=5.16 -> fractal dimension: 0.975002217
Sept18'2014: was going to write an HTML-scraper to handle my Private Wiki, but decided to just grab the raw-text, since that's much easier.
n=2687, avg=3.74 -> fractal dimension: 0.970874964094
Hmm, what's an upper-bound? How much is too bushy? Fake scenario:
n=1000, avg=100 -> fractal dimension: 0.99856
Oct29'2014: how about that TINAC-test of 3 links/page?
n=1000, avg=3 -> fractal dimension: 0.960
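All of those scenario numbers fall out of the same formula; a quick sketch-loop to reproduce them, with the (n, avg) pairs copied from the cases above:

from math import log

def fractal_dimension(n, m):
    return log(n) / (log(n) + log((1 + m) / m))

# (num_pages, avg_frontlinks) for each scenario above
scenarios = [(16506, 5.1610929359), (1000, 0.5), (1000, 1.0), (1000, 0.1),
             (1000, 5.16), (2687, 3.74), (1000, 100), (1000, 3)]
for n, m in scenarios:
    print(n, m, fractal_dimension(n, m))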
Next
- write HTML-scraper code (a rough sketch follows this list), try against my own sites as a double-check
- share my code on GitHub
- calculate for Community Wiki, Meatball Wiki...
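A rough sketch of that scraper, using just the Python standard library; it only covers the per-page step (the page list and crawl loop are left out), keeping unique same-site links per the counting rules above:

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    # collects every <a href> target on a page
    def __init__(self):
        super().__init__()
        self.hrefs = set()
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)

def internal_links(page_url):
    # returns the set of unique links on one page that stay within the same site
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    return {urljoin(page_url, href) for href in collector.hrefs
            if urlparse(urljoin(page_url, href)).netloc == site}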
Sept'2015: realized my discomfort with this metric is that it's based only on the average number of links, not the shape of the distribution. Specifically, isn't a "rich" HyperText going to have a Power Law?
- note that the links out of a page won't follow a Power Law; it's the links in that will.
- the average number of in-links equals the average number of out-links (every link is one of each) - showing that you need a better distribution-summary than the average.
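As a first step toward that, a sketch that tallies the in-link histogram from the same kind of frontlinks dict as above (a real Power Law check would fit the tail of this distribution, but eyeballing the histogram is a start):

from collections import Counter

def inlink_histogram(frontlinks):
    # frontlinks: dict of page name -> list of Wiki Names that page references
    pages = set(frontlinks)
    indegree = Counter()
    for targets in frontlinks.values():
        for target in set(targets) & pages:
            indegree[target] += 1
    # histogram: how many pages have k in-links, for each k (0 included)
    return Counter(indegree[page] for page in pages)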