# Fractal Dimension Of A Hypertext Space

The crux of my preference for Smashed Together Words over the Free Link approach to Automatic Linking in a Wiki is that I believe it increases the "bushiness" of the space.

- cf Intertwingularity, Associative
- Mark Bernstein: *The TINAC Manifesto said, "Three links per node or it's not a HyperText."*

In 1997 Leo Egghe (L Egghe) wrote an article on the "Fractal And Informetric Aspects Of Hypertext Systems".

In the paper he defines the Fractal dimension of a HyperText based on the number of (internal) links it includes:

- let `n` denote the total number of pages, and `m` the average number of "HLs" (hypertext links) per page
  - I'm pretty sure that count should include only:
    - explicit Wiki Name references, not automatic BackLinks
    - unique cases of a link on a given page (so if `Page A` has 3 separate links to `Page B` in it, which happens in a wiki when you use the same Wiki Name multiple times in a page, then it should only count as 1 for that page)
    - links to pages that actually already exist (vs the Wiki case of a link to create-a-page-with-that-name)
- the fractal dimension `D = ln(n) / [ln(n) + ln((1+m) / m)]`

  - Python: `fractal_dimension = log(num_pages) / (log(num_pages) + log((1 + avg_frontlinks) / avg_frontlinks))`

I should really work on writing some code to calculate this for various spaces... (I'm kinda surprised this hasn't been done for Wikipedia already)
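A minimal sketch of that calculation, taking the formula above at face value. The function and argument names are illustrative, not from Egghe's paper, and the input pairs are made-up spaces.

```python
from math import log


def fractal_dimension(num_pages, avg_frontlinks):
    """D = ln(n) / (ln(n) + ln((1 + m) / m)), per the formula above."""
    if num_pages <= 1 or avg_frontlinks <= 0:
        raise ValueError("need num_pages > 1 and avg_frontlinks > 0")
    ln_n = log(num_pages)
    return ln_n / (ln_n + log((1 + avg_frontlinks) / avg_frontlinks))


if __name__ == "__main__":
    # (pages, average frontlinks per page) for a few hypothetical spaces
    for n, m in [(1000, 0.5), (1000, 1.0), (1000, 3.0)]:
        print(n, m, round(fractal_dimension(n, m), 4))
```

The guard rejects `avg_frontlinks = 0` because the formula takes `ln((1+m)/m)`, which is undefined for a space with no links at all.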

Aug08'2014: realized my WikiGraph code got me 90% of the way... ended up with:

```
number of pages: 16506
total number of frontlinks: 85189
avg num frontlinks/page: 5.1610929359
fractal dimension: 0.982089872391
```

Let's compare to a fake WebLog (I've made a spreadsheet to calculate these):

```
number of pages: 1000
total number of frontlinks: 500
avg num frontlinks/page: 0.5 (because most blogs don't do much non-navigation in-linking)
fractal dimension: 0.862782681486289
```

Let's check 2 variations on that fake WebLog:

`avg_frontlinks = 1.0 -> fractal dimension: 0.9088072522638707`

`avg_frontlinks = 0.1 -> fractal dimension: 0.7423183624341485`

And let's compare to a smaller version of my WikiLog (note that if I had fewer pages, then I'd have fewer links/page because many WikiWords wouldn't hit matches - but we'll ignore that for now):

`n=1000, avg=5.16 -> fractal dimension: 0.975002217`

Sept18'2014: was going to write an HTML-scraper to handle my Private Wiki, but decided to just grab the raw-text, since that's much easier.

`n=2687, avg = 3.74 -> fractal dimension: 0.970874964094`
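Counting frontlinks from raw text could look roughly like the sketch below. It is not the code that produced the numbers above. It assumes pages are held in a dict of name to raw text, and the `WIKI_WORD` regex is a guess at a Smashed Together Words pattern (a real wiki engine's rule may differ). It applies the counting rules listed earlier: each target once per page, and only targets that already exist.

```python
import re
from math import log

# Assumed WikiWord pattern: two or more capitalized runs smashed together.
# A real wiki engine's pattern may differ.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def count_frontlinks(pages):
    """pages: dict mapping page name -> raw text. Returns total frontlinks.

    Each target is counted once per page, and only if that page exists.
    """
    total = 0
    for text in pages.values():
        targets = set(WIKI_WORD.findall(text))  # de-duplicate within a page
        total += len(pages.keys() & targets)    # keep only existing pages
    return total


if __name__ == "__main__":
    # Hypothetical three-page space, for illustration only.
    pages = {
        "FrontPage": "See WikiGraph and WikiGraph again, plus NotYetWritten.",
        "WikiGraph": "Back to FrontPage.",
        "PowerLaw": "no links here",
    }
    n = len(pages)
    m = count_frontlinks(pages) / n
    d = log(n) / (log(n) + log((1 + m) / m))
    print(n, m, d)
```

Self-links are counted here if a page mentions its own name; the rules above don't say whether they should be.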

Hmm, what's an upper-bound? How much is *too* bushy? Fake scenario:

`n=1000, avg=100 -> fractal dimension: 0.99856`
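On the upper-bound question, the formula itself gives a partial answer. The rewrite below is just algebra on the definition of `D` above (not something taken from Egghe's paper), assuming `n > 1` and `m > 0`:

```latex
D = \frac{\ln n}{\ln n + \ln\frac{1+m}{m}}
  = \frac{1}{1 + \dfrac{\ln(1 + 1/m)}{\ln n}}
\qquad\Longrightarrow\qquad
D < 1, \quad D \to 1 \ \text{as}\ m \to \infty .
```

For large `m`, `ln(1 + 1/m)` is roughly `1/m`, so `1 - D` is roughly `1 / (m ln n)`. With `n=1000, m=100` that gives about 0.0014, consistent with the 0.99856 above. So the metric only creeps toward 1 as links are added, and by itself it doesn't seem able to flag a space as *too* bushy.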

Oct29'2014: how about that TINAC-test of 3 links/page?

`n=1000, avg=3 -> fractal dimension: 0.960`

Next

- write HTML-scraper code, try against my own sites as double-check
- share my code on GitHub
- calculate for Community Wiki, Meatball Wiki...

Sept'2015: realized my discomfort with this metric is that it depends only on the average number of links (plus the page count), and not on the distribution of links across pages. Specifically, isn't a "rich" HyperText going to have a Power Law distribution?

- note that the links *out* of a page won't be a Power Law, it's the links *in* that are.
- the *average* of in = out (every link leaves one page and arrives at another), showing that you need a better distribution-summary than *average*.
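The point about averages can be made concrete with a toy graph. This is a sketch, assuming links are available as unique (source, target) pairs; `degree_summary` and the hub-and-ring data are hypothetical, not from the WikiGraph code.

```python
from collections import Counter
from statistics import mean, median


def degree_summary(links):
    """links: iterable of (source_page, target_page) pairs, one per unique link."""
    links = list(links)
    pages = {p for pair in links for p in pair}
    out_deg = Counter(src for src, _ in links)
    in_deg = Counter(dst for _, dst in links)
    summary = {}
    for label, deg in (("out", out_deg), ("in", in_deg)):
        counts = [deg.get(p, 0) for p in pages]  # include zero-degree pages
        summary[label] = {
            "mean": mean(counts),
            "median": median(counts),
            "max": max(counts),
        }
    return summary


if __name__ == "__main__":
    # Five pages in a ring, each also linking to one hub page.
    ring = "ABCDE"
    links = [(p, "Hub") for p in ring]
    links += [(ring[i], ring[(i + 1) % 5]) for i in range(5)]
    for label, stats in degree_summary(links).items():
        print(label, stats)
```

In this toy space the in and out means are identical, but the median and max differ, which is the kind of difference the average-based `D` cannot see. Only pages that appear in at least one link are included; fully isolated pages would need to be passed in separately.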
