This repository has been archived by the owner on Mar 22, 2018. It is now read-only.

Belief calculation : defining a topic

Jan Paul Posma edited this page May 15, 2014 · 1 revision

We started out with one global authority. The basic authority of a user, as devised earlier, was:

The authority of a user on a fact is 1 + log(SUM(n[i], i=1..N)), where i indexes the N facts that were created by this user and n[i] is the number of times fact i is used as evidence.
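As a minimal sketch, the formula above could be computed as follows. The function name, the choice of the natural logarithm, and the fallback to 1 when no fact has been used as evidence (where the logarithm is undefined) are assumptions for illustration; the original text does not specify them.

```python
import math

def global_authority(evidence_counts):
    """Return 1 + log(sum of n[i]) for one user.

    evidence_counts holds n[i] for each fact the user created, i.e. how
    often fact i is used as evidence.
    """
    total = sum(evidence_counts)
    # Assumption: log(0) is undefined, so a user whose facts are never
    # used as evidence keeps the base authority of 1.
    if total <= 0:
        return 1.0
    # Assumption: natural logarithm; the source does not name a base.
    return 1.0 + math.log(total)
```

Because of the logarithm, authority grows slowly: doubling the number of times a user's facts are used as evidence adds a constant amount rather than doubling the authority.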

Under this model a person is either smart or stupid, and knowing something about Ruby implies that you probably also know a lot about Neuroscience.

Since this isn't very logical, we want subject-specific authority:

No person is ever knowledgeable about everything, and the background of any one person usually defines a small set of expertise. This set of expertise allows someone to make authoritative claims about subjects. Claims by a studied professional in the field of biochemistry might be authoritative on that subject, but not necessarily on politics.

Accurately labeling an area of expertise is challenging: some areas do not have a definitive label, people can gain and lose authority, and domains themselves change, for example by adding new subdomains or changing jargon.

More formally, the problem at hand is that of mapping knowledge to a labeled, dynamic ontology.

Classically this problem has been solved by introducing '''(hierarchical) classification''', such as the indexing systems of local libraries or class-based object-oriented programming. This has many problems, however. First and foremost, categories are often rigid and cannot accommodate instances that require multiple categories or other exceptions, leading to unwanted forced fits, e.g. is a virus alive or lifeless?

'''Tags''' circumvent part of the problem by being able to categorize instances in multiple categories (multiple 'tags'), but often at the expense of a hierarchy. And if the tags are user generated it is difficult to determine if two tags are in fact the same.

The proposed solution is to use '''user generated hierarchical labels (channels)''' which can be propagated (shared). Say Pete labels an item in birds, a subject he is knowledgeable about, and Jim subscribes to this channel by adding it to his animals channel. The item then gains both the label birds and the label animals, and since the birds channel is now part of the animals channel, a hierarchy is produced as well.
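The birds/animals example can be sketched in a few lines. The data structure (a dictionary from a channel to the channels that subscribe to it) and the function name are hypothetical and not taken from any existing implementation.

```python
def labels_for(channel, subscribed_by):
    """Return all labels an item gains when it is added to `channel`.

    subscribed_by maps a channel to the more general channels that
    subscribe to it, e.g. {"birds": ["animals"]}.
    """
    labels = []
    stack = [channel]
    while stack:
        current = stack.pop()
        if current in labels:
            continue  # guards against subscription cycles
        labels.append(current)
        # Follow subscriptions upward to the more general channels.
        stack.extend(subscribed_by.get(current, []))
    return labels

# Jim's animals channel subscribes to Pete's birds channel.
subscriptions = {"birds": ["animals"]}
item_labels = labels_for("birds", subscriptions)
```

Here an item added to birds ends up with both labels, while an item added directly to animals only gets the animals label, which is what makes the structure a hierarchy rather than a flat set of tags.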

With this construction, however, there is no incentive for creating proper labels, which is necessary for the hierarchy to work. This incentive is added by increasing the authority of a person on a channel each time someone subscribes to his or her channel.

The assumption is that more knowledgeable people are better fit to create more detailed channels, giving them higher authority and pushing that authority upward to more general (more abstract) channels, and giving them incentive in terms of a higher authority to do so.

We want a subject space to basically be a bag of facts, which are related on subject. We can think of multiple ways to define subject spaces.

Possible sources

Fact Graph

We can define a subject space on the Fact Graph. For instance, facts within 3 steps of each other are in the same subject space. Which steps count is open for discussion (for example, only follow the steps in which the fact is used as evidence, and not vice versa).
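One way to read "within 3 steps" is a breadth-first search with a step limit. The sketch below assumes the graph is given as a dictionary from a fact to the facts it is used as evidence for, and follows only that direction; both the representation and the names are illustrative assumptions.

```python
from collections import deque

def subject_space(start, evidence_for, max_steps=3):
    """Return the facts reachable from `start` in at most `max_steps` steps.

    evidence_for maps a fact to the facts it is used as evidence for.
    Only that direction is followed (an assumed choice; the text leaves
    the direction open).
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        fact = queue.popleft()
        if distance[fact] == max_steps:
            continue  # do not expand beyond the step limit
        for neighbour in evidence_for.get(fact, []):
            if neighbour not in distance:
                distance[neighbour] = distance[fact] + 1
                queue.append(neighbour)
    return set(distance)
```

With a chain such as a → b → c → d → e, the space around a contains a, b, c and d, while the space around e contains only e. Following a single direction therefore makes the distance relation asymmetric.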

Advantages

  • We use the Fact Graph, which is highly factual, more than for instance channels
  • The structure that is created by reasoning is used to create the subject spaces

Disadvantages

  • It's not easy to define a name tag for a 'subject space', that is, calculate authority for someone on a named subject space
  • Propagating authority as well as beliefs somehow doesn't feel right (to Mark at least). It seems very vulnerable to self-fulfilling prophecies.
  • the distance relation is not necessarily symmetric, since the implications (factrelations) are not

Channels

We can define a subject space on a certain word, for instance 'Ruby', and then all facts in channels with that name can belong to that subject space.

Advantages

  • Possible to calculate authority on a subject space
  • This also makes calculations possible with lower complexity (big-Oh)

Disadvantages

  • Gathering facts in channels does not necessarily mean they are indeed related
  • The authority of a user becomes dependent on the 'tagging' that people do when moving factlinks into channels

Word analysis

We could cluster facts based on word analysis, for instance using Latent Semantic Analysis.

Advantages

  • Does not require user interaction

Disadvantages

  • Word-similarity does not necessarily imply subject similarity
  • Harder than the other two approaches to implement

Using Subject Spaces for Authority

We want the authority of a user to depend on the subject space. So, what we could do is:

  • Let the authority of a user be determined by anything that exists in the subject space (and nothing outside it).
  • Let the authority of a user be determined by things in the subject space, depending on their distance.

Abstract

We assume we calculate global authority for a user (u) in the following way:

\[ a(u) = 1 + \mathrm{influencing\_authority}(f_i) \]

where \(f_i \in \mathrm{created\_facts}(u)\)

We now make this authority subject specific in the following way:

\[ a(u,f_X) = 1 + \mathrm{influencing\_authority}(u,s_i) \]

where \(s_i\) are the subject spaces that contain fact \(f_X\). In this document we will describe how we will determine which subject spaces exist and in which subject spaces a fact is contained, and how we can calculate the influencing authority of a subject space based on the influencing authority of the facts contained therein.

Terminology

Throughout this document I will try to use the following terms:

  • Authority: the braincycles a user gathers. Authority is a unit which always refers to something a user has or receives
  • Credibility: the braincycles a fact(wheel) gathers. Credibility always refers to something a fact has or receives
  • Topic: new word for subject space

Defining a subject space and calculating its influencing authority

A subject space is defined by a word, which is also used as a channel title. For instance, this could be 'Ruby' or 'Neuropsychology'. A subject space can contain many channels (all bearing its name). Facts can be contained by none, some, or all of those channels. The amount of relative participation of a fact in a certain subject space (as compared to other subject spaces) defines how much of the influencing authority trickles down.

For instance, when a fact of user u participates in 2 'ruby' channels and in 3 'dynamic languages' channels, and it has an influencing_authority of 5, this fact will contribute 2 to the 'ruby' authority of the user and 3 to the 'dynamic languages' authority of the user.

More formally:

  • \(ch(f)\): the channels which contain the fact f
  • \(ch(f,s)\): the channels in subject space s which contain the fact f
  • \(ia(f,u)\): the influencing authority of the fact f for user u (zero if the user has not created the fact)

We can now define the influencing authority of a user on a subject space:

\[ ia(s,u) = \sum_{f \in s} \frac{|ch(f,s)|}{|ch(f)|} \cdot ia(f,u) \]
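The definition can be checked against the 'ruby' / 'dynamic languages' example with a short sketch. The representation of channels as (title, facts) pairs and all names below are assumptions made for illustration only.

```python
def influencing_authority(subject, channels, ia_fact):
    """ia(s,u): sum over facts f of (#ch(f,s) / #ch(f)) * ia(f,u).

    channels: list of (title, facts) pairs; the title names the subject
              space the channel belongs to.
    ia_fact:  ia(f,u) per fact for one user u; facts the user did not
              create are absent, which is equivalent to zero.
    """
    total = 0.0
    for fact, ia in ia_fact.items():
        # Number of channels containing the fact, overall and in `subject`.
        in_any = sum(1 for _, facts in channels if fact in facts)
        in_subject = sum(1 for title, facts in channels
                         if title == subject and fact in facts)
        if in_any:  # a fact in no channel contributes to no subject space
            total += in_subject / in_any * ia
    return total

# One fact "f" in 2 'ruby' channels and 3 'dynamic languages' channels,
# with an influencing authority of 5.
channels = [("ruby", {"f"})] * 2 + [("dynamic languages", {"f"})] * 3
ia_fact = {"f": 5.0}
ruby_authority = influencing_authority("ruby", channels, ia_fact)
dynamic_authority = influencing_authority("dynamic languages", channels, ia_fact)
```

Up to floating-point rounding this yields 2 for 'ruby' and 3 for 'dynamic languages', so the influencing authority of a fact is split over its subject spaces rather than counted once per space.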

Calculating the credibility of a fact based on its believers

When we now want to calculate the credibility of a fact we can do the same thing backwards. We look at which subject spaces contain the fact, and for each believer take a weighted average of that believer's authority in those respective subject spaces. So:

\[ c(f) = \sum_{u \in \mathrm{opiniated}} \sum_{s \in \mathrm{containing\ subject\ spaces}} \frac{|ch(f,s)|}{|ch(f)|} \cdot a(s,u) \]
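A sketch of this calculation, under stated assumptions: channels are (title, facts) pairs, a(s,u) is looked up in a dictionary keyed by (subject, user), a missing entry falls back to the base authority of 1, and a fact that is in no channel gets a credibility of 0. None of these choices are fixed by the text above.

```python
from collections import Counter

def credibility(fact, believers, channels, authority):
    """c(f): sum over believers u and subject spaces s of
    (#ch(f,s) / #ch(f)) * a(s,u).

    channels:  list of (title, facts) pairs; the title names the subject space.
    authority: dict mapping (subject, user) to a(s,u); missing entries
               default to 1.0, the base authority (an assumption).
    """
    # Count, per subject space, the channels that contain the fact.
    per_subject = Counter(title for title, facts in channels if fact in facts)
    n_channels = sum(per_subject.values())
    if n_channels == 0:
        return 0.0  # assumption: no channels means no topic-based credibility
    return sum(
        count / n_channels * authority.get((subject, user), 1.0)
        for user in believers
        for subject, count in per_subject.items()
    )

channels = [("ruby", {"f"})] * 2 + [("dynamic languages", {"f"})] * 3
authority = {("ruby", "pete"): 3.0, ("dynamic languages", "pete"): 4.0}
c_f = credibility("f", ["pete", "jim"], channels, authority)
```

In this example pete contributes 2/5 · 3 + 3/5 · 4 and jim, who has no recorded authority, contributes the base value of 1. Each believer's contribution is thus a weighted average of their per-topic authority, and the credibility is the sum of those averages.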

Open issues:

  • When is a fact 'in' a channel? Only if it is explicitly added, or also if it's in there because of the channel's subchannels?
  • Should we also look at the proportion (number of channels a fact is in for a certain subject space)/(number of channels in the subject space)?
  • In which topic is a factrelation? The topic of its to_fact, or of its from_fact?