Package trlda :: Package utils

Package utils

Functions
  • load_documents(filepath, batch_size=None, stochastic=False) -> list/generator
    Load documents from a text file.
  • load_users(filepath, batch_size=None, stochastic=False, threshold=4) -> list/generator
    Load users from a text file.
  • load_users_as_dict(filepath, batch_size=None, stochastic=False, threshold=4) -> dict/generator
    Like load_users, but users are stored in a dictionary instead of a list.
  • random_select(k, n) -> list
    Randomly selects $k$ out of $n$ elements.
  • sample_dirichlet(...)
  • polygamma(...)

Function Details

load_documents(filepath, batch_size=None, stochastic=False)

Load documents from a text file. If batch_size is given, behaves like a generator and returns one batch at a time. Each document is represented as a list of tuples, where each tuple contains a word id and a word count.

Each line of the text file is assumed to contain one document and should start with the number of unique words in that document, followed by the words. Each word should be represented by its id and its number of occurrences, separated by a colon. For example:

       6 5600:2 293:1 5548:1 2577:1 3733:3 2677:2
Parameters:
  • filepath (str) - path to file containing data
  • batch_size (int) - the number of documents to return at once
  • stochastic (bool) - if True, the batch size is drawn from a Poisson distribution
Returns: list/generator
returns either a list of documents or a generator of lists of documents

See Also: load_users()
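
The line format described above can be parsed in a few lines. The following is a minimal sketch and not the trlda implementation: the names parse_document_line and iter_batches are hypothetical, the check against the leading word count is an added assumption, and the fixed-size batching (including the shorter final batch) only approximates what batch_size might do. The stochastic, Poisson-sized variant is not shown.

```python
def parse_document_line(line):
    """Parse one line into a list of (word_id, count) tuples."""
    fields = line.split()
    num_unique = int(fields[0])  # leading field: number of unique words
    document = []
    for field in fields[1:]:
        word_id, count = field.split(':')
        document.append((int(word_id), int(count)))
    # Assumption: treat a mismatch with the leading count as an error.
    if len(document) != num_unique:
        raise ValueError('expected %d words, found %d'
                         % (num_unique, len(document)))
    return document


def iter_batches(documents, batch_size):
    """Yield successive lists of at most batch_size documents."""
    batch = []
    for document in documents:
        batch.append(document)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # assumption: the final batch may be shorter


example = parse_document_line('6 5600:2 293:1 5548:1 2577:1 3733:3 2677:2')
```

Here example holds six (word id, count) tuples, one per colon-separated field of the sample line.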

load_users(filepath, batch_size=None, stochastic=False, threshold=4)

Load users from a text file. If batch_size is given, behaves like a generator and returns one batch at a time. Each user is represented as a list of tuples, where each tuple contains an item id and a rating.

Each line is assumed to contain a 3-tuple of a user id, an item id, and a rating. Ratings belonging to the same user should be grouped together, that is, appear on consecutive lines. For example:

       1488844   1  3
       1488844   8  4
       1488844  17  2
       1488844  30  3
       8850131  33  4
       8850131  35  1
       8850131  86  5
Parameters:
  • filepath (str) - path to file containing data
  • batch_size (int) - the number of users to return at once
  • stochastic (bool) - if True, the batch size is drawn from a Poisson distribution
  • threshold (int) - only load users whose rating is greater than or equal to this threshold
Returns: list/generator
returns either a list of users or a generator of lists of users

See Also: load_documents()
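
Because the ratings of a user sit on consecutive lines, grouping them is a natural fit for itertools.groupby. The sketch below is illustrative only and not the trlda implementation: group_ratings is a hypothetical name, and it assumes that ratings below threshold are dropped and that users left without any rating are skipped. The source does not spell out either detail.

```python
from itertools import groupby


def group_ratings(lines, threshold=4):
    """Group 'user item rating' lines into one list of tuples per user."""
    rows = (line.split() for line in lines if line.strip())
    users = []
    # groupby only merges adjacent rows, hence the grouping requirement.
    for _user_id, user_rows in groupby(rows, key=lambda row: row[0]):
        ratings = [(int(item), int(rating))
                   for _, item, rating in user_rows
                   if int(rating) >= threshold]  # assumed filter semantics
        if ratings:  # assumption: skip users with no remaining ratings
            users.append(ratings)
    return users


sample = [
    '1488844   1  3',
    '1488844   8  4',
    '1488844  17  2',
    '1488844  30  3',
    '8850131  33  4',
    '8850131  35  1',
    '8850131  86  5',
]
users = group_ratings(sample, threshold=4)
```

With the sample data and threshold=4, the first user keeps only the rating for item 8 and the second user keeps the ratings for items 33 and 86.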

load_users_as_dict(filepath, batch_size=None, stochastic=False, threshold=4)

Like load_users, but users are stored in a dictionary instead of a list. The keys correspond to user IDs and the values are lists of item/rating pairs.

Returns: dict/generator
returns either a dictionary of users or a generator of dictionaries of users
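
The dictionary layout can be pictured with a short sketch. This is not the trlda implementation: group_ratings_as_dict is a hypothetical name, integer keys are an assumption (the library may keep user IDs in another type), and the threshold filter follows the same assumed semantics of dropping ratings below the threshold.

```python
def group_ratings_as_dict(lines, threshold=4):
    """Map each user id to its list of (item_id, rating) pairs."""
    users = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        user_id, item_id, rating = line.split()
        if int(rating) >= threshold:  # assumed filter semantics
            users.setdefault(int(user_id), []).append(
                (int(item_id), int(rating)))
    return users


ratings_by_user = group_ratings_as_dict([
    '1488844   8  4',
    '1488844  17  2',
    '8850131  33  4',
    '8850131  86  5',
])
```

Unlike the list form, a dictionary keeps the user id alongside the ratings, which helps when results have to be matched back to specific users.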

random_select(k, n)

Randomly selects $k$ out of $n$ elements.

Parameters:
  • k (int) - the number of elements to pick
  • n (int) - the number of elements to pick from
Returns: list
a list of $k$ indices
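
For comparison, similar behaviour is available in the Python standard library. The sketch below is not the trlda implementation: random_select_sketch is a hypothetical name, and it assumes that indices run from 0 to n - 1 and are drawn without replacement, which the source does not state explicitly.

```python
import random


def random_select_sketch(k, n):
    """Return k distinct indices drawn uniformly from range(n)."""
    if not 0 <= k <= n:
        raise ValueError('k must lie between 0 and n')
    # random.sample draws without replacement.
    return random.sample(range(n), k)


indices = random_select_sketch(3, 10)
```

The returned indices are in random order; sort them if a deterministic ordering is needed downstream.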