Add hz_mobile and senegal_mobile repos
1 parent 191e069, commit 5e6afd4
Showing 5 changed files with 124 additions and 10 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# HangZhou Data Description | ||
|
||
This data set contains user web-browsing logs collected over 19 days in two | ||
months, August and October 2012. The network topology covers the main areas of | ||
Hangzhou City and Wenzhou City, Zhejiang Province. | ||
|
||
|
||
## Data path | ||
|
||
hdfs://user/omnilab/warehouse/HzMobile/hzclean | ||
|
||
|
||
## Data Columns | ||
|
||
This folder holds the data set after cleansing and formatting. Each record | ||
contains 27 independent columns separated by '\t' that describe a user's web-browsing activity. | ||
|
||
* ttime (double): timestamp issuing a web request, in seconds | ||
* dtime (double): timestamp ending a request or dumping this log, in seconds | ||
* BS (long): signature of individual base stations (LAC*10^6 + CI) | ||
* IMSI (string): user IMSI signature | ||
* mobile_type (string): signature of mobile client type | ||
* dest_ip (long): destination IP address | ||
* dest_port (int): destination TCP port | ||
* success (long): indicating if the web request succeeded | ||
* failure_cause (string): reason for web request failure | ||
* response_time (long): time delay from request to the first byte of response | ||
* host (string): host name of web request | ||
* content_length (long): content-length field of HTTP header | ||
* retransfer_count (long): the number of retransmissions | ||
* packets (long): the number of network packets | ||
* status_code (int): HTTP status code | ||
* web_volume (long): number of bytes transferred for the web request | ||
* content_type (string): content-type field of HTTP header | ||
* user_agent (string): MD5 value of user-agent field of HTTP header | ||
* is_mobile (int): whether the client is a mobile device | ||
* e_gprs (int): E_GPRS mode indicator | ||
* umts_tdd (int): UMTS/TDD mode indicator | ||
* ICP (long): classification of Internet Content Providers, e.g., Netease | ||
* SC (string): service classification, e.g., video, music. | ||
* URI (string): Uniform resource identifier | ||
* OS (string): operating system type | ||
* LON (double): longitude of base station location | ||
* LAT (double): latitude of base station location | ||
|
||
|
||
## Data statistics | ||
|
||
* Total logs: 852314304 | ||
* Total unique users: | ||
* Total base stations: | ||
|
||
|
||
## Data sample | ||
|
||
1345084549.229 1345085752.000 22696030330 460022688112277 1862344734 80 1 2000 storage7.cdn.kugou.com 17221 0 12 206 16384 application/octet-stream 12 0 13 酷狗音乐网 /201208161032/602b72233338bcf732ed1a0d1ab9de0e/M01/11/DC/OtfxyE_xEgf9hd16AB-VzJZmo9g824.m4a 119.06104 29.615866 | ||
1345084666.528 1345085752.000 22696030330 460007472554744 com.sina.weibo 1862344789 80 1 1880 tp4.sinaimg.cn 3079 0 3 200 2550 image/jpeg zMu/gx+2nvFedIg8HudOww== 1 10 0 32 新浪微博 /2044794631/50/5613889201/0 IOS 119.06104 29.615866 |
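The column layout described above can be exercised with a minimal Python sketch. It is an illustration, not the project's own loader. It makes three assumptions that the README does not state: the copy of the second sample line used here has an empty `failure_cause` field (the rendered sample has lost its tab characters, so empty fields are not visible), `dest_ip` is an IPv4 address stored as a big-endian integer, and the timestamps are Unix epoch seconds. The helper names `parse_record` and `decode_bs` are hypothetical.

```python
import ipaddress
from datetime import datetime, timezone

# Column names in the order listed under "Data Columns" (27 in total).
COLUMNS = [
    "ttime", "dtime", "BS", "IMSI", "mobile_type", "dest_ip", "dest_port",
    "success", "failure_cause", "response_time", "host", "content_length",
    "retransfer_count", "packets", "status_code", "web_volume",
    "content_type", "user_agent", "is_mobile", "e_gprs", "umts_tdd",
    "ICP", "SC", "URI", "OS", "LON", "LAT",
]


def parse_record(line):
    """Split one tab-separated log line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def decode_bs(bs):
    """Invert BS = LAC * 10^6 + CI and return the pair (LAC, CI)."""
    return divmod(int(bs), 10**6)


# Second sample line, rebuilt with explicit tabs.
# ASSUMPTION: failure_cause is the empty field in this record.
sample = "\t".join([
    "1345084666.528", "1345085752.000", "22696030330", "460007472554744",
    "com.sina.weibo", "1862344789", "80", "1", "", "1880", "tp4.sinaimg.cn",
    "3079", "0", "3", "200", "2550", "image/jpeg", "zMu/gx+2nvFedIg8HudOww==",
    "1", "10", "0", "32", "新浪微博", "/2044794631/50/5613889201/0", "IOS",
    "119.06104", "29.615866",
])

record = parse_record(sample)
lac, ci = decode_bs(record["BS"])

# ASSUMPTION: dest_ip is a big-endian IPv4 integer.
ip = ipaddress.IPv4Address(int(record["dest_ip"]))

# ASSUMPTION: ttime is Unix epoch seconds; converted to UTC here.
start = datetime.fromtimestamp(float(record["ttime"]), tz=timezone.utc)

print(lac, ci, ip, start.isoformat(), record["host"])
```

All values are kept as strings after parsing, so callers can convert only the columns they need. The strict field-count check is a deliberate choice: it surfaces lines whose empty fields were lost instead of silently shifting columns.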
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Senegal Mobile Data | ||
|
||
This repo contains data sets for the second Data for Development (D4D) Challenge. The data were collected from Orange's | ||
mobile phone users in Senegal and consist of three subsets: | ||
|
||
* Dataset 1: One year of site-to-site traffic for 1666 sites on an hourly basis, | ||
|
||
* Dataset 2: Fine-grained mobility data (site level) on a rolling 2-week basis with bandicoot behavioral indicators at | ||
individual level for about 300,000 randomly sampled users per 2-week period who meet the two selection criteria | ||
defined in the paper linked under "Data format", | ||
|
||
* Dataset 3: One year of coarse-grained (123 arrondissement level) mobility data with bandicoot behavioral indicators at | ||
individual level for about 150,000 randomly sampled users who meet the same two selection criteria over the full year | ||
|
||
|
||
## Data path | ||
|
||
hdfs://user/omnilab/warehouse/Senegal | ||
|
||
|
||
## Data format | ||
|
||
For details on data collection, preprocessing, and format, refer to [this | ||
paper](http://arxiv.org/abs/1407.4885). | ||
|
||
|
||
## Data sample | ||
|
||
1,2013-01-07 13:10:00,461 | ||
1,2013-01-07 17:20:00,454 | ||
1,2013-01-07 17:30:00,454 | ||
1,2013-01-07 18:40:00,327 | ||
1,2013-01-07 20:30:00,323 | ||
1,2013-01-08 18:40:00,323 | ||
1,2013-01-08 19:30:00,323 | ||
1,2013-01-08 21:00:00,323 | ||
1,2013-01-09 11:00:00,323 | ||
1,2013-01-09 14:50:00,323 |
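The sample rows above can be read with a short Python sketch. The README does not name the three comma-separated columns, so the interpretation used here (user ID, timestamp, site ID) is an assumption based on the description of the mobility subsets. The function name `load_records` is hypothetical.

```python
import csv
from collections import defaultdict
from datetime import datetime
from io import StringIO

# The ten sample rows shown in the README.
SAMPLE = """\
1,2013-01-07 13:10:00,461
1,2013-01-07 17:20:00,454
1,2013-01-07 17:30:00,454
1,2013-01-07 18:40:00,327
1,2013-01-07 20:30:00,323
1,2013-01-08 18:40:00,323
1,2013-01-08 19:30:00,323
1,2013-01-08 21:00:00,323
1,2013-01-09 11:00:00,323
1,2013-01-09 14:50:00,323
"""


def load_records(handle):
    """Parse rows into (user_id, timestamp, site_id) tuples.

    ASSUMPTION: the column meanings are inferred, not documented in the README.
    """
    records = []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        user_id, stamp, site_id = row
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        records.append((int(user_id), when, int(site_id)))
    return records


records = load_records(StringIO(SAMPLE))

# Collect the distinct sites each user was observed at.
sites_per_user = defaultdict(set)
for user_id, _, site_id in records:
    sites_per_user[user_id].add(site_id)

for user_id, sites in sorted(sites_per_user.items()):
    print(user_id, sorted(sites))
```

For the real files, `load_records` accepts any open text file handle in place of the `StringIO` wrapper.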
File renamed without changes.
File renamed without changes.