Connector TW Configurations

dinoolivo edited this page Mar 11, 2015 · 7 revisions

SETTING UP CONNECTOR-TW

CONFIGURATIONS

Under the folder sda\confs\connector-tw you will find 3 configuration files:

log4j.properties

The properties file for log4j. Set where you want the connector to write its log, and edit this file according to your needs.
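As an illustration, a minimal log4j.properties that writes the connector log to a rolling file might look like the following sketch (the log path, level, and appender name are placeholders; adapt them to your environment):

```properties
# Illustrative configuration -- log path and level are placeholders
log4j.rootLogger=INFO, file

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/connector-tw/connector-tw.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c - %m%n
```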

twstats.cfg.xml

The configuration file for Hibernate. Edit it if you compiled the GE with the default DAO implementation; if you provide a different implementation you can leave this file as is, or delete it. Set the following fields to match your database configuration:

```xml
<property name="connection.url"></property>
<property name="connection.username"></property>
<property name="connection.password"></property>
```

You can find the model of the default DAO in social-data-aggregator/data_model in the project directory.
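For example, the filled-in fields could look like this (the JDBC URL, database name, and credentials below are purely illustrative; use your own):

```xml
<!-- Illustrative values only; replace with your own database settings -->
<property name="connection.url">jdbc:mysql://localhost:3306/twstats</property>
<property name="connection.username">sda_user</property>
<property name="connection.password">sda_password</property>
```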

TwStreamConnector.properties

Twitter Configurations

This section of the configuration file contains all the properties for the connection to Twitter:

| Key Name | Optional | Description |
| --- | --- | --- |
| twConsumerKey | NO | Consumer key of the Twitter application |
| twConsumerSecret | NO | Consumer secret of the Twitter application |
| twToken | NO | User token |
| twTokenSecret | NO | User token secret |
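In the properties file this section might look as follows (the values are placeholders; copy the real keys and tokens from your Twitter application's settings):

```properties
# Placeholder credentials -- replace with the values from your Twitter application
twConsumerKey=YOUR_CONSUMER_KEY
twConsumerSecret=YOUR_CONSUMER_SECRET
twToken=YOUR_USER_TOKEN
twTokenSecret=YOUR_USER_TOKEN_SECRET
```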

Node Configurations

This section of the configuration file contains the settings for the node that hosts the driver:

| Key Name | Optional | Description |
| --- | --- | --- |
| nodeName | NO | The name of the node. The value must match the monitoring_from_node field in the DB model if you use the default DAO. This property is needed when multiple instances of the collector run on nodes that have different public IPs but share the same RDBMS: it lets you choose which keys a given node will monitor. |
| proxyPort | YES | The proxy port. Uncomment this property if you use a proxy for outbound connections. |
| proxyHost | YES | The proxy host. Uncomment this property if you use a proxy for outbound connections. |
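A sketch of this section (the node name and proxy host are made-up examples; the proxy lines stay commented out unless you actually use one):

```properties
# Must match monitoring_from_node in the DB model when using the default DAO
nodeName=collector-node-1

# Uncomment when outbound connections go through a proxy
#proxyHost=proxy.example.com
#proxyPort=3128
```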

Spark Configurations

This section of the configuration file configures the Spark Streaming context:

| Key Name | Optional | Description |
| --- | --- | --- |
| numMaxCore | YES | Number of cores to assign to this application (useful when you have to run multiple streaming applications). If you run just the collector, you can comment out this property. |
| checkpointDir | NO | Directory where Spark will save this application's checkpoints. |
| sparkBatchDurationMillis | NO | Duration of the batch, in milliseconds. This is the basic interval at which the system receives data in batches. |
| sparkCleanTTL | NO | Duration, in seconds, for which Spark remembers any metadata (stages generated, tasks generated, etc.). Periodic cleanups ensure that metadata older than this duration is forgotten. |
| twitterInserterWindowDuration | NO | Duration of the window, i.e. the save frequency for gathered data. Both the window duration and the slide duration must be multiples of the batch interval. |
| twitterInserterWindowSlidingInterval | NO | Window sliding interval: the interval at which the window slides or moves forward. Set it equal to twitterInserterWindowDuration to avoid saving duplicated data. |
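An illustrative set of values (the durations below are arbitrary examples, chosen only so that the window and slide are multiples of the batch interval and equal to each other; tune them to your workload):

```properties
# Illustrative values -- adjust to your cluster and workload
#numMaxCore=2
checkpointDir=file:///tmp/sda/checkpoint
sparkBatchDurationMillis=5000
sparkCleanTTL=3600

# Window and slide must be multiples of the batch duration;
# equal values avoid saving duplicated data
twitterInserterWindowDuration=60000
twitterInserterWindowSlidingInterval=60000
```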

App Configurations

This section of the configuration file contains the application settings:

| Key Name | Optional | Description |
| --- | --- | --- |
| serverPort | NO | The port on which the Jetty server will listen. Needed to start, restart, and stop the collector. |
| savePartitions | NO | Number of partitions to coalesce before saving. A value of 1 generates one file containing raw tweets per window. |
| dataOutputFolder | NO | The folder where the raw data will be saved. |
| dataRootFolder | NO | Root folder under which data will be saved. Example: dataOutputFolder=file://tmp/data and dataRootFolder=raw will save data on file://tmp/data/raw/... |
| daoClass | YES | Class of a custom DAO, if you don't want to use the default one. |
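Put together, the section could look like this sketch (the port and the custom DAO class name are invented placeholders; the folder values reuse the example above):

```properties
# Illustrative values
serverPort=8080
savePartitions=1
dataOutputFolder=file://tmp/data
dataRootFolder=raw

# Only needed when replacing the default DAO (class name is a placeholder)
#daoClass=com.example.CustomDaoImpl
```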

Kafka Configurations

This section of the configuration file configures Kafka. If you don't want the data sent to Kafka, delete or comment out the following properties:

| Key Name | Optional | Description |
| --- | --- | --- |
| brokersList | NO | List of Kafka brokers, separated by commas. |
| kafkaSerializationClass | NO | Serialization class; defaults to kafka.serializer.StringEncoder. Change it if you want another serializer. |
| kafkaRequiredAcks | NO | Number of acknowledgments the producer requires from the broker before a message is considered received. |
| maxTotalConnections | NO | Total number of connections in the connection pool. |
| maxIdleConnections | NO | Number of idle connections in the connection pool. |
| customProducerFactoryImpl | YES | Uncomment to plug in a different producer factory implementation, for a bus other than Kafka. |
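A possible Kafka section (the broker hosts, pool sizes, and factory class name are illustrative placeholders):

```properties
# Comma-separated broker list -- hosts below are placeholders
brokersList=broker1:9092,broker2:9092
kafkaSerializationClass=kafka.serializer.StringEncoder
kafkaRequiredAcks=1
maxTotalConnections=10
maxIdleConnections=5

# Only needed for a bus other than Kafka (class name is a placeholder)
#customProducerFactoryImpl=com.example.CustomProducerFactory
```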