Connector TW Configurations

dinoolivo edited this page Mar 11, 2015 · 7 revisions

SETTING UP CONNECTOR-TW

CONFIGURATIONS

Under the folder sda\confs\connector-tw you will find 3 configuration files:

log4j.properties

The properties file for log4j. Set where you want the connector to write its log, and edit this file according to your needs.
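As an illustration, a minimal log4j.properties that writes the connector log to a rolling file might look like the following sketch (the log path, level, and appender name are placeholders; adapt them to your environment):

```properties
# Illustrative configuration -- log path and level are placeholders
log4j.rootLogger=INFO, file

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/connector-tw/connector-tw.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c - %m%n
```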

twstats.cfg.xml

The configuration file for Hibernate. Edit it if you compiled the GE with the default DAO implementation; if you provide a different implementation you can leave this file as is, or delete it. Set the following fields to match your database configuration:

```xml
<property name="connection.url"></property>
<property name="connection.username"></property>
<property name="connection.password"></property>
```

You can find the model of the default DAO in social-data-aggregator/data_model in the project directory.
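For example, the filled-in fields could look like this (the JDBC URL, database name, and credentials below are purely illustrative; use your own):

```xml
<!-- Illustrative values only; replace with your own database settings -->
<property name="connection.url">jdbc:mysql://localhost:3306/twstats</property>
<property name="connection.username">sda_user</property>
<property name="connection.password">sda_password</property>
```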

TwStreamConnector.properties

Twitter Configurations

This section of the configuration file contains all the properties for the connection to Twitter:

| Key Name | Optional | Description |
| --- | --- | --- |
| twConsumerKey | NO | Consumer key of the Twitter application |
| twConsumerSecret | NO | Consumer secret of the Twitter application |
| twToken | NO | User token |
| twTokenSecret | NO | User token secret |
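In the properties file this section might look as follows (the values are placeholders; copy the real keys and tokens from your Twitter application's settings):

```properties
# Placeholder credentials -- replace with the values from your Twitter application
twConsumerKey=YOUR_CONSUMER_KEY
twConsumerSecret=YOUR_CONSUMER_SECRET
twToken=YOUR_USER_TOKEN
twTokenSecret=YOUR_USER_TOKEN_SECRET
```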

Node Configurations

This section of the configuration file contains the settings for the node that hosts the driver:

| Key Name | Optional | Description |
| --- | --- | --- |
| nodeName | NO | The name of the node. The value must match the monitoring_from_node field in the DB model if you use the default DAO. This property is needed when multiple instances of the collector run on nodes that have different public IPs but share the same RDBMS: it lets you choose which keys a given node will monitor. |
| proxyPort | YES | The proxy port. Uncomment this property if you use a proxy for outbound connections. |
| proxyHost | YES | The proxy host. Uncomment this property if you use a proxy for outbound connections. |
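A sketch of this section (the node name and proxy host are made-up examples; the proxy lines stay commented out unless you actually use one):

```properties
# Must match monitoring_from_node in the DB model when using the default DAO
nodeName=collector-node-1

# Uncomment when outbound connections go through a proxy
#proxyHost=proxy.example.com
#proxyPort=3128
```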

Spark Configurations

This section of the configuration file configures the Spark Streaming context:

| Key Name | Optional | Description |
| --- | --- | --- |
| numMaxCore | YES | Number of cores to assign to this application (useful when you have to run multiple streaming applications). If you run just the collector, you can comment out this property. |
| checkpointDir | NO | Directory where Spark will save this application's checkpoints. |
| sparkBatchDurationMillis | NO | Duration of the batch, in milliseconds. This is the basic interval at which the system receives data in batches. |
| sparkCleanTTL | NO | Duration, in seconds, for which Spark remembers any metadata (stages generated, tasks generated, etc.). Periodic cleanups ensure that metadata older than this duration is forgotten. |
| twitterInserterWindowDuration | NO | Duration of the window, i.e. the save frequency for gathered data. Both the window duration and the slide duration must be multiples of the batch interval. |
| twitterInserterWindowSlidingInterval | NO | Window sliding interval: the interval at which the window slides or moves forward. Set it equal to twitterInserterWindowDuration to avoid saving duplicated data. |
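An illustrative set of values (the durations below are arbitrary examples, chosen only so that the window and slide are multiples of the batch interval and equal to each other; tune them to your workload):

```properties
# Illustrative values -- adjust to your cluster and workload
#numMaxCore=2
checkpointDir=file:///tmp/sda/checkpoint
sparkBatchDurationMillis=5000
sparkCleanTTL=3600

# Window and slide must be multiples of the batch duration;
# equal values avoid saving duplicated data
twitterInserterWindowDuration=60000
twitterInserterWindowSlidingInterval=60000
```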

App Configurations

This section of the configuration file contains the application settings:

| Key Name | Optional | Description |
| --- | --- | --- |
| serverPort | NO | The port on which the Jetty server will listen. Needed to start, restart, and stop the collector. |
| savePartitions | NO | Number of partitions to coalesce before saving. A value of 1 generates one file containing raw tweets per window. |
| dataOutputFolder | NO | The folder where the raw data will be saved. |
| dataRootFolder | NO | Root folder under which data will be saved. Example: dataOutputFolder=file://tmp/data and dataRootFolder=raw will save data on file://tmp/data/raw/... |
| daoClass | YES | Class of a custom DAO, if you don't want to use the default one. |
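Put together, the section could look like this sketch (the port and the custom DAO class name are invented placeholders; the folder values reuse the example above):

```properties
# Illustrative values
serverPort=8080
savePartitions=1
dataOutputFolder=file://tmp/data
dataRootFolder=raw

# Only needed when replacing the default DAO (class name is a placeholder)
#daoClass=com.example.CustomDaoImpl
```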

Kafka Configurations

This section of the configuration file configures Kafka. If you don't want the data sent to Kafka, delete or comment out the following properties:

| Key Name | Optional | Description |
| --- | --- | --- |
| brokersList | NO | List of Kafka brokers, separated by commas. |
| kafkaSerializationClass | NO | Serialization class; defaults to kafka.serializer.StringEncoder. Change it if you want another serializer. |
| kafkaRequiredAcks | NO | Number of acknowledgments the producer requires from the broker before a message is considered received. |
| maxTotalConnections | NO | Total number of connections in the connection pool. |
| maxIdleConnections | NO | Number of idle connections in the connection pool. |
| customProducerFactoryImpl | YES | Uncomment to plug in a different producer factory implementation, for a bus other than Kafka. |
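A possible Kafka section (the broker hosts, pool sizes, and factory class name are illustrative placeholders):

```properties
# Comma-separated broker list -- hosts below are placeholders
brokersList=broker1:9092,broker2:9092
kafkaSerializationClass=kafka.serializer.StringEncoder
kafkaRequiredAcks=1
maxTotalConnections=10
maxIdleConnections=5

# Only needed for a bus other than Kafka (class name is a placeholder)
#customProducerFactoryImpl=com.example.CustomProducerFactory
```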