org.apache.hadoop.examples
Class RandomTextWriter

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.examples.RandomTextWriter
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class RandomTextWriter
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

This program uses map/reduce to run a distributed job in which the tasks do not interact and each task writes a large, unsorted, random sequence of words. To generate data for terasort with 5-10 words per key and 20-100 words per value, use the following configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.randomtextwriter.minwordskey</name>
    <value>5</value>
  </property>
  <property>
    <name>mapreduce.randomtextwriter.maxwordskey</name>
    <value>10</value>
  </property>
  <property>
    <name>mapreduce.randomtextwriter.minwordsvalue</name>
    <value>20</value>
  </property>
  <property>
    <name>mapreduce.randomtextwriter.maxwordsvalue</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.randomtextwriter.totalbytes</name>
    <value>1099511627776</value>
  </property>
</configuration>

Equivalently, RandomTextWriter supports all of the above options, as well as those supported by Tool, on the command line.

To run: bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter [-outFormat output format class] output
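The command-line form can be sketched as follows. This is a minimal illustration, assuming the options are passed through Tool's generic "-D name=value" handling; the class RandomTextWriterArgs, its methods, and the output directory /tmp/random-text are hypothetical names introduced here and are not part of Hadoop. It only assembles and prints the argument list; it does not launch a job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Hadoop): builds a randomtextwriter argument list. */
final class RandomTextWriterArgs {

    /** The five properties from the configuration above, in the same order. */
    static Map<String, String> terasortProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.randomtextwriter.minwordskey", "5");
        props.put("mapreduce.randomtextwriter.maxwordskey", "10");
        props.put("mapreduce.randomtextwriter.minwordsvalue", "20");
        props.put("mapreduce.randomtextwriter.maxwordsvalue", "100");
        // 1099511627776 bytes is 2^40, i.e. 1 TiB.
        props.put("mapreduce.randomtextwriter.totalbytes", Long.toString(1L << 40));
        return props;
    }

    /** Turns each property into a "-D name=value" pair, then appends the output path. */
    static List<String> buildArgs(Map<String, String> props, String outputDir) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        // Assumption: generic options come first, the output path last.
        args.add(outputDir);
        return args;
    }

    public static void main(String[] argv) {
        // "/tmp/random-text" is a placeholder output directory.
        List<String> args = buildArgs(terasortProperties(), "/tmp/random-text");
        System.out.println("bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter "
                + String.join(" ", args));
    }
}
```

Keeping the properties in an insertion-ordered map makes the printed command mirror the XML file one property at a time, which is convenient when comparing the two forms.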


Field Summary
static String BYTES_PER_MAP
           
static String MAPS_PER_HOST
           
static String MAX_KEY
           
static String MAX_VALUE
           
static String MIN_KEY
           
static String MIN_VALUE
           
static String TOTAL_BYTES
           
 
Constructor Summary
RandomTextWriter()
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
          This is the main routine for launching a distributed random write job.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

TOTAL_BYTES

public static final String TOTAL_BYTES
See Also:
Constant Field Values

BYTES_PER_MAP

public static final String BYTES_PER_MAP
See Also:
Constant Field Values

MAPS_PER_HOST

public static final String MAPS_PER_HOST
See Also:
Constant Field Values

MAX_VALUE

public static final String MAX_VALUE
See Also:
Constant Field Values

MIN_VALUE

public static final String MIN_VALUE
See Also:
Constant Field Values

MIN_KEY

public static final String MIN_KEY
See Also:
Constant Field Values

MAX_KEY

public static final String MAX_KEY
See Also:
Constant Field Values
Constructor Detail

RandomTextWriter

public RandomTextWriter()
Method Detail

run

public int run(String[] args)
        throws Exception
This is the main routine for launching a distributed random write job. It runs 10 maps per node, and each node writes 1 GB of data to a DFS file. The reduce does nothing.

Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
IOException
Exception
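
To make the kind of data each task writes more concrete, here is a small standalone sketch of generating random word sequences for keys and values until a byte budget is reached. It is not the RandomTextWriter source: the class RandomSentenceSketch, the six-word vocabulary, the inclusive treatment of the minimum and maximum word counts, and the tab-separated record layout are all assumptions made for illustration.

```java
import java.util.Random;

/** Illustrative sketch only; not the RandomTextWriter implementation. */
final class RandomSentenceSketch {
    // Hypothetical vocabulary; the real word list is not shown on this page.
    static final String[] WORDS = {"alpha", "bravo", "charlie", "delta", "echo", "foxtrot"};

    /** Joins a random number of words, between minWords and maxWords inclusive (assumed). */
    static String randomSentence(Random rng, int minWords, int maxWords) {
        int count = minWords + rng.nextInt(maxWords - minWords + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(WORDS[rng.nextInt(WORDS.length)]);
        }
        return sb.toString();
    }

    /** Appends key/value records until the byte budget is met; returns the record count. */
    static int writeRecords(Random rng, long byteBudget, StringBuilder out) {
        int records = 0;
        long written = 0;
        while (written < byteBudget) {
            String key = randomSentence(rng, 5, 10);      // 5-10 words per key
            String value = randomSentence(rng, 20, 100);  // 20-100 words per value
            out.append(key).append('\t').append(value).append('\n');
            // All words are ASCII, so character count equals byte count here.
            written += key.length() + value.length();
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        int n = writeRecords(new Random(42), 2_000, out);
        System.out.println(n + " records, " + out.length() + " characters");
    }
}
```

Because each call only needs its own random generator and a byte budget, many such writers can run side by side without coordinating, which matches the description of tasks that do not interact.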

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2009 The Apache Software Foundation