org.apache.hadoop.examples.terasort
Class TeraInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
org.apache.hadoop.examples.terasort.TeraInputFormat
public class TeraInputFormat
extends FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
An input format that reads the first 10 characters of each line as the key
and the rest of the line as the value. Both key and value are represented
as Text.
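The key/value rule described above can be sketched in plain Java. This is an illustration only, not the class's actual record reader: it uses String instead of Text so it runs without Hadoop on the classpath, and the names KeyValueLineSketch and splitLine are hypothetical. How lines shorter than 10 characters are handled is not stated here, so the sketch assumes the whole line becomes the key and the value is empty.

```java
// Illustrative sketch only; KeyValueLineSketch and splitLine are hypothetical names.
final class KeyValueLineSketch {
    // Key length taken from the class description above.
    static final int KEY_LENGTH = 10;

    // Returns {key, value}: the first 10 characters and the rest of the line.
    // Assumption: a line shorter than 10 characters becomes the key, with an empty value.
    static String[] splitLine(String line) {
        if (line.length() <= KEY_LENGTH) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, KEY_LENGTH), line.substring(KEY_LENGTH) };
    }

    public static void main(String[] args) {
        String[] kv = splitLine("AAAAAAAAAArest of line");
        System.out.println("key=" + kv[0]);   // prints "key=AAAAAAAAAA"
        System.out.println("value=" + kv[1]); // prints "value=rest of line"
    }
}
```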
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TeraInputFormat
public TeraInputFormat()
writePartitionFile
public static void writePartitionFile(JobContext job,
org.apache.hadoop.fs.Path partFile)
throws IOException,
InterruptedException
- Use the input splits to take samples of the input and generate sample
keys. By default, this reads 100,000 keys from 10 locations in the input,
sorts them, and picks N-1 keys to generate N equally sized partitions.
- Parameters:
job
- the job to sample
partFile
- where to write the output file to
- Throws:
IOException
- if something goes wrong
InterruptedException
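The "sort the samples and pick N-1 keys" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation: the names SplitPointSketch and pickSplitPoints are hypothetical, keys are plain Strings rather than Text, and the evenly spaced index rule is one simple way to obtain roughly equal partitions. Reading the samples from the input splits and writing the partition file are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; SplitPointSketch and pickSplitPoints are hypothetical names.
final class SplitPointSketch {
    // Sorts the sampled keys and returns N-1 split points for N partitions.
    // Assumption: split points are taken at evenly spaced positions in the sorted sample.
    static List<String> pickSplitPoints(List<String> sampledKeys, int partitions) {
        if (partitions < 1 || sampledKeys.size() < partitions) {
            throw new IllegalArgumentException("need at least one sample per partition");
        }
        List<String> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);
        List<String> splitPoints = new ArrayList<>();
        for (int i = 1; i < partitions; i++) {
            // Integer arithmetic keeps the index strictly inside the sample.
            int index = (int) ((long) i * sorted.size() / partitions);
            splitPoints.add(sorted.get(index));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("d", "a", "h", "c", "b", "g", "e", "f");
        System.out.println(pickSplitPoints(samples, 4)); // prints "[c, e, g]"
    }
}
```

With these split points, keys below "c" go to the first partition, keys from "c" up to but not including "e" to the second, and so on, so each partition receives about the same share of the sampled keys.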
createRecordReader
public RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> createRecordReader(InputSplit split,
TaskAttemptContext context)
throws IOException
- Description copied from class:
InputFormat
- Create a record reader for a given split. The framework will call
RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
- Specified by:
createRecordReader
in class InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Parameters:
split
- the split to be read
context
- the information about the task
- Returns:
- a new record reader
- Throws:
IOException
makeSplit
protected FileSplit makeSplit(org.apache.hadoop.fs.Path file,
long start,
long length,
String[] hosts)
- Description copied from class:
FileInputFormat
- A factory that makes the split for this class. It can be overridden
by subclasses to make subtypes.
- Overrides:
makeSplit
in class FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
getSplits
public List<InputSplit> getSplits(JobContext job)
throws IOException
- Description copied from class:
FileInputFormat
- Generate the list of files and make them into FileSplits.
- Overrides:
getSplits
in class FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Parameters:
job
- the job context
- Returns:
- a list of InputSplits for the job.
- Throws:
IOException
Copyright © 2009 The Apache Software Foundation