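Hadoop MapReduce multi-table join
Suppose we have the following two files: one table maps each company to an address ID, and the other maps each address ID to an address name. Every line begins with a two-character tag (A: or B:) that identifies its table. The mapper splits fields on a tab, so the separator before the address ID in table 1 and after it in table 2 is assumed to be a tab, even though it appears as a space in the listings.
Table 1: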
[plain]
A:Beijing Red Star 1
A:Shenzhen Thunder 3
A:Guangzhou Honda 2
A:Beijing Rising 1
A:Guangzhou Development Bank 2
A:Tencent 3
A:Back of Beijing 1
Table 2:
[plain]
B:1 Beijing
B:2 Guangzhou
B:3 Shenzhen
B:4 Xian
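As a minimal sketch of how one tagged line breaks down, assuming tab-separated fields as the mapper's split("\t") implies, the hypothetical TaggedRecord helper below (not part of the original job) separates the tag from the fields. It only illustrates the record layout and does not depend on Hadoop.

```java
class TaggedRecord {
    final String tag;      // "A:" or "B:", the first two characters of the line
    final String[] fields; // tab-separated fields that follow the tag

    TaggedRecord(String tag, String[] fields) {
        this.tag = tag;
        this.fields = fields;
    }

    // Returns null for lines too short to carry a two-character tag.
    static TaggedRecord parse(String line) {
        if (line.length() < 2) {
            return null;
        }
        return new TaggedRecord(line.substring(0, 2), line.substring(2).split("\t"));
    }

    public static void main(String[] args) {
        TaggedRecord a = parse("A:Beijing Red Star\t1");
        TaggedRecord b = parse("B:1\tBeijing");
        // Table A carries (company, address ID); table B carries (address ID, name).
        System.out.println(a.tag + " | " + a.fields[0] + " | " + a.fields[1]);
        System.out.println(b.tag + " | " + b.fields[0] + " | " + b.fields[1]);
    }
}
```

Because company names such as "Beijing Red Star" contain spaces, splitting on a space instead of a tab would break the record into too many fields.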
The MapReduce code is as follows:
[plain]
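import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MTJoin {
    // Source tags; also used as the MapWritable key that marks a value's table.
    private static final Text typeA = new Text("A:");
    private static final Text typeB = new Text("B:");
    private static final Log log = LogFactory.getLog(MTJoin.class);

    public static class Map extends Mapper<Object, Text, Text, MapWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueStr = value.toString();
            if (valueStr.length() < 2) {
                return; // too short to carry a tag
            }
            String type = valueStr.substring(0, 2);
            String content = valueStr.substring(2);
            // Debug level: logging every record at info would flood the task logs.
            log.debug(content);
            String[] contentArray = content.split("\t");
            if (contentArray.length < 2) {
                return; // malformed line: expected two tab-separated fields
            }
            if (type.equals("A:")) {
                // Table A: company name, address ID. Emit (address ID, {A: company}).
                String company = contentArray[0];
                String adrNum = contentArray[1];
                MapWritable map = new MapWritable();
                map.put(typeA, new Text(company));
                context.write(new Text(adrNum), map);
            } else if (type.equals("B:")) {
                // Table B: address ID, address name. Emit (address ID, {B: name}).
                String adrNum = contentArray[0];
                String adrName = contentArray[1];
                MapWritable map = new MapWritable();
                map.put(typeB, new Text(adrName));
                context.write(new Text(adrNum), map);
            }
        }
    }

    public static class Reduce extends Reducer<Text, MapWritable, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<MapWritable> values, Context context)
                throws IOException, InterruptedException {
            List<Text> companyList = new ArrayList<Text>();
            List<Text> adrList = new ArrayList<Text>();
            // Split the values for this address ID by source table.
            // Hadoop reuses the value object across iterations, so keep copies.
            for (MapWritable map : values) {
                if (map.containsKey(typeA)) {
                    companyList.add(new Text((Text) map.get(typeA)));
                } else if (map.containsKey(typeB)) {
                    adrList.add(new Text((Text) map.get(typeB)));
                }
            }
            // Cartesian product: every company paired with every address name.
            for (Text company : companyList) {
                for (Text adrName : adrList) {
                    context.write(company, adrName);
                }
            }
        }
    }
}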
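The principle is simple. The map output uses the address ID as the key, so all records sharing an address ID reach the same reduce call. In the reducer, company names go into one list and address names into another, and the Cartesian product of the two lists gives the result. An address ID with no matching company, such as 4 (Xian), produces no rows, so the job behaves as an inner join.
The output is as follows: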
[plain]
Beijing Red Star Beijing
Beijing Rising Beijing
Back of Beijing Beijing
Guangzhou Honda Guangzhou
Guangzhou Development Bank Guangzhou
Shenzhen Thunder Shenzhen
Tencent Shenzhen
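The grouping and Cartesian product described above can be sketched without Hadoop. The following is a minimal in-memory illustration, not the job itself: the class name JoinSketch and its join method are hypothetical, the input is assumed to be tab-separated as the mapper expects, and a TreeMap stands in for the shuffle's sorting of keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinSketch {
    // Joins tagged lines in memory, mirroring the map and reduce steps.
    static List<String> join(List<String> lines) {
        // "Map" step: group both tables by address ID (sorted, like shuffle keys).
        Map<String, List<String>> companies = new TreeMap<>();
        Map<String, List<String>> addresses = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.substring(2).split("\t");
            if (line.startsWith("A:")) {
                companies.computeIfAbsent(f[1], k -> new ArrayList<>()).add(f[0]);
            } else if (line.startsWith("B:")) {
                addresses.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f[1]);
            }
        }
        // "Reduce" step: per address ID, pair every company with every name.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : companies.entrySet()) {
            List<String> names = addresses.getOrDefault(e.getKey(), new ArrayList<>());
            for (String company : e.getValue()) {
                for (String name : names) {
                    out.add(company + "\t" + name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A:Beijing Red Star\t1", "A:Shenzhen Thunder\t3", "A:Guangzhou Honda\t2",
            "A:Beijing Rising\t1", "A:Guangzhou Development Bank\t2", "A:Tencent\t3",
            "A:Back of Beijing\t1",
            "B:1\tBeijing", "B:2\tGuangzhou", "B:3\tShenzhen", "B:4\tXian");
        for (String row : join(lines)) {
            System.out.println(row);
        }
    }
}
```

With the sample data this yields seven rows and nothing for Xian, since no company refers to address ID 4. In the real job the order of values within one key is not guaranteed, so rows that share an address may appear in a different order.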