Hadoop HDFS 0.21.0 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes Since Hadoop 0.20.2
Sub-task
- [HDFS-396] - Process dfs.name.edits.dirs as URI
- [HDFS-436] - AspectJ framework for HDFS code and tests
- [HDFS-444] - Current fault injection framework implementation doesn't allow changing probability levels dynamically
- [HDFS-475] - Create separate targets for fault injection related test and jar file creation
- [HDFS-498] - Add development guide and framework documentation
- [HDFS-508] - Factor out BlockInfo from BlocksMap
- [HDFS-519] - Create new tests for lease recovery
- [HDFS-520] - Create new tests for block recovery
- [HDFS-521] - Create new tests for pipeline
- [HDFS-551] - Create new functional test for a block report.
- [HDFS-552] - Change TestFiDataTransferProtocol to junit 4 and add a few new tests
- [HDFS-561] - Fix write pipeline READ_TIMEOUT
- [HDFS-564] - Adding pipeline test 17-35
- [HDFS-616] - Create functional tests for new design of the block report
- [HDFS-660] - Remove deprecated methods from InterDatanodeProtocol.
- [HDFS-663] - DFSIO for append
- [HDFS-668] - TestFileAppend3#TC7 sometimes hangs
- [HDFS-676] - NPE in FSDataset.updateReplicaUnderRecovery(..)
- [HDFS-716] - Define a pointcut for pipeline close
- [HDFS-719] - Add more fault injection tests for pipeline close
- [HDFS-730] - Add fault injection tests for pipeline close ack
- [HDFS-1057] - Concurrent readers hit ChecksumExceptions if following a writer to very end of file
- [HDFS-1067] - Create block recovery tests that handle errors
- [HDFS-1100] - Override TestFcHdfsSymlink#unwrapException
Bug
- [HDFS-15] - All replicas of a block end up on only 1 rack
- [HDFS-29] - In Datanode, update block may fail due to length inconsistency
- [HDFS-76] - Namespace quota exceeded message unclear
- [HDFS-94] - The "Heap Size" in HDFS web ui may not be accurate
- [HDFS-101] - DFS write pipeline : DFSClient sometimes does not detect second datanode failure
- [HDFS-119] - logSync() may block NameNode forever.
- [HDFS-127] - DFSClient block read failures cause open DFSInputStream to become unusable
- [HDFS-145] - FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
- [HDFS-167] - DFSClient continues to retry indefinitely
- [HDFS-181] - INode.getPathComponents throws NPE when given a non-absolute path
- [HDFS-187] - TestStartup fails if hdfs is running in the same machine
- [HDFS-192] - TestBackupNode sometimes fails
- [HDFS-195] - Need to handle access token expiration when re-establishing the pipeline for dfs write
- [HDFS-415] - Unchecked exception thrown inside of BlockReceiver cause some threads hang
- [HDFS-423] - Unbreak FUSE build and fuse_dfs_wrapper.sh
- [HDFS-438] - Improve help message for quotas
- [HDFS-439] - HADOOP-5961 is incorrectly committed.
- [HDFS-440] - javadoc warnings: broken links
- [HDFS-441] - TestFTPFileSystem fails
- [HDFS-445] - pread() fails when cached block locations are no longer valid
- [HDFS-446] - Offline Image Viewer Ls visitor incorrectly says 'output file' instead of 'input file'
- [HDFS-454] - HDFS workflow in JIRA does not match MAPREDUCE, HADOOP
- [HDFS-456] - Problems with dfs.name.edits.dirs as URI
- [HDFS-462] - Unit tests not working under Windows
- [HDFS-463] - CreateEditsLog utility broken due to FSImage URL scheme check
- [HDFS-464] - Memory leaks in libhdfs
- [HDFS-466] - hdfs_write infinite loop when dfs fails and cannot write files > 2 GB
- [HDFS-472] - Document hdfsproxy design and set-up guide
- [HDFS-480] - Typo in jar name in build.xml
- [HDFS-481] - Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
- [HDFS-482] - change HsftpFileSystem's ssl.client.do.not.authenticate.server configuration setting to ssl-client.xml
- [HDFS-483] - Data transfer (aka pipeline) implementation cannot tolerate exceptions
- [HDFS-484] - bin-package and package don't seem to package any jar file
- [HDFS-489] - Updated TestHDFSCLI for changes from HADOOP-6139
- [HDFS-499] - Fix deprecation warnings introduced by HADOOP-5438
- [HDFS-500] - Fix lingering and new javac warnings
- [HDFS-514] - DFSClient.namenode is a public field. Should be private.
- [HDFS-525] - ListPathsServlet.java uses static SimpleDateFormat that has threading issues
- [HDFS-534] - Required avro classes are missing
- [HDFS-538] - DistributedFileSystem::listStatus incorrectly returns null for empty result sets
- [HDFS-540] - TestNameNodeMetrics fails intermittently
- [HDFS-553] - BlockSender reports wrong failed position in ChecksumException
- [HDFS-568] - TestServiceLevelAuthorization fails on latest build in Hudson
- [HDFS-586] - TestBlocksWithNotEnoughRacks fails
- [HDFS-587] - Test programs support only default queue.
- [HDFS-590] - When trying to rename a non-existent path, LocalFileSystem throws a FileNotFoundException, while HDFS returns false
- [HDFS-596] - Memory leak in libhdfs: hdfsFreeFileInfo() in libhdfs does not free memory for mOwner and mGroup
- [HDFS-601] - TestBlockReport should obtain data directories from MiniHDFSCluster
- [HDFS-602] - Attempt to make a directory under an existing file on DistributedFileSystem should throw a FileAlreadyExistsException instead of FileNotFoundException
- [HDFS-606] - ConcurrentModificationException in invalidateCorruptReplicas()
- [HDFS-609] - Create a file with the append flag does not work in HDFS
- [HDFS-611] - Heartbeats times from Datanodes increase when there are plenty of blocks to delete
- [HDFS-612] - FSDataset should not use org.mortbay.log.Log
- [HDFS-614] - TestDatanodeBlockScanner should obtain data-node directories directly from MiniDFSCluster
- [HDFS-615] - TestLargeDirectoryDelete fails with NullPointerException
- [HDFS-622] - checkMinReplication should count only live node.
- [HDFS-625] - ListPathsServlet throws NullPointerException
- [HDFS-629] - Remove ReplicationTargetChooser.java along with fixing import warnings.
- [HDFS-637] - DataNode sends a Success ack when block write fails
- [HDFS-638] - The build.xml references jars that don't exist
- [HDFS-640] - TestHDFSFileContextMainOperations uses old FileContext.mkdirs(..)
- [HDFS-641] - Move all of the benchmarks and tests that depend on mapreduce to mapreduce
- [HDFS-646] - missing test-contrib ant target would break hudson patch test process
- [HDFS-647] - Internal server errors
- [HDFS-653] - Multiple unit tests fail in branch-0.21
- [HDFS-673] - BlockReceiver#PacketResponder should not remove a packet from the ack queue before its ack is sent
- [HDFS-677] - Rename failure due to quota results in deletion of src directory
- [HDFS-679] - Appending to a partial chunk incorrectly assumes the first packet fills up the partial chunk
- [HDFS-682] - TestBlockUnderConstruction fails
- [HDFS-688] - Add configuration resources to DFSAdmin
- [HDFS-690] - TestAppend2#testComplexAppend failed on "Too many open files"
- [HDFS-691] - Limitation on java.io.InputStream.available()
- [HDFS-695] - RaidNode should read in configuration from hdfs-site.xml
- [HDFS-699] - Primary datanode should compare replicas' on disk lengths
- [HDFS-706] - Intermittent failures in TestFiHFlush
- [HDFS-709] - TestDFSShell failure
- [HDFS-720] - NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
- [HDFS-722] - The pointcut callCreateBlockWriteStream in FSDatasetAspects is broken
- [HDFS-724] - Pipeline close hangs if one of the datanodes is not responsive.
- [HDFS-725] - Support the build error fix for HADOOP-6327
- [HDFS-726] - Eclipse .classpath template has outdated jar files and is missing some new ones.
- [HDFS-735] - TestReadWhileWriting has wrong line termination symbols
- [HDFS-741] - TestHFlush test doesn't seek() past previously written part of the file
- [HDFS-750] - TestRename build failure
- [HDFS-751] - TestCrcCorruption succeeds but is not testing anything of value
- [HDFS-756] - libhdfs unit tests do not run
- [HDFS-757] - Unit tests failure for RAID
- [HDFS-760] - "fs -put" fails if dfs.umask is set to 63
- [HDFS-761] - Failure to process rename operation from edits log due to quota verification
- [HDFS-762] - Trying to start the balancer throws a NPE
- [HDFS-763] - DataBlockScanner reporting of bad blocks is slightly misleading
- [HDFS-774] - Intermittent race condition in TestFiPipelines
- [HDFS-775] - FSDataset calls getCapacity() twice -bug?
- [HDFS-781] - Metrics PendingDeletionBlocks is not decremented
- [HDFS-783] - libhdfs tests break code coverage runs with Clover
- [HDFS-785] - Missing license header in java source files.
- [HDFS-787] - Make the versions of libraries consistent
- [HDFS-791] - Build is broken after HDFS-787 patch has been applied
- [HDFS-792] - TestHDFSCLI is failing
- [HDFS-793] - DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet
- [HDFS-797] - TestHDFSCLI much slower after HDFS-265 merge
- [HDFS-802] - Update Eclipse configuration to match changes to Ivy configuration
- [HDFS-812] - FSNamesystem#internalReleaseLease throws NullPointerException on a single-block file's lease recovery
- [HDFS-823] - In Checkpointer the getImage servlet is added to public rather than internal servlet list
- [HDFS-824] - Stop lease checker in TestReadWhileWriting
- [HDFS-825] - Build fails to pull latest hadoop-core-* artifacts
- [HDFS-840] - Update File Context tests to use FileContextTestHelper
- [HDFS-849] - TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
- [HDFS-856] - Hardcoded replication level for new files in fuse-dfs
- [HDFS-857] - Incorrect type for fuse-dfs capacity can cause "df" to return negative values on 32-bit machines
- [HDFS-858] - Incorrect return codes for fuse-dfs
- [HDFS-859] - fuse-dfs utime behavior causes issues with tar
- [HDFS-861] - fuse-dfs does not support O_RDWR
- [HDFS-868] - Link to Hadoop Upgrade Wiki is broken
- [HDFS-877] - Client-driven block verification not functioning
- [HDFS-880] - TestNNLeaseRecovery fails on windows
- [HDFS-885] - Datanode toString() NPEs on null dnRegistration
- [HDFS-894] - DatanodeID.ipcPort is not updated when existing node re-registers
- [HDFS-897] - ReplicasMap remove has a bug in generation stamp comparison
- [HDFS-909] - Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log
- [HDFS-913] - TestRename won't run automatically from 'run-test-hdfs-faul-inject' target
- [HDFS-927] - DFSInputStream retries too many times for new block locations
- [HDFS-938] - Replace calls to UGI.getUserName() with UGI.getShortUserName()
- [HDFS-939] - libhdfs test is broken
- [HDFS-940] - libhdfs uses UnixUserGroupInformation
- [HDFS-961] - dfs_readdir incorrectly parses paths
- [HDFS-965] - TestDelegationToken fails in trunk
- [HDFS-966] - NameNode recovers lease even in safemode
- [HDFS-995] - Replace usage of FileStatus#isDir()
- [HDFS-1000] - libhdfs needs to be updated to use the new UGI
- [HDFS-1002] - Secondary Name Node crash, NPE in edit log replay
- [HDFS-1010] - HDFSProxy: Retrieve group information from UnixUserGroupInformation instead of LdapEntry
- [HDFS-1014] - Error in reading delegation tokens from edit logs.
- [HDFS-1015] - Intermittent failure in TestSecurityTokenEditLog
- [HDFS-1024] - SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
- [HDFS-1041] - DFSClient does not retry in getFileChecksum(..)
- [HDFS-1046] - Build fails trying to download an old version of tomcat
- [HDFS-1072] - AlreadyBeingCreatedException with HDFS_NameNode as the lease holder
- [HDFS-1074] - TestProxyUtil fails
- [HDFS-1088] - Prevent renaming a symlink to its target
- [HDFS-1101] - TestDiskError.testLocalDirs() fails
- [HDFS-1104] - Fsck triggers full GC on NameNode
- [HDFS-1159] - clean-cache target removes wrong ivy cache
- [HDFS-1165] - createSymlink should not hold the fsnamesystem lock when syncing its edit log to disk
- [HDFS-1173] - Fix references to 0.22 in 0.21 branch
- [HDFS-1181] - Move configuration and script files post project split
- [HDFS-1193] - -mvn-system-deploy target is broken, which in turn fails the mvn-deploy task, leading to an unstable mapreduce build.
- [HDFS-1212] - Harmonize HDFS JAR library versions with Common
- [HDFS-1255] - test-libhdfs.sh fails
- [HDFS-1256] - libhdfs is missing from the tarball
- [HDFS-1258] - Clearing namespace quota on "/" corrupts FS image
- [HDFS-1267] - fuse-dfs does not compile
- [HDFS-1288] - start-all.sh / stop-all.sh does not seem to work with HDFS
- [HDFS-1292] - Allow artifacts to be published to the staging Apache Nexus Maven Repository
- [HDFS-1299] - 'compile-fault-inject' never should be called directly.
- [HDFS-1311] - Running tests with 'testcase' cause triple execution of the same test case
- [HDFS-1313] - HdfsProxy changes from HDFS-481 missed in y20.1xx
Improvement
- [HDFS-173] - Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
- [HDFS-265] - Revisit append
- [HDFS-278] - Should DFS outputstream's close wait forever?
- [HDFS-288] - Redundant computation in hashCode() implementation
- [HDFS-352] - saveNamespace command should be documented.
- [HDFS-377] - Code Refactoring: separate codes which implement DataTransferProtocol
- [HDFS-381] - Datanode should report deletion of blocks to Namenode explicitly
- [HDFS-385] - Design a pluggable interface to place replicas of blocks in HDFS
- [HDFS-412] - Hadoop JMX usage makes Nagios monitoring impossible
- [HDFS-443] - New metrics in namenode to capture lost heartbeats.
- [HDFS-457] - better handling of volume failure in Data Node storage
- [HDFS-490] - eliminate the usage of FileSystem.create( ) deprecated by HADOOP-5438
- [HDFS-493] - Only fault-injected tests have to be executed by run-test-*-faul-inject targets; no fault-injected tests need to be run in the normal testing process
- [HDFS-496] - Use PureJavaCrc32 in HDFS
- [HDFS-501] - Use enum to define the constants in DataTransferProtocol
- [HDFS-504] - HDFS updates the modification time of a file when the file is closed.
- [HDFS-510] - Rename DatanodeBlockInfo to be ReplicaInfo
- [HDFS-511] - Redundant block searches in BlockManager.
- [HDFS-512] - Set block id as the key to Block
- [HDFS-524] - Further DataTransferProtocol code refactoring.
- [HDFS-527] - Refactor DFSClient constructors
- [HDFS-529] - More redundant block searches in BlockManager.
- [HDFS-530] - Refactor TestFileAppend* to remove code duplications
- [HDFS-531] - Renaming of configuration keys
- [HDFS-532] - Allow applications to know that a read request failed because block is missing
- [HDFS-539] - Fault injection utils for pipeline testing need to be refactored for future reuse by other tests
- [HDFS-546] - DatanodeDescriptor block iterator should be BlockInfo based rather than Block.
- [HDFS-548] - TestFsck takes nearly 10 minutes to run - a quarter of the entire hdfs-test time
- [HDFS-549] - Allow non fault-inject specific tests execution with an explicit -Dtestcase=... setting
- [HDFS-563] - Simplify the codes in FSNamesystem.getBlockLocations(..)
- [HDFS-578] - Support for using server default values for blockSize and replication when creating a file
- [HDFS-581] - Introduce an iterator over blocks in the block report array.
- [HDFS-584] - Fail the fault-inject build if any advices are mis-bound
- [HDFS-598] - Eclipse launch task for HDFS
- [HDFS-605] - There's no need to run fault-inject tests by 'run-test-hdfs-with-mr' target
- [HDFS-617] - Support for non-recursive create() in HDFS
- [HDFS-618] - Support for non-recursive mkdir in HDFS
- [HDFS-630] - In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
- [HDFS-631] - Changes in HDFS to rename the config keys as detailed in HDFS-531.
- [HDFS-680] - Add new access method to a copy of a block's replica
- [HDFS-685] - Use the user-to-groups mapping service in the NameNode
- [HDFS-703] - Replace current fault injection implementation with one from Common
- [HDFS-704] - Unify build property names to facilitate cross-projects modifications
- [HDFS-707] - Remove unused method INodeFile.toINodeFileUnderConstruction()
- [HDFS-728] - Create a comprehensive functional test for append
- [HDFS-729] - fsck option to list only corrupted files
- [HDFS-736] - commitBlockSynchronization() should directly update block GS and length.
- [HDFS-737] - Improvement in metasave output
- [HDFS-754] - Reduce ivy console output to observable level
- [HDFS-755] - Read multiple checksum chunks at once in DFSInputStream
- [HDFS-758] - Improve reporting of progress of decommissioning
- [HDFS-764] - Moving Access Token implementation from Common to HDFS
- [HDFS-767] - Job failure due to BlockMissingException
- [HDFS-786] - Implement getContentSummary(..) in HftpFileSystem
- [HDFS-800] - The last block of a file under construction may change to the COMPLETE state in response to getAdditionalBlock or completeFileInternal
- [HDFS-806] - Add new unit tests to the 10-mins 'run-commit-test' target
- [HDFS-822] - Appends to already-finalized blocks can rename across volumes
- [HDFS-826] - Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
- [HDFS-832] - HDFS side of HADOOP-6222.
- [HDFS-844] - Log the filename when file locking fails
- [HDFS-850] - Display more memory details on the web ui
- [HDFS-854] - Datanode should scan devices in parallel to generate block report
- [HDFS-873] - DataNode directories as URIs
- [HDFS-883] - Datanode shutdown should log problems with Storage.unlockAll()
- [HDFS-892] - optionally use Avro for namenode RPC
- [HDFS-921] - Convert TestDFSClientRetries::testNotYetReplicatedErrors to Mockito
- [HDFS-930] - o.a.h.hdfs.server.datanode.DataXceiver - run() - Version mismatch exception - more context to help debugging
- [HDFS-933] - Add createIdentifier() implementation to DelegationTokenSecretManager
- [HDFS-946] - NameNode should not return full path name when listing a directory or getting the status of a file
- [HDFS-949] - Move Delegation token into Common so that we can use it for MapReduce also
- [HDFS-968] - s/StringBuffer/StringBuilder - as necessary
- [HDFS-986] - Push HADOOP-6551 into HDFS
- [HDFS-994] - Provide methods for obtaining delegation token from Namenode for hftp and other uses
- [HDFS-997] - DataNode local directories should have narrow permissions
- [HDFS-998] - The servlets should quote server generated strings sent in the response
- [HDFS-1009] - Support Kerberos authorization in HDFSProxy
- [HDFS-1011] - Improve Logging in HDFSProxy to include cluster name associated with the request
- [HDFS-1012] - documentLocation attribute in LdapEntry for HDFSProxy isn't specific to a cluster
- [HDFS-1016] - HDFS side change for HADOOP-6569
- [HDFS-1031] - Enhance the webUi to list a few of the corrupted files in HDFS
- [HDFS-1047] - Install/deploy source jars to Maven repo
- [HDFS-1054] - Remove unnecessary sleep after failure in nextBlockOutputStream
- [HDFS-1063] - Eclipse .classpath file should be generated from Ivy files to avoid duplicating dependencies
- [HDFS-1078] - update libhdfs build process to produce static libraries
- [HDFS-1083] - Update TestHDFSCLI not to expect exception class name in the error messages
- [HDFS-1087] - Use StringBuilder instead of Formatter for audit logs
- [HDFS-1089] - Remove uses of FileContext#isFile, isDirectory and exists
- [HDFS-1092] - Use logging rather than System.err in MiniDFSCluster
- [HDFS-1107] - Turn on append by default.
- [HDFS-1126] - Change HDFS to depend on Hadoop 'common' artifacts instead of 'core'
- [HDFS-1134] - Large-scale Automated Framework
- [HDFS-1161] - Make DN minimum valid volumes configurable
- [HDFS-1170] - Add more assertions to TestLargeDirectoryDelete
- [HDFS-1199] - Extract a subset of tests for smoke (DOA) validation.
New Feature
- [HDFS-204] - Revive number of files listed metrics
- [HDFS-222] - Support for concatenating of files into a single file
- [HDFS-235] - Add support for byte-ranges to hftp
- [HDFS-245] - Create symbolic links in HDFS
- [HDFS-447] - proxy to call LDAP for IP lookup and get user ID and directories, validate requested URL
- [HDFS-458] - Create target for 10 minute patch test build for hdfs
- [HDFS-459] - Job History Log Analyzer
- [HDFS-461] - Analyzing file size distribution.
- [HDFS-492] - Expose corrupt replica/block information
- [HDFS-503] - Implement erasure coding as a layer on HDFS
- [HDFS-567] - Two contrib tools to facilitate searching for block history information
- [HDFS-595] - FsPermission tests need to be updated for new octal configuration parameter from HADOOP-6234
- [HDFS-610] - Add support for FileContext
- [HDFS-654] - HDFS needs to support new rename introduced for FileContext
- [HDFS-702] - Add Hdfs Impl for the new file system interface
- [HDFS-731] - Support new Syncable interface in HDFS
- [HDFS-814] - Add an api to get the visible length of a DFSDataInputStream.
- [HDFS-905] - Make changes to HDFS for the new UserGroupInformation APIs (HADOOP-6299)
- [HDFS-935] - Real user in delegation token.
- [HDFS-984] - Delegation Tokens should be persisted in Namenode
- [HDFS-985] - HDFS should issue multiple RPCs for listing a large directory
- [HDFS-991] - Allow browsing the filesystem over http using delegation tokens
- [HDFS-999] - Secondary namenode should login using kerberos if security is configured
- [HDFS-1091] - Implement listStatus that returns an Iterator of FileStatus
Task
- [HDFS-256] - Split HDFS into sub project
- [HDFS-574] - Hadoop Doc Split: HDFS Docs
- [HDFS-651] - HDFS Docs - fix listing of docs in the doc menu
- [HDFS-715] - Hadoop HDFS - Site Logo
- [HDFS-869] - 0.21.0 - snapshot incorrect dependency published in .pom files
- [HDFS-1174] - New properties for suspend and resume process.
- [HDFS-1277] - [Herriot] New property for multi user list.
Test
- [HDFS-409] - Add more access token tests
- [HDFS-451] - Test DataTransferProtocol with fault injection
- [HDFS-669] - Add unit tests framework (Mockito)
- [HDFS-705] - Create an adapter to access some of package-private methods of DataNode from tests
- [HDFS-710] - Add actions with constraints to the pipeline fault injection tests
- [HDFS-713] - Need to properly check the type of the test class from an aspect
- [HDFS-714] - Create fault injection test for the new pipeline close
- [HDFS-804] - New unit tests for concurrent lease recovery
- [HDFS-813] - Enable the append test in TestReadWhileWriting
- [HDFS-902] - Move RAID from HDFS to MR
- [HDFS-907] - Add tests for getBlockLocations and totalLoad metrics.
- [HDFS-919] - Create test to validate the BlocksVerified metric
- [HDFS-1043] - Benchmark overhead of server-side group resolution of users
- [HDFS-1099] - Add test for umask backward compatibility