Apache Pig Changelog

New in Apache Pig 0.14.0 (Nov 21, 2014)

  • The highlights of this release include Pig on Tez, OrcStorage, loader predicate pushdown, constant calculation optimization, and an interface for shipping jars.

New in Apache Pig 0.13.0 (Jul 5, 2014)

  • This release includes several new features, such as pluggable execution engines (to allow Pig to run on non-MapReduce engines in the future), auto-local mode (to allow jobs with small input data sizes to run in-process), fetch optimization (to improve the interactivity of Grunt), fixed counters for local mode, support for a user-level jar cache, and support for blacklisting and whitelisting Pig commands.
  • It also includes several performance fixes and debuggability features.
  • A few non-backward-compatible interface changes were introduced in this release to make Pig work with non-MapReduce engines (e.g., PigProgressNotificationListener).

New in Apache Pig 0.12.0 (Oct 11, 2013)

  • INCOMPATIBLE CHANGES:
  • PIG-3082: outputSchema of a UDF allows two usages when describing a Tuple schema (jcoveney)
  • PIG-3191: [piggybank] MultiStorage output filenames are not sortable (Danny Antonelli via jcoveney)
  • PIG-3174: Remove rpm and deb artifacts from build.xml (gates)
  • IMPROVEMENTS:
  • PIG-3503: More document for Pig 0.12 new features (daijy)
  • PIG-3445: Make Parquet format available out of the box in Pig (lbendig via aniket486)
  • PIG-3483: Document ASSERT keyword (aniket486 via daijy)
  • PIG-3470: Print configuration variables in grunt (lbendig via daijy)
  • PIG-3493: Add max/min for datetime (tyro89 via daijy)
  • PIG-3479: Fix BigInt, BigDec, Date serialization. Improve perf of PigNullableWritable deserialization (dvryaboy)
  • PIG-3461: Rewrite PartitionFilterOptimizer to make it work for all the cases (aniket486)
  • PIG-2417: Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation. (jeremykarn via daijy)
  • PIG-3199: Provide a method to retrieve name of loader/storer in PigServer (prkommireddi via daijy)
  • PIG-3367: Add assert keyword (operator) in pig (aniket486)
  • PIG-3235: Avoid extra byte array copies in streaming (rohini)
  • PIG-3065: pig output format/committer should support recovery for hadoop 0.23 (daijy)
  • PIG-3390: Make pig working with HBase 0.95 (jarcec via daijy)
  • PIG-3431: Return more information for parsing related exceptions. (jeremykarn via daijy)
  • PIG-3430: Add xml format for explaining MapReduce Plan. (jeremykarn via daijy)
  • PIG-3048: Add mapreduce workflow information to job configuration (billie.rinaldi via daijy)
  • PIG-3436: Make pigmix run with Hadoop2 (rohini)
  • PIG-3424: Package import list should consider class name as is first even if -Dudf.import.list is passed (rohini)
  • PIG-3204: Change script parsing to parse entire script instead of line by line (rohini)
  • PIG-3359: Register Statements and Param Substitution in Macros (jpacker via cheolsoo)
  • PIG-3182: Pig currently lacks functions to trim the whitespace only on one hand side (sarutak via cheolsoo)
  • PIG-3163: Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix (sriramkrishnan via cheolsoo)
  • PIG-3015: Rewrite of AvroStorage (jadler via cheolsoo)
  • PIG-3361: Improve Hadoop version detection logic for Pig unit test (daijy)
  • PIG-3280: Document IN operator and CASE expression (cheolsoo)
  • PIG-3342: Allow conditions in case statement (cheolsoo)
  • PIG-3327: Pig hits OOM when fetching task reports (rohini)
  • PIG-3336: Change IN operator to use or-expressions instead of EvalFunc (cheolsoo)
  • PIG-3339: Move pattern compilation in ToDate as a static variable (rohini)
  • PIG-3332: Upgrade Avro dependency to 1.7.4 (nielsbasjes via cheolsoo)
  • PIG-3307: Refactor physical operators to remove methods parameters that are always null (julien)
  • PIG-3317: disable optimizations via pig properties (traviscrawford via billgraham)
  • PIG-3321: AVRO: Support user specified schema on load (harveyc via rohini)
  • PIG-2959: Add a pig.cmd for Pig to run under Windows (daijy)
  • PIG-3311: add pig-withouthadoop-h2 to mvn-jar (julien)
  • PIG-2873: Converting bin/pig shell script to python (vikram.dixit via daijy)
  • PIG-3308: Storing data in hive columnar rc format (maczech via daijy)
  • PIG-3303: add hadoop h2 artifact to publications in ivy.xml (julien)
  • PIG-3169: Remove intermediate data after a job finishes (mwagner via cheolsoo)
  • PIG-3173: Partition filter push down does not happen when partition keys condition include a AND and OR construct (rohini)
  • PIG-2786: enhance Pig launcher script wrt. HBase/HCat integration (ndimiduk via daijy)
  • PIG-3198: Let users use any function from PigType -> PigType as if it were builtin (jcoveney)
  • PIG-3268: Case statement support (cheolsoo)
  • PIG-3269: In operator support (cheolsoo)
  • PIG-200: Pig Performance Benchmarks (daijy)
  • PIG-3261: User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended (qwertymaniac via daijy)
  • PIG-3141: Giving CSVExcelStorage an option to handle header rows (jpacker via cheolsoo)
  • PIG-3217: Add support for DateTime type in Groovy UDFs (herberts via daijy)
  • PIG-3218: Add support for biginteger/bigdecimal type in Groovy UDFs (herberts via daijy)
  • PIG-3248: Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha (daijy)
  • PIG-3235: Add log4j.properties for unit tests (cheolsoo)
  • PIG-3236: parametrize snapshot and staging repo id (gkesavan via daijy)
  • PIG-3244: Make PIG_HOME configurable ([email protected] via daijy)
  • PIG-3233: Deploy a Piggybank Jar (njw45 via cheolsoo)
  • PIG-3245: Documentation about HBaseStorage (Daisuke Kobayashi via cheolsoo)
  • PIG-3211: Allow default Load/Store funcs to be configurable (prkommireddi via cheolsoo)
  • PIG-3136: Introduce a syntax making declared aliases optional (jcoveney via cheolsoo)
  • PIG-3142: [piggybank] Fixed-width load and store functions for the Piggybank (jpacker via cheolsoo)
  • PIG-3162: PigTest.assertOutput doesn't allow non-default delimiter (dreambird via cheolsoo)
  • PIG-3002: Pig client should handle CountersExceededException (jarcec via billgraham)
  • PIG-3189: Remove ivy/pig.pom and improve build mvn targets (billgraham)
  • PIG-3192: Better call to action to download Pig in docs (rjurney via jcoveney)
  • PIG-3167: Job stats are printed incorrectly for map-only jobs (Mark Wagner via jcoveney)
  • PIG-3131: Document PluckTuple UDF (rjurney via jcoveney)
  • PIG-3098: Add another test for the self join case (jcoveney)
  • PIG-3129: Document syntax to refer to previous relation (rjurney via jcoveney)
  • PIG-2553: Pig shouldn't allow attempts to write multiple relations into same directory (prkommireddi via cheolsoo)
  • PIG-3179: Task Information Header only prints out the first split for each task (knoguchi via rohini)
  • PIG-3108: HBaseStorage returns empty maps when mixing wildcard with other columns (christoph.bauer via billgraham)
  • PIG-3178: Print a stacktrace when ExecutableManager hits an OOM (knoguchi via rohini)
  • PIG-3160: GFCross uses unnecessary loop (sandyr via cheolsoo)
  • PIG-3138: Decouple PigServer.executeBatch() from compilation of batch (pkommireddi via cheolsoo)
  • PIG-2878: Pig current releases lack a UDF equalIgnoreCase. This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. (shami via gates)
  • PIG-2994: Grunt shortcuts (prasanth_j via cheolsoo)
  • PIG-3140: Document PigProgressNotificationListener configs (billgraham)
  • PIG-3139: Document reducer estimation (billgraham)
  • PIG-2764: Add a biginteger and bigdecimal type to pig (jcoveney)
  • PIG-3073: POUserFunc creating log spam for large scripts (jcoveney)
  • PIG-3124: Push FLATTENs After FILTERs If Possible (nwhite via daijy)
  • PIG-3086: Allow A Prefix To Be Added To URIs In PigUnit Tests (nwhite via gates)
  • PIG-3091: Make schema, header and stats file configurable in JsonMetadata (pkommireddi via jcoveney)
  • PIG-3078: Make a UDF that, given a string, returns just the columns prefixed by that string (jcoveney)
  • PIG-3090: Introduce a syntax to be able to easily refer to the previously defined relation (jcoveney)
  • PIG-3057: Make PigStorage.readField() protected (pablomar and billgraham via billgraham)
  • PIG-2788: improved string interpolation of variables (jcoveney)
  • PIG-2362: Rework Ant build.xml to use macrodef instead of antcall (azaroth via cheolsoo)
  • PIG-2857: Add a -tagPath option to PigStorage (prkommireddi via cheolsoo)
  • PIG-2341: Need better documentation on Pig/HBase integration (jthakrar and billgraham via billgraham)
  • PIG-3075: Allow AvroStorage STORE Operations To Use Schema Specified By URI (nwhite via cheolsoo)
  • PIG-3062: Change HBaseStorage to permit overriding pushProjection (billgraham)
  • PIG-3016: Modernize more tests (jcoveney via cheolsoo)
  • PIG-2582: Store size in bytes (not mbytes) in ResourceStatistics (prkommireddi via billgraham)
  • PIG-3006: Modernize a chunk of the tests (jcoveney via cheolsoo)
  • PIG-2997: Provide a convenience constructor on PigServer that accepts Configuration (prkommireddi via rohini)
  • PIG-2933: HBaseStorage is using setScannerCaching which is deprecated (prkommireddi via rohini)
  • PIG-2881: Add SUBTRACT eval function (jocosti via cheolsoo)
  • PIG-3004: Improve exceptions messages when a RuntimeException is raised in Physical Operators (julien)
  • PIG-2990: the -secretDebugCmd shouldn't be a secret and should just be...a command (jcoveney)
  • PIG-2941: Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices (jgordon via azaroth)
  • PIG-2778: Add 'matches' operator to predicate pushdown (cheolsoo via jcoveney)
  • PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not set (cheolsoo via sms)
  • PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)
  • PIG-2910: Add function to read schema from output of Schema.toString() (initialcontext via thejas)
  • OPTIMIZATIONS:
  • PIG-3395: Large filter expression makes Pig hang (cheolsoo)
  • PIG-3123: Simplify Logical Plans By Removing Unnecessary Identity Projections (njw45 via cheolsoo)
  • PIG-3013: BinInterSedes improve chararray sort performance (rohini)
  • BUG FIXES:
  • PIG-3504: Fix e2e Describe_cmdline_12 (cheolsoo via daijy)
  • PIG-3128: Document the BigInteger and BigDecimal data type (daijy via cheolsoo)
  • PIG-3497: JobControlCompiler should only do reducer estimation when the job has a reduce phase (amatsukawa via aniket486)
  • PIG-3495: Streaming udf e2e tests failures on Windows (daijy)
  • PIG-3494: Several fixes for e2e tests (daijy)
  • PIG-3292: Logical plan invalid state: duplicate uid in schema during self-join to get cross product (cheolsoo via daijy)
  • PIG-3491: Fix e2e failure Jython_Diagnostics_4 (daijy)
  • PIG-3114: Duplicated macro name error when using pigunit (daijy)
  • PIG-3370: Add New Reserved Keywords To The Pig Docs (cheolsoo)
  • PIG-3487: Fix syntax errors in nightly.conf (arpitgupta via daijy)
  • PIG-3458: ScalarExpression lost with multiquery optimization (knoguchi)
  • PIG-3360: Some intermittent negative e2e tests fail on hadoop 2 (daijy)
  • PIG-3468: PIG-3123 breaks e2e test Jython_Diagnostics_2 (daijy)
  • PIG-3466: Race Conditions in InternalDistinctBag during proactive spill (cheolsoo)
  • PIG-3454: Update JsonLoader/JsonStorage (tyro89 via daijy)
  • PIG-3333: Fix remaining Windows core unit test failures (daijy)
  • PIG-3426: Add support for removing s3 files (jeremykarn via daijy)
  • PIG-3349: Document ToString(Datetime, String) UDF (cheolsoo)
  • PIG-3374: CASE and IN fail when expression includes dereferencing operator (cheolsoo)
  • PIG-2606: union/ join operations are not accepting same alias as multiple inputs (hsubramaniyan via daijy)
  • PIG-3379: Alias reuse in nested foreach causes PIG script to fail (xuefuz via daijy)
  • PIG-3432: typo in log message in SchemaTupleFrontend (epishkin via cheolsoo)
  • PIG-3410: LimitOptimizer is applied before PartitionFilterOptimizer (aniket486)
  • PIG-3405: Top UDF documentation indicates improper use (aniket486 via cheolsoo)
  • PIG-3425: Hive jdo api jar referenced in pig script throws error (deepesh via cheolsoo)
  • PIG-3422: AvroStorage failed to read paths separated by commas (yuanlid via rohini)
  • PIG-3420: Failed to retrieve map values from data loaded by AvroStorage (yuanlid via rohini)
  • PIG-3414: QueryParserDriver.parseSchema(String) silently returns a wrong result when a comma is missing in the schema definition (cheolsoo)
  • PIG-3412: jsonstorage breaks when tuple does not have as many columns as schema (aesilberstein via cheolsoo)
  • PIG-3243: Documentation error (sarutak via cheolsoo)
  • PIG-3210: Pig fails to start when it cannot write log to log files (mengsungwu via cheolsoo)
  • PIG-3392: Document STARTSWITH and ENDSWITH UDFs (sriramkrishnan via cheolsoo)
  • PIG-3393: STARTSWITH udf doesn't override outputSchema method (sriramkrishnan via cheolsoo)
  • PIG-3389: "Set job.name" does not work with dump command (cheolsoo)
  • PIG-3387: Miss spelling in test code "TestBuiltin.java" (sarutak via cheolsoo)
  • PIG-3384: Missing negation in UDF doc sample code (ddamours via cheolsoo)
  • PIG-3369: unit test TestImplicitSplitOnTuple.testImplicitSplitterOnTuple failed when using hadoopversion=23 (dreambird via cheolsoo)
  • PIG-3375: CASE does not preserve the order of when branches (cheolsoo)
  • PIG-3364: Case expression fails with an even number of when branches (cheolsoo)
  • PIG-3354: UDF example does not handle nulls (patc888 via daijy)
  • PIG-3355: ColumnMapKeyPrune bug with distinct operator (jeremykarn via aniket486)
  • PIG-3318: AVRO: 'default value' not honored when merging schemas on load with AvroStorage (viraj via rohini)
  • PIG-3250: Pig dryrun generates wrong output in .expanded file for 'SPLIT....OTHERWISE...' command (dreambird via cheolsoo)
  • PIG-3331: Default values not stored in avro file when using specific schemas during store in AvroStorage (viraj via rohini)
  • PIG-3322: AvroStorage give NPE on reading file with union as top level schema (viraj via rohini)
  • PIG-2828: Handle nulls in DataType.compare (aniket486)
  • PIG-3335: TestErrorHandling.tesNegative7 fails on MR2 (xuefuz)
  • PIG-3316: Pig failed to interpret DateTime values in some special cases (xuefuz)
  • PIG-2956: Invalid cache specification for some streaming statement (daijy)
  • PIG-3310: ImplicitSplitInserter does not generate new uids for nested schema fields, leading to miscomputations (cstenac via daijy)
  • PIG-3334: Fix Windows piggybank unit test failures (daijy)
  • PIG-3337: Fix remaining Window e2e tests (daijy)
  • PIG-3328: DataBags created with an initial list of tuples don't get registered as spillable (mwagner via daijy)
  • PIG-3313: pig job hang if the job tracker is bounced during execution (yu.chenjie via daijy)
  • PIG-3297: Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc (nielsbasjes via cheolsoo)
  • PIG-3069: Native Windows Compatibility for Pig E2E Tests and Harness (anthony.murphy via daijy)
  • PIG-3291: TestExampleGenerator fails on Windows because of lack of file name escaping (dwann via daijy)
  • PIG-3026: Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences (dwann via daijy)
  • PIG-3025: TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification (dwann via daijy)
  • PIG-2955: Fix bunch of Pig e2e tests on Windows (daijy)
  • PIG-3302: JSONStorage throws NPE if map has null values (rohini)
  • PIG-3309: TestJsonLoaderStorage fails with IBM JDK 6/7 (lrangel via daijy)
  • PIG-3097: HiveColumnarLoader doesn't correctly load partitioned Hive table (maczech via daijy)
  • PIG-3305: Infinite loop when input path contains empty partition directory (maczech via daijy)
  • PIG-3286: TestPigContext.testImportList fails in trunk (cheolsoo)
  • PIG-2970: Nested foreach getting incorrect schema when having unrelated inner query (daijy)
  • PIG-3304: XMLLoader in piggybank does not work with inline closed tags (aseldawy via daijy)
  • PIG-3028: testGrunt dev test needs some command filters to run correctly without cygwin (jgordon via gates)
  • PIG-3290: TestLogicalPlanBuilder.testQuery85 fail in trunk (daijy)
  • PIG-3027: pigTest unit test needs a newline filter for comparisons of golden multi-line (jgordon via gates)
  • PIG-2767: Pig creates wrong schema after dereferencing nested tuple fields (daijy)
  • PIG-3276: change the default value for hcat.bin to hcat instead of /usr/local/hcat/bin/hcat (arpitgupta via daijy)
  • PIG-3277: fix the path to the benchmarks file in the print statement (arpitgupta via daijy)
  • PIG-3122: Operators should not implicitly become reserved keywords (jcoveney via cheolsoo)
  • PIG-3193: Fix "ant docs" warnings (cheolsoo)
  • PIG-3186: tar/deb/pkg ant targets should depend on piggybank (lbendig via gates)
  • PIG-3270: Union onschema failing at runtime when merging incompatible types (knoguchi via daijy)
  • PIG-3271: POSplit ignoring error from input processing giving empty results (knoguchi via daijy)
  • PIG-2265: Test case TestSecondarySort failure (daijy)
  • PIG-3060: FLATTEN in nested foreach fails when the input contains an empty bag (daijy)
  • PIG-3249: Pig startup script prints out a wrong version of hadoop when using fat jar (prkommireddi via daijy)
  • PIG-3110: pig corrupts chararrays with trailing whitespace when converting them to long (prkommireddi via daijy)
  • PIG-3253: Misleading comment w.r.t getSplitIndex() method in PigSplit.java (cheolsoo)
  • PIG-3208: [zebra] TFile should not set io.compression.codec.lzo.buffersize (ekoontz via daijy)
  • PIG-3172: Partition filter push down does not happen when there is a non partition key map column filter (rohini)
  • PIG-3205: Passing arguments to python script does not work with -f option (rohini)
  • PIG-3239: Unable to return multiple values from a macro using SPLIT (dreambird via cheolsoo)
  • PIG-3077: TestMultiQueryLocal should not write in /tmp (dreambird via cheolsoo)
  • PIG-3081: Pig progress stays at 0% for the first job in hadoop 23 (rohini)
  • PIG-3150: e2e Scripting_5 fails in trunk (dreambird via cheolsoo)
  • PIG-3153: TestScriptUDF.testJavascriptExampleScript fails in trunk (cheolsoo)
  • PIG-3145: Parameters in core-site.xml and mapred-site.xml are not correctly substituted (cheolsoo)
  • PIG-3135: HExecutionEngine should look for resources in user passed Properties (prkommireddi via cheolsoo)
  • PIG-3200: MiniCluster should delete hadoop-site.xml on shutDown (prkommireddi via cheolsoo)
  • PIG-3158: Errors in the document "Control Structures" (miyakawataku via cheolsoo)
  • PIG-3161: Update reserved keywords in Pig docs (russell.jurney via cheolsoo)
  • PIG-3156: TestSchemaTuple fails in trunk (cheolsoo)
  • PIG-3155: TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk (cheolsoo)
  • PIG-3154: TestPackage.testOperator fails in trunk (dreambird via cheolsoo)
  • PIG-3168: TestMultiQueryBasic.testMultiQueryWithSplitInMapAndMultiMerge fails in trunk (cheolsoo)
  • PIG-3137: Fix Piggybank test to not using /tmp dir (dreambird via cheolsoo)
  • PIG-3149: e2e build.xml still refers to jython 2.5.0 jar even though it's replaced by jython standalone 2.5.2 jar (cheolsoo)
  • PIG-2266: Bug with input file joining optimization in Pig (jadler via cheolsoo)
  • PIG-2645: PigSplit does not handle the case where SerializationFactory returns null (shami via gates)
  • PIG-3031: Update Pig to use a newer version of joda-time (zjshen via cheolsoo)
  • PIG-3071: Update hcatalog jar and path to hbase storage handler jar in pig script (arpitgupta via cheolsoo)
  • PIG-3029: TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution (jgordon via gates)
  • PIG-3120: setStoreFuncUDFContextSignature called with null signature (jdler via cheolsoo)
  • PIG-3115: Distinct Built-in Function Doesn't Handle Null Bags (njw45 via daijy)
  • PIG-2433: Jython import module not working if module path is in classpath (rohini)
  • PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (nwhite via gates)
  • PIG-2251: PIG leaks Zookeeper connections when using HBaseStorage (jamarkha via cheolsoo)
  • PIG-3112: Errors and lacks in document "User Defined Functions" (miyakawataku via cheolsoo)
  • PIG-3050: Fix FindBugs multithreading warnings (cheolsoo)
  • PIG-3066: Fix TestPigRunner in trunk (cheolsoo)
  • PIG-3101: Increase io.sort.mb in YARN MiniCluster (cheolsoo)
  • PIG-3100: If a .pig_schema file is present, can get an index out of bounds error (jcoveney)
  • PIG-3096: Make PigUnit thread safe (cheolsoo)
  • PIG-3095: "which" is called many, many times for each Pig STREAM statement (nwhite via cheolsoo)
  • PIG-3085: Errors and lacks in document "Built In Functions" (miyakawataku via cheolsoo)
  • PIG-3084: Improve exceptions messages in POPackage (julien)
  • PIG-3072: Pig job reporting negative progress (knoguchi via rohini)
  • PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo)
  • PIG-2924: PigStats should not be assuming all Storage classes to be file-based storage (cheolsoo)
  • PIG-3046: An empty file name in -Dpig.additional.jars throws an error (prkommireddi via cheolsoo)
  • PIG-2989: Illustrate for Rank Operator (xalan via gates)
  • PIG-2885: TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3 (cheolsoo via sms)
  • PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24 (cheolsoo via dvryaboy)
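
Several of the 0.12.0 entries above add Pig Latin syntax: the ASSERT operator (PIG-3367), the IN operator (PIG-3269), and the CASE expression with conditions (PIG-3268, PIG-3342). A minimal sketch of how they might be combined is shown here; the input path, relation names, and field names are hypothetical and chosen only for illustration.

```pig
-- Hypothetical input file and schema, for illustration only.
users = LOAD 'users.tsv' AS (name:chararray, age:int, country:chararray);

-- ASSERT (PIG-3367): fail the job if any record violates the condition.
ASSERT users BY age >= 0, 'age must not be negative';

-- IN operator (PIG-3269): shorthand for a chain of OR-ed equality tests.
selected = FILTER users BY country IN ('US', 'CA', 'MX');

-- CASE expression with conditions (PIG-3268, PIG-3342).
labeled = FOREACH selected GENERATE name,
    (CASE
        WHEN age < 18 THEN 'minor'
        WHEN age < 65 THEN 'adult'
        ELSE 'senior'
    END) AS age_group;

DUMP labeled;
```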

New in Apache Pig 0.11.1 (Apr 2, 2013)

  • IMPROVEMENTS:
  • PIG-3256: Upgrade jython to 2.5.3 (legal concern) (daijy)
  • PIG-2988: start deploying pigunit maven artifact part of Pig release process (njw45 via rohini)
  • PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag. (knoguchi via rohini)
  • PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini)
  • PIG-3202: CUBE operator not documented in user docs (prasanth_j via billgraham)
  • BUG FIXES:
  • PIG-3252: AvroStorage gives wrong schema for schemas with named records (mwagner via cheolsoo)
  • PIG-3132: NPE when illustrating a relation with HCatLoader (daijy)
  • PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2 (prkommireddi via dvryaboy)
  • PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
  • PIG-3144: Erroneous map entry alias resolution leading to "Duplicate schema alias" errors (jcoveney via cheolsoo)
  • PIG-3212: Race Conditions in POSort and (Internal)SortedBag during Proactive Spill (kadeng via dvryaboy)
  • PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase (rohini)

New in Apache Pig 0.11.0 (Feb 23, 2013)

  • This release includes hundreds of bug fixes and many new features, including the DateTime data type, the RANK, CUBE and ROLLUP operators, Groovy UDFs, pluggable reducer estimation logic, additional UDF features, schema-based tuples and HCatalog DDL integration.
  • New RANK, CUBE and ROLLUP operators
  • New DateTime data type
  • Support for Groovy UDFs
  • Support for loading macros from jars
  • Support for custom PigReducerEstimators
  • Support for custom PigProgressNotificationListeners
  • Support for schema-based Tuples for reduced memory footprint
  • Support for passing environment variables to streaming jobs
  • Support for invoking HCatalog DDL commands from Pig
  • Support for .pigbootup file for defaults
  • Improved support for working with Maps in Pig scripts
  • Grunt improvements: history and clear
  • New cleanupOnSuccess method in StoreFunc interface
  • UDF timing utilities
  • UDF lifecycle improvements
  • UDFs for DateTime support
  • Performance improvements to merge join
  • Performance improvements to local mode
  • Performance improvements to in memory aggregation
  • Performance improvements to Spillable management
  • Improvements to HBaseStorage and AvroStorage
  • Penny has been removed
  • 300+ bug fixes
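
As a rough illustration of the RANK and CUBE operators and the datetime type listed above, a script along these lines should work on 0.11.0 or later. This is only a sketch: the input path, relation names, and field names are hypothetical, and the date pattern is an assumption about the input data.

```pig
-- Hypothetical input path and schema, for illustration only.
sales = LOAD 'sales.tsv' AS (region:chararray, product:chararray,
                             amount:double, sold_at:chararray);

-- RANK: prepend a rank column, ordered by amount from highest to lowest.
ranked = RANK sales BY amount DESC;

-- CUBE: aggregate over every combination of region and product.
cubed  = CUBE sales BY CUBE(region, product);
totals = FOREACH cubed GENERATE FLATTEN(group) AS (region, product),
                                SUM(cube.amount) AS total;

-- datetime: parse a string into a datetime value and extract the year.
-- The 'yyyy-MM-dd' pattern is an assumption about the input format.
dated = FOREACH sales GENERATE region,
                               GetYear(ToDate(sold_at, 'yyyy-MM-dd')) AS year;
```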

New in Apache Pig 0.10.1 (Feb 6, 2013)

  • IMPROVEMENTS:
  • PIG-2907: Publish pig jars for Hadoop2/23 to maven (rohini)
  • PIG-3019: Need a target in build.xml for source releases (gates)
  • PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)
  • PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
  • PIG-2852: Update documentation regarding parallel local mode execution (cheolsoo via jcoveney)
  • PIG-2712: Pig does not call OutputCommitter.abortJob() on the underlying OutputFormat (rohini via gates)
  • PIG-2727: PigStorage Source tagging does not need pig.splitCombination to be turned off (prkommireddi via dvryaboy)
  • PIG-2711: e2e harness: cache benchmark results between test runs (thw via daijy)
  • PIG-2680: TOBAG output schema reporting (andy schlaikjer via jcoveney)
  • PIG-2650: Convenience mock Loader and Storer to simplify unit testing of Pig scripts (julien)
  • BUG FIXES:
  • PIG-3107: bin and autocomplete are missing in src release (daijy)
  • PIG-3106: Missing license header in several java files (daijy)
  • PIG-3099: Pig unit test fixes for TestGrunt(1), TestStore(2), TestEmptyInputDir(3) (vikram.dixit via daijy)
  • PIG-3035: With latest version of hadoop23 pig does not return the correct exception stack trace from backend (rohini)
  • PIG-2953: "which" utility does not exist on Windows (daijy)
  • PIG-2960: Increase the timeout for unit test (daijy)
  • PIG-2958: Pig tests do not appear to have a logger attached (daijy)
  • PIG-2942: DevTests, TestLoad has a false failure on Windows (jgordon via daijy)
  • PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
  • PIG-2801: grunt "sh" command should invoke the shell implicitly instead of calling exec directly with the command tokens (jgordon via daijy)
  • PIG-2800: pig.additional.jars path separator should align with File.pathSeparator instead of being hard-coded to ":" (jgordon via azaroth)
  • PIG-2798: pig streaming tests assume interpreters are auto-resolved (jgordon via daijy)
  • PIG-2797: Tests should not create their own file URIs through string concatenation, should use Util.generateURI instead (jgordon via daijy)
  • PIG-2796: Local temporary paths are not always valid HDFS path names (jgordon via daijy)
  • PIG-2795: Fix test cases that generate pig scripts with "load " + pathStr to encode "\" in the path (jgordon via daijy)
  • PIG-2940: HBaseStorage store fails in secure cluster (cheolsoo via daijy)
  • PIG-2821: HBaseStorage should work with secure hbase (rohini via daijy)
  • PIG-2890: Revert PIG-2578 (dvryaboy)
  • PIG-2859: Fix few e2e test failures (rohini via daijy)
  • PIG-2729: Macro expansion does not use pig.import.search.path - UnitTest borked (johannesch via daijy)
  • PIG-2791: Pig does not work with Namenode Federation (rohini via daijy)
  • PIG-2783: Fix Iterator_1 e2e test for Hadoop 23 (rohini via daijy)
  • PIG-2761: With hadoop23 importing modules inside python script does not work (rohini via daijy)
  • PIG-2759: Typo in document "Built In Functions" (daijy)
  • PIG-2745: Pig e2e test RubyUDFs fails in MR mode when running from tarball (cheolsoo via daijy)
  • PIG-2741: Python script throws an NameError: name 'Configuration' is not defined in case cache dir is not created (knoguchi via daijy)
  • PIG-2669: Pig release should include pig-default.properties after rebuild (daijy)
  • PIG-2739: PyList should map to Bag automatically in Jython (daijy)
  • PIG-2730: TFileStorage getStatistics incorrectly throws an exception instead of returning null (traviscrawford via daijy)
  • PIG-2717: Tuple field mangled during flattening (daijy)
  • PIG-2721: Wrong output generated while loading bags as input (knoguchi via daijy)
  • PIG-2912: Pig should clone JobConf while creating JobContextImpl and TaskAttemptContextImpl in Hadoop23 (rohini via daijy)
  • PIG-2775: Register jar does not go to classpath in some cases (daijy)

New in Apache Pig 0.6.0 (Apr 14, 2010)

  • Added Zebra as a contrib project. See http://wiki.apache.org/pig/zebra
  • Added UDFContext, which gives UDFs a way to pass information from the front end to the back end and gives UDFs access to JobConf in the back end. PIG-1085
  • Added left outer join for fragment replicate join. PIG-1036
  • Added ability to set job priority from Pig Latin. PIG-1025
  • Enhanced multi-query to work with joins in some cases. PIG-983
  • Reworked memory manager to significantly reduce GC Overhead and Out of Heap failures. PIG-975
  • Added Accumulator interface for UDFs. PIG-979

New in Apache Pig 0.3.0 (Sep 1, 2009)

  • The main focus of this release is multi-query support, which allows Pig to optimize multiple queries within the same script that share a computation.
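
The kind of script that benefits from this is one with several STORE statements that depend on a common relation. In the sketch below (paths and field names are hypothetical), the load and filter feed both outputs, so with multi-query execution the shared steps can be planned once instead of once per STORE.

```pig
-- Hypothetical input path and schema, for illustration only.
a = LOAD 'input' AS (x:int, y:int);

-- Shared computation: both outputs below depend on this filter.
b = FILTER a BY x > 5;

-- First output: the filtered rows themselves.
STORE b INTO 'out_filtered';

-- Second output: a per-key count derived from the same filtered rows.
c = GROUP b BY y;
d = FOREACH c GENERATE group, COUNT(b);
STORE d INTO 'out_counts';
```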