Branch 174861804 (tensorflow#14326)

* Add ImportGraphDefTest.testMultipleImport to importer_test.py

This tests the name deduping behavior of import_graph_def. This
behavior is actually defined by the op creation logic, not
import_graph_def, but I added a test here since the C++ ImportGraphDef
function must emulate it (and presumably we'd like to maintain the
import_graph_def behavior moving forward).
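
A minimal sketch of the behavior under test, assuming the TF 1.x graph API (illustrative; not code from this commit):

import tensorflow as tf

with tf.Graph().as_default():
    tf.constant(1.0, name='x')
    gdef = tf.get_default_graph().as_graph_def()

with tf.Graph().as_default() as g:
    tf.import_graph_def(gdef, name='')  # creates 'x'
    tf.import_graph_def(gdef, name='')  # op creation renames the collision to 'x_1'
    print([op.name for op in g.get_operations()])  # ['x', 'x_1']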

PiperOrigin-RevId: 174536014

* Apply lib_internal defines to both lib_internal and lib_internal_impl

Should fix checkpoint reading with snappy compression.

Will follow up with testing for this sort of checkpoint issue.

PiperOrigin-RevId: 174538693

* n/a (internal change only)

PiperOrigin-RevId: 174539513

* A few changes to ApiDef generation:
- Create a separate api_def_*.pbtxt file for each op.
- Add attribute and argument descriptions to ApiDef.
- Apply overrides based on op_gen_overrides.pbtxt file.

PiperOrigin-RevId: 174540421

* Add uniquify_names option to ImportGraphDef.

This option allows ImportGraphDef to mimic the behavior of the Python
import_graph_def function, which automatically creates unique node
names instead of raising an exception (this is due to the Python op
construction logic, not import_graph_def directly). This change is
a step toward switching import_graph_def to use the C API version.

PiperOrigin-RevId: 174541334

* Fix bad_color param on tf.contrib.summary.image

PiperOrigin-RevId: 174549117

* Hlo parser: support control-predecessors.

Also,
- Changed from printing control-successors to printing control-predecessors
because predecessors are defined before use.
- Surround the predecessors with {}.

PiperOrigin-RevId: 174552224

* Support pad node.

PiperOrigin-RevId: 174581035

* Add tf.contrib.framework.sort, wrapping tf.nn.top_k (tensorflow#288).

Comparable to np.sort, but np.sort's "kind" parameter is not implemented (there is only one sort algorithm) and its "order" parameter is not applicable (tensors do not have fields).
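
A hedged usage sketch, assuming the wrapper is exposed as tf.contrib.framework.sort as described:

import numpy as np
import tensorflow as tf

values = tf.constant([3.0, 1.0, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.contrib.framework.sort(values)))  # [1. 2. 3.]
print(np.sort([3.0, 1.0, 2.0]))                         # same result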

PiperOrigin-RevId: 174588000

* [TF2XLA] Don't change output port for control dependency in CopySubgraph.

If the output is being squashed then we want control output 0, except where the
input is a control dependency.

PiperOrigin-RevId: 174633829

* Use latest nsync: allows running bazel after having downloaded dependencies for the "make" build

The downloads directory for the make build is within the source tree seen by bazel,
which means that BUILD files (by whatever name) within those downloaded trees
must all be valid in their new location, or not be recognized by bazel as BUILD files.
The new version of nsync handles that, and this change pulls in that new version.

PiperOrigin-RevId: 174652898

* Add profiling support to Service::ExecuteParallel.

PiperOrigin-RevId: 174682772

* Replicate `Estimator.model_fn` across available GPUs.

def replicate_model_fn(model_fn, optimizer_fn, devices=None):
  """Replicate `Estimator.model_fn` over GPUs.
     ...

I verified that it appears to give the right result for cnn_mnist.py on 1 CPU, on 1 real GPU, and on 4 GPUs with allow_soft_placement=True.

Some measurements on CNN MNIST across steps 19300-20000:

1) no replicate_model_fn call:
global_step/sec: 156.254
global_step/sec: 155.074
global_step/sec: 155.74
global_step/sec: 153.636
global_step/sec: 157.218
global_step/sec: 159.644

2) replicate across one hardware GPU:
global_step/sec: 158.171
global_step/sec: 165.618
global_step/sec: 162.773
global_step/sec: 159.204
global_step/sec: 162.289
global_step/sec: 167.173

3) replicate across 4 software GPUs on one hardware GPU (soft placement):
global_step/sec: 75.47
global_step/sec: 76.16
global_step/sec: 75.18

Loss numbers didn't change across the three configurations.
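
A hedged usage sketch based only on the signature quoted above; the import path and the interaction between optimizer_fn and the per-tower train_op are assumptions, not taken from this commit:

import tensorflow as tf
from tensorflow.contrib.estimator import replicate_model_fn  # assumed import path

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features['x'], 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

# devices=None is assumed to mean "replicate across all available GPUs".
estimator = tf.estimator.Estimator(
    model_fn=replicate_model_fn(
        model_fn, optimizer_fn=lambda: tf.train.GradientDescentOptimizer(0.1)))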

PiperOrigin-RevId: 174704385

* Enables wrapping the input pipeline in a tf.while_loop for all users.

PiperOrigin-RevId: 174708213

* SerializeIterator: do not unref the resource until we're finished using it.

This change avoids a potential use-after-free error if the resource is concurrently serialized and destroyed (e.g. by a DestroyResourceOp or Session::Reset()).

PiperOrigin-RevId: 174713115

* Improve error message when a function is already defined with the same name and different hash string.

PiperOrigin-RevId: 174715563

* Fix generate_examples build

- Add -march=native to host_copts and host_cxxopts in configure.py
- Add string.h for abstracting string differences at the core interpreter level
- Use tensorflow special arg parse instead of flags
- Switch to using tool instead of data for dependency
- Fix Python 3 compatibility
  + Use six.StringIO instead of StringIO.StringIO (see the sketch after this list)
  + Use print_function
  + Properly set binary flags on TempFile's used in toco_convert
- Misc other path fixes
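
A minimal sketch of the six.StringIO swap mentioned in the list above (illustrative; not code from the diff):

from __future__ import print_function
import six  # six.StringIO works under both Python 2 and 3

buf = six.StringIO()  # was: StringIO.StringIO(), which is Python 2-only
print('toco output', file=buf)
assert buf.getvalue() == 'toco output\n'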

PiperOrigin-RevId: 174717673

* Add input format agnostic way to parse HLOs.

PiperOrigin-RevId: 174719153

* Remove misleading comment from Eigen build file.

PiperOrigin-RevId: 174719222

* Basic plumbing for calling C API from import_graph_def()

PiperOrigin-RevId: 174724070

* Fix a memory leak detected when running a heap checker in our tests.

PiperOrigin-RevId: 174726228

* [tpu:profiler] Support the Input Pipeline Analyzer tool in the TPU profiler (WIP)
  - Move the input pipeline analyzer-related protos used for gRPC between red and green VMs.
  - Rename perftools.gputools.profiler.collector::TfStatsHelperResult to tensorflow::tpu::TfOpStats.

PiperOrigin-RevId: 174730411

* Clean up some reference cycles in eager mode.

ResourceVariables enter graph mode to get a handle. We should probably revisit
that, but in the meantime we can break the resulting reference cycles.

PiperOrigin-RevId: 174732964

* Improved encoding of shapes in grappler.

PiperOrigin-RevId: 174733491

* [tf.data] Remove unused members from IteratorContext.

PiperOrigin-RevId: 174734277

* Refactor helper functions a bit for virtual gpu changes later.

PiperOrigin-RevId: 174735029

* Fix invalid flush_secs argument.

PiperOrigin-RevId: 174745329

* Replace the implementation of tf.flags with absl.flags.

The previous tf.flags implementation was based on argparse. It provided -h/--help flags, which displayed all flags.
absl.app's --help flag only displays flags defined in the main module; there is a --helpfull flag that displays all flags.
So --helpshort and --helpfull flags were added.

app.run now raises SystemError on unknown flags (fixes tensorflow#11195).

Accessing flags before they are parsed now raises an UnparsedFlagAccessError, instead of triggering implicit flag parsing as before.
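
A hedged sketch of the new behavior (the flag name here is hypothetical):

import tensorflow as tf

tf.flags.DEFINE_integer('batch_size', 32, 'Batch size.')
FLAGS = tf.flags.FLAGS

def main(_):
    print(FLAGS.batch_size)  # safe: app.run() has parsed sys.argv by now

if __name__ == '__main__':
    # Reading FLAGS.batch_size before this point would now raise
    # UnparsedFlagAccessError instead of triggering implicit parsing.
    tf.app.run()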

PiperOrigin-RevId: 174747028

* Fold Transpose into Matmul and SparseMatmul.
Fold ConjugateTranspose into BatchMatmul.
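
The fold is behavior-preserving because tf.matmul(tf.transpose(a), b) equals tf.matmul(a, b, transpose_a=True); a sketch of that identity (not the grappler rewrite itself):

import numpy as np
import tensorflow as tf

a = tf.constant(np.random.rand(3, 2))
b = tf.constant(np.random.rand(3, 4))
explicit = tf.matmul(tf.transpose(a), b)    # graph contains a Transpose node
folded = tf.matmul(a, b, transpose_a=True)  # what the fold produces instead
with tf.Session() as sess:
    np.testing.assert_allclose(*sess.run([explicit, folded]))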

PiperOrigin-RevId: 174750173

* BUGFIX: special_math.ndtri didn't work with dynamic shapes. This was due to the use of constant_op.constant(..., shape=p.shape), where p was sometimes a Tensor of unknown shape.
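
A sketch of the failure mode and the usual fix, assuming tf.fill is an acceptable stand-in for the patched code (names below are illustrative):

import tensorflow as tf

p = tf.placeholder(tf.float32, shape=[None])  # statically unknown shape
# tf.constant(0.5, shape=p.shape) would fail here: the shape is not fully defined.
half = tf.fill(tf.shape(p), 0.5)              # dynamic shape, works at runtime
with tf.Session() as sess:
    print(sess.run(half, feed_dict={p: [1.0, 2.0, 3.0]}))  # [0.5 0.5 0.5]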

PiperOrigin-RevId: 174764744

* Create a routine that can collapse a subgraph into a fused op

PiperOrigin-RevId: 174765540

* Force CUDA runtime initialization only when device count is larger than 0.

PiperOrigin-RevId: 174767565

* Remove use of xrange which is not python3 compatible.

PiperOrigin-RevId: 174768741

* More thoroughly disable the should_use_result decorator when executing eagerly.

It was creating reference cycles.

Adds a test that TensorArrays create no reference cycles in eager mode.

PiperOrigin-RevId: 174768765

* Fix device querying in Keras backend.

PiperOrigin-RevId: 174769308

* Fix race bug in AdaptiveSharedBatchScheduler.

In ASBSQueue::Schedule, when a new batch was created, it was added to the scheduler outside of the queue's lock. This was done to prevent any unforeseen interactions between the queue lock and the scheduler lock, but it wasn't being done in a thread-safe way.

PiperOrigin-RevId: 174769383

* Supports multi-dimensional logits and labels in multi-class head.

PiperOrigin-RevId: 174770444

* Refactor eager benchmarks to subclass Benchmark.

PiperOrigin-RevId: 174770787

* Add `parallel_interleave` to tf/contrib/data/__init__.py so that it is directly addressable from tf.contrib.data.
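
A hedged usage sketch (file names and cycle_length are arbitrary):

import tensorflow as tf

filenames = tf.data.Dataset.from_tensor_slices(['a.tfrecord', 'b.tfrecord'])
# Pull records from two files concurrently, interleaving their elements.
dataset = filenames.apply(
    tf.contrib.data.parallel_interleave(
        lambda f: tf.data.TFRecordDataset(f), cycle_length=2))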

PiperOrigin-RevId: 174771870

* Fix DepthToSpaceGrad and SpaceToDepthGrad on data_format NCHW.

This fixes tensorflow#14243.

PiperOrigin-RevId: 174772870

* Allow for an old_row_vocab_size, in case a subset of the old_row_vocab_file was used during the checkpoint creation (as is allowed in FeatureColumn._VocabularyListCategoricalColumn).

PiperOrigin-RevId: 174781749

* Go: Update generated wrapper functions for TensorFlow ops.

PiperOrigin-RevId: 174781987

* [BufferAssignment] Sort allocation's "Assigned" objects before converting to a proto.

This makes the buffer assignment's proto dump deterministic.

RELNOTES: BufferAssignment's protocol buffer dump is now deterministic.
PiperOrigin-RevId: 174783549

* [TF TensorArray] Allow reading from an unwritten index if a fully defined element_shape is given.

This allows one to write to only some indices of a TensorArray before calling stack.
Elements that were not written to are treated as all zero tensors.
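
A minimal sketch of the new behavior, assuming the TF 1.x TensorArray API:

import tensorflow as tf

ta = tf.TensorArray(tf.float32, size=3, element_shape=[2])
ta = ta.write(1, [1.0, 2.0])     # indices 0 and 2 are never written
with tf.Session() as sess:
    print(sess.run(ta.stack()))  # [[0. 0.], [1. 2.], [0. 0.]]: holes read as zeros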

PiperOrigin-RevId: 174783569

* Remove binary dependency from optimize_for_inference_lib

PiperOrigin-RevId: 174787363

* Update ops-related pbtxt files.

PiperOrigin-RevId: 174787397

* Automated g4 rollback of changelist 174523638

PiperOrigin-RevId: 174788331

* Skip non-existent fetch nodes

PiperOrigin-RevId: 174795864

* Automated g4 rollback of changelist 174735029

PiperOrigin-RevId: 174796480

* Add InceptionResNetV2 to tf.keras and update applications module to match Keras 2.0.9.

PiperOrigin-RevId: 174796893

* Fix for LLVM API changes for fast math (https://reviews.llvm.org/rL317488).

PiperOrigin-RevId: 174799735

* [TF:XLA] Add two disabled tests with while ops that permute tuple elements.

These tests permute the tuple elements of a 3-tuple in each iteration in the following cyclic manner (132), i.e. a shift to the left.

The first test just returns the result tuple; the second returns the sum of all tuple elements (which is expected to be the constant 6, no matter the permutation).

Both tests are disabled for now because they fail on all back-ends.

PiperOrigin-RevId: 174806092

* Refactor function Optimize.

PiperOrigin-RevId: 174813300

* Add a unit test for gradient computation with layout optimizer.

PiperOrigin-RevId: 174814136

* Previously, if ComputeConstant saw a parameter, it failed to proceed.
After this change, we can pass it a list of parameters, and if enough are
specified, it will perform the computation.

The primary goal of this change is to make the HloEvaluator usable
with ComputationBuilder from tests through ComputeConstant in cases
where the input is a parameter (fed by a literal).

PiperOrigin-RevId: 174845108

* Use nesting to reduce the number of modules listed in the API TOC.

PiperOrigin-RevId: 174846842

* Added a CPU matrix exponential op to TensorFlow.
It uses Eigen's unsupported implementation.

PiperOrigin-RevId: 174858966

* variables_to_restore: Differentiate Python variables by string name rather than by object.

variables_to_restore ensured that duplicate variables weren't added to the return map by comparing Python Variable objects. Normally there is only one Variable object for each underlying variable, so this wasn't a problem. But when one initializes a graph by importing a GraphDef, duplicate Python Variable objects are created for each occurrence of a variable in a collection (say, global variables and moving-average variables).

This change fixes variables_to_restore to work with an imported GraphDef by comparing variable names instead of Variable objects.
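
A minimal sketch of the deduping idea, assuming the comparison switches to the variable's string name (v.op.name); this is not the committed code:

import tensorflow as tf

def dedup_by_name(variables):
    seen = {}
    for v in variables:
        # Duplicate Variable objects created from an imported GraphDef share
        # the same underlying name, so keying on it collapses them.
        seen.setdefault(v.op.name, v)
    return seen

# e.g. dedup_by_name(tf.global_variables() + tf.moving_average_variables())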

PiperOrigin-RevId: 174861804
martinwicke authored Nov 7, 2017
1 parent 00e0972 commit 4e69e02
Showing 1,305 changed files with 35,474 additions and 14,134 deletions.
7 changes: 6 additions & 1 deletion configure.py
@@ -25,10 +25,12 @@
import subprocess
import sys

# pylint: disable=g-import-not-at-top
try:
from shutil import which
except ImportError:
from distutils.spawn import find_executable as which
# pylint: enable=g-import-not-at-top

_TF_BAZELRC = os.path.join(os.path.dirname(os.path.abspath(__file__)),
'.tf_configure.bazelrc')
@@ -485,7 +487,10 @@ def set_cc_opt_flags(environ_cp):
cc_opt_flags = get_from_env_or_user_or_default(environ_cp, 'CC_OPT_FLAGS',
question, default_cc_opt_flags)
for opt in cc_opt_flags.split():
write_to_bazelrc('build:opt --cxxopt=%s --copt=%s' % (opt, opt))
host_opt = '-march=native' # It should be safe on the same build host.
write_to_bazelrc(
'build:opt --cxxopt=%s --copt=%s' % (opt, opt) +
' --host_cxxopt=%s --host_copt=%s' % (host_opt, host_opt))


def set_tf_cuda_clang(environ_cp):
4 changes: 3 additions & 1 deletion tensorflow/compiler/tf2xla/functionalize_control_flow.cc
@@ -130,7 +130,9 @@ Status CopySubgraph(const Graph& graph, const Frame* frame,
stack.push_back(src);
}
Node* src_copy = (*node_map)[e->src()->id()];
int src_output = squash_src_outputs[e->src()->id()] ? 0 : e->src_output();
int src_output = squash_src_outputs[e->src()->id()] && !e->IsControlEdge()
? 0
: e->src_output();
Node* dst_copy = (*node_map)[e->dst()->id()];
output->AddEdge(src_copy, src_output, dst_copy, e->dst_input());
}
12 changes: 0 additions & 12 deletions tensorflow/compiler/tf2xla/kernels/gather_op.cc
@@ -77,18 +77,6 @@ xla::ComputationDataHandle XlaComputeGatherDynamicSlice(
out_shape.dim_sizes());
}

// Degenerate case: single slice.
if (num_indices == 1) {
auto index = builder->Reshape(indices, {1});
auto start_index = builder->Pad(
index, XlaHelpers::Zero(builder, index_type),
xla::MakeEdgePaddingConfig(
{{input_shape_pre_axis.dims(), input_shape_post_axis.dims()}}));
auto slice =
builder->DynamicSlice(input, start_index, slice_shape.dim_sizes());
return builder->Reshape(slice, out_shape.dim_sizes());
}

// Specify the shape of the loop-carried Tensor tuple.
xla::PrimitiveType ptype;
TF_CHECK_OK(DataTypeToPrimitiveType(dtype, &ptype));
9 changes: 7 additions & 2 deletions tensorflow/compiler/xla/client/computation_builder.cc
@@ -1309,14 +1309,15 @@ Status ComputationBuilder::SetReturnValue(
}

StatusOr<bool> ComputationBuilder::IsConstant(
const ComputationDataHandle& operand) {
const ComputationDataHandle& operand, int64 num_parameters) {
if (!first_error_.ok()) {
return first_error_;
}

IsConstantRequest request;
*request.mutable_computation() = computation_.handle();
*request.mutable_operand() = operand;
request.set_num_parameters(num_parameters);
IsConstantResponse response;

VLOG(2) << "making IsConstant request";
@@ -1330,7 +1331,8 @@ StatusOr<bool> ComputationBuilder::IsConstant(
}

StatusOr<std::unique_ptr<Literal>> ComputationBuilder::ComputeConstant(
const ComputationDataHandle& operand, const Layout* output_layout) {
const ComputationDataHandle& operand, const Layout* output_layout,
tensorflow::gtl::ArraySlice<Literal> parameters) {
if (!first_error_.ok()) {
return first_error_;
}
@@ -1341,6 +1343,9 @@ StatusOr<std::unique_ptr<Literal>> ComputationBuilder::ComputeConstant(
if (output_layout != nullptr) {
*request.mutable_output_layout() = *output_layout;
}
for (const auto& param : parameters) {
*request.add_parameters() = param.ToProto();
}

ComputeConstantResponse response;

23 changes: 14 additions & 9 deletions tensorflow/compiler/xla/client/computation_builder.h
@@ -746,11 +746,12 @@ class ComputationBuilder {
ComputationDataHandle Recv(const Shape& shape, const ChannelHandle& handle);

// Returns true if 'operand' is a compile-time constant. A compile-time
// constant does not depend on parameters, or on stateful operators such
// as `RngNormal` or `Infeed`. Unlike `ComputeConstant`, `IsConstant` tests
// whether a computation is a compile-time constant without evaluating the
// computation.
StatusOr<bool> IsConstant(const ComputationDataHandle& operand);
// constant does not depend on parameters with a higher index than
// `num_parameters`, or on stateful operators such as `RngNormal` or `Infeed`.
// Unlike `ComputeConstant`, `IsConstant` tests whether a computation is a
// compile-time constant without evaluating the computation.
StatusOr<bool> IsConstant(const ComputationDataHandle& operand,
int64 num_parameters = 0);

// Normalizes operand across spatial and batch dimensions for each feature.
//
@@ -795,16 +796,19 @@
float epsilon, int64 feature_index);

// Computes the value of a constant indicated by a
// ComputationDataHandle.
// ComputationDataHandle using a non-optimized interpreter on the host.
//
// The operand must be from the computation currently being built -
// i.e., returned from this builder with no intervening call to
// Build(). This happens to currently work regardless of that, but
// that may stop working at any time.
//
// The operand must represent a constant value, which in this case
// means that it must not statically depend on a parameter to the
// computation that is being built.
// means that it must not statically depend on any parameter of the
// computation that is being built other than the ones specified in the
// parameter list. The parameters in the list will be indexed by their
// parameter id property so the number of parameters specified should be at
// least as many as the largest used parameter index.
//
// `IsConstant` can be used to test whether a computation is a compile-time
// constant without evaluating it. `ComputeConstant` only succeeds for
Expand All @@ -822,7 +826,8 @@ class ComputationBuilder {
// will be stored using that layout.
StatusOr<std::unique_ptr<Literal>> ComputeConstant(
const ComputationDataHandle& operand,
const Layout* output_layout = nullptr);
const Layout* output_layout = nullptr,
tensorflow::gtl::ArraySlice<Literal> parameters = {});

// Returns a new ComputationBuilder whose resultant Computation is used only
// by this ComputationBuilder. The sub-ComputationBuilder has the same
5 changes: 5 additions & 0 deletions tensorflow/compiler/xla/service/buffer_assignment.cc
@@ -101,6 +101,11 @@ BufferAllocationProto BufferAllocation::ToProto() const {
proto_assigned->set_offset(buffer_offset_size.second.offset);
proto_assigned->set_size(buffer_offset_size.second.size);
}
std::sort(proto.mutable_assigned()->begin(), proto.mutable_assigned()->end(),
[](const BufferAllocationProto::Assigned& assign1,
const BufferAllocationProto::Assigned& assign2) {
return assign1.logical_buffer_id() < assign2.logical_buffer_id();
});
return proto;
}

2 changes: 1 addition & 1 deletion tensorflow/compiler/xla/service/cpu/llvm_ir_runtime.cc
@@ -52,7 +52,7 @@ llvm::Function* EmitVectorF32TanhIfNeeded(llvm::Module* module,
llvm::IRBuilder<> ir_builder(vector_tanh_body);

llvm::FastMathFlags fast_math_flags;
fast_math_flags.setUnsafeAlgebra();
fast_math_flags.setFast();
ir_builder.setFastMathFlags(fast_math_flags);

llvm::Value* input = &*vector_tanh_function->arg_begin();
10 changes: 10 additions & 0 deletions tensorflow/compiler/xla/service/executable.h
@@ -88,6 +88,16 @@ class Executable {
tensorflow::gtl::ArraySlice<perftools::gputools::DeviceMemoryBase>>
arguments);

// Populates `hlo_execution_profile` from `executor`. This is implicit in any
// Execute* API call that takes a hlo_execution_profile argument, but must be
// called explicitly for other (async, for example) variants after the stream
// has completed.
virtual Status PopulateExecutionProfile(
HloExecutionProfile* hlo_execution_profile,
perftools::gputools::StreamExecutor* executor) {
return Status::OK();
}

// Convenience wrapper for calling Executable::ExecuteOnStream. Sets up a
// timer for the execution, sets up HLO profiling if enabled, and fills in the
// given ExecutionProfile if non-null. The ExecuteOnStream overloads have
13 changes: 7 additions & 6 deletions tensorflow/compiler/xla/service/hlo_instruction.cc
@@ -1901,12 +1901,13 @@ std::vector<string> HloInstruction::ExtraAttributesToString() const {
if (has_sharding()) {
extra.push_back(StrCat("sharding=", sharding().ToString()));
}
if (!control_successors_.empty()) {
extra.push_back(StrCat(
"control-successors=",
Join(control_successors_, ", ", [](string* out, HloInstruction* succ) {
StrAppend(out, succ->name());
})));
if (!control_predecessors_.empty()) {
extra.push_back(StrCat("control-predecessors={",
Join(control_predecessors_, ", ",
[](string* out, HloInstruction* pre) {
StrAppend(out, pre->name());
}),
"}"));
}
return extra;
}
31 changes: 27 additions & 4 deletions tensorflow/compiler/xla/service/hlo_runner.cc
@@ -41,11 +41,21 @@ namespace se = ::perftools::gputools;
namespace xla {

/*static*/ StatusOr<std::unique_ptr<HloModule>>
HloRunner::ReadModuleFromHloProtoFile(const char* filename,
HloRunner::ReadModuleFromHloProtoFile(const std::string& filename,
const DebugOptions& debug_options) {
HloProto proto;
TF_RETURN_IF_ERROR(tensorflow::ReadBinaryProto(tensorflow::Env::Default(),
filename, &proto));

const Status s =
tensorflow::ReadBinaryProto(tensorflow::Env::Default(), filename, &proto);

if (!s.ok()) {
const Status s2 =
tensorflow::ReadTextProto(tensorflow::Env::Default(), filename, &proto);
if (!s2.ok()) {
return Status(s2.code(), s.error_message() + "\n" + s2.error_message());
}
}

TF_ASSIGN_OR_RETURN(
HloModuleConfig config,
HloModule::CreateModuleConfigFromProto(proto.hlo_module()));
Expand All @@ -56,7 +66,7 @@ HloRunner::ReadModuleFromHloProtoFile(const char* filename,
}

/*static*/ StatusOr<std::unique_ptr<HloModule>>
HloRunner::ReadModuleFromHloTextDumpFile(const char* filename,
HloRunner::ReadModuleFromHloTextDumpFile(const std::string& filename,
const DebugOptions& debug_options) {
string hlo_string;
TF_RETURN_IF_ERROR(tensorflow::ReadFileToString(tensorflow::Env::Default(),
@@ -66,6 +76,19 @@ HloRunner::ReadModuleFromHloTextDumpFile(const char* filename,
return tools::Parse(hlo_string, config);
}

/*static*/ StatusOr<std::unique_ptr<HloModule>> HloRunner::ReadModule(
const std::string& filename, const DebugOptions& debug_options) {
auto module = HloRunner::ReadModuleFromHloProtoFile(filename, debug_options);
if (module.ok()) {
return module;
}
const std::string e = module.status().error_message();
module = HloRunner::ReadModuleFromHloTextDumpFile(filename, debug_options);
return module.ok() ? std::move(module)
: Status(module.status().code(),
e + "\n" + module.status().error_message());
}

// Define this in .cc file to avoid having to include eigen or forward declare
// these types in the header.
struct HloRunner::EigenThreadPoolWrapper {
16 changes: 12 additions & 4 deletions tensorflow/compiler/xla/service/hlo_runner.h
@@ -44,15 +44,23 @@ class HloRunner {

~HloRunner();

// Reads the binary proto file in xla.HloProto format, creates and returns the
// HloModule.
// Reads the proto file in xla.HloProto format, creates and returns the
// HloModule. Will try to parse the filename as binary proto, then try as
// text proto if that fails.
static StatusOr<std::unique_ptr<HloModule>> ReadModuleFromHloProtoFile(
const char* filename, const DebugOptions& debug_options);
const std::string& filename, const DebugOptions& debug_options);

// Reads the hlo text dump file in HloModule::ToString format, creates and
// returns the HloModule.
static StatusOr<std::unique_ptr<HloModule>> ReadModuleFromHloTextDumpFile(
const char* filename, const DebugOptions& debug_options);
const std::string& filename, const DebugOptions& debug_options);

// Tries to parse the specified filename first as binary proto format, then
// as textual proto format, then as textual IR, and gives up if all fail.
// ReadModuleFromHloProtoFile or ReadModuleFromHloTextDumpFile should be used
// explicitly when you know the format; use this one if you don't.
static StatusOr<std::unique_ptr<HloModule>> ReadModule(
const std::string& filename, const DebugOptions& debug_options);

// Executes the given module with given literals as input and returns the
// result as a Literal. The LiteralPtr type accepts Literal* or
5 changes: 3 additions & 2 deletions tensorflow/compiler/xla/service/llvm_ir/llvm_util.cc
@@ -555,8 +555,9 @@ int64 ByteSizeOf(const Shape& shape, const llvm::DataLayout& data_layout) {
llvm::FastMathFlags GetFastMathFlags(bool fast_math_enabled) {
llvm::FastMathFlags flags;
if (fast_math_enabled) {
// UnsafeAlgebra implies NoInfs, NoNaNs, NoSignedZeros, and AllowReciprocal.
flags.setUnsafeAlgebra();
// Fast implies AllowReassoc, NoInfs, NoNaNs, NoSignedZeros,
// AllowReciprocal, AllowContract, and ApproxFunc.
flags.setFast();
}
return flags;
}