in debug mode, got Assertion `cudaGetLastError() == cudaSuccess' failed · Issue #6285 · tensorflow/tensorflow · GitHub


Description

@aseuteurideu

Environment: Ubuntu 14.04, CUDA 8, cuDNN 5.1, TensorFlow r0.12
Build command:
bazel build -c opt --config cuda -c dbg --strip=never //tensorflow/tools/pip_package:build_pip_package

Previously, with a regular (non-debug) TensorFlow installation, my code worked well. After rebuilding with the command above, it fails with this error partway through running the session:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcupti.so.8.0 locally
python: external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:262: static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<const float, const float>, const Eigen::TensorMap<Eigen::Tensor<const float, 1, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, const Eigen::TensorMap<Eigen::Tensor<const float, 1, 1, long int>, 16, Eigen::MakePointer> > > >; bool Vectorizable = true]: Assertion `cudaGetLastError() == cudaSuccess' failed.

The gdb backtrace at the point of failure (the assertion appears to be raised from the ReLU operator):

(gdb) bt
#0 0x00007ff450df6c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ff450dfa028 in __GI_abort () at abort.c:89
#2 0x00007ff450defbf6 in __assert_fail_base (
fmt=0x7ff450f403b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7ff42dc57260 "cudaGetLastError() == cudaSuccess",
file=file@entry=0x7ff42dc57210 "external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h", line=line@entry=262,
function=function@entry=0x7ff42dc58f80 <Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const&, Eigen::GpuDevice const&)::PRETTY_FUNCTION> "static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap"...) at assert.c:92
#3 0x00007ff450defca2 in __GI___assert_fail (
assertion=0x7ff42dc57260 "cudaGetLastError() == cudaSuccess",
file=0x7ff42dc57210 "external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h", line=262,
function=0x7ff42dc58f80 <Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const&, Eigen::GpuDevice const&)::PRETTY_FUNCTION> "static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap"...) at assert.c:101
#4 0x00007ff42ad39672 in Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run (expr=..., device=...)
at external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:260
#5 0x00007ff42ad37a6b in Eigen::TensorDevice<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::GpuDevice>::operator=<Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> > (
this=0x7ff35b7fcfe0, other=...)
at external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h:35
#6 0x00007ff42ad36cb0 in tensorflow::functor::Relu<Eigen::GpuDevice, float>::operator() (
this=0x7ff35b7fd04f, d=..., features=..., activations=...)
at ./tensorflow/core/kernels/relu_op_functor.h:35
#7 0x00007ff42acf48c1 in tensorflow::ReluOp<Eigen::GpuDevice, float>::Operate (this=0x59fde00,
context=0x7ff35b7fd8f0, input=..., output=0x7ff23ce68390)
at ./tensorflow/core/kernels/relu_op.h:40
#8 0x00007ff42acee479 in tensorflow::UnaryElementWiseOp<float, tensorflow::ReluOp<Eigen::GpuDevice, float> >::Compute (this=0x59fde00, context=0x7ff35b7fd8f0)
at ./tensorflow/core/framework/numeric_op.h:62
#9 0x00007ff42c18523c in tensorflow::BaseGPUDevice::Compute (this=0x5964560, op_kernel=0x59fde00,
context=0x7ff35b7fd8f0) at tensorflow/core/common_runtime/gpu/gpu_device.cc:385
#10 0x00007ff42c402c39 in tensorflow::(anonymous namespace)::ExecutorState::Process (
this=0x3d553c80, tagged_node=..., scheduled_usec=1481634999542636)
at tensorflow/core/common_runtime/executor.cc:1364
#11 0x00007ff42c404881 in tensorflow::(anonymous namespace)::ExecutorState::__lambda4::operator() (
__closure=0x3d557a70) at tensorflow/core/common_runtime/executor.cc:1746
#12 0x00007ff42c40ada4 in std::_Function_handler<void(), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(const TaggedNodeSeq&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::__lambda4>::_M_invoke(const std::_Any_data &) (__functor=...)
at /usr/include/c++/4.8/functional:2071
#13 0x00007ff428bcc7a8 in std::function<void ()>::operator()() const (this=0x3d557ab0)
at /usr/include/c++/4.8/functional:2471
#14 0x00007ff42c6f0b80 in tensorflow::thread::EigenEnvironment::ExecuteTask (this=0x59921a8, t=...)
at tensorflow/core/lib/core/threadpool.cc:82
#15 0x00007ff42c6f29fb in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop (this=0x59921a0, thread_id=7)
at external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:167
#16 0x00007ff42c6f1508 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int, tensorflow::thread::EigenEnvironment)::{lambda()#1}::operator()() const
() at external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:58
#17 0x00007ff42c6f3927 in std::_Function_handler<void (), Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int, tensorflow::thread::EigenEnvironment)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/4.8/functional:2071
#18 0x00007ff428bcc7a8 in std::function<void ()>::operator()() const (this=0x59c3bf0)

I haven't tried anything yet, as I have no idea what causes this error. I did try a different network architecture, and it worked. The difference is that the failing architecture contains a depthwise layer and the working one does not. That does not seem related to an assertion raised from the ReLU layer, so I am at a loss. Both architectures worked in the non-debug build.
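One possible reading of the backtrace, offered as a hypothesis rather than a diagnosis: `cudaGetLastError()` returns the most recent error recorded by any CUDA runtime call in the host thread and then resets it, so the assertion that fires after the ReLU launch could be reporting a failure left behind by an earlier, unchecked kernel launch. It would also fit the non-debug behaviour, since `assert` is compiled out when `NDEBUG` is defined. The standalone sketch below is not TensorFlow or Eigen code; the kernel and function names are made up for illustration, and it assumes a GPU whose per-block thread limit is 1024.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel used only for illustration.
__global__ void noop_kernel() {}

// Launches a deliberately misconfigured kernel followed by a valid one,
// then returns whatever cudaGetLastError() reports after the second launch.
cudaError_t last_error_after_two_launches() {
    // "Kernel A": 2048 threads per block exceeds the assumed 1024-thread
    // limit, so this launch is rejected. The error is recorded, not thrown.
    noop_kernel<<<1, 2048>>>();

    // "Kernel B": a valid launch. No error check happened in between.
    noop_kernel<<<1, 32>>>();

    // Reads and clears the recorded error. If the failure from kernel A is
    // still pending, it surfaces here even though kernel B itself was fine.
    return cudaGetLastError();
}

int main() {
    cudaError_t err = last_error_after_two_launches();
    std::printf("checked after kernel B: %s\n", cudaGetErrorString(err));

    // A second query shows the state after the first call has reset it.
    err = cudaGetLastError();
    std::printf("second query:           %s\n", cudaGetErrorString(err));

    cudaDeviceSynchronize();
    return 0;
}
```

If that is what happens here, the ReLU frame would only be where the error is first checked, and the op that ran just before it on the same device (for example the depthwise layer) would be the place to look. Running with the real environment variable `CUDA_LAUNCH_BLOCKING=1` set makes kernel launches synchronous, which may help narrow down which op actually fails.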
